Re: [DISCUSS] Improving reviews

2016-12-05 Thread Chris Hillery
It's always been my opinion that code reviews are a very nice-to-have, but
not more than that. The real value in proposing changes for review comes
from the automated testing that can be performed at that stage. I think
we'd be better served overall by shoring up and expanding our automated
testing rather than spending time discussing and implementing non-technical
process.

The main benefits of code reviews are catching large-scale design errors
and spreading code knowledge. You can't really have the former until you
already have the latter - if only one person really understands an area,
nobody else will be able to catch design errors in that area. That's
clearly a risky place to be, but IMHO at least it's not a problem that can
be solved through rules. It requires a cultural shift from the team to make
spreading code knowledge an actual priority, rather than something everyone
wants but nobody has the time or interest to achieve.

If we as a team don't have the drive to do that, then we should accept that
about ourselves and move on. You'll always do best spending time on
enhancing the strengths of a team, not fighting against the things they
don't excel at. I'm also not trying to make any kind of value judgment here
- software development is always about tradeoffs and compromise, risk
versus goals. Any time taken to shift focus towards spreading code
knowledge will by necessity pull from other parts of the development
process, and the upshot may well not be an overall improvement in
functionality or quality.

Ceej
aka Chris Hillery

On Mon, Dec 5, 2016 at 10:49 PM, Till Westmann  wrote:

> Hi,
>
> today a few of us had a discussion about how we could make the reviewing
> process move along a little more smoothly. The goal is to increase the
> likelihood that reviews and review comments get addressed reasonably
> quickly. To do that, the proposal is to
> a) try to hold ourselves to a time limit within which a reviewer or author
>    responds to a review or a comment, and to
> b) regularly report via e-mail about open reviews and how long they have
>    been open (Ian has already filed an issue to automate this [1]).
> Of course one is not always able to spend the time to do a thorough review
> [2] or respond fully to comments, but in that case we should aim to let the
> other participants know within the time limit that the task is not feasible,
> so that they can adapt their plans accordingly.
> The first proposal for the time limit would be 72h (taken from the minimum
> time that a [VOTE] stays open to allow people in all locations and
> timezones to vote).
> Another goal would be to abandon reviews if nobody seems to be working on
> them for a while (and we’d need to find out what "a while" could be).
>
> Thoughts on this?
> A good idea or too much process?
> Is the time limit reasonable?
> Please let us know what you think (ideally more than a +1 or a -1 ...)
>
> Cheers,
> Till
>
> [1] https://issues.apache.org/jira/browse/ASTERIXDB-1745
> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/Code+Reviews
>
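The periodic report on open reviews proposed in the quoted message could be sketched roughly as follows. This is only an illustration in Python: the 72h limit comes from the proposal, while the `OpenReview` record, its field names, and the report wording are assumptions, and nothing here reflects how ASTERIXDB-1745 is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Response window proposed in the thread; everything else is illustrative.
RESPONSE_LIMIT = timedelta(hours=72)


@dataclass
class OpenReview:
    change_id: str           # hypothetical identifier of the change under review
    waiting_on: str          # who is expected to respond next
    last_activity: datetime  # time of the last comment or patch set


def overdue_reviews(reviews, now, limit=RESPONSE_LIMIT):
    """Return (review, idle_time) pairs idle longer than `limit`, oldest first."""
    idle = [(review, now - review.last_activity) for review in reviews]
    late = [(review, gap) for review, gap in idle if gap > limit]
    return sorted(late, key=lambda pair: pair[1], reverse=True)


def format_report(reviews, now):
    """Render a plain-text summary suitable for a periodic e-mail."""
    lines = []
    for review, gap in overdue_reviews(reviews, now):
        hours = int(gap.total_seconds() // 3600)
        lines.append(f"{review.change_id}: waiting on {review.waiting_on} for {hours}h")
    return "\n".join(lines) or "No reviews are past the response limit."
```

The list of open reviews would have to come from the review tool itself; that part is deliberately left out here.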


Re: One of NC node is not stopping for a hash join.

2016-12-05 Thread Taewoo Kim
@Abdullah: Thanks. I missed your e-mail and just checked that. Will try.

Best,
Taewoo
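
For reference, the diagnostics endpoint suggested in the quoted message below could be fetched with a small helper like this one. It is a sketch under assumptions: the path /admin/diagnostics is from the thread, but the default port, the host name, and the helper names are placeholders, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def diagnostics_url(host, port=19002):
    """Build the URL of the diagnostics endpoint named in the thread.

    The default port is an assumption; use the HTTP API port of your CC.
    """
    return urlunsplit(("http", f"{host}:{port}", "/admin/diagnostics", "", ""))


def fetch_diagnostics(host, port=19002, timeout=30.0):
    """Fetch the raw diagnostics response body as text."""
    with urlopen(diagnostics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Saving the output of `fetch_diagnostics` at intervals while the query hangs would make it possible to compare the stack traces over time.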

On Fri, Dec 2, 2016 at 10:32 AM, abdullah alamoudi 
wrote:

> Taewoo,
> You can use the diagnostics endpoint (/admin/diagnostics) to look at all
> the stack traces from a single interface when that happens. This could give
> an idea of what is happening in such a case.
> Although, from what you described, it could be that we have some skew
> during query execution? (Could it be nulls, missing values, or other
> special values?) That is also worth considering.
>
> Trying to help without enough context :-). Cheers,
> Abdullah.
>
> > On Dec 2, 2016, at 10:22 AM, Taewoo Kim  wrote:
> >
> > Additional note: @Till: Yes. It happened again for the same hash-join
> > query. As we can see in the bold part of the following CC.log, one node
> > alone was executing for two hours.
> >
> >
> > Dec 01, 2016 10:41:56 PM
> > org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner
> > planActivityCluster
> > INFO: Plan for org.apache.hyracks.api.job.ActivityCluster@383ecfdd
> > Dec 01, 2016 10:41:56 PM
> > org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner
> > planActivityCluster
> > INFO: Built 1 Task Clusters
> > Dec 01, 2016 10:41:56 PM
> > org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner
> > planActivityCluster
> > INFO: Tasks: [TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1,
> > TID:ANID:ODID:1:1:2, TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4,
> > TID:ANID:ODID:1:1:5, TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7,
> > TID:ANID:ODID:1:1:8, TID:ANID:ODID:1:1:9, TID:ANID:ODID:1:1:10,
> > TID:ANID:ODID:1:1:11, TID:ANID:ODID:1:1:12, TID:ANID:ODID:1:1:13,
> > TID:ANID:ODID:1:1:14, TID:ANID:ODID:1:1:15, TID:ANID:ODID:1:1:16,
> > TID:ANID:ODID:1:1:17, TID:ANID:ODID:4:0:0]
> > Dec 01, 2016 10:43:18 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc3[JID:5:TAID:TID:ANID:ODID:1:1:5:0]
> > Dec 01, 2016 10:43:22 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc4[JID:5:TAID:TID:ANID:ODID:1:1:7:0]
> > Dec 01, 2016 10:43:23 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc9[JID:5:TAID:TID:ANID:ODID:1:1:16:0]
> > Dec 01, 2016 10:43:28 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc2[JID:5:TAID:TID:ANID:ODID:1:1:2:0]
> > Dec 01, 2016 10:43:31 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc2[JID:5:TAID:TID:ANID:ODID:1:1:3:0]
> > Dec 01, 2016 10:43:34 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc5[JID:5:TAID:TID:ANID:ODID:1:1:8:0]
> > Dec 01, 2016 10:43:40 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc9[JID:5:TAID:TID:ANID:ODID:1:1:17:0]
> > Dec 01, 2016 10:43:41 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc4[JID:5:TAID:TID:ANID:ODID:1:1:6:0]
> > Dec 01, 2016 10:43:49 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc7[JID:5:TAID:TID:ANID:ODID:1:1:12:0]
> > Dec 01, 2016 10:43:51 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc1[JID:5:TAID:TID:ANID:ODID:1:1:1:0]
> > Dec 01, 2016 10:43:53 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc5[JID:5:TAID:TID:ANID:ODID:1:1:9:0]
> > Dec 01, 2016 10:43:58 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc7[JID:5:TAID:TID:ANID:ODID:1:1:13:0]
> > Dec 01, 2016 10:44:25 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc8[JID:5:TAID:TID:ANID:ODID:1:1:14:0]
> > Dec 01, 2016 10:44:29 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc6[JID:5:TAID:TID:ANID:ODID:1:1:11:0]
> > Dec 01, 2016 10:44:51 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete:
> > [ss1120_nc8[JID:5:TAID:TID:ANID:ODID:1:1:15:0]
> > Dec 01, 2016 10:45:19 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: TaskComplete: [ss1120_nc3[JID:5:TAID:TID:ANID:ODID:1:1:4:0]
> > *Dec 01, 2016 10:46:31 PM
> > org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run*
> > *INFO: Executing: TaskComplete:
> > [ss1120_nc1[JID:5:TAID:TID:ANID:ODID:1:1:0:0]*
> > *Dec 02, 2016 12:30:19 AM
> > 

Re: [AsterixDB] execution time metric in the HTTP API

2016-12-05 Thread Mike Carey
Those names sound good to me (essentially interpreted as modifiers of 
"results" :-)).



On 12/5/16 8:52 AM, Till Westmann wrote:
Yes, we have these 3 modes. In an upcoming change I call the delivery modes
"immediate", "deferred", and "async". But I’ll happily change that if somebody
has a better idea for the names.
Also, I’ve recently tested the "deferred" mode, but I haven’t tried "async"
(specifically the result status part of it …).

Cheers,
Till
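
Read together with the three combinations described in the message quoted below, the proposed names could be modelled like this. The mapping of names to behaviours is an interpretation of the thread, and the enum and property names are invented for the illustration; this is not the API's actual implementation.

```python
from enum import Enum


class DeliveryMode(Enum):
    """Delivery mode names proposed in the thread (mapping is an assumption)."""

    IMMEDIATE = "immediate"  # sync execution, results returned in the response
    DEFERRED = "deferred"    # sync execution, only a handle returned
    ASYNC = "async"          # async execution, only a handle returned

    @property
    def waits_for_completion(self):
        """True if the response arrives only after the query has finished."""
        return self is not DeliveryMode.ASYNC

    @property
    def returns_results_inline(self):
        """True if the response carries the results themselves."""
        return self is DeliveryMode.IMMEDIATE

    @property
    def needs_status_polling(self):
        """True if the client has to ask "is it done yet" via the handle."""
        return self is DeliveryMode.ASYNC


def modes_for_timing():
    """Modes that wait for completion without shipping the results back."""
    return [
        mode
        for mode in DeliveryMode
        if mode.waits_for_completion and not mode.returns_results_inline
    ]
```

Under this reading, `modes_for_timing()` yields only the "deferred" mode, which matches "case two (sync with handle)" in the discussion.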

On 5 Dec 2016, at 8:02, Mike Carey wrote:

Agreed on this.  Also, my recollection is that there were/are three 
combos of results and synchrony that we support - sync execution with 
results returned, sync execution with just a handle returned, and 
async execution with a handle returned.  The handle can then be used 
to ask for the results in any case, when they are ready, and to ask 
"is it done yet" in the last of the three cases.  For this use case, 
getting stats back, one would want case two (sync with handle) - 
right? The idea would be to not hear back until the work was done, 
but to not measure that final overhead of returning and swallowing 
the results themselves.



On 12/4/16 9:20 PM, Till Westmann wrote:
I think that the best solution for your use-case would be to add deferred
result delivery for the query/service API, as
a) this API returns metrics today and
b) it is a missing feature of this API anyway.
I’ve filed an issue for this [1] and started work on it, but it’s not quite
done yet …

One of the additional challenges is that the query/service API currently only
supports SQL++ and not AQL. If you need AQL, we would also need to fix another
issue [2].
Which query language do you use (and does it matter for what you are trying to
do)?

Cheers,
Till

[1] https://issues.apache.org/jira/browse/ASTERIXDB-1744
[2] https://issues.apache.org/jira/browse/ASTERIXDB-1559

On 3 Dec 2016, at 6:47, Thor Martin Abrahamsen wrote:


Thanks for your suggestions, Pouria and Till.

Using the «mode=asynchronous-deferred» returned only the handle, just as I
wanted. But I am not getting any metrics with the response, just the handle.
The same goes for the request without the «mode» parameter: I only get the
results, not any metrics. Is there a way to get metrics with the API response?
Or is it possible to use «mode=asynchronous-deferred» in the web interface?

The asterixClient worked perfectly, but the execution time returned is
calculated in the client, not by AsterixDB (please correct me if I’m wrong,
Pouria). Ideally I would like the isolated AsterixDB execution time, without
network delay or client performance included. I might try running the client on
the same node as the CC :)

Best regards
Thor Martin Abrahamsen
Student @ NTNU
Tel. +47 470 78 713

On 2 Dec 2016, at 20:45, Till Westmann wrote:

Hi,

an alternative to using the client that Pouria suggested would be to add
"mode=asynchronous-deferred" as an HTTP parameter when talking to the API. In
that case the query should be evaluated completely and an HTTP response should
come back - however without the result. Instead the response should contain a
handle where you could pick up the query result (which you could choose not to
do).
I'm using "should" a lot, as I haven't used this feature for a while, and so my
recollection of what it does might be buggy or outdated.

Cheers,
Till

On 2 Dec 2016, at 11:16, Pouria Pirzadeh wrote:

You may find the following client useful.
It works against the HTTP API, runs a query workload for one or more iterations,
and dumps the response time per query/iteration in a stats file.

https://github.com/pouriapirz/asterixClient

Pouria

On Fri, Dec 2, 2016 at 10:45 AM, Thor Martin Abrahamsen 


Re: [AsterixDB] execution time metric in the HTTP API

2016-12-05 Thread Thor Martin Abrahamsen
> For this use case, getting stats back, one would want case two (sync with
> handle) - right?

Correct, this is the behavior I was looking for.

> Which query language do you use (and does it matter for what you are trying to
> do)?

I’m currently using AQL, but it’s quite a simple query, so I can change to SQL++.
Is it possible to apply library functions in SQL++?
The specific query in AQL is:
«for $x in dataset Twitter return testilb#classify($x);»

I’ll probably just use the response time in my experiments for now.
I would love to contribute, and will try to find the time after my exams.
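
A client-side measurement of the response time, as mentioned above, could look roughly like the following sketch. The names `measure_response_times`, `summarize`, and `send_query` are made up for the example; `send_query` stands for whatever callable issues the HTTP request (for instance with mode=asynchronous-deferred), and the numbers it produces include network delay and client overhead rather than the isolated AsterixDB execution time.

```python
import statistics
import time


def measure_response_times(send_query, iterations=5, warmup=1):
    """Time `send_query()` end to end and return per-iteration seconds.

    Warm-up calls are executed first and are not included in the samples.
    """
    for _ in range(warmup):
        send_query()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        send_query()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Condense the samples into a few robust summary numbers."""
    return {
        "runs": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Running the client on the same node as the CC would reduce the network share of these numbers but would not remove the client overhead.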

Thank you so much for your help!


Best regards
Thor Martin Abrahamsen
Student @ NTNU
Tel. +47 470 78 713

On 5 Dec 2016, at 17:02, Mike Carey wrote:

Agreed on this.  Also, my recollection is that there were/are three combos of 
results and synchrony that we support - sync execution with results returned, 
sync execution with just a handle returned, and async execution with a handle 
returned.  The handle can then be used to ask for the results in any case, when 
they are ready, and to ask "is it done yet" in the last of the three cases.  
For this use case, getting stats back, one would want case two (sync with 
handle) - right?  The idea would be to not hear back until the work was done, 
but to not measure that final overhead of returning and swallowing the results 
themselves.



Re: [AsterixDB] execution time metric in the HTTP API

2016-12-05 Thread Mike Carey
Agreed on this.  Also, my recollection is that there were/are three 
combos of results and synchrony that we support - sync execution with 
results returned, sync execution with just a handle returned, and async 
execution with a handle returned.  The handle can then be used to ask 
for the results in any case, when they are ready, and to ask "is it done 
yet" in the last of the three cases.  For this use case, getting stats 
back, one would want case two (sync with handle) - right?  The idea 
would be to not hear back until the work was done, but to not measure 
that final overhead of returning and swallowing the results themselves.


