[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-01 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 2:

(7 comments)

Thanks for doing this patch. This will help reduce a lot of unnecessary network 
traffic.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165:   typedef TUniqueId RegistrationId;
You can move this typedef to statestore.h and use the same type in 
statestore.h/cc too.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
Do we even need to store the SubscriberId here? Can't we just store a unique 
registration ID? Where ever the 'SubscriberId' is required, we just get it from 
the Subscriber object anyway, and that object can be retrieved using the unique 
registration id.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
Could you clarify what "exists" means here exactly? It could be confused with a 
node just existing as a part of the cluster. I think we want to say that it 
exists in the subscribers_ map.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
nit: registration_id


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr* subscriber
Add a comment about what is returned in this out parameter.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
 : const TUniqueId& registration_id
It looks like it just makes sense to pass 'const Subscriber&' here? Is there a 
case where we would not get a subscriber_id and a registration_id from the same 
Subscriber object while calling this function?


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634:   if (!RegisteredSubscriberExists(subscriber_to_update.first, 
subscriber_to_update.second,
I'm a little worried that we're contending for a mutex two more times in this 
function. Do you anticipate any performance regression due to increased context 
switching?

Consider using a spin lock if we won't have more than ~1000 entries in the map 
at one time. (unordered_map has a worst-case O(N) time complexity)



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 02 Nov 2017 03:19:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-01 Thread Gabor Kaszab (Code Review)
Hello Lars Volker, Laszlo Gaal, Zoltan Borok-Nagy, Attila Jeges, Tim Armstrong, 
Csaba Ringhofer, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8447

to look at the new patch set (#2).

Change subject: IMPALA-2181: Add query option levels for display
..

IMPALA-2181: Add query option levels for display

Three display levels are introduced for each query option: REGULAR, ADVANCED
and DEPRECATED. When the query options are diplayed in Impala shell using
SET then only the REGULAR options are shown. A new command called SET ALL
shows all the options grouped by their option levels.

Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
---
M be/src/service/child-query.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M shell/impala_client.py
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
9 files changed, 242 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8447/2
--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1425/


--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 17
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 02 Nov 2017 01:28:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8449


Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 65 insertions(+), 33 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/2
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache

2017-11-01 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8408 )

Change subject: IMPALA-6121: remove I/O mgr request context cache
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
Gerrit-Change-Number: 8408
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 02 Nov 2017 00:45:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8419 )

Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536
..

IMPALA-5541: Reject BATCH_SIZE greater than 65536

Setting a very large value could cause strange behaviour like crashing,
hanging, excessive memory usage, spinning etc. This patch rejects values
out of the range [0,65536].

Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97
Reviewed-on: http://gerrit.cloudera.org:8080/8419
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
2 files changed, 12 insertions(+), 3 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97
Gerrit-Change-Number: 8419
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8419 )

Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97
Gerrit-Change-Number: 8419
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 02 Nov 2017 00:45:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 02 Nov 2017 00:40:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoid
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Reviewed-on: http://gerrit.cloudera.org:8080/8405
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 59 insertions(+), 6 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 6:

(8 comments)

Here's my last set of comments for this round.

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@59
PS6, Line 59: boost::bind
I don't have a strong preference either way, but it'd be nice to be consistent 
between either using bind or [], rather than both...


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@322
PS6, Line 322: RespondToTimedOutSenders
shouldn't be plural


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@52
PS6, Line 52: overflows the queue
it's not clear what that means just from reading the comment. It'd be nice to 
briefly explain that this is talking about the soft limit of the number of 
bytes across all sender queues.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@105
PS6, Line 105:   // number of pending row batch insertion.
 :   int num_pending_enqueue_ = 0;
it's not clear what a "pending insertion" means or why we have this.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@113
PS6, Line 113: RowBatch*
how about using unique_ptr since this owns the row batch (until it's 
transferred to current_batch_)?


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127
PS6, Line 127:   queue blocked_senders_;
given that a single soft limit is imposed across all sender queues, does it 
make sense that the blocked_senders_ are maintained per sender? Why don't we 
maintain a single blocked_senders_ list per datastream recvr?

Hmm, I guess we need to know if this sender has a blocked sender in GetBatch(). 
But given the single limit, it seems wrong that one sender's row batches can 
bypass another sender once we get into the blocked sender situation. i.e. the 
flow of batches across senders seems quite different depending on when the 
limit was reached.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@231
PS6, Line 231: proto_batch
update


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@322
PS6, Line 322: payload->rpc_context->RespondSuccess();
doing this in Close() goes against the paradigm that Close() is only about 
releasing (local) resources. We've been going that way because there might be 
no where to bubble up a status from Close().  At least RespondSuccess() doesn't 
return a status, I suppose.  But is there any place sooner we could do this? 
Does it make sense to do in during Cancel instead?



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 02 Nov 2017 00:10:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Correct log line in start-impala-cluster.py.

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8432 )

Change subject: Correct log line in start-impala-cluster.py.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398
PS1, Line 398: executors = options.cluster_size - options.num_coordinators
Could executors be negative due to the same bug?



--
To view, visit http://gerrit.cloudera.org:8080/8432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
Gerrit-Change-Number: 8432
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:58:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13
PS9, Line 13: In RuntimeFilterGenerator in the planner, each partitioned hash 
join
... each hash join node generates a bloom and min-max filter for each equi-join 
predicate, but only those ...


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27: For now, min-max filters are only applied at the KuduScanner, 
which
Not specific to the code changes, and I don't expect a response here (probably 
too long :)).

How do the existing query options around runtime filters affect the new min/max 
filters on Kudu? For example, what does DISABLE_ROW_RUNTIME_FILTERING mean for 
the Kudu min/max filters?

How should users think about setting:
RUNTIME_FILTER_WAIT_TIME_MS

In particular, are min/max filters more effective against Kudu PK or partition 
columns?


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41: Perf Testing:
Contrived extreme queries are good data points, but how about running the 
TPCH/DS perf suites against Kudu?


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44
PS9, Line 44: - Ran a contrived query with a filter that does eliminate any rows
does not eliminate


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
I think I understand why you did this, but it seems confusing from a user's 
perspective. Ok to leave, but do you have a story around the eventual meaning 
of existing query options when HDFS can do min/max and Kudu can do bloom?


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java@730
PS9, Line 730: output.append(Joiner.on(", ").join(filtersStr) + "\n");
just return the string? don't think we need the 'output' StringBuilder


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@111
PS9, Line 111: private final Operator joinOp_;
Let's call this cmpOp_ or exprCmpOp_ or something else because "joinOp_" 
usually indicates the join type like left outer, right outer, etc.


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@170
PS9, Line 170:   SlotRef slotRef = expr.unwrapSlotRef(false);
Add a comment stating that the validity of this is checked elsewhere (and where 
exactly)


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@368
PS9, Line 368:   if (node instanceof HdfsScanNode && type_ != 
TRuntimeFilterType.BLOOM) {
I feel like these checks belong in the caller. Having an addTarget() function 
be a co-op in some cases seems difficult to reason about. Can be cleaned with a 
isValidTarget() helper function.


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@463
PS9, Line 463: // We only enforce a limit on the number of bloom filters as 
they are much more
This seems really confusing for users. I'm ok with checking in this version, 
and let's discuss separately.


http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1248
PS9, Line 1248: |  runtime filters: RF004 <- o_orderkey, RF005 <- o_clerk
To me the re-numbering is a little strange. We can think about how to address 
this in a follow-on change. I'm thinking that ideally users should be able to 
quickly determine the number of runtime filters based on the max RF id, so we 
could assign the real RF id lazily instead of eagerly.


http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test:


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33
PS7, Line 33: int Encode(const InternalType& t, int encoded_byte_size, uint8_t* 
buffer,
> This test suite basically takes a test value and its expected size and firs
Oh ok, thanks for the explanation. I think we should also add a comment 
mentioning that this doesn't exactly implement the Parquet spec, since the 
parquet spec would require that it remove leading zero bytes.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:29:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-01 Thread Bikramjeet Vig (Code Review)
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7822

to look at the new patch set (#8).

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.

Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.

Performance:
Initial perf testing using tpcds_1000 shows no regression.

Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
18 files changed, 629 insertions(+), 283 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/8
--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-01 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391
PS7, Line 391: fixed_len_size
> Looked again. The variable name (and recycling the argument storage) is con
Done


http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33
PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int 
fixed_len_size, uint8_t* buffer){
> I took another look at the standard and it says that the minimum number of
This test suite basically takes a test value and its expected size and first 
encodes it then decodes it, in order to make minimal changes to the test suite 
I reused the  function signature and passed the "minimum number of bytes 
required to store the unscaled value" as the expected size which is passed to 
the encode methods names as  "fixed_len_size". As per you suggestion in a 
previous comment, I believe this will be more clear if i change the name to 
encoded_byte_size.

Also, I gave an explanation of  what "fixed_len_size" for decimals stored as 
BYTE_ARRAY as a comment at line 46. I believe it'll be better to move that 
comment right above this method instead.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:21:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Pranay Singh (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8215

to look at the new patch set (#6).

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..

IMPALA-5142 EventSequence displays negative elapsed time.

EventSequence::EventList which stores the Event in increasing time
order may have at times Events that are not stored in increasing time
order. This may happen when concurrently EventSequence::MarkEvent()
is called for different Events in the same EventSequence.

The incorrect time order of Event in EventList results in a negative
value being displayed, which is the issue reported in this JIRA.

The issue can be fixed either be re-ordering the lock in
EventSequence::MarkEvent() or by sorting the EventList before
printing. Reordering of lock in EventSequence::MarkEvent() will
involve holding the lock across clock_gettime() so that
approach is not taken. This patch fixes the issue by sorting the
EventList in GetEvents() as this function is expected to return
sorted list of events based on time.

Testing:
Ran all the front-end/backend and end-end tests.

Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
---
M be/src/util/runtime-profile-counters.h
1 file changed, 6 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8215/6
--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8215 )

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@327
PS5, Line 327:[](Event const , Event const ) {
 :   return event1.second < event2.second;
> rather than creating a tmp eventlist_sorted, you should create a temp for t
I'm dropping this logic to call the sort() all the time to make the result more 
predictable each time this function gets called and also since the event list 
will be small so sorting won't be that expensive.


http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@330
PS5, Line 330:  == false
> our style uses ! rather than == false
Since I'm not checking the event list to be sorted so I won't need this anymore.


http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326
PS4, Line 326: bool eventlist_sorted = std::is_sorted(events_.begin(), 
events_.end(),
 : [](Event const , Event const ) {
 :   return event1.second < event2.second;
 :   });
 : if (eventlist_sorted == false) {
 :   std::sort(events_.begin(), events_.end(),
 :   [](Event const , Event const ) {
 :   return event1.second < event2.second;
 :   });
 :
> the flipside is that this becomes a cliff in the sense that normally it wil
Done as suggested kept the functionality simple, removed the logic of having an 
extra check.



--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 5
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:01:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8408 )

Change subject: IMPALA-6121: remove I/O mgr request context cache
..


Patch Set 4:

(4 comments)

Thanks for looking at the patch and for the good questions.

http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG@12
PS4, Line 12: from TCMalloc's thread caches should scale much better than a
> Though reader_context_ is not frequently allocated in current code, theoret
Yeah I think that's a reasonable option in a lot of cases, particularly if perf 
is a concern.

Allocating on the heap can avoid some slightly tricky issues around memory 
alignment (if the class requires more than the default alignment, e.g. cache 
line alignment, then any class including it must also be cache line aligned).

Including everything inline also can also force viral header includes, since 
any class that needs to know the layout of Y that includes X inline then also 
needs to know the layout of X.

I did actually just realise that we don't have copy constructors disable for 
request context, so I added that.


http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h
File be/src/runtime/disk-io-mgr-reader-context.h:

http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@274
PS4, Line 274:   __sync_synchronize();
> Just a question - Does b need to be atomic here? If it does not generate a
Yeah I think it would make more sense to use an explicit atomic for 
is_on_queue_. The whole concurrency story here could probably be simplified a 
lot but it would need some thought.

I don't want to make any subtle changes to this logic as part of a refactoring 
patch, but it would be nice to simplify this code at some point.


http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@319
PS4, Line 319:   if (!is_on_queue_ && num_threads_in_op_.Load() == 0 && 
!done_) {
> The return value of Add can be used to remove this Load
I'd have to think about this carefully to understand if the Acquire barrier in 
Load() makes a difference. I think you're probably right but I don't want to 
potentially include subtle behavioural changes in this patch.


http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc@251
PS4, Line 251:   unique_ptr writer = 
io_mgr.RegisterContext(NULL);
> nullptr?
Did it for the whole file.



--
To view, visit http://gerrit.cloudera.org:8080/8408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
Gerrit-Change-Number: 8408
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:39:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache

2017-11-01 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Tianyi Wang,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8408

to look at the new patch set (#5).

Change subject: IMPALA-6121: remove I/O mgr request context cache
..

IMPALA-6121: remove I/O mgr request context cache

This simplifies the lifecycle of the request contexts and eliminates
some code. The comments claim that request context cache improves
performance when allocating smallish the objects. But allocating
from TCMalloc's thread caches should scale much better than a
global object pool protected by a lock.

I needed to move the definition to a non-internal header file so that it
was visible to clients that manage it by unique_ptr.

We also do not need to transfer the request contexts to the RuntimeState
since I/O buffers do not leave scanners now.

Testing:
Ran exhaustive tests.

Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
A be/src/runtime/disk-io-mgr-reader-context.h
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
16 files changed, 639 insertions(+), 804 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8408/5
--
To view, visit http://gerrit.cloudera.org:8080/8408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
Gerrit-Change-Number: 8408
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8215 )

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@327
PS5, Line 327:int64_t prev = 0L;
 : for (const Event& event: events_) {
rather than creating a tmp eventlist_sorted, you should create a temp for this 
lamba (you can use "auto" for that), since it's used twice and must be 
consistent in order for this code to make sense.


http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@330
PS5, Line 330:  = false;
our style uses ! rather than == false


http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326
PS4, Line 326: bool eventlist_sorted = true;
 : int64_t prev = 0L;
 : for (const Event& event: events_) {
 :   if (event.second < prev) {
 : eventlist_sorted = false;
 : break;
 :   }
 :   prev = event.second;
 : }
 :
> Since the list is sorted in most of the times calling sort() may be expensi
the flipside is that this becomes a cliff in the sense that normally it will be 
fast, but every once in a while (when the race occurs), we'll pay the cost of 
sort(). If we always call sort(), then the performance is more predictable.

I don't think the event list will ever be so long that this matters one way or 
another, so simplest seems best. That said, I don't feel strongly if you really 
want to keep the check.



--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:31:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Pranay Singh (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8215

to look at the new patch set (#5).

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..

IMPALA-5142 EventSequence displays negative elapsed time.

EventSequence::EventList which stores the Event in increasing time
order may have at times Events that are not stored in increasing time
order. This may happen when concurrently EventSequence::MarkEvent()
is called for different Events in the same EventSequence.

The incorrect time order of Event in EventList results in a negative
value being displayed, which is the issue reported in this JIRA.

The issue can be fixed either be re-ordering the lock in
EventSequence::MarkEvent() or by sorting the EventList before
printing. Reordering of lock in EventSequence::MarkEvent() will
involve holding the lock across clock_gettime() so that
approach is not taken. This patch fixes the issue by sorting the
EventList in GetEvents() as this function is expected to return
sorted list of events based on time.

Testing:
Ran all the front-end/backend and end-end tests.

Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
---
M be/src/util/runtime-profile-counters.h
1 file changed, 14 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8215/5
--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 5
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8215 )

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326
PS4, Line 326: bool eventlist_sorted = true;
 : int64_t prev = 0L;
 : for (const Event& event: events_) {
 :   if (event.second < prev) {
 : eventlist_sorted = false;
 : break;
 :   }
 :   prev = event.second;
 : }
 :
> why bother with this? the list is pretty short and this isn't a critical pa
Since the list is sorted in most of the times calling sort() may be expensive 
(if in future # of events becomes large), hence I'll use the std::is_sorted() 
instead.


http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@336
PS4, Line 336: [](Event const ,
> nit: how about breaking the line before the [] instead of in the middle of
Done



--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:24:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
..


Patch Set 3:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281: return new FunctionCallExpr("if", ifParams);
Let's please avoid code changes unrelated to the purpose of this patch as much 
as reasonable. Cleanup is nice in general, but this patch is already complex 
and huge so let's not add anything extra.

Such changes can also make backporting more difficult due to conflicts, so 
cleanup should be done in a separate change.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be 
applied to B in most
We need a precise definition. The original definition was more precise.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  * It is a reflexive, transitive binary relation on the set of 
slots.
Not sure this part adds value. What's the significance of these statements with 
respect to our use of value transfers for planning purposes? If we don't make 
use of these terms/properties anywhere else in the code, then we should remove 
these statements.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277: // The SCC-condensed graph representation of value transfer 
relationship.
The SCC-condensed graph representation of all slot value transfers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278: SccCondensedGraph valueTransferGraph;
public/private?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:* Create and register an auxiliary predicate to express an 
mutual value transfer
a mutual value transfer


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:* by replacing the slots of a source predicate with slots of 
the destTid, if for each
how about: if every source slot has a value transfer to a slot in destTid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:   // its referenced tuples are NULL. For example:
Let's simplify the example and make it as clear as possible:

select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:* For each slot equivalence class, adds/removes predicates 
from conjuncts such that it
Need a definition of equivalence class in the Analyzer class comment. You do 
mention the term "equivalence class" but I don't think it has the same meaning 
of the "equivalence class" terminology used here.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636: // A map from equivalence class ids to slot equivalence 
classes containing on slots
Garbled sentence, please clean up


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:* Compute the value transfer graph based on the registered 
value transfer relation
the registered value transfers


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973: String condensedTc = 
globalState_.valueTransferGraph.print();
missing space in string


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:* Populate the value transfer relation from eq conjuncts to 
graph 'g'.
Add value-transfer edges to 'g' based on the registered equi-join conjuncts.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:* Returns the equivalence class of slotID.
of the given slot id


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:   public List getEquivClass(SlotId slot) {
slot -> sid



[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-01 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8447


Change subject: IMPALA-2181: Add query option levels for display
..

IMPALA-2181: Add query option levels for display

Three display levels are introduced for each query option: REGULAR, ADVANCED
and DEPRECATED. When the query options are diplayed in Impala shell using
SET then only the REGULAR options are shown. A new command called SET ALL
shows all the options grouped by their option levels.

Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
---
M be/src/service/child-query.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M shell/impala_client.py
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
9 files changed, 244 insertions(+), 84 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8447/1
--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 6:

(14 comments)

Note to self: remaining files: krpc-data-stream-{mgr,recvr}.cc

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc@89
PS6, Line 89: "Number of datastream service processing threads");
how are these defaults chosen?


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@50
PS6, Line 50: outbound
I think we should say something about KRPC to at least give that hint. maybe:

A KRPC outbound row batch...


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@60
PS6, Line 60: sizeof(int32_t)
sizeof(tuple_offsets_[0]) seems clearer and more robust


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@354
PS6, Line 354:   /// it is ignored. This function does not Reset().
we should preserve this comment when removing the thrift variant. So you could 
just put the new decl here now so we don't forget that.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@424
PS6, Line 424:   ///
nit: i don't think we generally have all these line breaks between parameter 
comments.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@426
PS6, Line 426:  .
delete space


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@444
PS6, Line 444: nput_
delete


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@447
PS6, Line 447: input_
delete


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@537
PS6, Line 537:   std::string compression_scratch_;
this seems like a hack and we could do something simpler, but let's leave it 
alone for now.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.cc@241
PS6, Line 241:   // as sidecars to the RpcController.
this comment was probably meant to be deleted?


http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto@29
PS6, Line 29: fragment
isn't this the id of the instance?  The comment in KrpcDataStreamSender is 
clearer, let's copy that:
  /// Sender instance id, unique within a fragment.
  int sender_id_;


http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto@59
PS6, Line 59:   // Id of this fragment in its role as a sender.
same


http://gerrit.cloudera.org:8080/#/c/8023/3/common/protobuf/row_batch.proto
File common/protobuf/row_batch.proto:

http://gerrit.cloudera.org:8080/#/c/8023/3/common/protobuf/row_batch.proto@32
PS3, Line 32: = 2;
> That's the tuple data sent as sidecar. Clarified in the new comments.
My point is that writing it like 'tuple_data' doesn't make sense since it's not 
a field in this struct. You should just write:
Size of the tuple data (sent as a sidecar) in bytes ...


http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/row_batch.proto
File common/protobuf/row_batch.proto:

http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/row_batch.proto@32
PS6, Line 32: epeated int32 row_tuples = 2;
why is this needed? i don't see it used. The size of it is used, though it 
seems like even that can be inferred from the descriptors.



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:48:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1425/


--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 17
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:32:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-11-01 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 17: Code-Review+2

Carrying Michael's +2


--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 17
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:32:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717
PS2, Line 717: byte_buffer_last_byte_
> Oops, didn't read line 712 ;)
I couldn't see an obvious code path that would take us there. Maybe some 
specially-constructed compressed data that expanded to zero bytes? Will think 
some more.



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:27:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 3:

> Please update the comment as suggested by Phil.

Done


--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:24:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-01 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#9).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303
PS2, Line 303:   // record.end_time_us might still be zero if the query is not 
yet done
> nit: Could you update the following comment in client-request-state.h and i
Done



--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:23:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8430

to look at the new patch set (#3).

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..

IMPALA-6136: Part 1: Query duration should not be normally negative.

The second commit for IMPALA-5599, 1640aa97, introduced a regression
where calculating the duration of an in-progress query can result
in negative values. This can happen for two reasons:

1. The value of ClientRequestState::end_time_us_ is not initialized
   in the constructor, and may have some random value until
   ClientRequestState::Done() is called.
2. If ClientRequestState::end_time_us_ happens to have a positive
   value less than ClientRequestState::start_time_us_, the query
   duration will be negative. Here, since the query is still in
   flight, we need to use the local current time as the end time.

The fix has been manually verified by executing long-running queries
and checking the query profiles to ensure the durations are not some
random junks. A new test case will be added to check for sane time
values in a follow-on commit.

Since we are using Unix time (system clock) for time stamps, it is
still possible for end_time_us_ to be less than start_time_us_ if
the system clock gets reset while the query is executing. If there
are clients that assume that a query duration is never negative,
we really should switch to a monotonic clock to time queries.

Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.h
4 files changed, 13 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8430/3
--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717
PS2, Line 717: byte_buffer_last_byte_
> The byte_buffer_read_size_ check above prevents using a stale value from a
Oops, didn't read line 712 ;)

That does seem like a possibility, and seems worth addressing. But is there a 
way we can exercise that?



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:20:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717
PS2, Line 717: byte_buffer_last_byte_
> given that this doesn't happen when readsize is 0, is it possible to use a
The byte_buffer_read_size_ check above prevents using a stale value from a 
previous byte buffer.

There's a related scenario where I can't convince myself that it is or isn't 
possible: if the last FillByteBuffer() call filled zero bytes but the call 
before that had a split delimiter at the end of the buffer, we could 
potentially ignore the split delimiter, even though it was the last thing we 
read from the range. We could fix that by tracking whether *any* bytes were 
read - if so, it means that byte_buffer_last_byte_ is valid and is the last 
byte read.

What do you think? Should I try to more explicitly prevent that scenario? I 
don't think this change makes it more or less likely to hit.



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:10:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1424/


--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8419 )

Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1423/


--
To view, visit http://gerrit.cloudera.org:8080/8419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97
Gerrit-Change-Number: 8419
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8419 )

Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97
Gerrit-Change-Number: 8419
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 7:

(2 comments)

I think we'll need to do some more work on testing for the int32/64 patches - 
we don't have pre-existing test files from parquet-mr. I think we'll have to 
generate some more test files with parquet-mr for the other cases, and we could 
consider turning that code into a data generator to generate more test files. 
From what I could tell Hive doesn't have a knob to generate some of these 
alternative output encodings.

 I feel ok with the coverage since we have end-to-end tests then more 
exhaustive unit tests for the various ways of encoding it.

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391
PS7, Line 391: fixed_len_size
Looked again. The variable name (and recycling the argument storage) is 
confusing. Maybe 'encoded_byte_size'?


http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33
PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int 
fixed_len_size, uint8_t* buffer){
I took another look at the standard and it says that the minimum number of 
bytes required to store the unscaled value should be used: 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

I think this means that we should not be including any preceding "0" bytes. 
I.e. we should not have a fixed_len_size argument and instead determine the 
size based on the number of leading zero bytes in the value.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717
PS2, Line 717: byte_buffer_last_byte_
given that this doesn't happen when readsize is 0, is it possible to use a 
stale value here?



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:49:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-11-01 Thread Pranay Singh (Code Review)
Hello Joe McDonnell, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8034

to look at the new patch set (#15).

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..

IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Currently DictDecoder class and DictEncoder class uses std::vector
to store the tables mapping codeword to value and vice-versa. It is
hard to detect the memory usage by these tables when they becomes
very large, since this memory is not accounted by Impala's memory
mangement infrastructure.

This patch uses the memory tracker of HdfsScanner to track the memory used
by dictionary in DictDecoder class. Similary it uses memory tracker of
HdfsTableSink to track the memory used by dictionary in DictEncoder class.

Memory for the dictionary, stored as std::vector is still allocated
from std:allocator but the amount allocated is accounted by
introducing a counter which is incremented and decremented as the
memory is consumed and released by vector.

Testing
---
Ran all the backend and end-end tests with no failures.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
5 files changed, 140 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/15
--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-11-01 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h@63
PS12, Line 63:   /// dict_encoded_size() bytes.
 :   virtual void Writ
> Close() should call ClearIndices() as one of its steps, ClearIndices() stil
My bad didn't realize it, now that's fixed.



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 14
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:44:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 7:

> Looks good to me and it looks like you addressed Dan's comments.
 > Dan, did you want to take another look?

No, I don't plan to look through the details.

How confident are we that we have sufficient test coverage with the added 
tests? If we're confident then, Tim, please do the +2 (or ask for a second 
review from another reviewer if you think it's warranted).


--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:40:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has removed Taras Bobrovytsky from this change.  ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Removed reviewer Taras Bobrovytsky.
--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-11-01 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/client-request-state.h@342
PS11, Line 342: start_time_us_, end_time_us_;
> These two may need to be initialized.

https://gerrit.cloudera.org/#/c/8430/


http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-http-handler.cc@303
PS11, Line 303: record.end_time_us - record.start_time_us;
> This may be broken if end_time_us is not set yet. The old code will use the
Fixing via https://gerrit.cloudera.org/#/c/8430/


http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h@622
PS11, Line 622: /// Start and end time of the query, in Unix microseconds.
  : int64_t start_time_us, end_time_us;
> Can these be const ? They are always initialized in the constructor, right
The default constructor does not handle this well if we make them constants. 
We'll have to make all/most of the other fields constant as well if we are 
going to fix this.



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 11
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:21:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8215 )

Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326
PS4, Line 326: bool eventlist_sorted = true;
 : int64_t prev = 0L;
 : for (const Event& event: events_) {
 :   if (event.second < prev) {
 : eventlist_sorted = false;
 : break;
 :   }
 :   prev = event.second;
 : }
 :
why bother with this? the list is pretty short and this isn't a critical path, 
so I think it should be fine to always call sort().
(and if you were to do that, you might as well use std::is_sorted())


http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@336
PS4, Line 336: [](Event const ,
nit: how about breaking the line before the [] instead of in the middle of the 
lambda param list?



--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:17:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

2017-11-01 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8439


Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
..

IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

KRPC has some flags that turn on TLS. This patch sets those to enable
TLS communication.

Tests are added to rpc-mgr-test.

Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
---
M be/src/catalog/catalogd-main.cc
M be/src/rpc/authentication-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/runtime/exec-env.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/util/openssl-util.cc
M be/src/util/openssl-util.h
15 files changed, 366 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/1
--
To view, visit http://gerrit.cloudera.org:8080/8439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
Gerrit-Change-Number: 8439
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5053: [SECURITY] Make KRPC work with Kerberos

2017-11-01 Thread Sailesh Mukil (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8270

to look at the new patch set (#2).

Change subject: IMPALA-5053: [SECURITY] Make KRPC work with Kerberos
..

IMPALA-5053: [SECURITY] Make KRPC work with Kerberos

KuduRPC has support for Kerberos. However, since Impala's client transport
still uses the Thrift transport stack, we need to make sure that a single
security configuration applies to both internal communication (KuduRPC)
and external communication (Thrift's TSaslTransport).

This patch changes InitAuth() to start Sasl regardless of security
configuration, since KRPC uses plain SASL for negotiation on insecure
clusters.

It also moves some utility code out of authentication.cc into
auth-util.cc for resuse by the RpcMgr while enabling kerberos.

The MiniKDC related code is moved out of thrift-server-test.cc into a
new file called mini-kdc-wrapper.h/cc. This file exposes a new class
MiniKdcWrapper which can be easily used by the tests to configure the
kerberos environment, create the keytab, start the KDC and also
initialize the Impala security library.

Tests are added to rpc-mgr-test for kerberos tests over KRPC.
thrift-server-test also has a mechanical change to use MiniKdcWrapper.
Also tested on a live cluster configured to use kerberos.

Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd
---
M be/src/rpc/CMakeLists.txt
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication-test.cc
M be/src/rpc/authentication.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/thrift-server-test.cc
M be/src/testutil/CMakeLists.txt
A be/src/testutil/mini-kdc-wrapper.cc
A be/src/testutil/mini-kdc-wrapper.h
M be/src/util/auth-util.cc
M be/src/util/auth-util.h
12 files changed, 427 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8270/2
--
To view, visit http://gerrit.cloudera.org:8080/8270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd
Gerrit-Change-Number: 8270
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-01 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return 
false; }
Add comment. Make pure if possible. Alternatively, you could do something 
similar to PageContainsTupleData().


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
add "override"


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
how about rewording this to "Column readers for types that require ..."


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
better name


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
Move this to L954


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && 
!DictionaryReferencesBuffer()) {
Can you add a brief comment to explain how the 4 different cases are handled?


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@987
PS1, Line 987: if (DictionaryReferencesBuffer()) {
 :   memcpy(dict_values, data_, data_size);
 : }
Single line :)



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 01 Nov 2017 19:16:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-se
Yep, we take the lock if the query is not in a CREATED state.

--
Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id,
const string& user, bool base64_encoded, stringstream* output) {
  DCHECK(output != nullptr);
  // Search for the query id in the active query map
  {
shared_ptr request_state = 
GetClientRequestState(query_id);
if (request_state.get() != nullptr) {
  // For queries in CREATED state, the profile information isn't populated 
yet.
  if (request_state->query_state() == beeswax::QueryState::CREATED) {
return Status::Expected("Query plan is not ready.");
  }
  lock_guard l(*request_state->lock());
--

I agree this is a very important change from a usability/supportability 
perspective, to have a partial profile visible to the user.


http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26
PS1, Line 26: know the query id
: that's necessary to call GetResultSetMetadata()
> Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as t
I was thinking more along the "profile copy" path Dan suggested, so that we can 
avoid removing the lock during planning. I thought the latter involved more 
work since it could affect cancellations. With the current patch, cancellations 
don't block anymore. when the query is planning, so I think we should make sure 
to get that part right. I'm fine with this approach, if the cancellation works 
without any issues.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 18:41:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

Also, besides the RPCs, let's consider how to handle the webserver cancelation 
path. Maybe we need to not show the cancel link until we can actually cancel?


--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 17:56:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

I think this path is worth continuing down. The alternate solution would be to 
have a profile copy that can be returned before the actual profile is ready, 
but if releasing the lock works out, that seems simplest.  We should check how 
this minimal profile looks in CM -- there might be some other fields that they 
require?

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-se
Right, this JIRA is a follow up to 1972 to improve things further so that the 
(partial) query profile can be retrieved.


http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342
PS1, Line 342: is_planning_
maybe make that is_planning_finished_ (or similar), and initialize it to false 
and just set to true after planning? Actually, since the goal is to guard 
against other RPCs to execute until the handle is returned, maybe just make it 
is_execute_stmt_finished?



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 17:54:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8323 )

Change subject: IMPALA-1575: part 2: yield admission control resources
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Gerrit-Change-Number: 8323
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 17:34:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources

2017-11-01 Thread Tim Armstrong (Code Review)
Hello anujphadke, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8323

to look at the new patch set (#5).

Change subject: IMPALA-1575: part 2: yield admission control resources
..

IMPALA-1575: part 2: yield admission control resources

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory is
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-state.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
9 files changed, 192 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8323/5
--
To view, visit http://gerrit.cloudera.org:8080/8323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Gerrit-Change-Number: 8323
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 17:29:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

2017-11-01 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7009 )

Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format
..


Patch Set 5:

I will certainly get back to this change. Needed to handle some personal issues 
and got off track.


--
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-Change-Number: 7009
Gerrit-PatchSet: 5
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Wed, 01 Nov 2017 17:18:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

Could you have a look, Taras? I know you recently did a little work on memory 
management in the text scanners. Hopefully the bug/fix should be fairly 
clear-cut.


--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:58:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 2: Code-Review+2

Please update the comment as suggested by Phil.


--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:49:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 2: Code-Review+1

Please see some more comments in https://gerrit.cloudera.org/c/8305/


--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:36:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-11-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h@622
PS11, Line 622: /// Start and end time of the query, in Unix microseconds.
  : int64_t start_time_us, end_time_us;
Can these be const ? They are always initialized in the constructor, right ?



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 11
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:36:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG@9
PS2, Line 9: The bug was that the buffer pointed to by byte_buffer_ could be 
freed by
> Do you mean byte_buffer_ptr_?
Yeah I will fix in the next iteration.



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:16:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303
PS2, Line 303:   // record.end_time_us might still be zero if the query is not 
yet done
nit: Could you update the following comment in client-request-state.h and 
impala-server.h (once for ClientRequestState, once for QueryStateRecord) to say 
something about how 0 is a special value.

  /// Start/end time of the query, in Unix microseconds.
  int64_t start_time_us_, end_time_us_;



--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 16:02:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed)
Looking over 
https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-server.cc a bit 
more clearly, this is replacing "Query plan is not ready" with the actual 
profile. The profile is useful at this point: it has been populated with the 
query string, the time that planning started, and so on.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 15:36:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-01 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17
PS1, Line 17: whereas the web UI would previously block on the lock.
> After IMPALA-1972, it doesn't block, it just returns an empty profile.
I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed). 
You can go to the web UI and see the queries, but if you click on "Profile", 
you'll get stuck getting the lock in ImpalaServer::GetRuntimeProfileStr() 
below. Am I missing something?

Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id,
const string& user, bool base64_encoded, stringstream* output) {
  DCHECK(output != nullptr);
  // Search for the query id in the active query map
  {
shared_ptr request_state = 
GetClientRequestState(query_id);
if (request_state.get() != nullptr) {
  lock_guard l(*request_state->lock());


http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26
PS1, Line 26: know the query id
: that's necessary to call GetResultSetMetadata()
> It looks to me like this is against IMPALA-2568. I guess we eventually want
Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as the 
first subtask of that JIRA. It's easier to argue that this one is compatible.

As for this one needing to be undone, it's hard to say. I think you solve 
IMPALA-2568 by having more fine-grained and clear query states/lifecycles. I 
think this is a step in that direction.

What would you suggest?



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 01 Nov 2017 15:32:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG@9
PS2, Line 9: The bug was that the buffer pointed to by byte_buffer_ could be 
freed by
Do you mean byte_buffer_ptr_?



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 08:52:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6080: clean up table descriptor handling

2017-11-01 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8330 )

Change subject: IMPALA-6080: clean up table descriptor handling
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc@57
PS2, Line 57: #include "service/frontend.h"
I don't think this include is needed.



--
To view, visit http://gerrit.cloudera.org:8080/8330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6
Gerrit-Change-Number: 8330
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 01 Nov 2017 08:05:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8438


Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..

IMPALA-6137: fix text scanner split delim mem mgmt

The bug was that the buffer pointed to by byte_buffer_ could be freed by
ReleaseCompletedResources() before CheckForSplitDelimiter() was called.

The simple fix is to copy out the single byte that is needed each time
the buffer is filled.

Testing:
Ran exhaustive query tests under ASAN with --disable_mem_pools=true.

Before the change test_text_split_delimiters reliably caused an ASAN
failure when run with --disable_mem_pools=true. We should get this
coverage automatically once the I/O mgr switches to using the buffer
pool, which uses ASAN poisoning on freed buffers.

Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 10 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8438/1
--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/8414 )

Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt
..

IMPALA-4835: Part 1: simplify I/O mgr mem mgmt

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Misc other refactoring and simplification.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-reader-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/test-env.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
15 files changed, 216 insertions(+), 575 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/4
--
To view, visit http://gerrit.cloudera.org:8080/8414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1
Gerrit-Change-Number: 8414
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache

2017-11-01 Thread Tim Armstrong (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8408

to look at the new patch set (#4).

Change subject: IMPALA-6121: remove I/O mgr request context cache
..

IMPALA-6121: remove I/O mgr request context cache

This simplifies the lifecycle of the request contexts and eliminates
some code. The comments claim that request context cache improves
performance when allocating smallish the objects. But allocating
from TCMalloc's thread caches should scale much better than a
global object pool protected by a lock.

I needed to move the definition to a non-internal header file so that it
was visible to clients that manage it by unique_ptr.

We also do not need to transfer the request contexts to the RuntimeState
since I/O buffers do not leave scanners now.

Testing:
Ran exhaustive tests.

Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
A be/src/runtime/disk-io-mgr-reader-context.h
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
16 files changed, 614 insertions(+), 780 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8408/4
--
To view, visit http://gerrit.cloudera.org:8080/8408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
Gerrit-Change-Number: 8408
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8430 )

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8430/1/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8430/1/be/src/service/impala-http-handler.cc@306
PS1, Line 306:  record.end_time_us
> record.end_time_us > 0
Done


http://gerrit.cloudera.org:8080/#/c/8430/1/be/src/service/impala-http-handler.cc@306
PS1, Line 306: ending_time_us
> end_time_us
Changed as suggested.



--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 01 Nov 2017 06:28:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.

2017-11-01 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8430

to look at the new patch set (#2).

Change subject: IMPALA-6136: Part 1: Query duration should not be normally 
negative.
..

IMPALA-6136: Part 1: Query duration should not be normally negative.

The second commit for IMPALA-5599, 1640aa97, introduced a regression
where calculating the duration of an in-progress query can result
in negative values. This can happen for two reasons:

1. The value of ClientRequestState::end_time_us_ is not initialized
   in the constructor, and may have some random value until
   ClientRequestState::Done() is called.
2. If ClientRequestState::end_time_us_ happens to have a positive
   value less than ClientRequestState::start_time_us_, the query
   duration will be negative. Here, since the query is still in
   flight, we need to use the local current time as the end time.

The fix has been manually verified by executing long-running queries
and checking the query profiles to ensure the durations are not some
random junks. A new test case will be added to check for sane time
values in a follow-on commit.

Since we are using Unix time (system clock) for time stamps, it is
still possible for end_time_us_ to be less than start_time_us_ if
the system clock gets reset while the query is executing. If there
are clients that assume that a query duration is never negative,
we really should switch to a monotonic clock to time queries.

Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
---
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
2 files changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8430/2
--
To view, visit http://gerrit.cloudera.org:8080/8430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90
Gerrit-Change-Number: 8430
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga