[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993
PS2, Line 993:   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, 
cause));
> The only reason CancelInternal() can fail is if the query isn't registered
Good point


http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052
PS3, Line 1052:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
Why keep it above ArchiveQuery()?

Not your change, but the function comment on ArchiveQuery seems wrong and 
confusing with decrementing the metric here.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 04:23:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@199
PS7, Line 199:   DeserializeTask payload =
 :   {DeserializeTaskType::EARLY_SENDERS, finst_id, 
dest_node_id, 0};
 :   deserialize_pool_.Offer(move(payload));
> doesn't this mean we make early sender draining single threaded? shoudl we
Yes. Actually, I just realized that early senders may also be passed by 
incoming row batches before they are deserialized by the deserialization thread 
pool, leading to extended response time for early senders.

A simpler scheme would be to actually put the early senders into the 
'deferred_batches' queue of the corresponding sender's queue so new incoming 
row batches cannot pass it.

Regarding the parallelism for draining the deferred_batches queue, one simple 
thing to do is to enqueue as many deserialization requests as there are entries 
in the 'deferred_batch' queue. The deserialization thread logic will simply 
peek the first entry of the queue and try to insert it if there is space. An 
entry is popped off the queue only if it can be inserted. This may be wasteful 
if the 'batch_queue' fills up before all deserialization thread requests are 
drained but hopefully the peeking logic shouldn't take too long. We can be more 
fancy and record the deserialized size of each entry in deferred_batches_ and 
determine how many entries we can pop off deferred_batches_ queue without going 
over the limit.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@235
PS7, Line 235: for (const unique_ptr& ctx : 
early_senders.waiting_sender_ctxs) {
> shouldn't we process waiting_sender_ctxs before closed_sender_ctxs? Otherwi
Yes, it's impossible for the same sender in both queues at the same time but 
yeah, I can switch the order if it's easier to understand.



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Sat, 04 Nov 2017 04:00:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 02:08:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..

IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

The CoordinatorBackendState::PublishFilter() function does not check
for query failure/cancellation. So if runtime filters are being published
during/after a failure, they will not be cancelled and still be sent
out which may take a while depending on the size of the cluster.
Also, these functions could potentially hold very large amounts
of untracked memory.

This patch fixes it by checking for cancellation/failure in
PublishFilter.

Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Reviewed-on: http://gerrit.cloudera.org:8080/8455
Reviewed-by: Bharath Vissapragada 
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
2 files changed, 9 insertions(+), 0 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, but someone else must approve
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..

Fix errant, newline-including log directory.

We've seen cases where a directory named "cluster\n  " sneaks
into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS
is a space-separated list, but clean.sh (as opposed to buildall.sh)
treats it as a single argument.

I tested this manually:

Before:

  $ls logs/
  be_tests/  custom_cluster_tests/  data_loading/  ee_tests/  fe_tests/

Bad command:

  $mkdir -p "${IMPALA_ALL_LOGS_DIRS}"

After:

  $ls logs/
  be_tests/  cluster?  /  custom_cluster_tests/  data_loading/  ee_tests/
  fe_tests/

Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Reviewed-on: http://gerrit.cloudera.org:8080/8459
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/clean.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 01:42:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993
PS2, Line 993:   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, 
cause));
> What if we hit an error here? Need to still maintain the metric.
The only reason CancelInternal() can fail is if the query isn't registered - 
otherwise this would be really broken.

By decrementing it below we're keeping the count in sync with the entries in 
client_request_state_map_, which seems better.


http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008
PS2, Line 1008:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
> To be most accurate, shouldn't we move this to the end of this function? Th
Done


http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477
PS2, Line 477: "label": "Queries Registered",
> Imo we need to be careful with terminology. Our query states are already ha
I added to the description. I agree that we should be careful here - I picked 
this definition of queries since it's the most expansive and doesn't depend on 
the different query states directly.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:39:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8461

to look at the new patch set (#3).

Change subject: IMPALA-6151: add query-level fragment/backend counters
..

IMPALA-6151: add query-level fragment/backend counters

This adds NumBackends, NumFragments and NumFragmentInstances
counters to the query profile to make it easier to manually or
programmatically analyse the query.

Also add a num-queries-registered metric to track the number
of queries that have been executed but are not yet unregistered.

Testing:
Ran "select count(*) from alltypessmall" and checked profile:

Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 
50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th 
%-ile: 1ms
Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) 
tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB)
 - FiltersReceived: 0 (0)
 - FinalizationTimer: 0.000ns
 - NumBackends: 3 (3)
 - NumFragmentInstances: 4 (4)
 - NumFragments: 2 (2)

Ran some queries and manually checked the num-queries-registered metric
on the /metrics page when the queries were running and after they
finished.

Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
---
M be/src/runtime/coordinator.cc
M be/src/service/impala-server.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
5 files changed, 36 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/3
--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 9: Code-Review+2

LGTM. Probably best to wait until after the weekend to merge it though :).


--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:29:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
..


Patch Set 2:

(12 comments)

Did a first pass over it.

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py@99
PS2, Line 99:   ["HASH_FAST", "IrFastHash"],
Do we use this? I don't see any places where it's used currently.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@a468
PS2, Line 468:
Lol, weird.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@474
PS2, Line 474: partition_expr_evals_[j
Use 'eval' directly.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc@220
PS2, Line 220:   return HashUtil::FastHash64(v, 
static_cast(type.GetByteSize()), seed);
I think this might be slightly slower - with the previous approach I think we 
get a specialised version of the hash function for that input length for each 
data type, whereas here we have to go through another switch in GetByteSize().


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@235
PS2, Line 235:   /* The MIT License
I think this should go at the top of the file beneath the apache license. Then 
we can just say that the FastHash64 implementation came with that license. E.g.

/*
  FastHash64 implementation derived from MIT-licensed code written by Zilong Tan

  The MIT License
  ...
*/


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@263
PS2, Line 263: size_t
use int64_t - we generally use signed integers for lengths.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@266
PS2, Line 266: (const uint64_t *)
We use c-style casts - i.e. reinterpret_cast<>


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@267
PS2, Line 267: pos
I believe the C++ standard doesn't allow pointer arithmetic on void - we should 
convert to uint8_t for doing that.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@278
PS2, Line 278:  (const unsigned char*
See above comments.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@282
PS2, Line 282: (uint64_t)
should use static_cast()


http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test:

http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test@145
PS2, Line 145: 8,'k1',-1,'k1',1
Why does this make a difference when we have VERIFY_IS_EQUAL_SORTED above?


http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test
File testdata/workloads/tpcds/queries/tpcds-q77a.test:

http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test@125
PS2, Line 125: 'catalog channel',NULL,538912.55,2050279.74,-1383554.73
I'll file a JIRA to make this test deterministically pass: IMPALA-6155. In the 
meantime, can you change this to

    RESULTS: VERIFY_IS_EQUAL_SORTED

That will make it ignore the order of results.



--
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:28:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-03 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 8:

(2 comments)

@Tim, can you take a last look at the changes and carry over the +2 if they are 
ok?

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33
PS7, Line 33: int Encode(const InternalType& t, int encoded_byte_size, uint8_t* 
buffer,
> Oh ok, thanks for the explanation. I think we should also add a comment men
Done


http://gerrit.cloudera.org:8080/#/c/7822/8/testdata/data/binary_decimal_dictionary.parquet
File testdata/data/binary_decimal_dictionary.parquet:

PS8:
> We should also mention how these files were generated in the README
Done



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:23:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-11-03 Thread Bikramjeet Vig (Code Review)
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7822

to look at the new patch set (#9).

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.

Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.

Performance:
Initial perf testing using tpcds_1000 shows no regression.

Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M testdata/data/README
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
19 files changed, 639 insertions(+), 283 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/9
--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt

2017-11-03 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8414 )

Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr-reader-context.cc
File be/src/runtime/disk-io-mgr-reader-context.cc:

http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr-reader-context.cc@50
PS4, Line 50:   DCHECK(buffer->scan_range()->external_buffer_tag_ == 
ScanRange::ExternalBufferTag::NO_BUFFER)
long line


http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr.cc@462
PS4, Line 462:   // Buffers the were not allocated by DiskIoMgr don't need 
to be freed.
existing typo: that



--
To view, visit http://gerrit.cloudera.org:8080/8414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1
Gerrit-Change-Number: 8414
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:20:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

2017-11-03 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8372 )

Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use 
of TRUNCATE
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test
File testdata/workloads/tpcds/queries/tpcds-q47.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test@3
PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL in LSD of DECIMAL values, E.G,
> Would be good to use an example to better describe the mismatch
Done



--
To view, visit http://gerrit.cloudera.org:8080/8372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
Gerrit-Change-Number: 8372
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:19:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: d subscriber
> >  Where ever the 'SubscriberId' is
Thanks for the explanation.

Yea my point was if we're going to have a unique RegistrationId anyway, why 
have a SubscriberId. It seemed redundant.

But as you pointed out, it looks like the subscriber chooses the subscriber_id 
and not the statestore. So, it would be hard to enforce this.

Let's leave this for now.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
 : const TUniqueId& registration_id
> Not sure I understand. We get the subscriber/registration_id from the Sched
Nvm, my bad, I thought both were coming from the Subscriber object. Ignore this.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard l(subscribers_lock_);
 :   lock_guard t(topic_lock_);
I just noticed this. Getting a SpinLock before getting a mutex is an 
anti-pattern.

Even attempting to get a spinlock while already holding a spinlock is also not 
exactly a great idea. However, our SpinLock implementation sleeps after a few 
cycles of trying to obtain the lock anyway.

Do we know if we do a lot of work holding the topic_lock_? If not, let's change 
this to a SpinLock too. (The GatherTopicUpdates() holds topic_lock_ and 
iterates through a nested loop, but I'm not sure how many iterations that would 
be in the worst case).

If it looks like we will end up doing a lot of work holing the lock, we can be 
safe and just turn the 'subscribers_lock_' back to a mutex.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
const RegistrationId&



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:18:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 6:

(12 comments)

Some high-level comments before I dig deeper.

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94
PS2, Line 94: // TODO: derived slot refs (e.g., star-expanded) will not 
have rawPath set.
Why can't this be addressed in this patch?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287
PS2, Line 287:* rewritten, null is returned. If 'inPred' is rewritten, the 
resulting expression
... and the RHS is a subquery.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299
PS2, Line 299:* 2) Predicate is NOT IN and RHS returns a single row.
What if the RHS subquery is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS2, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
Is the transformation even correct if RHS is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312
PS6, Line 312:*a) No group by or analytic function in subquery.
Ideally, we should not have to distinguish between cases 3a and 3b, and we 
should always do this rewrite:

C NOT IN (RHS)

Rewrites to:

NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C)

My understanding is that such a rewrite dues not work for case 3a when there is 
correlation, so we use an alternate rewrite that does not require a new inline 
view. I think we should reorganize the comments here to reflect this line of 
thinking and the limitations. For example, something like this:


3) Predicate is NOT IN and RHS returns multiple rows.

General rewrite: 


a) No group by or analytic


b) Group by  or analytic function in subquery



http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316
PS6, Line 316:*REWRITE:
Use IFNULL() instead of CASE to simplify.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS6, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
I think there's another case where this rewrite is not correct: when the 
subquery has an order by + limit. Basically, we are reimplementing the 
"predicate push down" correctness checks here.

I'm wondering whether we should reduce the scope of this patch and only allow 
those case where the generic rewrite works. In a separate JIRA we should 
consider whether we can make the generic rewrite also work for case 3a, and if 
not what to do instead.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321
PS6, Line 321:*Example:
The following query gives me an IllegalStateException:

explain select id from functional.alltypes t1 where 10 not in (select id from 
functional.alltypestiny order by id limit 2);

Existing bug or related to this change?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328
PS6, Line 328:*Note: it useful to think of C NOT IN (RHS) as the 
boolean expansion:
it is useful


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS6, Line 340:*b) Group by  or analytic function in subquery.
extra space after "by"


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345
PS6, Line 345:*  NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS 
NOT NULL)
What does C=t mean here?
"t" is a table alias and cannot be compared to a C

It's not immediately clear to me why the rewrite here is equivalent to the 
generic rewrite described in 3a. Please explain (see my suggestion above). 
Alternatively, I'm also ok if you want to just use the generic rewrite here.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354
PS6, Line 354:* - Assumes that subquery is uncorrelated, 

[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

2017-11-03 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8372 )

Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use 
of TRUNCATE
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test
File testdata/workloads/tpcds/queries/tpcds-q26.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@3
PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL, E.G.,
> Please use a more descriptive comment.
Done


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@4
PS4, Line 4: -- Expected: 'AAID',78.33,126.37,0.00,61.94
> What does this mean?
Done


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test
File testdata/workloads/tpcds/queries/tpcds-q59.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test@3
PS4, Line 3: with
> Remove line
Done


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test
File testdata/workloads/tpcds/queries/tpcds-q63.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test@3
PS4, Line 3: select
> Remove line
Done



--
To view, visit http://gerrit.cloudera.org:8080/8372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
Gerrit-Change-Number: 8372
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:15:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

2017-11-03 Thread Tim Wood (Code Review)
Hello Greg Rahn, Matthew Mulder, Michael Brown, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8372

to look at the new patch set (#5).

Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use 
of TRUNCATE
..

IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

The .test files in this commit are ones held out from IMPALA-5376
because of observed anomalies with returned decimal precision that I
previously controlled with TRUNCATE().  This ticket was filed after team
members expressed a preference not to use TRUNCATE() and to use
actual results generated within numerical range of expected,
unless the results could not be controlled otherwise.  Rounds of testing
for this commit show that the earlier anomalies no longer occur,
clearing the way for their inclusion in our TPC-DS suite.  It has been
observed that these tests tend to fail with the DECIMAL_V2 option set
(unless the test does so explicitly).

Testing: The gerrit_verify_dryrun_external (build #24) job passes with this
change.  The fix (rebased hereon) to IMPALA-6106 (finally) cleared up a
source of flapping for these tests.

Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
---
A testdata/workloads/tpcds/queries/tpcds-q26.test
A testdata/workloads/tpcds/queries/tpcds-q28.test
M testdata/workloads/tpcds/queries/tpcds-q39-1.test
M testdata/workloads/tpcds/queries/tpcds-q39-2.test
A testdata/workloads/tpcds/queries/tpcds-q47.test
A testdata/workloads/tpcds/queries/tpcds-q57.test
A testdata/workloads/tpcds/queries/tpcds-q59.test
M testdata/workloads/tpcds/queries/tpcds-q61.test
A testdata/workloads/tpcds/queries/tpcds-q63.test
M testdata/workloads/tpcds/queries/tpcds-q77a.test
M testdata/workloads/tpcds/queries/tpcds-q78.test
M testdata/workloads/tpcds/queries/tpcds-q86a.test
M tests/query_test/test_tpcds_queries.py
13 files changed, 849 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8372/5
--
To view, visit http://gerrit.cloudera.org:8080/8372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
Gerrit-Change-Number: 8372
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993
PS2, Line 993:   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, 
cause));
What if we hit an error here? Need to still maintain the metric.


http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008
PS2, Line 1008:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
To be most accurate, shouldn't we move this to the end of this function? 
There's still some non-trivial work in thus function after this point.


http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477
PS2, Line 477: "label": "Queries Registered",
Imo we need to be careful with terminology. Our query states are already hard 
to reason about for users.
How do the queries displayed in our "/queries" page relate to "registered" 
queries? I know the answer, but users might not due to confusion with what 
"registered", "in-flight", and "waiting-to-be-closed" mean.

I don't really have a good alternative to "registered", but we should at least 
add in the description that "registered" refers to "in-flight" and 
"waiting-to-be-closed" queries from the "/queries" page.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:51:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8464 )

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..


Patch Set 1:

(8 comments)

This is a nice improvement. Had some questions/ideas but as-is this seems 
strictly better than before.

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105
PS1, Line 105: allocated
nit: "consumed" to be consistent with the memtracker terminology.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106
PS1, Line 106: allocated_mem_
Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and it's 
tracked for accounting purposes.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52
PS1, Line 52:
Was this flag documented? Just wondering if we should consider what happens if 
someone set this manually.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39
PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, 
DEFAULT_KUDU_MUTATION_BUFFER_SIZE,
Just throwing out ideas here, but did we think about pros/cons of making these 
query options? I guess mostly these defaults should be fine but sometimes it 
turns out to be inconvenient to only have a global setting (and to have to 
restart Impala for it to take effect.)


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124
PS1, Line 124:   FLAGS_kudu_error_buffer_size > 1024 ? 
FLAGS_kudu_error_buffer_size : 1024;
Is this equivalent to the following?

  max(1024, FLAGS_kudu_error_buffer_size)


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132
PS1, Line 132:   
RETURN_IF_ERROR(state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(),
 _));
nit: long line.


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66
PS1, Line 66:   
@CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024")
It might be faster to make this a regular query test but insert more data so 
that it exceeds the default error buffer limit. Starting up a new cluster takes 
a long time (to be honest, this is fine though).


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74
PS1, Line 74: prsent
present



--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:41:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8424 )

Change subject: IMPALA-4835 (prep only): create io subfolder and namespace
..


Patch Set 5:

Rebased since some things changed underneath this patch.


--
To view, visit http://gerrit.cloudera.org:8080/8424
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b
Gerrit-Change-Number: 8424
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:24:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace

2017-11-03 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8424

to look at the new patch set (#5).

Change subject: IMPALA-4835 (prep only): create io subfolder and namespace
..

IMPALA-4835 (prep only): create io subfolder and namespace

Instead of using the DiskIoMgr class as a namespace, which prevents
forward-declaration of inner classes, create an impala::io namespace
and unnested the inner class.

This is done in anticipation of DiskIoMgr depending on BufferPool. This
helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and
BufferPool headers that could not be broken with forward declarations.

Testing:
Ran core tests.

Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b
---
M be/CMakeLists.txt
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/disk-io-mgr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
A be/src/runtime/io/CMakeLists.txt
R be/src/runtime/io/disk-io-mgr-internal.h
R be/src/runtime/io/disk-io-mgr-stress-test.cc
R be/src/runtime/io/disk-io-mgr-stress.cc
R be/src/runtime/io/disk-io-mgr-stress.h
R be/src/runtime/io/disk-io-mgr-test.cc
R be/src/runtime/io/disk-io-mgr.cc
A be/src/runtime/io/disk-io-mgr.h
R be/src/runtime/io/handle-cache.h
R be/src/runtime/io/handle-cache.inline.h
R be/src/runtime/io/request-context.cc
R be/src/runtime/io/request-context.h
A be/src/runtime/io/request-ranges.h
R be/src/runtime/io/scan-range.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
38 files changed, 1,417 insertions(+), 1,311 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/8424/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8424
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b
Gerrit-Change-Number: 8424
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27:
> RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the
Thanks for explaining.
Can you add this text into a Google Doc for us to keep track of the 
evolution/meaning of these query options? No need to polish, just put it 
somewhere.

I think the new types of filters will affect our query options in non-trivial 
ways and we should come up with a plan that minimizes user confusion, adding 
new options, and deprecating options.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41:
> Those results are posted in the review comments. I can include them here as
Thanks. No need to add here.


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
> There's basically two reasons for this:
Thanks. Please add this text to a Google Doc for tracking the evolution of 
these query options.

Even though the min/max filters are smaller and bounded in size, I think 
extreme queries with a large number of joins and join conditions can still 
cause havoc. Keep in mind that we now allow an *unbounded* number of such 
filters. Crazy queries will happen.

We should not hold this patch, but the lack of safeguards is concerning to me 
and we should continue thinking.


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103
PS9, Line 103:   12: optional string kudu_col_name
case sensitive?


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359
PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; }
getExprCmpOp()


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602
PS11, Line 602: if (!(targetExpr instanceof SlotRef)
I think only explicit casts are problematic. Implicit casts should be ok, or am 
I missing something?


http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144
PS11, Line 144: on a.month = cast(b.month + 1 as int);
Was this your change? Why the change?



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:22:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@12
PS1, Line 12: In this patch, the beharior is changed so that an error is 
produced in
> typo: beharior should be behavior
I'll also accept behaviour ;)


http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@27
PS1, Line 27:   select avg(dec_38_19) from decimal_tbl
I guess these regressions occur regardless of the width of the input decimal, 
rigth?


http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@413
PS1, Line 413:   abs(avg->sum_val16) > 
DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val4))) {
This isn't correct if sum_val16 and src.val* have opposite sign, is it?


http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@420
PS1, Line 420:   abs(avg->sum_val16) > 
DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val8))) {
What do you think about only checking for overflow of the integer type at this 
point and checking for whether it fits in the decimal type during finalisation? 
Might be cheaper.

We could also probably do a cheaper initial check of sum_val16 versus as 
constant in the 4 and 8 byte cases since they can't possible overflow unless 
the sum is already quite large.



--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:21:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-11-03 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@77
PS15, Line 77:   void ClearIndices() {
> We generally use the more concise one-line formatting for very short single
To add to what Tim said, this code didn't actually change, so it is nice to 
avoid touching code if you can. It makes it easier to track down what code 
change last impacted a piece of code.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@152
PS15, Line 152:   /// Destructor, release the bytes used by Memtracker.
  :   ~DictEncoder() {
  : ReleaseBytes();
  : DCHECK(dict_bytes_cnt_ == 0);
  :   }
I think we can omit this destructor, since DictEncoderBase does the same thing.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@296
PS15, Line 296:   /// Destructor, release the bytes used by Memtracker and 
close.
  :   ~DictDecoder() {
  : ReleaseBytes();
  : DCHECK(dict_bytes_cnt_ == 0);
  :}
I think we can omit this destructor, as DictDecoderBase does this.



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:14:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 15:

(8 comments)

Pretty close, have some formatting nits (nothing horrible, just trying to keep 
the codebase consistent) and one perf concern.

http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@54
PS15, Line 54: : dict_encoded_size_(0), pool_(NULL), dict_bytes_cnt_(0), 
dict_mem_tracker_(NULL) {}
Ok to leave for now but we've generally been moving towards using the recent 
C++ extension that allows initialising member variables to constants at their 
declartion.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@59
PS15, Line 59: DCHECK
DCHECK_EQ(dict_bytes_cnt_, 0);


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@77
PS15, Line 77:   void ClearIndices() {
We generally use the more concise one-line formatting for very short 
single-statement functions like this one.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@123
PS15, Line 123:
formatting nit: * goes on the left with the type name.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@168
PS15, Line 168: inline
nit: inline isn't necessary. It isn't harmful but can be confusing to have 
unnecessary modifiers.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@269
PS15, Line 269:  *
nit: * should be on left.


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@330
PS15, Line 330: inline
nit: unnecessary inline


http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@374
PS15, Line 374:   ConsumeBytes(sizeof(node));
I'm concerned that calling MemTracker::Consume for every node added to the 
table could hurt performance in some workloads since it will end up 
incrementing the memory consumption counter in the root process-wide MemTracker 
up to 40k times per dictionary.

(In contrast, MemPool::Allocate() is generally fine since it allocates memory 
in chunks).

How about we track the memory for some number of nodes at a time. E.g.

  const int NODE_MEM_TRACKING_GRANULARITY = 4096;
  ...
  if (nodes % NODE_MEM_TRACKING_GRANULARITY == 0) {
ConsumeBytes(sizeof(node) * NODE_MEM_TRACKING_GRANULARITY);
  }



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:07:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 7:

(37 comments)

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@102
PS7, Line 102: the batch is added
 : /// to the receiver's 'deferred_batches_'
it's not really the batch added. and it's not just a single structure for the 
receiver (it may go into one of many queues for merging exchange). So how about 
saying:
... the RPC state is saved into a deferred queue.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@104
PS7, Line 104: from the pending sender queue
how about: ... from a deferred RPC queue and the row batch is deserialized.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@393
PS7, Line 393:
quick comment for why we define a move constructor and move operator=, since we 
don't typically want to define those.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@199
PS7, Line 199:   DeserializeTask payload =
 :   {DeserializeTaskType::EARLY_SENDERS, finst_id, 
dest_node_id, 0};
 :   deserialize_pool_.Offer(move(payload));
doesn't this mean we make early sender draining single threaded? shoudl we 
instead use the sender_id in this case as well and offer work per sender? or do 
we think this doesn't matter?


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@217
PS7, Line 217: already_unregistered
that shouldn't be possible in the DEFERRED_BATCHES case, right? so i'd probably 
move this DCHECK into the cases below so you can tighten it up.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@235
PS7, Line 235: for (const unique_ptr& ctx : 
early_senders.waiting_sender_ctxs) {
shouldn't we process waiting_sender_ctxs before closed_sender_ctxs? Otherwise, 
if the same sender is in both lists we'll process those RPCs out of order. I 
guess that can't really happen given the current implementation of not 
responding to early RPCs and that senders only let one in flight, but it still 
seems to make more sense to do it the other way around.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@236
PS7, Line 236: already_unregistered
why is this possible in the waiting_sender_ctxs case but not the 
closed_sender_ctxs case?


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@247
PS7, Line 247: already_unregistered
why is that possible?


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@248
PS7, Line 248: recvr->AddDeferredBatches(task.sender_id);
So I guess we no longer multithread within a single sender queue (and for 
non-merging, within a single receiver) doing it this way. I think that's okay 
but was it intentional?


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.h@78
PS7, Line 78: The caller must acquire data from the
:   /// returned batch
is that talking about calling TransferAllResources(), or can the caller do it 
directly?


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127
PS6, Line 127:   // If true, the receiver fragment for this stream got 
cancelled.
> For the non-merging case, there is essentially only one queue.
As mentioned elsewhere, I'm not totally convinced yet that this is the right 
way to do it but, yes, we can think about it more and change it later if 
necessary.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@72
PS7, Line 72: s
typo


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@77
PS7, Line 77: Adds as many deferred batches as possible
hmm I'm still not convinced this is the right thing to do (in the merging 
case). It seems like it's left up to chance as to the order that deferred 
batches are drained across the sender queues. But we can think about this more 
and address it later.


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@97
PS7, Line 97: (1) 'batch_queue' is empty and there is no 

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet 
correctness
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
Missed this on the first pass - shouldn't this be high?



--
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:48:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1434/


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:47:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:45:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return 
false; }
> Add comment. Make pure if possible. Alternatively, you could do something s
I have removed this and the other similar functions, and used directly 
slot_desc_->type().IsVarLenStringType() instead.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
> add "override"
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
> how about rewording this to "Column readers for types that require ..."
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
> nit: single line
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
> better name
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
> Move this to L954
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && 
!DictionaryReferencesBuffer()) {
> Can you add a brief comment to explain how the 4 different cases are handle
Done



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:39:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#4).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 35 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/4
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

2017-11-03 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8372 )

Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use 
of TRUNCATE
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test
File testdata/workloads/tpcds/queries/tpcds-q26.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@3
PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL
Please use a more descriptive comment.
What kind of mismatch?
Which column?
Description of mismatch.


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@4
PS4, Line 4: -- TAKE ACTUAL RESULT AS EXPECTED
What does this mean?


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test
File testdata/workloads/tpcds/queries/tpcds-q47.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test@3
PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL in LSD of DECIMAL values
Would be good to use an example to better describe the mismatch


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test
File testdata/workloads/tpcds/queries/tpcds-q59.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test@3
PS4, Line 3:
Remove line


http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test
File testdata/workloads/tpcds/queries/tpcds-q63.test:

http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test@3
PS4, Line 3:
Remove line



--
To view, visit http://gerrit.cloudera.org:8080/8372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
Gerrit-Change-Number: 8372
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:35:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#3).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 36 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/3
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8408 )

Change subject: IMPALA-6121: remove I/O mgr request context cache
..


Patch Set 6: Code-Review+1

Rebased since some of the patches under it changed.


--
To view, visit http://gerrit.cloudera.org:8080/8408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108
Gerrit-Change-Number: 8408
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:30:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#2).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 38 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/2
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h
File be/src/util/condition-variable.h:

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@65
PS1, Line 65:   bool TimedWait(boost::unique_lock& lock,
> Now that I looked into the callsites more carefully, I realized that there
I like those names - good to distinguish between the two cases. Seems 
reasonable, especially considering there's precedent in the standard library.



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:22:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has removed a vote on this change.

Change subject: Fix errant, newline-including log directory.
..


Removed Verified-1 by Impala Public Jenkins (255)
--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Looks like IMPALA-6092. We definitely shouldn't be hitting a DCHECK regardless 
so I filed IMPALA-6154


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:20:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1433/


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:20:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Looking through 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/574/artifact/Impala/logs_static/logs/,
 I think the following is the relevant issue.

I'm hesitant to believe that a change to bin/clean.sh could have caused this.

F1103 20:52:33.350702 67042 scalar-expr-evaluator.cc:70] Check failed: 
!initialized_ || closed_


   I1103 20:52:33.415714 67008 status.cc:124] ImpalaRuntimeException: 
Unable to find class.   


   CAUSED BY: ClassNotFoundException: org.apache.impala.TestUdf 



 @  0x15a0bef  impala::Status::Status() 



   @  0x19a42c8  impala::JniUtil::GetJniExceptionMsg()  



 @  0x1d28b85  impala::HiveUdfCall::OpenEvaluator() 



   @  0x1d2b76d  impala::ScalarExpr::OpenEvaluator()



 @  0x1d6f55d  impala::ScalarFnCall::OpenEvaluator()



   @  0x1d31d94  impala::ScalarExprEvaluator::Clone()   



 @  0x1d31fa6  impala::ScalarExprEvaluator::Clone() 


   
@  0x1ad088c  impala::HdfsScanner::Open()   


  @ 
 0x1b10f80  impala::HdfsTextScanner::Open() 


@   
   0x1ab4fdf  impala::HdfsScanNodeBase::CreateAndOpenScanner()  


  @ 
 0x1aa9f99  impala::HdfsScanNode::ProcessSplit()


@  

[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8464


Change subject: IMPALA-4591: Bound Kudu client error mem usage
..

IMPALA-4591: Bound Kudu client error mem usage

Previously, Kudu client errors could grow in size unbounded,
potentially causing the process to be killed. This patch sets a
bound on the mem that can be used for these error messages, with
the size determined by the flag 'kudu_error_buffer_size'.

If the errors for a Kudu client exceed this size, the query will fail,
as some errors will be dropped and we won't be able to tell if all of
the errors can be safely ignored.

Testing:
- Added a custom cluster test that verifies that a query that exceeds
  the limit fails.

Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M tests/custom_cluster/test_kudu.py
3 files changed, 39 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1
--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1432/


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 21:39:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..

IMPALA-6137: fix text scanner split delim mem mgmt

The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed
by ReleaseCompletedResources() before CheckForSplitDelimiter() was called.

The simple fix is to copy out the single byte that is needed each time
the buffer is filled.

Testing:
Ran exhaustive query tests under ASAN with --disable_mem_pools=true.

Before the change test_text_split_delimiters reliably caused an ASAN
failure when run with --disable_mem_pools=true. We should get this
coverage automatically once the I/O mgr switches to using the buffer
pool, which uses ASAN poisoning on freed buffers.

Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Reviewed-on: http://gerrit.cloudera.org:8080/8438
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 43 insertions(+), 7 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 21:38:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 21:06:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 2:

I thought about adding metrics to track queries in other states - e.g. running 
but not cancelled, but that is trickier since the query lifecycle is more 
complicated and varies more between DML/DDL/queries.


--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 21:00:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 2:

I also added a metric to track running queries, which mostafa requested.


--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:59:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8461

to look at the new patch set (#2).

Change subject: IMPALA-6151: add query-level fragment/backend counters
..

IMPALA-6151: add query-level fragment/backend counters

This adds NumBackends, NumFragments and NumFragmentInstances
counters to the query profile to make it easier to manually or
programmatically analyse the query.

Also add a num-queries-registered metric to track the number
of queries that have been executed but are not yet unregistered.

Testing:
Ran "select count(*) from alltypessmall" and checked profile:

Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 
50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th 
%-ile: 1ms
Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) 
tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB)
 - FiltersReceived: 0 (0)
 - FinalizationTimer: 0.000ns
 - NumBackends: 3 (3)
 - NumFragmentInstances: 4 (4)
 - NumFragments: 2 (2)

Ran some queries and manually checked the num-queries-registered metric
on the /metrics page when the queries were running and after they
finished.

Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
---
M be/src/runtime/coordinator.cc
M be/src/service/impala-server.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
5 files changed, 36 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/2
--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

2017-11-03 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet 
correctness
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@936
PS1, Line 936: Examine the HDFS_SCAN_NODE portion 
of a query profile that scans the
> This unfortunately won't give accurate info for all queries: if the query i
Done


http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@937
PS1, Line 937: suspected table. Use a query that performs a full 
table scan, and materializes the column
> It might be helpful to note common cases where uncompressed Parquet is/isn'
Done



--
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:37:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

2017-11-03 Thread John Russell (Code Review)
Hello Greg Rahn, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8418

to look at the new patch set (#2).

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet 
correctness
..

IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
---
M docs/topics/impala_known_issues.xml
1 file changed, 45 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/8418/2
--
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:24:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE

2017-11-03 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8372 )

Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use 
of TRUNCATE
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a
Gerrit-Change-Number: 8372
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:23:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Sailesh Mukil (Code Review)
Hello Bharath Vissapragada, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8455

to look at the new patch set (#2).

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..

IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

The CoordinatorBackendState::PublishFilter() function does not check
for query failure/cancellation. So if runtime filters are being published
during/after a failure, they will not be cancelled and still be sent
out which may take a while depending on the size of the cluster.
Also, these functions could potentially hold very large amounts
of untracked memory.

This patch fixes it by checking for cancellation/failure in
PublishFilter.

Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
2 files changed, 9 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8455/2
--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation

2017-11-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: PublishFilter() continues to run after query 
failure/cancellation
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394
PS1, Line 394:   {
> same
Done


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394
PS1, Line 394:   {
> Also, should we check if the backend is done too?
Done


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1082
PS1, Line 1082:   TPublishFilterParams rpc_params;
> lock_?
I removed this check since the check in L1106 should be sufficient.


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1106
PS1, Line 1106: if (filter_updates_received_->value() == 0) {
> I think checking for the filter disabled is sufficient in this function, si
Good point. I removed the above check. Plus acquiring the lock above to check 
the status would be very expensive.


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1130
PS1, Line 1130: swap(rpc_params.bloom_filter, aggregated_filter);
> I think we should also track the rpc_params memory and prevent the query fr
Sure I opened IMPALA-6153 and linked it to IMPALA-1575.



-- 
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:13:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps

2017-11-03 Thread John Russell (Code Review)
John Russell has abandoned this change. ( http://gerrit.cloudera.org:8080/7983 )

Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps
..


Abandoned

Suggested workaround was not a robust or future-proof solution.
--
To view, visit http://gerrit.cloudera.org:8080/7983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83
Gerrit-Change-Number: 7983
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 03 Nov 2017 19:07:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
..


Patch Set 10: Code-Review+1

Carry Alex's +1


--
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:59:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
..


Patch Set 9:

(1 comment)

Rebased onto master now that the predecessor patch is merged.

http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc@401
PS9, Line 401: state->AcquireReaderContext(reader_context_);
It's also possible to remove this logic as a consequence of this patch, but I 
ended up doing it in a follow-on patch. https://gerrit.cloudera.org/#/c/8408/



--
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:58:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8461


Change subject: IMPALA-6151: add query-level fragment/backend counters
..

IMPALA-6151: add query-level fragment/backend counters

This adds NumBackends, NumFragments and NumFragmentInstances
counters to the query profile to make it easier to manually or
programmatically analyse the query.

Testing:
Ran "select count(*) from alltypessmall" and checked profile:

Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 
50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th 
%-ile: 1ms
Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) 
tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB)
 - FiltersReceived: 0 (0)
 - FinalizationTimer: 0.000ns
 - NumBackends: 3 (3)
 - NumFragmentInstances: 4 (4)
 - NumFragments: 2 (2)

Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
---
M be/src/runtime/coordinator.cc
1 file changed, 12 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/1
--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 3: Code-Review+1

Will give phil another chance to look


--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:21:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Thanks for fixing this, this has caused me headaches.


-- 
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:18:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:17:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8459 )

Change subject: Fix errant, newline-including log directory.
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1432/


--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:17:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix errant, newline-including log directory.

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8459


Change subject: Fix errant, newline-including log directory.
..

Fix errant, newline-including log directory.

We've seen cases where a directory named "cluster\n  " sneaks
into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS
is a space-separated list, but clean.sh (as opposed to buildall.sh)
treats it as a single argument.

I tested this manually:

Before:

  $ls logs/
  be_tests/  custom_cluster_tests/  data_loading/  ee_tests/  fe_tests/

Bad command:

  $mkdir -p "${IMPALA_ALL_LOGS_DIRS}"

After:

  $ls logs/
  be_tests/  cluster?  /  custom_cluster_tests/  data_loading/  ee_tests/
  fe_tests/

Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
---
M bin/clean.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8459/1
--
To view, visit http://gerrit.cloudera.org:8080/8459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317
Gerrit-Change-Number: 8459
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h@144
PS5, Line 144: in
> if?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:15:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Tim Armstrong (Code Review)
Hello anujphadke, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8438

to look at the new patch set (#6).

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..

IMPALA-6137: fix text scanner split delim mem mgmt

The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed
by ReleaseCompletedResources() before CheckForSplitDelimiter() was called.

The simple fix is to copy out the single byte that is needed each time
the buffer is filled.

Testing:
Ran exhaustive query tests under ASAN with --disable_mem_pools=true.

Before the change test_text_split_delimiters reliably caused an ASAN
failure when run with --disable_mem_pools=true. We should get this
coverage automatically once the I/O mgr switches to using the buffer
pool, which uses ASAN poisoning on freed buffers.

Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 43 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8438/6
--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1431/


--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 6: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:15:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8456


Change subject: IMPALA-6148: Specifying thirdparty deps as URLs
..

IMPALA-6148: Specifying thirdparty deps as URLs

If the environment variable $IMPALA__URL is configured for
a thirdparty dependency, use that to download it instead of
the s3://native-toolchain bucket. This makes testing against
arbitrary versions of the dependencies easier.

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this by both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
---
M bin/bootstrap_toolchain.py
1 file changed, 77 insertions(+), 44 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/2
--
To view, visit http://gerrit.cloudera.org:8080/8456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Gerrit-Change-Number: 8456
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@430
PS7, Line 430: i
id


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@436
PS7, Line 436: specifies
identifies


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@61
PS7, Line 61:
nit: blank space


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.h@136
PS7, Line 136: //
nit: ///


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.cc@182
PS7, Line 182:  It points to
delete.



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:13:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h@144
PS5, Line 144: in
if?



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:10:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

I don't think it's necessary to unit-test it, given it's a thin wrapper. The 
test code would be more complex than the product code.


--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 17:45:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Tim Armstrong (Code Review)
Hello anujphadke, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8438

to look at the new patch set (#5).

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..

IMPALA-6137: fix text scanner split delim mem mgmt

The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed
by ReleaseCompletedResources() before CheckForSplitDelimiter() was called.

The simple fix is to copy out the single byte that is needed each time
the buffer is filled.

Testing:
Ran exhaustive query tests under ASAN with --disable_mem_pools=true.

Before the change test_text_split_delimiters reliably caused an ASAN
failure when run with --disable_mem_pools=true. We should get this
coverage automatically once the I/O mgr switches to using the buffer
pool, which uses ASAN poisoning on freed buffers.

Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 43 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8438/5
--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@90
PS4, Line 90:   /// CheckForSplitDelimiter(). Valid if 'byte_buffer_filled_' is 
true.
> Maybe briefly explain why we have it: the character is needed after the byt
Done


http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@144
PS4, Line 144: virtual
> I'm not following why we need a second virtual method. Is the wrapper meant
We don't, that was my mistake. Removed the virtual.



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 17:40:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt

2017-11-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8438 )

Change subject: IMPALA-6137: fix text scanner split delim mem mgmt
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@90
PS4, Line 90:   /// CheckForSplitDelimiter(). Valid if 'byte_buffer_filled_' is 
true.
Maybe briefly explain why we have it: the character is needed after the byte 
buffer is returned, or whatever.


http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@144
PS4, Line 144: virtual
I'm not following why we need a second virtual method. Is the wrapper meant to 
always apply to all subclasses? If so, it shouldn't need (or want it) to be 
virtual right?



--
To view, visit http://gerrit.cloudera.org:8080/8438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c
Gerrit-Change-Number: 8438
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:52:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(1 comment)

> Should I create tests for condition-variable.h? We don't seem to have unit 
> tests yet.

I think ideally we'd have tests for helper code like this. I'm ok if we think 
it's not worth it here given the low complexity.

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h
File be/src/util/condition-variable.h:

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38
PS1, Line 38:   void Wait(boost::unique_lock& lock) {
> About the 'struct' keyword: it is something that C requires, but in C++ it
Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:38:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)

2017-11-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8434 )

Change subject: IMPALA-5564: Release lock during planning. (wip)
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342
PS1, Line 342: is_planning_
> I started out with this being planning_finished_ (or some such). I ran into
I see - the ExecuteMetadataOp() cases.



--
To view, visit http://gerrit.cloudera.org:8080/8434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b
Gerrit-Change-Number: 8434
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-03 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 5: Code-Review-1

(3 comments)

I took a quick look through. I'm uncomfortable with this change as it stands, 
because I feel it abuses the Thrift structures too much. Specifically, keeping 
the map of query option levels in TQueryOptions seems wrong: you're really 
describing "Query Option Metadata" (which might include name, description/help, 
and level), rather than the query options themselves. And, for sending this 
stuff to impala-shell, abusing the 'description' field seems wrong. (In fact, 
impala-shell should actually show useful help for these options. Things like 
"MT_DOP" are completely opaque.

Implementation wise, I think we also need to worry about what happens for the 
HiveServer2 side of things. Specifically, we respond to "SET" queries over HS2 
at 
https://github.com/apache/incubator-impala/blob/1803b403e38b3f952afc4ff3fd3e5f4d14c088f8/be/src/service/client-request-state.cc#L194.
 I think we want to stay away from adding information into the Beeswax API that 
isn't also exposed in HS2. One option is to let "SET ALL" be meaningful as a 
query for HS2, though I don't quite know what that means in terms of the schema 
returned: how would you divide the regular/advanced/deprecated options over 
that API?

http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@65
PS5, Line 65: // Levels to use when displaying query options from Impala shell
Let's leave a pointer here indicating where the mapping from the options in 
TQueryOptions to these levels is defined.


http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@297
PS5, Line 297:   // Map for query option name to option level mapping. 
Populated by ImpalaServer
 :   // on startup
 :   61: required map 
query_option_levels;
This worries me a bit, for a couple of reasons.

First, 'required' is forever. (See 
https://diwakergupta.github.io/thrift-missing-guide/#_defining_structs.) I 
think it means that a Thrift client will refuse to serialize a struct if this 
isn't set. For some uses, like TClientRequest, it makes no sense for us to 
carry this around. It's possible it works out because it can be set to the 
empty map often, but it's still odd.

Second, this isn't a query option. It's certainly describing query options, but 
it's very very different from, say, "mem_limit", which applies to the query.


http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py@100
PS5, Line 100:   def get_query_option_level_from_description(self, description):
Have we decided we can't change ConfigVariable to include a level? It seems 
like this is very much not a description, and we're abusing it.

/** Represents a Hadoop-style configuration variable. */
struct ConfigVariable {
  1: string key,
  2: string value,
  3: string description
}



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 5:

Sorry I didn't see the discussion on the JIRA about "advanced". I don't think 
that solves the problem that the JIRA was meaning to solve (though I don't 
think implementation would have to change too much).  I'll add a comment to the 
JIRA.


--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:17:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run 
after query failure/cancellation
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1130
PS1, Line 1130: swap(rpc_params.bloom_filter, aggregated_filter);
I think we should also track the rpc_params memory and prevent the query from 
being unregistered and releasing admission control resources until this 
function has exited - could you maybe file a follow-up JIRA? That might depend 
on my admission control change anyway.



--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:08:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 11:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13
PS9, Line 13: In RuntimeFilterGenerator in the planner, each hash join node
> ... each hash join node generates a bloom and min-max filter for each equi-
Done


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27:
> Not specific to the code changes, and I don't expect a response here (proba
RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the 
time a scan node will wait for any filter of either type to arrive. Similarly, 
RUNTIME_FILTER_MODE works the same for min-max as its worked for bloom.

The size related params - RUNTIME_BLOOM_FILTER_SIZE, 
RUNTIME_FILTER_MIN/MAX_SIZE only apply to bloom filters as the mem used by 
min-max is small and bounded.

I've updated some comments to make the above clearer.

One tricky case is DISABLE_ROW_RUNTIME_FILTERING. Kudu applies the filters on 
both a per-partition and per-row basis. At the moment, there's no way to change 
this - the filters are applied through the same interface as eg. predicates 
that are pushed down, so Kudu assumes that they have to be applied for 
correctness, not just for performance.

So, I could see the argument either way - we could disable Kudu filters if 
DISABLE_ROW_RUNTIME_FILTERING is true, though it would also disable the 
partition filtering, or we could just not consider 
DISABLE_ROW_RUNTIME_FILTERING for Kudu filters.

The remaining filter related option is MAX_NUM_RUNTIME_FILTERS, which I've 
addressed in another comment.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41:
> Contrived extreme queries are good data points, but how about running the T
Those results are posted in the review comments. I can include them here as 
well, I just felt it made the commit message excessively large.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44
PS9, Line 44:   timings are averages of 3 runs.
> does not eliminate
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h@164
PS8, Line 164: tvalue.ti
> I'm confused by this variable name.
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@27
PS8, Line 27:
> Can you maybe briefly describe what is tested for each filter type? There s
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@252
PS8, Line 252:   MinMaxFilter::Create(tFilter, string_type, _pool, 
_pool);
> Would TestEnv work? That is the semi-standard way to create an ExecEnv for
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@353
PS8, Line 353:   t3.ToTColumnValue();
> Remove?
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc@246
PS8, Line 246: null
> nullptr?
Done


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
> I think I understand why you did this, but it seems confusing from a user's
There's basically two reasons for this:
- We don't want to regress queries by eliminating bloom filters that used to be 
generated.
- The purpose of this param in the first place was to prevent the coordinator 
from being overwhelmed. Min-max filters are less heavy-weight than bloom 
filters so they don't put as much stress on the coordinator anyways.

Given that, I think that it would be fine to maintain this behavior even once 
HDFS can do min-max and Kudu can do bloom.

Another concern though is what to do if we add in-list filters, which are 
probably similarly heavy-weight to bloom filters.

One option would be to deprecate this and have separate max_num_bloom and 
max_num_in_list params, but that may be confusing and requires user action to 
keep things working as expected anyways.

Another option at that point would be to keep this param and make it "Maximum 
number of expensive filters (bloom or in-list)", and then there's just once we 
have to worry about regressing queries.

Or maybe we should just apply this to min-max filters as well, since its the 
least confusing option in the long run, possibly bumping up the default a 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#11).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#10).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run 
after query failure/cancellation
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394
PS1, Line 394:   if (!status_.ok()) return;
> same
Also, should we check if the backend is done too?

I think we should also include a comment explaining what this achieves.


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1106
PS1, Line 1106: if (state->disabled()) return;
I think checking for the filter disabled is sufficient in this function, since 
it's disabled when resources are released.



--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 16:02:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 7:

(81 comments)

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/exec/kudu-util.h@45
PS6, Line 45: .ok())
> nit: use same mangling between this and KUDU_RETURN_IF_ERROR (prepend seems
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc@89
PS6, Line 89: "processing threads. If left at default value 0, it will be 
set to number of CPU "
> how are these defaults chosen?
I didn't really change these parameters from Henry's patch.

There are no good defaults for the service queue size any way (IMPALA-6116) but 
1024 seems to be a reasonable bound for untracked memory consumption.

The latest PS sets num_svc_threads to match the number of cores on the system.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@97
PS6, Line 97: etion at any one
: /// time). The rate of transmission is controlled by the 
receiver: a sender will only
: /// schedule batch transmission when the previous transmissi
> i'm confused what the "two" cases are from this comment.  Also, I think it'
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@103
PS6, Line 103: batch que
> deserialized?
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@104
PS6, Line 104: moved from t
> what is this "respectively" referring to?
The two cases. Removed as this seems unnecessary.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@249
PS6, Line 249:  for Transm
> unclear what that means, maybe stale comment? And this should say something
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@258
PS6, Line 258:
> isn't that part of "memory pointed to by 'request'"? If so and you want to
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@294
PS6, Line 294:
> How about getting rid of this typedef? The code seems easier to understand
This data structure has been renamed to DeserializeTask and also changed in the 
latest patch.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@428
PS6, Line 428:
> a sender
Done


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@314
PS3, Line 314: t
> Done
Oops..missed it. Will update in later patch.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@59
PS6, Line 59:
> I don't have a strong preference either way, but it'd be nice to be consist
My general guideline is to use lambda if it's one or two liners and use bind 
for larger functions. I guess there many different opinions on this topic. I 
try to optimize for readability.


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@322
PS6, Line 322: tonicMillis() + STREAM_E
> shouldn't be plural
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.h@119
PS6, Line 119: row batch i
> row batch is deserialized
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@52
PS6, Line 52: total amount of
> it's not clear what that means just from reading the comment. It'd be nice
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@113
PS6, Line 113: erialized
> how about using unique_ptr since this owns the row batch (until it's transf
Done


http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127
PS6, Line 127:   // If true, the receiver fragment for this stream got 
cancelled.
> given that a single soft limit is imposed across all sender queues, does it
For the non-merging case, there is essentially only one queue.

For the merging case, I can see that it's possible for a blocked row batch to a 
sender queue being passed by either new incoming row batches or blocked row 
batches to another sender queue. 

[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-11-03 Thread Michael Ho (Code Review)
Hello Sailesh Mukil, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8023

to look at the new patch set (#7).

Change subject: IMPALA-4856: Port data stream service to KRPC
..

IMPALA-4856: Port data stream service to KRPC

This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.

Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multi-plex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
prevents some issues with thrift RPC in which a thread may be stuck in
an RPC call without being able to check for cancellation. A TransmitData()
call with KRPC is in essence a trio of RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid extra copy into the serialized
request buffer.

Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, there are 64 service threads. The service
threads are shared across all queries so the RPC handler should avoid
blocking as much as possible. In thrift RPC implementation, we make a thrift
thread handling a TransmitData() RPC to block for extended period of time
when the receiver is not yet created when the call arrives. In KRPC
implementation, we store TransmitData() or EndDataStream() requests
which arrive before the receiver is ready in a per-receiver early sender list
stored in KrpcDataStreamMgr. These RPC calls will be processed and responded to
when the receiver is created or when timeout occurs.

Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr causes the buffer limit
to exceed, the request will be stashed in a blocked_sender_ queue to be 
processed
later. The stashed RPC request will not be responded to until it is processed
so as to exert back pressure to the client. An alternative would be to reply 
with
an error and the request / row batches need to be sent again. This may end up
consuming more network bandwidth than the thrift RPC implementation. This change
adopts the behavior of allowing one stashed request per sender.

All rpc requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header about the
meta-data of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).

This patch is based on an abandoned patch by Henry Robinson.

TESTING
---

* Build passed with FLAGS_use_krpc=true.

TO DO
-

* Port some BE tests to KRPC services.

Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
---
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/data-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/kudu-util.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-mgr-base.h
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
A be/src/runtime/krpc-data-stream-sender.cc
A be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/CMakeLists.txt
A be/src/service/data-stream-service.cc
A be/src/service/data-stream-service.h
M be/src/service/impala-server.cc
M cmake_modules/FindProtobuf.cmake
M common/protobuf/CMakeLists.txt
A common/protobuf/common.proto
A common/protobuf/data_stream_service.proto
A common/protobuf/row_batch.proto
M common/thrift/generate_error_codes.py
33 files changed, 3,124 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8023/7
--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, 

[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-03 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 3:

(2 comments)

I removed ClusterController and left test_breakpad.py untouched.
The functionality available in CustomClusterTestSuite was enough for my test 
cases.

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py
File tests/custom_cluster/test_shell_interactive_reconnect.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25
PS2, Line 25:
> Perhaps use an actual tempfile created with tempfile.[something]
Done


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@115
PS2, Line 115: move_shell_h
> maybe move_shell_history()/restore_shell_history()? It's more verbose but I
Done



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 15:03:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-11-03 Thread Zoltan Borok-Nagy (Code Review)
Hello Philip Zeyliger, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8368

to look at the new patch set (#3).

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..

IMPALA-2235: Fix current db when shell auto-reconnects

When precmd tested the connection it didn't validate that if we are
using the previously selected DB. The _validate_database method
is responsible for that, but it only appended the "use " command
to the cmdqueue (command queue of Cmd class). But, at this point we
might already have commands in the command queue that will run before
the "use " command.

Also, the command processed by precmd can entirely skip the cmdqueue,
therefore it is not enough to insert the "use " command to the front
of cmdqueue. We need to issue the USE command with the onecmd() method
to execute it immediately.

I extended the _validate_database method with an "immediately" flag.
If this is true, _validate_database will use the onecmd() method.
Otherwise, it will append the USE command to the cmdqueue to maintain
the previous behaviour.

I added a new automated test suite named test_shell_interactive_reconnect.py
to the "custom cluster" tests. It sets the default database, and after
reconnection it checks if the shell set it again automatically.

One test case checks if the shell set the default db after manually
reconnecting to the impala daemon by issuing the CONNECT command.
The other test case checks if the shell set the default db after
automatic reconnection due to cluster restart.

I needed to backup the impala shell history file because I didn't
want to pollute it by the test cases (just like the way it is done in
tests/shell/test_shell_interactive.py). I created utility functions for
this in tests/shell/util.py and now test_shell_interactive.py and
the newly created test suite are using these utility functions.

Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
---
M shell/impala_shell.py
A tests/custom_cluster/test_shell_interactive_reconnect.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
4 files changed, 102 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8368/3
--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc@618
PS5, Line 618:
Probably I'm missing something here, but shouldn't this be ADVANCED?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127
PS2, Line 127:   PRINT_REGULAR_OPTIONS_ONLY = 1
 :   PRINT_ALL_OPTIONS = 2
> I found it more verbose to have 2 const instead of a bool.
Ok, I see your point.

You could also put the PRINT_* members into a separate class to emulate enums 
(see class CmdStatus in L61-L69). Same goes for QUERY_OPTION_* members. On the 
other hand this might be an overkill, so use your best judgement :)


http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351
PS2, Line 351:result = shell1.get_result()
 : assert "Query options (defaults shown in []):" in 
result.stdout
 : assert "ABORT_ON_ERROR" in result.stdout
> No. SUPPORT_START_OVER as I discussed with Tim is a tricky one and we don't
Thanks for the explanation.

I still find confusing that in impala::PopulateQueryOptionLevels() the query 
option level for SUPPORT_START_OVER is explicitly set to REGULAR. Shouldn't it 
be ADVANCED?



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 11:20:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable

2017-11-03 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8428 )

Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h
File be/src/util/condition-variable.h:

http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@48
PS1, Line 48:  const timespec* timeout
> I feel like this should be a const&, since it has to be non-null. That woul
OK, will do it.


http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@65
PS1, Line 65:   bool TimedWait(boost::unique_lock& lock,
> This looks like it only has one callsite. There's another callsite in Block
Now that I looked into the callsites more carefully, I realized that there are 
two use cases for TimedWait. In Impala both of them are used. One use case is 
when we wait until a point in time:

 auto deadline = calculate_deadline();
 while (! condition) {
   if (!cv.TimedWait(lock, deadline) {
 // deadline reached without notification
 return Status::ERROR;
   }
 }

The other use case is waiting in a loop, and signaling some status periodically:

 while (! condition) {
   cv.TimedWait(lock, seconds(1));
   if (condition) break;
   else SendReport();
 }

Both API can be expressed in terms of the other, however it is a bit awkward 
(expressing deadline in terms of duration is more awkward, so if we have to 
choose, I think we should go with supporting deadlines).

There are examples for both in the code base:

FragmentInstanceState::ReportProfileThread: wait duration in terms of wait 
until deadline.

Promise::Get: wait until deadline in terms of wait duration.

Now I am on the side of support both use case with the two APIs, like 
std::condition_variable has wait_for and wait_until member functions. Maybe we 
can use these names also to be more explicit.

What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/8428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9
Gerrit-Change-Number: 8428
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 10:05:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-03 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#3).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 78 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/3
--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation

2017-11-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8455 )

Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run 
after query failure/cancellation
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394
PS1, Line 394:   if (!status_.ok()) return;
same


http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1082
PS1, Line 1082:   if (!query_status_.ok()) return;
lock_?



--
To view, visit http://gerrit.cloudera.org:8080/8455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed
Gerrit-Change-Number: 8455
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:44:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Install OpenJDK-dbg for development environments.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8431 )

Change subject: Install OpenJDK-dbg for development environments.
..

Install OpenJDK-dbg for development environments.

Updates the boostrap script to install the OpenJDK debug symbols.

OpenJDK comes with a gdb stack unwinder that errors out if the OpenJDK
debugging symbols aren't installed. This fixes the following error
message in gdb:

  Installing openjdk unwinder
  Traceback (most recent call last):
File 
"/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py",
 line 52, in 
  class Types(object):
File 
"/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py",
 line 66, in Types
  nmethodp_t = gdb.lookup_type('nmethod').pointer()
  gdb.error: No type named nmethod.

I tested this by installing the package manually on Ubuntu16.04 and
using gdb a bit.

Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0
Reviewed-on: http://gerrit.cloudera.org:8080/8431
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M bin/bootstrap_system.sh
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0
Gerrit-Change-Number: 8431
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] Install OpenJDK-dbg for development environments.

2017-11-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8431 )

Change subject: Install OpenJDK-dbg for development environments.
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0
Gerrit-Change-Number: 8431
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:41:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165:   typedef TUniqueId RegistrationId;
> You can move this typedef to statestore.h and use the same type in statesto
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
>  Where ever the 'SubscriberId' is
 >  required, we just get it from the Subscriber object anyway, and
 >  that object can be retrieved using the unique registration id.

Not sure I follow. If you see OfferUpdate()/DoSubscriberUpdate(), we only keep 
track of ScheduledSubscriberUpdate objects and get the corresponding Subscriber 
based on ScheduledSubscriberUpdate.subscriber_Id. So we don't have a Subscriber 
object handy to get the subscriber_id in all cases.

Also, if we remove SubscriberId everywhere, this means that we need to change 
the subscribers_ map structure to map from RegistrationId -> Subscriber 
objects. Doing so we can't look up by subscriber_id, which is required in 
RegisterSubscriber() (At that point, we don't assign a RegistrationId yet to 
the new instance)

SubscriberMap::iterator subscriber_it = subscribers_.find(subscriber_id);
if (subscriber_it != subscribers_.end()) {
  UnregisterSubscriber(subscriber_it->second.get());
}

We can still figure out a way, but it seemed unnecessarily complex to me. 
Thoughts?


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
> Could you clarify what "exists" means here exactly? It could be confused wi
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
> nit: registration_id
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr* subscriber
> Add a comment about what is returned in this out parameter.
clarified


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
 : const TUniqueId& registration_id
> It looks like it just makes sense to pass 'const Subscriber&' here? Is ther
Not sure I understand. We get the subscriber/registration_id from the 
ScheduledSubscriberUpdate object and not the Subscriber object. Do you mean we 
should pass directly pass ScheduledSubscriberUpdate instead? If thats the case, 
the signature seems kinda weird :)


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634:   if (!RegisteredSubscriberExists(subscriber_to_update.first, 
subscriber_to_update.second,
> I'm a little worried that we're contending for a mutex two more times in th
Good point. I too think (theoretically) spinlock is probably a better choice to 
avoid context switching. Also, ~1000 entries seem like a reasonable estimate 
for foreseeable future :). Also, I think for string based hashing of ~1000 
entries, it is reasonable to assume a O(1) average case lookup (even though the 
worst case is O(N)).

I created a microbenchmark to see if the lock type makes a difference. In the 
benchmark, I measured how long it takes to get 100 heartbeats for a given 
subscriber (with heart beating thread pool sizes of 10/100).
I didn't see any noticeable difference for 100 subscribers, but beyond that, 
the test runs into flaky socket connection issues. I admit that this is not 
representative of the real world use case because in real clusters, the 
statestore CPU would be much busier and the context switching could be more 
expensive.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:22:18 +
Gerrit-HasComments: Yes