[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.

2018-01-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9045 )

Change subject: IMPALA-6410: Tool to cherrypick changes across branches.
..


Patch Set 3:

(29 comments)

Thanks for taking this on!

http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py
File bin/compare_branches.py:

http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@15
PS2, Line 15: # Compares two specified branches, using the Gerrit Change-Id as 
the
Can you add an example invocation and the output of running it with --help?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py
File bin/compare_branches.py:

http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@15
PS3, Line 15: # Compares two specified branches, using the Gerrit Change-Id as 
the
Could you add a couple of example invocations?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@19
PS3, Line 19: #
Can you add a motivation section on why this is automated? If it were single 
cherry-picks, we would just do them manually, but this is special because it 
helps maintain two active development branches.


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@25
PS3, Line 25:
nit: heterogeneous spacing.


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@26
PS3, Line 26: # "commits": [ "", 
"" ]
nit: long line


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@26
PS3, Line 26: 
Can you make the commits a struct with a field for the hash and an optional 
field for comments on why a particular commit is ignored?

Also, can you call out the requirements for the hash? It has to be the full 
hash, yes?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@32
PS3, Line 32: # Debug logging to stderr can be enabled with the --verbose flag.
Big-picture-wise, I think it would help if there were some prose explaining the 
process of maintaining two active branches, including things like how on 
denotes that a patch should not be cherry picked, when review happens on 
cherry-picks, and so on. Either Wiki or README would work, I think. What do you 
think?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@49
PS3, Line 49: def read_ignored_commits(ignored_commits_file):
nit: The param is a file name, yes? Can you make a note that it uses the json 
format above?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@50
PS3, Line 50:   '''Returns a dictionary containing commits that should be 
ignored.
nit: we normally use https://www.python.org/dev/peps/pep-0257/: double quotes, 
single subject line, then an empty line, then indentation of the following line 
that lines up with the quotation mark (not the first line's prose).


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@64
PS3, Line 64:   '''Creates a map from change id to (commit_msg, hash, author, 
date, body).'''
nit: can this tuple have the same order as the destructuring on line 74?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@68
PS3, Line 68: %s
git help log calls this the "subject", rather than the message.


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@71
PS3, Line 71: sh.git
Possibl digression: does sh.git have documentation other than the generic sh 
documentation and https://amoffat.github.io/sh/sections/contrib.html#git?

I was hoping to understand if there was a way to separate tabbed values that 
was the common idiom.


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@75
PS3, Line 75: ValueError
Why swallow this exception?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@79
PS3, Line 79: 0
Log an error if there are two or more matches?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@82
PS3, Line 82:   logging.debug('Warning: commit {0} ({1}...) has no 
Change-Id.'.format(commit_hash, msg[:40]))
nit: long line


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@86
PS3, Line 86: def create_parser():
It would help readers of this file (I think) if this were right after the 
comment that opens the file. I think a few of our other scripts even paste the 
--help into the comment at the top.


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@90
PS3, Line 90: current
Also has to be the target branch, yes?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@95
PS3, Line 95: branch names are used as is
By that, do you mean that the default is to compare asf-gerrit/master and 
asf-gerrit/2.x?


http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@97
PS3, Line 97:   help='Name of the target git remote; defaults to source 
remote')
So the special empty-string 

[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-17 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#6).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
8 files changed, 332 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/6
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-17 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#5).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/client-request-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
9 files changed, 333 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/5
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-17 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24
PS4, Line 24: datastructures
> nit: data structures
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260
PS4, Line 260: // Map of plan node to scan ranges that are computed from 
request_. If a plan node is
 :   // not present, then it has no associated additional scan 
ranges.
 :   // Populated by the ComputeScanRanges.
 :   std::map 
computed_ranges_;
> How is this related to PerNodeScanRanges (L41)?
backing out this refactor.


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258
PS4, Line 258: // Combine actual and computed ranges from the schedule.
 :   const vector* scan_ranges = 

 :   const vector* computed_ranges =
 :   schedule->GetComputedRanges(entry.first);
 :   vector combined_ranges;
 :   if (computed_ranges != nullptr) {
 : for (const auto& range : entry.second) {
 :   if (!range.scan_range.__isset.hdfs_file_split_spec) {
 : combined_ranges.push_back(range);
 :   }
 : }
 : for (const auto& range : 
(*schedule->GetComputedRanges(entry.first))) {
 :   combined_ranges.push_back(range);
 : }
 : scan_ranges = _ranges;
 :   }
> I think my previous comment made things worse, sorry about that. I think it
no problems. added back here. expansion is done per spec in a separate helper 
function below.


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199
PS4, Line 199: 3: required i64 partition_id
> Comment what this is referencing.
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209
PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec
> Add a comment either here or at the top to describe when THdfsFileSplit vs
Lets go with THdfsFileSplitGeneratorSpec


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126
PS4, Line 126: fileFormat'
> nit: missing quote
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712
PS4, Line 712: fileDesc.getNumFileBlocks()
> What will happen if this is a zero-byte file (no blocks). Can't we prune th
changed it to be more explicit (based on the fs).


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743
PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName())
> Do we need to do this for every file? Can't we do this once per partition?
good point, lifted that to be per partition.


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761
PS4, Line 761: disk ids are expected in blocks,
 :* checkMissingDiskIds should be set to true. If disk ids are 
checked,
> "checkMissingDiskIds is true, "
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 18 Jan 2018 05:47:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6368: make test chars parallel

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9022 )

Change subject: IMPALA-6368: make test_chars parallel
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1740/


--
To view, visit http://gerrit.cloudera.org:8080/9022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744
Gerrit-Change-Number: 9022
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 05:23:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8914 )

Change subject: IMPALA-6193: Track memory of incoming data streams
..


Patch Set 6:

(8 comments)

Thanks for the review, please see PS6.

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc@58
PS5, Line 58:   DCHECK(mem_tracker_ != nullptr);
> nit: extra indentation.
Done


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@172
PS5, Line 172: const EndDataStreamRequestPB* request, 
EndDataStreamResponsePB* response,
> I generally think it's better to Consume() before Release()ing to avoid und
Done, thanks for the suggestion. Let me know if you'd like me to factor the 
code into an own method, since it's repeated in AddEarlySender() above.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@212
PS5, Line 212: // time without considering the remaining number of senders. 
As a consequence,
> Has the memory been released at this point?
It hasn't but was about to in RespondSuccess(). I made it explicit by calling 
DiscardTransfer().


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@268
PS5, Line 268: }
> Has the rpc_context memory been released at this point?
RespondSuccess() would release it, I made it more precise by calling 
DiscardTransfer(). However, EndDataStream RPCs don't have sidecars, so only the 
serialized PB messages will be discarded, making up less than 100 bytes.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@387
PS5, Line 387: // to senders which sent EOS RPC so all query fragments will 
eventually be cancelled.
> We haven't actually released the memory at this point, have we? Should we b
Done


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@391
PS5, Line 391:   for (const unique_ptr& ctx : 
senders_queue.waiting_sender_ctxs) {
> Same question here.
Done


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@410
PS5, Line 410: }
> Is the memory released?
No, at this point it is still held. I moved the release to the right place.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@458
PS5, Line 458:   {
> Same here
At this point the rpc_context was even destroyed - this was a bug. I still 
haven't found a way to trigger deferred RPCs in a test which is why I hadn't 
caught this.



--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 6
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 03:45:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-17 Thread Lars Volker (Code Review)
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8914

to look at the new patch set (#6).

Change subject: IMPALA-6193: Track memory of incoming data streams
..

IMPALA-6193: Track memory of incoming data streams

This change adds memory tracking to incoming transmit data RPCs when
using KRPC. We track memory against a global tracker called "Data Stream
Service" until it is handed over to the stream manager. There we track
it in a global tracker called "Data Stream Queued RPC Calls" until a
receiver registers and takes over the early sender RPCs. Inside the
receiver, memory for deferred RPCs is tracked against the fragment
instance's memtracker until we unpack the batches and add them to the
row batch queue.

The DCHECK in MemTracker::Close() covers that all memory consumed by a
tracker gets release eventually. In addition to that, this change adds a
custom cluster test that makes sure that queued memory gets tracked by
inspecting the peak consumption of the new memtrackers.

Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.h
M common/protobuf/data_stream_service.proto
A tests/custom_cluster/test_krpc_mem_usage.py
A tests/verifiers/mem_usage_verifier.py
13 files changed, 270 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8914/6
--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 6
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8985 )

Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733
Gerrit-Change-Number: 8985
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 18 Jan 2018 03:28:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9023 )

Change subject: IMPALA-6382: Cap spillable buffer size and max row size query 
options
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1738/


--
To view, visit http://gerrit.cloudera.org:8080/9023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Gerrit-Change-Number: 9023
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 02:13:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5519: Allocate fragment's runtime filter memory from Buffer pool

2018-01-17 Thread Bikramjeet Vig (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8971

to look at the new patch set (#4).

Change subject: IMPALA-5519: Allocate fragment's runtime filter memory from 
Buffer pool
..

IMPALA-5519: Allocate fragment's runtime filter memory from Buffer pool

This patch adds changes to the planner to account for memory used by
bloom filters at the fragment instance level. Also adds changes to
allocate memory for those bloom filters from the buffer pool.

Testing:
- Modified Planner Tests and end to end tests to account for memory
  reservation for the runtime filters.
- Modified backend tests and benchmarks to use the bufferpool for
  bloom filter allocation.
- Add an end to end test.
- Ran rest of the core tests.

Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
---
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/fe-support.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/BackendGflags.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
40 files changed, 883 insertions(+), 550 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/4
--
To view, visit http://gerrit.cloudera.org:8080/8971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
Gerrit-Change-Number: 8971
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2397: Use atomics for IntGauge and IntCounter

2018-01-17 Thread Michael Ho (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9012

to look at the new patch set (#3).

Change subject: IMPALA-2397: Use atomics for IntGauge and IntCounter
..

IMPALA-2397: Use atomics for IntGauge and IntCounter

This change removes the spinlock in IntGauge and IntCounter
and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple
threads can be contending for the spinlocks of some global metrics
under concurrent queries.

This change also breaks up SimpleMetric is renamed to ScalarMetric
and broken into two subclasses:
- LockedMetric:
  - a value store for any primitive type (int,float,string etc).
  - atomic read and write via GetValue() and SetValue() respectively.

- AtomicMetric:
  - the basis of IntGauge and IntCounter. Support atomic increment
of the metric value via Increment() interface.
  - atomic read and write via GetValue() and SetValue() respectively.
  - only support int64_t type.

Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
---
M be/src/exec/external-data-source-executor.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/thrift-server.cc
M be/src/runtime/client-cache.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/io/scan-range.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore.cc
M be/src/util/common-metrics.cc
M be/src/util/default-path-handlers.cc
M be/src/util/impalad-metrics.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M be/src/util/metrics-test.cc
M be/src/util/metrics.h
M be/src/util/thread.cc
M common/thrift/metrics.json
30 files changed, 353 insertions(+), 332 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/9012/3
--
To view, visit http://gerrit.cloudera.org:8080/9012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Gerrit-Change-Number: 9012
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6368: make test chars parallel

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9022 )

Change subject: IMPALA-6368: make test_chars parallel
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1740/


--
To view, visit http://gerrit.cloudera.org:8080/9022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744
Gerrit-Change-Number: 9022
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 01:41:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6219: Use AES-GCM for spill-to-disk encryption

2018-01-17 Thread Xianda Ke (Code Review)
Xianda Ke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9032 )

Change subject: IMPALA-6219: Use AES-GCM for spill-to-disk encryption
..


Patch Set 4:

Hi folks, Would you please help review when you are available?
@Tim, Would you please help run the test on all the platforms(old centos...)


--
To view, visit http://gerrit.cloudera.org:8080/9032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24
Gerrit-Change-Number: 9032
Gerrit-PatchSet: 4
Gerrit-Owner: Xianda Ke 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Xianda Ke 
Gerrit-Comment-Date: Thu, 18 Jan 2018 00:43:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6219: Use AES-GCM for spill-to-disk encryption

2018-01-17 Thread Xianda Ke (Code Review)
Xianda Ke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9032


Change subject: IMPALA-6219: Use AES-GCM for spill-to-disk encryption
..

IMPALA-6219: Use AES-GCM for spill-to-disk encryption

AES-GCM can be very fast(~10 times faster than CFB+SHA256), but it
requires an instruction that Impala can currently run without (CLMUL).
In order to be fast, we dispatch to GCM mode at run-time based on the
CPU and OpenSSL version.

Testing:
run runtime tmp-file-mgr-test, openssl-util-test, buffer-pool-test
and buffered-tuple-stream-test.
add two cases GcmIntegrity & EncryptoArbitraryLength for
openssl-util-test

Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24
---
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
M be/src/util/openssl-util-test.cc
M be/src/util/openssl-util.cc
M be/src/util/openssl-util.h
6 files changed, 176 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/9032/4
--
To view, visit http://gerrit.cloudera.org:8080/9032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24
Gerrit-Change-Number: 9032
Gerrit-PatchSet: 4
Gerrit-Owner: Xianda Ke 


[Impala-ASF-CR] IMPALA-3942: Fix wrongly escaped string literal in front-end

2018-01-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8818 )

Change subject: IMPALA-3942: Fix wrongly escaped string literal in front-end
..


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@44
PS6, Line 44: this(value, ScalarType.STRING, true);
> Done. By the way, would you tell me the reason why you don't want to keep t
That's a fair question, thanks for asking!

* In most cases there will be no redundant computation. The common case is that 
this literal is passed to the BE via toThrift() so getNormalizedValue() is only 
called once.

* I'm less concerned about redundant computations and more concerned about 
doubling the memory consumption. This point will probably only matter in 
extreme cases.

* Easier to reason about and maintain fewer class members. The normalized 
string is "redundant" information because it can be derived from 'value_'.


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@87
PS7, Line 87: return BaseSemanticAnalyzer.unescapeSQLString("'" + 
getNormalizedValue()
whitespace


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@91
PS7, Line 91:   // Assumes a string literal from single quotes and escapes it 
because:
What does this function do and return? Please consider these snippets which 
might improve the comment:

Returns a normalized representation of the string value suitable for embedding 
in SQL as a single-quoted string literal.
String literals can come directly from the SQL of a query or from rewrites like 
constant folding, so we generally do not know whether single or double quotes 
are appropriate for a given value.
The SQL standard prescribes single-quoted string literals so we normalize that 
way.


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@100
PS7, Line 100:   private String getNormalizedValue() {
nit: we typically use /* */ style comment for class and function comments


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@101
PS7, Line 101: final int lengthOfValue = value_.length();
len?


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@468
PS7, Line 468:   public void escapeStringLiteralTest() {
normalizeStringLiteralsTest


http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@471
PS7, Line 471: testToSql("select \"'\"", "SELECT '\\''");
add a few tests with single quotes that show they are not transformed


http://gerrit.cloudera.org:8080/#/c/8818/6/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/8818/6/tests/query_test/test_queries.py@310
PS6, Line 310:
> I agree with your suggestion. ToSqlTest mainly checks the problem. By the w
I'm ok with leaving some E2E tests.

Imo they belong in views-ddl.test since the user-visible bug is really around 
creating/using views, and the toSql() issue is already covered by ToSqlTest 
unit tests.



--
To view, visit http://gerrit.cloudera.org:8080/8818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec
Gerrit-Change-Number: 8818
Gerrit-PatchSet: 7
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 00:20:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8971 )

Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8971/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8971/3//COMMIT_MSG@12
PS3, Line 12:
Maybe mention that it doesn't address the coordinator memory problem?



--
To view, visit http://gerrit.cloudera.org:8080/8971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
Gerrit-Change-Number: 8971
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 00:19:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8971 )

Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.h@112
PS3, Line 112: directoty_
typo: directory


http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.cc@50
PS3, Line 50:   Close();
Calling Close() is a little subtle. Maybe mention the reason for doing this so 
it isn't lost.


http://gerrit.cloudera.org:8080/#/c/8971/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/8971/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@263
PS3, Line 263: 
.sum(planTreeProfile.duringOpenProfile.max(fInstancePostOpenProfile));
I missed this on the first pass. I think we need to add the runtime filters 
reservation to both duringOpenProfile and fInstancePostOpenProfile, then take 
the max of those. This is because the filters may be constructed during open 
and live after that.


http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test:

http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test@59
PS3, Line 59:partitions=24/24 files=24 size=174.39KB
Changes in this file look spurious.


http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test:

PS3:
Changes to this file (and some of the later ones) look spurious too.


http://gerrit.cloudera.org:8080/#/c/8971/2/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test:

http://gerrit.cloudera.org:8080/#/c/8971/2/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@391
PS2, Line 391: # mem requirement after accounting for the memory
> the old test was already flawed in that it never exercised the code path th
Thanks for the detailed explanation.



--
To view, visit http://gerrit.cloudera.org:8080/8971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
Gerrit-Change-Number: 8971
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jan 2018 00:18:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8985 )

Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1739/


--
To view, visit http://gerrit.cloudera.org:8080/8985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733
Gerrit-Change-Number: 8985
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 23:45:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.

2018-01-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9045 )

Change subject: IMPALA-6410: Tool to cherrypick changes across branches.
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Gerrit-Change-Number: 9045
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 17 Jan 2018 23:37:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.

2018-01-17 Thread Philip Zeyliger (Code Review)
Hello Taras Bobrovytsky,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9045

to look at the new patch set (#3).

Change subject: IMPALA-6410: Tool to cherrypick changes across branches.
..

IMPALA-6410: Tool to cherrypick changes across branches.

This script compares two branches and optionally cherry-picks changes
across. It uses the Gerrit Change-Id as the key, and it supports a
configuration file and a string to ignore commits.

Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
---
A bin/compare_branches.py
A bin/ignored_commits.json
2 files changed, 218 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/9045/3
--
To view, visit http://gerrit.cloudera.org:8080/9045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Gerrit-Change-Number: 9045
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 18: Code-Review+1

(1 comment)

Thanks for fixing this gap in test coverage! Will also give Alex a change to 
look.

http://gerrit.cloudera.org:8080/#/c/8775/18/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8775/18/testdata/bin/create-load-data.sh@443
PS18, Line 443:   # IMPALA-4993: Add tests where nested collections are stored 
in more than row-group
We unfortunately have multiple ways to load custom data files for tests. Doing 
it in this shell script isn't ideal and we may want to switch to some other 
method at some point. However, this seems like the best approach for now so 
it's consistent with the other multi-block/row group files



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 18
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 17 Jan 2018 23:01:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-17 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#18).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/CustomerMultiBlock/README
A testdata/CustomerMultiBlock/customer_multiblock.parquet
M testdata/bin/create-load-data.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
21 files changed, 734 insertions(+), 250 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/18
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 18
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-17 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#17).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/CustomerMultiBlock/README
A testdata/CustomerMultiBlock/customer_multiblock.parquet
M testdata/bin/create-load-data.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
21 files changed, 734 insertions(+), 250 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/17
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:52:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8914 )

Change subject: IMPALA-6193: Track memory of incoming data streams
..


Patch Set 5:

(8 comments)

Looking pretty good, remaining comments are mainly about making the accounting 
as precise as possible.

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc@58
PS5, Line 58: DCHECK(mem_tracker_ != nullptr);
nit: extra indentation.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@172
PS5, Line 172:   
incoming_request_tracker_->Release(rpc_context->GetTransferSize());
I generally think it's better to Consume() before Release()ing to avoid 
undercounting.

We could actually avoid the problem I think if we make use of the fact that the 
two trackers share a parent and use ConsumeLocal() and ReleaseLocal() so that 
the offsetting increments and decrements don't modify consumption further up in 
the tree. E.g.

  DCHECK_EQ(incoming_request_tracker_->parent(), mem_tracker_->parent());
  incoming_request_tracker_->ReleaseLocal(rpc_context->GetTransferSize(), 
mem_tracker_->parent());
  mem_tracker_->ConsumeLocal(rpc_context->GetTransferSize(), 
mem_tracker_->parent());

As a side-effect we'd reduce contention on the MemTrackers further up the tree.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@212
PS5, Line 212: 
incoming_request_tracker_->Release(rpc_context->GetTransferSize());
Has the memory been released at this point?


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@268
PS5, Line 268:   
incoming_request_tracker_->Release(rpc_context->GetTransferSize());
Has the rpc_context memory been released at this point?


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@387
PS5, Line 387: 
mem_tracker_->Release(ctx->rpc_context->GetTransferSize());
We haven't actually released the memory at this point, have we? Should we be 
using the DiscardTransfer() method?


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@391
PS5, Line 391: 
mem_tracker_->Release(ctx->rpc_context->GetTransferSize());
Same question here.


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@410
PS5, Line 410: 
recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
Is the memory released?


http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@458
PS5, Line 458:   
recvr_->mem_tracker()->Release(payload->rpc_context->GetTransferSize());
Same here



--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 5
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:42:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..

IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.

To test this issue in AuthorizationTest.java, 'functional_avro'
is added as a test database with only column level permissions.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Reviewed-on: http://gerrit.cloudera.org:8080/8973
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/resources/authz-policy.ini.template
4 files changed, 29 insertions(+), 6 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:40:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6399: Increase timeout in test observability to reduce flakiness

2018-01-17 Thread Lars Volker (Code Review)
Lars Volker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9034 )

Change subject: IMPALA-6399: Increase timeout in test_observability to reduce 
flakiness
..

IMPALA-6399: Increase timeout in test_observability to reduce flakiness

Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af
Reviewed-on: http://gerrit.cloudera.org:8080/9034
Reviewed-by: Sailesh Mukil 
Tested-by: Lars Volker 
---
M tests/query_test/test_observability.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Lars Volker: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af
Gerrit-Change-Number: 9034
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6399: Increase timeout in test observability to reduce flakiness

2018-01-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9034 )

Change subject: IMPALA-6399: Increase timeout in test_observability to reduce 
flakiness
..


Patch Set 1: Verified+1

Tests passed here: https://jenkins.impala.io/job/gerrit-verify-dryrun/1729/


--
To view, visit http://gerrit.cloudera.org:8080/9034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af
Gerrit-Change-Number: 9034
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:31:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9023 )

Change subject: IMPALA-6382: Cap spillable buffer size and max row size query 
options
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1738/


--
To view, visit http://gerrit.cloudera.org:8080/9023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Gerrit-Change-Number: 9023
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:31:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics

2018-01-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9012 )

Change subject: IMPALA-2397: Use atomics for metrics
..


Patch Set 2: Code-Review+1

I don't have further comments, thanks for the inline replies.


--
To view, visit http://gerrit.cloudera.org:8080/9012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Gerrit-Change-Number: 9012
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 22:10:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9023 )

Change subject: IMPALA-6382: Cap spillable buffer size and max row size query 
options
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Gerrit-Change-Number: 9023
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 21:47:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options

2018-01-17 Thread Bikramjeet Vig (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9023

to look at the new patch set (#2).

Change subject: IMPALA-6382: Cap spillable buffer size and max row size query 
options
..

IMPALA-6382: Cap spillable buffer size and max row size query options

Currently the default and min spillable buffer size and max row size
query options accept any valid int64 value. Since the planner depends
on these values for memory estimations, if a very large value close to
the limits of int64 is set, the variables representing or relying on
these estimates can overflow during different phases of query execution.

This patch puts a reasonable upper limit of 1TB to these query options
to prevent such a situation.

Testing:
Added backend query option tests.

Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
3 files changed, 32 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/9023/2
--
To view, visit http://gerrit.cloudera.org:8080/9023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Gerrit-Change-Number: 9023
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6382: Cap default and min spillable buffer size query options

2018-01-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9023 )

Change subject: IMPALA-6382: Cap default and min spillable buffer size query 
options
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/exec/exec-node.h@227
PS1, Line 227:   static const int64_t MAX_SPILLABLE_BUFFER_SIZE = 1LL << 40; // 
1 TB
> I think we should just have the constant in query-options.cc instead of add
Done


http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/service/query-options.cc@573
PS1, Line 573:   case TImpalaQueryOptions::MAX_ROW_SIZE: {
> I wonder if there's a similar bug with MAX_ROW_SIZE
yes it has the same bug, added a fix for that too



--
To view, visit http://gerrit.cloudera.org:8080/9023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Gerrit-Change-Number: 9023
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 21:36:09 +
Gerrit-HasComments: Yes


[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4

2018-01-17 Thread Tianyi Wang (Code Review)
Tianyi Wang has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9035 )

Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4
..

Bump thrift to 0.9.3-p2; Add bison 3.0.4

This patch upgrades thrift to 0.9.3-p2. Two patches are applied:
1. THRIFT-3650.
2. Remove Twisted Python dependency.

Thrift 0.9.3 depends on bison 2.5 or higher, which is not available on
some distributions including centos 5. This patch adds bison 3.0.4 to
the toolchain.
The minimum openssl version is bumped to 1.0.1. Bundled openssl will be
used to build thrift if the requirement is not met.
Testing: It builds on all supported OSes.

Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd
---
M buildall.sh
A source/bison/build.sh
M source/thrift/build.sh
A source/thrift/thrift-0.9.3-patches/0001-THRIFT-3650.patch
A source/thrift/thrift-0.9.3-patches/0002-reomve-twisted-pyton-dependency.patch
5 files changed, 163 insertions(+), 34 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Tim Armstrong: Looks good to me, approved
  Tianyi Wang: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd
Gerrit-Change-Number: 9035
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 


[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4

2018-01-17 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9035 )

Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd
Gerrit-Change-Number: 9035
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 21:22:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8914 )

Change subject: IMPALA-6193: Track memory of incoming data streams
..


Patch Set 4:

(9 comments)

Thanks for the reviews and discussion. Please see PS5 and my inline comments.

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.h
File be/src/rpc/impala-service-pool.h:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.h@64
PS3, Line 64:
> Maybe make it "MemTracker* const" since it's not modified after constructio
Done


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc@141
PS3, Line 141:   mem_tracker_->Consume(c->GetTransferSize());
> Is it reasonable to call TryConsume() here and handle the failure, e.g. in
Following the discussion in krpc-data-stream-mgr.cc I left this unchanged for 
now.


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc@180
PS3, Line 180: mem_tracker_->Release(incoming->GetTransferSize());
> Leaving the memory temporarily untracked here isn't great. Is there a way w
I updated the change to pass the service memtracker to the data stream manager. 
This allows us to Release the memory there before we either track it again in 
the data stream manager or call AddBatch on a receiver. However, we face a 
similar pattern there, that we release memory before making a function call. It 
does not cross the KRPC service boundary so it may be less prone to errors, but 
still seems not ideal.

Do you have a pattern to recommend for these cases? Should we always try to 
pass the sender's tracker along with the payload so that the receiver of a call 
can release memory before consuming it from its own tracker? Should we 
introduce a small helper class that represents a memory allocation, keeps a 
pointer to its current_tracker_ and has a Move(MemTracker* new_tracker) method 
to release memory from current_tracker_, consume it from new_tracker, and set 
current_tracker_ = new_tracker. Such a class could also DCHECK in its d'tor if 
its memory has not been released.


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h@165
PS3, Line 165:   /// Passed to new services.
 :   MemTracker* mem_tracker_;
> Is this still needed ?
Removed.


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@328
PS3, Line 328: MemTracker* stream_mgr_tracker = obj_pool_->Add(
> Is there a more descriptive name we could use for this one. E.g. "Data Stre
Done, thank you for coming up with a suggestion.


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@343
PS3, Line 343: // Bump thread cache to 1GB to reduce contention for 
TCMalloc central
> nit: extra blank line
Done


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc@175
PS3, Line 175: ctx->serialized_size()
> Wouldn't the majority of the changes be not needed if we add a new interfac
Done


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc@324
PS3, Line 324: Consume
> Same question about TryConsume
Ack


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc@373
PS3, Line 373: Consume
> Here too
Ack



--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 20:41:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-17 Thread Lars Volker (Code Review)
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8914

to look at the new patch set (#5).

Change subject: IMPALA-6193: Track memory of incoming data streams
..

IMPALA-6193: Track memory of incoming data streams

This change adds memory tracking to incoming transmit data RPCs when
using KRPC. We track memory against a global tracker called "Data Stream
Service" until it is handed over to the stream manager. There we track
it in a global tracker called "Data Stream Queued RPC Calls" until a
receiver registers and takes over the early sender RPCs. Inside the
receiver, memory for deferred RPCs is tracked against the fragment
instance's memtracker until we unpack the batches and add them to the
row batch queue.

The DCHECK in MemTracker::Close() covers that all memory consumed by a
tracker gets release eventually. In addition to that, this change adds a
custom cluster test that makes sure that queued memory gets tracked by
inspecting the peak consumption of the new memtrackers.

Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.h
M common/protobuf/data_stream_service.proto
11 files changed, 91 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8914/5
--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 5
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Bumping version to 3.0.

2018-01-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9044 )

Change subject: Bumping version to 3.0.
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6
Gerrit-Change-Number: 9044
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 17 Jan 2018 20:35:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.

2018-01-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9045 )

Change subject: IMPALA-6410: Tool to cherrypick changes across branches.
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py
File bin/compare_branches.py:

http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@21
PS2, Line 21: # [ { "source": "master", "target": "2.x", "commits": [ 
"", 
"" ] } ]
long line


http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@30
PS2, Line 30: from pprint import pformat
We should have "import..." grouped together and sorted, followed by "from ... 
import ..." statements also sorted


http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@179
PS2, Line 179: print "Cherrypicking %d changes." % 
(len(cherry_pick_hashes),)
The main() function seems long. Do you think it would make sense to separate 
the cherry picking into its own function?



--
To view, visit http://gerrit.cloudera.org:8080/9045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Gerrit-Change-Number: 9045
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 17 Jan 2018 20:35:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9012 )

Change subject: IMPALA-2397: Use atomics for metrics
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h
File be/src/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@147
PS1, Line 147:   virtual void set_value(const T& value) = 0;
> Currently, there are only two derived classes for SimpleMetric and both of
What's the point of making it a virtual function if it doesn't need to be? It 
seems better to just have a regular function on the derived classes - it's more 
efficient and easier to understand which function is being called.

I checked out this PS and made it non-virtual and it seems to compile and run 
fine.


http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json@886
PS1, Line 886: "kind": "PROPERTY",
> Unfortunately, if we set this to a GAUGE, we need to back it with a gauge i
I think SimpleProperty implements all the functionality needed for this 
one case, it would just need to declare itself as a TMetricKind::GAUGE.

I.e we could implement a skeleton DoubleGauge that only supports SetValue() and 
GetValue().

This isn't the end of the world, but it would be good if we could avoid the 
internal implementation details leaking out into our metadata about metrics.



--
To view, visit http://gerrit.cloudera.org:8080/9012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Gerrit-Change-Number: 9012
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 20:14:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1737/


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:52:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 16: Code-Review+2

Failed due to flaky test IMPALA-6399. Rebase and retry.


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:52:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6075: Add Impala daemon metric for catalog version.

2018-01-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8949 )

Change subject: IMPALA-6075: Add Impala daemon metric for catalog version.
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.h@978
PS2, Line 978: CatalogUpdateMetrics
UpdateCatalogVersionMetrics()


http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.cc@1325
PS2, Line 1325: ImpaladMetrics::CATALOG_CURR_VER->set_value(catalog_version);
Maybe my comment was not clear. What I meant is that you should also expose the 
catalog_topic_version and the catalog_service_id as a metric.


http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/util/impalad-metrics.h
File be/src/util/impalad-metrics.h:

http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/util/impalad-metrics.h@113
PS2, Line 113: CATALOG_CURR_VER
CATALOG_VERSION, CATALOG_TOPIC_VERSION, CATALOG_SERVICE_ID


http://gerrit.cloudera.org:8080/#/c/8949/2/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/8949/2/common/thrift/metrics.json@233
PS2, Line 233: The catalog version which is currently with impalad.
Maybe "Version of the Impalad catalog."



--
To view, visit http://gerrit.cloudera.org:8080/8949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2307eb434561ed74ff058106541c0ebda017d97
Gerrit-Change-Number: 8949
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:47:55 +
Gerrit-HasComments: Yes


[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9035 )

Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd
Gerrit-Change-Number: 9035
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:40:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24
PS4, Line 24: datastructures
nit: data structures


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260
PS4, Line 260: // Map of plan node to scan ranges that are computed from 
request_. If a plan node is
 :   // not present, then it has no associated additional scan 
ranges.
 :   // Populated by the ComputeScanRanges.
 :   std::map 
computed_ranges_;
How is this related to PerNodeScanRanges (L41)?


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258
PS4, Line 258: // Combine actual and computed ranges from the schedule.
 :   const vector* scan_ranges = 

 :   const vector* computed_ranges =
 :   schedule->GetComputedRanges(entry.first);
 :   vector combined_ranges;
 :   if (computed_ranges != nullptr) {
 : for (const auto& range : entry.second) {
 :   if (!range.scan_range.__isset.hdfs_file_split_spec) {
 : combined_ranges.push_back(range);
 :   }
 : }
 : for (const auto& range : 
(*schedule->GetComputedRanges(entry.first))) {
 :   combined_ranges.push_back(range);
 : }
 : scan_ranges = _ranges;
 :   }
I think my previous comment made things worse, sorry about that. I think it's 
fine to move the scan range generation here. In that way, we will avoid the 
double iteration over all plan nodes and the additional state (per node 
computed scan ranges). Just make sure you wrap this into a separate function.


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199
PS4, Line 199: 3: required i64 partition_id
Comment what this is referencing.


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209
PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec
Add a comment either here or at the top to describe when THdfsFileSplit vs 
THdfsFileSplitSpec is set. Actually, THdfsFileSplitSpec may not be the best 
name. How about THdfsFileSplitGenerationSpec or something like that?


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126
PS4, Line 126: fileFormat'
nit: missing quote


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712
PS4, Line 712: fileDesc.getNumFileBlocks()
What will happen if this is a zero-byte file (no blocks). Can't we prune this 
early on or it never reaches that point?


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743
PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName())
Do we need to do this for every file? Can't we do this once per partition?


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761
PS4, Line 761: disk ids are expected in blocks,
 :* checkMissingDiskIds should be set to true. If disk ids are 
checked,
"checkMissingDiskIds is true, "



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:30:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2642: Fix a potential deadlock in statestore

2018-01-17 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9038 )

Change subject: IMPALA-2642: Fix a potential deadlock in statestore
..


Patch Set 2:

(3 comments)

Working on the test case.

http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG@17
PS2, Line 17: Testing: The problem is easily reproduced by lowering the value of
: STATESTORE_MAX_SUBSCRIBERS to 3, and then launching a mini cluster
: with 3 impalads. With the fix, we can see the following entry in 
the
: statestore log:
> Is it possible to add a test for this?
Doing it. The approach I am taking requires changing STATESTORE_MAX_SUBSCRIBERS 
to FLAGS_statestore_max_subscribers, so that we can start statestored with a 
low max value. I think this is good to do anyway, as that allows us to test 
other boundary behavior.


http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353
PS2, Line 353: // Assumes that the caller has taken subscribers_lock_
> I would add this comment in the header comment in the .h file.
Done


http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353
PS2, Line 353: // Assumes that the caller has taken subscribers_lock_
> Its kind of weird IMO that OfferUpdate() expects the caller to take subscri
Thought of that, ie, moving the call to OfferUpdate from DoSubscriberUpdate to 
another scope that does not have subscribers_lock_. Then I saw that 
UnregisterSubscriber() also expects the caller to take subscribers_lock_. I 
thought it made sense to follow the same pattern here.

I think there is scope for cleaning up the way locking in the statestore code. 
For instance, RegisterSubscriber() takes subscribers_lock_, instead of its 
caller. I am just not sure if it makes sense to do that as part of this jira.

Another thing: BlockingQueue::BlockingPut() does not release the mutex it 
acquires in the shut down path. I know we're shutting down anyway, but if ever 
we implement clean shut down that does some work, a deadlock might happen here.



--
To view, visit http://gerrit.cloudera.org:8080/9038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d49dede221ce1f50ec299643b5532c61f93f0c6
Gerrit-Change-Number: 9038
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:22:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics

2018-01-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9012 )

Change subject: IMPALA-2397: Use atomics for metrics
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h
File be/src/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@128
PS1, Line 128: /// - 'property' which is a value store which can be read and 
written only
> Maybe mention that the metric kinds can serve as a hint for how management
Done


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@144
PS1, Line 144:   /// Returns the current value. Thread-safe.
> I feel like we should rename to GetValue()/SetValue() since they're no long
Renamed to GetValue() / SetValue(). Keep them as pure virtual as their 
implementations can be different in derived classes but they share the 
implementation for the ToJson() interface.


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@147
PS1, Line 147:   /// Sets the current value. Thread-safe.
> Does set_value() need to be a virtual function? I can't think of a case whe
Currently, there are only two derived classes for SimpleMetric and both of them 
implement this interface albeit differently. So, it seems to make sense to have 
it as virtual for now until we have more classes which inherit SimpleMetric but 
somehow doesn't implement SetValue().


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@215
PS1, Line 215: operty;
> nit: maybe rename to metric_kind_t to make it more clear that it's a type?
Done


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@225
PS1, Line 225:   }
> The combination of value_/value()/CalculateValue() seems a bit too complica
Good idea. Renamed value() to GetValue().


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@232
PS1, Line 232:   /// need to take care of synchronization among sources.
> Could this be a compile time check / assertion?
I believe the compiler will most likely fold this as a constant during 
compilation so it's a no-op for GAUGE.


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@252
PS1, Line 252: typedef class AtomicMetric IntCounter;
> I think it might be slightly easier to follow if we move these below the cl
Done


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@265
PS1, Line 265: int64_t sum = 0;
> The base class comment says that this happens atomically, but this implemen
The new code and the old code both don't hold locks of all gauges to prevent 
them from being written before reading them. So, although each read of the 
gauges is atomic by itself, there is no guarantee on atomicity between all the 
gauges.

The base class comments only means that it's up to the derived class to 
implement its own synchronization and the model above apparently seems good 
enough for SumGauge.

Note that value() and set_value() are racy by nature as the value can change 
right after the value is read. The atomicity guarantee has more to do with 
read-modify-write between multiple writers and atomic read of the value (i.e. 
no half written 64-bit value).


http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@284
PS1, Line 284:
> nit: this might fit into a single line
Done


http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json@886
PS1, Line 886: "kind": "PROPERTY",
> It seems like this should still be exposed as a gauge since it's a numeric
Unfortunately, if we set this to a GAUGE, we need to back it with a gauge in 
the code too. In which case, we may need to implement DoubleGauge.



--
To view, visit http://gerrit.cloudera.org:8080/9012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Gerrit-Change-Number: 9012
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:18:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1736/


--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:12:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 3: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:10:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py@492
PS2, Line 492: impala_output = Statements()
> Maybe s/impala_output/impala_create/g
Done


http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py@702
PS2, Line 702:   hive_output.write_to_file('load-' + output_name + 
'-hive-generated.sql')
> Ideally these names should match as well (hive_load, hbase_load), but it is
I'm going to skip this. hive_output incorporates a mix of create, load, and 
other things. This will get a bit cleaner when we do parallel Hive.



--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:10:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-17 Thread Joe McDonnell (Code Review)
Hello Philip Zeyliger, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9009

to look at the new patch set (#3).

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..

IMPALA-6386: Invalidate metadata at table level for dataload

Dataload currently executes bin/load-data.py for TPC-H,
TPC-DS, and functional-query concurrently. One of the final
steps for bin/load-data.py is to run a global "invalidate
metadata". Global "invalidate metadata" commands are known
to cause problem on concurrent systems. See IMPALA-5087.
For dataload, if TPC-H executes "invalidate metadata" while
TPC-DS is still creating tables and adding partitions,
the TPC-DS executor might erroneously believe that a table
does not exist.

This changes dataload to invalidate metadata at an
individual table level rather than globally. This
prevents the concurrency issue.

This also changes the names of some of the intermediate
SQL files generated by generate-schema-statements.py
and consumed by load-data.py to make them less confusing.

Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
2 files changed, 18 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/3
--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9:

Saw this, will take a look.


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:02:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1735/


--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:00:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-17 Thread Bikramjeet Vig (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8971

to look at the new patch set (#3).

Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..

IMPALA-5519: Allocate Runtime filter memory from Buffer pool

This patch adds changes to the planner to account for memory used by
bloom filters at the fragment instance level. Also adds changes to
allocate memory for the bloom filters from the buffer pool.

Testing:
- Modified Planner Tests and end to end tests to account for memory
  reservation for the runtime filters.
- Modified backend tests and benchmarks to use the bufferpool for
  bloom filter allocation.
- Add an end to end test.
- Ran rest of the core tests.

Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
---
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/fe-support.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/BackendGflags.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
40 files changed, 883 insertions(+), 550 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/3
--
To view, visit http://gerrit.cloudera.org:8080/8971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
Gerrit-Change-Number: 8971
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 5: Code-Review+2

Thanks for moving the tests! Much better now :)


--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 17 Jan 2018 18:59:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8971 )

Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..


Patch Set 2:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc@410
PS2, Line 410:   return res.__isset.status ? Status(res.status) : Status::OK();
> Why not always set TPublishFilterParams.status. This seems to be adding ano
Removed


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1179
PS2, Line 1179:   // Cancel query if the result of PublishFilter rpc 
returns a non-ok status.
> Nit: slightly redundant wording with result/returns. E.g. "if the PublishFi
As discussed, removing cancellation logic


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1182
PS2, Line 1182: "This Error was encountered in Fragment $0 at 
$1:$2", fragment_idx,
> nit: don't need caps on Error/Fragment.
As discussed, removing cancellation logic


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1184
PS2, Line 1184: Cancel();
> If we're going to cancel the query there's no point sending out more filter
As discussed, removing cancellation logic


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@111
PS2, Line 111: input
> output?
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@118
PS2, Line 118: *&
> Let's just use a double pointer for the output to be consistent with the re
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@165
PS2, Line 165: total_filter_mem_required_
> I think total_bloom_filter_mem_required_ would be more consistent with the
Done


http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc@189
PS1, Line 189:   if (buffer_pool_client_.GetUnusedReservation() < 
required_space) {
> I don't see this change in PS2
my mistake, I only added the DCHECK in AllocateScratchBloomFilter, forgot to 
put it here too. Will add it in the next patch.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a78
PS2, Line 78:
> Hmm so we were unnecessarily re-allocating filters if they were already reg
yes thats right


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a80
PS2, Line 80:
> Was the lock removed accidentally?
yes, I will add it back


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@71
PS2, Line 71:   void Close();
> Maybe mention that Close() is safe to call multiple times or if Init() wasn
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@111
PS2, Line 111:   /// buffer pool allocation failed and the filter should be 
discarded.
> Is the change in this function's behaviour needed now?
I would prefer to keep this change in case this methods is called before 
calling Init() or after calling close.
I will update the comment to more appropriately reflect the recent change.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@30
PS2, Line 30:   : always_false_(true),
> Maybe initialize the constant ones at their definition while we're here. I
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@54
PS2, Line 54: RETURN_IF_ERROR(
:   buffer_pool_->AllocateBuffer(buffer_pool_client_, 
alloc_size, _handle_));
will call Close() before this to make sure Init() is safe to call multiple 
times.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@63
PS2, Line 63: directory_
> directory_ != nullptr, to be consistent with the rest of the code
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@72
PS2, Line 72:   if (directory_) {
> directory_ != nullptr, to be consistent with the rest of the code
Done


http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:


[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior

2018-01-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8801 )

Change subject: IMPALA-5191: Standardize column alias behavior
..


Patch Set 12: Code-Review+2

(1 comment)

I'l merge after you address the final comment.

Can you file a JIRA and work with John Russell to get the new behavior 
documented? We should document it as an incompatible change.

http://gerrit.cloudera.org:8080/#/c/8801/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8801/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@1056
PS12, Line 1056: // Constant arithmetic expressions should not be 
interpreted as ordinals
Constant exprs should not be interpreted as ordinals.

(this does not only apply to constant arithmetic exprs, but any constant expr, 
e.g. if(true,1,int_col) should also not be interpreted as an ordinal)



--
To view, visit http://gerrit.cloudera.org:8080/8801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8
Gerrit-Change-Number: 8801
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 18:57:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bumping version to 3.0.

2018-01-17 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/9044 )

Change subject: Bumping version to 3.0.
..

Bumping version to 3.0.

This changes the version that Impala presents as to 3.0. We are
simultaneously introducing a 2.x branch that continues
to identify itself as 2.x (where x=11 right now).

This commit is not for 2.x.

Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6
---
M bin/save-version.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/9044/2
--
To view, visit http://gerrit.cloudera.org:8080/9044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6
Gerrit-Change-Number: 9044
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] Bumping version to 3.0.

2018-01-17 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9044


Change subject: Bumping version to 3.0.
..

Bumping version to 3.0.

This changes the version that Impala presents as to 3.0. We are
simultaneously introducing a 2.x branch that continues
to identify itself as 2.x (where x=11 right now).

This commit is not for 2.x.

Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6
---
M bin/save-version.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/9044/1
--
To view, visit http://gerrit.cloudera.org:8080/9044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6
Gerrit-Change-Number: 9044
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.

2018-01-17 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9045


Change subject: IMPALA-6410: Tool to cherrypick changes across branches.
..

IMPALA-6410: Tool to cherrypick changes across branches.

This script compares two branches and optionally cherry-picks changes
across. It uses the Gerrit Change-Id as the key, and it supports a
configuration file and a string to ignore commits.

Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
---
A bin/compare_branches.py
A bin/ignored_commits.json
2 files changed, 194 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/9045/1
--
To view, visit http://gerrit.cloudera.org:8080/9045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Gerrit-Change-Number: 9045
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 17:40:35 +
Gerrit-HasComments: No


[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9035 )

Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4
..


Patch Set 2:

I had a question on the JIRA about patch 0011 that was previous applied to 
thrift.


--
To view, visit http://gerrit.cloudera.org:8080/9035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd
Gerrit-Change-Number: 9035
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 17:24:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled

2018-01-17 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8985 )

Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled
..


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py
File tests/query_test/test_tpcds_queries.py:

http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py@279
PS2, Line 279: v.get_value('table_format').compression_type != 'record')
 : cls.ImpalaTestMatrix.add_mandatory_exec_option('decimal_v2', 
1)
 :
 : if cls.exploration_strategy() != 'exhaustive':
> That's how we run the normal TPCDS workload. Are you suggesting to run this
Nevermind, misread something. Done.


http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py@285
PS2, Line 285:   cls.ImpalaTestMatrix.add_constraint(lambda v:\
 :   v.get_value('table_format').file_format in ['parquet'])
 :
 : cls.ImpalaTestMatrix.add_constraint(lambda v:\
 : v.get_value('exec_option')['batch_size'] == 0)
 :
 :   def test_tpcds_q1(self, vector):
 : self.run_test_case(self.get_workload() + '-decimal
> Simlar response to above. Are you suggesting to run this workload different
Done



--
To view, visit http://gerrit.cloudera.org:8080/8985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733
Gerrit-Change-Number: 8985
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 17 Jan 2018 16:52:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#9).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
17 files changed, 787 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/9
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-3282: Adds regexp escape built-in function

2018-01-17 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8900 )

Change subject: IMPALA-3282: Adds regexp_escape built-in function
..


Patch Set 5:

(2 comments)

Thanks for the iteration!

http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc@623
PS5, Line 623:   const strings::CharSet to_escape(".\\+*?[^]$(){}=!<>|:-");
This is a constant (perhaps "REGEX_ESCAPE_CHARACTERS") and could be declared 
static const in the StringFunctions:: namespace. Seems like right now it's 
creating a CharSet for every value?

See FACTORIAL_LOOKUP, which seems similar in spirit.


http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc@637
PS5, Line 637:   DCHECK_GE(result.len, 0);
DCHECK_GE(result.len, str.len) should be true too, no?



--
To view, visit http://gerrit.cloudera.org:8080/8900
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b
Gerrit-Change-Number: 8900
Gerrit-PatchSet: 5
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Jan 2018 16:33:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20
PS8, Line 20:
> nit: wrong word?
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055
PS6, Line 1055: plan_hints:hints
> nit: please check that all review comments are marked as "Done" when sendin
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69
PS6, Line 69:* Helper class for parsing.
> Thanks for the explanation, I think I understand it now. Could we just have
I have no problems with going that way, but I am not sure that it would make 
the change smaller: InsertStmt would also need a Set/AddPlanHints function, and 
it would be probably better to rewrite InsertStmt to always use the setter 
instead of constructor arguments.


http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71
PS8, Line 71:
> nit: "rule that checks" or "rules that check"
Done


http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75
PS8, Line 75: > partitio
> nit: we generally seem to use camel case for acronyms, e.g. see createCtasT
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344
PS6, Line 344: 01:EXCHANGE [UNPARTITIONED]
> Yes, I think it's intentional. Maybe the docs need to be more clear. Can yo
I have created IMPALA-6408 to track the doc issue.



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 17 Jan 2018 16:04:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#10).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.
If only one partition is written (because all partitioning columns
are constant or the target table is not partitioned), then the shuffle
hint leads to a plan where all rows are merged at the coordinator where
the table sink is executed.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 332 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/10
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Taras Bobrovytsky, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8801

to look at the new patch set (#12).

Change subject: IMPALA-5191: Standardize column alias behavior
..

IMPALA-5191: Standardize column alias behavior

We should not perform alias substitution in the
subexpressions of GROUP BY, HAVING, and ORDER BY
to be more standard conformant.

=== Allowed ===
SELECT int_col / 2 AS x
FROM functional.alltypes
GROUP BY x;

SELECT int_col / 2 AS x
FROM functional.alltypes
ORDER BY x;

SELECT NOT bool_col AS nb
FROM functional.alltypes
GROUP BY nb
HAVING nb;

=== Not allowed ===
SELECT int_col / 2 AS x
FROM functional.alltypes
GROUP BY x / 2;

SELECT int_col / 2 AS x
FROM functional.alltypes
ORDER BY -x;

SELECT int_col / 2 AS x
FROM functional.alltypes
GROUP BY x
HAVING x > 3;

Some extra checks were added to AnalyzeExprsTest.java.

I had to update other tests to make them pass
since the new behavior is more restrictive.

I added alias.test to the end-to-end tests.

Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8
---
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
A testdata/workloads/functional-query/queries/QueryTest/alias.test
M tests/query_test/test_queries.py
8 files changed, 232 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/8801/12
--
To view, visit http://gerrit.cloudera.org:8080/8801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8
Gerrit-Change-Number: 8801
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8801 )

Change subject: IMPALA-5191: Standardize column alias behavior
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@891
PS11, Line 891:* If the rewritten expr is a NumericLiteral, then return the 
original expr, otherwise
> If given expr is rewritten into an integer literal ...
Done


http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@896
PS11, Line 896:   private Expr rewriteIfResultIsNotNumericLiteral(ExprRewriter 
rewriter, Expr expr)
> rewriteCheckOrdinalResult()
Done


http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@899
PS11, Line 899: if (rewrittenExpr instanceof NumericLiteral) return expr;
> we use {} for multi-line if else
Done


http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@1055
PS11, Line 1055: "group by one, substring(cast(two as string), 1, 1), 
three");
> Add tests for the constant-folding changes, e.g.:
Added tests.

I re-read our discussion about ordinals in HAVING:
https://gerrit.cloudera.org/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@548

I checked PostgreSQL and it does not allow ordinals in HAVING. Anyway, I 
implemented it as we discussed, just wanted to clarify that.



--
To view, visit http://gerrit.cloudera.org:8080/8801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8
Gerrit-Change-Number: 8801
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 15:56:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#9).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 331 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/9
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 8:

Hi Alex, Dimitris,

Could you review this change, especially the FE part?


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 14:16:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#8).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
17 files changed, 787 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/8
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89
PS6, Line 89: CopyRows
> Yeah, I think we can just assume that the frontend will only generate plans
I added comments about it to this class, and to the corresponding plan node 
class as well.


http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116
PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) {
> I don't think we have test coverage for this code yet, since it's only exec
I added some tests to nested-types-subplan.test.
They use tpch_nested_parquet.customer because functional.allcomplextypes is 
empty.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 14:13:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@102
PS3, Line 102:   // SELECT permissions on columns ('id') on 
'functional_avro.alltypessmall'
> missing opening single-quote for 'functional_avro.alltypessmall'
Done


http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@399
PS3, Line 399: privilege = new TPrivilege("", TPrivilegeLevel.SELECT,
> typo: privilege (which has already been declared)
Thanks for spotting it!
Please ignore patch set 4, and compare 3 to 5.
Note that it may look like if this could fit into 1 line, but it would be 91 
characters.



--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 17 Jan 2018 13:34:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris 
Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8973

to look at the new patch set (#5).

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..

IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.

To test this issue in AuthorizationTest.java, 'functional_avro'
is added as a test database with only column level permissions.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/resources/authz-policy.ini.template
4 files changed, 29 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/5
--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris 
Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8973

to look at the new patch set (#4).

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..

IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.

To test this issue in AuthorizationTest.java, 'functional_avro'
is added as a test database with only column level permissions.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/resources/authz-policy.ini.template
4 files changed, 29 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/4
--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal