[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/9045 ) Change subject: IMPALA-6410: Tool to cherrypick changes across branches. .. Patch Set 3: (29 comments) Thanks for taking this on! http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py File bin/compare_branches.py: http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@15 PS2, Line 15: # Compares two specified branches, using the Gerrit Change-Id as the Can you add an example invocation and the output of running it with --help? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py File bin/compare_branches.py: http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@15 PS3, Line 15: # Compares two specified branches, using the Gerrit Change-Id as the Could you add a couple of example invocations? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@19 PS3, Line 19: # Can you add a motivation section on why this is automated? If it were single cherry-picks, we would just do them manually, but this is special because it helps maintain two active development branches. http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@25 PS3, Line 25: nit: heterogeneous spacing. http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@26 PS3, Line 26: # "commits": [ "", "" ] nit: long line http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@26 PS3, Line 26: Can you make the commits a struct with a field for the hash and an optional field for comments on why a particular commit is ignored? Also, can you call out the requirements for the hash? It has to be the full hash, yes? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@32 PS3, Line 32: # Debug logging to stderr can be enabled with the --verbose flag. Big-picture-wise, I think it would help if there were some prose explaining the process of maintaining two active branches, including things like how on denotes that a patch should not be cherry picked, when review happens on cherry-picks, and so on. Either Wiki or README would work, I think. What do you think? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@49 PS3, Line 49: def read_ignored_commits(ignored_commits_file): nit: The param is a file name, yes? Can you make a note that it uses the json format above? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@50 PS3, Line 50: '''Returns a dictionary containing commits that should be ignored. nit: we normally use https://www.python.org/dev/peps/pep-0257/: double quotes, single subject line, then an empty line, then indentation of the following line that lines up with the quotation mark (not the first line's prose). http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@64 PS3, Line 64: '''Creates a map from change id to (commit_msg, hash, author, date, body).''' nit: can this tuple have the same order as the destructuring on line 74? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@68 PS3, Line 68: %s git help log calls this the "subject", rather than the message. http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@71 PS3, Line 71: sh.git Possibl digression: does sh.git have documentation other than the generic sh documentation and https://amoffat.github.io/sh/sections/contrib.html#git? I was hoping to understand if there was a way to separate tabbed values that was the common idiom. http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@75 PS3, Line 75: ValueError Why swallow this exception? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@79 PS3, Line 79: 0 Log an error if there are two or more matches? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@82 PS3, Line 82: logging.debug('Warning: commit {0} ({1}...) has no Change-Id.'.format(commit_hash, msg[:40])) nit: long line http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@86 PS3, Line 86: def create_parser(): It would help readers of this file (I think) if this were right after the comment that opens the file. I think a few of our other scripts even paste the --help into the comment at the top. http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@90 PS3, Line 90: current Also has to be the target branch, yes? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@95 PS3, Line 95: branch names are used as is By that, do you mean that the default is to compare asf-gerrit/master and asf-gerrit/2.x? http://gerrit.cloudera.org:8080/#/c/9045/3/bin/compare_branches.py@97 PS3, Line 97: help='Name of the target git remote; defaults to source remote') So the special empty-string
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8523 to look at the new patch set (#6). Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. IMPALA-5931: Generates scan ranges in planner for s3/adls Currently, for filesystems that do not include physical block information (e.g., block replica locations, caching), synthetic blocks are generated and stored in the catalog when metadata is loaded. Example file systems for which this is done includes S3, ADLS, and local fs. This change avoids generating these blocks when metadata is loaded. Instead, scan ranges are directly generated from such files by the backend coordinator. Previously, all scan ranges were produced by the planner in HDFSScanNode in the frontend. Now, those files without block information are sent to the coordinator represented by a split specification that determines how the coordinator will create scan ranges to send to executors. This change reduces the space needed in the catalog and reduces the scan range data structures that are passed from the frontend to the backend when planning and coordinating a query. In addition a bug is avoided where non-splittable files were being split anyways to support the query parameter that places a limit on scan ranges. Testing: - local fs tests cover the code paths in this change - all core tests pass when configured with s3 - manually tried larger local filesystem tables (tpch) with multiple partitions and observed the same scan ranges. - TODO: adls testing Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 --- M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 8 files changed, 332 insertions(+), 183 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/6 -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 6 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8523 to look at the new patch set (#5). Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. IMPALA-5931: Generates scan ranges in planner for s3/adls Currently, for filesystems that do not include physical block information (e.g., block replica locations, caching), synthetic blocks are generated and stored in the catalog when metadata is loaded. Example file systems for which this is done includes S3, ADLS, and local fs. This change avoids generating these blocks when metadata is loaded. Instead, scan ranges are directly generated from such files by the backend coordinator. Previously, all scan ranges were produced by the planner in HDFSScanNode in the frontend. Now, those files without block information are sent to the coordinator represented by a split specification that determines how the coordinator will create scan ranges to send to executors. This change reduces the space needed in the catalog and reduces the scan range data structures that are passed from the frontend to the backend when planning and coordinating a query. In addition a bug is avoided where non-splittable files were being split anyways to support the query parameter that places a limit on scan ranges. Testing: - local fs tests cover the code paths in this change - all core tests pass when configured with s3 - manually tried larger local filesystem tables (tpch) with multiple partitions and observed the same scan ranges. - TODO: adls testing Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 --- M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/client-request-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 9 files changed, 333 insertions(+), 183 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/5 -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 5 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 ) Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24 PS4, Line 24: datastructures > nit: data structures Done http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260 PS4, Line 260: // Map of plan node to scan ranges that are computed from request_. If a plan node is : // not present, then it has no associated additional scan ranges. : // Populated by the ComputeScanRanges. : std::mapcomputed_ranges_; > How is this related to PerNodeScanRanges (L41)? backing out this refactor. http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258 PS4, Line 258: // Combine actual and computed ranges from the schedule. : const vector* scan_ranges = : const vector* computed_ranges = : schedule->GetComputedRanges(entry.first); : vector combined_ranges; : if (computed_ranges != nullptr) { : for (const auto& range : entry.second) { : if (!range.scan_range.__isset.hdfs_file_split_spec) { : combined_ranges.push_back(range); : } : } : for (const auto& range : (*schedule->GetComputedRanges(entry.first))) { : combined_ranges.push_back(range); : } : scan_ranges = _ranges; : } > I think my previous comment made things worse, sorry about that. I think it no problems. added back here. expansion is done per spec in a separate helper function below. http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199 PS4, Line 199: 3: required i64 partition_id > Comment what this is referencing. Done http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209 PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec > Add a comment either here or at the top to describe when THdfsFileSplit vs Lets go with THdfsFileSplitGeneratorSpec http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126 PS4, Line 126: fileFormat' > nit: missing quote Done http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712 PS4, Line 712: fileDesc.getNumFileBlocks() > What will happen if this is a zero-byte file (no blocks). Can't we prune th changed it to be more explicit (based on the fs). http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743 PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName()) > Do we need to do this for every file? Can't we do this once per partition? good point, lifted that to be per partition. http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761 PS4, Line 761: disk ids are expected in blocks, :* checkMissingDiskIds should be set to true. If disk ids are checked, > "checkMissingDiskIds is true, " Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 18 Jan 2018 05:47:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6368: make test chars parallel
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9022 ) Change subject: IMPALA-6368: make test_chars parallel .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1740/ -- To view, visit http://gerrit.cloudera.org:8080/9022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744 Gerrit-Change-Number: 9022 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 05:23:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8914 ) Change subject: IMPALA-6193: Track memory of incoming data streams .. Patch Set 6: (8 comments) Thanks for the review, please see PS6. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc@58 PS5, Line 58: DCHECK(mem_tracker_ != nullptr); > nit: extra indentation. Done http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@172 PS5, Line 172: const EndDataStreamRequestPB* request, EndDataStreamResponsePB* response, > I generally think it's better to Consume() before Release()ing to avoid und Done, thanks for the suggestion. Let me know if you'd like me to factor the code into an own method, since it's repeated in AddEarlySender() above. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@212 PS5, Line 212: // time without considering the remaining number of senders. As a consequence, > Has the memory been released at this point? It hasn't but was about to in RespondSuccess(). I made it explicit by calling DiscardTransfer(). http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@268 PS5, Line 268: } > Has the rpc_context memory been released at this point? RespondSuccess() would release it, I made it more precise by calling DiscardTransfer(). However, EndDataStream RPCs don't have sidecars, so only the serialized PB messages will be discarded, making up less than 100 bytes. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@387 PS5, Line 387: // to senders which sent EOS RPC so all query fragments will eventually be cancelled. > We haven't actually released the memory at this point, have we? Should we b Done http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@391 PS5, Line 391: for (const unique_ptr& ctx : senders_queue.waiting_sender_ctxs) { > Same question here. Done http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@410 PS5, Line 410: } > Is the memory released? No, at this point it is still held. I moved the release to the right place. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@458 PS5, Line 458: { > Same here At this point the rpc_context was even destroyed - this was a bug. I still haven't found a way to trigger deferred RPCs in a test which is why I hadn't caught this. -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 6 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 03:45:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8914 to look at the new patch set (#6). Change subject: IMPALA-6193: Track memory of incoming data streams .. IMPALA-6193: Track memory of incoming data streams This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called "Data Stream Queued RPC Calls" until a receiver registers and takes over the early sender RPCs. Inside the receiver, memory for deferred RPCs is tracked against the fragment instance's memtracker until we unpack the batches and add them to the row batch queue. The DCHECK in MemTracker::Close() covers that all memory consumed by a tracker gets release eventually. In addition to that, this change adds a custom cluster test that makes sure that queued memory gets tracked by inspecting the peak consumption of the new memtrackers. Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.h M common/protobuf/data_stream_service.proto A tests/custom_cluster/test_krpc_mem_usage.py A tests/verifiers/mem_usage_verifier.py 13 files changed, 270 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8914/6 -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 6 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8985 ) Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733 Gerrit-Change-Number: 8985 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 18 Jan 2018 03:28:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9023 ) Change subject: IMPALA-6382: Cap spillable buffer size and max row size query options .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1738/ -- To view, visit http://gerrit.cloudera.org:8080/9023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Gerrit-Change-Number: 9023 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 02:13:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5519: Allocate fragment's runtime filter memory from Buffer pool
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8971 to look at the new patch set (#4). Change subject: IMPALA-5519: Allocate fragment's runtime filter memory from Buffer pool .. IMPALA-5519: Allocate fragment's runtime filter memory from Buffer pool This patch adds changes to the planner to account for memory used by bloom filters at the fragment instance level. Also adds changes to allocate memory for those bloom filters from the buffer pool. Testing: - Modified Planner Tests and end to end tests to account for memory reservation for the runtime filters. - Modified backend tests and benchmarks to use the bufferpool for bloom filter allocation. - Add an end to end test. - Ran rest of the core tests. Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f --- M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/fe-support.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/util/backend-gflag-util.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M common/thrift/BackendGflags.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test 40 files changed, 883 insertions(+), 550 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/4 -- To view, visit http://gerrit.cloudera.org:8080/8971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f Gerrit-Change-Number: 8971 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2397: Use atomics for IntGauge and IntCounter
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9012 to look at the new patch set (#3). Change subject: IMPALA-2397: Use atomics for IntGauge and IntCounter .. IMPALA-2397: Use atomics for IntGauge and IntCounter This change removes the spinlock in IntGauge and IntCounter and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple threads can be contending for the spinlocks of some global metrics under concurrent queries. This change also breaks up SimpleMetric is renamed to ScalarMetric and broken into two subclasses: - LockedMetric: - a value store for any primitive type (int,float,string etc). - atomic read and write via GetValue() and SetValue() respectively. - AtomicMetric: - the basis of IntGauge and IntCounter. Support atomic increment of the metric value via Increment() interface. - atomic read and write via GetValue() and SetValue() respectively. - only support int64_t type. Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 --- M be/src/exec/external-data-source-executor.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/thrift-server.cc M be/src/runtime/client-cache.cc M be/src/runtime/data-stream-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/scan-range.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/mem-tracker-test.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/session-expiry-test.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/util/common-metrics.cc M be/src/util/default-path-handlers.cc M be/src/util/impalad-metrics.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M be/src/util/metrics-test.cc M be/src/util/metrics.h M be/src/util/thread.cc M common/thrift/metrics.json 30 files changed, 353 insertions(+), 332 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/9012/3 -- To view, visit http://gerrit.cloudera.org:8080/9012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Gerrit-Change-Number: 9012 Gerrit-PatchSet: 3 Gerrit-Owner: Michael HoGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6368: make test chars parallel
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9022 ) Change subject: IMPALA-6368: make test_chars parallel .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1740/ -- To view, visit http://gerrit.cloudera.org:8080/9022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744 Gerrit-Change-Number: 9022 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 01:41:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6219: Use AES-GCM for spill-to-disk encryption
Xianda Ke has posted comments on this change. ( http://gerrit.cloudera.org:8080/9032 ) Change subject: IMPALA-6219: Use AES-GCM for spill-to-disk encryption .. Patch Set 4: Hi folks, Would you please help review when you are available? @Tim, Would you please help run the test on all the platforms(old centos...) -- To view, visit http://gerrit.cloudera.org:8080/9032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24 Gerrit-Change-Number: 9032 Gerrit-PatchSet: 4 Gerrit-Owner: Xianda KeGerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Thu, 18 Jan 2018 00:43:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6219: Use AES-GCM for spill-to-disk encryption
Xianda Ke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9032 Change subject: IMPALA-6219: Use AES-GCM for spill-to-disk encryption .. IMPALA-6219: Use AES-GCM for spill-to-disk encryption AES-GCM can be very fast(~10 times faster than CFB+SHA256), but it requires an instruction that Impala can currently run without (CLMUL). In order to be fast, we dispatch to GCM mode at run-time based on the CPU and OpenSSL version. Testing: run runtime tmp-file-mgr-test, openssl-util-test, buffer-pool-test and buffered-tuple-stream-test. add two cases GcmIntegrity & EncryptoArbitraryLength for openssl-util-test Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24 --- M be/src/runtime/tmp-file-mgr.cc M be/src/util/cpu-info.cc M be/src/util/cpu-info.h M be/src/util/openssl-util-test.cc M be/src/util/openssl-util.cc M be/src/util/openssl-util.h 6 files changed, 176 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/9032/4 -- To view, visit http://gerrit.cloudera.org:8080/9032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1ea87b82a8897ee8bfa187715ac1c52883790d24 Gerrit-Change-Number: 9032 Gerrit-PatchSet: 4 Gerrit-Owner: Xianda Ke
[Impala-ASF-CR] IMPALA-3942: Fix wrongly escaped string literal in front-end
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8818 ) Change subject: IMPALA-3942: Fix wrongly escaped string literal in front-end .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@44 PS6, Line 44: this(value, ScalarType.STRING, true); > Done. By the way, would you tell me the reason why you don't want to keep t That's a fair question, thanks for asking! * In most cases there will be no redundant computation. The common case is that this literal is passed to the BE via toThrift() so getNormalizedValue() is only called once. * I'm less concerned about redundant computations and more concerned about doubling the memory consumption. This point will probably only matter in extreme cases. * Easier to reason about and maintain fewer class members. The normalized string is "redundant" information because it can be derived from 'value_'. http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@87 PS7, Line 87: return BaseSemanticAnalyzer.unescapeSQLString("'" + getNormalizedValue() whitespace http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@91 PS7, Line 91: // Assumes a string literal from single quotes and escapes it because: What does this function do and return? Please consider these snippets which might improve the comment: Returns a normalized representation of the string value suitable for embedding in SQL as a single-quoted string literal. String literals can come directly from the SQL of a query or from rewrites like constant folding, so we generally do not know whether single or double quotes are appropriate for a given value. The SQL standard prescribes single-quoted string literals so we normalize that way. http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@100 PS7, Line 100: private String getNormalizedValue() { nit: we typically use /* */ style comment for class and function comments http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@101 PS7, Line 101: final int lengthOfValue = value_.length(); len? http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@468 PS7, Line 468: public void escapeStringLiteralTest() { normalizeStringLiteralsTest http://gerrit.cloudera.org:8080/#/c/8818/7/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@471 PS7, Line 471: testToSql("select \"'\"", "SELECT '\\''"); add a few tests with single quotes that show they are not transformed http://gerrit.cloudera.org:8080/#/c/8818/6/tests/query_test/test_queries.py File tests/query_test/test_queries.py: http://gerrit.cloudera.org:8080/#/c/8818/6/tests/query_test/test_queries.py@310 PS6, Line 310: > I agree with your suggestion. ToSqlTest mainly checks the problem. By the w I'm ok with leaving some E2E tests. Imo they belong in views-ddl.test since the user-visible bug is really around creating/using views, and the toSql() issue is already covered by ToSqlTest unit tests. -- To view, visit http://gerrit.cloudera.org:8080/8818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec Gerrit-Change-Number: 8818 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 00:20:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8971 ) Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8971/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8971/3//COMMIT_MSG@12 PS3, Line 12: Maybe mention that it doesn't address the coordinator memory problem? -- To view, visit http://gerrit.cloudera.org:8080/8971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f Gerrit-Change-Number: 8971 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 00:19:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8971 ) Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.h@112 PS3, Line 112: directoty_ typo: directory http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/8971/3/be/src/util/bloom-filter.cc@50 PS3, Line 50: Close(); Calling Close() is a little subtle. Maybe mention the reason for doing this so it isn't lost. http://gerrit.cloudera.org:8080/#/c/8971/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/8971/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@263 PS3, Line 263: .sum(planTreeProfile.duringOpenProfile.max(fInstancePostOpenProfile)); I missed this on the first pass. I think we need to add the runtime filters reservation to both duringOpenProfile and fInstancePostOpenProfile, then take the max of those. This is because the filters may be constructed during open and live after that. http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test@59 PS3, Line 59:partitions=24/24 files=24 size=174.39KB Changes in this file look spurious. http://gerrit.cloudera.org:8080/#/c/8971/3/testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test File testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test: PS3: Changes to this file (and some of the later ones) look spurious too. http://gerrit.cloudera.org:8080/#/c/8971/2/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test: http://gerrit.cloudera.org:8080/#/c/8971/2/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@391 PS2, Line 391: # mem requirement after accounting for the memory > the old test was already flawed in that it never exercised the code path th Thanks for the detailed explanation. -- To view, visit http://gerrit.cloudera.org:8080/8971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f Gerrit-Change-Number: 8971 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jan 2018 00:18:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8985 ) Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1739/ -- To view, visit http://gerrit.cloudera.org:8080/8985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733 Gerrit-Change-Number: 8985 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 23:45:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/9045 ) Change subject: IMPALA-6410: Tool to cherrypick changes across branches. .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 Gerrit-Change-Number: 9045 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Wed, 17 Jan 2018 23:37:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.
Hello Taras Bobrovytsky, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9045 to look at the new patch set (#3). Change subject: IMPALA-6410: Tool to cherrypick changes across branches. .. IMPALA-6410: Tool to cherrypick changes across branches. This script compares two branches and optionally cherry-picks changes across. It uses the Gerrit Change-Id as the key, and it supports a configuration file and a string to ignore commits. Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 --- A bin/compare_branches.py A bin/ignored_commits.json 2 files changed, 218 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/9045/3 -- To view, visit http://gerrit.cloudera.org:8080/9045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 Gerrit-Change-Number: 9045 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 18: Code-Review+1 (1 comment) Thanks for fixing this gap in test coverage! Will also give Alex a change to look. http://gerrit.cloudera.org:8080/#/c/8775/18/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/8775/18/testdata/bin/create-load-data.sh@443 PS18, Line 443: # IMPALA-4993: Add tests where nested collections are stored in more than row-group We unfortunately have multiple ways to load custom data files for tests. Doing it in this shell script isn't ideal and we may want to switch to some other method at some point. However, this seems like the best approach for now so it's consistent with the other multi-block/row group files -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 18 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 17 Jan 2018 23:01:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#18). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java A testdata/CustomerMultiBlock/README A testdata/CustomerMultiBlock/customer_multiblock.parquet M testdata/bin/create-load-data.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 21 files changed, 734 insertions(+), 250 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/18 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 18 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#17). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java A testdata/CustomerMultiBlock/README A testdata/CustomerMultiBlock/customer_multiblock.parquet M testdata/bin/create-load-data.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 21 files changed, 734 insertions(+), 250 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/17 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 17 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 22:52:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8914 ) Change subject: IMPALA-6193: Track memory of incoming data streams .. Patch Set 5: (8 comments) Looking pretty good, remaining comments are mainly about making the accounting as precise as possible. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/rpc/impala-service-pool.cc@58 PS5, Line 58: DCHECK(mem_tracker_ != nullptr); nit: extra indentation. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@172 PS5, Line 172: incoming_request_tracker_->Release(rpc_context->GetTransferSize()); I generally think it's better to Consume() before Release()ing to avoid undercounting. We could actually avoid the problem I think if we make use of the fact that the two trackers share a parent and use ConsumeLocal() and ReleaseLocal() so that the offsetting increments and decrements don't modify consumption further up in the tree. E.g. DCHECK_EQ(incoming_request_tracker_->parent(), mem_tracker_->parent()); incoming_request_tracker_->ReleaseLocal(rpc_context->GetTransferSize(), mem_tracker_->parent()); mem_tracker_->ConsumeLocal(rpc_context->GetTransferSize(), mem_tracker_->parent()); As a side-effect we'd reduce contention on the MemTrackers further up the tree. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@212 PS5, Line 212: incoming_request_tracker_->Release(rpc_context->GetTransferSize()); Has the memory been released at this point? http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@268 PS5, Line 268: incoming_request_tracker_->Release(rpc_context->GetTransferSize()); Has the rpc_context memory been released at this point? http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@387 PS5, Line 387: mem_tracker_->Release(ctx->rpc_context->GetTransferSize()); We haven't actually released the memory at this point, have we? Should we be using the DiscardTransfer() method? http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-mgr.cc@391 PS5, Line 391: mem_tracker_->Release(ctx->rpc_context->GetTransferSize()); Same question here. http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@410 PS5, Line 410: recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize()); Is the memory released? http://gerrit.cloudera.org:8080/#/c/8914/5/be/src/runtime/krpc-data-stream-recvr.cc@458 PS5, Line 458: recvr_->mem_tracker()->Release(payload->rpc_context->GetTransferSize()); Same here -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 5 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 22:42:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. To test this issue in AuthorizationTest.java, 'functional_avro' is added as a test database with only column level permissions. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Reviewed-on: http://gerrit.cloudera.org:8080/8973 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/resources/authz-policy.ini.template 4 files changed, 29 insertions(+), 6 deletions(-) Approvals: Alex Behm: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 17 Jan 2018 22:40:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6399: Increase timeout in test observability to reduce flakiness
Lars Volker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9034 ) Change subject: IMPALA-6399: Increase timeout in test_observability to reduce flakiness .. IMPALA-6399: Increase timeout in test_observability to reduce flakiness Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af Reviewed-on: http://gerrit.cloudera.org:8080/9034 Reviewed-by: Sailesh MukilTested-by: Lars Volker --- M tests/query_test/test_observability.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Sailesh Mukil: Looks good to me, approved Lars Volker: Verified -- To view, visit http://gerrit.cloudera.org:8080/9034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af Gerrit-Change-Number: 9034 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6399: Increase timeout in test observability to reduce flakiness
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9034 ) Change subject: IMPALA-6399: Increase timeout in test_observability to reduce flakiness .. Patch Set 1: Verified+1 Tests passed here: https://jenkins.impala.io/job/gerrit-verify-dryrun/1729/ -- To view, visit http://gerrit.cloudera.org:8080/9034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58f7e7b367e73675be42e85f55fd7698d51f92af Gerrit-Change-Number: 9034 Gerrit-PatchSet: 1 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 17 Jan 2018 22:31:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9023 ) Change subject: IMPALA-6382: Cap spillable buffer size and max row size query options .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1738/ -- To view, visit http://gerrit.cloudera.org:8080/9023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Gerrit-Change-Number: 9023 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 22:31:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9012 ) Change subject: IMPALA-2397: Use atomics for metrics .. Patch Set 2: Code-Review+1 I don't have further comments, thanks for the inline replies. -- To view, visit http://gerrit.cloudera.org:8080/9012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Gerrit-Change-Number: 9012 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 22:10:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9023 ) Change subject: IMPALA-6382: Cap spillable buffer size and max row size query options .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Gerrit-Change-Number: 9023 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 21:47:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6382: Cap spillable buffer size and max row size query options
Hello Thomas Tauber-Marshall, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9023 to look at the new patch set (#2). Change subject: IMPALA-6382: Cap spillable buffer size and max row size query options .. IMPALA-6382: Cap spillable buffer size and max row size query options Currently the default and min spillable buffer size and max row size query options accept any valid int64 value. Since the planner depends on these values for memory estimations, if a very large value close to the limits of int64 is set, the variables representing or relying on these estimates can overflow during different phases of query execution. This patch puts a reasonable upper limit of 1TB to these query options to prevent such a situation. Testing: Added backend query option tests. Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h 3 files changed, 32 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/9023/2 -- To view, visit http://gerrit.cloudera.org:8080/9023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Gerrit-Change-Number: 9023 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6382: Cap default and min spillable buffer size query options
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/9023 ) Change subject: IMPALA-6382: Cap default and min spillable buffer size query options .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/exec/exec-node.h File be/src/exec/exec-node.h: http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/exec/exec-node.h@227 PS1, Line 227: static const int64_t MAX_SPILLABLE_BUFFER_SIZE = 1LL << 40; // 1 TB > I think we should just have the constant in query-options.cc instead of add Done http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/9023/1/be/src/service/query-options.cc@573 PS1, Line 573: case TImpalaQueryOptions::MAX_ROW_SIZE: { > I wonder if there's a similar bug with MAX_ROW_SIZE yes it has the same bug, added a fix for that too -- To view, visit http://gerrit.cloudera.org:8080/9023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1 Gerrit-Change-Number: 9023 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 21:36:09 + Gerrit-HasComments: Yes
[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4
Tianyi Wang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9035 ) Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4 .. Bump thrift to 0.9.3-p2; Add bison 3.0.4 This patch upgrades thrift to 0.9.3-p2. Two patches are applied: 1. THRIFT-3650. 2. Remove Twisted Python dependency. Thrift 0.9.3 depends on bison 2.5 or higher, which is not available on some distributions including centos 5. This patch adds bison 3.0.4 to the toolchain. The minimum openssl version is bumped to 1.0.1. Bundled openssl will be used to build thrift if the requirement is not met. Testing: It builds on all supported OSes. Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd --- M buildall.sh A source/bison/build.sh M source/thrift/build.sh A source/thrift/thrift-0.9.3-patches/0001-THRIFT-3650.patch A source/thrift/thrift-0.9.3-patches/0002-reomve-twisted-pyton-dependency.patch 5 files changed, 163 insertions(+), 34 deletions(-) Approvals: Philip Zeyliger: Looks good to me, approved Tim Armstrong: Looks good to me, approved Tianyi Wang: Verified -- To view, visit http://gerrit.cloudera.org:8080/9035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd Gerrit-Change-Number: 9035 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9035 ) Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd Gerrit-Change-Number: 9035 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 21:22:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8914 ) Change subject: IMPALA-6193: Track memory of incoming data streams .. Patch Set 4: (9 comments) Thanks for the reviews and discussion. Please see PS5 and my inline comments. http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.h File be/src/rpc/impala-service-pool.h: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.h@64 PS3, Line 64: > Maybe make it "MemTracker* const" since it's not modified after constructio Done http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc@141 PS3, Line 141: mem_tracker_->Consume(c->GetTransferSize()); > Is it reasonable to call TryConsume() here and handle the failure, e.g. in Following the discussion in krpc-data-stream-mgr.cc I left this unchanged for now. http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/impala-service-pool.cc@180 PS3, Line 180: mem_tracker_->Release(incoming->GetTransferSize()); > Leaving the memory temporarily untracked here isn't great. Is there a way w I updated the change to pass the service memtracker to the data stream manager. This allows us to Release the memory there before we either track it again in the data stream manager or call AddBatch on a receiver. However, we face a similar pattern there, that we release memory before making a function call. It does not cross the KRPC service boundary so it may be less prone to errors, but still seems not ideal. Do you have a pattern to recommend for these cases? Should we always try to pass the sender's tracker along with the payload so that the receiver of a call can release memory before consuming it from its own tracker? Should we introduce a small helper class that represents a memory allocation, keeps a pointer to its current_tracker_ and has a Move(MemTracker* new_tracker) method to release memory from current_tracker_, consume it from new_tracker, and set current_tracker_ = new_tracker. Such a class could also DCHECK in its d'tor if its memory has not been released. http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h@165 PS3, Line 165: /// Passed to new services. : MemTracker* mem_tracker_; > Is this still needed ? Removed. http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@328 PS3, Line 328: MemTracker* stream_mgr_tracker = obj_pool_->Add( > Is there a more descriptive name we could use for this one. E.g. "Data Stre Done, thank you for coming up with a suggestion. http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@343 PS3, Line 343: // Bump thread cache to 1GB to reduce contention for TCMalloc central > nit: extra blank line Done http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc@175 PS3, Line 175: ctx->serialized_size() > Wouldn't the majority of the changes be not needed if we add a new interfac Done http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc@324 PS3, Line 324: Consume > Same question about TryConsume Ack http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-recvr.cc@373 PS3, Line 373: Consume > Here too Ack -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 4 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 20:41:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8914 to look at the new patch set (#5). Change subject: IMPALA-6193: Track memory of incoming data streams .. IMPALA-6193: Track memory of incoming data streams This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called "Data Stream Queued RPC Calls" until a receiver registers and takes over the early sender RPCs. Inside the receiver, memory for deferred RPCs is tracked against the fragment instance's memtracker until we unpack the batches and add them to the row batch queue. The DCHECK in MemTracker::Close() covers that all memory consumed by a tracker gets release eventually. In addition to that, this change adds a custom cluster test that makes sure that queued memory gets tracked by inspecting the peak consumption of the new memtrackers. Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.h M common/protobuf/data_stream_service.proto 11 files changed, 91 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8914/5 -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 5 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Bumping version to 3.0.
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/9044 ) Change subject: Bumping version to 3.0. .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6 Gerrit-Change-Number: 9044 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Wed, 17 Jan 2018 20:35:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/9045 ) Change subject: IMPALA-6410: Tool to cherrypick changes across branches. .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py File bin/compare_branches.py: http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@21 PS2, Line 21: # [ { "source": "master", "target": "2.x", "commits": [ "", "" ] } ] long line http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@30 PS2, Line 30: from pprint import pformat We should have "import..." grouped together and sorted, followed by "from ... import ..." statements also sorted http://gerrit.cloudera.org:8080/#/c/9045/2/bin/compare_branches.py@179 PS2, Line 179: print "Cherrypicking %d changes." % (len(cherry_pick_hashes),) The main() function seems long. Do you think it would make sense to separate the cherry picking into its own function? -- To view, visit http://gerrit.cloudera.org:8080/9045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 Gerrit-Change-Number: 9045 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Taras Bobrovytsky Gerrit-Comment-Date: Wed, 17 Jan 2018 20:35:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9012 ) Change subject: IMPALA-2397: Use atomics for metrics .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h File be/src/util/metrics.h: http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@147 PS1, Line 147: virtual void set_value(const T& value) = 0; > Currently, there are only two derived classes for SimpleMetric and both of What's the point of making it a virtual function if it doesn't need to be? It seems better to just have a regular function on the derived classes - it's more efficient and easier to understand which function is being called. I checked out this PS and made it non-virtual and it seems to compile and run fine. http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json@886 PS1, Line 886: "kind": "PROPERTY", > Unfortunately, if we set this to a GAUGE, we need to back it with a gauge i I think SimpleProperty implements all the functionality needed for this one case, it would just need to declare itself as a TMetricKind::GAUGE. I.e we could implement a skeleton DoubleGauge that only supports SetValue() and GetValue(). This isn't the end of the world, but it would be good if we could avoid the internal implementation details leaking out into our metadata about metrics. -- To view, visit http://gerrit.cloudera.org:8080/9012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Gerrit-Change-Number: 9012 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 20:14:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1737/ -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 16 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 19:52:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 16: Code-Review+2 Failed due to flaky test IMPALA-6399. Rebase and retry. -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 16 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 19:52:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6075: Add Impala daemon metric for catalog version.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8949 ) Change subject: IMPALA-6075: Add Impala daemon metric for catalog version. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.h@978 PS2, Line 978: CatalogUpdateMetrics UpdateCatalogVersionMetrics() http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/service/impala-server.cc@1325 PS2, Line 1325: ImpaladMetrics::CATALOG_CURR_VER->set_value(catalog_version); Maybe my comment was not clear. What I meant is that you should also expose the catalog_topic_version and the catalog_service_id as a metric. http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/util/impalad-metrics.h File be/src/util/impalad-metrics.h: http://gerrit.cloudera.org:8080/#/c/8949/2/be/src/util/impalad-metrics.h@113 PS2, Line 113: CATALOG_CURR_VER CATALOG_VERSION, CATALOG_TOPIC_VERSION, CATALOG_SERVICE_ID http://gerrit.cloudera.org:8080/#/c/8949/2/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/8949/2/common/thrift/metrics.json@233 PS2, Line 233: The catalog version which is currently with impalad. Maybe "Version of the Impalad catalog." -- To view, visit http://gerrit.cloudera.org:8080/8949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2307eb434561ed74ff058106541c0ebda017d97 Gerrit-Change-Number: 8949 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dimitris TsirogiannisGerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 19:47:55 + Gerrit-HasComments: Yes
[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9035 ) Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd Gerrit-Change-Number: 9035 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 19:40:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 ) Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24 PS4, Line 24: datastructures nit: data structures http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260 PS4, Line 260: // Map of plan node to scan ranges that are computed from request_. If a plan node is : // not present, then it has no associated additional scan ranges. : // Populated by the ComputeScanRanges. : std::mapcomputed_ranges_; How is this related to PerNodeScanRanges (L41)? http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258 PS4, Line 258: // Combine actual and computed ranges from the schedule. : const vector* scan_ranges = : const vector* computed_ranges = : schedule->GetComputedRanges(entry.first); : vector combined_ranges; : if (computed_ranges != nullptr) { : for (const auto& range : entry.second) { : if (!range.scan_range.__isset.hdfs_file_split_spec) { : combined_ranges.push_back(range); : } : } : for (const auto& range : (*schedule->GetComputedRanges(entry.first))) { : combined_ranges.push_back(range); : } : scan_ranges = _ranges; : } I think my previous comment made things worse, sorry about that. I think it's fine to move the scan range generation here. In that way, we will avoid the double iteration over all plan nodes and the additional state (per node computed scan ranges). Just make sure you wrap this into a separate function. http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199 PS4, Line 199: 3: required i64 partition_id Comment what this is referencing. http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209 PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec Add a comment either here or at the top to describe when THdfsFileSplit vs THdfsFileSplitSpec is set. Actually, THdfsFileSplitSpec may not be the best name. How about THdfsFileSplitGenerationSpec or something like that? http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126 PS4, Line 126: fileFormat' nit: missing quote http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712 PS4, Line 712: fileDesc.getNumFileBlocks() What will happen if this is a zero-byte file (no blocks). Can't we prune this early on or it never reaches that point? http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743 PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName()) Do we need to do this for every file? Can't we do this once per partition? http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761 PS4, Line 761: disk ids are expected in blocks, :* checkMissingDiskIds should be set to true. If disk ids are checked, "checkMissingDiskIds is true, " -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 17 Jan 2018 19:30:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2642: Fix a potential deadlock in statestore
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/9038 ) Change subject: IMPALA-2642: Fix a potential deadlock in statestore .. Patch Set 2: (3 comments) Working on the test case. http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9038/2//COMMIT_MSG@17 PS2, Line 17: Testing: The problem is easily reproduced by lowering the value of : STATESTORE_MAX_SUBSCRIBERS to 3, and then launching a mini cluster : with 3 impalads. With the fix, we can see the following entry in the : statestore log: > Is it possible to add a test for this? Doing it. The approach I am taking requires changing STATESTORE_MAX_SUBSCRIBERS to FLAGS_statestore_max_subscribers, so that we can start statestored with a low max value. I think this is good to do anyway, as that allows us to test other boundary behavior. http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353 PS2, Line 353: // Assumes that the caller has taken subscribers_lock_ > I would add this comment in the header comment in the .h file. Done http://gerrit.cloudera.org:8080/#/c/9038/2/be/src/statestore/statestore.cc@353 PS2, Line 353: // Assumes that the caller has taken subscribers_lock_ > Its kind of weird IMO that OfferUpdate() expects the caller to take subscri Thought of that, ie, moving the call to OfferUpdate from DoSubscriberUpdate to another scope that does not have subscribers_lock_. Then I saw that UnregisterSubscriber() also expects the caller to take subscribers_lock_. I thought it made sense to follow the same pattern here. I think there is scope for cleaning up the way locking in the statestore code. For instance, RegisterSubscriber() takes subscribers_lock_, instead of its caller. I am just not sure if it makes sense to do that as part of this jira. Another thing: BlockingQueue::BlockingPut() does not release the mutex it acquires in the shut down path. I know we're shutting down anyway, but if ever we implement clean shut down that does some work, a deadlock might happen here. -- To view, visit http://gerrit.cloudera.org:8080/9038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d49dede221ce1f50ec299643b5532c61f93f0c6 Gerrit-Change-Number: 9038 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram ThangaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 17 Jan 2018 19:22:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9012 ) Change subject: IMPALA-2397: Use atomics for metrics .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h File be/src/util/metrics.h: http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@128 PS1, Line 128: /// - 'property' which is a value store which can be read and written only > Maybe mention that the metric kinds can serve as a hint for how management Done http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@144 PS1, Line 144: /// Returns the current value. Thread-safe. > I feel like we should rename to GetValue()/SetValue() since they're no long Renamed to GetValue() / SetValue(). Keep them as pure virtual as their implementations can be different in derived classes but they share the implementation for the ToJson() interface. http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@147 PS1, Line 147: /// Sets the current value. Thread-safe. > Does set_value() need to be a virtual function? I can't think of a case whe Currently, there are only two derived classes for SimpleMetric and both of them implement this interface albeit differently. So, it seems to make sense to have it as virtual for now until we have more classes which inherit SimpleMetric but somehow doesn't implement SetValue(). http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@215 PS1, Line 215: operty; > nit: maybe rename to metric_kind_t to make it more clear that it's a type? Done http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@225 PS1, Line 225: } > The combination of value_/value()/CalculateValue() seems a bit too complica Good idea. Renamed value() to GetValue(). http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@232 PS1, Line 232: /// need to take care of synchronization among sources. > Could this be a compile time check / assertion? I believe the compiler will most likely fold this as a constant during compilation so it's a no-op for GAUGE. http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@252 PS1, Line 252: typedef class AtomicMetric IntCounter; > I think it might be slightly easier to follow if we move these below the cl Done http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@265 PS1, Line 265: int64_t sum = 0; > The base class comment says that this happens atomically, but this implemen The new code and the old code both don't hold locks of all gauges to prevent them from being written before reading them. So, although each read of the gauges is atomic by itself, there is no guarantee on atomicity between all the gauges. The base class comments only means that it's up to the derived class to implement its own synchronization and the model above apparently seems good enough for SumGauge. Note that value() and set_value() are racy by nature as the value can change right after the value is read. The atomicity guarantee has more to do with read-modify-write between multiple writers and atomic read of the value (i.e. no half written 64-bit value). http://gerrit.cloudera.org:8080/#/c/9012/1/be/src/util/metrics.h@284 PS1, Line 284: > nit: this might fit into a single line Done http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/9012/1/common/thrift/metrics.json@886 PS1, Line 886: "kind": "PROPERTY", > It seems like this should still be exposed as a gauge since it's a numeric Unfortunately, if we set this to a GAUGE, we need to back it with a gauge in the code too. In which case, we may need to implement DoubleGauge. -- To view, visit http://gerrit.cloudera.org:8080/9012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Gerrit-Change-Number: 9012 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 19:18:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1736/ -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 19:12:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 3: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 19:10:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py@492 PS2, Line 492: impala_output = Statements() > Maybe s/impala_output/impala_create/g Done http://gerrit.cloudera.org:8080/#/c/9009/2/testdata/bin/generate-schema-statements.py@702 PS2, Line 702: hive_output.write_to_file('load-' + output_name + '-hive-generated.sql') > Ideally these names should match as well (hive_load, hbase_load), but it is I'm going to skip this. hive_output incorporates a mix of create, load, and other things. This will get a bit cleaner when we do parallel Hive. -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 19:10:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Hello Philip Zeyliger, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9009 to look at the new patch set (#3). Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. IMPALA-6386: Invalidate metadata at table level for dataload Dataload currently executes bin/load-data.py for TPC-H, TPC-DS, and functional-query concurrently. One of the final steps for bin/load-data.py is to run a global "invalidate metadata". Global "invalidate metadata" commands are known to cause problem on concurrent systems. See IMPALA-5087. For dataload, if TPC-H executes "invalidate metadata" while TPC-DS is still creating tables and adding partitions, the TPC-DS executor might erroneously believe that a table does not exist. This changes dataload to invalidate metadata at an individual table level rather than globally. This prevents the concurrency issue. This also changes the names of some of the intermediate SQL files generated by generate-schema-statements.py and consumed by load-data.py to make them less confusing. Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f --- M bin/load-data.py M testdata/bin/generate-schema-statements.py 2 files changed, 18 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/3 -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: Saw this, will take a look. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 19:02:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1735/ -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 17 Jan 2018 19:00:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8971 to look at the new patch set (#3). Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. IMPALA-5519: Allocate Runtime filter memory from Buffer pool This patch adds changes to the planner to account for memory used by bloom filters at the fragment instance level. Also adds changes to allocate memory for the bloom filters from the buffer pool. Testing: - Modified Planner Tests and end to end tests to account for memory reservation for the runtime filters. - Modified backend tests and benchmarks to use the bufferpool for bloom filter allocation. - Add an end to end test. - Ran rest of the core tests. Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f --- M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/fe-support.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/util/backend-gflag-util.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M common/thrift/BackendGflags.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test 40 files changed, 883 insertions(+), 550 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/3 -- To view, visit http://gerrit.cloudera.org:8080/8971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f Gerrit-Change-Number: 8971 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 5: Code-Review+2 Thanks for moving the tests! Much better now :) -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 17 Jan 2018 18:59:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8971 ) Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. Patch Set 2: (21 comments) http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc@410 PS2, Line 410: return res.__isset.status ? Status(res.status) : Status::OK(); > Why not always set TPublishFilterParams.status. This seems to be adding ano Removed http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1179 PS2, Line 1179: // Cancel query if the result of PublishFilter rpc returns a non-ok status. > Nit: slightly redundant wording with result/returns. E.g. "if the PublishFi As discussed, removing cancellation logic http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1182 PS2, Line 1182: "This Error was encountered in Fragment $0 at $1:$2", fragment_idx, > nit: don't need caps on Error/Fragment. As discussed, removing cancellation logic http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1184 PS2, Line 1184: Cancel(); > If we're going to cancel the query there's no point sending out more filter As discussed, removing cancellation logic http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@111 PS2, Line 111: input > output? Done http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@118 PS2, Line 118: *& > Let's just use a double pointer for the output to be consistent with the re Done http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@165 PS2, Line 165: total_filter_mem_required_ > I think total_bloom_filter_mem_required_ would be more consistent with the Done http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc@189 PS1, Line 189: if (buffer_pool_client_.GetUnusedReservation() < required_space) { > I don't see this change in PS2 my mistake, I only added the DCHECK in AllocateScratchBloomFilter, forgot to put it here too. Will add it in the next patch. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a78 PS2, Line 78: > Hmm so we were unnecessarily re-allocating filters if they were already reg yes thats right http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a80 PS2, Line 80: > Was the lock removed accidentally? yes, I will add it back http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@71 PS2, Line 71: void Close(); > Maybe mention that Close() is safe to call multiple times or if Init() wasn Done http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@111 PS2, Line 111: /// buffer pool allocation failed and the filter should be discarded. > Is the change in this function's behaviour needed now? I would prefer to keep this change in case this methods is called before calling Init() or after calling close. I will update the comment to more appropriately reflect the recent change. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@30 PS2, Line 30: : always_false_(true), > Maybe initialize the constant ones at their definition while we're here. I Done http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@54 PS2, Line 54: RETURN_IF_ERROR( : buffer_pool_->AllocateBuffer(buffer_pool_client_, alloc_size, _handle_)); will call Close() before this to make sure Init() is safe to call multiple times. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@63 PS2, Line 63: directory_ > directory_ != nullptr, to be consistent with the rest of the code Done http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@72 PS2, Line 72: if (directory_) { > directory_ != nullptr, to be consistent with the rest of the code Done http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift:
[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8801 ) Change subject: IMPALA-5191: Standardize column alias behavior .. Patch Set 12: Code-Review+2 (1 comment) I'l merge after you address the final comment. Can you file a JIRA and work with John Russell to get the new behavior documented? We should document it as an incompatible change. http://gerrit.cloudera.org:8080/#/c/8801/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/8801/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@1056 PS12, Line 1056: // Constant arithmetic expressions should not be interpreted as ordinals Constant exprs should not be interpreted as ordinals. (this does not only apply to constant arithmetic exprs, but any constant expr, e.g. if(true,1,int_col) should also not be interpreted as an ordinal) -- To view, visit http://gerrit.cloudera.org:8080/8801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8 Gerrit-Change-Number: 8801 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 18:57:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bumping version to 3.0.
Philip Zeyliger has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/9044 ) Change subject: Bumping version to 3.0. .. Bumping version to 3.0. This changes the version that Impala presents as to 3.0. We are simultaneously introducing a 2.x branch that continues to identify itself as 2.x (where x=11 right now). This commit is not for 2.x. Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6 --- M bin/save-version.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/9044/2 -- To view, visit http://gerrit.cloudera.org:8080/9044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6 Gerrit-Change-Number: 9044 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Bumping version to 3.0.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9044 Change subject: Bumping version to 3.0. .. Bumping version to 3.0. This changes the version that Impala presents as to 3.0. We are simultaneously introducing a 2.x branch that continues to identify itself as 2.x (where x=11 right now). This commit is not for 2.x. Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6 --- M bin/save-version.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/9044/1 -- To view, visit http://gerrit.cloudera.org:8080/9044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id39f9648cb9b40b67b1029fa8c4132cd04c1d0c6 Gerrit-Change-Number: 9044 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6410: Tool to cherrypick changes across branches.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9045 Change subject: IMPALA-6410: Tool to cherrypick changes across branches. .. IMPALA-6410: Tool to cherrypick changes across branches. This script compares two branches and optionally cherry-picks changes across. It uses the Gerrit Change-Id as the key, and it supports a configuration file and a string to ignore commits. Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 --- A bin/compare_branches.py A bin/ignored_commits.json 2 files changed, 194 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/9045/1 -- To view, visit http://gerrit.cloudera.org:8080/9045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9 Gerrit-Change-Number: 9045 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 17:40:35 + Gerrit-HasComments: No
[native-toolchain-CR] Bump thrift to 0.9.3-p2; Add bison 3.0.4
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9035 ) Change subject: Bump thrift to 0.9.3-p2; Add bison 3.0.4 .. Patch Set 2: I had a question on the JIRA about patch 0011 that was previous applied to thrift. -- To view, visit http://gerrit.cloudera.org:8080/9035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I710e977e4a182e5d3aaf58f1233c68c04585bebd Gerrit-Change-Number: 9035 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 17:24:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5478: Run TPCDS queries with decimal v2 enabled
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8985 ) Change subject: IMPALA-5478: Run TPCDS queries with decimal_v2 enabled .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py File tests/query_test/test_tpcds_queries.py: http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py@279 PS2, Line 279: v.get_value('table_format').compression_type != 'record') : cls.ImpalaTestMatrix.add_mandatory_exec_option('decimal_v2', 1) : : if cls.exploration_strategy() != 'exhaustive': > That's how we run the normal TPCDS workload. Are you suggesting to run this Nevermind, misread something. Done. http://gerrit.cloudera.org:8080/#/c/8985/2/tests/query_test/test_tpcds_queries.py@285 PS2, Line 285: cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format in ['parquet']) : : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('exec_option')['batch_size'] == 0) : : def test_tpcds_q1(self, vector): : self.run_test_case(self.get_workload() + '-decimal > Simlar response to above. Are you suggesting to run this workload different Done -- To view, visit http://gerrit.cloudera.org:8080/8985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib867c51a521ec4a087bc127d99aee4b95ba97733 Gerrit-Change-Number: 8985 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 17 Jan 2018 16:52:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#9). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 17 files changed, 787 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/9 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-3282: Adds regexp escape built-in function
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8900 ) Change subject: IMPALA-3282: Adds regexp_escape built-in function .. Patch Set 5: (2 comments) Thanks for the iteration! http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc@623 PS5, Line 623: const strings::CharSet to_escape(".\\+*?[^]$(){}=!<>|:-"); This is a constant (perhaps "REGEX_ESCAPE_CHARACTERS") and could be declared static const in the StringFunctions:: namespace. Seems like right now it's creating a CharSet for every value? See FACTORIAL_LOOKUP, which seems similar in spirit. http://gerrit.cloudera.org:8080/#/c/8900/5/be/src/exprs/string-functions-ir.cc@637 PS5, Line 637: DCHECK_GE(result.len, 0); DCHECK_GE(result.len, str.len) should be true too, no? -- To view, visit http://gerrit.cloudera.org:8080/8900 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b Gerrit-Change-Number: 8900 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 17 Jan 2018 16:33:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20 PS8, Line 20: > nit: wrong word? Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: plan_hints:hints > nit: please check that all review comments are marked as "Done" when sendin Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > Thanks for the explanation, I think I understand it now. Could we just have I have no problems with going that way, but I am not sure that it would make the change smaller: InsertStmt would also need a Set/AddPlanHints function, and it would be probably better to rewrite InsertStmt to always use the setter instead of constructor arguments. http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71 PS8, Line 71: > nit: "rule that checks" or "rules that check" Done http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75 PS8, Line 75: > partitio > nit: we generally seem to use camel case for acronyms, e.g. see createCtasT Done http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:EXCHANGE [UNPARTITIONED] > Yes, I think it's intentional. Maybe the docs need to be more clear. Can yo I have created IMPALA-6408 to track the doc issue. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 17 Jan 2018 16:04:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#10). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 332 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/10 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior
Hello Taras Bobrovytsky, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8801 to look at the new patch set (#12). Change subject: IMPALA-5191: Standardize column alias behavior .. IMPALA-5191: Standardize column alias behavior We should not perform alias substitution in the subexpressions of GROUP BY, HAVING, and ORDER BY to be more standard conformant. === Allowed === SELECT int_col / 2 AS x FROM functional.alltypes GROUP BY x; SELECT int_col / 2 AS x FROM functional.alltypes ORDER BY x; SELECT NOT bool_col AS nb FROM functional.alltypes GROUP BY nb HAVING nb; === Not allowed === SELECT int_col / 2 AS x FROM functional.alltypes GROUP BY x / 2; SELECT int_col / 2 AS x FROM functional.alltypes ORDER BY -x; SELECT int_col / 2 AS x FROM functional.alltypes GROUP BY x HAVING x > 3; Some extra checks were added to AnalyzeExprsTest.java. I had to update other tests to make them pass since the new behavior is more restrictive. I added alias.test to the end-to-end tests. Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8 --- M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test A testdata/workloads/functional-query/queries/QueryTest/alias.test M tests/query_test/test_queries.py 8 files changed, 232 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/8801/12 -- To view, visit http://gerrit.cloudera.org:8080/8801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8 Gerrit-Change-Number: 8801 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8801 ) Change subject: IMPALA-5191: Standardize column alias behavior .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@891 PS11, Line 891:* If the rewritten expr is a NumericLiteral, then return the original expr, otherwise > If given expr is rewritten into an integer literal ... Done http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@896 PS11, Line 896: private Expr rewriteIfResultIsNotNumericLiteral(ExprRewriter rewriter, Expr expr) > rewriteCheckOrdinalResult() Done http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@899 PS11, Line 899: if (rewrittenExpr instanceof NumericLiteral) return expr; > we use {} for multi-line if else Done http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/8801/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@1055 PS11, Line 1055: "group by one, substring(cast(two as string), 1, 1), three"); > Add tests for the constant-folding changes, e.g.: Added tests. I re-read our discussion about ordinals in HAVING: https://gerrit.cloudera.org/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@548 I checked PostgreSQL and it does not allow ordinals in HAVING. Anyway, I implemented it as we discussed, just wanted to clarify that. -- To view, visit http://gerrit.cloudera.org:8080/8801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8 Gerrit-Change-Number: 8801 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 15:56:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#9). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 331 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/9 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 8: Hi Alex, Dimitris, Could you review this change, especially the FE part? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 14:16:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#8). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 17 files changed, 787 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/8 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89 PS6, Line 89: CopyRows > Yeah, I think we can just assume that the frontend will only generate plans I added comments about it to this class, and to the corresponding plan node class as well. http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116 PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) { > I don't think we have test coverage for this code yet, since it's only exec I added some tests to nested-types-subplan.test. They use tpch_nested_parquet.customer because functional.allcomplextypes is empty. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 14:13:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java: http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@102 PS3, Line 102: // SELECT permissions on columns ('id') on 'functional_avro.alltypessmall' > missing opening single-quote for 'functional_avro.alltypessmall' Done http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@399 PS3, Line 399: privilege = new TPrivilege("", TPrivilegeLevel.SELECT, > typo: privilege (which has already been declared) Thanks for spotting it! Please ignore patch set 4, and compare 3 to 5. Note that it may look like if this could fit into 1 line, but it would be 91 characters. -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 17 Jan 2018 13:34:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8973 to look at the new patch set (#5). Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. To test this issue in AuthorizationTest.java, 'functional_avro' is added as a test database with only column level permissions. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/resources/authz-policy.ini.template 4 files changed, 29 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/5 -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8973 to look at the new patch set (#4). Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. To test this issue in AuthorizationTest.java, 'functional_avro' is added as a test database with only column level permissions. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/resources/authz-policy.ini.template 4 files changed, 29 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/4 -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal