[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Dan Hecht has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS6, Line 232: subtrees > It's really trying to define rules that produce a total lock order for the I don't understand that, given that locks need to be acquired from bottom to top if their trackers have that relationship, no? I'll look at the new comment to see if it helps. -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 9: Code-Review+2 carry +2, rebased -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/567/ -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to 7533364
Matthew Jacobs has submitted this change and it was merged. Change subject: Bump Kudu version to 7533364 .. Bump Kudu version to 7533364 Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Reviewed-on: http://gerrit.cloudera.org:8080/6797 Tested-by: Impala Public Jenkins Reviewed-by: Matthew Jacobs--- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] Bump Kudu version to 7533364
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to 7533364
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: PS6, Line 230: common_ancestor > this is named based on the state it holds after the loop, whereas the path_ Done Line 242: //trackers under 'other', down to 'this'. > in this case, common_ancestor == other, right? Done Line 245: //under 'this', down to 'other'. > likewise Done Line 260: locks.emplace_back(tracker->lock_); > missing braces Done Line 289: if (common_ancestor == this) { > isn't this case 2? Done PS6, Line 294: Case 2 > case 1? Done http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS6, Line 213: bool > static, or make it a method on subtree1 Done PS6, Line 232: subtrees > is this about locking subtrees, or paths? It's really trying to define rules that produce a total lock order for the whole tree. Expanded the explanation. -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-5238: transfer reservations between trackers .. IMPALA-5238: transfer reservations between trackers This is a primitive needed to implement claiming and distribution of initial reservations. It supports transferring reservation between any two ReservationTrackers under the same query. Also remove the public DecreaseReservation() method, which is now mostly redundant and we have no plans to use. Testing: * Added a test that exercises transfer between trackers at different levels and with different relationships. Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h 3 files changed, 293 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/7 -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5220: memory maintenance cleanup .. IMPALA-5220: memory maintenance cleanup Remove logic that tries to release pages from TcMalloc's page heap: TCMalloc's behaviour changed so that it automatically does this with "aggressive decommit" and committed spans can't accumulate in the page heap. Also increase the memory maintenance interval - 1s is too aggressive and can free memory that will be imminently reused by a running query, e.g. particularly buffer pool buffers. Testing: Tried to reproduce IMPALA-2800 in a couple of ways: * Ran a big aggregation locally and cancelled it * Looked at memz/ of some live clusters (production and stress test). In all cases "Bytes in page heap freelist" was 0. This confirms that IMPALA-2800 was already solved, probably by the gperftools 2.5 upgrade, where aggressive decommit would mean that memory is released to the system in free() instead of the ReleaseFreeMemory() callst. I was able to confirm that the ReleaseFreeMemory() calls were unnecessary to avoid retaining memory by running a couple of stress tests locally with this patch and checking that "Bytes in page heap freelist" was 0 after the change and that memory consumption was generally sensible. Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Reviewed-on: http://gerrit.cloudera.org:8080/6626 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/common/init.cc M be/src/runtime/exec-env.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.h 5 files changed, 25 insertions(+), 89 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Tim Armstrong has uploaded a new patch set (#11). Change subject: IMPALA-5169: Add support for async pins in buffer pool .. IMPALA-5169: Add support for async pins in buffer pool Makes Pin() do async reads behind-the-scenes, instead of blocking until the read completes. The blocking is done instead when the client tries to access the buffer via PageHandle::GetBuffer() or ExtractBuffer(). This is implemented with a new sub-state of "pinned" where the page has a buffer and consumes reservation but the buffer does not contain valid data. Motivation: This unlocks various opportunities to overlap read I/Os with other work: * Reads to different disks can execute in parallel * I/O and computation can be overlapped. This initially benefits BufferedTupleStream::PinStream(), where many pages are pinned at once. With this change the reads run asynchronously. This can potentially lead to large speedups when spilling. E.g. if the pages for a Hash Join's partition are spread across 10 disks, we could get 10x the read throughput, plus overlap the I/O with hash table build. In future we can use this to do read-ahead over unpinned BufferedTupleStreams or for unpinned Runs in Sorter, but this requires changes to the client code to Pin() pages in advance. Testing: * BufferedTupleStreamV2 already exercises this. * Various BufferPool tests already exercise this. * Added a basic test to cover edge cases made possible by the new state transitions. * Extended the randomised test to cover this. Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 --- M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 8 files changed, 423 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/11 -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5169: Add support for async pins in buffer pool .. Patch Set 10: (14 comments) http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 411: RETURN_IF_ERROR(read_page_->handle.GetBuffer(_buffer)); > don't we have to set the flag here too? Good point. Moved the flag updating into a function in the Page object so it's easier to keep consistent. http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS10, Line 155: If the stream was previously unpinned, the page's data : /// may not yet be in memory - the stream must call GetBuffer() (which can block on : /// I/O or fail with an I/O error) once the data is needed. > This seems like more of an implementation detail hidden behind the BufferPo I think it's relevant that it can block on I/O or fail, but some of the details arent. PS10, Line 373: data_in_mem > the "in mem" is really a detail hidden behind the BP abstraction. How abou Done http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool-internal.h File be/src/runtime/bufferpool/buffer-pool-internal.h: PS10, Line 41: async_pin_in_flight > since all pins are async, how about just 'pin_in_flight'? Done PS10, Line 107: CancelAsyncPin > CancelPinRead(), or CancelPinReadIO, or CancelReadIO? i.e. this doesn't rea I inlined the logic into the callers (since it wouldn't be clear what this did aside from call CancelRead() otherwise). PS10, Line 236: UndoAsyncPin > How about sticking with the list-oriented names, so something like: UndoMov Done PS10, Line 241: async_write_in_flight > wrong name Done PS10, Line 247: write_ > same Done http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: Line 157: Status status = ExtractBuffer(client, handle, ); > if the read had already finished before CancelAsyncPin() and had an error, Yes, there's no way the error status could propagate. I think this should be more obvious with CancelAsyncPin() inlined. PS10, Line 384: UndoAsyncPin > Sticking with the list-oriented names, this is really UndoInFlightMoveToPin Done PS10, Line 387: CancelAsyncPin > page->CancelAsyncPin() isn't really canceling the pin, it's cancelling the Yeah more-or-less. Inlined CancelAsyncPin() to avoid having to name the thing. PS10, Line 393: it doesn't have valid data. > is that true? the read may have completed, right? I think this routine shou Added a comment up the top explaining what the intended end-state is. Hopeful that's clearer. http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.h File be/src/runtime/bufferpool/buffer-pool.h: PS10, Line 413: > previously Done http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS10, Line 139: Even if the read fails, it is valid to call ReadAsync() again : /// after this returns. > what does that mean? Reworded to be a bit less cryptic, hopefully. I wanted to make it clear that it was valid to retry ReadAsync() even if the read failed. In a lot of other places it's implicitly assumed that things shouldn't be retried once an error occurs so I wanted to document that this was an exception. We don't depend on this right now since we just cancel reads when we don't care about result, but in an earlier iteration of the patch I was waiting on reads instead of cancelling them and it was important because we might retry them later. -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Henry Robinson has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6707/3/www/query_plan.tmpl File www/query_plan.tmpl: PS3, Line 42: {{?plan_metadata_unavailable}} : Query plan information is not yet available since the planning hasn't finished. : {{/plan_metadata_unavailable}} : : {{^plan_metadata_unavailable}} Since the webpage refreshes every few seconds, it would be best to have this say something like "Plan not yet available. Page will update when query planning completes". -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG Commit Message: Line 22: and run parallel queries. > Load isn't relevant. The lock dependency is the problem, so as long as you And can you first just try to reproduce the problem manually by doing those steps: 1) Executes a query and simulates getting stuck in planning (loop forever). 2) see if the debug webpage is stuck. It should be since you're holding a QES lock. 3) Execute a second query. Now this query should not even be able to register since the webserver is holding the map lock (while waiting for the QES::lock of #1). http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: PS3, Line 722: ate->query_status(); not sure if that's what we should put. Let's see what Henry suggests. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG Commit Message: Line 22: - As an exception, we don't grab QES::lock_ while the query planning is > Not much success on this front. I'm not able to repro the scenario via test Load isn't relevant. The lock dependency is the problem, so as long as you hold the map lock, you should be able to demonstrate the problem. Just to be clear, you tried to reproduce this using the test but without the fixes, right? http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS3, Line 603: // For queries in CREATED state, the profile information isn't populated yet. : if (exec_state->query_state() == beeswax::QueryState::CREATED) return Status::OK(); > Wanted to double check if this change is required. Reason being, with this Is it possible to get to this method for a query that hasn't returned from ExecStatement() yet? Where does the query_id come from along the paths that call into here? Also, what info is in the profile up to this point? -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 8: Code-Review+2 (1 comment) Merged in related refactoring patch which also had a +2: https://gerrit.cloudera.org/#/c/6510/11 http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 193: /// Nanoseconds are rounded to the nearest microsecond supported by Impala. Returns > i don't know when that's going to happen. at least leave a todo in the impl Done -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6526 to look at the new patch set (#8). Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP Adds Impala support for TIMESTAMP types stored in Kudu. Impala stores TIMESTAMP values in 96-bits and has nanosecond precision. Kudu's timestamp is a 64-bit microsecond delta from the Unix epoch (called UNIXTIME_MICROS), so a conversion is necessary. When writing to Kudu, TIMESTAMP values in nanoseconds are averaged to the nearest microsecond. When reading from Kudu, the KuduScanner returns UNIXTIME_MICROS with 8bytes of padding so Impala can convert the value to a TimestampValue in-line and copy the entire row. Testing: Updated the functional_kudu schema to use TIMESTAMPs instead of converting to STRING, so this provides some decent coverage. Some BE tests were added, and some EE tests as well. TODO: Support pushing down TIMESTAMP predicates TODO: Support TIMESTAMPs in range partitioning expressions Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/parquet-column-readers.cc M be/src/exec/text-converter.inline.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/kudu-partition-expr.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h A be/src/runtime/timestamp-value.inline.h M be/src/service/client-request-state.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/util/bloom-filter.h M be/src/util/dict-test.cc M be/src/util/promise.h M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts-abort-on-error.test A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test M tests/query_test/test_kudu.py M tests/query_test/test_queries.py 42 files changed, 704 insertions(+), 335 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6526/8 -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Bharath Vissapragada has uploaded a new patch set (#5). Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention Holding query_exec_state_map_lock_ and QES::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the QES::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding query_exec_state_map_lock_. While the actual fix for this is to not hold the QES::lock_ during metadata load, this patch doesn't do it since it could have some unknown side effects. Instead, - We make sure that QES::lock_ doesn't block query_exec_state_map_lock_ in hot paths. - As an exception, we don't grab QES::lock_ while the query planning is in progress. Also, this patch removes locking inside the GetQueryExecState() method. It is confusing that some callers choose to lock it inside the method and wait on it later. Now, the locking is the responsibility of the callers. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 144 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/5 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/6849 Change subject: IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL .. IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL Adds support in DDL for timestamps in Kudu range partition syntax. Range bounds are converted to Kudu UNIXTIME_MICROS during analysis. Testing: Adds FE and EE tests. Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff --- M fe/src/main/java/org/apache/impala/analysis/RangePartition.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test 5 files changed, 153 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6849/1 -- To view, visit http://gerrit.cloudera.org:8080/6849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu .. IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu This change builds on the support for reading and writing TIMESTAMP columns to Kudu tables (see [1]), adding support for pushing TIMESTAMP predicates to Kudu for scans. Binary predicates and IN list predicates are supported. Testing: Added some planner and EE tests to validate the behavior. 1: https://gerrit.cloudera.org/#/c/6526/ Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea --- M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test 7 files changed, 157 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6789/2 -- To view, visit http://gerrit.cloudera.org:8080/6789 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Bharath Vissapragada has uploaded a new patch set (#4). Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention Holding query_exec_state_map_lock_ and QES::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the QES::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding query_exec_state_map_lock_. While the actual fix for this is to not hold the QES::lock_ during metadata load, this patch doesn't do it since it could have some unknown side effects. Instead, - We make sure that QES::lock_ doesn't block query_exec_state_map_lock_ in hot paths. - As an exception, we don't grab QES::lock_ while the query planning is in progress. Also, this patch removes locking inside the GetQueryExecState() method. It is confusing that some callers choose to lock it inside the method and wait on it later. Now, the locking is the responsibility of the callers. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 139 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/4 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Dan Hecht has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: PS6, Line 230: common_ancestor this is named based on the state it holds after the loop, whereas the path_to_* are named based on the state before the loop. I'd raname those to path_to_common or something to be consistent. Line 242: //trackers under 'other', down to 'this'. in this case, common_ancestor == other, right? Line 245: //under 'this', down to 'other'. likewise Line 260: locks.emplace_back(tracker->lock_); missing braces Line 289: if (common_ancestor == this) { isn't this case 2? PS6, Line 294: Case 2 case 1? http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS6, Line 213: bool static, or make it a method on subtree1 PS6, Line 232: subtrees is this about locking subtrees, or paths? -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 23: (10 comments) Thanks for the review, please see PS24. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 336: nonterminal ArrayList opt_ident_list, opt_sort_cols; > opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd. Done http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: Line 43 > drop "columns", the statement doesn't include a COLUMNS Done Line 73 > rephrase "sort by columns" as "sort columns" universally Done Line 91 > maybe move this into TableDef as well? it seems kind of arbitrary to put it Done. This has moved a couple of times now. :) http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 863: // TODO: Remove this when removing the sortby() hint (IMPALA-5157). > is this going to happen soon? Yes, immediately after this change makes it in. We should remove the hint before releasing 2.9, since it hasn't been released yet. We could remove it in this change at the risk of the change becoming even larger than it already is. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 46: > update Done Line 343: if (options_.location != null) { > why final? To express that it's just a shorter name for the constant. Removed it. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: final String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS; > it feels like that constant should live somewhere else I've discussed that with Alex and Dimitris and we no better place has emerged, though I've tried some of them. It's related to the sort by clause, similar to how TBL_PROP_SKIP_HEADER_LINE_COUNT lives in HdfsTable.java. Do you have a preference where it should go? I also thought of TableDef (where the analysis happens) or AlterTableSetTblProperties (where we deal with properties). http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 456:* the caller must make sure that the value matches any columns that were added to the > or "that were added to the table" to avoid the gender reference :) Done http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 101: addTestTable("create table test_sort_by.t (id int, int_col int, " + > why call this alltypes? it clearly doesn't contain all types. Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG Commit Message: PS2, Line 7: yExecState::lock_ contention > let's not actually solve that problem right now. it doesn't seem necessary Done Line 11: The most common occurrence of this is while loading the webpage of a query > Yes, but please double check this is safe. From code inspection, I believe Ran a stress test and its green (Thanks Mike B.). Also I checked the callers manually and it looked ok to me too. Line 18: side effects. Instead, > As stated above, let's not do that. Done Line 22: - As an exception, we don't grab QES::lock_ while the query planning is > As we discussed, please see if you can add a test that: Not much success on this front. I'm not able to repro the scenario via tests. I added code to inject sleep simulating table loading in the exec request and loops that keep polling the web UI, but I'm not able to get the test failing on the asf-master (without this patch). May be I'm not generating enough load for lock contention to happen. Thoughts? (I've included the test in PS2). http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 197: { > Let's take the QES::lock_ here and not do the promise thing. Done. http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: Line 741: lock_guard l(*exec_state->lock()); > same Done http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: Line 715: if (exec_state.get() != nullptr) { > please fill in those details. Done http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS2, Line 814: : (*exec_state)->query_events()->MarkEvent("Query submitted"); : Changes undone as discussed. http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS3, Line 603: // For queries in CREATED state, the profile information isn't populated yet. : if (exec_state->query_state() == beeswax::QueryState::CREATED) return Status::OK(); Wanted to double check if this change is required. Reason being, with this change, the profile doesn't show up on the web UI till the metadata is loaded. That might affect some users/clients relying on the web UI profiles to track the query state? http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-server.h File be/src/service/impala-server.h: PS2, Line 296: : /// goto http://msdn.microsoft.com/en-us/library/ms714687.aspx > I think this comment just adds confusion. It's up to the QueryExecState cl Done http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS2, Line 244: > we won't need that. Done -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors
Matthew Jacobs has abandoned this change. Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors .. Abandoned merging with https://gerrit.cloudera.org/#/c/6526/ -- To view, visit http://gerrit.cloudera.org:8080/6510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Bharath Vissapragada has uploaded a new patch set (#3). Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention Holding query_exec_state_map_lock_ and QES::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the QES::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding query_exec_state_map_lock_. While the actual fix for this is to not hold the QES::lock_ during metadata load, this patch doesn't do it since it could have some unknown side effects. Instead, - We make sure that QES::lock_ doesn't block query_exec_state_map_lock_ in hot paths. - As an exception, we don't grab QES::lock_ while the query planning is in progress. Also, this patch removes locking inside the GetQueryExecState() method. It is confusing that some callers choose to lock it inside the method and wait on it later. Now, the locking is the responsibility of the callers. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl 7 files changed, 139 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/3 -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#24). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,401 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/24 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 24 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Dan Hecht has posted comments on this change. Change subject: IMPALA-5169: Add support for async pins in buffer pool .. Patch Set 10: (14 comments) This public abstraction looks good now. Mostly some renaming suggestions to clarify the internal interfaces. http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 411: RETURN_IF_ERROR(read_page_->handle.GetBuffer(_buffer)); don't we have to set the flag here too? http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS10, Line 155: If the stream was previously unpinned, the page's data : /// may not yet be in memory - the stream must call GetBuffer() (which can block on : /// I/O or fail with an I/O error) once the data is needed. This seems like more of an implementation detail hidden behind the BufferPool abstraction. Does the BTS or its clients really care about that? PS10, Line 373: data_in_mem the "in mem" is really a detail hidden behind the BP abstraction. How about "retrieved_buffer", or "referenced_buffer", or "buffer_referenced"? http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool-internal.h File be/src/runtime/bufferpool/buffer-pool-internal.h: PS10, Line 41: async_pin_in_flight since all pins are async, how about just 'pin_in_flight'? PS10, Line 107: CancelAsyncPin CancelPinRead(), or CancelPinReadIO, or CancelReadIO? i.e. this doesn't really do anything about pinning (that's handled by the impl_ level). PS10, Line 236: UndoAsyncPin How about sticking with the list-oriented names, so something like: UndoMoveEvictedToPinned() or UndoInFlightMoveToPinned()? PS10, Line 241: async_write_in_flight wrong name PS10, Line 247: write_ same http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: Line 157: Status status = ExtractBuffer(client, handle, ); if the read had already finished before CancelAsyncPin() and had an error, does this still always return ok? PS10, Line 384: UndoAsyncPin Sticking with the list-oriented names, this is really UndoInFlightMoveToPinned(), right? PS10, Line 387: CancelAsyncPin page->CancelAsyncPin() isn't really canceling the pin, it's cancelling the read I/O, right? This routine is really the thing that "cancels" the in-progress pinning. PS10, Line 393: it doesn't have valid data. is that true? the read may have completed, right? I think this routine should be explained more in terms of the write handle (scratch disk space) is still guaranteed to reflect the page contents. The buffer is in an unknown state (may or may not have the page's data) but a reference to it definitely didn't escape). http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.h File be/src/runtime/bufferpool/buffer-pool.h: PS10, Line 413: previously http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS10, Line 139: Even if the read fails, it is valid to call ReadAsync() again : /// after this returns. what does that mean? -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to 7533364
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/566/ -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#23). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,405 insertions(+), 107 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/23 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM On jenkins.impala.io gerrit-verify-dryrun jobs, free-pool-test started running out of memory after updating Kudu to a newer version which can have slightly higher memory requirements (Kudu will reject writes when mem is 80% full rather than 60% full). free-pool-test can allocate up to 12gb, and the VMs have a CommitLimit of only 16gb. While larger VMs could be used, or the minicluster could be tuned further, free-pool-test can also be modified to reduce the actual RSS usage. Instead of memsetting the entire allocation in the test, we only scribble the first byte. This reduces the max RSS from 14gb to 88mb in some local tests. Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Reviewed-on: http://gerrit.cloudera.org:8080/6842 Reviewed-by: Dan HechtTested-by: Impala Public Jenkins --- M be/src/runtime/free-pool-test.cc 1 file changed, 3 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4499: Table name missing from exec summary .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4499: Table name missing from exec summary .. IMPALA-4499: Table name missing from exec summary For scan nodes, previously only HDFS tables showed the name of the table in the 'Detail' section for the scan node. This change adds the table name for all scan node types (Kudu, HBase, and DataSource). Testing: - Added an e2e test in test_observability. Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Reviewed-on: http://gerrit.cloudera.org:8080/6832 Reviewed-by: Thomas Tauber-MarshallReviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M tests/query_test/test_observability.py 3 files changed, 38 insertions(+), 14 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, but someone else must approve Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. IMPALA-5294: Kudu INSERT partitioning fails with constants An INSERT into a Kudu table with a constant value being inserted into a partition column causes an IllegalStateExcaption. This is because DistributedPlanner removes constants from the list of partition exprs before creating the KuduPartitionExpr, but KuduPartitionExpr expects to get one expr per partition column. The fix is to pass the full list of partition exprs into the KuduPartitionExpr, instead of the list that has had constants removed. This preserves the behavior that if all of the partition exprs are constant we fall back to UNPARTITIONED. One complication is that if a partition expr is a NullLiteral, it must be cast to a specific type to be passed to the BE. The InsertStmt will cast the partition exprs to the partition column types, but these casts may be lost from the copies of the partition exprs stored by the KuduPartitionExpr during reset(). To fix this, the KuduPartitionExpr can store the types of the partition cols and recast the partition exprs to those types during analyze(). Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Reviewed-on: http://gerrit.cloudera.org:8080/6828 Reviewed-by: Thomas Tauber-MarshallTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 3 files changed, 47 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 .. IMPALA-5085: large rows in BufferedTupleStreamV2 The stream defaults to pages of default_page_len_. If a row doesn't fit in that page, it will allocate another page up to max_page_len_ bytes and append a single row to that page, then immediately advance to the next page. This means that when writing a stream, the large page only needs to be kept in memory temporarily, which helps with memory requirements. E.g. consider a hash join that is repartitioning 1 unpinned stream into 16 unpinned streams. We will need default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because when processing a large row we only need one large write buffer at a time. The large row cases are not as optimised for memory consumption or performance - queries processing very large numbers of large rows are an extreme edge case that is likely to hit other performance bottlenecks first. Pages with large rows can have up to 50% internal fragmentation. I tried to avoid adding empty pages to the stream, but it was tricky to avoid in the case of a read/write stream where the read iterator was already pointing at the write page - in that case we can end up with an empty page. To avoid duplicating more logic between AddRow() and AllocateRow() I restructured things so that AddRowSlow() is implemented in terms of AllocateRowSlow(). AllocateRow() now takes a function as an argument to populate the row. Testing: * Added tests for the case where 0 rows are added to the stream * Extend BigRow to exercise the new code. * Also test large strings and read/write streams. Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 --- M be/src/runtime/buffered-tuple-stream-v2-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/buffered-tuple-stream-v2.inline.h M common/thrift/generate_error_codes.py 5 files changed, 544 insertions(+), 235 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/4 -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/6638/3//COMMIT_MSG Commit Message: PS3, Line 16: buffers > buffer bytes? bytes of reservation? Done PS3, Line 16: 14 > why is that not 15 (15 pages for writing to the partitions that aren't the Done http://gerrit.cloudera.org:8080/#/c/6638/3/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS3, Line 95: other > aren't there n-1 other streams? Done PS3, Line 212: max_page_len > how will that be determined? This is just adding the mechanism, so policy is TBD. We'd discussed adding a query option to specify the maximum row size. PS3, Line 254: stream could not increase the reservation : /// sufficiently > does it try this for large rows, or not? above it says the caller must have I'm not sure where the "above" is. This is true regardless for the pinned case, because there's no guarantee or requirement that we can extend a stream with a pinned page. I tried to clarify further on line 262 the behaviour with a large row and an unpinned stream. PS3, Line 280: const WriteRowFn& write_fn > why is this part exposed outside BTS2? i thought this was just to reuse cod AllocateRow() is still used by PAGG. The motivation for this choice is partially to avoid duplication, but also the AllocateRow() interface needed to change regardless because we need to advance to the next page immediately after the row is written. The old interface assumes that nothing needs to be done after the caller writes the row into the stream. The alternative is to have the code duplication and expose a pair of methods like AllocateRow()/AllocateRowDone(), with PAGG responsible for calling AllocateRowDone() after it writes the data. Line 466: /// hard limit on the maximum size of row that can be stored in the stream. > motivation for this limit? Done -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
anujphadke has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6527/2//COMMIT_MSG Commit Message: PS2, Line 12: ProcessSpit nit: typo http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: Line 34: #include "util/pretty-printer.h" unused. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: Line 237: DCHECK_EQ(path_to_root.back(), other_path_to_root.back()); > maybe first dcheck they both aren't empty to support calling back() Seems like overkill to me since FindPathToRoot() always includes the current tracker by definition. I can add it if you feel strongly. PS5, Line 253: > nit: indentation mismatch Done Line 274: if (i == other_path_to_root.size() - 1 && curr->parent_ != nullptr) { > this path could use a quick explanation We actually don't need it - it was only needed to offset an increment in DecreaseReservationInternal(). I was able to remove this and the increment. Line 278: } > why is it okay to add the reservation into the other chain before subtracti There is a race if you do that but the bug would be that the caller was misusing the API and creating a semantic race. E.g. if two threads call IncreaseReservationToFit() and TransferReservationTo() concurrently, and TransferReservationTo() runs second, the post-condition of IncreaseReservationToFit() can become false. Or if AllocateFrom(x) and TransferReservationTo(y) are called concurrently and the current reservation is < x - y, then the one that runs second will DCHECK. It's safe in that a concurrent thread can't get memory that it wouldn't be entitled to, because we did the increase before the decrease. A concurrent thread could probably notice an anomaly if it was looking for one, but I can't think of any scenario where that is actually a problem in practice. It is tricky to reason about - I tried augmenting the lock ordering so that we can lock everything at the same time and therefore fully guarantee atomicity. Seemed to work out ok so going with that now. Line 281: DecreaseReservation(bytes, false, path_to_root.back()); > is this always safe while holding 'locks'? Why can't we get into a deadloc Yeah that was violating the lock order - I removed the scoped block without thinking about that. http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS5, Line 194: Decrase > typo Done -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5238: transfer reservations between trackers .. IMPALA-5238: transfer reservations between trackers This is a primitive needed to implement claiming and distribution of initial reservations. It supports transferring reservation between any two ReservationTrackers under the same query. Also remove the public DecreaseReservation() method, which is now mostly redundant and we have no plans to use. Testing: * Added a test that exercises transfer between trackers at different levels and with different relationships. Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h 3 files changed, 282 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/6 -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5167: Reduce the number of Kudu clients created .. Patch Set 4: Code-Review+1 (2 comments) > How about a rebase? Done http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS3, Line 403: std::vector sorted_master_addresses(master_addresses); : sort(sorted_master_addresses.begin(), sorted_master_addresses.end()); : std::string master_addr_concat = join(sorted_master_addresses, ","); : lock_guard l(kudu_client_map_lock_); > can't you just use the copy constructor? Done http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.h File be/src/runtime/exec-env.h: PS3, Line 193: across > spelling Done -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6792 to look at the new patch set (#4). Change subject: IMPALA-5167: Reduce the number of Kudu clients created .. IMPALA-5167: Reduce the number of Kudu clients created Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master, so we should minimize the number of Kudu clients that get created. This patch stores a map from Kudu master addressed to Kudu clients in the ExecEnv to be used across the BE for all queries, and a map in KuduUtil to be used acress the FE and catalog for all queries. This relies on changes on the Kudu side that clear non-covered range entries from the client's cache on table open (fdc022fe6231af20e307012d98c35b16cbfa7b33 and d07ecd6ded01201c912d2e336611a6a941f48d98) Testing: - Ran a stress test locally: scan of a Kudu table, 100 concurrent queries, load on the Kudu master was reduced signficantly, from ~25% cpu to ~4% cpu. TODO: perf testing on a cluster. - Ran the Kudu e2e tests. - Manually ran a test with concurrent INSERTs and 'ALTER TABLE ADD PARTITION' (which is affected by the Kudu side change mentiond above) and verified correctness. Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 --- M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exprs/kudu-partition-expr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java 12 files changed, 114 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6792/4 -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 156: friend class AndPredicate; > The goal is to hide both Init() and Get*Val() from irrelevant classes to en Removed AggregationNode and PartitionedAggregationNode from friend class list in PS13. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 6: (6 comments) quick comments http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 92: initial_ranges_issued_(false), you can also initialize that inline in the .h file http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 105: std::unique_ptr fh_; no trailing _ for structs http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 116: MapType cache_; provide comments for variables, in particular invariants (e.g., it looks like the lru list only contains unused handles) http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 33: if (hdfs_file_ != nullptr && fs_ != nullptr) { could these be null? http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 61: int index = hash()(key) % NUM_PARTITIONS; also: use something out of util/hash-util.h http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309 > I use this as the key for the map, so switching the hash alone wouldn't eli probably a good idea, it's unlikely that we'll see a lot of overwritten files. -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/565/ -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 11: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#6). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranages that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/query_test/test_hdfs_fd_caching.py 11 files changed, 679 insertions(+), 368 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/6 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (4 comments) Fixed the test comments and fixed some errors in the code found by other existing tests (custom_cluster/test_hdfs_fd_caching.py tests the metrics). http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1331: ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L); > I'm splitting hairs about the exact meaning of cached. I found an existing test that was checking the metrics, and I've changed this code to match what that test was expecting. IO_MGR_NUM_FILE_HANDLES_OUTSTANDING is the number of file handles in use. IO_MGR_NUM_CACHED_FILE_HANDLES now is the count of all files handles. It would also make sense for this to exclude the oustanding file handles, but right now, this counts all file handles. http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 76: # File deleted: either # rows is the same or # rows = 0 > we don't make any guarantees for this, so might as well not check the resul Changed to not check the results. Line 98: new_table_location = "{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database) > long lines Done Line 105: # Delete the whole directory (including data file) > this function is pretty much identical to the one above (and below), except Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test: Line 34: | output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows) > Personally, I prefer to show what is actually being executed in the explain sum_zero_if_empty sounds pretty obscure, what is that supposed to mean? -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 22: (13 comments) http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 382: // Optional list of sort by columns for the new table. If specified, these will override "sort columns" or "sorting columns" (as in: partition columns) Line 385: 9: optional list sort_by_columns sort_columns Line 439: 16: optional list sort_by_columns same http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 336: nonterminal ArrayList opt_ident_list, opt_sort_by_cols; opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: Line 43: public class AlterTableSortByColumnsStmt extends AlterTableStmt { drop "columns", the statement doesn't include a COLUMNS Line 73: // Disallow setting sort by columns on HBase and Kudu tables. rephrase "sort by columns" as "sort columns" universally Line 91: public static void analyzeSortByColumns(List sortByCols, Table table) maybe move this into TableDef as well? it seems kind of arbitrary to put it here. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 863: // TODO: Remove this when removing the sortby() hint (IMPALA-5157). is this going to happen soon? http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 46: * correspond to the following clauses in a CREATE TABLE statement: update Line 343: final String sortByKey = AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS; why final? http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: final String sortByKey = AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS; it feels like that constant should live somewhere else http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 456:* the caller must make sure that the value matches any columns he/she added to the or "that were added to the table" to avoid the gender reference :) http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 101: addTestTable("create table test_sort_by.alltypes (id int, int_col int, " + why call this alltypes? it clearly doesn't contain all types. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4499: Table name missing from exec summary .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/564/ -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/563/ -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/562/ -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 3: fwiw the jenkins nodes (both public and private) appear to have vm_overcommit 0 ubuntu@ip-172-31-7-2:~$ cat /proc/sys/vm/overcommit_memory 0 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5167: Reduce the number of Kudu clients created .. Patch Set 3: How about a rebase? -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Dan Hecht has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 3: Code-Review+2 > I measured the max rss w/ this change to be 88mb, down from 14gb is > what I observed with the old code. it might still fail with vm_overcommit = 2 (https://www.kernel.org/doc/Documentation/vm/overcommit-accounting) but I agree this is the best way out. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Dan Hecht has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: Line 237: DCHECK_EQ(path_to_root.back(), other_path_to_root.back()); maybe first dcheck they both aren't empty to support calling back() PS5, Line 253: nit: indentation mismatch Line 274: if (i == other_path_to_root.size() - 1 && curr->parent_ != nullptr) { this path could use a quick explanation Line 278: } why is it okay to add the reservation into the other chain before subtracting it from this chain? i guess we are assuming that this function does not execute concurrently with anything that might consume reservations against the hierarchy, right? is that documented? Line 281: DecreaseReservation(bytes, false, path_to_root.back()); is this always safe while holding 'locks'? Why can't we get into a deadlock if doing a->Transfer(b) and b->Transfer(a) concurrently? http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS5, Line 194: Decrase typo -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6842 to look at the new patch set (#3). Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM On jenkins.impala.io gerrit-verify-dryrun jobs, free-pool-test started running out of memory after updating Kudu to a newer version which can have slightly higher memory requirements (Kudu will reject writes when mem is 80% full rather than 60% full). free-pool-test can allocate up to 12gb, and the VMs have a CommitLimit of only 16gb. While larger VMs could be used, or the minicluster could be tuned further, free-pool-test can also be modified to reduce the actual RSS usage. Instead of memsetting the entire allocation in the test, we only scribble the first byte. This reduces the max RSS from 14gb to 88mb in some local tests. Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 --- M be/src/runtime/free-pool-test.cc 1 file changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/3 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 2: I measured the max rss w/ this change to be 88mb, down from 14gb is what I observed with the old code. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Michael Ho has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Michael Ho has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 2: Looks good. I will let Dan +2 it. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM On jenkins.impala.io gerrit-verify-dryrun jobs, free-pool-test started running out of memory after updating Kudu to a newer version which can have slightly higher memory requirements (Kudu will reject writes when mem is 80% full rather than 60% full). free-pool-test can allocate up to 12gb, and the VMs have a CommitLimit of only 16gb. While larger VMs could be used, or the minicluster could be tuned further, it doesn't seem like free-pool-test needs to be testing such large allocations. Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 --- M be/src/runtime/free-pool-test.cc 1 file changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/2 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc File be/src/runtime/free-pool-test.cc: PS1, Line 132: 1LL > Will 2LL be too much ? I was hoping we could exercise the code for 64-bit a we can keep this as is PS1, Line 138: memset(p1, 1, size); > Should be sufficient to replace it with *p1 = 1; Good idea, this works well -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5297: Set Kudu minicluster memory limit .. Patch Set 1: posted to test gerrit-verify-dryrun -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/6844 Change subject: IMPALA-5297: Set Kudu minicluster memory limit .. IMPALA-5297: Set Kudu minicluster memory limit By default, Kudu assumes it has 80% of system memory which is far too high for the minicluster. This sets a mem limit of 1gb. Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 --- M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/1 -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Michael Ho has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc File be/src/runtime/free-pool-test.cc: PS1, Line 138: memset(p1, 1, size); > That's probably true, depending on vm_overcommit setting. But given that we Should be sufficient to replace it with *p1 = 1; -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 10: Code-Review+2 (1 comment) We should test this on a range of Linux distributions (SLES, Centos, Ubuntu, etc) before committing just in case the OpenSSL script change causes any issues. http://gerrit.cloudera.org:8080/#/c/5715/10/be/src/kudu/util/kudu_export.h File be/src/kudu/util/kudu_export.h: Line 1: Apache header? -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Dan Hecht has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc File be/src/runtime/free-pool-test.cc: PS1, Line 138: memset(p1, 1, size); > I wonder if we will be less prone to the problem if we don't actually memse That's probably true, depending on vm_overcommit setting. But given that we are seeing the OOM killer, yes, this would probably be an alternate workaround. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 10: Thanks for the review, Tim - do you have any further comments? -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Michael Ho has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc File be/src/runtime/free-pool-test.cc: PS1, Line 132: 1LL Will 2LL be too much ? I was hoping we could exercise the code for 64-bit allocation size. PS1, Line 138: memset(p1, 1, size); I wonder if we will be less prone to the problem if we don't actually memset() the entire chunk of memory. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5238: transfer reservations between trackers .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/6708/3//COMMIT_MSG Commit Message: PS3, Line 7: 5239 > wrong number Done http://gerrit.cloudera.org:8080/#/c/6708/3/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: PS3, Line 204: void ReservationTracker::DecreaseReservation(int64_t bytes) { > does this interface still make sense? i.e. do we need both this and Transfe We still need this internally to implement Close() and TransferReservationTo(). This does do something distinct from either - it releases reservation on all ancestors without closing it. TransferReservationTo() could only be used to release reservation up to the root. I don't that is really necessary though for now, so I removed the public interface and updated tests to use alternative APIs. PS3, Line 253: * 'other' is a descendent of 'this'. 'path_to_root' is empty because 'this' is the : //lowest common ancestor. To transfer, we increase the reservation on the trackers : //under 'this', down to 'other'. > isn't this the only case we need for the distribution of initial reservatio The third case is the one we need to distribute reservation - the initial reservation can't be in the query root ReservationTracker since then there's no way to prevent it being used for non-initial reservations. It needs to be in a separate "initial reservation" tracker that is a child of the query tracker (e.g. the distribution would be the same as "aunt"->"child" in the unit test). Now that I think about it, I think the underlying problem is the same as the scenario you mention - some ExecNodes won't claim their reservation until after the query has started executing, which means that in the meantime the ExecNodes initial reservation needs to be stored somewhere, then transferred. http://gerrit.cloudera.org:8080/#/c/6708/3/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: PS3, Line 197: Release memory on linked MemTracker > not sure what that means. should it say "to linked MemTracker"? Reworded PS3, Line 199: decrease > how about reservation_decrease to be clear this is opposite TryConsumeFromM Done -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5238: transfer reservations between trackers .. IMPALA-5238: transfer reservations between trackers This is a primitive needed to implement claiming and distribution of initial reservations. It supports transferring reservation between any two ReservationTrackers under the same query. Also remove the public DecreaseReservation() method, which is now mostly redundant and we have no plans to use. Testing: * Added a test that exercises transfer between trackers at different levels and with different relationships. Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h 3 files changed, 259 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/5 -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5238: transfer reservations between trackers .. IMPALA-5238: transfer reservations between trackers This is a primitive needed to implement claiming and distribution of initial reservations. It supports transferring reservation between any two ReservationTrackers under the same query. Testing: * Added a test that exercises transfer between trackers at different levels and with different relationships. Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h 3 files changed, 259 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/4 -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Dan Hecht has posted comments on this change. Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. Patch Set 1: Looking at git history, the large allocations were added here, so let's be sure we're not losing coverage we think we need. Please talk to Michael. commit ed5ec6772fd24ad28901bc68f120c59597439cf2 Author: Michael HoDate: Thu Jun 30 15:15:27 2016 -0700 IMPALA-1619: Support 64-bit allocations. -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5167: Reduce the number of Kudu clients created .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS3, Line 403: std::vector sorted_master_addresses; : sorted_master_addresses.insert( : sorted_master_addresses.begin(), master_addresses.begin(), master_addresses.end()); : sort(sorted_master_addresses.begin(), sorted_master_addresses.end()); can't you just use the copy constructor? http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.h File be/src/runtime/exec-env.h: PS3, Line 193: acress spelling -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 285: boost::scoped_ptr expr_mem_tracker_; > if this is per querystate, the mempool would need to be thread-safe, not su Yes, there should be one MemPool per fragment instance. TODO removed. Keep the name the same as it's clear enough already. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#12). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M
[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/6842 Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM .. IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM On jenkins.impala.io gerrit-verify-dryrun jobs, free-pool-test started running out of memory after updating Kudu to a newer version which can have slightly higher memory requirements (Kudu will reject writes when mem is 80% full rather than 60% full). free-pool-test can allocate up to 12gb, and the VMs have a CommitLimit of only 16gb. While larger VMs could be used, or the minicluster could be tuned further, it doesn't seem like free-pool-test needs to be testing such large allocations. Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 --- M be/src/runtime/free-pool-test.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/1 -- To view, visit http://gerrit.cloudera.org:8080/6842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#11). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 10: (57 comments) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 66: output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)), > initialize in .h where possible While I appreciate the suggestion, I prefer not to add more unrelated clean-up to this already gigantic patch. Line 88: desc->type() == grouping_exprs_[i]->type()); > i would intuitively think that the build row is the input row. renamed to intermediate_row_desc_. Line 97: > it feels dicey to pass a reference to an automatic variable, Init() might h Done Line 232: int count = 0; > isn't that the same as const X**? Not really :-). const X** is pointer to pointer to constant X. X* const * is pointer to constant pointer to X. The latter fits the bill of vector::data(). http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 122: /// Output expressions to convert row batches onto output values. > It's confusing that these aren't used in all subclasses of DataSink (e.g. D Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS9, Line 485: ? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS9, Line 69: > Does this mean that multiple scanner threads share a single MemPool? That d Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also have a per-scanner MemPool. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h File be/src/exec/sort-node.h: PS9, Line 68: :vector materialize_tuple_ doesn't seem to be defined. I think we always need these Done Line 69: > See my comment about the name in sorter.h. Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h File be/src/exec/subplan-node.h: Line 65: /// tree rooted at 'node' and does any initialization that is required as a result of > * and do any initialization that is required as a result of setting the sub Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS9, Line 78: output_tuple_exprs_; > output_tuple_exprs_? Usually when we have exprs materializing tuples it's n Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 96: MemPool* mem_pool, AggFnEvaluator** result) { > move input params to front Done Line 161: Status AggFnEvaluator::Open( > why not const&? Done Line 514: > single line, here and subsequent functions, where possible Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// and caches all constant input arguments. > regarding the caching: is there already a todo to move that into the expr s TODO added. PS9, Line 94: o, it > Maybe this should be called ShallowClone() to make it more explicit. Withou Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 68: } > scalarexpr does that in create. do the same for this class? Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 87: for (int i = 0; i < num_children; ++i) { > we don't use 'const ..' elsewhere for automatic variables. Done Line 88: *child_node_idx += 1; > formatting Done Line 92: return Status::OK(); > what is this referring to? Stale comment. Removed. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 82: > this is really CreateChildren, not CreateTree. Moved in/out params to end. As for the naming, this function actually builds out the entire expression tree rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood your comment. Line 134: /// Called recursively to create children expr trees for sub-expressions. > misleading: the root of the created tree is not 'parent' (the root is insta I refactored this function and CreateTree() a bit so it's easier to follow now. Line 135: /// > "for subexpressions" Done Line 140: /// root: root of the new tree. Created and initialized by the caller. > let's call this root_idx, makes it easier to follow It's called child_node_idx in the new patch. Line 146: /// !status.ok() if tree is inconsistent or corrupt > while we're cleaning up, move in/out params to end Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 69: > initialize vars in .h where practical Done Line 82: (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data(); > why do you need that (for a newly
[Impala-ASF-CR] IMPALA-5152: Gather all tables with missing metadata in analysis
Alex Behm has abandoned this change. Change subject: IMPALA-5152: Gather all tables with missing metadata in analysis .. Abandoned This is temporarily on hold to focus on a cleaner but more invasive solution. Abandoning so that nobody spends time on this version. -- To view, visit http://gerrit.cloudera.org:8080/6702 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Id10a303958557982b26a07c97a9bb30932ab44dd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#10). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Alex Behm has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6840/1/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test: Line 20: '2009','1',310,305,1,'24.56KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.* There are a ton of tests that use SHOW TABLE STATS or SHOW PARTITIONS. I have not yet gone through all of them to save time. Let's agree on what the exact output should look like and then I'll those changes everywhere. -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6840 Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. IMPALA-2373: Extrapolate row counts for HDFS tables. The main idea of this patch is to use table stats to extrapolate the row counts for new/modified partitions. Existing behavior: - Partitions that lack the row count stat are ignored when estimating the cardinality of HDFS scans. Such partitions effectively have an estimated row count of zero. - We always use the row count stats for partitions that have one. The row count may be innaccurate if data in such partitions has changed significantly. Summary of changes: - Enhance COMPUTE STATS to also store the total number of file bytes in the table. - Use the table-level row count and file bytes stats to estimate the number of rows in a scan. - A new impalad startup flag is added to enable/disable the extrapolation behavior. The feature is enabled by default. Note that even with the feature disabled, COMPUTE STATS stores the file bytes so you can enable the feature without having to COMPUTE STATS again. Testing: - Added new FE unit test - Added new EE test Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 --- M be/src/common/global-flags.cc M be/src/exec/catalog-op-executor.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M tests/metadata/test_compute_stats.py 32 files changed, 1,062 insertions(+), 578 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/1 -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Tim Armstrong has uploaded a new patch set (#10). Change subject: IMPALA-5169: Add support for async pins in buffer pool .. IMPALA-5169: Add support for async pins in buffer pool Makes Pin() do async reads behind-the-scenes, instead of blocking until the read completes. The blocking is done instead when the client tries to access the buffer via PageHandle::GetBuffer() or ExtractBuffer(). This is implemented with a new sub-state of "pinned" where the page has a buffer and consumes reservation but the buffer does not contain valid data. Motivation: This unlocks various opportunities to overlap read I/Os with other work: * Reads to different disks can execute in parallel * I/O and computation can be overlapped. This initially benefits BufferedTupleStream::PinStream(), where many pages are pinned at once. With this change the reads run asynchronously. This can potentially lead to large speedups when spilling. E.g. if the pages for a Hash Join's partition are spread across 10 disks, we could get 10x the read throughput, plus overlap the I/O with hash table build. In future we can use this to do read-ahead over unpinned BufferedTupleStreams or for unpinned Runs in Sorter, but this requires changes to the client code to Pin() pages in advance. Testing: * BufferedTupleStreamV2 already exercises this. * Various BufferPool tests already exercise this. * Added a basic test to cover edge cases made possible by the new state transitions. * Extended the randomised test to cover this. Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 --- M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 8 files changed, 416 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/10 -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 10: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/560/ -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Alex Behm has posted comments on this change. Change subject: IMPALA-4499: Table name missing from exec summary .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6828/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 249: // IMPALA-5294: Don't use 'partitionExpr' here because the constants were removed. > explain briefly why the constants matter here (they don't for hdfs) Done -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6828 to look at the new patch set (#3). Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. IMPALA-5294: Kudu INSERT partitioning fails with constants An INSERT into a Kudu table with a constant value being inserted into a partition column causes an IllegalStateExcaption. This is because DistributedPlanner removes constants from the list of partition exprs before creating the KuduPartitionExpr, but KuduPartitionExpr expects to get one expr per partition column. The fix is to pass the full list of partition exprs into the KuduPartitionExpr, instead of the list that has had constants removed. This preserves the behavior that if all of the partition exprs are constant we fall back to UNPARTITIONED. One complication is that if a partition expr is a NullLiteral, it must be cast to a specific type to be passed to the BE. The InsertStmt will cast the partition exprs to the partition column types, but these casts may be lost from the copies of the partition exprs stored by the KuduPartitionExpr during reset(). To fix this, the KuduPartitionExpr can store the types of the partition cols and recast the partition exprs to those types during analyze(). Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 --- M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 3 files changed, 47 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6828/3 -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6828/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 249: // IMPALA-5294: Don't use 'partitionExpr' here because the constants were removed. explain briefly why the constants matter here (they don't for hdfs) -- To view, visit http://gerrit.cloudera.org:8080/6828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6832 to look at the new patch set (#2). Change subject: IMPALA-4499: Table name missing from exec summary .. IMPALA-4499: Table name missing from exec summary For scan nodes, previously only HDFS tables showed the name of the table in the 'Detail' section for the scan node. This change adds the table name for all scan node types (Kudu, HBase, and DataSource). Testing: - Added an e2e test in test_observability. Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M tests/query_test/test_observability.py 3 files changed, 38 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6832/2 -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4499: Table name missing from exec summary .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6832/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: PS1, Line 61: ] > nit: extra space here and below Done -- To view, visit http://gerrit.cloudera.org:8080/6832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors
Attila Jeges has posted comments on this change. Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS11, Line 282: nit: indentation is 4 for line continuation (here and below). http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: PS11, Line 93: TimestampValue t You might want to use "const TimestampValue& t =" here (and below), just like you did in aggregate-functions-ir.cc -- To view, visit http://gerrit.cloudera.org:8080/6510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No