[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool
Dan Hecht has posted comments on this change. Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool .. Patch Set 16: (15 comments) http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: PS16, Line 58: The caller can claim up to 'max_bytes_to_claim' to allocate a buffer from the : /// system hmm is that true? what if the number of bytes freed is less than that? oh, reading on, I see that the actual number of bytes the caller can claim is returned as well. So, how about specifying as the input is target_bytes_to_free/target_bytes_to_claim and the output is the actual bytes freed and bytes claimed? And rewording the comment in those terms? Line 451: DCHECK_GT(target_bytes_to_free, 0); DCHECK_GE(target_bytes_to_free, max_bytes_to_claim)? Line 512: int64_t bytes_claimed = bytes_freed; why is that right? shouldn't it be: bytes_claimed = min(bytes_freed, max_bytes_to_claim); if (bytes_freed > bytes_claimed) ... add to system_bytes_remaining_... Line 544: lists->num_free_buffers.Add(1); it would be good to move AddFreeBuffer() and RemoveCleanPage() to be adjacent, so it's more obvious that their code has similarities (in case they need updating, etc). PS16, Line 555: two Maintenance( does this need updating? http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-allocator.h File be/src/runtime/bufferpool/buffer-allocator.h: PS16, Line 59: stored inside the buffer allocator. "stored" is kinda confusing and ambiguous. maybe say similar to #2: in the BufferAllocator's clean page lists. PS16, Line 59: clean pages clean unpinned pages just to clarify PS16, Line 65: concurrently delete PS16, Line 66: allocate this is from the BufferAllocator, not the SystemAllocator, right? PS16, Line 66: buffer ... buffer from the BufferAllocator's free or clean page lists ... (i.e. we're not talking about releasing a 2MB buffer from the client to fit in the reservation, right?) PS16, Line 164: . I think we should also explicitly say: If 'slow_but_sure' is false, then this functional optimistically tries to reclaim the memory but may not reclaim all 'target_bytes' of memory. PS16, Line 164: is slower and nit: redundant with "slower strategy" stated earlier. http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: PS16, Line 374: write how about: "write handle", since the write I/O is done already. or really, I think you can delete this first clause. http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS15, Line 315: ion( > I agree it would probably be better to design the AddGcFunction() interface What I'm suggesting is to just get rid of AddGcFunction() and just have the mem-tracker.cc code call the functions in order. Or maybe you're saying it's not so simple since not all the objects of the called methods are singletons? In any case, if it is a bunch of work, I'm fine with just clarifying the order. This code should hopefully go away / simplify as we move more things under buffer-pool, right? http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: PS16, Line 300: if none of the other : /// previously-added GC don't we need to still call the tc-malloc GC code? Otherwise, we won't make progress against lowering the process consumption and so still can't make new allocations, no? -- To view, visit http://gerrit.cloudera.org:8080/6414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I612bd1cd0f0e87f7d8186e5bedd53a22f2d80832 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 749: // conjuncts because NULLs are handled differently for CompoundPredicate.AND This is the comment to which I was referring. http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR > Agree we can address the issue if we want to. I suggest we leave for now be I am terrified at how invasive that diff could become. Totally agree it should be in another change. Line 335: 00:SCAN HDFS [functional.alltypes] > I think this is a minor oversight in optimizeConjuncts(). If one of the con Easy fix, will check to make sure this is consistent with existing behavior. -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 6: Re: gflags - agree we need to look hard at a solution. Do you think we could do that in a follow-on patch, or would you prefer it's done here before we put kutil in? -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
Laurel Hale has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. Patch Set 3: Code-Review+1 All good on my side. -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. IMPALA-5154: Handle 'unpartitioned' Kudu tables The catalogd was hanging trying to load an unpartitioned Kudu table created outside of Impala. This fixes an assumption made in KuduTable.java that the list of 'partition by' expressions is not empty. Regardless, the list on the thrift structure must be created because the field is marked required. Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Reviewed-on: http://gerrit.cloudera.org:8080/6560 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M tests/query_test/test_kudu.py 2 files changed, 35 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args
Dan Hecht has posted comments on this change. Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6442/2/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 110: /// false. can you concisely explain what V is? Line 133: /// put the element, returns false. same here -- To view, visit http://gerrit.cloudera.org:8080/6442 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Alex Behm has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 16: (2 comments) http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR > It is fixable. I think this is because the resulting Expr ends up with no Agree we can address the issue if we want to. I suggest we leave for now because the fix is possibly invasive and the gain is minimal. It's also unrelated to your change. Line 335: 00:SCAN HDFS [functional.alltypes] > bool_col = null is stripped from conjunct evaluation. bool_col IS null is I think this is a minor oversight in optimizeConjuncts(). If one of the conjuncts becomes NULL, then it has the same as a FALSE literal, but in that case optimizeConjuncts() will return true. -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.
Jim Apple has posted comments on this change. Change subject: Impala and Hive have slightly different SQL they accept. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6567/1/index.html File index.html: PS1, Line 154: the same > maybe this should be 'compatible'. Technically we have some different metad I don't really understand the state of ODBC in Impala. Do you have a wording in mind that would reflect reality? Would something like "Impala utilizes Hive-compatible metadata and a Hive-compatible ODBC driver" work? -- To view, visit http://gerrit.cloudera.org:8080/6567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim AppleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Alex Behm has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/6261/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 2265: AnalyzesOk("create external table t (x int primary key) partition by hash (x) " + What does this statement do after your changes? Will it create a new Kudu table since column definitions are given? -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.
Matthew Jacobs has posted comments on this change. Change subject: Impala and Hive have slightly different SQL they accept. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6567/1/index.html File index.html: PS1, Line 154: the same maybe this should be 'compatible'. Technically we have some different metadata, and the ODBC driver is similar but not exactly the same (we just support the HS2 interface, but that's kind of an implementation detail) -- To view, visit http://gerrit.cloudera.org:8080/6567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim AppleGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.
Jim Apple has uploaded a new change for review. http://gerrit.cloudera.org:8080/6567 Change subject: Impala and Hive have slightly different SQL they accept. .. Impala and Hive have slightly different SQL they accept. https://lists.apache.org/thread.html/ad17fb6e145ecb082c529f2f21b4661ad1047021dd579117635ce501@%3Cdev.impala.apache.org%3E Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7 --- M index.html 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6567/1 -- To view, visit http://gerrit.cloudera.org:8080/6567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim Apple
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 5: > (5 comments) > > > 1) Brining in all the gflags is maybe concerning. will any > conflict with ours? even if not, they will crowd our already large > number of gflags. perhaps we need to look more closely at them. > > I agree, it's a lot of new (unused) gflags. I filed > https://issues.apache.org/jira/browse/IMPALA-5174 to see if > > > 2) I'm curious to know how this (and more particularly the actual > code import change) affects the binary size. > > Before: 391865544 > After: 391795120 > > So somehow it's smaller (?). I'll take a closer look, but nothing > to worry about right now. > > > 3) It'd be good if we could push some of these changes upstream > if possible. > > Will do. The issue is that every upstream update brings in all the > changes since the last rebase, which leads to more compilation or > even functionality issues to resolve. So rather than try and make > that process converge, I figured I'd draw a line under a known good > version of kutil, and add the few changes needed to Impala. > > I should upstream these changes soon, though, so that next upgrade > we bring them all in. Thanks for looking into these. Weird that the binary got smaller. I do think the gflags thing is a problem we need to address though, there are 86 flags defined in util/ alone. Maybe we can augment the gflag macros in the directories we import to prepend "_kudu_import" or such - that wouldn't get rid of them but at least group them and keep them from making Impala's flags hard to find. Perhaps we should run this by some other folks to see if (a) others maybe aren't concerned about this and (b) anyone has some ideas for dealing with this gracefully. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 16: (4 comments) http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 159: select count(*) from > There's still a question as to whether we should impose a limit on the size I'll probably just re-use the same rule. Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR > This is the current intended behavior. See the bottom of hdfs.test planner It is fixable. I think this is because the resulting Expr ends up with no SlotRefs, and as a result conjunct.isBoundBySlotIds(partitionSlots_) returns true, which is questionable. Changing that would have deep consequences, but we can always build something to collect all the SlotRefs from an Expr and validate they are a non-empty subset of partitionSlots_ Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey > long line Done Line 335: 00:SCAN HDFS [functional.alltypes] > Why not an EmptySetNode? bool_col = null is stripped from conjunct evaluation. bool_col IS null is not. I am guessing because 10 = null => null and 10 IS NULL => false. Is this behavior incorrect? It certainly is suspect, but there are some comments about how conjunct evaluation treats null conditions specially and I wasn't sure which was the correct behavior here. -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Alex Behm has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 16: (6 comments) http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 418: if (analyzer.getQueryCtx().client_request.getQuery_options().enable_expr_rewrites) { use analyzer.getQueryOptions() http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1274: if (analyzer.getQueryCtx().client_request.query_options.enable_expr_rewrites) { use analyzer.getQueryOptions() http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 159: select count(*) from There's still a question as to whether we should impose a limit on the size of the conjunct list we'll optimize (due to N^2 cost), similar to what we do in ExtractCommonConjunctsRule. I'd suggest running perf on some extreme examples to see what a reasonable cutoff might be. Or you can also adopt the ExtractCommonConjunctsRule threshold. Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR > In testing this, I discovered the following: This is the current intended behavior. See the bottom of hdfs.test planner test. The behavior is questionable, but it certainly has nothing to do with your change. Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey long line Line 335: 00:SCAN HDFS [functional.alltypes] Why not an EmptySetNode? -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has uploaded a new patch set (#6). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt A be/src/common/kudu_version.cc M be/src/common/logging.cc A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc M be/src/kudu/util/status.h D cmake_modules/FindOpenSSL.cmake 16 files changed, 134 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/6 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 5: (5 comments) > 1) Brining in all the gflags is maybe concerning. will any conflict with > ours? even if not, they will crowd our already large number of gflags. > perhaps we need to look more closely at them. I agree, it's a lot of new (unused) gflags. I filed https://issues.apache.org/jira/browse/IMPALA-5174 to see if > 2) I'm curious to know how this (and more particularly the actual code import > change) affects the binary size. Before: 391865544 After: 391795120 So somehow it's smaller (?). I'll take a closer look, but nothing to worry about right now. > 3) It'd be good if we could push some of these changes upstream if possible. Will do. The issue is that every upstream update brings in all the changes since the last rebase, which leads to more compilation or even functionality issues to resolve. So rather than try and make that process converge, I figured I'd draw a line under a known good version of kutil, and add the few changes needed to Impala. I should upstream these changes soon, though, so that next upgrade we bring them all in. http://gerrit.cloudera.org:8080/#/c/5715/5/be/CMakeLists.txt File be/CMakeLists.txt: Line 315: maintenance_manager_proto > It might be nice to keep kudu specific libs in a separate list and merge th Turns out this isn't necessary any more. http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/CMakeLists.txt File be/src/kudu/util/CMakeLists.txt: Line 208: if(NOT NO_NVM_SUPPORT) > I think the NO_NVM_SUPPORT flag would probably be useful to add to Kudu, an I agree - I have tried upstreaming it but the reviewers (understandably) wanted a slightly more general solution that I didn't have time to do. I would expect this to show up in future upgrades of the kutil library. Line 331: target_link_libraries(protoc-gen-insertions gutil glog gflags protobuf protoc ${KUDU_BASE_LIBS}) > can this be added upstream? Will look into it. http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/env.cc File be/src/kudu/util/env.cc: Line 10: #include "kudu/util/status.h" > seems like they're missing this include, can this be added upstream? Yes, I think so. http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/logging.cc File be/src/kudu/util/logging.cc: Line 47: DECLARE_string(log_filename); > what does this mean for our log files? will this code start writing to our Not AFAICT. We don't call kudu's log initialization code. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR In testing this, I discovered the following: Query: explain select count(*) from (select * from functional.alltypes where int_col = 10) T where (coalesce(NULL, T.int_col) + random() * T.int_col = 100 OR coalesce(NULL, T.int_col) + T.int_col = 20) ERROR: NotImplementedException: Unsupported non-deterministic predicate: TRUE OR 10 + random() * 10 = 100 This looks like a bug, but is it a regression? -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 87: if (!s.ok()) { > Actually I think we may want to have a parameter indicating if strings shou I checked with Alexey on the Kudu team, this would only fail if we: a) wrote null to a non nullable col, b) tried to set a value of the wrong type, e.g. calling SetString*() on a non string column I think we can avoid (a) as mentioned above, and (b) would be a bug in our code. Therefore I think we can DCHECK -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Zach Amsden has uploaded a new patch set (#16). Change subject: IMPALA-5003: Constant propagation in scan nodes .. IMPALA-5003: Constant propagation in scan nodes When conjuncts are pushed into table refs from inline views, they can be considered for constant progagation within that node. In certain cases, we might end up with a FALSE conditional and now we can convert ScanNodes to EmptySet nodes when that occurs, and in other cases, we might be able to prune partitions based on expressions which have now become constant. Testing: Expanded the test cases for the planner to achieve constant propagation. Added Kudu, datasource, Hdfs and HBase tests to validate we can create EmptySetNodes. Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 18 files changed, 602 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/16 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > I commented on the decision to keep the sort.by.columns property visible in i think a 'show create table t' should print the sort-by clause, not a table property. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan nodes .. Patch Set 15: (12 comments) http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 923:* Propagates constant expressions of the form = to > Explain candidate param Done Line 924:* other uses of slot ref in all expressions in conjuncts; returns a BitSet with > in all expressions in conjuncts -> in the given conjnucts Done Line 942: // Don't rewrite with our own substitution! We may have already processed > Explanation seems more complicated than necessary. Clearly it would be wron Done Line 960:* Returns true if the conjuncts may be true and false if a contradiction has > Shorter: Returns false if a contradiction has been implied, true otherwise. Done Line 980: // applied by the rewriter. We must also re-analyze result exprs to > I think we can remove everything after "We must also re-analyze ..." to ke Done Line 985: if (index < 0) continue; > Agree. Makes sense. Done Line 988: // Reset Expr to force updating cost > Re-analyze expr to add implicit casts, resolve function signatures, and com I made this slightly simpler due to required indentation level Line 998: // because they can't be evaluated if expr rewriting is turned off. > You should be able to do this with a PlannerTest. For example, look at Plan Done http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1266: > Looks like the constant propagation is applied even when expr rewrites are Done http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 101: # XXX - this seems correct, but is this a legal transformation? > I agree that implicit casts on both sides seem ok. Done Line 161: # We can optimize in some cases > Instead of testing various predicate assignment scenarios, we should focus Done. I haven't tried an extreme example, but I added a few crazy ones to the test. http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 127: SCANRANGELOCATIONS > Don't think we need the SCANRANGELOCATIONS and DISTRIBUTEDPLAN sections. nope -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 87: if (!s.ok()) { > I'm not sure if this will return an error we need to handle at runtime, i.e Actually I think we may want to have a parameter indicating if strings should be copied or not, because in this case I don't think we need to do a string copy. Then you can use SetStringNoCopy() instead of SetString(). I'm asking someone on the Kudu team if SetStringNoCopy() has any errors we need to handle. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4086: Add benchmark for simple scheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4086: Add benchmark for simple scheduler .. Patch Set 5: (3 comments) This is still just benchmarking ComputeScanRangeAssignment(), right, not the whole Schedule() method. That seems reasonable to me, but it seems like there was some back-and-forth earlier. http://gerrit.cloudera.org:8080/#/c/4554/5/be/src/benchmarks/scheduler-benchmark.cc File be/src/benchmarks/scheduler-benchmark.cc: Line 18: #include Nit: I personally think we should treat gutil as being part of our source tree, so it should go below and be quoted with "" PS5, Line 46: Blocks and block This doesn't seem right. http://gerrit.cloudera.org:8080/#/c/4554/5/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: Line 346: Status ComputeScanRangeAssignment(QuerySchedule* schedule); What do you think about adding a comment here, or maybe in the .cc pointing to the benchmark, so that it's more discoverable for new people working on the code. -- To view, visit http://gerrit.cloudera.org:8080/4554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I89ec1c6c1828bb0b86d1e13ce4dfc5a8df865c2e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4817: Populate Parquet Statistics for Strings
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/6563 Change subject: IMPALA-4817: Populate Parquet Statistics for Strings .. IMPALA-4817: Populate Parquet Statistics for Strings This change adds functionality to populate the new parquet::Statistics fields 'min_value' and 'max_value', that were added in parquet-format PR change, Impala will stop populating the deprecated 'min' and 'max' fields. Keeping track of StringValue statistics requires some memory management code to materialize values that reside in memory owned by row batches. The HdfsParquetScanner will preferably read the new fields if they are populated. For tables with only the old fields populated, it will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the comparison of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. TODO: This change still needs tests reading statistics from files which use the old 'min' and 'max' fields, such as those written by Hive. I'll add these tests in a subsequent patch set. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M common/thrift/parquet.thrift M tests/query_test/test_insert_parquet.py 9 files changed, 411 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/1 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: Is this superseded by the other review? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5171: update RAT excluded files list .. IMPALA-5171: update RAT excluded files list This commit IMPALA-5140: improve docs building guidelines renamed a file, but the RAT exclusion list wasn't update. This patch fixes the exclusion. Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 Reviewed-on: http://gerrit.cloudera.org:8080/6561 Reviewed-by: Jim AppleTested-by: Impala Public Jenkins --- M bin/rat_exclude_files.txt 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5171: update RAT excluded files list .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5171: update RAT excluded files list .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/108/ -- To view, visit http://gerrit.cloudera.org:8080/6561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 10: > > Patch Set 9: Verified+1 > > The docs build job will prevent RAT-breaking changes from getting > submitted, and it only takes about 10 minutes. > > This change breaks RAT: > http://jenkins.impala.io:8080/job/rat-check/769/console Sorry and thanks for letting me know. https://gerrit.cloudera.org/#/c/6561/ -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list
Michael Brown has uploaded a new change for review. http://gerrit.cloudera.org:8080/6561 Change subject: IMPALA-5171: update RAT excluded files list .. IMPALA-5171: update RAT excluded files list This commit IMPALA-5140: improve docs building guidelines renamed a file, but the RAT exclusion list wasn't update. This patch fixes the exclusion. Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 --- M bin/rat_exclude_files.txt 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6561/1 -- To view, visit http://gerrit.cloudera.org:8080/6561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Henry Robinson has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 218: 1: required binary file_desc_data > Nice Leave a todo to put this in a KRPC sidecar :) No need to pay serialization for the string itself! -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (6 comments) Quick pass with some minor comments. I am still trying to wrap my head around the concept of KuduPartitionExpr... http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: PS1, Line 62: int col, PrimitiveType type Comment on these params. http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1845: f (fragment.__isset.output_sink && fragment.output_sink.__isset.stream_sink : && fragment.output_sink.type == TDataSinkType::DATA_STREAM_SINK : && fragment.output_sink.stream_sink.output_partition.type == TPartitionType::KUDU) Add a brief comment. http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: PS1, Line 134: 4 2? PS1, Line 137: 5 3? :) http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their column's position in the : // table, eg. partitionKeyExprs_[i] = table_.columns(partitionKeyIdx_[i]) : private List partitionKeyIdxs_ = Lists.newArrayList(); Just for my understanding, can you explain why this is needed for this change? http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS1, Line 48: // For kudu partition: expr that calls into the kudu client to get the partition number. : private Expr partitionExpr_; : : // Should be called only by the static factory method for Kudu partitioned tables. : private DataPartition(TPartitionType type, Expr partitionExpr) { : Preconditions.checkNotNull(partitionExpr); : Preconditions.checkState(type == TPartitionType.KUDU); : type_ = type; : partitionExpr_ = partitionExpr; : partitionExprs_ = Lists.newArrayList(); : } I am not sure I get this. Why not passing a single element list of the case of Kudu partitioned? -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Jim Apple has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 10: > Patch Set 9: Verified+1 The docs build job will prevent RAT-breaking changes from getting submitted, and it only takes about 10 minutes. This change breaks RAT: http://jenkins.impala.io:8080/job/rat-check/769/console -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 24 files changed, 593 insertions(+), 86 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/2 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/440/ -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 10: (8 comments) Thank you for your reviews. Please see my inline comments and PS10. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS9, Line 376: analyzeSortByColumn > Instead of checking this here, it's a bit cleaner to alway call analyzeSort Done http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS9, Line 49: roperties.containsKe > The caller is not supposed to modify the return values, correct? You may ju Done, I ended up using ImmutableList, since I already use it below. PS9, Line 77: > Same comment as above. Done PS9, Line 89: t by column in the li > You can still construct and return an ImmutableList. See Guava's ImmutableL I needed the contains() method in line 105, as well as size(), so I ended up using a LinkedHashSet. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + > why allow this in addition to the '... sort by ()' clause? Alex and I discussed this and Alex suggested that we allow this, since it is easier than explicitly disabling it, and we'll always have the possibility that someone changes the values in Hive. The alternative would be to try and hide the table property completely. If Alex doesn't object and you think it's worth the effort to hide this from the users, I can implement it. Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition must not contain " + > same here, why allow the sort.by.columns table property? Please see my reply above. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS9, Line 2374: // SORT BY must be the first table option : ParserError("CREATE > You may want to comment why this results in a parsing error. Done http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > why not print that as a 'sort by' clause? I commented on the decision to keep the sort.by.columns property visible in AnalyzeDDLTest.java. I discussed ToSqlTest() with Alex, too, and IIRC he explained that this is only really relevant for views, in which create table statements cannot occur anyways. Therefore, this should never be visible to a user and is tested here only for completeness. Should I add a comment to explain this? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#10). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,121 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/10 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 3: I ran an exhaustive build and it looks like the test failed on other file formats. I think the problems I saw were: * If the data is split between multiple files we get values < 10k (i.e. we probably need to have a separate test where we set num_nodes=1). * Some scanners, e.g. Kudu, don't have this counter -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. IMPALA-5154: Handle 'unpartitioned' Kudu tables The catalogd was hanging trying to load an unpartitioned Kudu table created outside of Impala. This fixes an assumption made in KuduTable.java that the list of 'partition by' expressions is not empty. Regardless, the list on the thrift structure must be created because the field is marked required. Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M tests/query_test/test_kudu.py 2 files changed, 35 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/6560/2 -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4883: Union Codegen .. Patch Set 6: Code-Review+1 (4 comments) Looks good. I'll let Michael finish his review - he can probably do the +2. http://gerrit.cloudera.org:8080/#/c/6459/6/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 98: Extra blank lines. Line 235: if (ReachedLimit()) { I don't think this can be taken now, right? Since we don't increment num_rows_returned_ until later. http://gerrit.cloudera.org:8080/#/c/6459/6/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 116: /// call on the child. Maybe mention that these GetNext* functions don't apply the limit? PS6, Line 127: Increments 'num_rows_returned_' and : /// 'child_row_idx_' I think this needs an update. -- To view, visit http://gerrit.cloudera.org:8080/6459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 483:it can be created directly in Kudu and then loaded as an external table. > if this is harmless/useful functionality, should impala also support it? I had talked to DanB about it previously, and he said they want people to move away from creating tables like this but that they didn't want to break backwards compatibility in the client. Line 500: cursor.execute("SELECT COUNT(*) FROM %s" % name) > Just curious: Can we insert into the table? Is this table expected to suppo It should behave normally after it's created, we don't know anything about the partitioning. I'll modify this test to insert as well. -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Open the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. TODO: 1. Determine appropriate defaults. 2. Write tests a. Delete test b. Block rebalance test? 3. For scan ranages that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/lru-cache-test.cc M be/src/util/lru-cache.h M be/src/util/lru-cache.inline.h 11 files changed, 387 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/2 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Thread level file handle caching
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Thread level file handle caching .. Patch Set 1: (19 comments) http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 70: DEFINE_uint64(max_cached_file_handles, 1, "Maximum number of HDFS file handles " > on further consideration, esp. in light of what tim brought up (separate re Comment out of date. Line 388: std::vector num_threads_per_disk; > don't use std:: in .cc files Done Line 1050: void DiskIoMgr::WorkLoop(DiskQueue* disk_queue, int thread_file_handle_cache_size) { > you use 'fh' elsewhere for 'file handle', feel free to use that elsewhere i removed Line 1052: if (thread_file_handle_cache_size > 0) { > dcheck that it's not < 0 removed Line 1081: Write(worker_context, static_cast(range)); > what about writing? We currently don't do writes to HDFS via the IO manager. The writes here are to local files. When we are writing tables, we open an Hdfs file handle in hdfs-table-sink.cc. I don't think these file handles can be reused, because we open it in write only mode. The file handle cache opens in read only mode. I don't see any way to convert them. Line 1337: : capacity_(capacity) {} > move to .h removed Line 1339: DiskIoMgr::ThreadFileHandleCache::~ThreadFileHandleCache() {} > remove removed Line 1343: std::string file_key = std::string(fname) + std::to_string(mtime); > is this guaranteed to be unique w/o some sort of separator? removed Line 1351: EvictFileHandle(); > you only call this once, and it's a small function, you might as well inlin removed http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 233: /// This is a single-threaded LRU cache for Hdfs file handles. The cache creates and > I guess I was thinking that each thread would temporarily check out a file The approach changed to using a single cache at the DiskIoMgr level like the existing cache. Line 235: class ThreadFileHandleCache { > 'thread' is generic. LruFileHandleCache? or simply FileHandleCache? Removed. Line 240: /// Gets an Hdfs file handle for the specified file. It will lookup and use a > "look up" Removed Line 245: HdfsCachedFileHandle* GetFileHandle(const hdfsFS& fs, const char* fname, int64_t mtime); > long line Removed Line 248: > remove blank line Removed Line 256: class LruListElement { > make it a struct, class member vars are supposed to be private. Removed Line 260: ~LruListElement() {} > remove Removed Line 533: /// in the reader context. > "reader context" = reader_ member. referencing the member directly is more N/A in new code Line 544: /// the DiskIoRequestContext. This clears the statistics on this file handle. > why not make that a function of diskiorequestcontext The two statistics that need input from the ScanRange are unexpected_remote_bytes_ and num_remote_ranges_. The unexpected_remote_bytes_ requires knowing the value of expected_local_ for the ScanRange. There is some tracing that prints when a ScanRange has unexpected remote bytes (see ScanRange::Close for where it is now). The tracing is the only thing that would prevent this from being put directly into DiskIoMgrContext if we passed in expected_local_ or the ScanRange to the function. num_remote_ranges_ is impacted by the fact that we can call GetStatistics multiple times per scan range. It needs to know if this range has already been counted at the DiskIoMgrContext level. This can be worked around. Line 574: /// at read time, so hdfs_file_ is null. > is there a need to have it both ways for hdfs files, instead of always open We need to keep the hdfs file open as long as the cached buffer is around so that the hdfs file doesn't get closed. We also use this when file handle caching is off (max_cached_file_handles=0). -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 500: cursor.execute("SELECT COUNT(*) FROM %s" % name) Just curious: Can we insert into the table? Is this table expected to support all operations? -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 483:it can be created directly in Kudu and then loaded as an external table. if this is harmless/useful functionality, should impala also support it? -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has submitted this change and it was merged. Change subject: IMPALA-5140: improve docs building guidelines .. IMPALA-5140: improve docs building guidelines Move docs/generatingImpalaDoc.md to docs/README.md. This will automatically render the document inline at places like: https://github.com/apache/incubator-impala/tree/master/docs under the directory listing. Fix existing markdown which wasn't always rendering properly. Remove unneeded HTML and backslashes. Add a mention of make, and add one troubleshooting tip. Wrap most lines at 90 chars. This does not change how Github renders the markdown, and it makes reading the source easier as well. Explain how to get dita into PATH. Emphasize make and remove some of the dita advanced usage. Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Reviewed-on: http://gerrit.cloudera.org:8080/6512 Reviewed-by: Jim AppleTested-by: Michael Brown --- A docs/README.md D docs/generatingImpalaDoc.md 2 files changed, 204 insertions(+), 73 deletions(-) Approvals: Michael Brown: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + why allow this in addition to the '... sort by ()' clause? Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition must not contain " + same here, why allow the sort.by.columns table property? http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); why not print that as a 'sort by' clause? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/6560 Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables .. IMPALA-5154: Handle 'unpartitioned' Kudu tables The catalogd was hanging trying to load an unpartitioned Kudu table created outside of Impala. This fixes an assumption made in KuduTable.java that the list of 'partition by' expressions is not empty. Regardless, the list on the thrift structure must be created because the field is marked required. Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M tests/query_test/test_kudu.py 2 files changed, 34 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/6560/1 -- To view, visit http://gerrit.cloudera.org:8080/6560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Thomas Tauber-Marshall has abandoned this change. Change subject: IMPALA-3742: partitions DMLs for Kudu tables .. Abandoned Replaced by https://gerrit.cloudera.org/#/c/6559/ -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6559 Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 21 files changed, 312 insertions(+), 86 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/1 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Alex Behm has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: Line 36: offset: long = 0 (id: 0); Why a default value? Seems potentially dangerous. Not for this change, but I'm thinking we don't even need to store the offset. If we know the block size, than we can derive the offset (assuming the list of file blocks is ordered by offset). Might be worth adding a TODO or recording that idea somewhere. Line 40: length: long = -1 (id: 1); Seems redundant, why keep it? Line 65: compression: FbCompression (id: 3); Will FlatBuffers add padding to align members? Ideally, we'd optimize for space and not access speed. http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 218: 1: required binary file_desc_data Nice Line 296: 10: optional list file_name_prefixes Is this required for the move to flat buffers? I think we should consider an HdfsPathSet abstraction that assigns path ids and internally compresses the underlying strings. There's a lot of manual lookups and stitching in the current code. I don't feel too strongly about whether we should do that now, or clean up the code later. http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: Line 78: public byte toFbCompression() { toFlatBuffer()? http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 126: locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); I think it would be better for now to keep all the code that fetches information from external systems in one place (HdfsTable). Splitting up the loading and delegating to several classes may make sense, but that probably requires significant surgery, and the current fetching code is very much centralized (we iterate over all files of in a table). The loading code in this patch is more confusing to me than before. The meaning of some verbs like load/create is less clear. If you agree with that direction, we may not need a FileDescriptor class at all, and can only rely on the FB to hold the data. It may still make sense to have a FileDescBuilder which you can use to construct a FbFileDesc. Line 227: loc.getNames().length); Another case where we might be calling out to the NN. Line 321: private static int REPLICA_HOST_CACHE_MASK = 0x8000; Less code and more readable to have one var with (1 << 15). The places that need to & the mask can bit-wise invert, i.e. (x & ~MASK). http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 189: // List of file name prefixes. Sorted list of file name prefixes? Line 309: PaircompressedFileName = How predictable is the compression behavior? Do we iterate over the files in lexicographical order for both HDFS and S3? I'm worried about the case where an "invalidate metadata" suddenly leads to a higher memory requirement even if no files/partitions have changed. Line 340:* the suffix is equal to 'fileName'. Should mention that this function expects the fileNames to be sorted. Line 346: String commonPrefix = Strings.commonPrefix(prevFileName, fileName); I'm wondering if it's better to first check the last common prefix instead of the prefix between the current and prev file name. In the current version it seems like the list of prefixes could contain strings which are a common prefix of another one. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Jim Apple has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/6512/8/docs/README.md File docs/README.md: PS8, Line 168: 5 > not 8? Done -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Hello Laurel Hale, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6512 to look at the new patch set (#9). Change subject: IMPALA-5140: improve docs building guidelines .. IMPALA-5140: improve docs building guidelines Move docs/generatingImpalaDoc.md to docs/README.md. This will automatically render the document inline at places like: https://github.com/apache/incubator-impala/tree/master/docs under the directory listing. Fix existing markdown which wasn't always rendering properly. Remove unneeded HTML and backslashes. Add a mention of make, and add one troubleshooting tip. Wrap most lines at 90 chars. This does not change how Github renders the markdown, and it makes reading the source easier as well. Explain how to get dita into PATH. Emphasize make and remove some of the dita advanced usage. Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f --- A docs/README.md D docs/generatingImpalaDoc.md 2 files changed, 204 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6512/9 -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS9, Line 376: if (!isHBaseTable) Instead of checking this here, it's a bit cleaner to alway call analyzeSortByColumns() and move the check there. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS9, Line 49: Lists.newArrayList() The caller is not supposed to modify the return values, correct? You may just return Collections.emptyList() which is also immutable. PS9, Line 77: Lists.newArrayList(); Same comment as above. PS9, Line 89: Lists.newArrayList(); You can still construct and return an ImmutableList. See Guava's ImmutableList.builder() http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE LOCATION '/a/b' " + : "SORT BY(bar)"); You may want to comment why this results in a parsing error. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Jim Apple has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/6512/8/docs/README.md File docs/README.md: PS8, Line 168: 5 not 8? -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] save
Marcel Kornacker has posted comments on this change. Change subject: save .. Patch Set 1: > I think this should have been squashed. Sorry, mistake. -- To view, visit http://gerrit.cloudera.org:8080/6540 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d33f82ed6661e849b2a61f3b6dd2039813bca7c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] save
Marcel Kornacker has abandoned this change. Change subject: save .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/6540 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I9d33f82ed6661e849b2a61f3b6dd2039813bca7c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 3: (15 comments) http://gerrit.cloudera.org:8080/#/c/6535/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS3, Line 381: the total number : // of fragment instances > what's that? Done PS3, Line 386: i32 > Types.TFragmentIdx ? Done PS3, Line 412: 6: list destinations > isn't that common across all instances? Done PS3, Line 414: fragment > instance? Done PS3, Line 435: coord_state_idx > not clear why the backend will care what index it was represented by in a c Done Line 441: 4: list fragment_ctxs > are these in any particular order? i explained the ordering for fragment_instance_ctxs Line 453: // ReportExecStatus > other RPC names have a blank line after them, which makes them easier to fi Done PS3, Line 552: overall > what does "overall" mean here? this makes it seem like it's the merged sta changed code to match description PS3, Line 560: insert_exec_status > I guess this will hold the insert status for all instances, right? rephrased Line 563: 7: optional maperror_log; > and same here. this will be the aggregated error log across all instances? rephrased PS3, Line 572: CancelPlanFragment > CancelQueryFInstances Done Line 669: > It would be helpful to add the RPC comment to show which RPC the following Done Line 699: > // PublishFilter Done PS3, Line 734: ReportExecStatus > Eventually this will report status for all the instances (at least in the p well, it's overall backend execution status (this could report an error even before any fragment instances are started), so that's why i thought 'exec status' is still the right scope Line 739: TCancelQueryFInstancesResult CancelQueryFInstances( i didn't want to name this CancelQuery, because that sounds too much like a client-related rpc -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Marcel Kornacker has uploaded a new patch set (#4). Change subject: IMPALA-2550: Switch to per-query exec rpc .. IMPALA-2550: Switch to per-query exec rpc Coordinator: - FragmentInstanceState -> BackendState, which in turn records FragmentInstanceStats QueryState - does query-wide setup in a separate thread (which also launches the instance exec threads) - has a query-wide 'prepared' state at which point all static setup is done and all FragmentInstanceStates are accessible Also renamed QueryExecState to ClientRequestState. Simplified handling of execution status (in FragmentInstanceState): - status only transmitted via ReportExecStatus rpc - in particular, it's not returned anymore from the Cancel rpc FIS: Fixed bugs related to partially-prepared state (in Close() and ReleaseThreadToken()) Change-Id: I20769e420711737b6b385c744cef4851cee3facd --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/codegen-anyval.h M be/src/common/status.cc M be/src/common/status.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exprs/expr-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc A be/src/runtime/coordinator-backend-state.cc A be/src/runtime/coordinator-backend-state.h A be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc A be/src/runtime/debug-options.cc A be/src/runtime/debug-options.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h D be/src/runtime/plan-fragment-executor.cc D be/src/runtime/plan-fragment-executor.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/service/CMakeLists.txt M be/src/service/child-query.cc M be/src/service/child-query.h R be/src/service/client-request-state.cc R be/src/service/client-request-state.h M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/fault-injection-util.h M be/src/util/error-util-test.cc M be/src/util/error-util.cc M be/src/util/error-util.h M be/src/util/uid-util.h M common/thrift/ExecStats.thrift M common/thrift/ImpalaInternalService.thrift 66 files changed, 3,820 insertions(+), 3,615 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/4 -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
Hello Laurel Hale, Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6515 to look at the new patch set (#3). Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports In addition to switching URLs for individual JIRAs to point to issues.apache.org, also construct alternative URLs for JIRA reports (usually 1 per release, summarizing the issues fixed in that release). There is some inconsistency in which releases have associated JIRA reports. For releases where we never published a link to a JIRA report, I added a tag that could be used to construct a report in future, but didn't fill in any URL. Some releases do have a link to a JIRA report in the release notes, however the report is empty. Possibly there was some strangeness around the release process for those, or there could be some missing info in the JIRA system. I just transcribed the links in those cases and didn't try to reconstruct the history to debug the empty reports. Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 --- M docs/impala_keydefs.ditamap M docs/topics/impala_fixed_issues.xml M docs/topics/impala_known_issues.xml 3 files changed, 232 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/6515/3 -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 8: Laurel and Jim, please see patch set 8 and as always https://github.com/mikesbrown/incubator-impala/tree/fix-docs/docs for the rendered version. -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6512/5/docs/README.md File docs/README.md: PS5, Line 74: * **To generate HTML output of the Impala SQL Reference, run the following command:** : : ``` : ./bin/dita -input -format html5 \ : -output \ : -filter : ``` : : * **To generate PDF output of the Impala SQL Reference, run the following command:** : : ``` : ./bin/dita -input -format pdf \ : -output \ : -filter : ``` > Laurel was the original author. I'll add her to the review. Done -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Hello Laurel Hale, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6512 to look at the new patch set (#8). Change subject: IMPALA-5140: improve docs building guidelines .. IMPALA-5140: improve docs building guidelines Move docs/generatingImpalaDoc.md to docs/README.md. This will automatically render the document inline at places like: https://github.com/apache/incubator-impala/tree/master/docs under the directory listing. Fix existing markdown which wasn't always rendering properly. Remove unneeded HTML and backslashes. Add a mention of make, and add one troubleshooting tip. Wrap most lines at 90 chars. This does not change how Github renders the markdown, and it makes reading the source easier as well. Explain how to get dita into PATH. Emphasize make and remove some of the dita advanced usage. Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f --- A docs/README.md D docs/generatingImpalaDoc.md 2 files changed, 201 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6512/8 -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
Laurel Hale has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. Patch Set 2: Code-Review+1 All of my issues have been addressed. Thank you. -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Laurel Hale has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 7: > > Do we want to maintain the > > "make" command so users can simple 1) install the DITA-OT; 2) > > navigate to the "docs" directory; and 3) execute "make" to build > > HTML and PDF? I think this is the best way to go. > > Thanks. That means I have another patch set to do: adding dita to > PATH. I plan to remove all but 1 dita invocation example. That sounds right. -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 8: (18 comments) Thank you for the review, please see PS9. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS7, Line 984: Sor > There is no KW_SET, right? Update the name? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS7, Line 73: > t? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 2484:* Add a warning that will be displayed to the user. Ignores null messages. Once > Comment that no warning can be issued if warnings have been retrieved? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS7, Line 119: ortByColumns_ ! > I think this can throw a NPE. Looking at the parser I see that CreateTableL Thanks for catching this. I was able to trigger the NPE with an additional test in ToSqlTest.java. Fixed. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS7, Line 296: > inline Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS7, Line 41: : /** :* Analyzes the 'sort.by.columns' property in 'tblProperties' against the columns of :* 'table'. :*/ : public static List analyzeSortByColumns(MaptblProperties, : Table table) throws AnalysisException { : if (!tblProperties.containsKey(TBL_PROP_SORT_BY_COLUMNS)) { : return Lists.newArrayList(); : } : : List sortByCols = Lists.newArrayList( : Splitter.on(",").trimResults().omitEmptyStrings().split( : tblProperties.get(TBL_PROP_SORT_BY_COLUMNS))); : return TablePropertyAnalyzer.analyzeSortByColumns(sortByCols, table); : } : > These are used in only one place, right? Maybe inline there? I removed the first one. The second one is used in two places: InsertStmt.analyzeSortByColumns() (that's where I inlined the first one), and AlterTableSetTblProperties.analyze(). I'd keep the second one unless you lean strongly towards inlining it. Line 76: } > Precondition check for HBase? Done PS7, Line 127: : > You may want to consider if it's better to have a getSortByColumn(Table tab Inlined it instead. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS7, Line 539: // If the table has a 'sort.by.columns' property and the query has a 'noclustered' : // hint, we issue a warning during analysis and ignore the 'noclustered' hint. > That comment doesn't seem relevant here. Remove? I put it there to explain the intent of the if statement. Without it one might think that the noclusteredhint gets ignored by mistake (see previous review comment by Thomas). Should I remove/rephrase it? PS7, Line 541: nsertStmt.hasClusteredHint() || > If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClust True, fixed. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS7, Line 1793: !params.sort_by_columns.isEmpty() > use isEmpty() Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 526: String long_property_key = ""; > Test for HBase and Kudu? Added tests for Kudu in testAlterKuduTable(). HBase tables do not support "ALTER TABLE SET" altogether, which is tested in L683. Line 1054: > Same here (HBase + Kudu). You need to put all Kudu tests in a function and Added test to check that HBase is not supported. Added Kudu tests to TestAlterKuduTable(). PS7, Line 1904: te t > "must" typo Done PS7, Line 1920: est > No need for that comment :) Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS7, Line
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#9). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,112 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/9 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-1726: Treat parquet ENUMs as STRINGs when creating tables.
Lars Volker has posted comments on this change. Change subject: IMPALA-1726: Treat parquet ENUMs as STRINGs when creating tables. .. Patch Set 1: (1 comment) Thank you for working on this. Have you already thought about how we could test this? You can have a look at testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test, but I'm unsure whether we have a parquet file with an ENUM in our testdata already. Let me know if you get stuck there and we can figure out how to add one. http://gerrit.cloudera.org:8080/#/c/6550/1//COMMIT_MSG Commit Message: PS1, Line 7: IMPALA-1726 IMPALA-2525? -- To view, visit http://gerrit.cloudera.org:8080/6550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jakub KukulGerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args
Alex Behm has posted comments on this change. Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6442/1//COMMIT_MSG Commit Message: Line 11: queue. Very often we create a thin wrapper for each work item we submit > The shared_ptr example is illustrative, but does also have some small effec Thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/6442 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes