[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: (3 comments) > Based on my reading of "man readdir", it's not threadsafe. I think > the only usage here is if a user hits "http://impalad:.../; to see > the web UI. It's possible that there's a lock somewhere preventing > concurrent use, but given that this is already reasonably > expensive, I'd recommend using the reentrant readdir_r instead. > > I looked around, and it looks like C++17 and boost have filesystem > abstractions, but I think readdir() is simple enough. I have replaced readdir() with readdir_r(), but the solution is only ok for Linux, and may cause problems on other Unix systems, because the size of dirent.d_name is not always fix. The general solution in linux.die.net/man/3/readdir_r can be actually problematic, see womble.decadent.org.uk/readdir_r-advisory.html The scenario is not. It is easy to allocate a buffer whose size will be surely enough in /proc/self/fd, but I am unsure about the ideal solution. - Should I care about portability to other Posix variants? - Is there a way to express "compilation error if other Posix variant than Linux"? - Is there a maximum size for filenames? Most examples use 255 (+1 for \0), but I do not know if it is an official value or not. - Is there a guideline for the maximum size to allocate on stack? Is ~255 ok? http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo > Please describe in this line what the change does, not what it should do, i Done http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@11 PS2, Line 11: Posix > nit: Posix is a name and should be capitalized. Done http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@18 PS2, Line 18: way to know the "expected value" in advance, and the number of file desciptors can > Couldn't we test that a reasonable minimum number of file descriptors is re Done -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 16 Nov 2017 01:46:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Hello Lars Volker, Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8546 to look at the new patch set (#6). Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo Running shell commands from impalad can be problematic, because using popen() leads to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced with POSIX API calls. FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so it was unneccesery work to initialize it. It is removed, and only the number of file descriptors is computed. The automatic test for this function is only a sanity check, because there is no way to know the "expected value" in advance, and the number of file desciptors can change anytime. Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 --- M be/src/util/proc-info-test.cc M be/src/util/process-state-info.cc M be/src/util/process-state-info.h 3 files changed, 35 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/6 -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h File be/src/util/process-state-info.h: http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h@61 PS5, Line 61: /// Below are some of the Process status information that will be read > nit: long line. Please have a look at how to configure your text editor to Done http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@36 PS5, Line 36: using boost::algorithm::is_any_of; > nit: sort alphabetically? Done http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@157 PS5, Line 157: // readdir() is not thread-safe according to its man page, but in glibc > Can you include a reference to the source (e.g. https://www.gnu.org/softwar Done http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc@161 PS6, Line 161: if(dir_entry->d_name[0] != '.') ++fd_count; // . and .. do not count www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html states: "Portability Note: On some systems readdir may not return entries for . and ..," Because of this, the filename must be checked to return the exact number of descriptors. As every filename in fd directory is a number, it is enough to check the first character. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 20 Nov 2017 18:51:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 8: > Uploaded patch set 7. Oops, there was a rebase... to see the changes that fix CHAR type handling, see diff between patch 6 and 8. Tim, can you rerun the tests for me? -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Nov 2017 00:04:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@9 PS3, Line 9: Adding support for "clustered", "noclustered", "shuffle" and "noshuffle" > How about "This change adds support..." Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@12 PS3, Line 12: example: > Capitalize Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@19 PS3, Line 19: Sort by partition columns before insert to make the insert more efficient. > Mention where exactly the sort happens (locally vs distributed). Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@26 PS3, Line 26: exchenge > spelling Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1054 PS3, Line 1054: RESULT = new CreateTableAsSelectStmt(new CreateTableStmt(tbl_def), > nit: Can you wrap the lines at 90 chars? Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1167 PS3, Line 1167: tbl_def_without_col_defs ::= > Does that now also allow hints like "CREATE /*+ clustered */ TABLE FOO LIKE It is also accepted in that case, but has no effect - should I disallow it? I think that it would be the best to enable hints in several cases (even if no hint is accepted there currently), and check the hints in normal java code and warn if one is not valid for a given statement. I am not sure though. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1168 PS3, Line 1168: KW_CREATE external_val:external opt_plan_hints:hints KW_TABLE if_not_exists_val:if_not_exists > long line Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@155 PS3, Line 155: TableDef(TableName tableName, boolean isExternal, boolean ifNotExists, List hints) { > long line Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1633 PS3, Line 1633: Only > What does "only" mean here? This was copy pasted - I guess it means "not error, just warning". I did not want to deviate too much from the original (AnalyzeStmtsTest.TestInsertHints()), to make it easier to merge the two if I find a good way to do it. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1647 PS3, Line 1647: // HBase tables cannot be tested, as currently Impala cannot create HBase tables. > It's weird that this explanation occurs in the middle of the function. Can This is also copy paste legacy - AnalyzeStmtsTest.TestInsertHints() had an HBase test here. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1653 PS3, Line 1653: > Why is there a newline? Also copy paste legacy. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1677 PS3, Line 1677: "select * from functional.alltypes"); > Please also add tests that have additional hints in the select clause. Done http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS3, Line 207: # IMPALA-4167: clustered CTAS into partitioned table adds sort node. > Can you add tests for noclustered and for sort by()? Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 17 Nov 2017 00:09:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Hello Lars Volker, Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8546 to look at the new patch set (#5). Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo Running shell commands from impalad can be problematic, because using popen() leads to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced with POSIX API calls. FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so it was unneccesery work to initialize it. It is removed, and only the number of file descriptors is computed. The automatic test for this function is only a sanity check, because there is no way to know the "expected value" in advance, and the number of file desciptors can change anytime. Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 --- M be/src/util/proc-info-test.cc M be/src/util/process-state-info.cc M be/src/util/process-state-info.h 3 files changed, 32 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/5 -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11 PS4, Line 11: api > nit: acronyms are usually all caps Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h File be/src/util/process-state-info.h: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77 PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd. > nit: /proc/self/fd Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155 PS4, Line 155: does > nit: do not count. Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156 PS4, Line 156: self > nit: either use getpid() here to be consistent with the rest of the file, o Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159 PS4, Line 159: dir > Can you try to think of better variable names? You have a DIR called d, and Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160 PS4, Line 160: while ((dir = readdir(d)) != nullptr) ++fd_count; > Please add a brief comment why your usage of readdir is thread-safe here. Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163 PS4, Line 163: std:: > You shouldn't need std:: here Done -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Sat, 18 Nov 2017 00:44:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8771/2/docs/topics/impala_shell_options.xml File docs/topics/impala_shell_options.xml: http://gerrit.cloudera.org:8080/#/c/8771/2/docs/topics/impala_shell_options.xml@284 PS2, Line 284: --query_option="option=value" Maybe the short form -Q should be also mentioned here - as I see, both forms are mentioned for other command line options. -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 12 Dec 2017 00:44:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52 PS7, Line 52: > Mention the supported substitutions in the methods below (and add comments Done http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56 PS7, Line 56: substituted > Is there a reason we don't test changes to both? Do they never change at th There is no statement that should change both. http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76 PS7, Line 76: ns the given queries > Do you need to specify the class to access TestHelper? Yes, adding the outer classes is necessary in this case, h = TestHelper(...) would lead to "NameError: global name 'TestHelper' is not defined". It is also valid to use self.TestHelper(...) but I think that it is weird, so I prefer adding the class name. http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179 PS7, Line 179: # dynamic partition insert > Can you move this method into the Helper class now? Done - this change resulted in a bit less code but more indentation, so I am not sure that it is better this way. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 May 2018 17:50:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#8). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 201 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/8 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 4: Code-Review+1 (6 comments) Few nits otherwise looks good to me. The LocalToUtc performance part is optional - as it does not affect other parts of the code, it can be easily done later when general structure is already accepted by other reviewers. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@526 PS4, Line 526: const string& tz_name = (start_lookup.abbr != nullptr) ? start_lookup.abbr : : context->impl()->state()->local_time_zone()->name(); What is the goal of this logic? To print timezone abbreviations instead of the full names, or to distinguish between summer/winter time, or both? A comment would be nice, and maybe the logic could be moved to a TimestampValue member function like string GetTimezoneName(Timezone*). http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@93 PS4, Line 93: inline bool CheckIfDateOutOfRange(const cctz::civil_day& date) { : static const cctz::civil_day max_date(TimestampFunctions::MAX_YEAR, 12, 31); : static const cctz::civil_day min_date(TimestampFunctions::MIN_YEAR, 1, 1); : return date < min_date || date > max_date; : } This could be simpler and possibly faster by expecting a cctz::civil_second argument and check if 1400<=year<1. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@128 PS4, Line 128: cctz explains pretty well the handling of dst boundaries, maybe we could add a link to it, for example to https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/include/cctz/time_zone.h#L147 http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@129 PS4, Line 129: // In case of ambiguity invalidate TimestampValue. : const cctz::time_zone::civil_lookup from_cl = local_tz->lookup(from_cs); : if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE)) { : SetToInvalidDateTime(); : } else { : cctz::time_point from_tp = from_cl.pre; : : // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone. : cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone()); : : // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the : // 1400.. range. Need to check validity before creating the date object. : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs { I may be possible to get TimestampValue from cctz::time_zone::civil_lookup in a faster way - splitting from_tp to a day part (since a constant date) and the remainder seconds part is enough for us and should be faster then getting cctz::civil_second (which contains year/month/day split). The code could look something like this: int64 secs_since_base = from_tp - BASETIME_AS_CCTZ_SYS_SEC; time_=sec_since_base%(24*60*60)+time_.fractional_seconds(); int32 days_since_base = sec_since_base/(24*60*60); if(out_of_range(days_since_base)) SetToInvalidDateTime(); date_ = days_since_base - BASEDATE_AS_BOOST_GREG_DATE; http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@146 PS4, Line 146: time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(), nit: long line http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h File be/src/util/filesystem-util.h: http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66 PS3, Line 66: iff > "iff" stands for "if and only if". Thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 4 Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 09 May 2018 11:47:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 9: The change broke AuthorizationTest#TestDescribeTableResults. I fixed this in patch set 9. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 09 May 2018 19:11:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#9). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 202 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/9 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6920: fix inconsistencies with scanner thread tokens
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10186 ) Change subject: IMPALA-6920: fix inconsistencies with scanner thread tokens .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16d31d72441aff7293759281d0248e641df43704 Gerrit-Change-Number: 10186 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Apr 2018 18:52:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6920: fix inconsistencies with scanner thread tokens
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10186 ) Change subject: IMPALA-6920: fix inconsistencies with scanner thread tokens .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h File be/src/runtime/thread-resource-mgr.h: http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h@99 PS3, Line 99: nit: extra space - several copy pasted comments contain double spaces, so this may be intentional http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h@189 PS3, Line 189: Set to NULL when unregistered. Is there a reason behind this? Seems to be an object pool heritage, where Reset() used to reset all the members with the exception of parent_. http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc File be/src/runtime/thread-resource-mgr.cc: http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc@55 PS3, Line 55: DCHECK(pools_.find(pool.get()) == pools_.end()); Is this DCHECK still useful after removing object caching? http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc@133 PS3, Line 133: not nit: double not -- To view, visit http://gerrit.cloudera.org:8080/10186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16d31d72441aff7293759281d0248e641df43704 Gerrit-Change-Number: 10186 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Fri, 27 Apr 2018 17:17:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#10). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 203 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/10 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: Code-Review+1 Sorry, some of the authorization test changes were missing from patch 9. Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 16:10:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc@139 PS5, Line 139: // In case the resulting 'time_point' is ambiguous, we have to invalidate : // TimestampValue. : // 'civil_lookup' members and the details of handling ambiguity are described at: : // https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/ : // include/cctz/time_zone.h#L106 : if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE) I have investigated a bit about this: - there is a Jira that complains about this behavior: https://issues.apache.org/jira/browse/IMPALA-3169 - Hive does not work like this, it returns a "valid" timestamp for repeated/skipped hours: select to_utc_timestamp(cast("2011-03-13 02:15:00" as timestamp), "America/Los_Angeles"), to_utc_timestamp(cast("2011-11-06 01:15:00" as timestamp), "America/Los_Angeles") result: 2011-03-13 10:15:00.0 2011-11-06 09:15:00.0 I think that we should do the same, at least for repeated values. I can imagine several valid queries where this would be the correct behavior, for example when we filter for a time interval. So I vote for solving IMPALA-3169 in this patch by choosing pre or post time in non UNIQUE cases too. If there are no test cases yet for skipped/repeated hours, then we should create some and expect the same results that Hive returns. -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 5 Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 11 May 2018 14:04:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#11). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 220 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/11 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: (8 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235 PS10, Line 1235:* TODO: the following paragraph seems at least partially obsolate > Why not clean it up then? Point (1) is obsolete but point (2) is still vali Done http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: Long.toString(System.currentTimeMillis() / 1000)); > Doesn't the HMS set this in alter_table()? It sets it if the property does not exist, but this would not work well for Kudu tables: after calling alter_table here, the table will not be loaded from HMS again (unlike in HDFS tables, where this is done to ensure HMS-Impala consistency). This means that if I would remove "transient_lastDdlTime" here, then HMS would calculate a valid value, but Impala catalogd would not know about this value. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: msTbl.putToParameters("impala.lastComputeStatsTime", > Create a constant for the table property in Table similar to what we have i I have put the constant to HdfsTable, because all the other property keys reside their, even those that are used by Kudu tables too. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS10, Line 2868: // HMS updates transient_lastDdlTime if it is removed. > I understand what this comment says, but I don't understand it's relevance This comment remained here by mistake. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2870 PS10, Line 2870: Long.toString(System.currentTimeMillis() / 1000)); > We're doing "System.currentTimeMillis() / 1000" in quite a few places, I su I have created a bit different helper function. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, query, expect_changed_ddl_time, expect_changed_stats_time): > Passing a ";" delimited string is really weird. Why not pass a list of quer The "invalidate metadata %(TBL)s; describe %(TBL)s" combo was used the check that in case of Kudu tables, lastDdlTime is not increased by reloading the table (it used to increase it), so an extra query is needed after INVALIDATE METADATA to load the table. The two can be separated (like for other init queries) but I felt that they belong together. I replaced the string splitting with variable argument list - I can replace this with two separate calls if needed. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: # Hive uses a seconds granularity on the last ddl time. > Not just Hive - we do too. Just say "All times are stored in seconds" or so Isn't it an HMS convention in this case? Or is there a reason behind not using higher precision timestamp? http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@155 PS10, Line 155: h.expect_no_time_change("drop incremental stats %(TBL)s partition (j=1, s='2012')") > use "drop stats" instead to wipe everything (including incremental stats) Are you sure about this? I use DROP INCREMENTAL STATS on purpose to check +1 statement's effect in the test suite. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri,
[Impala-ASF-CR] IMPALA-5706: Parallelise read I/O in sorter
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 ) Change subject: IMPALA-5706: Parallelise read I/O in sorter .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1730 PS7, Line 1730: return (sorted_runs_.size() + 1) / 2; If the goal is to minimize the number of "extra merges" per row, then it is optimal for the final merge to always merge as much runs as possible, so this line could return sorted_runs_.size() - max_runs_per_intermediate_merge An example when this would result in less merges: max_runs_per_intermediate_merge: 3 number of runs: 7 (the numbers are the number of original runs merged into a run) 1 1 1 1 1 1 1 1 1 1 1 3 - current logic decides to merge (5+1/2)=3 runs 1 3 3 - ready for final merge after merging 6 runs 1 1 1 1 1 1 1 1 1 1 1 3 - new logic would decide to merge 5-3=2 runs 1 1 3 2 - ready for final merge after merging 5 runs On the other hand, merging more runs in the final merge means that the buffers will be released later, so I am not completely sure that maximizing it is a good idea. -- To view, visit http://gerrit.cloudera.org:8080/9943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9 Gerrit-Change-Number: 9943 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 May 2018 15:59:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@95 PS3, Line 95: time_t unix_time; : if (UNLIKELY(!ts_value.UtcToUnixTime(_time))) return TimestampVal::null(); : cctz::time_point from_tp = UnixSecondsToTimePoint(unix_time); : : // Convert 'from_tp' time_point to civil_second assuming 'timezone' time-zone. : cctz::civil_second to_cs = cctz::convert(from_tp, *timezone); : : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs { : const string& msg = Substitute( : "Timestamp '$0' did not convert to a valid local time in timezone '$1'", : ts_value.ToString(), tz_string_value.DebugString()); : context->AddWarning(msg.c_str()); : return TimestampVal::null(); : } : : // Note that 'to_cs' has second granularity. Since time-zone rules do not affect : // fractional seconds, the fractional second part of the returned TimestampVal should be : // equal to ts_value.time().fractional_seconds(). : return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds()); This logic is the same as TimestampValue::UtcToLocal(), plus warning if the resulting timestamp is not valid. As local time and generic timezone conversions are the same now, it would make sense to keep them at one place. http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@144 PS3, Line 144: cctz::time_point from_tp = from_cl.pre; : : // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone. : cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone()); : : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs { : const string& msg = : Substitute("Timestamp '$0' in timezone '$1' could not be converted to UTC", : ts_value.ToString(), tz_string_value.DebugString()); : context->AddWarning(msg.c_str()); : return TimestampVal::null(); : } : : // Note that 'to_cs' has second granularity. Since time-zone rules do not affect : // fractional seconds, the fractional second part of the returned TimestampVal should be : // equal to ts_value.time().fractional_seconds(). : return CivilSecondToTimestampVal(to_cs, ts_value.time().fractional_seconds()); Similarly to my comment at line 113, this logic could be moved to a TimestampValue function. Removing these calculations from this file would mean that the helper functions (line 47-74) could be removed too. http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc@117 PS3, Line 117: time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(), nit: long line http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h File be/src/util/filesystem-util.h: http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66 PS3, Line 66: iff typo: if http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@70 PS3, Line 70: iff typo: if http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@71 PS3, Line 71: path Maybe writing "string" instead of "path" express better that no file system is accessed. http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@79 PS3, Line 79: path Same as line 71. http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc File be/src/util/time.cc: http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@165 PS3, Line 165: nit: extra space http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@168 PS3, Line 168: nit: extra space http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@171 PS3, Line 171: nit: extra space -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 3 Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#12). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 228 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/12 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, > Makes sense, please add a comment somewhere stating why we prefer to set th Done http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: Table.updateTimestampProperty(msTbl, HdfsTable.TBL_PROP_LAST_COMPUTE_STATS_TIME); > The Kudu properties are generaly in KuduTable. Which property in HdfsTable Done - I thought that some of the properties defined in HdfsTable are used by Kudu tables too, but I looked through them and they are only used in HDFS tables. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, queries, expect_changed_ddl_time, expect_changed_stats_time): > The vast majority of cases only runs one query at once, and there is no fun Done http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: time.sleep (1.1) > I agree it's an HMS convention. Let's change the comment to state that then I have already rewritten Hive to HMS in patch set 11 - does the comment still miss something? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 15 May 2018 13:57:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 14: >From the failing build: 21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.14 ... 21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.19 21:55:34 ] python: can't open file 'setup.py': [Errno 2] No such file or directory 21:55:34 ] Error in shell/make_shell_tarball.sh at line 96: python setup.py -q bdist_egg clean There is a known issue with these log lines which should be resolved by doing a clean build. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 20 May 2018 12:46:44 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10484 Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 226 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10484/1 -- To view, visit http://gerrit.cloudera.org:8080/10484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: newchange Gerrit-Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234 Gerrit-Change-Number: 10484 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR](2.x) IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10484 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 1: The commit on master ( https://gerrit.cloudera.org/#/c/10116/ ) changed AuthorizationTest.java to fix a broken test. This test is not present in 2.x, which made the changes in that file unnecessary + conflicting. -- To view, visit http://gerrit.cloudera.org:8080/10484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234 Gerrit-Change-Number: 10484 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Wed, 23 May 2018 11:55:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 9: > > > Uploaded patch set 9. > > > > Patch -set 9 contains the following changes: > > - Added a full timezone db to testdata/tzdb. > > - End-to-end tests and BE-tests were changed to use this timezone > > db. This was necessary because some timezone-tests were failing > on > > older jenkins workers that had an older tzdata package installed. > > It might be a good idea to store the timezone-db files in one .tar > file and extract them before running the tests. What do you think? I agree, .taring or compressing the tz db would be much better, if it does not make the code too complicated. Having less file would make the review more readable, and would also make the tz db consume much less space on hdfs, as the many small files will be rounded up to hdfs block size. -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 9 Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 25 May 2018 16:33:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10486 ) Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/10486/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10486/1//COMMIT_MSG@29 PS1, Line 29: > I think you should add "Cherry-picks: not for 2.x." line. Done http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.h@55 PS1, Line 55: map > const? This map turned out to be unnecessary. http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.cc@68 PS1, Line 68: {"AEST", "Australia/Sydney"}, : {"CDT", "America/Chicago"}, : {"CEST", "CET"}, : {"EDT", "EST5EDT"}, : {"ICT", "Asia/Ho_Chi_Minh"}, : {"KST", "Asia/Seoul"}, : {"MDT", "MST7MDT"}, : {"PHT", "Asia/Manila"}, : {"PDT", "America/Los_Angeles"} > Are these time-zone abbreviations accepted by Hive? If not, I think we shou Done - I have checked TimezoneDatabase::TIMEZONE_DATABASE_STR, and as all the Java abbreviations are already there, I have removed this map. http://gerrit.cloudera.org:8080/#/c/10486/1/testdata/data/timezoneverification.csv File testdata/data/timezoneverification.csv: http://gerrit.cloudera.org:8080/#/c/10486/1/testdata/data/timezoneverification.csv@716 PS1, Line 716: JST > Please make sure hat all the Java-supported time-zone abbreviations are tes I have checked, all of them are included here. -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Thu, 24 May 2018 14:27:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10486 ) Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. Patch Set 1: > Is this a breaking change? We normally hold on braking changes > until we bump the major version. > > Is IMPALA-3307 going to go in on a minor release? Is it a breaking > change? Yes, this and IMPALA-3307 are breaking changes. This change should remove all timezones that won't be supported after IMPALA-3307. IMPALA-3307 will introduce some changes related to timezones that were changed during history, but I consider those changes to be fixes rather than breaking changes. -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Thu, 24 May 2018 14:31:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Hello Jim Apple, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10486 to look at the new patch set (#3). Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. IMPALA-7060: Restrict Impala to only support timezones that work in Hive IANA timezone integration (IMPALA-3307) will drop the support for many non-standard timezone aliases. As IMPALA-3307 is a very large change, its release may be delayed - meanwhile, it would be better to discourage Impala 3.x users from using timezone names that will not be supported in the future. For this reason, this change drops the support of non-standard aliases from the current boost based implementation. This is supposed to be a temporary solution till IMPALA-3307, so I have put minimal effort into it. Testing: Timestamp aliases were largely untested, so only a few changes were needed - these were copied from the review for IMPALA-3307, gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests, e.g. to check that using dropped aliases return NULL and lead to warning, but I think that these kind of changes should be done first in the review of IMPALA-3307, and synced here afterwards. Cherry-picks: not for 2.x. Change-Id: I90859398081bae4976af31b09b3121c198b6adac --- M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M testdata/data/timezoneverification.csv 3 files changed, 2 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/3 -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Hello Jim Apple, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10486 to look at the new patch set (#2). Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. IMPALA-7060: Restrict Impala to only support timezones that work in Hive IANA timezone integration (IMPALA-3307) will drop the support for many non-standard timezone aliases. As IMPALA-3307 is a very large change, its release may be delayed - meanwhile, it would be better to discourage Impala 3.x users from using timezone names that will not be supported in the future. For this reason, this change drops the support of non-standard aliases from the current boost based implementation. This is supposed to be a temporary solution till IMPALA-3307, so I have put minimal effort into it. Testing: Timestamp aliases were largely untested, so only a few changes were needed - these were copied from the review for IMPALA-3307, gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests, e.g. to check that using dropped aliases return NULL and lead to warning, but I think that these kind of changes should be done first in the review of IMPALA-3307, and synced here afterwards. Change-Id: I90859398081bae4976af31b09b3121c198b6adac --- M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M testdata/data/timezoneverification.csv 3 files changed, 2 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/2 -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#13). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 8 files changed, 228 insertions(+), 173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/13 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // catalogd would read it back after alter_table, but as this would not work for > I don't think it's true that we'd always read it back after calling this fu Do you have an example in mind when the table should not be reloaded? I looked at the callers of applyAlterTable(): loadTableMetadata() is called after alter*() + dropTableStats(), and createTable() only creates an "incomplete table", so the table will be reloaded before it is actually used. Because of this, my impression was that tables are always reloaded by design. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 15:43:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS12, Line 2868: // this would make it necessary to reload the table after alter_table in order to > I agree that today we will always reload the table metadata from the HMS an Thanks for the explanation! I have rewritten the comment, I hope that it is better now. -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 May 2018 17:04:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10486 Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. IMPALA-7060: Restrict Impala to only support timezones that work in Hive IANA timezone integration (IMPALA-3307) will drop the support for many non-standard timezone aliases. As IMPALA-3307 is a very large change, its release may be delayed - meanwhile, it would be better to discourage Impala 3.x users from using timezone names that will not be supported in the future. For this reason, this change drops the support of non-standard aliases from the current boost based implementation. This is supposed to be a temporary solution till IMPALA-3307, so I have put minimal effort into it. It should be much faster than the previous solution for timezone aliases, while slightly slower for canonical names. Testing: Timestamp aliases were largely untested, so only a few changes were needed - these were copied from the review for IMPALA-3307, gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests, e.g. to check that using dropped aliases return NULL and lead to warning, but I think that these kind of changes should be done first in the review of IMPALA-3307, and synced here afterwards. Change-Id: I90859398081bae4976af31b09b3121c198b6adac --- M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h M testdata/data/timezoneverification.csv 4 files changed, 46 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/1 -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 14: Code-Review+2 (5 comments) Thanks for thinking about Zip-Slip! I have left a few optional comments about the usability of the interfaces. http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG@34 PS14, Line 34: - Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to The Zip slip safe zip-util could be also mentioned in the commit message. http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h File be/src/util/filesystem-util.h: http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@92 PS14, Line 92: Directory(const string& path, bool skip_hidden_entries = true); I thought a bit about usability and I vote for removing this parameter and skip only "." and ".." - I can't imagine any use case when I would be interested in those. http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@109 PS14, Line 109: static Status GetEntryNames(const string& path, I would prefer max_result_size to be the last parameter, and give it a default value of 0. http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc File be/src/util/zip-util-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc@69 PS14, Line 69: EXPECT_FALSE(filesystem::exists(dest_dir3)); I guess that this is only true if zip decoding failed at the start, and some files may be already decompressed before reaching an error in the zip. I am not sure what to do with this, probably nothing. It would be possible add some kind of cleanup logic to the java util, but I am not sure if this worth the effort. http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java File fe/src/main/java/org/apache/impala/util/ZipUtil.java: http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java@45 PS14, Line 45: try (ZipFile zip = new ZipFile(params.archive_file)) { I would move this block to a similar function with (String archiveFile, String destDir) parameters to make this util usable from Java too. This would be minimal extra effort and I think that it can be handy to have an easily usable Zip-Slip safe extract function. -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 08 Jun 2018 16:15:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10742 ) Change subject: IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD .. Patch Set 2: Code-Review+2 The flakiness related to IMPALA-7130 is not fixed yet ( it is on review: https://gerrit.cloudera.org/#/c/10747/ ). I just mentioned it to ensure you that it is not related to your change, so a new jenkins verify job may be successful. -- To view, visit http://gerrit.cloudera.org:8080/10742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9452f7e44c7679f86a947cd20115c078757223d8 Gerrit-Change-Number: 10742 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Mon, 18 Jun 2018 20:02:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10747 ) Change subject: IMPALA-7181: Fix flaky test shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@629 PS1, Line 629: # Verify that impala-shell tries to create a socket against the host:port : # combination specified by -i when -b is not used : impala_shell = Popen(shlex.split("%s %s" % (shell_cmd, args1, )), stdout=devnull, : stderr=devnull) : connection, client_address = s.accept() : data = connection.recv(1024) : assert expected_output in data : impala_shell.kill() : connection.close() : : # Verify that impala-shell tries to create a socket against the host:port : # combination specified by -i when -b is used I would prefer if the duplicated logic in the two cases would be merged to a function somehow, but this is optional. http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@645 PS1, Line 645: expected_output = "PingImpalaService" Already set in line 627, no need to set here again. http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@649 PS1, Line 649: s.close() I am a bit worried about exceptions - this could be moved to a finally block or the socket could be added to a with statement (see the python 2 version in https://stackoverflow.com/a/16772515 ) -- To view, visit http://gerrit.cloudera.org:8080/10747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44 Gerrit-Change-Number: 10747 Gerrit-PatchSet: 1 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 18 Jun 2018 18:18:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10747 ) Change subject: IMPALA-7181: Fix flaky test shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening .. Patch Set 2: (2 comments) Some socketing concerns, sorry for not noticing them in the first run. http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py@621 PS2, Line 621: accept I think that this will wait forever by default if no one connects to it. setdefaulttimeout should be to set avoid this ("select" or something similar can be used too, but that will be more complex I guess). http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py@624 PS2, Line 624: impala_shell.kill() : connection.close() These should be moved to a finally block to call them if there is an exception, for example from the assert. -- To view, visit http://gerrit.cloudera.org:8080/10747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44 Gerrit-Change-Number: 10747 Gerrit-PatchSet: 2 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 18 Jun 2018 19:52:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10747 ) Change subject: IMPALA-7181: Fix flaky test shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py@626 PS3, Line 626: impala_shell.kill() This can actually also throw an exception, if the process already exited for some reason. http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py@627 PS3, Line 627: connection.close() connection should be None checked. -- To view, visit http://gerrit.cloudera.org:8080/10747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44 Gerrit-Change-Number: 10747 Gerrit-PatchSet: 3 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Mon, 18 Jun 2018 20:59:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10742 ) Change subject: IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD .. Patch Set 2: fyi, the build failure is related to https://issues.apache.org/jira/browse/IMPALA-7130 Vincent is working on the test to deflake it. -- To view, visit http://gerrit.cloudera.org:8080/10742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9452f7e44c7679f86a947cd20115c078757223d8 Gerrit-Change-Number: 10742 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 18 Jun 2018 15:16:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7175: deflake check for failed impalad
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10745 ) Change subject: IMPALA-7175: deflake check for failed impalad .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10745/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/10745/1/tests/query_test/test_udfs.py@301 PS1, Line 301: def __num_impalads(self): Other tests may also run to the same problem, so I think that it would be useful to move this logic to a function like get_responsive_impalads() in impala_cluster.py -- To view, visit http://gerrit.cloudera.org:8080/10745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c6b398e43c6abb1df2b1783c26159137f14fa4 Gerrit-Change-Number: 10745 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Mon, 18 Jun 2018 16:30:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6835: Add table name and node id to Kudu scanner errors
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10671 ) Change subject: IMPALA-6835: Add table name and node id to Kudu scanner errors .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10671 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0377fc8591738dc45092d228fcf292ddbb367825 Gerrit-Change-Number: 10671 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jun 2018 15:33:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 18: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc@367 PS17, Line 367: Status TimezoneDatabase::LoadZoneAliasesFromHdfs( : const string& hdfs_zone_alias_conf) { nit: this could fit in one line -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 18 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Jun 2018 13:24:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6408: [DOCS] Add a missing info about SHUFFLE
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10685 ) Change subject: IMPALA-6408: [DOCS] Add a missing info about SHUFFLE .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5738557354c384aab983f64722dde5944037aed7 Gerrit-Change-Number: 10685 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Tue, 12 Jun 2018 13:49:59 + Gerrit-HasComments: No
[Impala-ASF-CR] Fail cleanly when in process server can't bind
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10726 ) Change subject: Fail cleanly when in process server can't bind .. Patch Set 1: Code-Review+1 I thought a bit about not too intrusive ways to reduce the window where a port number is already decided but the port itself is unbound. A static/global port number->socket fd map could be created, and FindUnusedEphemeralPort() could add the fd to this map instead of closing it. Another static function like "ClosePortIfReserved()" could be called as close as possible to the place where the port is bound to socket by the actual server. This function would be a noop if the port is not in the map, so caller would not have to know if the port was "reserved" or not. -- To view, visit http://gerrit.cloudera.org:8080/10726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I376a2aa559f4b5cf3b96fa3465520e9983ecec4b Gerrit-Change-Number: 10726 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Fri, 15 Jun 2018 12:11:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7115: set a default THREAD RESERVATION LIMIT value
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10628 ) Change subject: IMPALA-7115: set a default THREAD_RESERVATION_LIMIT value .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10628/1/testdata/workloads/functional-query/queries/QueryTest/thread-limits.test File testdata/workloads/functional-query/queries/QueryTest/thread-limits.test: http://gerrit.cloudera.org:8080/#/c/10628/1/testdata/workloads/functional-query/queries/QueryTest/thread-limits.test@109 PS1, Line 109: SELECT count(*) from functional.alltypes When I am trying to create a plan with crazy amount of nodes, I use WITH + UNION to get exponential growth. The following query should lead to at least 1000 fragments: explain with c1 as (select count(*) from functional.alltypes), c2 as (select * from c1 union select * from c1), c4 as (select * from c2 union select * from c2), c8 as (select * from c4 union select * from c4), c16 as (select * from c8 union select * from c8), c32 as (select * from c16 union select * from c16), c64 as (select * from c32 union select * from c32), c128 as (select * from c64 union select * from c64), c256 as (select * from c128 union select * from c128), c512 as (select * from c256 union select * from c256) select * from c512 union select * from c512; I think that the resulting plan would not be the same as with the flat list of unions, but it would do the job while the test would be much more compact. -- To view, visit http://gerrit.cloudera.org:8080/10628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31d3fa3f6305c360922649dba53a9026c9563384 Gerrit-Change-Number: 10628 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Comment-Date: Thu, 14 Jun 2018 14:42:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6835: Improve Kudu scanner error messages to include the table name and the plan node id
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10671 ) Change subject: IMPALA-6835: Improve Kudu scanner error messages to include the table name and the plan node id .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-6835: Improve Kudu scanner error messages to include the table name and the plan node id nit: it would be nice to find a shorter title without loosing too much information - e.g. "Add table name and node id to Kudu scanner errors" http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG@11 PS1, Line 11: TPlanNode id which made it inconveient to debug. This change add nit: missing s http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@151 PS1, Line 151: Substitute("Unable to deserialize scan token for PlanNode '$0' with KuduTable '$1'", :scan_node_->id(), scan_node_->table_->name())); This made me think about ways to make it less verbose - a function like "string KuduScanner::AppendInfo(const char* msg)" could be created to add the node id and table name to a message. It is up to you, but I think that it's worth to add +1 function to make the code a bit shorter and ensure consistent messages. http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@167 PS1, Line 167: scan_node_->id() I am not sure about this, but I would add table name for these errors too. http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@168 PS1, Line 168: VLOG_ROW << "Starting KuduScanner with ReadMode=" << mode << " timeout=" << It may be useful do add the same info to logs. -- To view, visit http://gerrit.cloudera.org:8080/10671 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0377fc8591738dc45092d228fcf292ddbb367825 Gerrit-Change-Number: 10671 Gerrit-PatchSet: 1 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Jun 2018 17:44:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10683 Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width .. IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width Reading dictionary encoded Parquet data pages where the bit width is larger than the encoded type's size (e.g. coding 8 bit TINYINT with 16 bit dictionary indices) led to DCHECK error in debug builds. Impala does not create such parquet files (an N bit type can have maximum 2^N distinct values, so N bit dictionary indices are enough for a dictionary that contains every possible value), but the Parquet standard does not forbid to do so. These DCHECKs were probably introduced by a copy paste error (similar checks exist in the non-dictionary encoded bit reader functions, where they are valid). Testing: - a new test is added to check that these data pages can be decoded correctly Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a --- M be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.inline.h M testdata/data/README A testdata/data/dict_encoding_with_large_bit_width.parquet M tests/query_test/test_scanners.py 5 files changed, 21 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/10683/1 -- To view, visit http://gerrit.cloudera.org:8080/10683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a Gerrit-Change-Number: 10683 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] [DOCS] Remove duplicate bullet point about null() in UDF data types
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10614 ) Change subject: [DOCS] Remove duplicate bullet point about null() in UDF data types .. Patch Set 1: I have just bumped into this duplication. I thought that such a trivial change does not require a Jira, but I can create one for it if necessary. -- To view, visit http://gerrit.cloudera.org:8080/10614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b Gerrit-Change-Number: 10614 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Wed, 06 Jun 2018 16:12:38 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Remove duplicate bullet point about null() in UDF data types
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10614 Change subject: [DOCS] Remove duplicate bullet point about null() in UDF data types .. [DOCS] Remove duplicate bullet point about null() in UDF data types The removed bullet point contained the same information as the second bullet point in the list. Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b --- M docs/topics/impala_udf.xml 1 file changed, 0 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/10614/1 -- To view, visit http://gerrit.cloudera.org:8080/10614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b Gerrit-Change-Number: 10614 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 11: (2 comments) Thanks for adding zip support! We should add some tests for zip_util, especially for error handling, which is an untested path at the moment if didn't miss something. I am ok with moving this (and dealing with my other comments) to a later commit. http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@198 PS11, Line 198: GetNextDirectoryEntry This is subjective, but I do not like this interface too much. I would prefer to wrap dir_stream to a class/struct, or create a function like this: static STATUS ListDirEntries(string path, vector& result, int max_result_num = 0). Both could be moved to util/filesystem_util. http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@213 PS11, Line 213: readdir_r There was a discussion about readdir_r() vs readdir() in https://gerrit.cloudera.org/#/c/8546/8 , and readdir() was preferred in the end. -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 11 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 06 Jun 2018 18:35:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 10: I do not see any more low hanging fruits for performance improvement. Some overhead could be removed by modifying CCTZ, but this is out of the scope of this change, so I created a follow up Jira: IMPALA-7085: Consider patching Google/CCTZ for Impala's need -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 10 Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 28 May 2018 16:20:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Hello Jim Apple, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10486 to look at the new patch set (#4). Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. IMPALA-7060: Restrict Impala to only support timezones that work in Hive IANA timezone integration (IMPALA-3307) will drop the support for many non-standard timezone aliases. As IMPALA-3307 is a very large change, its release may be delayed - meanwhile, it would be better to discourage Impala 3.x users from using timezone names that will not be supported in the future. For this reason, this change drops the support of non-standard aliases from the current boost based implementation. This is supposed to be a temporary solution till IMPALA-3307, so I have put minimal effort into it. Testing: - a few timezones are removed from timezoneverification.csv - added test for invalid timezone names Cherry-picks: not for 2.x. Change-Id: I90859398081bae4976af31b09b3121c198b6adac --- M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M testdata/data/timezoneverification.csv M tests/query_test/test_timezones.py 4 files changed, 17 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/4 -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Hello Jim Apple, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10486 to look at the new patch set (#5). Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. IMPALA-7060: Restrict Impala to only support timezones that work in Hive IANA timezone integration (IMPALA-3307) will drop the support for many non-standard timezone aliases. As IMPALA-3307 is a very large change, its release may be delayed - meanwhile, it would be better to discourage Impala 3.x users from using timezone names that will not be supported in the future. For this reason, this change drops the support of non-standard aliases from the current boost based implementation. This is supposed to be a temporary solution till IMPALA-3307, so I have put minimal effort into it. Testing: - a few timezones are removed from timezoneverification.csv - added test for invalid timezone names Cherry-picks: not for 2.x. Change-Id: I90859398081bae4976af31b09b3121c198b6adac --- M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M testdata/data/timezoneverification.csv M tests/query_test/test_timezones.py 4 files changed, 17 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/5 -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10486 ) Change subject: IMPALA-7060: Restrict Impala to only support timezones that work in Hive .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/10486/4/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/10486/4/be/src/exprs/timezone_db.cc@628 PS4, Line 628: \"SystemV/AST4\",\"AST\",\"Atlantic Standard Time\",\"\",\"\",\"-04:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/AST4ADT\",\"AST\",\"Atlantic Standard Time\",\"ADT\",\"Atlantic Daylight Time\",\"-04:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ : \"SystemV/CST6\",\"CST\",\"Central Standard Time\",\"\",\"\",\"-06:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/CST6CDT\",\"CST\",\"Central Standard Time\",\"CDT\",\"Central Daylight Time\",\"-06:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ : \"SystemV/EST5\",\"EST\",\"Eastern Standard Time\",\"\",\"\",\"-05:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/EST5EDT\",\"EST\",\"Eastern Standard Time\",\"EDT\",\"Eastern Daylight Time\",\"-05:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ : \"SystemV/HST10\",\"HST\",\"Hawaii Standard Time\",\"\",\"\",\"-10:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/MST7\",\"MST\",\"Mountain Standard Time\",\"\",\"\",\"-07:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/MST7MDT\",\"MST\",\"Mountain Standard Time\",\"MDT\",\"Mountain Daylight Time\",\"-07:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ : \"SystemV/PST8\",\"PST\",\"Pacific Standard Time\",\"\",\"\",\"-08:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/PST8PDT\",\"PST\",\"Pacific Standard Time\",\"PDT\",\"Pacific Daylight Time\",\"-08:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ : \"SystemV/YST9\",\"AKST\",\"Alaska Standard Time\",\"\",\"\",\"-09:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\ : \"SystemV/YST9YDT\",\"AKST\",\"Alaska Standard Time\",\"AKDT\",\"Alaska Daylight Time\",\"-09:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\ > I think these lines should be removed too. Done http://gerrit.cloudera.org:8080/#/c/10486/4/testdata/data/timezoneverification.csv File testdata/data/timezoneverification.csv: http://gerrit.cloudera.org:8080/#/c/10486/4/testdata/data/timezoneverification.csv@806 PS4, Line 806: SystemV/AST4,2015-06-20 15:00:00,2015-06-20 11:00:00 : SystemV/AST4ADT,2015-06-20 15:00:00,2015-06-20 12:00:00 : SystemV/AST4ADT,2014-12-20 15:00:00,2014-12-20 11:00:00 : SystemV/CST6,2015-06-20 15:00:00,2015-06-20 09:00:00 : SystemV/CST6CDT,2015-06-20 15:00:00,2015-06-20 10:00:00 : SystemV/CST6CDT,2014-12-20 15:00:00,2014-12-20 09:00:00 : SystemV/EST5,2015-06-20 15:00:00,2015-06-20 10:00:00 : SystemV/EST5EDT,2015-06-20 15:00:00,2015-06-20 11:00:00 : SystemV/EST5EDT,2014-12-20 15:00:00,2014-12-20 10:00:00 : SystemV/HST10,2015-06-20 15:00:00,2015-06-20 05:00:00 : SystemV/MST7,2015-06-20 15:00:00,2015-06-20 08:00:00 : SystemV/MST7MDT,2015-06-20 15:00:00,2015-06-20 09:00:00 : SystemV/MST7MDT,2014-12-20 15:00:00,2014-12-20 08:00:00 : SystemV/PST8,2015-06-20 15:00:00,2015-06-20 07:00:00 : SystemV/PST8PDT,2015-06-20 15:00:00,2015-06-20 08:00:00 : SystemV/PST8PDT,2014-12-20 15:00:00,2014-12-20 07:00:00 : SystemV/YST9,2015-06-20 15:00:00,2015-06-20 06:00:00 : SystemV/YST9YDT,2015-06-20 15:00:00,2015-06-20 07:00:00 : SystemV/YST9YDT,2014-12-20 15:00:00,2014-12-20 06:00:00 > These should be removed too, see my previous comment. Done - note that these tests were not removed from https://gerrit.cloudera.org/#/c/9986/10/testdata/data/timezoneverification.csv -- To view, visit http://gerrit.cloudera.org:8080/10486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac Gerrit-Change-Number: 10486 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Mon, 04 Jun 2018 17:58:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 ) Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12 PS1, Line 12: the logs : will contain a new line for every occurrence > seems like we could change that generally. I was hesitant to go that way, because messages with the same error code but different parameters can contain useful information (for example different column names). http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22 PS1, Line 22: fragments > Right, but I don't think we do much LogError() from that (other than from i I agree that this logic would be mainly useful for scanners. KuduSink has a per row error though (TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION). I thought about adding these functions to scanner code, but I did not find a suitable class - HdfsScanner would be a good candidate, but it does not have common ancestors with KuduScanner, and both classes should be able to use this logic. Do you have a good place in mind? The primary motivation is to avoid massive logging with this specific message. Avoiding locking came up as a secondary benefit, as it seemed trivial to implement it with the new interface. -- To view, visit http://gerrit.cloudera.org:8080/10793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa Gerrit-Change-Number: 10793 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 26 Jun 2018 14:45:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 ) Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors .. Patch Set 1: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12 PS1, Line 12: the logs : will contain a new line for every occurrence > But isn't that the point of the change (as you state below). I guess it wo The log lines to eliminate are the ones that report the exact same information. These would be merged to max two lines: 1. one at the first occurrence of the error 2. one when the error collector is flushed - this would contain the number of errors since 1. This logic is probably more complicated then it should be, but it ensures that minimal information is lost. It would be simpler to avoid the flushing logic by logging only the first errors, but as RuntimeState reports the number of occurrences per error code, some way is needed to increment the counters. http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22 PS1, Line 22: fragments > As the class is written today, it's not really tied to RuntimeState. You co My problem with changing RuntimeState::LogError() to log only the first error for a code is that it would completely hide if an error occurs in more than one files/columns. Having one LogCollector per ColumnReader means that the error in one column cannot hide the one in another. An alternate approach would be to keep a LogCollector per HdfsScanner, and keep the last message, and log only if the message was changed. This would simplify the interface and work well with column readers. Another possibility would be to create a class to collect specially data errors. Its LogError() function could expect table/file/column parameters, and use them as the key of a map and count errors separately. -- To view, visit http://gerrit.cloudera.org:8080/10793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa Gerrit-Change-Number: 10793 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 14:14:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7190: Remove unsupported format writer support
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10823 ) Change subject: IMPALA-7190: Remove unsupported format writer support .. Patch Set 1: Code-Review+1 (2 comments) Some minor comments for tests, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py@150 PS1, Line 150: def test_unsupported_writers(self, vector): I would prefer to use unique_database, and remove the drops from unsupported-writers.test http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py@a404 PS1, Line 404: This line could be left here, but check that "... not in result.stduot". This would make it easier to restore the original test when there will be new deprecated query options. -- To view, visit http://gerrit.cloudera.org:8080/10823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I821dc7495a901f1658daa500daf3791b386c7185 Gerrit-Change-Number: 10823 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Jun 2018 13:13:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Update to the workaround for IMPALA-3316
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10800 ) Change subject: [DOCS] Update to the workaround for IMPALA-3316 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10800/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/10800/1/docs/topics/impala_known_issues.xml@192 PS1, Line 192: SSS It does not have to be millisecond precision - the date can be without fractional part or with 1-9 digit fractional part. -- To view, visit http://gerrit.cloudera.org:8080/10800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If45da5d24dd3bc5f649d95b5bc104047420dbea1 Gerrit-Change-Number: 10800 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Mon, 25 Jun 2018 17:36:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Update to the workaround for IMPALA-3316
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10800 ) Change subject: [DOCS] Update to the workaround for IMPALA-3316 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If45da5d24dd3bc5f649d95b5bc104047420dbea1 Gerrit-Change-Number: 10800 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Mon, 25 Jun 2018 18:19:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10747 ) Change subject: IMPALA-7181: Fix flaky test shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening .. Patch Set 6: Code-Review+2 It turned out that I do not have an account on Jenkins yet, so I can only run DRY_RUN=true at the moment. -- To view, visit http://gerrit.cloudera.org:8080/10747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44 Gerrit-Change-Number: 10747 Gerrit-PatchSet: 6 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 19 Jun 2018 13:18:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#6). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/6 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#5). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/5 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#7). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/7 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10233 ) Change subject: IMPALA-6946: handle negative counts in RLE decoder .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326 PS2, Line 326: LIKELY LIKELY seems to be strangely placed now - both conditions should be included in the likely branch, or maybe only the new condition, as current_value_ == value can be often false, for example in dict encoded pages. http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679 PS2, Line 679: int32_t num_repeats = NextNumRepeats(); DictDecoder::DecodeNextValue also calls NextNumLiterals() and NextNumRepeats(), and expects uint32_t - can you do some cleanup there? http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681 PS2, Line 681:uint32_t num_repeats_to_set = :std::min(num_repeats, num_values_to_consume - num_consumed); We could change this to signed too and remove the template parameter from std::min, as both arguments and the result would be signed. http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694 PS2, Line 694: uint32_t num_literals_to_set = : std::min(num_literals, num_values_to_consume - num_consumed); Same as in line 681. -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Wed, 02 May 2018 14:54:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10233 ) Change subject: IMPALA-6946: handle negative counts in RLE decoder .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 18:59:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 4: (11 comments) I do not consider this change to be complete, as there are some open questions: - At this point every kind of COMPUTE STATS update impala.lastComputeStatsTime, including INCREMENTAL, TABLESAMPLE, and calls that only COMPUTE STATS for specific columns. I can exclude some of these, but I think that the current behavior should cause the "least surprise". - I think that the tests should be reorganized, but I would like to finalize the behavior before doing that. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240 PS3, Line 240: > Can you explain in a comment why we need to clone the table instead of remo I have changed the logic to upload data to HMS only if it was actually changed. The cloning was a misunderstanding on my part: CatalogOpExecutor clones HMS tables before calling alter_table, but the reason is that it does not want to update its own representation till the update is actually successful, so it alters the clone and calls alter_table with it. After alter_table, it downloads the table from HMS and overwrites its representation with this new version, so Catalog always contains a version which was loaded from HMS. Kudu tables are different, as Impala tries to be in sync with Kudu instead of HMS, and this final alter_table's goal is to ensure that HMS is in sync with Kudu. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728 PS3, Line 728:* but not for ALTER TABLE SET COLUMN STATS. > This comment should point out that it updates impala.lastComputeStatsTime, Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782 PS3, Line 782: u > Why do we only update this if both are set? Why not if we're just updating I tried to be as strict as possible to ensure that only COMPUTE STATS update impala.lastComputeStatsTime, but it turned out to be unnecessary. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857 PS3, Line 2857:* command if the metadata is not completely in-sync. This affects both Hive and > Explain the bool parameter in the comment including what value the lastDdlT Is the missing field explanation really necessary? I think that the current comment states quite clearly that the alter_table overwrites the whole table. http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862 PS3, Line 2862:*/ > unused? Done http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221 PS3, Line 3221:* parent. > Assuming this line is still correct, can you add a comment where this happe It is not correct (since a long time). http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50 PS3, Line 50: # compute statistics the fill table property impala.lastComputeStatsTime > This should be more verbose, explain why it is necessary to initialize it. I would like to add more comments after the tests are reorganized to their final shape. http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75 PS3, Line 75: self.execute_query("drop stats %s" % FQ_TBL_NAME) > We should also have a test for computing incremental stats, for sampled sta Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145 PS3, Line 145: # change table property > I think we should have defaults for both expect_* parameters here, or for n Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161 PS3, Line 161: # drop statistics > nit: beforeStatsTime Done http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180 PS3, Line 180: LAST_COMPUTE_STATS_TIME_KEY = "impala.lastComputeStatsTime" > stats_time, given the key is lastComputeStatsTime, too. Done -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#4). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 152 insertions(+), 131 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/4 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932 PS1, Line 932: The case when the subquery returns 0 row could be tested too. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 10 Jan 2018 10:59:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#8). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce to files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 331 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/8 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#7). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce to files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 330 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/7 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#10). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 332 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/10 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20 PS8, Line 20: > nit: wrong word? Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: plan_hints:hints > nit: please check that all review comments are marked as "Done" when sendin Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > Thanks for the explanation, I think I understand it now. Could we just have I have no problems with going that way, but I am not sure that it would make the change smaller: InsertStmt would also need a Set/AddPlanHints function, and it would be probably better to rewrite InsertStmt to always use the setter instead of constructor arguments. http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71 PS8, Line 71: > nit: "rule that checks" or "rules that check" Done http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75 PS8, Line 75: > partitio > nit: we generally seem to use camel case for acronyms, e.g. see createCtasT Done http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:EXCHANGE [UNPARTITIONED] > Yes, I think it's intentional. Maybe the docs need to be more clear. Can yo I have created IMPALA-6408 to track the doc issue. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 17 Jan 2018 16:04:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#9). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 331 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/9 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 3: > Why do you say that change is more complex? Picking a pre-existing database, and setting it up as needed for the test seemed more error prone to me than creating a minimal temporary db. Now (after creating the AuthorizationTest.java tests) I agree that they have a better place in java. -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Tue, 16 Jan 2018 20:26:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9063 Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface .. IMPALA-5801: [draft] Clean up codegen GetType() interface First draft - my goal with this patch is create a review where we can decide on the scope of the refactor and agree on the names of the functions. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 21 files changed, 211 insertions(+), 178 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/1 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9063 ) Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface .. Patch Set 1: (3 comments) The declarations of the new/renamed functions are in llvm-codegen.h. http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@256 PS1, Line 256: llvm::PointerType* GetInternalRepresentationPtr(const ColumnType& type); I really do not like this name, if anyone has a better idea, please share it. http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@496 PS1, Line 496: llvm::Type* boolean_type() { return GetInternalRepresentation(TYPE_BOOLEAN); } Maybe bool_type would be more logical in a c++ context. http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@497 PS1, Line 497: llvm::Type* i8_type() { return GetInternalRepresentation(TYPE_TINYINT); } : llvm::Type* i16_type() { return GetInternalRepresentation(TYPE_SMALLINT); } : llvm::Type* i32_type() { return GetInternalRepresentation(TYPE_INT); } : llvm::Type* i64_type() { return GetInternalRepresentation(TYPE_BIGINT); } I have chosen i8_type() instead of int8_type(), because i128_type() already used this convention. If we want these names to be short, the type can be abbreviated too, for example as int8_t(). -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Thu, 18 Jan 2018 21:15:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#11). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 330 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/11 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup@1185 PS10, Line 1185: e > nit: indent 2 spaces Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 18 Jan 2018 20:22:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 7: (7 comments) I have overwritten the wrong test in patch set 7, please ignore it and compare patch 6 to 8. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19 PS6, Line 19: nly > add "more memory efficient"? That makes it more obvious why it is more effi I hope it is correct and understandable enough this way. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27 PS6, Line 27: shuffle: > I don't think this is true, is it? For unpartitioned tables, what would the Done http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35 PS6, Line 35: > Mention how you tested it in the commit message. I wrote about the current state of tests, but I became unsure during the process - should I add tests that actually execute the CTASs with hints? The reason why not to add them is that the non-DDL part of query execution is really the same as in INSERT tests and can be quite slow. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > Is this class still needed or a left-over from earlier patch-sets? Can you It is still needed, as it is the result value of non-terminal create_tbl_as_select_stmt_inner. I am not sure that this is the correct place for this explanation, maybe it should be moved to the .cup. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701 PS6, Line 1701: Multiple non-conflicting > Can you add one that has multiple distinct hints, like shuffle, clustered? Done http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233 PS6, Line 233: shuffle > Is that true? Without specifying a hint there should be cases where the cos Thanks for spotting this, I have relied too much on experimentation instead of checking the code... I have split this test into two to test both outcomes. http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: | order by: year ASC NULLS > At first I didn't understand what this one does (I saw that insert.test:574 I should add a comment about this, but I am unsure whether it should be a simple comment, or a todo about a strange behavior. This looks intentional (see https://gerrit.cloudera.org/#/c/4162/ ), but it is not mentioned explicitly in the documentation. It is true that this reduces the number of writes overall, but as a side effect, it causes the whole table to be written to one node. Writing the whole table to the coordinator may be useful in some cases (like a temp table?), but I think that shuffle has a different intention. I would ask Greg / Alex about their opinion. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 15 Jan 2018 18:06:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9063 to look at the new patch set (#2). Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface .. IMPALA-5801: [draft] Clean up codegen GetType() interface First draft - my goal with this patch is create a review where we can decide on the scope of the refactor and agree on the names of the functions. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 21 files changed, 224 insertions(+), 210 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/2 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9063 to look at the new patch set (#3). Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface .. IMPALA-5801: [draft] Clean up codegen GetType() interface First draft - my goal with this patch is create a review where we can decide on the scope of the refactor and agree on the names of the functions. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 21 files changed, 224 insertions(+), 210 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/3 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8973 to look at the new patch set (#5). Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. To test this issue in AuthorizationTest.java, 'functional_avro' is added as a test database with only column level permissions. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/resources/authz-policy.ini.template 4 files changed, 29 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/5 -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8973 ) Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java: http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@102 PS3, Line 102: // SELECT permissions on columns ('id') on 'functional_avro.alltypessmall' > missing opening single-quote for 'functional_avro.alltypessmall' Done http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@399 PS3, Line 399: privilege = new TPrivilege("", TPrivilegeLevel.SELECT, > typo: privilege (which has already been declared) Thanks for spotting it! Please ignore patch set 4, and compare 3 to 5. Note that it may look like if this could fit into 1 line, but it would be 91 characters. -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 17 Jan 2018 13:34:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8973 Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges .. IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges USE and SHOW TABLES should be allowed if there is at least one table in a database where the user has table or column privileges. Impala incorrectly checked only for table privileges. All automatic tests were added to grant_revoke.test, so AuthorizationTest.java is not changed - the reason for this is that adding a db with only column privileges to AuthorizationTest.java seemed more complex. Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test 3 files changed, 56 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/1 -- To view, visit http://gerrit.cloudera.org:8080/8973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8 Gerrit-Change-Number: 8973 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml File docs/topics/impala_shell_options.xml: http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627 PS3, Line 627: > I think it's a bug that these two are not working in the impalarc and I fil I did not notice shell options during the implementation of [impala.query_options]. These can be set like query options in the shell, but have effect on the the shell, not the Impala server. I agree with Lars that this is a bug. -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Fri, 12 Jan 2018 15:28:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#12). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner / parser tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 330 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/12 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#13). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner / parser tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 333 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/13 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup@1065 PS11, Line 1065: create_tbl_as_select_stmt_inner ::= > let's call this create_tbl_as_select_params to match what is produced Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > I looked at this closely and both approaches seem fine to me. Here are my t Thanks! http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@76 PS11, Line 76: public CreateTableStmt createStmt_; > We don't use the "_" for public members. Done http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1642 PS11, Line 1642: private String[][] getHintStyles() { > Can you move this to FrontendTestBase, make it protected, and use it in the I have moved the member variable version found in other files instead of the function - I prefer the member, because no new arrays have to be created on every call. http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1651 PS11, Line 1651: public void TestCreateTableAsSelectWithHints() throws AnalysisException { > * I think the more important testing is in ParserTest#TestPlanHints similar I have added the CTAS version of insert tests to ParserTest#TestPlanHints. I think that some of these tests should move there, like the ones about the case insensitivity of hints, while the ones with errors or warnings should remain here. I am unsure about the rest, maybe they can be deleted, as planner tests already check most of these cases. About combining with TestInsertHints(): I have created combined insert/upsert/ctas tests in ParserTest#TestPlanHints, something similar could be done here. What makes this more tricky is that the actual tables are checked, so the source table must exist, and the target table in insert or the partitioning parameters in CTAS must match with the result of the select + there are some cases where insert is possible, but CTAS is not (HBase tables). http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS11, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has partitions and sort columns. > * Any ideas to trim down the tests? Seems like a lot of new tests and we sh Maybe create /*+ clustered,noshuffle */ + the default case without hints is enough to show that both type of hints can take effect. If the other tests are deleted, then some of them should be moved to the insert tests, because they cover some scenarios that are not tested there. What do you think about this solution? I agree that both plans are unnecessary - I kept the DISTRIBUTEDPLAN, because the effect of (no)shuffle can be only checked there. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 07 Feb 2018 17:50:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9063 to look at the new patch set (#4). Change subject: IMPALA-5801: Clean up codegen GetType() interface .. IMPALA-5801: Clean up codegen GetType() interface Several functions that return llvm::(Pointer)Type were renamed to make them shorter or indicate their roles more clearly. The renamed functions can be found in llvm-codegen.h/cc. Changes in other files contain only renamed calls with the same functinality. Testing: No new tests are necessery, as no functianility was changed. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 23 files changed, 325 insertions(+), 307 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/4 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9063 ) Change subject: IMPALA-5801: Clean up codegen GetType() interface .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@263 PS1, Line 263: GetType > I'm not sure that the templated version would improve readability enough to GetNamed(Ptr)Type is ok for me, even if it it a bit longer than I would prefer. I was curious about the error message, so I created an implementation and added a call with parameter int (which should lead to an error, as it doesn't have LLVM_CLASS_NAME). The error message contains some useful clues (first half) but also some strange stuff (second half), so I must agree that it could lead to confusion: In file included from be/src/codegen/llvm-codegen.cc:18:0: be/src/codegen/llvm-codegen.h: In instantiation of ‘llvm::Type* impala::LlvmCodeGen::GetType() [with T = int]’: be/src/codegen/llvm-codegen.cc:484:33: required from here be/src/codegen/llvm-codegen.h:259:65: error: ‘LLVM_CLASS_NAME’ is not a member of ‘int’ llvm::Type* GetType() { return GetNamedType(T::LLVM_CLASS_NAME); } ^ be/src/codegen/llvm-codegen.h: In member function ‘llvm::Type* impala::LlvmCodeGen::GetType() [with T = int]’: be/src/codegen/llvm-codegen.h:259:68: error: control reaches end of non-void function [-Werror=return-type] llvm::Type* GetType() { return GetNamedType(T::LLVM_CLASS_NAME); } ^ be/src/codegen/llvm-codegen.cc: In function ‘void boost::throw_exception(const std::exception&)’: be/src/codegen/llvm-codegen.cc:1719:1: error: ‘noreturn’ function does return [-Werror] } http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@497 PS1, Line 497: llvm::Type* i8_type() { return GetInternalRepresentation(TYPE_TINYINT); } > Changing the interface first makes sense, we should switch over the impleme Done -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 08 Feb 2018 22:12:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 ) Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9241 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae Gerrit-Change-Number: 9241 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 09 Feb 2018 19:42:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 ) Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test File testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test: http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@55 PS2, Line 55: # IMPALA-6077: unsupported BIT_PACKED encoding fails when materializing columns. : select count(id), count(tinyint_col), count(smallint_col), count(int_col), : count(bigint_col), count(float_col), count(double_col), count(date_string_col), : count(string_col), count(timestamp_col), count(year), count(month), count(day) : from alltypesagg_bitpacked This query seems to be the same as the next query, and should not materialize columns. http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@64 PS2, Line 64: materializing I am not 100% sure about this, but I think that if a column is not complex, and the stats are filled, then count can be served from column chunk stats without reading any data page, so this error will not be returned. This may not be a problem for this specific parquet file, but I would mention it in a comment, or replace the query with something that has to read the data pages. http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@65 PS2, Line 65: select count(id), count(tinyint_col), count(smallint_col), count(int_col), : count(bigint_col), count(float_col), count(double_col), count(date_string_col), : count(string_col), count(timestamp_col), count(year), count(month), count(day) Is it necessary to list every column here? If one column is enough for the test, then I would prefer if it were shorter (for the sake of readability). -- To view, visit http://gerrit.cloudera.org:8080/9241 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae Gerrit-Change-Number: 9241 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 09 Feb 2018 14:58:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9063 ) Change subject: IMPALA-5801: Clean up codegen GetType() interface .. Patch Set 7: I have uploaded a patch with the template approach. (please ignore patch set 6, which contains some file by accident) The patch also contains some simplification around the changed call sites (like replacing GetTypes(...) with convenience functions), so it is not "rename only". I have also noticed several places where codegen is used in over complicated ways, for example "llvm::ConstantInt::get(llvm::Type::getInt1Ty(codegen->context()), false)" could be simply "codegen->false_value()". I can do some GetConst cleanup here, or I can create a ticket for it. -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 14 Feb 2018 18:22:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9063 to look at the new patch set (#7). Change subject: IMPALA-5801: Clean up codegen GetType() interface .. IMPALA-5801: Clean up codegen GetType() interface Several functions that return llvm::(Pointer)Type were renamed to make them shorter or indicate their roles more clearly. Some additional convenience functions were created to make some common codegen tasks simpler. The renamed functions can be found in llvm-codegen.h/cc. Changes in other files are mainly renamed calls with the same functionality. Testing: No new tests are necessary, as no functionality was changed. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 26 files changed, 339 insertions(+), 361 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/7 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9063 to look at the new patch set (#9). Change subject: IMPALA-5801: Clean up codegen GetType() interface .. IMPALA-5801: Clean up codegen GetType() interface Several functions that return llvm::(Pointer)Type were renamed to make them shorter or indicate their roles more clearly. Some additional convenience functions were created to make some common codegen tasks simpler: - Get(Ptr)Type functions with string parameter are renamed to GetNamed(Ptr)Type - GetStruct(Ptr)Type template functions are created to make GetNamedType(MyStruct::LLVM_CLASS_NAME) calls simpler (some classes had LLVM_CLASS_NAME as private, these are changed to public) - integer type convenience functions are renamed to indicate bit width instead of matching SQL type (e.g. int_type->i32_type) - similar convenience functions were created for ptr to primitive types, int_ptr_type - Get(Ptr)Type functions with ColumnType parameter are renamed to GetSlot(Ptr)Type - GetIntConstant function is split to several functions depending on the type of the integer e.g. GetI32Constant The renamed functions can be found in llvm-codegen.h/cc. Changes in other files are mainly renamed calls with the same functionality. Testing: No new tests are necessary, as no functionality was changed. Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/exec-node.cc M be/src/exec/filter-context.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/util/tuple-row-compare.cc 26 files changed, 357 insertions(+), 393 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/9 -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9063 ) Change subject: IMPALA-5801: Clean up codegen GetType() interface .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/9063/9/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/9063/9/be/src/codegen/llvm-codegen.h@529 PS9, Line 529: llvm::Constant* GetI8Constant(uint64_t val) { : return llvm::ConstantInt::get(context(), llvm::APInt(8, val)); : } : llvm::Constant* GetI16Constant(uint64_t val) { : return llvm::ConstantInt::get(context(), llvm::APInt(16, val)); : } : llvm::Constant* GetI32Constant(uint64_t val) { : return llvm::ConstantInt::get(context(), llvm::APInt(32, val)); : } : llvm::Constant* GetI64Constant(uint64_t val) { I have written the val arguments type to uint64_t to minimize the change compared to the original behavior. http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@272 PS8, Line 272: G > Nit: extra whitespace Done http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@277 PS8, Line 277: G > Nit: extra whitespace Done http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@506 PS8, Line 506: Consta > Being curious: is there any specific reason for not changing null_ptr_value There was no specific reason. I had to change true/false_value to Constant because of GetBoolConstant, so I also changed the function that returns them. Note that some other functions' return type could be changed from Value* to Constant*, for example CastPtrToLlvmPtr(). It could be also nice to change several variable's type on the call sites of these functions. -- To view, visit http://gerrit.cloudera.org:8080/9063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77 Gerrit-Change-Number: 9063 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Feb 2018 14:32:39 + Gerrit-HasComments: Yes