[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8508/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8508/2//COMMIT_MSG@9 PS2, Line 9: Impala does not represent a quoted string at date/time format. For example, Please wrap commit message lines at 70 characters (here and below). http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/exprs/expr-test.cc@5747 PS2, Line 5747: '\\'day\\'dd,\\'month\\'MMM,\\'year\\',\\'hour\\'HH,\\'minute\\'mm,\\'second\\'ss')", Please wrap lines at 90 characters. http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.h@79 PS2, Line 79: // User can specify a quoted string at date/time format. For example, addtional quoted string Please wrap lines at 90 characters (here and below). http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.cc@170 PS2, Line 170: if (str - string_literal != 1) { : const int len = str - string_literal - 1; : DCHECK(len >= 0); : const int pos = str - str_begin - len; : DCHECK(pos >= 0); : const char* val = string_literal + 1; : dt_ctx->toks.push_back(DateTimeFormatToken(SEPARATOR, pos, len, val)); : } What if (str - string_literal == 1) ? Note, that in Hive you get an apostrophe: > select from_unixtime(1492677561, 'H\'\''); 10' -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 14 Nov 2017 18:22:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8355 ) Change subject: IMPALA-5754: Improve randomness of rand()/random() .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/8355/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8355/8//COMMIT_MSG@11 PS8, Line 11: libarary typo: library http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/expr-test.cc@4660 PS8, Line 4660: const double expected_result = ConvertValue(GetValue(rand.str(), TYPE_DOUBLE)); nit: Please wrap lines at 90 characters. http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/math-functions-ir.cc@163 PS8, Line 163: if (! seed_arg->is_null) { : seed = seed_arg->val; : } nit: one line. Also, please remove the space after ! -- To view, visit http://gerrit.cloudera.org:8080/8355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Gerrit-Change-Number: 8355 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Comment-Date: Tue, 14 Nov 2017 16:22:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8490 ) Change subject: IMPALA-2248: Make idle_session_timeout a query option .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@211 PS3, Line 211: } : else { single line http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.h@403 PS3, Line 403: 'timeout' nit: formal parameter is 'requested_timeout' http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@1119 PS3, Line 1119: if (session_timeout > 0) { : lock_guard l(session_timeout_lock_); : multiset::const_iterator itr = session_timeout_set_.find(session_timeout); : DCHECK(itr != session_timeout_set_.end()); : session_timeout_set_.erase(itr); : session_timeout_cv_.notify_one(); : } Can we call UnregisterSessionTimeout() instead here? http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@1236 PS3, Line 1236: nit: 4 character indents on line continuations http://gerrit.cloudera.org:8080/#/c/8490/3/fe/src/test/java/org/apache/impala/service/JdbcTest.java File fe/src/test/java/org/apache/impala/service/JdbcTest.java: http://gerrit.cloudera.org:8080/#/c/8490/3/fe/src/test/java/org/apache/impala/service/JdbcTest.java@657 PS3, Line 657: connections.get(0).close(); Maybe you can be more explicit here about the expected end-state, e.g.: assertNotNull("..", connections.get(0)); assertNull("..", connections.get(1)); assertNull("..", connections.get(2)); -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 08 Nov 2017 12:37:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py@241 PS8, Line 241: len(advanced_options) > 0 nit: you don't need to call len() to check if a dictionary is empty: if advanced_options: .. http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py@244 PS8, Line 244: len(deprecated_options) > 0 Same as L241 -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 08 Nov 2017 08:55:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/8447/7/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/7/shell/impala_shell.py@240 PS7, Line 240: if len(advanced_options) > 0 and print_mode == QueryOptionDisplayModes.ALL_OPTIONS: : print '\nAdvanced Query Options:' : self._print_option_group(advanced_options, set_options) : if len(deprecated_options) > 0 and print_mode == QueryOptionDisplayModes.ALL_OPTIONS: : print '\nDeprecated Query Options:' : self._print_option_group(deprecated_options, set_options) nit: this can be simplified as: if print_mode == QueryOptionDisplayModes.ALL_OPTIONS: if advanced_options: print '\nAdvanced Query Options:' self._print_option_group(advanced_options, set_options) if deprecated_options: print '\nDeprecated Query Options:' self._print_option_group(deprecated_options, set_options) -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 07 Nov 2017 15:01:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc@618 PS5, Line 618: Probably I'm missing something here, but shouldn't this be ADVANCED? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127 PS2, Line 127: PRINT_REGULAR_OPTIONS_ONLY = 1 : PRINT_ALL_OPTIONS = 2 > I found it more verbose to have 2 const instead of a bool. Ok, I see your point. You could also put the PRINT_* members into a separate class to emulate enums (see class CmdStatus in L61-L69). Same goes for QUERY_OPTION_* members. On the other hand this might be an overkill, so use your best judgement :) http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351 PS2, Line 351:result = shell1.get_result() : assert "Query options (defaults shown in []):" in result.stdout : assert "ABORT_ON_ERROR" in result.stdout > No. SUPPORT_START_OVER as I discussed with Tim is a tricky one and we don't Thanks for the explanation. I still find confusing that in impala::PopulateQueryOptionLevels() the query option level for SUPPORT_START_OVER is explicitly set to REGULAR. Shouldn't it be ADVANCED? -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 11:20:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.h@520 PS2, Line 520: void AddOptionLevelToConfigDescription(beeswax::ConfigVariable& option, : const TQueryOptions& default_query_options_, const string& option_key); Remove 'default_query_options_' and make AddOptionLevelToConfigDescription() const. -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 02 Nov 2017 18:12:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351 PS2, Line 351:result = shell1.get_result() : assert "Query options (defaults shown in []):" in result.stdout : assert "ABORT_ON_ERROR" in result.stdout SUPPORT_START_OVER should appear here in result.stdout, right? -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 02 Nov 2017 18:01:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc@1221 PS2, Line 1221: default_query_options_ Why pass in 'default_query_options_'? Isn't it already visible in AddOptionLevelToConfigDescription()? http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc@1224 PS2, Line 1224: ConfigVariable support_start_over; : support_start_over.__set_key("support_start_over"); : support_start_over.__set_value("false"); : default_configs_.push_back(support_start_over); Shouldn't you call AddOptionLevelToConfigDescription() with 'support_start_over' as well? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py@106 PS2, Line 106: is not It is safer to use "!=" here. "is not" works for small integers with the CPython interpreter but it is implementation specific behavior and might be changed in the future. http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py@107 PS2, Line 107: return "" nit: maybe we should be more defensive and return "" as well if parsed_description[0] != "OptionLevel". http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127 PS2, Line 127: PRINT_REGULAR_OPTIONS_ONLY = 1 : PRINT_ALL_OPTIONS = 2 nit: why not use a bool flag instead? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@225 PS2, Line 225: = nit: remove spaces around '=' http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@250 PS2, Line 250: that 'then'? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@266 PS2, Line 266: default_options[option_name] Why not use 'option_value'? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@263 PS2, Line 263: elif level == self.QUERY_OPTION_ADVANCED: : advanced_options[option_name] = option_value : else: : advanced_options[option_name] = default_options[option_name] How about this instead? else: advanced_options[option_name] = option_value -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 02 Nov 2017 17:32:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: > Uploaded patch set 13: Patch Set 12 was rebased. Thanks for the review. I think it needed a rebase. Could you restart the job? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 06:11:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 775 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/13 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 775 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/12 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 12 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 11: (10 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: TQueryOptions options; > Thanks for investigating and providing an explanation! Done http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@454 PS11, Line 454: JNIEnv* env, jclass caller_class, jstring options_str, jbyteArray tquery_options) { > rename options_str to csv_query_options for consistency Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288 PS10, Line 288: if (!currentTestCase.isValid()) { > All PlannerTest .test files should end with "", so I think the code her Correct, every PlannerTest .test file ends with "". I've removed parseQueryOptions() from L338. I find the code in L339-L344 very confusing. If the .test file doesn't end with "", the last non-terminated test case is still validated and returned to the caller. In addition, if the very last section is not "" terminated it will be silently ignored. Maybe we should do something like this instead: if (!currentTestCase.isEmpty() || !sectionContents.isEmpty()) { throw new IllegalStateException("Test case" + " at line " + currentTestCase.startLineNum + " is not '' terminated."); } return currentTestCase; http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@359 PS11, Line 359: " - " + e.getMessage()); > also pass in 'e' as the cause Done http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@38 PS11, Line 38: DISTRIBUTEDPLAN > What's the value in keeping this section? I would like to keep it as a reference, so that we can see the effect of RUNTIME_FILTER_MODE=LOCAL in L339 and L422. http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@174 PS11, Line 174: DISTRIBUTEDPLAN > What's the value in keeping this section? Removed it. http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@257 PS11, Line 257: DISTRIBUTEDPLAN > What's the value in keeping this section? Removed it. http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@392 PS11, Line 392: MAX_NUM_RUNTIME_FILTERS=4 > How about setting to 3 so we can see both local and non-local filters being Done http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@582 PS11, Line 582: DISTRIBUTEDPLAN > What's the value in keeping this section? Removed it. http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@657 PS11, Line 657: DISTRIBUTEDPLAN > What's the value in keeping this section? Removed it. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 11 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 25 Oct 2017
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 10: (26 comments) http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: jbyteArray thrift_in_query_options) { > Not sure what 'thrift_in_query_options' means. How about using 'tquery_opti I've renamed the parameter. I decided to pass in the existing query options because the thrift-generated C++ and Java TQueryOptions classes don't handle the isset flags properly. If I don't pass in the existing query options, then: 1. I'd have to create an "empty" TQueryOptions instance inside this function. 2. The __isset member of that instance would contain true values for every option even though they haven't been set explicitly. Therefore I'd have to overwrite __isset with false values somehow before calling impala::ParseQueryOptions(). 3. When FeSupport.ParseQueryOptions() deserializes the TQueryOptions instance returned by Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions(), it would still find that some query options have been set, even if they haven't been touched (e.g. TQueryOptions.setExplain_levelIsSet() always returns true since explain_level != null even if it hasn't been set explicitly). I'm not sure why thrift-generated classes behave this way. Maybe there's a better way to work around the isset bugs, but I couldn't think of any. http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457 PS10, Line 457: DeserializeThriftMsg(env, thrift_in_query_options, ); > Needs error handling, e.g. using THROW_IF_ERROR_RET Done http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458 PS10, Line 458: const char *o = env->GetStringUTFChars(options_str, NULL); > * You need to release the string UTF chars, take a look at JniUtfCharGuard Done http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461 PS10, Line 461: TParseQueryOptionsResult result; > I think it's simpler to convert all Status to an exception. That way we don Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119 PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to the root of the distributed > Comment is wrong. 'singleNodePlan' definitely does not point to the root of Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354 PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { targets_.add(target); } > add newline before Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397 PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0); > Why is 0 not a valid value? 0 is a valid value. MAX_NUM_RUNTIME_FILTERS=0 effectively removes every runtime filter. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423 PS10, Line 423: filter.computeNdvEstimate(); > My understanding is that IMPALA-3450 has been fixed and we can remove this Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537 PS10, Line 537:* The assigned filters are the ones for which 'scanNode' can be used a destination > can be used as a destination node Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539 PS10, Line 539:* 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a filter is only assigned to > If the DISABLE_ROW_RUNTIME_FILTERING query option is set ... Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541 PS10, Line 541:* 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a filter is only assigned to > If the RUNTIME_FILTER_MODE query option is set ... Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542 PS10, Line 542:*'scanNode' if the filter is local to the scan node. > This doesn't really explain what the LOCAL option means. How about: Done
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 927 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/11 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 11 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 10: > (5 comments) > > Nice. So much better. I'll let Alex take a look as well and do the > final +2. Thanks for the review! -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 10 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 06 Oct 2017 10:29:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 13 files changed, 743 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/10 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 10 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@292 PS9, Line 292: paresResult > I believe you meant parseResult? Yes, fixed it. http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@125 PS9, Line 125: public int getStartingLineNum() { : return startLineNum; : } > single line Done http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@129 PS9, Line 129: public TQueryOptions getOptions() { : return this.options; : } > nit: single line Done http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@133 PS9, Line 133: public void setOptions(TQueryOptions options) { : this.options = options; : } > single line Done http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@367 PS9, Line 367: > Add a Preconditions.checkNotNull(result) here. Done -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 9 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 06 Oct 2017 10:18:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 13 files changed, 744 insertions(+), 105 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/9 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 9 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 8: > Uploaded patch set 8. Rebased again because of a conflict in hdfs-scan-node-base.cc. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 8 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 03 Oct 2017 11:57:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 13 files changed, 756 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/8 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 8 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 7: > Thanks for adding the support for parsing the query options. Did > you try to reuse the c++ implementation through a JNI call? It > would be nice if we could avoid implementing the same functionality > in both c++ and java. See FeSupport.java class for other cases > where we call into the backend from the java side. Thanks. Done. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 7 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 03 Oct 2017 10:58:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 13 files changed, 756 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/7 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 7 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 6: > Uploaded patch set 6. Rebased against master. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 6 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 28 Sep 2017 15:18:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 10 files changed, 721 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/6 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 6 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 5: > Uploaded patch set 5. Added support for query options to .test files. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 5 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 28 Sep 2017 14:45:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test 11 files changed, 725 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/5 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 5 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 4: > > > (1 comment) > > > > The tests in this change use only 3 query options. Adding a new > > section to the .test files to support only these 3 options > wouldn't > > be too much work. > > > > On the other hand, adding support for all the query options that > > Impala supports would be a lot harder. Probably we would have to > > implement that using some Java reflection trickery. > > I don't think you need to use Java reflection. The generated Java > class for TQueryOptions has a number of helper functions to search > and set a field by name. So, for instance the QUERY_OPTIONS section > could have key=value pairs that correspond to query options. Then > we could write a small function that parses the key value pairs and > uses the helper functions to check for valid query options and set > the values. Do you want to give it a try? Thanks Thanks. I'm still confused though how to implement setting enum query options without reflection. e.g.: QUERYOPTIONS COMPRESSION_CODEC=GZIP Here we have to know that the type of TQueryOptions.compression_codec is org.apache.impala.thrift.THdfsCompression, otherwise we cannot parse "GZIP". Am I missing something? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 4: > (1 comment) The tests in this change use only 3 query options. Adding a new section to the .test files to support only these 3 options wouldn't be too much work. On the other hand, adding support for all the query options that Impala supports would be a lot harder. Probably we would have to implement that using some Java reflection trickery. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#4). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 666 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/4 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Hello Thomas Tauber-Marshall, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7564 to look at the new patch set (#4). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 666 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/4 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS2, Line 119: singleNodePlan > Let's expand to the comment to mention that 'singleNodePlan' now points to Done http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 0 : self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector) : : def test_singlenode_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 1 : self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector) > I don't follow. Did you check how the query option is set in testRuntimeFil Let me give you some context: In the current patch-set 'runtime_filters_distrib_pruning.test' and 'runtime_filters_singlenode_pruning.test' contain 11 test cases in total. Each of them tests runtime filter assignment with a different combination of query options. There are inline SET statements inside the .test files before each test case to set the options for that test. To implement this in PlannerTest.java, I would have to create 11 '.test' files, one for each test case. This is necessary because PlannerTest.java does not support inline SET statements inside the .test files. This approach seemed too much trouble to me. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#3). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 665 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/3 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Hello Thomas Tauber-Marshall, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7564 to look at the new patch set (#3). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 665 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/3 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 2: (8 comments) Thanks for the review. http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS2, Line 119: singleNodePlan > The comment in L448 (RuntimeFilterGenerator) suggests that this function is The single node plan is created on L104 and assigned to 'singleNodePlan'. Distributed plan is created in-place on L114, therefore after L114 'singleNodePlan' refers to the distributed plan. Maybe I should rename 'singleNodePlan' to something else to avoid confusion? PS2, Line 117: // create runtime filters : if (ctx_.getQueryOptions().getRuntime_filter_mode() != TRuntimeFilterMode.OFF) { : RuntimeFilterGenerator.generateRuntimeFilters(ctx_, singleNodePlan); : ctx_.getAnalysisResult().getTimeline().markEvent("Runtime filters computed"); : } > Why was this moved here? The block was moved here because the distributed plan is created on L114 and I wanted to assign runtime filters on the distributed plan instead of the single node plan. Alex suggested this approach. This way we can enforce all the constraints (described in the commit message) during filter assignment directly. http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: PS2, Line 142: public boolean isBoundByPartitionColumns = false; : // Indicates if 'node' is in the same fragment as the join that produces the : // filter : public boolean isLocalTarget = false; > can they change once they are set? If not, make them final. Done PS2, Line 165: tFilterTarget.setIs_bound_by_partition_columns(isBoundByPartitionColumns); : tFilterTarget.setIs_local_target(isLocalTarget); > Do you have to send these to the backend? I think they are only used in DCH The BE uses these flags to create instances of Coordinator::FilterTarget that are used in coordinator.cc. PS2, Line 455: Preconditions.checkNotNull(ctx); : Preconditions.checkNotNull(root); > Remove, not really doing anything useful here. Done PS2, Line 548: Preconditions.checkNotNull(ctx); : Preconditions.checkNotNull(scanNode); > remove Done PS2, Line 550: Analyzer analyzer = ctx.getRootAnalyzer(); : boolean isSingleNodeExec = ctx.isSingleNodeExec(); : boolean disableRowRuntimeFiltering = : ctx.getQueryOptions().isDisable_row_runtime_filtering(); : TRuntimeFilterMode runtimeFilterMode = ctx.getQueryOptions().getRuntime_filter_mode(); > nit: move these lines below L557. No need to get these values if we don't e Done http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 0 : self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector) : : def test_singlenode_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 1 : self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector) > These should be Planner tests (see PlannerTest.java) These are written as query tests and not planner tests because the planner tests don't have a good way to 'SET' query options on a per-test case basis. (different test cases in 'runtime_filters_distrib_pruning.test' and 'runtime_filters_singlenode_pruning.test' use different query options) -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 2: (6 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/7564/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS1, Line 119: RuntimeFilterGener > unnecessary Done PS1, Line 121: : > single line? Done http://gerrit.cloudera.org:8080/#/c/7564/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 538:* The assigned filters are the ones for which 'scanNode' can be used a destination > update this comment Done PS1, Line 549: Preconditions.checkNotNull(scanNode); : Analyzer analyzer = ctx.getRootAnalyzer(); > single line? Done PS1, Line 590: e > nit: space before ':' Done http://gerrit.cloudera.org:8080/#/c/7564/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test: Line 1: > I suppose these are written as query tests and not planner tests because th Correct. These test cases should be in the planner test suite, but implementing them this way is more straightforward. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 666 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/2 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 10 files changed, 666 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/2 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has posted comments on this change. Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. Patch Set 4: > Patch Set 3: Code-Review+2 Thanks for reviewing it. -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has abandoned this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Abandoned It was decided that another approach should be taken to implement this feature. I'm abandoning this change. -- To view, visit http://gerrit.cloudera.org:8080/7096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I422e48847428cab9887aee899fed47a8de95c323 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#3). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; This fix removes the MemPool::FreeAll() call from HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush() is incorrect because a memory pool buffer is cached by the compressor in the table writer which isn't reset across calls to Flush(). If the file that is being written is big enough, HdfsSequenceTableWriter::AppendRows() will call Flush() multiple times causing memory corruption. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/3 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has posted comments on this change. Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. Patch Set 2: (2 comments) Added a test case to seq-writer.test and made sure that it crashes impalad if the fix is not applied. http://gerrit.cloudera.org:8080/#/c/7394/2//COMMIT_MSG Commit Message: PS2, Line 16: The issue is that HdfsSequenceTableWriter::Flush() frees the per-file : memory pool > Please clarify that HdfsSequenceTableWriter::AppendRows() may call Flush() Done PS2, Line 17: in use > cached in the compressor Done -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#3). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; This fix removes the MemPool::FreeAll() call from HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush() is incorrect because a memory pool buffer is cached by the compressor in the table writer which isn't reset across calls to Flush(). If the file that is being written is big enough, HdfsSequenceTableWriter::AppendRows() will call Flush() multiple times causing memory corruption. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/3 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; The issue is that HdfsSequenceTableWriter::Flush() frees the per-file memory pool, but the buffer used for compression is still in use after Flush() returns. If there are additional rows to process after calling Flush(), impalad might crash. This fix removes the FreeAll() call from Flush(). It was tested on a local minicluster with a table containing around 100 million rows. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; The issue is that HdfsSequenceTableWriter::Flush() frees the per-file memory pool, but the buffer used for compression is still in use after Flush() returns. If there are additional rows to process after calling Flush(), impalad might crash. This fix removes the FreeAll() call from Flush(). It was tested on a local minicluster with a table containing around 100 million rows. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings
Attila Jeges has abandoned this change. Change subject: IMPALA-3381: Support AM/PM marker in date and time format strings .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/6523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/7394 Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; This patch moves clearing the memory pool from Flush() to InitNewFile(). This mirrors the way the per-file memory pool is handled in class HdfsParquetTableWriter. Tested on a local minicluster with a table containing around 100 million rows. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h 2 files changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/1 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/7096 Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to plan nodes without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. Pruning based on the value of DISABLE_ROW_RUNTIME_FILTERING query option takes place as early as when runtime filters are assigned to the single node plan. Pruning based on RUNTIME_FILTER_MODE query option on the other hand has to wait until the distributed plan is ready (since it requires information about the locality of runtime filter targets). Change-Id: I422e48847428cab9887aee899fed47a8de95c323 --- M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-scan-node-base.cc M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test A testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test M tests/query_test/test_runtime_filters.py 8 files changed, 665 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/7096/2 -- To view, visit http://gerrit.cloudera.org:8080/7096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I422e48847428cab9887aee899fed47a8de95c323 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Attila Jeges has uploaded a new patch set (#2). Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd --- M be/src/benchmarks/CMakeLists.txt D be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py D tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 75 insertions(+), 850 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2 -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Attila Jeges has posted comments on this change. Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6896/1/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 25: #include > why do we need these headers? Removed the new includes and kept boost/date_time/local_time/local_time.hpp (it is required for boost::posix_time) http://gerrit.cloudera.org:8080/#/c/6896/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 123: // Avro schema of this table if this is an Avro table, otherwise null. Set in load(). > what did these have to do with your change? are they just dead variables (i Removed them, they were left behind accidentaly when I resolved the conflicts. -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: Yes
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6896 to look at the new patch set (#2). Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd --- M be/src/benchmarks/CMakeLists.txt D be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py D tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 75 insertions(+), 850 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2 -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6896 to look at the new patch set (#2). Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd --- M be/src/benchmarks/CMakeLists.txt D be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py D tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 84 insertions(+), 850 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2 -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Attila Jeges has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6910/2/tests/util/adls_util.py File tests/util/adls_util.py: PS2, Line 29: @classmethod Why decorate __init__() with @classmethod? As a result, all 'ADLSClient' instances will share the same 'token' and 'adlsclient' members. If this is what you meant to do, please use 'cls' instead of 'self' in __init__() to make the intention clear. -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables.
Attila Jeges has posted comments on this change. Change subject: IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6550/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: PS4, Line 276: // before conversion to parquet, had been enum. Applications which do not have enumerated nit: Long line. Lines should wrap before 90 characters are hit. PS4, Line 277: nit: Please remove the extra leading spaces before 'types' (and in the line above before 'before'). -- To view, visit http://gerrit.cloudera.org:8080/6550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jakub KukulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Jakub Kukul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Attila Jeges has posted comments on this change. Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Patch Set 1: > Was it a clean revert? If not, where were the conflicts? There were conflicts in the following files: be/src/exec/parquet-column-readers.cc be/src/runtime/timestamp-value.cc be/src/runtime/timestamp-value.h common/thrift/generate_error_codes.py fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java fe/src/main/java/org/apache/impala/catalog/HdfsTable.java The conflicts were straightforward to resolve. -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/6896 Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd --- M be/src/benchmarks/CMakeLists.txt D be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py D tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 84 insertions(+), 850 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/1 -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors
Attila Jeges has posted comments on this change. Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS11, Line 282: nit: indentation is 4 for line continuation (here and below). http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: PS11, Line 93: TimestampValue t You might want to use "const TimestampValue& t =" here (and below), just like you did in aggregate-functions-ir.cc -- To view, visit http://gerrit.cloudera.org:8080/6510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Attila Jeges has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/bit-util-test.cc File be/src/util/bit-util-test.cc: PS8, Line 300: EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(7), 8); : EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(8), 8); : EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(65535), 65536); Duplicate of L297-299. I guess the intent here was to call RoundUpToPowerOfTwo() with 7L, 8L and 65535L, right? PS8, Line 308: EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(7), 8); : EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(8), 8); : EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(65535), 65536); same here http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 376: DCHECK(v > 0); DCHECK on L389 asserts that RoundUpToPowerOfTwo(int32_t) never overflows. Do we need a similar DCHECK here? -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5246: UDF's Close() should handle Expr's preparation failure
Attila Jeges has posted comments on this change. Change subject: IMPALA-5246: UDF's Close() should handle Expr's preparation failure .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/case-expr.cc File be/src/exprs/case-expr.cc: PS1, Line 82: fn_ctx->Free(reinterpret_cast(case_state)) I could be wrong, but shouldn't we call fn_ctx->SetFunctionState(FunctionContext::THREAD_LOCAL, nullptr) after fn_ctx->Free(case_state) to avoid double-free problems ? (just like we do in Close functions in be/src/testutil/test-udfs.cc) http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: PS1, Line 182: ctx->Free(seed); same here http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS1, Line 614: delete re; same here http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: PS1, Line 195: delete dt_ctx same here http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/utility-functions.cc File be/src/exprs/utility-functions.cc: PS1, Line 59: delete uuid_gen; same here -- To view, visit http://gerrit.cloudera.org:8080/6757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2c689246ed4f8dd38f104fa35904f3926a7039c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 11: > Uploaded patch set 11. Fixed WARN_UNUSED_RESULT warning that failed clang-tidy job. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#11). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Hello Impala Public Jenkins, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5939 to look at the new patch set (#11). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-5261: Heap use-after-free in HdfsSequenceTableWriter
Attila Jeges has posted comments on this change. Change subject: IMPALA-5261: Heap use-after-free in HdfsSequenceTableWriter .. Patch Set 1: (1 comment) > (1 comment) > > I assumed you have verified the fix with ASAN build, right ? Yes, added testing details to the commit message. http://gerrit.cloudera.org:8080/#/c/6762/1/be/src/exec/hdfs-sequence-table-writer.cc File be/src/exec/hdfs-sequence-table-writer.cc: Line 306: value_length = text.size(); > DCHECK_EQ(value_length, row_buf_.Size()); Done -- To view, visit http://gerrit.cloudera.org:8080/6762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id339247f892710529d8ad56dd1e98eadbf32900b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build
Attila Jeges has posted comments on this change. Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local file system build .. Patch Set 1: > Is this ready to be GVO? Yes, it is -- To view, visit http://gerrit.cloudera.org:8080/6746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 10: > Uploaded patch set 10. This is the rebased change. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#10). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/10 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5939 to look at the new patch set (#10). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/10 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build
Attila Jeges has posted comments on this change. Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local file system build .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6746/1/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: PS1, Line 152: @SkipIfLocal.hive > or @SkipIf.not_hdfs. The current one seems more correct as it's about Hive Correct. I've checked some other end-to-end tests that use Hive and they used the same skip decorators. -- To view, visit http://gerrit.cloudera.org:8080/6746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build
Attila Jeges has posted comments on this change. Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local file system build .. Patch Set 1: > (1 comment) > > How did you test it ? I've done some simple testing locally. I set TARGET_FILESYSTEM environment variable to 'local', 'isilon', 's3' one after another and made sure that the test_seq_writer_hive_compatibility test is skipped. -- To view, visit http://gerrit.cloudera.org:8080/6746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: PS8, Line 124: local_date_time lt(temp, timezone); > It says an exception can be thrown by the constructor, but I guess what's i Yes, exactly. According to the documentation local_date_time(posix_time::ptime, timezone_ptr) constructor won't throw an exception. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS8, Line 601: > ... in 'scanner_status'. Done PS8, Line 604: const TimestampValue* > please pass this as a 'Status* scanner_status', which is what we usually do Done http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: PS8, Line 124: local_date_time lt(temp, timezone); > can this throw an exception? I couldn't find any indication of an exception in the code or in the documentation: http://www.boost.org/doc/libs/1_57_0/doc/html/date_time/local_time.html#date_time.local_time.local_date_time PS8, Line 125: *this = lt.local_time(); > or that? Same here. http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS8, Line 122: tabl > it's the same for other FileSystem interfaced things, like S3. So maybe rem Done -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#9). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/9 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5939 to look at the new patch set (#9). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/9 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/6746 Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local file system build .. IMPALA-5257: test_seq_writer_hive_compatibility fails on local file system build TestTableWriters.test_seq_writer_hive_compatibility test introduced in IMPALA-3079 had to be skipped for non-HDFS filesystems. Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039 --- M tests/query_test/test_compressed_formats.py 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/6746/1 -- To view, visit http://gerrit.cloudera.org:8080/6746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: > (1 comment) Yes, I think we should do that. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 123: ToPtime(); > I seem to remember there also being a lock in the boost code related to loc I've investigated the history of UtcToLocal(): - UtcToLocal() was first introduced in IMPALA-1658 (87b9fac2adda). This version of UtcToLocal() already used localtime_r() for UTC -> local time conversion. It also used couple boost functions to convert back and forth between 'ptime' and 'struct tm' types (to_tm(), ptime_from_tm(), nanoseconds()) - The first version of UtcToLocal() made querying tables with timestamps a lot slower (up to 30x times slower). The root cause was a global lock in localtime_r(). This is discussed in IMPALA-3316 and IMPALA-2125. Both JIRAs are still open. - UtcToLocal() was made 50% faster by 4ddce7f97880. This change got rid of the boost functions (to_tm(), ptime_from_tm(), nanoseconds()) and added the comment at L77. The change kept localtime_r() though, so it didn't do anything to solve the global lock problem. I wrote a simple benchmark program to confirm that UtcToLocal()'s performance degrades if the number of threads running it in parallel is increased. The benchmark program also confirms that FromUtc() does not have this problem. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 9: Code-Review+1 > Carry +2 forward. Core tests succeeded: http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/5500/ -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has uploaded a new patch set (#9). Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer This change fixes the following issues in the Sequence File Writer: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala created corrupt uncompressed sequence files. 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala. 3. Impala created record-compressed sequence files with empty keys block. As a result, Hive could not read back record-compressed sequence files created by Impala. 4. Impala created block-compressed files with: - empty key-lengths block - empty keys block - empty value-lengths block This resulted in invalid block-compressed sequence files that Hive could not read back. 5. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record- compressed sequence files created by Impala. 6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 7. The calculation of block sizes in SnappyBlockCompressor class was incorrect for odd-length buffers. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 8 files changed, 500 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/9 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#8). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 806 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/8 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: (20 comments) http://gerrit.cloudera.org:8080/#/c/5939/7//COMMIT_MSG Commit Message: PS7, Line 22: New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE : LIKE will set the table property to UTC if the global flag : --set_parque > I don't follow this given the next two paragraphs. This isn't true for the Correct. Changed the commit message. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 305: // parquet-mr. If conversion should not occur, this is set to an empty string. Otherwise > from the code, it looks like this is set to the empty string if conversion Correct, added a comment to document the behavior. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 234: FLAGS_convert_legacy_hive_parquet_utc_timestamps > given that parquet_mr_write_zone() will take precedence over the flag, I th Done PS7, Line 246: DCHECK > nit: DCHECK_EQ() Done PS7, Line 609: status = scanner_state->LogOrReturnError(msg); > since this code is very performance critical, it would be best to order thi Correct. Reordered the conditions in the if-elif-else statement PS7, Line 611: status = status; > these could all be UNLIKELY since we want to optimize for the non-error cas Done PS7, Line 625: LY(dst->HasDateAndTime() > i.e. here too and below. Done PS7, Line 639: if (!SetTimestampConversionError(parent_->scan_node_, parent_->state_, : parent_->parse_status_, src, parent_->scan_node_->parquet_mr_write_zone())) { : return false; : } : } : } else { : DCHECK(parent_->scan_node_->parquet_mr_write_zone().empty()); : DCH > seems worth factoring this block (which occurs 3 times) into a non-inlined Done. Added a new non-class member function and factored out the block into it. (Adding a new member function instead to ScalarColumnReader template class is probably not a good idea since it is only needed in the ScalarColumnReaderinstance) http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 193: > could you add a comment explaining these two lookup techniques? Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: PS7, Line 45: hen IsTimestampDependen > how about drawing the connection with the next function directly: Done Line 49: > comment this. Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 101: Status s("Failed to convert timestamp to local time."); > this might produce a ton of logging. instead, why not return the Status ob Done Line 109: bool TimestampValue::FromUtc(const std::string& timezone_str) { > how do we guarantee that this is true? did the caller already check that? Yes, it did. PS7, Line 111: _zone_ptr timezo > UNLIKELY (since perf critical) Done Line 123: ToPtime(); > see the comment at line 77. i think we were avoiding this for perf reasons. I've added a micro-benchmark to compare the two. UtcToLocal() is about 2.5 times faster than FromUtc() when input data size is 1,000 and 1.5 times faster when input data size is 10,000. Rewriting FromUtc() not to use boost function calls is not that straightforward. The above version of UtcToLocal() uses localtime_r() standard C function to perform the UTC->local timezone conversion. FromUtc() on the other hand should convert timestamps to a given time zone. I don't think there is a standard C function to do this. http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS7, Line 197: > below you say "local time in the given timezone", which I guess is technica Changed the comment below for 'FromUtc' PS7, Line 203: > it's not clear that "convert" means that *this is modified in-place, so I t Done http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS7, Line 122: new > is that always the case? Fixed the description PS7, Line 122: bles created " : "using CREATE TABLE and CREATE TABLE LIKE . You can find details in the " : "documentation."); : : DEFINE_int64(max_result_cache_size, 1 > I'm not sure that this makes sense on it's own. I think you kind of have to Removed that part
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5939 to look at the new patch set (#8). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 806 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/8 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#7). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 665 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 113: "Invalid time zone in the '%s' table property: %s", > double 'the' Done http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 665: > Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96Wri Done Line 1882: "' '", "URI path cannot be empty."); > easier to read single quotes Done Line 1904: type == PrimitiveType.VARCHAR) { > easier to read single quotes Done http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py: Line 105: parquet_path = get_fs_path( > What does "fn" stand for? I'm thinking "file name", but this is not just a Renamed it to 'parquet_path' Line 123: ON i.id = h.id AND i.day = h.day -- serves as a unique key > easier to read with the alias 'i' next to the table Done Line 125: (h.timestamp_col IS NULL) != (i.timestamp_col IS NULL) > simplify the first two conditions with: Done. Had to put round brackets around col IS NULL expressions to avoid analysis errors. http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py File tests/query_test/test_parquet_timestamp_compatibility.py: Line 78: def test_invalid_parquet_mr_write_zone(self, vector, unique_database): > test_invalid_parquet_mr_write_zone Done Line 118: # tz_name conversion on the timestamp values. > extra space after "triggers" Done -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#7). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 665 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 6: (2 comments) Thanks for the review. http://gerrit.cloudera.org:8080/#/c/6107/6/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: PS6, Line 247: DCHECK(val < -112); > nit: DCHECK_LT(val, -112); Done PS6, Line 258: DCHECK(val > 127); > nit: DCHECK_GT(val, 127); Done -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#6). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 657 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#6). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 657 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 71: @pytest.mark.execute_serially > Moved tests that use startup options to ./tests/custom_cluster/test_hive_pa The rest of the tests were moved to ./tests/query_test/test_parquet_timestamp_compatibility.py -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#6). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 657 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 5: (26 comments) Thanks for the review. http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 305: // parquet-mr. > Comment that the FE guarantees that this is a valid timezone. Done http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 246: DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) || > simplify condition: Done Line 549: /// Used to cache the timezone object corresponding to "parquet.mr.int96.write.zone" > the "parquet.mr.int96.write.zone" table property Done Line 550: /// table property to avoid repeated calls to 'TimezoneDatabase::FindTimezone()'. > no need to single-quote here Done http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 190: // See if they specified a zone id > Who is "they"? Suggest rephrasing Done Line 202: return time_zone_ptr(); > return NULL? Done http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/BackendGflags.thrift File common/thrift/BackendGflags.thrift: Line 58: // If true, all newly created Parquet tables will have the parquet.mr.int96.write.zone > ... all newly created HDFS tables regardless of format ... Done http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 218: 9: optional string parquet_mr_write_zone; > remove trailing semicolon for consistency Done http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: Line 169: private static void analyzeParquetMrWriteZone(Table table, > These cases need tests in AnalyzeDDL Done. I also changed the exception message for consistency. http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 112: throw new AnalysisException("Invalid Time Zone: " + timezone); > Mention that the timezone is in the table property 'parquet.mr.int96.write. Done http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 183: if (getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) { > These cases need tests in AnalyzeDDL Done Line 186: "Only HDFS tables can use '%s' table property.", > the '%s' table property Done. I also changed the exception message for consistency. http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 54: import org.apache.impala.common.InternalException; > not needed Done http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite): > Test needs a comment. In particular, what is covered here and what is cover I've added a comment that describes tests in this file. What do you mean by "integration tests not part of Impala tests"? I don't think we have any integration tests upstream. Line 26: TEST_DB_NAME = 'timezone_test_db' > Why not use the unique_database fixture? That's the best practice. Switched to unique_database. Line 39: self.client = impalad.service.create_beeswax_client() > I think we already have a self.client from ImpalaTestSuite. Removed creating a client from here. Line 49: "/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet" > We need the appropriate filesystem prefix here. This will break on local fs Done Line 56: self.client.execute('USE ' + self.TEST_DB_NAME) > Why? Better to avoid changing the session state Removed it. Line 59: self.client.execute('INVALIDATE METADATA') > why? Removed it, it's not needed. Line 61: def _set_impala_table_timezone(self, timezone): > simplify to pass table name as param Done Line 71: @pytest.mark.execute_serially > How long does this test take? I'm thinking we should only have minimal test Moved tests that use startup options to ./tests/custom_cluster/test_hive_parquet_timestamp_conversion.py Line 81: statement = '''ALTER TABLE hive_table > Should be an analyzer test in AnalyzeDDL Removed it from here. Line 104: select_from_impala_table = '''SELECT timestamp_col FROM impala_table WHERE id = 1''' > Can you
[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-3381: Support AM/PM marker in date and time format strings .. IMPALA-3381: Support AM/PM marker in date and time format strings This change adds AM/PM marker to format strings used in 'to_timestamp' 'unix_timestamp' and 'from_unixtime' functions. It uses 'a' for the AM/PM marker following the Hive impelentation ( which follows Java 'SimpleDateFormat' patterns). Similarly to Hive, the 'a' pattern letter can be repeated any number of times in the format string without affecting the corresponding presentation. For example: > select from_unixtime( > unix_timestamp('2017-03-31 11:19:23 PM', '-MM-dd HH:mm:ss a'), > '-MM-dd HH:mm:ss aaa'); 2017-03-31 11:19:23 PM Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 133 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6523/2 -- To view, visit http://gerrit.cloudera.org:8080/6523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc File be/src/exec/hdfs-sequence-table-writer.cc: PS4, Line 179: : > Thanks for listing them out. Please also put this list in the commit messag Done http://gerrit.cloudera.org:8080/#/c/6107/5/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: Line 214: // Returns size of the encoded long value, including the 1 byte for length for val < -112 > for val < -112 or val > 127. Done PS5, Line 228: ollow > nit: long line Done PS5, Line 245: um_bytes, 9); > nit: help to comment why it's 119 here (which is different from 120 in the Done -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has uploaded a new patch set (#6). Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer This change fixes the following issues in the Sequence File Writer: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala could not read back uncompressed sequence files created by Impala. 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala. 3. Impala created record-compressed sequence files with empty keys block. As a result, Hive could not read back record-compressed sequence files created by Impala. 4. Impala created block-compressed files with: - empty key-lengths block - empty keys block - empty value-lengths block This resulted in invalid block-compressed sequence files that Hive could not read back. 5. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record- compressed sequence files created by Impala. 6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 7. The calculation of block sizes in SnappyBlockCompressor class was incorrect for odd-length buffers. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 8 files changed, 494 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/6 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6107 to look at the new patch set (#6). Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer This change fixes the following issues in the Sequence File Writer: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala could not read back uncompressed sequence files created by Impala. 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala. 3. Impala created record-compressed sequence files with empty keys block. As a result, Hive could not read back record-compressed sequence files created by Impala. 4. Impala created block-compressed files with: - empty key-lengths block - empty keys block - empty value-lengths block This resulted in invalid block-compressed sequence files that Hive could not read back. 5. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record- compressed sequence files created by Impala. 6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 7. The calculation of block sizes in SnappyBlockCompressor class was incorrect for odd-length buffers. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 8 files changed, 494 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/6 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/6523 Change subject: IMPALA-3381: Support AM/PM marker in date and time format strings .. IMPALA-3381: Support AM/PM marker in date and time format strings This change adds AM/PM marker to format strings used in 'to_timestamp' 'unix_timestamp' and 'from_unixtime' functions. It uses 'a' for the AM/PM marker following the Hive impelentation ( which follows Java 'SimpleDateFormat' patterns). Similarly to Hive, the 'a' pattern letter can be repeated any number of times in the format string without affecting the corresponding presentation. For example: > select from_unixtime( > unix_timestamp('2017-03-31 11:19:23 PM', '-MM-dd HH:mm:ss a'), > '-MM-dd HH:mm:ss aaa'); 2017-03-31 11:19:23 PM Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 137 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6523/1 -- To view, visit http://gerrit.cloudera.org:8080/6523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 304: // Time zone for adjusting timestamp values read from Parquet files written by > single line That would make the line 92 characters long. http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 246: DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) || > Even better, we can check this at the scan node level, e.g., in HdfsScanNod Simplified the code. The timezone check is done in the FE, the BE is guaranteed to have a valid timezone string. Line 653: } > As mentioned above, we should terminate the query early when in such a bad Added DCHECK(false) http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 194: for (const string& tz_region: tz_region_list_) { > No need to use auto here. const string& is clearer Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: Line 40: /// If 'tz' is not found in the database, nullptr is returned. > null pointer -> nullptr Done Line 46: /// If 'tz' is not found in the database, nullptr is returned. > null pointer -> nullptr Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 199: /// conversion was successfull and false otherwise. If conversion failed *this is set to > Comment on what the state of *this is when false is returned. Done Line 202: > Comment on what the state of *this is when false is returned. Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 357: const char *tz = env->GetStringUTFChars(timezone, NULL); > const char* Done http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 122: "the parquet.mr.int96.write.zone table property to UTC for new tables. This changes " > consistent placing of space (at line end or at line beginning) Done http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 67: analyzeJoin(analyzer); > If we do the validation here, then there is no need to check in each HdfsSc Done http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 184: if (getFileFormat() == THdfsFileFormat.KUDU) { > You can check getFileFormat() here. The only non-HDFS formats we have are H Done Line 189: String timezone = getTblProperties().get(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE); > We should only do this for HDFS tables right? Done http://gerrit.cloudera.org:8080/#/c/5939/4/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: Line 87: // Returns true if timezone String is valid according to the BE timezone database, false > valid according to the BE timezone database Done http://gerrit.cloudera.org:8080/#/c/5939/4/tests/custom_cluster/test_parquet_timestamp_compatibility.py File tests/custom_cluster/test_parquet_timestamp_compatibility.py: Line 1: # Licensed to the Apache Software Foundation (ASF) under one > Use new Apache license header. Do we still have the old header in some exis My bad, I created this test from an old test script. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#5). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M tests/common/impala_test_suite.py A tests/custom_cluster/test_parquet_timestamp_compatibility.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py 24 files changed, 569 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/5 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has uploaded a new patch set (#5). Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer Before the fix, sequence file writer produced corrupt files in some cases. Steps to reproduce: SET ALLOW_UNSUPPORTED_FORMATS=1; create table store_sales_seq_snap like tpcds_parquet.store_sales stored as SEQUENCEFILE; insert into store_sales_seq_snap partition(ss_sold_date_sk) select * from tpcds_parquet.store_sales where ss_sold_date_sk between 2450816 and 2451200; The insert statement produces a corrupt file that cannot be read back. This change fixes: - The implementation of zero-compressed encoding in ReadWriteUtil class. - The calculation of block sizes in SnappyBlockCompressor class. - Creating record/block compressed sequence files in HdfsSequenceTableWriter class. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 8 files changed, 484 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/5 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc File be/src/exec/hdfs-sequence-table-writer.cc: PS4, Line 179: : > Is this the reason the old sequence file writer created corrupted files ? The old sequence file writer had multiple issues: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala could not read back uncompressed sequence files created by Impala (see be/src/exec/read-write-util.h). 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala (see be/src/exec/hdfs-sequence-table-writer.cc). 3. Keys were missing from record-compressed sequence files. Hive could not read back record-compressed sequence files created by Impala (see be/src/exec/hdfs-sequence-table-writer.cc). 4. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record-compressed sequence files created by Impala (see be/src/exec/hdfs-sequence-table-writer.cc). 5.Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 6. Impala created block-compressed files with: - empty key-lengths block (L176) - empty keys block (L177) - empty value-lengths block (L180) This resulted in invalid block-compressed sequence files that Hive could not read back (see HdfsSequenceTableWriter::WriteCompressedBlock()). PS4, Line 139: _cast(KEY_CLASS_NAME)); > nit: indent 4. Same below. Done Line 191: record.WriteBytes(output_length, output); > It seems a bit unfortunate that we don't free the temp buffer (i.e. output) Added FreeAll to the end of the 'Flush()' function. PS4, Line 193: // Output compressed keys block-size & compressed keys block. : // The keys block contains "\0\0\0\0" byte sequence as a key for each row (this is what : // Hive does). > Does not writing key-lengths block and key block prevent the file from bein Yes, Hive failed with an exception when I tried that. http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.h File be/src/exec/hdfs-sequence-table-writer.h: Line 29: > Would be good to add the details of the Sequence file's layout as top level Done http://gerrit.cloudera.org:8080/#/c/6107/3/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: Line 230: // For more information, see the documentation for 'WritableUtils.writeVLong()' method: > DCHECK_GE(num_bytes, 2); Done PS3, Line 233: nt64_t num_b > May also want to state that the source of this behavior. Done http://gerrit.cloudera.org:8080/#/c/6107/3/be/src/util/compress.cc File be/src/util/compress.cc: Line 248: outp += size; > DCHECK_LE(outp - out_buffer_, length); Done http://gerrit.cloudera.org:8080/#/c/6107/3/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test File testdata/workloads/functional-query/queries/QueryTest/seq-writer.test: Line 212: stored as SEQUENCEFILE; > May be helpful to also add a test for writing empty file: I tried this and it doesn't write an empty file. It doesn't create a file at all. Probably there's no easy way to force the sequence file writer to create an empty-file. http://gerrit.cloudera.org:8080/#/c/6107/3/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: PS3, Line 170: # Read it back in Impala : output = self.client.execute('select count(*) from %s' % table_name) : assert '16541' == output.get_data() : # Read it back in Hive : output = self.run_stmt_in_hive('select count(*) from %s' % table_name) : assert '16541' == output.split('\n')[1] : : def test_avro_writer(self, vector): : self.run_test_case('QueryTest/avro-wri > Doesn't this duplicate the second test ? May help to test with empty file a The 2nd test is for record-compressed sequence files while the 3rd is for block-compressed seq files. Added tests for files greater than 4K and less than 4K. I could not figure out how to create an empty file. -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has uploaded a new patch set (#5). Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer Before the fix, sequence file writer produced corrupt files in some cases. Steps to reproduce: SET ALLOW_UNSUPPORTED_FORMATS=1; create table store_sales_seq_snap like tpcds_parquet.store_sales stored as SEQUENCEFILE; insert into store_sales_seq_snap partition(ss_sold_date_sk) select * from tpcds_parquet.store_sales where ss_sold_date_sk between 2450816 and 2451200; The insert statement produces a corrupt file that cannot be read back. This change fixes: - The implementation of zero-compressed encoding in ReadWriteUtil class. - The calculation of block sizes in SnappyBlockCompressor class. - Creating record/block compressed sequence files in HdfsSequenceTableWriter class. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 8 files changed, 484 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/5 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho