[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2017-11-14 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8508/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8508/2//COMMIT_MSG@9
PS2, Line 9: Impala does not represent a quoted string at date/time format. For 
example,
Please wrap commit message lines at 70 characters (here and below).


http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/exprs/expr-test.cc@5747
PS2, Line 5747:  
'\\'day\\'dd,\\'month\\'MMM,\\'year\\',\\'hour\\'HH,\\'minute\\'mm,\\'second\\'ss')",
Please wrap lines at 90 characters.


http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.h@79
PS2, Line 79: //   User can specify a quoted string at date/time format. For 
example, addtional quoted string
Please wrap lines at 90 characters (here and below).


http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8508/2/be/src/runtime/timestamp-parse-util.cc@170
PS2, Line 170:   if (str - string_literal != 1) {
 : const int len = str - string_literal - 1;
 : DCHECK(len >= 0);
 : const int pos = str - str_begin - len;
 : DCHECK(pos >= 0);
 : const char* val = string_literal + 1;
 : dt_ctx->toks.push_back(DateTimeFormatToken(SEPARATOR, 
pos, len, val));
 :   }
What if (str - string_literal == 1) ?

Note, that in Hive you get an apostrophe:
> select from_unixtime(1492677561, 'H\'\'');
10'



--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 14 Nov 2017 18:22:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()

2017-11-14 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8355 )

Change subject: IMPALA-5754: Improve randomness of rand()/random()
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8355/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8355/8//COMMIT_MSG@11
PS8, Line 11: libarary
typo: library


http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/expr-test.cc@4660
PS8, Line 4660: const double expected_result = 
ConvertValue(GetValue(rand.str(), TYPE_DOUBLE));
nit: Please wrap lines at 90 characters.


http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8355/8/be/src/exprs/math-functions-ir.cc@163
PS8, Line 163:   if (! seed_arg->is_null) {
 : seed = seed_arg->val;
 :   }
nit: one line. Also, please remove the space after !



--
To view, visit http://gerrit.cloudera.org:8080/8355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684
Gerrit-Change-Number: 8355
Gerrit-PatchSet: 8
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Tue, 14 Nov 2017 16:22:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2017-11-08 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@211
PS3, Line 211: }
 : else {
single line


http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.h@403
PS3, Line 403: 'timeout'
nit: formal parameter is 'requested_timeout'


http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@1119
PS3, Line 1119:  if (session_timeout > 0) {
  : lock_guard l(session_timeout_lock_);
  : multiset::const_iterator itr = 
session_timeout_set_.find(session_timeout);
  : DCHECK(itr != session_timeout_set_.end());
  : session_timeout_set_.erase(itr);
  : session_timeout_cv_.notify_one();
  :   }
Can we call UnregisterSessionTimeout() instead here?


http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@1236
PS3, Line 1236:
nit: 4 character indents on line continuations


http://gerrit.cloudera.org:8080/#/c/8490/3/fe/src/test/java/org/apache/impala/service/JdbcTest.java
File fe/src/test/java/org/apache/impala/service/JdbcTest.java:

http://gerrit.cloudera.org:8080/#/c/8490/3/fe/src/test/java/org/apache/impala/service/JdbcTest.java@657
PS3, Line 657: connections.get(0).close();
Maybe you can be more explicit here about the expected end-state, e.g.:

assertNotNull("..", connections.get(0));
assertNull("..", connections.get(1));
assertNull("..", connections.get(2));



--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Nov 2017 12:37:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-08 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py@241
PS8, Line 241: len(advanced_options) > 0
nit: you don't need to call len() to check if a dictionary is empty:

if advanced_options:
  ..


http://gerrit.cloudera.org:8080/#/c/8447/8/shell/impala_shell.py@244
PS8, Line 244: len(deprecated_options) > 0
Same as L241



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Nov 2017 08:55:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-07 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8447/7/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8447/7/shell/impala_shell.py@240
PS7, Line 240:   if len(advanced_options) > 0 and print_mode == 
QueryOptionDisplayModes.ALL_OPTIONS:
 : print '\nAdvanced Query Options:'
 : self._print_option_group(advanced_options, set_options)
 :   if len(deprecated_options) > 0 and print_mode == 
QueryOptionDisplayModes.ALL_OPTIONS:
 : print '\nDeprecated Query Options:'
 : self._print_option_group(deprecated_options, set_options)
nit: this can be simplified as:

if print_mode == QueryOptionDisplayModes.ALL_OPTIONS:
  if advanced_options:
print '\nAdvanced Query Options:'
self._print_option_group(advanced_options, set_options)
  if deprecated_options:
print '\nDeprecated Query Options:'
self._print_option_group(deprecated_options, set_options)



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Nov 2017 15:01:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc@618
PS5, Line 618:
Probably I'm missing something here, but shouldn't this be ADVANCED?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127
PS2, Line 127:   PRINT_REGULAR_OPTIONS_ONLY = 1
 :   PRINT_ALL_OPTIONS = 2
> I found it more verbose to have 2 const instead of a bool.
Ok, I see your point.

You could also put the PRINT_* members into a separate class to emulate enums 
(see class CmdStatus in L61-L69). Same goes for QUERY_OPTION_* members. On the 
other hand this might be an overkill, so use your best judgement :)


http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351
PS2, Line 351:result = shell1.get_result()
 : assert "Query options (defaults shown in []):" in 
result.stdout
 : assert "ABORT_ON_ERROR" in result.stdout
> No. SUPPORT_START_OVER as I discussed with Tim is a tricky one and we don't
Thanks for the explanation.

I still find confusing that in impala::PopulateQueryOptionLevels() the query 
option level for SUPPORT_START_OVER is explicitly set to REGULAR. Shouldn't it 
be ADVANCED?



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Nov 2017 11:20:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.h@520
PS2, Line 520:  void AddOptionLevelToConfigDescription(beeswax::ConfigVariable& 
option,
 :   const TQueryOptions& default_query_options_, const string& 
option_key);
Remove 'default_query_options_' and make AddOptionLevelToConfigDescription() 
const.



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Nov 2017 18:12:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351
PS2, Line 351:result = shell1.get_result()
 : assert "Query options (defaults shown in []):" in 
result.stdout
 : assert "ABORT_ON_ERROR" in result.stdout
SUPPORT_START_OVER should appear here in result.stdout, right?



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Nov 2017 18:01:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2181: Add query option levels for display

2017-11-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8447 )

Change subject: IMPALA-2181: Add query option levels for display
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc@1221
PS2, Line 1221: default_query_options_
Why pass in 'default_query_options_'? Isn't it already visible in 
AddOptionLevelToConfigDescription()?


http://gerrit.cloudera.org:8080/#/c/8447/2/be/src/service/impala-server.cc@1224
PS2, Line 1224: ConfigVariable support_start_over;
  :   support_start_over.__set_key("support_start_over");
  :   support_start_over.__set_value("false");
  :   default_configs_.push_back(support_start_over);
Shouldn't you call AddOptionLevelToConfigDescription() with 
'support_start_over' as well?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py@106
PS2, Line 106: is not
It is safer to use "!=" here.

"is not" works for small integers with the CPython interpreter but it is 
implementation specific behavior and might be changed in the future.


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_client.py@107
PS2, Line 107: return ""
nit: maybe we should be more defensive and return "" as well if 
parsed_description[0] != "OptionLevel".


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127
PS2, Line 127:   PRINT_REGULAR_OPTIONS_ONLY = 1
 :   PRINT_ALL_OPTIONS = 2
nit: why not use a bool flag instead?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@225
PS2, Line 225:  =
nit: remove spaces around '='


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@250
PS2, Line 250: that
'then'?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@266
PS2, Line 266: default_options[option_name]
Why not use 'option_value'?


http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@263
PS2, Line 263:   elif level == self.QUERY_OPTION_ADVANCED:
 : advanced_options[option_name] = option_value
 :   else:
 : advanced_options[option_name] = 
default_options[option_name]
How about this instead?

  else:
advanced_options[option_name] = option_value



--
To view, visit http://gerrit.cloudera.org:8080/8447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1
Gerrit-Change-Number: 8447
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Nov 2017 17:32:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13:

> Uploaded patch set 13: Patch Set 12 was rebased.

Thanks for the review.
I think it needed a rebase. Could you restart the job?


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 06:11:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 775 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/13
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-25 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 775 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/12
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 11:

(10 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455:   TQueryOptions options;
> Thanks for investigating and providing an explanation!
Done


http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@454
PS11, Line 454: JNIEnv* env, jclass caller_class, jstring options_str, 
jbyteArray tquery_options) {
> rename options_str to csv_query_options for consistency
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288
PS10, Line 288: if (!currentTestCase.isValid()) {
> All PlannerTest .test files should end with "", so I think the code her
Correct, every PlannerTest .test file ends with "". I've removed  
parseQueryOptions() from L338.

I find the code in L339-L344 very confusing.  If the .test file doesn't end 
with "", the last non-terminated test case is still validated and returned 
to the caller. In addition, if the very last section is not "" terminated 
it will be silently ignored.

Maybe we should do something like this instead:

if (!currentTestCase.isEmpty() || !sectionContents.isEmpty()) {
  throw new IllegalStateException("Test case" +
  " at line " + currentTestCase.startLineNum +
  " is not '' terminated.");
}
return currentTestCase;


http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@359
PS11, Line 359:   " - " + e.getMessage());
> also pass in 'e' as the cause
Done


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@38
PS11, Line 38:  DISTRIBUTEDPLAN
> What's the value in keeping this section?
I would like to keep it as a reference, so that we can see the effect of 
RUNTIME_FILTER_MODE=LOCAL in L339 and L422.


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@174
PS11, Line 174:  DISTRIBUTEDPLAN
> What's the value in keeping this section?
Removed it.


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@257
PS11, Line 257:  DISTRIBUTEDPLAN
> What's the value in keeping this section?
Removed it.


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@392
PS11, Line 392: MAX_NUM_RUNTIME_FILTERS=4
> How about setting to 3 so we can see both local and non-local filters being
Done


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@582
PS11, Line 582:  DISTRIBUTEDPLAN
> What's the value in keeping this section?
Removed it.


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@657
PS11, Line 657:  DISTRIBUTEDPLAN
> What's the value in keeping this section?
Removed it.



--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 25 Oct 2017 

[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 10:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455: jbyteArray thrift_in_query_options) {
> Not sure what 'thrift_in_query_options' means. How about using 'tquery_opti
I've renamed the parameter.

I decided to pass in the existing query options because the thrift-generated 
C++ and Java TQueryOptions classes don't handle the isset flags properly.

If I don't pass in the existing query options, then:

1. I'd have to create an "empty" TQueryOptions instance inside this function.

2. The __isset member of that instance would contain true values for every 
option even though they haven't been set explicitly. Therefore I'd have to 
overwrite __isset with false values somehow before calling 
impala::ParseQueryOptions().

3. When FeSupport.ParseQueryOptions() deserializes the TQueryOptions instance 
returned by Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions(), 
it would still find that some query options have been set, even if they haven't 
been touched (e.g. TQueryOptions.setExplain_levelIsSet() always returns true 
since explain_level != null even if it hasn't been set explicitly).

I'm not sure why thrift-generated classes behave this way. Maybe there's a 
better way to work around the isset bugs, but I couldn't think of any.


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457
PS10, Line 457:   DeserializeThriftMsg(env, thrift_in_query_options, );
> Needs error handling, e.g. using THROW_IF_ERROR_RET
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458
PS10, Line 458:   const char *o = env->GetStringUTFChars(options_str, NULL);
> * You need to release the string UTF chars, take a look at JniUtfCharGuard
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461
PS10, Line 461:   TParseQueryOptionsResult result;
> I think it's simpler to convert all Status to an exception. That way we don
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119
PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to 
the root of the distributed
> Comment is wrong. 'singleNodePlan' definitely does not point to the root of
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354
PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { 
targets_.add(target); }
> add newline before
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397
PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0);
> Why is 0 not a valid value?
0 is a valid value. MAX_NUM_RUNTIME_FILTERS=0 effectively removes every runtime 
filter.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423
PS10, Line 423:   filter.computeNdvEstimate();
> My understanding is that IMPALA-3450 has been fixed and we can remove this
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537
PS10, Line 537:* The assigned filters are the ones for which 'scanNode' can 
be used a destination
> can be used as a destination node
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539
PS10, Line 539:* 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a 
filter is only assigned to
> If the DISABLE_ROW_RUNTIME_FILTERING query option is set ...
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541
PS10, Line 541:* 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a 
filter is only assigned to
> If the RUNTIME_FILTER_MODE query option is set ...
Done


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542
PS10, Line 542:*'scanNode' if the filter is local to the scan node.
> This doesn't really explain what the LOCAL option means. How about:
Done



[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 927 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/11
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-06 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 10:

> (5 comments)
 >
 > Nice. So much better. I'll let Alex take a look as well and do the
 > final +2.

Thanks for the review!


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 06 Oct 2017 10:29:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
13 files changed, 743 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-06 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@292
PS9, Line 292: paresResult
> I believe you meant parseResult?
Yes, fixed it.


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@125
PS9, Line 125: public int getStartingLineNum() {
 :   return startLineNum;
 : }
> single line
Done


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@129
PS9, Line 129: public TQueryOptions getOptions() {
 :   return this.options;
 : }
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@133
PS9, Line 133: public void setOptions(TQueryOptions options) {
 :   this.options = options;
 : }
> single line
Done


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@367
PS9, Line 367:
> Add a Preconditions.checkNotNull(result) here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 06 Oct 2017 10:18:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-05 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
13 files changed, 744 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/9
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 8:

> Uploaded patch set 8.

Rebased again because of a conflict in hdfs-scan-node-base.cc.


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 03 Oct 2017 11:57:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
13 files changed, 756 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/8
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 7:

> Thanks for adding the support for parsing the query options. Did
 > you try to reuse the c++ implementation through a JNI call? It
 > would be nice if we could avoid implementing the same functionality
 > in both c++ and java. See FeSupport.java class for other cases
 > where we call into the backend from the java side. Thanks.

Done.


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 03 Oct 2017 10:58:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
13 files changed, 756 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/7
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-28 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 6:

> Uploaded patch set 6.

Rebased against master.


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 28 Sep 2017 15:18:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-28 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
10 files changed, 721 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/6
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-28 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 5:

> Uploaded patch set 5.

Added support for query options to .test files.


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 28 Sep 2017 14:45:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-28 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test
11 files changed, 725 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/5
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-06 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 4:

> > > (1 comment)
 > >
 > > The tests in this change use only 3 query options. Adding a new
 > > section to the .test files to support only these 3 options
 > wouldn't
 > > be too much work.
 > >
 > > On the other hand, adding support for all the query options that
 > > Impala supports would be a lot harder. Probably we would have to
 > > implement that using some Java reflection trickery.
 > 
 > I don't think you need to use Java reflection. The generated Java
 > class for TQueryOptions has a number of helper functions to search
 > and set a field by name. So, for instance the QUERY_OPTIONS section
 > could have key=value pairs that correspond to query options. Then
 > we could write a small function that parses the key value pairs and
 > uses the helper functions to check for valid query options and set
 > the values. Do you want to give it a try? Thanks

Thanks. I'm still confused though how to implement setting enum query options 
without reflection. e.g.:
 QUERYOPTIONS
COMPRESSION_CODEC=GZIP

Here we have to know that the type of TQueryOptions.compression_codec is 
org.apache.impala.thrift.THdfsCompression, otherwise we cannot parse "GZIP". Am 
I missing something?

-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-01 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 4:

> (1 comment)

The tests in this change use only 3 query options. Adding a new section to the 
.test files to support only these 3 options wouldn't be too much work.

On the other hand, adding support for all the query options that Impala 
supports would be a lot harder. Probably we would have to implement that using 
some Java reflection trickery.

-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-30 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#4).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 666 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-30 Thread Attila Jeges (Code Review)
Hello Thomas Tauber-Marshall,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7564

to look at the new patch set (#4).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 666 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS2, Line 119: singleNodePlan
> Let's expand to the comment to mention that 'singleNodePlan' now points to 
Done


http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 0
: 
self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector)
: 
:   def test_singlenode_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 1
: 
self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector)
> I don't follow. Did you check how the query option is set in testRuntimeFil
Let me give you some context:
In the current patch-set 'runtime_filters_distrib_pruning.test' and 
'runtime_filters_singlenode_pruning.test' contain 11 test cases in total. Each 
of them tests runtime filter assignment with a different combination of query 
options. There are inline SET statements inside the .test files before each 
test case to set the options for that test.

To implement this in PlannerTest.java, I would have to create 11 '.test' files, 
one for each test case. This is necessary because PlannerTest.java does not 
support inline SET statements inside the .test files. This approach seemed too 
much trouble to me. What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-29 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 665 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-29 Thread Attila Jeges (Code Review)
Hello Thomas Tauber-Marshall,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7564

to look at the new patch set (#3).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 665 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-29 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 2:

(8 comments)

Thanks for the review.

http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS2, Line 119: singleNodePlan
> The comment in L448 (RuntimeFilterGenerator) suggests that this function is
The single node plan is created on L104 and assigned to 'singleNodePlan'. 
Distributed plan is created in-place on L114, therefore after L114 
'singleNodePlan' refers to the distributed plan. Maybe I should rename 
'singleNodePlan' to something else to avoid confusion?


PS2, Line 117: // create runtime filters
 : if (ctx_.getQueryOptions().getRuntime_filter_mode() != 
TRuntimeFilterMode.OFF) {
 :   RuntimeFilterGenerator.generateRuntimeFilters(ctx_, 
singleNodePlan);
 :   ctx_.getAnalysisResult().getTimeline().markEvent("Runtime 
filters computed");
 : }
> Why was this moved here?
The block was moved here because the distributed plan is created on L114 and I 
wanted to assign runtime filters on the distributed plan instead of the single 
node plan. Alex suggested this approach. This way we can enforce all the 
constraints (described in the commit message) during filter assignment directly.


http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

PS2, Line 142: public boolean isBoundByPartitionColumns = false;
 :   // Indicates if 'node' is in the same fragment as the join 
that produces the
 :   // filter
 :   public boolean isLocalTarget = false;
> can they change once they are set? If not, make them final.
Done


PS2, Line 165: 
tFilterTarget.setIs_bound_by_partition_columns(isBoundByPartitionColumns);
 : tFilterTarget.setIs_local_target(isLocalTarget);
> Do you have to send these to the backend? I think they are only used in DCH
The BE uses these flags to create instances of Coordinator::FilterTarget that 
are used in coordinator.cc.


PS2, Line 455: Preconditions.checkNotNull(ctx);
 : Preconditions.checkNotNull(root);
> Remove, not really doing anything useful here.
Done


PS2, Line 548: Preconditions.checkNotNull(ctx);
 : Preconditions.checkNotNull(scanNode);
> remove
Done


PS2, Line 550: Analyzer analyzer = ctx.getRootAnalyzer();
 : boolean isSingleNodeExec = ctx.isSingleNodeExec();
 : boolean disableRowRuntimeFiltering =
 : ctx.getQueryOptions().isDisable_row_runtime_filtering();
 : TRuntimeFilterMode runtimeFilterMode = 
ctx.getQueryOptions().getRuntime_filter_mode();
> nit: move these lines below L557. No need to get these values if we don't e
Done


http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 0
: 
self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector)
: 
:   def test_singlenode_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 1
: 
self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector)
> These should be Planner tests (see PlannerTest.java)
These are written as query tests and not planner tests because the planner 
tests don't have a good way to 'SET' query options on a per-test case basis.
(different test cases in 'runtime_filters_distrib_pruning.test' and 
'runtime_filters_singlenode_pruning.test' use different query options)


-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 2:

(6 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/7564/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS1, Line 119: RuntimeFilterGener
> unnecessary
Done


PS1, Line 121: 
 : 
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/7564/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

Line 538:* The assigned filters are the ones for which 'scanNode' can be 
used a destination
> update this comment
Done


PS1, Line 549: Preconditions.checkNotNull(scanNode);
 : Analyzer analyzer = ctx.getRootAnalyzer();
> single line?
Done


PS1, Line 590: e
> nit: space before ':'
Done


http://gerrit.cloudera.org:8080/#/c/7564/1/testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test:

Line 1: 
> I suppose these are written as query tests and not planner tests because th
Correct. These test cases should be in the planner test suite, but implementing 
them this way is more straightforward.


-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 666 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
10 files changed, 666 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-19 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..


Patch Set 4:

> Patch Set 3: Code-Review+2

Thanks for reviewing it.

-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-07-19 Thread Attila Jeges (Code Review)
Attila Jeges has abandoned this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Abandoned

It was decided that another approach should be taken to implement this feature. 
I'm abandoning this change.

-- 
To view, visit http://gerrit.cloudera.org:8080/7096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I422e48847428cab9887aee899fed47a8de95c323
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-18 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

This fix removes the MemPool::FreeAll() call from
HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush()
is incorrect because a memory pool buffer is cached by the compressor
in the table writer which isn't reset across calls to Flush().

If the file that is being written is big enough,
HdfsSequenceTableWriter::AppendRows() will call Flush() multiple
times causing memory corruption.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..


Patch Set 2:

(2 comments)

Added a test case to seq-writer.test and made sure that it crashes impalad if 
the fix is not applied.

http://gerrit.cloudera.org:8080/#/c/7394/2//COMMIT_MSG
Commit Message:

PS2, Line 16: The issue is that HdfsSequenceTableWriter::Flush() frees the 
per-file
: memory pool
> Please clarify that HdfsSequenceTableWriter::AppendRows() may call Flush() 
Done


PS2, Line 17: in use
> cached in the compressor
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-18 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

This fix removes the MemPool::FreeAll() call from
HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush()
is incorrect because a memory pool buffer is cached by the compressor
in the table writer which isn't reset across calls to Flush().

If the file that is being written is big enough,
HdfsSequenceTableWriter::AppendRows() will call Flush() multiple
times causing memory corruption.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

The issue is that HdfsSequenceTableWriter::Flush() frees the per-file
memory pool, but the buffer used for compression is still in use after
Flush() returns. If there are additional rows to process after calling
Flush(), impalad might crash.

This fix removes the FreeAll() call from Flush(). It was tested on a
local minicluster with a table containing around 100 million rows.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
1 file changed, 0 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

The issue is that HdfsSequenceTableWriter::Flush() frees the per-file
memory pool, but the buffer used for compression is still in use after
Flush() returns. If there are additional rows to process after calling
Flush(), impalad might crash.

This fix removes the FreeAll() call from Flush(). It was tested on a
local minicluster with a table containing around 100 million rows.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
1 file changed, 0 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings

2017-07-11 Thread Attila Jeges (Code Review)
Attila Jeges has abandoned this change.

Change subject: IMPALA-3381: Support AM/PM marker in date and time format 
strings
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/6523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-11 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7394

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

This patch moves clearing the memory pool from Flush() to
InitNewFile(). This mirrors the way the per-file memory pool is
handled in class HdfsParquetTableWriter.

Tested on a local minicluster with a table containing around 100
million rows.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
2 files changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-06-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7096

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to plan nodes without taking the value of RUNTIME_FILTER_MODE and
DISABLE_ROW_RUNTIME_FILTERING query options into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. Pruning based on the value of DISABLE_ROW_RUNTIME_FILTERING
query option takes place as early as when runtime filters are assigned
to the single node plan. Pruning based on RUNTIME_FILTER_MODE query
option on the other hand has to wait until the distributed plan is
ready (since it requires information about the locality of runtime
filter targets).

Change-Id: I422e48847428cab9887aee899fed47a8de95c323
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node-base.cc
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_distrib_pruning.test
A 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_singlenode_pruning.test
M tests/query_test/test_runtime_filters.py
8 files changed, 665 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/7096/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I422e48847428cab9887aee899fed47a8de95c323
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-22 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..

Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

Reverting IMPALA-2716 as SparkSQL does not agree with the approach
taken.

More details can be found at:
https://issues.apache.org/jira/browse/SPARK-12297

Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
---
M be/src/benchmarks/CMakeLists.txt
D be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
D tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 75 insertions(+), 850 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-22 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6896/1/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 25: #include 
> why do we need these headers?
Removed the new includes and kept boost/date_time/local_time/local_time.hpp (it 
is required for  boost::posix_time)


http://gerrit.cloudera.org:8080/#/c/6896/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 123:   // Avro schema of this table if this is an Avro table, otherwise 
null. Set in load().
> what did these have to do with your change? are they just dead variables (i
Removed them, they were left behind accidentaly when I resolved the conflicts.


-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-22 Thread Attila Jeges (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6896

to look at the new patch set (#2).

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..

Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

Reverting IMPALA-2716 as SparkSQL does not agree with the approach
taken.

More details can be found at:
https://issues.apache.org/jira/browse/SPARK-12297

Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
---
M be/src/benchmarks/CMakeLists.txt
D be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
D tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 75 insertions(+), 850 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-22 Thread Attila Jeges (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6896

to look at the new patch set (#2).

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..

Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

Reverting IMPALA-2716 as SparkSQL does not agree with the approach
taken.

More details can be found at:
https://issues.apache.org/jira/browse/SPARK-12297

Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
---
M be/src/benchmarks/CMakeLists.txt
D be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
D tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 84 insertions(+), 850 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6910/2/tests/util/adls_util.py
File tests/util/adls_util.py:

PS2, Line 29: @classmethod
Why decorate __init__() with @classmethod? As a result, all 'ADLSClient' 
instances will share the same 'token' and 'adlsclient' members.

If this is what you meant to do, please use 'cls' instead of 'self' in 
__init__() to make the intention clear.


-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables.

2017-05-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2525: Treat parquet ENUMs as STRINGs when creating 
impala tables.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6550/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

PS4, Line 276:   //   before conversion to parquet, had been enum. 
Applications which do not have enumerated
nit: Long line. Lines should wrap before 90 characters are hit.


PS4, Line 277:   
nit: Please remove the extra leading spaces before 'types' (and in the line 
above before 'before').


-- 
To view, visit http://gerrit.cloudera.org:8080/6550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jakub Kukul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Jakub Kukul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-17 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..


Patch Set 1:

> Was it a clean revert? If not, where were the conflicts?

There were conflicts in the following files:
be/src/exec/parquet-column-readers.cc
be/src/runtime/timestamp-value.cc
be/src/runtime/timestamp-value.h
common/thrift/generate_error_codes.py
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java

The conflicts were straightforward to resolve.

-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

2017-05-16 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6896

Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp 
data in Parquet"
..

Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"

Reverting IMPALA-2716 as SparkSQL does not agree with the approach
taken.

More details can be found at:
https://issues.apache.org/jira/browse/SPARK-12297

Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
---
M be/src/benchmarks/CMakeLists.txt
D be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
D tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 84 insertions(+), 850 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors

2017-05-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 282:   
nit: indentation is 4 for line continuation (here and below).


http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS11, Line 93: TimestampValue t 
You might want to use "const TimestampValue& t =" here (and below), just like 
you did in aggregate-functions-ir.cc


-- 
To view, visit http://gerrit.cloudera.org:8080/6510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-09 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/bit-util-test.cc
File be/src/util/bit-util-test.cc:

PS8, Line 300: EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(7), 8);
 :   EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(8), 8);
 :   EXPECT_EQ(BitUtil::RoundUpToPowerOfTwo(65535), 65536);
Duplicate of L297-299. I guess the intent here was to call 
RoundUpToPowerOfTwo() with 7L, 8L and 65535L, right?


PS8, Line 308:   EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(7), 8);
 :   EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(8), 8);
 :   EXPECT_EQ(BitUtil::RoundUpToPowerOfTwoNoHw(65535), 65536);
same here


http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 376: DCHECK(v > 0);
DCHECK on L389 asserts that RoundUpToPowerOfTwo(int32_t) never overflows. Do we 
need a similar DCHECK here?


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5246: UDF's Close() should handle Expr's preparation failure

2017-05-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5246: UDF's Close() should handle Expr's preparation 
failure
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/case-expr.cc
File be/src/exprs/case-expr.cc:

PS1, Line 82: fn_ctx->Free(reinterpret_cast(case_state))
I could be wrong, but shouldn't we call
fn_ctx->SetFunctionState(FunctionContext::THREAD_LOCAL, nullptr)
after fn_ctx->Free(case_state) to avoid double-free problems ?
(just like we do in Close functions in be/src/testutil/test-udfs.cc)


http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

PS1, Line 182: ctx->Free(seed);
same here


http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS1, Line 614: delete re;
same here


http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

PS1, Line 195: delete dt_ctx
same here


http://gerrit.cloudera.org:8080/#/c/6757/1/be/src/exprs/utility-functions.cc
File be/src/exprs/utility-functions.cc:

PS1, Line 59: delete uuid_gen;
same here


-- 
To view, visit http://gerrit.cloudera.org:8080/6757
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2c689246ed4f8dd38f104fa35904f3926a7039c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 11:

> Uploaded patch set 11.

Fixed WARN_UNUSED_RESULT warning that failed clang-tidy job.

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#11).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Hello Impala Public Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5939

to look at the new patch set (#11).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-5261: Heap use-after-free in HdfsSequenceTableWriter

2017-04-28 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5261: Heap use-after-free in HdfsSequenceTableWriter
..


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > I assumed you have verified the fix with ASAN build, right ?

Yes, added testing details to the commit message.

http://gerrit.cloudera.org:8080/#/c/6762/1/be/src/exec/hdfs-sequence-table-writer.cc
File be/src/exec/hdfs-sequence-table-writer.cc:

Line 306: value_length = text.size();
> DCHECK_EQ(value_length, row_buf_.Size());
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id339247f892710529d8ad56dd1e98eadbf32900b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local 
file system build
..


Patch Set 1:

> Is this ready to be GVO?

Yes, it is

-- 
To view, visit http://gerrit.cloudera.org:8080/6746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 10:

> Uploaded patch set 10.

This is the rebased change.

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#10).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5939

to look at the new patch set (#10).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local 
file system build
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6746/1/tests/query_test/test_compressed_formats.py
File tests/query_test/test_compressed_formats.py:

PS1, Line 152: @SkipIfLocal.hive
> or @SkipIf.not_hdfs. The current one seems more correct as it's about Hive 
Correct. I've checked some other end-to-end tests that use Hive and they used 
the same skip decorators.


-- 
To view, visit http://gerrit.cloudera.org:8080/6746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local 
file system build
..


Patch Set 1:

> (1 comment)
 > 
 > How did you test it ?

I've done some simple testing locally. 

I set TARGET_FILESYSTEM environment variable to 'local', 'isilon', 's3' one 
after another and made sure that the test_seq_writer_hive_compatibility test is 
skipped.

-- 
To view, visit http://gerrit.cloudera.org:8080/6746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

PS8, Line 124: local_date_time lt(temp, timezone);
> It says an exception can be thrown by the constructor, but I guess what's i
Yes, exactly. According to the documentation 
local_date_time(posix_time::ptime, timezone_ptr) constructor won't throw an 
exception.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS8, Line 601:  
> ... in 'scanner_status'.
Done


PS8, Line 604:  const TimestampValue* 
> please pass this as a 'Status* scanner_status', which is what we usually do
Done


http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

PS8, Line 124: local_date_time lt(temp, timezone);
> can this throw an exception?
I couldn't find any indication of an exception in the code or in the 
documentation:
http://www.boost.org/doc/libs/1_57_0/doc/html/date_time/local_time.html#date_time.local_time.local_date_time


PS8, Line 125: *this = lt.local_time();
> or that?
Same here.


http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS8, Line 122: tabl
> it's the same for other FileSystem interfaced things, like S3. So maybe rem
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#9).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-27 Thread Attila Jeges (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5939

to look at the new patch set (#9).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-5257: test seq writer hive compatibility fails on local file system build

2017-04-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6746

Change subject: IMPALA-5257: test_seq_writer_hive_compatibility fails on local 
file system build
..

IMPALA-5257: test_seq_writer_hive_compatibility fails on local file
system build

TestTableWriters.test_seq_writer_hive_compatibility test introduced in
IMPALA-3079 had to be skipped for non-HDFS filesystems.

Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039
---
M tests/query_test/test_compressed_formats.py
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/6746/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7dbe2529818865f871b66d78642ed956d1ee039
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 8:

> (1 comment)

Yes, I think we should do that.

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 123:   ToPtime();
> I seem to remember there also being a lock in the boost code related to loc
I've investigated the history of UtcToLocal():

- UtcToLocal() was first introduced in IMPALA-1658 (87b9fac2adda). This version 
of UtcToLocal() already used localtime_r() for UTC -> local time conversion. It 
also used couple boost functions to convert back and forth between 'ptime' and 
'struct tm' types (to_tm(), ptime_from_tm(), nanoseconds())

- The first version of UtcToLocal() made querying tables with timestamps a lot 
slower (up to 30x times slower). The root cause was a global lock in 
localtime_r(). This is discussed in IMPALA-3316 and IMPALA-2125. Both JIRAs are 
still open.

- UtcToLocal() was made 50% faster by 4ddce7f97880. This change got rid of the 
boost functions (to_tm(), ptime_from_tm(), nanoseconds()) and added the comment 
at L77. The change kept localtime_r() though, so it didn't do anything to solve 
the global lock problem.

I wrote a simple benchmark program to confirm that UtcToLocal()'s performance 
degrades if the number of threads running it in parallel is increased. The 
benchmark program also confirms that FromUtc() does not have this problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-04-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 9: Code-Review+1

> Carry +2 forward.

Core tests succeeded: 
http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/5500/

-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-04-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#9).

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

This change fixes the following issues in the Sequence File Writer:
1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
   were broken. As a result, Impala created corrupt uncompressed
   sequence files.

2. KEY_CLASS_NAME was missing from the sequence file header. As a
   result, Hive could not read back uncompressed sequence files
   created by Impala.

3. Impala created record-compressed sequence files with empty keys
   block. As a result, Hive could not read back record-compressed
   sequence files created by Impala.

4. Impala created block-compressed files with:
   - empty key-lengths block
   - empty keys block
   - empty value-lengths block
   This resulted in invalid block-compressed sequence files that Hive could
   not read back.

5. In some cases the wrong Record-compression flag was written to the
   sequence file header. As a result, Hive could not read back record-
   compressed sequence files created by Impala.

6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
   beginning of blocks in block-compressed sequence files. Hive could
   not read these files back.

7. The calculation of block sizes in SnappyBlockCompressor class was
   incorrect for odd-length buffers.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 500 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 806 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 8:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/5939/7//COMMIT_MSG
Commit Message:

PS7, Line 22: New HDFS tables created by Impala using CREATE TABLE and CREATE 
TABLE
: LIKE  will set the table property to UTC if the global flag
: --set_parque
> I don't follow this given the next two paragraphs. This isn't true for the 
Correct. Changed the commit message.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 305:   // parquet-mr. If conversion should not occur, this is set to an 
empty string. Otherwise
> from the code, it looks like this is set to the empty string if conversion 
Correct, added a comment to document the behavior.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 234: FLAGS_convert_legacy_hive_parquet_utc_timestamps
> given that parquet_mr_write_zone() will take precedence over the flag, I th
Done


PS7, Line 246: DCHECK
> nit: DCHECK_EQ()
Done


PS7, Line 609:  status = scanner_state->LogOrReturnError(msg);
> since this code is very performance critical, it would be best to order thi
Correct. Reordered the conditions in the if-elif-else statement


PS7, Line 611: status = status;
> these could all be UNLIKELY since we want to optimize for the non-error cas
Done


PS7, Line 625: LY(dst->HasDateAndTime()
> i.e. here too and below.
Done


PS7, Line 639: if (!SetTimestampConversionError(parent_->scan_node_, 
parent_->state_,
 : parent_->parse_status_, src, 
parent_->scan_node_->parquet_mr_write_zone())) {
 :   return false;
 : }
 :   }
 : } else {
 :   
DCHECK(parent_->scan_node_->parquet_mr_write_zone().empty());
 :   DCH
> seems worth factoring this block (which occurs 3 times) into a non-inlined 
Done. Added a new non-class member function and factored out the block into it.

(Adding a new member function instead to ScalarColumnReader template class is 
probably not a good idea since it is only needed in the 
ScalarColumnReader instance)


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 193: 
> could you add a comment explaining these two lookup techniques?
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

PS7, Line 45: hen IsTimestampDependen
> how about drawing the connection with the next function directly:
Done


Line 49: 
> comment this.
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 101: Status s("Failed to convert timestamp to local time.");
> this might produce a ton of logging.  instead, why not return the Status ob
Done


Line 109: bool TimestampValue::FromUtc(const std::string& timezone_str) {
> how do we guarantee that this is true? did the caller already check that?
Yes, it did.


PS7, Line 111: _zone_ptr timezo
> UNLIKELY (since perf critical)
Done


Line 123:   ToPtime();
> see the comment at line 77. i think we were avoiding this for perf reasons.
I've added a micro-benchmark to compare the two. UtcToLocal() is about 2.5 
times faster than FromUtc() when input data size is 1,000 and 1.5 times faster 
when input data size is 10,000.

Rewriting FromUtc() not to use boost function calls is not that 
straightforward. The above version of UtcToLocal() uses localtime_r() standard 
C function to perform the UTC->local timezone conversion. FromUtc() on the 
other hand should convert timestamps to a given time zone. I don't think there 
is a standard C function to do this.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS7, Line 197: 
> below you say "local time in the given timezone", which I guess is technica
Changed the comment below for 'FromUtc'


PS7, Line 203: 
> it's not clear that "convert" means that *this is modified in-place, so I t
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS7, Line 122: new
> is that always the case?
Fixed the description


PS7, Line 122: bles created "
 : "using CREATE TABLE and CREATE TABLE LIKE . You can 
find details in the "
 : "documentation.");
 : 
 : DEFINE_int64(max_result_cache_size, 1
> I'm not sure that this makes sense on it's own. I think you kind of have to
Removed that part



[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-24 Thread Attila Jeges (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5939

to look at the new patch set (#8).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 806 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 665 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 113:   "Invalid time zone in the '%s' table property: %s",
> double 'the'
Done


http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 665: 
> Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96Wri
Done


Line 1882: "'  '", "URI path cannot be empty.");
> easier to read single quotes
Done


Line 1904:   type == PrimitiveType.VARCHAR) {
> easier to read single quotes
Done


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

Line 105: parquet_path = get_fs_path(
> What does "fn" stand for? I'm thinking "file name", but this is not just a 
Renamed it to 'parquet_path'


Line 123:   ON i.id = h.id AND i.day = h.day  -- serves as a unique key
> easier to read with the alias 'i' next to the table
Done


Line 125:   (h.timestamp_col IS NULL) != (i.timestamp_col IS NULL)
> simplify the first two conditions with:
Done. Had to put round brackets around col IS NULL expressions to avoid 
analysis errors.


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py
File tests/query_test/test_parquet_timestamp_compatibility.py:

Line 78:   def test_invalid_parquet_mr_write_zone(self, vector, 
unique_database):
> test_invalid_parquet_mr_write_zone
Done


Line 118:   # tz_name conversion on the timestamp values.
> extra space after "triggers"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 665 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 6:

(2 comments)

Thanks for the review.

http://gerrit.cloudera.org:8080/#/c/6107/6/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

PS6, Line 247: DCHECK(val < -112);
> nit: DCHECK_LT(val, -112);
Done


PS6, Line 258: DCHECK(val > 127);
> nit: DCHECK_GT(val, 127);
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 657 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 657 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 71:   @pytest.mark.execute_serially
> Moved tests that use startup options to ./tests/custom_cluster/test_hive_pa
The rest of the tests were moved to 
./tests/query_test/test_parquet_timestamp_compatibility.py


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 657 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 5:

(26 comments)

Thanks for the review.

http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 305:   // parquet-mr.
> Comment that the FE guarantees that this is a valid timezone.
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 246:   DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) 
||
> simplify condition:
Done


Line 549:   /// Used to cache the timezone object corresponding to 
"parquet.mr.int96.write.zone"
> the "parquet.mr.int96.write.zone" table property
Done


Line 550:   /// table property to avoid repeated calls to 
'TimezoneDatabase::FindTimezone()'.
> no need to single-quote here
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 190:   // See if they specified a zone id
> Who is "they"? Suggest rephrasing
Done


Line 202:   return time_zone_ptr();
> return NULL?
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

Line 58:   // If true, all newly created Parquet tables will have the 
parquet.mr.int96.write.zone
> ... all newly created HDFS tables regardless of format ...
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 218:   9: optional string parquet_mr_write_zone;
> remove trailing semicolon for consistency
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 169:   private static void analyzeParquetMrWriteZone(Table table,
> These cases need tests in AnalyzeDDL
Done. I also changed the exception message for consistency.


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 112:   throw new AnalysisException("Invalid Time Zone: " + timezone);
> Mention that the timezone is in the table property 'parquet.mr.int96.write.
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 183: if 
(getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) {
> These cases need tests in AnalyzeDDL
Done


Line 186: "Only HDFS tables can use '%s' table property.",
> the '%s' table property
Done. I also changed the exception message for consistency.


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 54: import org.apache.impala.common.InternalException;
> not needed
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite):
> Test needs a comment. In particular, what is covered here and what is cover
I've added a comment that describes tests in this file.

What do you mean by "integration tests not part of Impala tests"? I don't think 
we have any integration tests upstream.


Line 26:   TEST_DB_NAME = 'timezone_test_db'
> Why not use the unique_database fixture? That's the best practice.
Switched to unique_database.


Line 39: self.client = impalad.service.create_beeswax_client()
> I think we already have a self.client from ImpalaTestSuite.
Removed creating a client from here.


Line 49: 
"/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet"
> We need the appropriate filesystem prefix here. This will break on local fs
Done


Line 56: self.client.execute('USE ' + self.TEST_DB_NAME)
> Why? Better to avoid changing the session state
Removed it.


Line 59: self.client.execute('INVALIDATE METADATA')
> why?
Removed it, it's not needed.


Line 61:   def _set_impala_table_timezone(self, timezone):
> simplify to pass table name as param
Done


Line 71:   @pytest.mark.execute_serially
> How long does this test take? I'm thinking we should only have minimal test
Moved tests that use startup options to 
./tests/custom_cluster/test_hive_parquet_timestamp_conversion.py


Line 81: statement = '''ALTER TABLE hive_table
> Should be an analyzer test in AnalyzeDDL
Removed it from here.


Line 104: select_from_impala_table = '''SELECT timestamp_col FROM 
impala_table WHERE id = 1'''
> Can you 

[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings

2017-04-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-3381: Support AM/PM marker in date and time format 
strings
..

IMPALA-3381: Support AM/PM marker in date and time format strings

This change adds AM/PM marker to format strings used in 'to_timestamp'
'unix_timestamp' and 'from_unixtime' functions.

It uses 'a' for the AM/PM marker following the Hive impelentation (
which follows Java 'SimpleDateFormat' patterns). Similarly to Hive,
the 'a' pattern letter can be repeated any number of times in the
format string without affecting the corresponding presentation.

For example:
> select from_unixtime(
>   unix_timestamp('2017-03-31 11:19:23 PM', '-MM-dd HH:mm:ss a'),
>   '-MM-dd HH:mm:ss aaa');
2017-03-31 11:19:23 PM

Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 133 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6523/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-31 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc
File be/src/exec/hdfs-sequence-table-writer.cc:

PS4, Line 179: 
 : 
> Thanks for listing them out. Please also put this list in the commit messag
Done


http://gerrit.cloudera.org:8080/#/c/6107/5/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 214: // Returns size of the encoded long value, including the 1 byte for 
length for val < -112
> for val < -112 or val > 127.
Done


PS5, Line 228: ollow
> nit: long line
Done


PS5, Line 245: um_bytes, 9);
> nit: help to comment why it's 119 here (which is different from 120 in the 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-31 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6).

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

This change fixes the following issues in the Sequence File Writer:
1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
   were broken. As a result, Impala could not read back uncompressed
   sequence files created by Impala.

2. KEY_CLASS_NAME was missing from the sequence file header. As a
   result, Hive could not read back uncompressed sequence files
   created by Impala.

3. Impala created record-compressed sequence files with empty keys
   block. As a result, Hive could not read back record-compressed
   sequence files created by Impala.

4. Impala created block-compressed files with:
   - empty key-lengths block
   - empty keys block
   - empty value-lengths block
   This resulted in invalid block-compressed sequence files that Hive could
   not read back.

5. In some cases the wrong Record-compression flag was written to the
   sequence file header. As a result, Hive could not read back record-
   compressed sequence files created by Impala.

6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
   beginning of blocks in block-compressed sequence files. Hive could
   not read these files back.

7. The calculation of block sizes in SnappyBlockCompressor class was
   incorrect for odd-length buffers.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 494 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-31 Thread Attila Jeges (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6107

to look at the new patch set (#6).

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

This change fixes the following issues in the Sequence File Writer:
1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
   were broken. As a result, Impala could not read back uncompressed
   sequence files created by Impala.

2. KEY_CLASS_NAME was missing from the sequence file header. As a
   result, Hive could not read back uncompressed sequence files
   created by Impala.

3. Impala created record-compressed sequence files with empty keys
   block. As a result, Hive could not read back record-compressed
   sequence files created by Impala.

4. Impala created block-compressed files with:
   - empty key-lengths block
   - empty keys block
   - empty value-lengths block
   This resulted in invalid block-compressed sequence files that Hive could
   not read back.

5. In some cases the wrong Record-compression flag was written to the
   sequence file header. As a result, Hive could not read back record-
   compressed sequence files created by Impala.

6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
   beginning of blocks in block-compressed sequence files. Hive could
   not read these files back.

7. The calculation of block sizes in SnappyBlockCompressor class was
   incorrect for odd-length buffers.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 494 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3381: Support AM/PM marker in date and time format strings

2017-03-31 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6523

Change subject: IMPALA-3381: Support AM/PM marker in date and time format 
strings
..

IMPALA-3381: Support AM/PM marker in date and time format strings

This change adds AM/PM marker to format strings used in 'to_timestamp'
'unix_timestamp' and 'from_unixtime' functions.

It uses 'a' for the AM/PM marker following the Hive impelentation (
which follows Java 'SimpleDateFormat' patterns). Similarly to Hive,
the 'a' pattern letter can be repeated any number of times in the
format string without affecting the corresponding presentation.

For example:
> select from_unixtime(
>   unix_timestamp('2017-03-31 11:19:23 PM', '-MM-dd HH:mm:ss a'),
>   '-MM-dd HH:mm:ss aaa');
2017-03-31 11:19:23 PM

Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 137 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6523/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I99794a3e152f1712c6c469bb266d23a81d19ca34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-03-23 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 304:   // Time zone for adjusting timestamp values read from Parquet files 
written by
> single line
That would make the line 92 characters long.


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 246:   DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) 
||
> Even better, we can check this at the scan node level, e.g., in HdfsScanNod
Simplified the code. The timezone check is done in the FE, the BE is guaranteed 
to have a valid timezone string.


Line 653:   }
> As mentioned above, we should terminate the query early when in such a bad 
Added DCHECK(false)


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 194:   for (const string& tz_region: tz_region_list_) {
> No need to use auto here. const string& is clearer
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

Line 40:   /// If 'tz' is not found in the database, nullptr is returned.
> null pointer -> nullptr
Done


Line 46:   /// If 'tz' is not found in the database, nullptr is returned.
> null pointer -> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 199:   /// conversion was successfull and false otherwise. If conversion 
failed *this is set to
> Comment on what the state of *this is when false is returned.
Done


Line 202: 
> Comment on what the state of *this is when false is returned.
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 357:   const char *tz = env->GetStringUTFChars(timezone, NULL);
> const char*
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 122: "the parquet.mr.int96.write.zone table property to UTC for new 
tables. This changes "
> consistent placing of space (at line end or at line beginning)
Done


http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 67: analyzeJoin(analyzer);
> If we do the validation here, then there is no need to check in each HdfsSc
Done


http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 184:   if (getFileFormat() == THdfsFileFormat.KUDU) {
> You can check getFileFormat() here. The only non-HDFS formats we have are H
Done


Line 189:   String timezone = 
getTblProperties().get(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE);
> We should only do this for HDFS tables right?
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

Line 87:   // Returns true if timezone String is valid according to the BE 
timezone database, false
> valid according to the BE timezone database
Done


http://gerrit.cloudera.org:8080/#/c/5939/4/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 1: # Licensed to the Apache Software Foundation (ASF) under one
> Use new Apache license header. Do we still have the old header in some exis
My bad, I created this test from an old test script.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-03-23 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_parquet_timestamp_compatibility.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
24 files changed, 569 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-22 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5).

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

Before the fix, sequence file writer produced corrupt files in some
cases. Steps to reproduce:
  SET ALLOW_UNSUPPORTED_FORMATS=1;

  create table store_sales_seq_snap like tpcds_parquet.store_sales
  stored as SEQUENCEFILE;

  insert into store_sales_seq_snap partition(ss_sold_date_sk)
  select * from tpcds_parquet.store_sales
  where ss_sold_date_sk between 2450816 and 2451200;

The insert statement produces a corrupt file that cannot be read back.

This change fixes:
- The implementation of zero-compressed encoding in ReadWriteUtil
  class.
- The calculation of block sizes in SnappyBlockCompressor class.
- Creating record/block compressed sequence files in
  HdfsSequenceTableWriter class.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 484 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-22 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc
File be/src/exec/hdfs-sequence-table-writer.cc:

PS4, Line 179: 
 : 
> Is this the reason the old sequence file writer created corrupted files ?
The old sequence file writer had multiple issues:

1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were 
broken. As a result, Impala could not read back uncompressed sequence files 
created by Impala (see be/src/exec/read-write-util.h).

2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive 
could not read back uncompressed sequence files created by Impala (see 
be/src/exec/hdfs-sequence-table-writer.cc).

3. Keys were missing from record-compressed sequence files. Hive could not read 
back record-compressed sequence files created by Impala (see 
be/src/exec/hdfs-sequence-table-writer.cc).

4. In some cases the wrong Record-compression flag was written to the sequence 
file header. As a result, Hive could not read back record-compressed sequence 
files created by Impala (see be/src/exec/hdfs-sequence-table-writer.cc).

5.Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of 
blocks in block-compressed sequence files. Hive could not read these files back.

6. Impala created block-compressed files with:
- empty key-lengths block (L176)
- empty keys block (L177)
- empty value-lengths block (L180)

This resulted in invalid block-compressed sequence files that Hive could not 
read back (see HdfsSequenceTableWriter::WriteCompressedBlock()).


PS4, Line 139: _cast(KEY_CLASS_NAME));
> nit: indent 4. Same below.
Done


Line 191:   record.WriteBytes(output_length, output);
> It seems a bit unfortunate that we don't free the temp buffer (i.e. output)
Added FreeAll to the end of the 'Flush()' function.


PS4, Line 193:   // Output compressed keys block-size & compressed keys block.
 :   // The keys block contains "\0\0\0\0" byte sequence as a key 
for each row (this is what
 :   // Hive does).
> Does not writing key-lengths block and key block prevent the file from bein
Yes, Hive failed with an exception when I tried that.


http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.h
File be/src/exec/hdfs-sequence-table-writer.h:

Line 29: 
> Would be good to add the details of the Sequence file's layout as top level
Done


http://gerrit.cloudera.org:8080/#/c/6107/3/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 230: // For more information, see the documentation for 
'WritableUtils.writeVLong()' method:
> DCHECK_GE(num_bytes, 2);
Done


PS3, Line 233: nt64_t num_b
> May also want to state that the source of this behavior.
Done


http://gerrit.cloudera.org:8080/#/c/6107/3/be/src/util/compress.cc
File be/src/util/compress.cc:

Line 248: outp += size;
> DCHECK_LE(outp - out_buffer_,  length);
Done


http://gerrit.cloudera.org:8080/#/c/6107/3/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
File testdata/workloads/functional-query/queries/QueryTest/seq-writer.test:

Line 212: stored as SEQUENCEFILE;
> May be helpful to also add a test for writing empty file:
I tried this and it doesn't write an empty file. It doesn't create a file at 
all. Probably there's no easy way to force the sequence file writer to create 
an empty-file.


http://gerrit.cloudera.org:8080/#/c/6107/3/tests/query_test/test_compressed_formats.py
File tests/query_test/test_compressed_formats.py:

PS3, Line 170:   # Read it back in Impala
 :   output = self.client.execute('select count(*) from %s' % 
table_name)
 :   assert '16541' == output.get_data()
 :   # Read it back in Hive
 :   output = self.run_stmt_in_hive('select count(*) from %s' % 
table_name)
 :   assert '16541' == output.split('\n')[1]
 : 
 :   def test_avro_writer(self, vector):
 : self.run_test_case('QueryTest/avro-wri
> Doesn't this duplicate the second test ? May help to test with empty file a
The 2nd test is for record-compressed sequence files while the 3rd is for 
block-compressed seq files.
Added tests for files greater than 4K and less than 4K. I could not figure out 
how to create an empty file.


-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 

[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-22 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5).

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

Before the fix, sequence file writer produced corrupt files in some
cases. Steps to reproduce:
  SET ALLOW_UNSUPPORTED_FORMATS=1;

  create table store_sales_seq_snap like tpcds_parquet.store_sales
  stored as SEQUENCEFILE;

  insert into store_sales_seq_snap partition(ss_sold_date_sk)
  select * from tpcds_parquet.store_sales
  where ss_sold_date_sk between 2450816 and 2451200;

The insert statement produces a corrupt file that cannot be read back.

This change fixes:
- The implementation of zero-compressed encoding in ReadWriteUtil
  class.
- The calculation of block sizes in SnappyBlockCompressor class.
- Creating record/block compressed sequence files in
  HdfsSequenceTableWriter class.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 484 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 


  1   2   >