[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

2017-11-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
..


Patch Set 3:

(3 comments)

> Based on my reading of "man readdir", it's not threadsafe. I think
 > the only usage here is if a user hits "http://impalad:.../; to see
 > the web UI. It's possible that there's a lock somewhere preventing
 > concurrent use, but given that this is already reasonably
 > expensive, I'd recommend using the reentrant readdir_r instead.
 >
 > I looked around, and it looks like C++17 and boost have filesystem
 > abstractions, but I think readdir() is simple enough.

I have replaced readdir() with readdir_r(), but the solution is only ok for 
Linux, and may cause problems on other Unix systems, because the size of 
dirent.d_name is not always fix.

The general solution in linux.die.net/man/3/readdir_r can be actually 
problematic, see womble.decadent.org.uk/readdir_r-advisory.html The scenario is 
not.

It is easy to allocate a buffer whose size will be surely enough in 
/proc/self/fd, but I am unsure about the ideal solution.
- Should I care about portability to other Posix variants?
- Is there a way to express "compilation error if other Posix variant than 
Linux"?
- Is there a maximum size for filenames? Most examples use 255 (+1 for \0), but 
I do not know if it is an official value or not.
- Is there a guideline for the maximum size to allocate on stack? Is ~255 ok?

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
> Please describe in this line what the change does, not what it should do, i
Done


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@11
PS2, Line 11: Posix
> nit: Posix is a name and should be capitalized.
Done


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@18
PS2, Line 18: way to know the "expected value" in advance, and the number of 
file desciptors can
> Couldn't we test that a reasonable minimum number of file descriptors is re
Done



--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:46:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

2017-11-20 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#6).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
..

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() 
leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is 
replaced
with POSIX API calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, 
so
it was unneccesery work to initialize it. It is removed, and only the number of 
file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is 
no
way to know the "expected value" in advance, and the number of file desciptors 
can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 35 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/6
--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

2017-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h@61
PS5, Line 61: /// Below are some of the Process status information that will be 
read
> nit: long line. Please have a look at how to configure your text editor to
Done


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@36
PS5, Line 36: using boost::algorithm::is_any_of;
> nit: sort alphabetically?
Done


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@157
PS5, Line 157: // readdir() is not thread-safe according to its man page, 
but in glibc
> Can you include a reference to the source (e.g. https://www.gnu.org/softwar
Done


http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc@161
PS6, Line 161:   if(dir_entry->d_name[0] != '.') ++fd_count; // . and .. do 
not count
www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html 
states:
"Portability Note: On some systems readdir may not return entries for . and ..,"
Because of this, the filename must be checked to return the exact number of 
descriptors. As every filename in fd directory is a number, it is enough to 
check the first character.



--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 20 Nov 2017 18:51:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 8:

> Uploaded patch set 7.
Oops, there was a rebase... to see the changes that fix CHAR type handling, see 
diff between patch 6 and 8.

Tim, can you rerun the tests for me?


--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Nov 2017 00:04:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2017-11-16 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@9
PS3, Line 9: Adding support for "clustered", "noclustered", "shuffle" and 
"noshuffle"
> How about "This change adds support..."
Done


http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@12
PS3, Line 12: example:
> Capitalize
Done


http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@19
PS3, Line 19: Sort by partition columns before insert to make the insert more 
efficient.
> Mention where exactly the sort happens (locally vs distributed).
Done


http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@26
PS3, Line 26: exchenge
> spelling
Done


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1054
PS3, Line 1054: RESULT = new CreateTableAsSelectStmt(new 
CreateTableStmt(tbl_def),
> nit: Can you wrap the lines at 90 chars?
Done


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1167
PS3, Line 1167: tbl_def_without_col_defs ::=
> Does that now also allow hints like "CREATE /*+ clustered */ TABLE FOO LIKE
It is also accepted in that case, but has no effect - should I disallow it?

I think that it would be the best to enable hints in several cases (even if no 
hint is accepted there currently), and check the hints in normal java code and 
warn if one is not valid for a given statement. I am not sure though.


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1168
PS3, Line 1168:   KW_CREATE external_val:external opt_plan_hints:hints KW_TABLE 
if_not_exists_val:if_not_exists
> long line
Done


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@155
PS3, Line 155:   TableDef(TableName tableName, boolean isExternal, boolean 
ifNotExists, List hints) {
> long line
Done


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1633
PS3, Line 1633: Only
> What does "only" mean here?
This was copy pasted - I guess it means "not error, just warning".

I did not want to deviate too much from the original 
(AnalyzeStmtsTest.TestInsertHints()), to make it easier to merge the two if I 
find a good way to do it.


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1647
PS3, Line 1647:   // HBase tables cannot be tested, as currently Impala 
cannot create HBase tables.
> It's weird that this explanation occurs in the middle of the function. Can
This is also copy paste legacy - AnalyzeStmtsTest.TestInsertHints() had an 
HBase test here.


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1653
PS3, Line 1653:
> Why is there a newline?
Also copy paste legacy.


http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1677
PS3, Line 1677: "select * from functional.alltypes");
> Please also add tests that have additional hints in the select clause.
Done


http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207
PS3, Line 207: # IMPALA-4167: clustered CTAS into partitioned table adds sort 
node.
> Can you add tests for noclustered and for sort by()?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 17 Nov 2017 00:09:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

2017-11-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Philip Zeyliger, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#5).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
..

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() 
leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is 
replaced
with POSIX API calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, 
so
it was unneccesery work to initialize it. It is removed, and only the number of 
file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is 
no
way to know the "expected value" in advance, and the number of file desciptors 
can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 32 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/5
--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

2017-11-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11
PS4, Line 11: api
> nit: acronyms are usually all caps
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77
PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd.
> nit: /proc/self/fd
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155
PS4, Line 155: does
> nit: do not count.
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156
PS4, Line 156: self
> nit: either use getpid() here to be consistent with the rest of the file, o
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159
PS4, Line 159: dir
> Can you try to think of better variable names? You have a DIR called d, and
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160
PS4, Line 160: while ((dir = readdir(d)) != nullptr) ++fd_count;
> Please add a brief comment why your usage of readdir is thread-safe here.
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163
PS4, Line 163: std::
> You shouldn't need std:: here
Done



--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Sat, 18 Nov 2017 00:44:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2017-12-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8771/2/docs/topics/impala_shell_options.xml
File docs/topics/impala_shell_options.xml:

http://gerrit.cloudera.org:8080/#/c/8771/2/docs/topics/impala_shell_options.xml@284
PS2, Line 284:   
--query_option="option=value"
Maybe the short form -Q should be also mentioned here - as I see, both forms 
are mentioned for other command line options.



--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Tue, 12 Dec 2017 00:44:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52
PS7, Line 52:
> Mention the supported substitutions in the methods below (and add comments
Done


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56
PS7, Line 56:  substituted
> Is there a reason we don't test changes to both? Do they never change at th
There is no statement that should change both.


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76
PS7, Line 76: ns the given queries
> Do you need to specify the class to access TestHelper?
Yes, adding the outer classes is necessary in this case,
h = TestHelper(...) would lead to "NameError: global name 'TestHelper' is not 
defined". It is also valid to use self.TestHelper(...) but I think that it is 
weird, so I prefer adding the class name.


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179
PS7, Line 179: # dynamic partition insert
> Can you move this method into the Helper class now?
Done - this change resulted in a bit less code but more indentation, so I am 
not sure that it is better this way.



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 04 May 2018 17:50:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-04 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#8).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 201 insertions(+), 161 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/8
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 4: Code-Review+1

(6 comments)

Few nits otherwise looks good to me. The LocalToUtc performance part is 
optional - as it does not affect other parts of the code, it can be easily done 
later when general structure is already accepted by other reviewers.

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@526
PS4, Line 526:   const string& tz_name = (start_lookup.abbr != nullptr) ? 
start_lookup.abbr :
 :   context->impl()->state()->local_time_zone()->name();
What is the goal of this logic? To print timezone abbreviations instead of the 
full names, or to distinguish between summer/winter time, or both? A comment 
would be nice, and maybe the logic could be moved to a TimestampValue member 
function like string GetTimezoneName(Timezone*).


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@93
PS4, Line 93: inline bool CheckIfDateOutOfRange(const cctz::civil_day& date) {
:   static const cctz::civil_day 
max_date(TimestampFunctions::MAX_YEAR, 12, 31);
:   static const cctz::civil_day 
min_date(TimestampFunctions::MIN_YEAR, 1, 1);
:   return date < min_date || date > max_date;
: }
This could be simpler and possibly faster by expecting a cctz::civil_second 
argument and check if 1400<=year<1.


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@128
PS4, Line 128:
cctz explains pretty well the handling of dst boundaries, maybe we could add a 
link to it, for example to 
https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/include/cctz/time_zone.h#L147


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@129
PS4, Line 129:   // In case of ambiguity invalidate TimestampValue.
 :   const cctz::time_zone::civil_lookup from_cl = 
local_tz->lookup(from_cs);
 :   if (UNLIKELY(from_cl.kind != 
cctz::time_zone::civil_lookup::UNIQUE)) {
 : SetToInvalidDateTime();
 :   } else {
 : cctz::time_point from_tp = from_cl.pre;
 :
 : // Convert 'from_tp' time_point to civil_second assuming 
'UTC' time-zone.
 : cctz::civil_second to_cs = cctz::convert(from_tp, 
TimezoneDatabase::GetUtcTimezone());
 :
 : // boost::gregorian::date() throws 
boost::gregorian::bad_year if year is not in the
 : // 1400.. range. Need to check validity before creating 
the date object.
 : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs 
{
I may be possible to get TimestampValue from cctz::time_zone::civil_lookup in a 
faster way - splitting from_tp to a day part (since a constant date) and the 
remainder seconds part is enough for us and should be faster then getting 
cctz::civil_second (which contains year/month/day split).

The code could look something like this:
int64 secs_since_base = from_tp - BASETIME_AS_CCTZ_SYS_SEC;
time_=sec_since_base%(24*60*60)+time_.fractional_seconds();
int32 days_since_base = sec_since_base/(24*60*60);
if(out_of_range(days_since_base)) SetToInvalidDateTime();
date_ = days_since_base - BASEDATE_AS_BOOST_GREG_DATE;


http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@146
PS4, Line 146:   time_ = boost::posix_time::time_duration(to_cs.hour(), 
to_cs.minute(), to_cs.second(),
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66
PS3, Line 66: iff
> "iff" stands for "if and only if".
Thanks for the explanation!



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 09 May 2018 11:47:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 9:

The change broke AuthorizationTest#TestDescribeTableResults. I fixed this in 
patch set 9.


--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 09 May 2018 19:11:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-09 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#9).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M tests/metadata/test_last_ddl_time_update.py
7 files changed, 202 insertions(+), 161 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/9
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6920: fix inconsistencies with scanner thread tokens

2018-04-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10186 )

Change subject: IMPALA-6920: fix inconsistencies with scanner thread tokens
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d31d72441aff7293759281d0248e641df43704
Gerrit-Change-Number: 10186
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Apr 2018 18:52:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6920: fix inconsistencies with scanner thread tokens

2018-04-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10186 )

Change subject: IMPALA-6920: fix inconsistencies with scanner thread tokens
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h
File be/src/runtime/thread-resource-mgr.h:

http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h@99
PS3, Line 99:
nit: extra space - several copy pasted comments contain double spaces, so this 
may be intentional


http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.h@189
PS3, Line 189: Set to NULL when unregistered.
Is there a reason behind this? Seems to be an object pool heritage, where 
Reset() used to reset all the members with the exception of parent_.


http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc
File be/src/runtime/thread-resource-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc@55
PS3, Line 55:   DCHECK(pools_.find(pool.get()) == pools_.end());
Is this DCHECK still useful after removing object caching?


http://gerrit.cloudera.org:8080/#/c/10186/3/be/src/runtime/thread-resource-mgr.cc@133
PS3, Line 133: not
nit: double not



--
To view, visit http://gerrit.cloudera.org:8080/10186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d31d72441aff7293759281d0248e641df43704
Gerrit-Change-Number: 10186
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Fri, 27 Apr 2018 17:17:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-10 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#10).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M tests/metadata/test_last_ddl_time_update.py
7 files changed, 203 insertions(+), 161 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/10
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 10: Code-Review+1

Sorry, some of the authorization test changes were missing from patch 9.
Carry +1


--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 10 May 2018 16:10:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/5/be/src/runtime/timestamp-value.cc@139
PS5, Line 139:   // In case the resulting 'time_point' is ambiguous, we have to 
invalidate
 :   // TimestampValue.
 :   // 'civil_lookup' members and the details of handling 
ambiguity are described at:
 :   // 
https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/
 :   // include/cctz/time_zone.h#L106
 :   if (UNLIKELY(from_cl.kind != 
cctz::time_zone::civil_lookup::UNIQUE)
I have investigated a bit about this:

- there is a Jira that complains about this behavior: 
https://issues.apache.org/jira/browse/IMPALA-3169

- Hive does not work like this, it returns a "valid" timestamp for 
repeated/skipped hours:

select
 to_utc_timestamp(cast("2011-03-13 02:15:00" as timestamp), 
"America/Los_Angeles"),
 to_utc_timestamp(cast("2011-11-06 01:15:00" as timestamp), 
"America/Los_Angeles")
result: 2011-03-13 10:15:00.0   2011-11-06 09:15:00.0

I think that we should do the same, at least  for repeated values. I can 
imagine several valid queries where this would be the correct behavior, for 
example when we filter for a time interval.

So I vote for solving IMPALA-3169 in this patch by choosing pre or post time in 
non UNIQUE cases too. If there are no test cases yet for skipped/repeated 
hours, then we should create some and expect the same results that Hive returns.



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 May 2018 14:04:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-11 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#11).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M tests/metadata/test_last_ddl_time_update.py
8 files changed, 220 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/11
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 10:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235
PS10, Line 1235:* TODO: the following paragraph seems at least partially 
obsolate
> Why not clean it up then? Point (1) is obsolete but point (2) is still vali
Done


http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244
PS10, Line 244: Long.toString(System.currentTimeMillis() / 1000));
> Doesn't the HMS set this in alter_table()?
It sets it if the property does not exist, but this would not work well for 
Kudu tables: after calling alter_table here, the table will not be loaded from 
HMS again (unlike in HDFS tables, where this is done to ensure HMS-Impala 
consistency). This means that if I would remove "transient_lastDdlTime" here, 
then HMS would calculate a valid value, but Impala catalogd would not know 
about this value.


http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785
PS10, Line 785:   msTbl.putToParameters("impala.lastComputeStatsTime",
> Create a constant for the table property in Table similar to what we have i
I have put the constant to HdfsTable, because all the other property keys 
reside their, even those that are used by Kudu tables too.


http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868
PS10, Line 2868: // HMS updates transient_lastDdlTime if it is removed.
> I understand what this comment says, but I don't understand it's relevance
This comment remained here by mistake.


http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2870
PS10, Line 2870: Long.toString(System.currentTimeMillis() / 1000));
> We're doing "System.currentTimeMillis() / 1000" in quite a few places, I su
I have created a bit different helper function.


http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74
PS10, Line 74: def run_test(self, query, expect_changed_ddl_time, 
expect_changed_stats_time):
> Passing a ";" delimited string is really weird. Why not pass a list of quer
The "invalidate metadata %(TBL)s; describe %(TBL)s" combo was used the check 
that in case of Kudu tables, lastDdlTime is not increased by reloading the 
table (it used to increase it), so an extra query is needed after INVALIDATE 
METADATA to load the table. The two can be separated (like for other init 
queries) but I felt that they belong together.

I replaced the string splitting with variable argument list - I can replace 
this with two separate calls if needed.


http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93
PS10, Line 93:   # Hive uses a seconds granularity on the last ddl time.
> Not just Hive - we do too. Just say "All times are stored in seconds" or so
Isn't it an HMS convention in this case? Or is there a reason behind not using 
higher precision timestamp?


http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@155
PS10, Line 155: h.expect_no_time_change("drop incremental stats %(TBL)s 
partition (j=1, s='2012')")
> use "drop stats" instead to wipe everything (including incremental stats)
Are you sure about this? I use DROP INCREMENTAL STATS on purpose to check +1 
statement's effect in the test suite.



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 

[Impala-ASF-CR] IMPALA-5706: Parallelise read I/O in sorter

2018-05-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9943 )

Change subject: IMPALA-5706: Parallelise read I/O in sorter
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1730
PS7, Line 1730: return (sorted_runs_.size() + 1) / 2;
If the goal is to minimize the number of "extra merges" per row, then it is 
optimal for the final merge to always merge as much runs as possible, so this 
line could return
sorted_runs_.size() - max_runs_per_intermediate_merge

An example when this would result in less merges:
max_runs_per_intermediate_merge: 3
number of runs: 7
(the numbers are the number of original runs merged into a run)

1 1 1 1 1 1 1

1 1 1 1 3 - current logic decides to merge (5+1/2)=3 runs

1 3 3 - ready for final merge after merging 6 runs

1 1 1 1 1 1 1

1 1 1 1 3 - new logic would decide to merge 5-3=2 runs

1 1 3 2 - ready for final merge after merging 5 runs

On the other hand, merging more runs in the final merge means that the buffers 
will be released later, so I am not completely sure that maximizing it is a 
good idea.



--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 May 2018 15:59:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@95
PS3, Line 95:   time_t unix_time;
:   if (UNLIKELY(!ts_value.UtcToUnixTime(_time))) return 
TimestampVal::null();
:   cctz::time_point from_tp = 
UnixSecondsToTimePoint(unix_time);
:
:   // Convert 'from_tp' time_point to civil_second assuming 
'timezone' time-zone.
:   cctz::civil_second to_cs = cctz::convert(from_tp, *timezone);
:
:   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs {
: const string& msg = Substitute(
: "Timestamp '$0' did not convert to a valid local time in 
timezone '$1'",
: ts_value.ToString(), tz_string_value.DebugString());
: context->AddWarning(msg.c_str());
: return TimestampVal::null();
:   }
:
:   // Note that 'to_cs' has second granularity. Since time-zone 
rules do not affect
:   // fractional seconds, the fractional second part of the 
returned TimestampVal should be
:   // equal to ts_value.time().fractional_seconds().
:   return CivilSecondToTimestampVal(to_cs, 
ts_value.time().fractional_seconds());
This logic is the same as TimestampValue::UtcToLocal(), plus warning if the 
resulting timestamp is not valid. As local time and generic timezone 
conversions are the same now, it would make sense to keep them at one place.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/exprs/timestamp-functions.cc@144
PS3, Line 144:   cctz::time_point from_tp = from_cl.pre;
 :
 :   // Convert 'from_tp' time_point to civil_second assuming 'UTC' 
time-zone.
 :   cctz::civil_second to_cs = cctz::convert(from_tp, 
TimezoneDatabase::GetUtcTimezone());
 :
 :   if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs {
 : const string& msg =
 : Substitute("Timestamp '$0' in timezone '$1' could not be 
converted to UTC",
 : ts_value.ToString(), tz_string_value.DebugString());
 : context->AddWarning(msg.c_str());
 : return TimestampVal::null();
 :   }
 :
 :   // Note that 'to_cs' has second granularity. Since time-zone 
rules do not affect
 :   // fractional seconds, the fractional second part of the 
returned TimestampVal should be
 :   // equal to ts_value.time().fractional_seconds().
 :   return CivilSecondToTimestampVal(to_cs, 
ts_value.time().fractional_seconds());
Similarly to my comment at line 113, this logic could be moved to a 
TimestampValue function. Removing these calculations from this file would mean 
that the helper functions (line 47-74) could be removed too.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/runtime/timestamp-value.cc@117
PS3, Line 117:   time_ = boost::posix_time::time_duration(to_cs.hour(), 
to_cs.minute(), to_cs.second(),
nit: long line


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@66
PS3, Line 66: iff
typo: if


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@70
PS3, Line 70: iff
typo: if


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@71
PS3, Line 71: path
Maybe writing "string" instead of "path" express better that no file system is 
accessed.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/filesystem-util.h@79
PS3, Line 79: path
Same as line 71.


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc
File be/src/util/time.cc:

http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@165
PS3, Line 165:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@168
PS3, Line 168:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/9986/3/be/src/util/time.cc@171
PS3, Line 171: 
nit: extra space



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila 

[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-15 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#12).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M tests/metadata/test_last_ddl_time_update.py
8 files changed, 228 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/12
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244
PS10, Line 244: 
msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS,
> Makes sense, please add a comment somewhere stating why we prefer to set th
Done


http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785
PS10, Line 785:   Table.updateTimestampProperty(msTbl, 
HdfsTable.TBL_PROP_LAST_COMPUTE_STATS_TIME);
> The Kudu properties are generaly in KuduTable. Which property in HdfsTable
Done - I thought that some of the properties defined in HdfsTable are used by 
Kudu tables too, but I looked through them and they are only used in HDFS 
tables.


http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74
PS10, Line 74: def run_test(self, queries, expect_changed_ddl_time, 
expect_changed_stats_time):
> The vast majority of cases only runs one query at once, and there is no fun
Done


http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93
PS10, Line 93:   time.sleep (1.1)
> I agree it's an HMS convention. Let's change the comment to state that then
I have already rewritten Hive to HMS in patch set 11 - does the comment still 
miss something?



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 May 2018 13:57:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 14:

>From the failing build:

21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.14
...
21:55:34 ] Creating an egg for /home/ubuntu/Impala/shell/ext-py/sqlparse-0.1.19
21:55:34 ] python: can't open file 'setup.py': [Errno 2] No such file or 
directory
21:55:34 ] Error in shell/make_shell_tarball.sh at line 96: python setup.py -q 
bdist_egg clean

There is a known issue with these log lines which should be resolved by doing a 
clean build.


--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 20 May 2018 12:46:44 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6131: Track time of last statistics update in metadata

2018-05-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10484


Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
7 files changed, 226 insertions(+), 173 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10484/1
--
To view, visit http://gerrit.cloudera.org:8080/10484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234
Gerrit-Change-Number: 10484
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR](2.x) IMPALA-6131: Track time of last statistics update in metadata

2018-05-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10484 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 1:

The commit on master ( https://gerrit.cloudera.org/#/c/10116/ ) changed 
AuthorizationTest.java to fix a broken test. This test is not present in 2.x, 
which made the changes in that file unnecessary + conflicting.


--
To view, visit http://gerrit.cloudera.org:8080/10484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234
Gerrit-Change-Number: 10484
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Wed, 23 May 2018 11:55:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 9:

> > > Uploaded patch set 9.
 > >
 > > Patch -set 9 contains the following changes:
 > > - Added a full timezone db to testdata/tzdb.
 > > - End-to-end tests and BE-tests were changed to use this timezone
 > > db. This was necessary because some timezone-tests were failing
 > on
 > > older jenkins workers that had an older tzdata package installed.
 >
 > It might be a good idea to store the timezone-db files in one .tar
 > file and extract them before running the tests. What do you think?

I agree, .taring or compressing the tz db would be much better, if it does not 
make the code too complicated. Having less file would make the review more 
readable, and would also make the tz db consume much less space on hdfs, as the 
many small files will be rounded up to hdfs block size.


-- 
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 25 May 2018 16:33:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-05-24 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10486 )

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10486/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10486/1//COMMIT_MSG@29
PS1, Line 29:
> I think you should add "Cherry-picks: not for 2.x." line.
Done


http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.h@55
PS1, Line 55: map
> const?
This map turned out to be unnecessary.


http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/10486/1/be/src/exprs/timezone_db.cc@68
PS1, Line 68: {"AEST", "Australia/Sydney"},
: {"CDT", "America/Chicago"},
: {"CEST", "CET"},
: {"EDT", "EST5EDT"},
: {"ICT", "Asia/Ho_Chi_Minh"},
: {"KST", "Asia/Seoul"},
: {"MDT", "MST7MDT"},
: {"PHT", "Asia/Manila"},
: {"PDT", "America/Los_Angeles"}
> Are these time-zone abbreviations accepted by Hive? If not, I think we shou
Done - I have checked TimezoneDatabase::TIMEZONE_DATABASE_STR, and as all the 
Java abbreviations are already there, I have removed this map.


http://gerrit.cloudera.org:8080/#/c/10486/1/testdata/data/timezoneverification.csv
File testdata/data/timezoneverification.csv:

http://gerrit.cloudera.org:8080/#/c/10486/1/testdata/data/timezoneverification.csv@716
PS1, Line 716: JST
> Please make sure hat all the Java-supported time-zone abbreviations are tes
I have checked, all of them are included here.



--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Thu, 24 May 2018 14:27:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-05-24 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10486 )

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..


Patch Set 1:

> Is this a breaking change? We normally hold on braking changes
 > until we bump the major version.
 > 
 > Is IMPALA-3307 going to go in on a minor release? Is it a breaking
 > change?

Yes, this and IMPALA-3307 are breaking changes. This change should remove all 
timezones that won't be supported after IMPALA-3307. IMPALA-3307 will introduce 
some changes related to timezones that were changed during history, but I 
consider those changes to be fixes rather than breaking changes.


--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Thu, 24 May 2018 14:31:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-05-24 Thread Csaba Ringhofer (Code Review)
Hello Jim Apple, Attila Jeges,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10486

to look at the new patch set (#3).

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..

IMPALA-7060: Restrict Impala to only support timezones that work in Hive

IANA timezone integration (IMPALA-3307) will drop the support for
many non-standard timezone aliases. As IMPALA-3307 is a very large
change, its release may be delayed - meanwhile, it would be better
to discourage Impala 3.x users from using timezone names that will
not be supported in the future. For this reason, this change drops
the support of non-standard aliases from the current boost based
implementation.

This is supposed to be a temporary solution till IMPALA-3307, so
I have put minimal effort into it.

Testing:
Timestamp aliases were largely untested, so only a few changes were
needed - these were copied from the review for IMPALA-3307,
gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests,
e.g. to check that using dropped aliases return NULL and lead to
warning, but I think that these kind of changes should be done first
in the review of IMPALA-3307, and synced here afterwards.

Cherry-picks: not for 2.x.

Change-Id: I90859398081bae4976af31b09b3121c198b6adac
---
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M testdata/data/timezoneverification.csv
3 files changed, 2 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/3
--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-05-24 Thread Csaba Ringhofer (Code Review)
Hello Jim Apple, Attila Jeges,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10486

to look at the new patch set (#2).

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..

IMPALA-7060: Restrict Impala to only support timezones that work in Hive

IANA timezone integration (IMPALA-3307) will drop the support for
many non-standard timezone aliases. As IMPALA-3307 is a very large
change, its release may be delayed - meanwhile, it would be better
to discourage Impala 3.x users from using timezone names that will
not be supported in the future. For this reason, this change drops
the support of non-standard aliases from the current boost based
implementation.

This is supposed to be a temporary solution till IMPALA-3307, so
I have put minimal effort into it.

Testing:
Timestamp aliases were largely untested, so only a few changes were
needed - these were copied from the review for IMPALA-3307,
gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests,
e.g. to check that using dropped aliases return NULL and lead to
warning, but I think that these kind of changes should be done first
in the review of IMPALA-3307, and synced here afterwards.

Change-Id: I90859398081bae4976af31b09b3121c198b6adac
---
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M testdata/data/timezoneverification.csv
3 files changed, 2 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/2
--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-18 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#13).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M tests/metadata/test_last_ddl_time_update.py
8 files changed, 228 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/13
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868
PS12, Line 2868: // catalogd would read it back after alter_table, but 
as this would not work for
> I don't think it's true that we'd always read it back after calling this fu
Do you have an example in mind when the table should not be reloaded? I looked 
at the callers of applyAlterTable(): loadTableMetadata() is called after 
alter*() + dropTableStats(), and createTable() only creates an "incomplete 
table", so the table will be reloaded before it is actually used. Because of 
this, my impression was that tables are always reloaded by design.



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 May 2018 15:43:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868
PS12, Line 2868: // this would make it necessary to reload the table 
after alter_table in order to
> I agree that today we will always reload the table metadata from the HMS an
Thanks for the explanation!

I have rewritten the comment, I hope that it is better now.



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 May 2018 17:04:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-05-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10486


Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..

IMPALA-7060: Restrict Impala to only support timezones that work in Hive

IANA timezone integration (IMPALA-3307) will drop the support for
many non-standard timezone aliases. As IMPALA-3307 is a very large
change, its release may be delayed - meanwhile, it would be better
to discourage Impala 3.x users from using timezone names that will
not be supported in the future. For this reason, this change drops
the support of non-standard aliases from the current boost based
implementation.

This is supposed to be a temporary solution till IMPALA-3307, so
I have put minimal effort into it. It should be much faster than
the previous solution for timezone aliases, while slightly slower
for canonical names.

Testing:
Timestamp aliases were largely untested, so only a few changes were
needed - these were copied from the review for IMPALA-3307,
gerrit.cloudera.org/#/c/9986/ It may be useful to add more tests,
e.g. to check that using dropped aliases return NULL and lead to
warning, but I think that these kind of changes should be done first
in the review of IMPALA-3307, and synced here afterwards.

Change-Id: I90859398081bae4976af31b09b3121c198b6adac
---
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
M testdata/data/timezoneverification.csv
4 files changed, 46 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/1
--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-06-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 14: Code-Review+2

(5 comments)

Thanks for thinking about Zip-Slip!
I have left a few optional comments about the usability of the interfaces.

http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/14//COMMIT_MSG@34
PS14, Line 34: - Introduces a new startup flag (--hdfs_zone_info_zip) to 
impalad to
The Zip slip safe zip-util could be also mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h
File be/src/util/filesystem-util.h:

http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@92
PS14, Line 92: Directory(const string& path, bool skip_hidden_entries = 
true);
I thought a bit about usability and I vote for removing this parameter and skip 
only "." and ".." - I can't imagine any use case when I would be interested in 
those.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/filesystem-util.h@109
PS14, Line 109: static Status GetEntryNames(const string& path,
I would prefer max_result_size to be the last parameter, and give it a default 
value of 0.


http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc
File be/src/util/zip-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/14/be/src/util/zip-util-test.cc@69
PS14, Line 69:   EXPECT_FALSE(filesystem::exists(dest_dir3));
I guess that this is only true if zip decoding failed at the start, and some 
files may be already decompressed before reaching an error in the zip. I am not 
sure what to do with this, probably nothing. It would be possible add some kind 
of cleanup logic to the java util, but I am not sure if this worth the effort.


http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java
File fe/src/main/java/org/apache/impala/util/ZipUtil.java:

http://gerrit.cloudera.org:8080/#/c/9986/14/fe/src/main/java/org/apache/impala/util/ZipUtil.java@45
PS14, Line 45: try (ZipFile zip = new ZipFile(params.archive_file)) {
I would move this block to a similar function with (String archiveFile, String 
destDir) parameters to make this util usable from Java too. This would be 
minimal extra effort and I think that it can be handy to have an easily usable 
Zip-Slip safe extract function.



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 08 Jun 2018 16:15:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10742 )

Change subject: IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD
..


Patch Set 2: Code-Review+2

The flakiness related to IMPALA-7130 is not fixed yet ( it is on review: 
https://gerrit.cloudera.org/#/c/10747/ ).

I just mentioned it to ensure you that it is not related to your change, so a 
new jenkins verify job may be successful.


--
To view, visit http://gerrit.cloudera.org:8080/10742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9452f7e44c7679f86a947cd20115c078757223d8
Gerrit-Change-Number: 10742
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Mon, 18 Jun 2018 20:02:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10747 )

Change subject: IMPALA-7181: Fix flaky test 
shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@629
PS1, Line 629:   # Verify that impala-shell tries to create a socket 
against the host:port
 :   # combination specified by -i when -b is not used
 :   impala_shell = Popen(shlex.split("%s %s" % (shell_cmd, 
args1, )), stdout=devnull,
 :  stderr=devnull)
 :   connection, client_address = s.accept()
 :   data = connection.recv(1024)
 :   assert expected_output in data
 :   impala_shell.kill()
 :   connection.close()
 :
 :   # Verify that impala-shell tries to create a socket 
against the host:port
 :   # combination specified by -i when -b is used
I would prefer if the duplicated logic in the two cases would be merged to a 
function somehow, but this is optional.


http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@645
PS1, Line 645:   expected_output = "PingImpalaService"
Already set in line 627, no need to set here again.


http://gerrit.cloudera.org:8080/#/c/10747/1/tests/shell/test_shell_commandline.py@649
PS1, Line 649: s.close()
I am a bit worried about exceptions - this could be moved to a finally block or 
the socket could be added to a with statement (see the python 2 version in 
https://stackoverflow.com/a/16772515 )



--
To view, visit http://gerrit.cloudera.org:8080/10747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44
Gerrit-Change-Number: 10747
Gerrit-PatchSet: 1
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 18 Jun 2018 18:18:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10747 )

Change subject: IMPALA-7181: Fix flaky test 
shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening
..


Patch Set 2:

(2 comments)

Some socketing concerns, sorry for not noticing them in the first run.

http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py@621
PS2, Line 621: accept
I think that this will wait forever by default if no one connects to it. 
setdefaulttimeout should be to set avoid this ("select" or something similar 
can be used too, but that will be more complex I guess).


http://gerrit.cloudera.org:8080/#/c/10747/2/tests/shell/test_shell_commandline.py@624
PS2, Line 624:   impala_shell.kill()
 :   connection.close()
These should be moved to a finally block to call them if there is an exception, 
for example from the assert.



--
To view, visit http://gerrit.cloudera.org:8080/10747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44
Gerrit-Change-Number: 10747
Gerrit-PatchSet: 2
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 18 Jun 2018 19:52:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10747 )

Change subject: IMPALA-7181: Fix flaky test 
shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py@626
PS3, Line 626: impala_shell.kill()
This can actually also throw an exception, if the process already exited for 
some reason.


http://gerrit.cloudera.org:8080/#/c/10747/3/tests/shell/test_shell_commandline.py@627
PS3, Line 627: connection.close()
connection should be None checked.



--
To view, visit http://gerrit.cloudera.org:8080/10747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44
Gerrit-Change-Number: 10747
Gerrit-PatchSet: 3
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Mon, 18 Jun 2018 20:59:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10742 )

Change subject: IMPALA-7169: Prevent HDFS from checkpointing trash until 3000 AD
..


Patch Set 2:

fyi, the build failure is related to 
https://issues.apache.org/jira/browse/IMPALA-7130

Vincent is working on the test to deflake it.


--
To view, visit http://gerrit.cloudera.org:8080/10742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9452f7e44c7679f86a947cd20115c078757223d8
Gerrit-Change-Number: 10742
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 18 Jun 2018 15:16:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7175: deflake check for failed impalad

2018-06-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10745 )

Change subject: IMPALA-7175: deflake check for failed impalad
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10745/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/10745/1/tests/query_test/test_udfs.py@301
PS1, Line 301:   def __num_impalads(self):
Other tests may also run to the same problem, so I think that it would be 
useful to move this logic to a function like get_responsive_impalads() in 
impala_cluster.py



--
To view, visit http://gerrit.cloudera.org:8080/10745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c6b398e43c6abb1df2b1783c26159137f14fa4
Gerrit-Change-Number: 10745
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Mon, 18 Jun 2018 16:30:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6835: Add table name and node id to Kudu scanner errors

2018-06-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10671 )

Change subject: IMPALA-6835: Add table name and node id to Kudu scanner errors
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0377fc8591738dc45092d228fcf292ddbb367825
Gerrit-Change-Number: 10671
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 13 Jun 2018 15:33:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-06-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 18: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/17/be/src/exprs/timezone_db.cc@367
PS17, Line 367: Status TimezoneDatabase::LoadZoneAliasesFromHdfs(
  : const string& hdfs_zone_alias_conf) {
nit: this could fit in one line



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Jun 2018 13:24:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6408: [DOCS] Add a missing info about SHUFFLE

2018-06-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10685 )

Change subject: IMPALA-6408: [DOCS] Add a missing info about SHUFFLE
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10685
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5738557354c384aab983f64722dde5944037aed7
Gerrit-Change-Number: 10685
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Tue, 12 Jun 2018 13:49:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fail cleanly when in process server can't bind

2018-06-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10726 )

Change subject: Fail cleanly when in process server can't bind
..


Patch Set 1: Code-Review+1

I thought a bit about not too intrusive ways to reduce the window where a port 
number is already decided but the port itself is unbound. A static/global port 
number->socket fd map could be created, and FindUnusedEphemeralPort() could add 
the fd to this map instead of closing it. Another static function like 
"ClosePortIfReserved()" could be called as close as possible to the place where 
the port is bound to socket by the actual server. This function would be a noop 
if the port is not in the map, so caller would not have to know if the port was 
"reserved" or not.


--
To view, visit http://gerrit.cloudera.org:8080/10726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I376a2aa559f4b5cf3b96fa3465520e9983ecec4b
Gerrit-Change-Number: 10726
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Fri, 15 Jun 2018 12:11:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7115: set a default THREAD RESERVATION LIMIT value

2018-06-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10628 )

Change subject: IMPALA-7115: set a default THREAD_RESERVATION_LIMIT value
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10628/1/testdata/workloads/functional-query/queries/QueryTest/thread-limits.test
File testdata/workloads/functional-query/queries/QueryTest/thread-limits.test:

http://gerrit.cloudera.org:8080/#/c/10628/1/testdata/workloads/functional-query/queries/QueryTest/thread-limits.test@109
PS1, Line 109: SELECT count(*) from functional.alltypes
When I am trying to create a plan with crazy amount of nodes, I use WITH + 
UNION to get exponential growth. The following query should lead to at least 
1000 fragments:

explain with
 c1 as (select count(*) from functional.alltypes),
 c2 as (select * from c1 union select * from c1),
 c4 as (select * from c2 union select * from c2),
 c8 as (select * from c4 union select * from c4),
 c16 as (select * from c8 union select * from c8),
 c32 as (select * from c16 union select * from c16),
 c64 as (select * from c32 union select * from c32),
 c128 as (select * from c64 union select * from c64),
 c256 as (select * from c128 union select * from c128),
 c512 as (select * from c256 union select * from c256)
select * from c512 union select * from c512;

I think that the resulting plan would not be the same as with the flat list of 
unions, but it would do the job while the test would be much more compact.



-- 
To view, visit http://gerrit.cloudera.org:8080/10628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31d3fa3f6305c360922649dba53a9026c9563384
Gerrit-Change-Number: 10628
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Comment-Date: Thu, 14 Jun 2018 14:42:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6835: Improve Kudu scanner error messages to include the table name and the plan node id

2018-06-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10671 )

Change subject: IMPALA-6835: Improve Kudu scanner error messages to include the 
table name and the plan node id
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6835: Improve Kudu scanner error messages to include the 
table name and the plan node id
nit: it would be nice to find a shorter title without loosing too much 
information - e.g. "Add table name and node id to Kudu scanner errors"


http://gerrit.cloudera.org:8080/#/c/10671/1//COMMIT_MSG@11
PS1, Line 11: TPlanNode id which made it inconveient to debug. This change add
nit: missing s


http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@151
PS1, Line 151:   Substitute("Unable to deserialize scan token for PlanNode 
'$0' with KuduTable '$1'",
 :scan_node_->id(), 
scan_node_->table_->name()));
This made me think about ways to make it less verbose - a function like "string 
KuduScanner::AppendInfo(const char* msg)" could be created to add the node id 
and table name to a message. It is up to you, but I think that it's worth to 
add +1 function to make the code a bit shorter and ensure consistent messages.


http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@167
PS1, Line 167: scan_node_->id()
I am not sure about this, but I would add table name for these errors too.


http://gerrit.cloudera.org:8080/#/c/10671/1/be/src/exec/kudu-scanner.cc@168
PS1, Line 168:   VLOG_ROW << "Starting KuduScanner with ReadMode=" << mode << " 
timeout=" <<
It may be useful do add the same info to logs.



--
To view, visit http://gerrit.cloudera.org:8080/10671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0377fc8591738dc45092d228fcf292ddbb367825
Gerrit-Change-Number: 10671
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Jun 2018 17:44:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10683


Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on 
dictionary encoding bit width
..

IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding 
bit width

Reading dictionary encoded Parquet data pages where the bit width is
larger than the encoded type's size (e.g. coding 8 bit TINYINT with
16 bit dictionary indices) led to DCHECK error in debug builds.
Impala does not create such parquet files (an N bit type can have
maximum 2^N distinct values, so N bit dictionary indices are enough
for a dictionary that contains every possible value), but the Parquet
standard does not forbid to do so.

These DCHECKs were probably introduced by a copy paste error (similar
checks exist in the non-dictionary encoded bit reader functions,
where they are valid).

Testing:
- a new test is added to check that these data pages can be decoded
  correctly

Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
---
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.inline.h
M testdata/data/README
A testdata/data/dict_encoding_with_large_bit_width.parquet
M tests/query_test/test_scanners.py
5 files changed, 21 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/10683/1
--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] [DOCS] Remove duplicate bullet point about null() in UDF data types

2018-06-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10614 )

Change subject: [DOCS] Remove duplicate bullet point about null() in UDF data 
types
..


Patch Set 1:

I have just bumped into this duplication. I thought that such a trivial change 
does not require a Jira, but I can create one for it if necessary.


--
To view, visit http://gerrit.cloudera.org:8080/10614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b
Gerrit-Change-Number: 10614
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Wed, 06 Jun 2018 16:12:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Remove duplicate bullet point about null() in UDF data types

2018-06-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10614


Change subject: [DOCS] Remove duplicate bullet point about null() in UDF data 
types
..

[DOCS] Remove duplicate bullet point about null() in UDF data types

The removed bullet point contained the same information as the
second bullet point in the list.

Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b
---
M docs/topics/impala_udf.xml
1 file changed, 0 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/10614/1
--
To view, visit http://gerrit.cloudera.org:8080/10614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib6e371bda95e1f8fb116e57505cc6f10096ddf3b
Gerrit-Change-Number: 10614
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-06-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 11:

(2 comments)

Thanks for adding zip support!
We should add some tests for zip_util, especially for error handling, which is 
an untested path at the moment if didn't miss something. I am ok with moving 
this (and dealing with my other comments) to a later commit.

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@198
PS11, Line 198: GetNextDirectoryEntry
This is subjective, but I do not like this interface too much. I would prefer 
to wrap dir_stream to a class/struct, or create a function like this: static 
STATUS ListDirEntries(string path, vector& result, int max_result_num = 
0). Both could be moved to util/filesystem_util.


http://gerrit.cloudera.org:8080/#/c/9986/11/be/src/exprs/timezone_db.cc@213
PS11, Line 213: readdir_r
There was a discussion about readdir_r() vs readdir() in 
https://gerrit.cloudera.org/#/c/8546/8 , and readdir() was preferred in the end.



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 06 Jun 2018 18:35:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

I do not see any more low hanging fruits for performance improvement. Some 
overhead could be removed by modifying CCTZ, but this is out of the scope of 
this change, so I created a follow up Jira:
IMPALA-7085:
Consider patching Google/CCTZ for Impala's need


--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 May 2018 16:20:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-06-01 Thread Csaba Ringhofer (Code Review)
Hello Jim Apple, Attila Jeges,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10486

to look at the new patch set (#4).

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..

IMPALA-7060: Restrict Impala to only support timezones that work in Hive

IANA timezone integration (IMPALA-3307) will drop the support for
many non-standard timezone aliases. As IMPALA-3307 is a very large
change, its release may be delayed - meanwhile, it would be better
to discourage Impala 3.x users from using timezone names that will
not be supported in the future. For this reason, this change drops
the support of non-standard aliases from the current boost based
implementation.

This is supposed to be a temporary solution till IMPALA-3307, so
I have put minimal effort into it.

Testing:
- a few timezones are removed from timezoneverification.csv
- added test for invalid timezone names

Cherry-picks: not for 2.x.

Change-Id: I90859398081bae4976af31b09b3121c198b6adac
---
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M testdata/data/timezoneverification.csv
M tests/query_test/test_timezones.py
4 files changed, 17 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/4
--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-06-04 Thread Csaba Ringhofer (Code Review)
Hello Jim Apple, Attila Jeges,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10486

to look at the new patch set (#5).

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..

IMPALA-7060: Restrict Impala to only support timezones that work in Hive

IANA timezone integration (IMPALA-3307) will drop the support for
many non-standard timezone aliases. As IMPALA-3307 is a very large
change, its release may be delayed - meanwhile, it would be better
to discourage Impala 3.x users from using timezone names that will
not be supported in the future. For this reason, this change drops
the support of non-standard aliases from the current boost based
implementation.

This is supposed to be a temporary solution till IMPALA-3307, so
I have put minimal effort into it.

Testing:
- a few timezones are removed from timezoneverification.csv
- added test for invalid timezone names

Cherry-picks: not for 2.x.

Change-Id: I90859398081bae4976af31b09b3121c198b6adac
---
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M testdata/data/timezoneverification.csv
M tests/query_test/test_timezones.py
4 files changed, 17 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/10486/5
--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-7060: Restrict Impala to only support timezones that work in Hive

2018-06-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10486 )

Change subject: IMPALA-7060: Restrict Impala to only support timezones that 
work in Hive
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10486/4/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/10486/4/be/src/exprs/timezone_db.cc@628
PS4, Line 628: \"SystemV/AST4\",\"AST\",\"Atlantic Standard 
Time\",\"\",\"\",\"-04:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/AST4ADT\",\"AST\",\"Atlantic Standard 
Time\",\"ADT\",\"Atlantic Daylight 
Time\",\"-04:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
 : \"SystemV/CST6\",\"CST\",\"Central Standard 
Time\",\"\",\"\",\"-06:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/CST6CDT\",\"CST\",\"Central Standard 
Time\",\"CDT\",\"Central Daylight 
Time\",\"-06:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
 : \"SystemV/EST5\",\"EST\",\"Eastern Standard 
Time\",\"\",\"\",\"-05:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/EST5EDT\",\"EST\",\"Eastern Standard 
Time\",\"EDT\",\"Eastern Daylight 
Time\",\"-05:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
 : \"SystemV/HST10\",\"HST\",\"Hawaii Standard 
Time\",\"\",\"\",\"-10:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/MST7\",\"MST\",\"Mountain Standard 
Time\",\"\",\"\",\"-07:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/MST7MDT\",\"MST\",\"Mountain Standard 
Time\",\"MDT\",\"Mountain Daylight 
Time\",\"-07:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
 : \"SystemV/PST8\",\"PST\",\"Pacific Standard 
Time\",\"\",\"\",\"-08:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/PST8PDT\",\"PST\",\"Pacific Standard 
Time\",\"PDT\",\"Pacific Daylight 
Time\",\"-08:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
 : \"SystemV/YST9\",\"AKST\",\"Alaska Standard 
Time\",\"\",\"\",\"-09:00:00\",\"+00:00:00\",\"\",\"\",\"\",\"\"\n\
 : \"SystemV/YST9YDT\",\"AKST\",\"Alaska Standard 
Time\",\"AKDT\",\"Alaska Daylight 
Time\",\"-09:00:00\",\"+01:00:00\",\"-1;0;4\",\"+02:00:00\",\"-1;0;10\",\"+02:00:00\"\n\
> I think these lines should be removed too.
Done


http://gerrit.cloudera.org:8080/#/c/10486/4/testdata/data/timezoneverification.csv
File testdata/data/timezoneverification.csv:

http://gerrit.cloudera.org:8080/#/c/10486/4/testdata/data/timezoneverification.csv@806
PS4, Line 806: SystemV/AST4,2015-06-20 15:00:00,2015-06-20 11:00:00
 : SystemV/AST4ADT,2015-06-20 15:00:00,2015-06-20 12:00:00
 : SystemV/AST4ADT,2014-12-20 15:00:00,2014-12-20 11:00:00
 : SystemV/CST6,2015-06-20 15:00:00,2015-06-20 09:00:00
 : SystemV/CST6CDT,2015-06-20 15:00:00,2015-06-20 10:00:00
 : SystemV/CST6CDT,2014-12-20 15:00:00,2014-12-20 09:00:00
 : SystemV/EST5,2015-06-20 15:00:00,2015-06-20 10:00:00
 : SystemV/EST5EDT,2015-06-20 15:00:00,2015-06-20 11:00:00
 : SystemV/EST5EDT,2014-12-20 15:00:00,2014-12-20 10:00:00
 : SystemV/HST10,2015-06-20 15:00:00,2015-06-20 05:00:00
 : SystemV/MST7,2015-06-20 15:00:00,2015-06-20 08:00:00
 : SystemV/MST7MDT,2015-06-20 15:00:00,2015-06-20 09:00:00
 : SystemV/MST7MDT,2014-12-20 15:00:00,2014-12-20 08:00:00
 : SystemV/PST8,2015-06-20 15:00:00,2015-06-20 07:00:00
 : SystemV/PST8PDT,2015-06-20 15:00:00,2015-06-20 08:00:00
 : SystemV/PST8PDT,2014-12-20 15:00:00,2014-12-20 07:00:00
 : SystemV/YST9,2015-06-20 15:00:00,2015-06-20 06:00:00
 : SystemV/YST9YDT,2015-06-20 15:00:00,2015-06-20 07:00:00
 : SystemV/YST9YDT,2014-12-20 15:00:00,2014-12-20 06:00:00
> These should be removed too, see my previous comment.
Done - note that these tests were not removed from 
https://gerrit.cloudera.org/#/c/9986/10/testdata/data/timezoneverification.csv



--
To view, visit http://gerrit.cloudera.org:8080/10486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90859398081bae4976af31b09b3121c198b6adac
Gerrit-Change-Number: 10486
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Mon, 04 Jun 2018 17:58:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

2018-06-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for 
common data errors
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
: will contain a new line for every occurrence
> seems like we could change that generally.
I was hesitant to go that way, because messages with the same error code but 
different parameters can contain useful information (for example different 
column names).


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> Right, but I don't think we do much LogError() from that (other than from i
I agree that  this logic would be mainly useful for scanners.  KuduSink has a 
per row error though (TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION).

I thought about adding these functions to scanner code, but I did not find a 
suitable class - HdfsScanner would be a good candidate, but it does not have 
common ancestors with KuduScanner, and both classes should be able to use this 
logic. Do you have a good place in mind?

The primary motivation is to avoid massive logging with this specific message. 
Avoiding locking came up as a secondary benefit, as it seemed trivial to 
implement it with the new interface.



--
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Jun 2018 14:45:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

2018-06-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for 
common data errors
..


Patch Set 1:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
: will contain a new line for every occurrence
> But isn't that the point of the change (as you state below).  I guess it wo
The log lines to eliminate are the ones that report the exact same information. 
These would be merged to max two lines:

1. one at the first occurrence of the error
2. one when the error collector is flushed - this would contain the number of 
errors since 1.

This logic is probably more complicated then it should be, but it ensures that 
minimal information is lost.

It would be simpler to avoid the flushing logic by logging only the first 
errors, but as RuntimeState reports the number of occurrences per error code, 
some way is needed to increment the counters.


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> As the class is written today, it's not really tied to RuntimeState. You co
My problem with changing RuntimeState::LogError() to log only the first error 
for a code is that it would completely hide if an error occurs in more than one 
files/columns. Having one LogCollector per ColumnReader means that the error in 
one column cannot hide the one in another.

An alternate approach would be to keep a LogCollector per HdfsScanner, and keep 
the last message, and log only if the message was changed. This would simplify 
the interface and work well with column readers.

Another possibility would be to create a class to collect specially data 
errors. Its LogError() function could expect table/file/column parameters, and 
use them as the key of a map and count errors separately.



--
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Jun 2018 14:14:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7190: Remove unsupported format writer support

2018-06-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10823 )

Change subject: IMPALA-7190: Remove unsupported format writer support
..


Patch Set 1: Code-Review+1

(2 comments)

Some minor comments for tests, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py
File tests/query_test/test_compressed_formats.py:

http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py@150
PS1, Line 150:   def test_unsupported_writers(self, vector):
I would prefer to use unique_database, and remove the drops from 
unsupported-writers.test


http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py@a404
PS1, Line 404:
This line could be left here, but check that "... not in result.stduot". This 
would make it easier to restore the original test when there will be new 
deprecated query options.



--
To view, visit http://gerrit.cloudera.org:8080/10823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Gerrit-Change-Number: 10823
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 27 Jun 2018 13:13:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Update to the workaround for IMPALA-3316

2018-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10800 )

Change subject: [DOCS] Update to the workaround for IMPALA-3316
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10800/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/10800/1/docs/topics/impala_known_issues.xml@192
PS1, Line 192: SSS
It does not have to be millisecond precision - the date can be without 
fractional part or with 1-9 digit fractional part.



--
To view, visit http://gerrit.cloudera.org:8080/10800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If45da5d24dd3bc5f649d95b5bc104047420dbea1
Gerrit-Change-Number: 10800
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Mon, 25 Jun 2018 17:36:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Update to the workaround for IMPALA-3316

2018-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10800 )

Change subject: [DOCS] Update to the workaround for IMPALA-3316
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If45da5d24dd3bc5f649d95b5bc104047420dbea1
Gerrit-Change-Number: 10800
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Mon, 25 Jun 2018 18:19:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7181: Fix flaky test shell/test shell commandline.py::TestImpalaShell::test socket opening

2018-06-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10747 )

Change subject: IMPALA-7181: Fix flaky test 
shell/test_shell_commandline.py::TestImpalaShell::test_socket_opening
..


Patch Set 6: Code-Review+2

It turned out that I do not have an account on Jenkins yet, so I can only run 
DRY_RUN=true at the moment.


--
To view, visit http://gerrit.cloudera.org:8080/10747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd64632ded936d49fc404bcac75588dd7886be44
Gerrit-Change-Number: 10747
Gerrit-PatchSet: 6
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 19 Jun 2018 13:18:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#6).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/6
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#5).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/5
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#7).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/7
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10233 )

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326
PS2, Line 326: LIKELY
LIKELY seems to be strangely placed now - both conditions should be included in 
the likely branch, or maybe only the new condition, as current_value_ == value 
can be often false, for example in dict encoded pages.


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679
PS2, Line 679: int32_t num_repeats = NextNumRepeats();
DictDecoder::DecodeNextValue also calls NextNumLiterals() and 
NextNumRepeats(), and expects uint32_t - can you do some cleanup there?


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681
PS2, Line 681:uint32_t num_repeats_to_set =
 :std::min(num_repeats, num_values_to_consume 
- num_consumed);
We could change this to signed too and remove the template parameter from 
std::min, as both arguments and the result would be signed.


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694
PS2, Line 694: uint32_t num_literals_to_set =
 : std::min(num_literals, num_values_to_consume - 
num_consumed);
Same as in line 681.



--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Wed, 02 May 2018 14:54:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10233 )

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 18:59:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-04-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 4:

(11 comments)

I do not consider this change to be complete, as there are some open questions:
- At this point every kind of COMPUTE STATS update impala.lastComputeStatsTime, 
including INCREMENTAL, TABLESAMPLE, and calls that only COMPUTE STATS for 
specific columns. I can exclude some of these, but I think that the current 
behavior should cause the "least surprise".
- I think that the tests should be reorganized, but I would like to finalize 
the behavior before doing that.

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@240
PS3, Line 240:
> Can you explain in a comment why we need to clone the table instead of remo
I have changed the logic to upload data to HMS only if it was actually changed.

The cloning was a misunderstanding on my part: CatalogOpExecutor clones HMS 
tables before calling alter_table, but the reason is that it does not want to 
update its own representation till the update is actually successful, so it 
alters the clone and calls alter_table with it. After alter_table, it downloads 
the table from HMS and overwrites its representation with this new version, so 
Catalog always contains a version which was loaded from HMS. Kudu tables are 
different, as Impala tries to be in sync with Kudu instead of HMS, and this 
final alter_table's goal is to ensure that HMS is in sync with Kudu.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@728
PS3, Line 728:* but not for ALTER TABLE SET COLUMN STATS.
> This comment should point out that it updates impala.lastComputeStatsTime,
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@782
PS3, Line 782:  u
> Why do we only update this if both are set? Why not if we're just updating
I tried to be as strict as possible to ensure that only COMPUTE STATS update 
impala.lastComputeStatsTime, but it turned out to be unnecessary.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2857
PS3, Line 2857:* command if the metadata is not completely in-sync. This 
affects both Hive and
> Explain the bool parameter in the comment including what value the lastDdlT
Is the missing field explanation really necessary? I think that the current 
comment states quite clearly that the alter_table overwrites the whole table.


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2862
PS3, Line 2862:*/
> unused?
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3221
PS3, Line 3221:* parent.
> Assuming this line is still correct, can you add a comment where this happe
It is not correct (since a long time).


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@50
PS3, Line 50: # compute statistics the fill table property 
impala.lastComputeStatsTime
> This should be more verbose, explain why it is necessary to initialize it.
I would like to add more comments after the tests are reorganized to their 
final shape.


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@75
PS3, Line 75: self.execute_query("drop stats %s" % FQ_TBL_NAME)
> We should also have a test for computing incremental stats, for sampled sta
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@145
PS3, Line 145: # change table property
> I think we should have defaults for both expect_* parameters here, or for n
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@161
PS3, Line 161: # drop statistics
> nit: beforeStatsTime
Done


http://gerrit.cloudera.org:8080/#/c/10116/3/tests/metadata/test_last_ddl_time_update.py@180
PS3, Line 180: LAST_COMPUTE_STATS_TIME_KEY = "impala.lastComputeStatsTime"
> stats_time, given the key is lastComputeStatsTime, too.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master

[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-04-26 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#4).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 152 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/4
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932
PS1, Line 932: 
The case when the subquery returns 0 row could be tested too.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 10 Jan 2018 10:59:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-15 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#8).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce to files kept
open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 331 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/8
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-15 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#7).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce to files kept
open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 330 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/7
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#10).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.
If only one partition is written (because all partitioning columns
are constant or the target table is not partitioned), then the shuffle
hint leads to a plan where all rows are merged at the coordinator where
the table sink is executed.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 332 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/10
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20
PS8, Line 20:
> nit: wrong word?
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055
PS6, Line 1055: plan_hints:hints
> nit: please check that all review comments are marked as "Done" when sendin
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69
PS6, Line 69:* Helper class for parsing.
> Thanks for the explanation, I think I understand it now. Could we just have
I have no problems with going that way, but I am not sure that it would make 
the change smaller: InsertStmt would also need a Set/AddPlanHints function, and 
it would be probably better to rewrite InsertStmt to always use the setter 
instead of constructor arguments.


http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71
PS8, Line 71:
> nit: "rule that checks" or "rules that check"
Done


http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75
PS8, Line 75: > partitio
> nit: we generally seem to use camel case for acronyms, e.g. see createCtasT
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344
PS6, Line 344: 01:EXCHANGE [UNPARTITIONED]
> Yes, I think it's intentional. Maybe the docs need to be more clear. Can yo
I have created IMPALA-6408 to track the doc issue.



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 17 Jan 2018 16:04:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#9).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 331 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/9
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-16 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 3:

> Why do you say that change is more complex?
Picking a pre-existing database, and setting it up as needed for the test 
seemed more error prone to me than creating a minimal temporary db.

Now (after creating the AuthorizationTest.java tests) I agree that they have a 
better place in java.


--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 16 Jan 2018 20:26:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface

2018-01-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9063


Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface
..

IMPALA-5801: [draft] Clean up codegen GetType() interface

First draft - my goal with this patch is create a review
where we can decide on the scope of the refactor and
agree on the names of the functions.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
21 files changed, 211 insertions(+), 178 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/1
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface

2018-01-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9063 )

Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface
..


Patch Set 1:

(3 comments)

The declarations of the new/renamed functions are in llvm-codegen.h.

http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@256
PS1, Line 256:   llvm::PointerType* GetInternalRepresentationPtr(const 
ColumnType& type);
I really do not like this name, if anyone has a better idea, please share it.


http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@496
PS1, Line 496:   llvm::Type* boolean_type() { return 
GetInternalRepresentation(TYPE_BOOLEAN); }
Maybe bool_type would be more logical in a c++ context.


http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@497
PS1, Line 497:   llvm::Type* i8_type() { return 
GetInternalRepresentation(TYPE_TINYINT); }
 :   llvm::Type* i16_type() { return 
GetInternalRepresentation(TYPE_SMALLINT); }
 :   llvm::Type* i32_type() { return 
GetInternalRepresentation(TYPE_INT); }
 :   llvm::Type* i64_type() { return 
GetInternalRepresentation(TYPE_BIGINT); }
I have chosen i8_type() instead of int8_type(), because i128_type() already 
used this convention. If we want these names to be short, the type can be 
abbreviated too, for example as int8_t().



--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Thu, 18 Jan 2018 21:15:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-18 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#11).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.
If only one partition is written (because all partitioning columns
are constant or the target table is not partitioned), then the shuffle
hint leads to a plan where all rows are merged at the coordinator where
the table sink is executed.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
5 files changed, 330 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/11
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup@1185
PS10, Line 1185:   e
> nit: indent 2 spaces
Done



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Thu, 18 Jan 2018 20:22:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 7:

(7 comments)

I have overwritten the wrong test in patch set 7, please ignore it and compare 
patch 6 to 8.

http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19
PS6, Line 19: nly
> add "more memory efficient"? That makes it more obvious why it is more effi
I hope it is correct and understandable enough this way.


http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27
PS6, Line 27: shuffle:
> I don't think this is true, is it? For unpartitioned tables, what would the
Done


http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35
PS6, Line 35:
> Mention how you tested it in the commit message.
I wrote about the current state of tests, but I became unsure during the 
process - should I add tests that actually execute the CTASs with hints? The 
reason why not to add them is that the non-DDL part of query execution is 
really the same as in INSERT tests and can be quite slow.


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69
PS6, Line 69:* Helper class for parsing.
> Is this class still needed or a left-over from earlier patch-sets? Can you
It is still needed, as it is the result value of non-terminal 
create_tbl_as_select_stmt_inner.

I am not sure that this is the correct place for this explanation, maybe it 
should be moved to the .cup.


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701
PS6, Line 1701: Multiple non-conflicting
> Can you add one that has multiple distinct hints, like shuffle, clustered?
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233
PS6, Line 233: shuffle
> Is that true? Without specifying a hint there should be cases where the cos
Thanks for spotting this, I have relied too much on experimentation instead of 
checking the code... I have split this test into two to test both outcomes.


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344
PS6, Line 344: |  order by: year ASC NULLS
> At first I didn't understand what this one does (I saw that insert.test:574
I should add a comment about this, but I am unsure whether it should be a 
simple comment, or a todo about a strange behavior. This looks intentional (see 
https://gerrit.cloudera.org/#/c/4162/ ), but it is not mentioned explicitly in 
the documentation. It is true that this reduces the number of writes overall, 
but as a side effect, it causes the whole table to be written to one node. 
Writing the whole table to the coordinator may be useful in some cases (like a 
temp table?), but I think that shuffle has a different intention. I would ask 
Greg / Alex about their opinion.



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 15 Jan 2018 18:06:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface

2018-01-23 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9063

to look at the new patch set (#2).

Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface
..

IMPALA-5801: [draft] Clean up codegen GetType() interface

First draft - my goal with this patch is create a review
where we can decide on the scope of the refactor and
agree on the names of the functions.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
21 files changed, 224 insertions(+), 210 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/2
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5801: [draft] Clean up codegen GetType() interface

2018-01-23 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9063

to look at the new patch set (#3).

Change subject: IMPALA-5801: [draft] Clean up codegen GetType() interface
..

IMPALA-5801: [draft] Clean up codegen GetType() interface

First draft - my goal with this patch is create a review
where we can decide on the scope of the refactor and
agree on the names of the functions.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
21 files changed, 224 insertions(+), 210 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/3
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Dimitris 
Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8973

to look at the new patch set (#5).

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..

IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.

To test this issue in AuthorizationTest.java, 'functional_avro'
is added as a test database with only column level permissions.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/resources/authz-policy.ini.template
4 files changed, 29 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/5
--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8973 )

Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@102
PS3, Line 102:   // SELECT permissions on columns ('id') on 
'functional_avro.alltypessmall'
> missing opening single-quote for 'functional_avro.alltypessmall'
Done


http://gerrit.cloudera.org:8080/#/c/8973/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@399
PS3, Line 399: privilege = new TPrivilege("", TPrivilegeLevel.SELECT,
> typo: privilege (which has already been declared)
Thanks for spotting it!
Please ignore patch set 4, and compare 3 to 5.
Note that it may look like if this could fit into 1 line, but it would be 91 
characters.



--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 17 Jan 2018 13:34:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

2018-01-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8973


Change subject: IMPALA-4315: Allow USE and SHOW TABLES if the user has only 
column privileges
..

IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges

USE and SHOW TABLES should be allowed if there is at least one
table in a database where the user has table or column
privileges. Impala incorrectly checked only for table privileges.

All automatic tests were added to grant_revoke.test, so
AuthorizationTest.java is not changed - the reason for this
is that adding a db with only column privileges to
AuthorizationTest.java seemed more complex.

Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
3 files changed, 56 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8973/1
--
To view, visit http://gerrit.cloudera.org:8080/8973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia69756a18cb1db304d2bb8c92288612cbd1164d8
Gerrit-Change-Number: 8973
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml
File docs/topics/impala_shell_options.xml:

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627
PS3, Line 627:
> I think it's a bug that these two are not working in the impalarc and I fil
I did not notice shell options during the implementation of 
[impala.query_options]. These can be set like query options in the shell, but 
have effect on the the shell, not the Impala server. I agree with Lars that 
this is a bug.



--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Fri, 12 Jan 2018 15:28:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-02-07 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#12).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month) as
select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.
If only one partition is written (because all partitioning columns
are constant or the target table is not partitioned), then the shuffle
hint leads to a plan where all rows are merged at the coordinator where
the table sink is executed.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner / parser tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
9 files changed, 330 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/12
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-02-07 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Gabor Kaszab, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8400

to look at the new patch set (#13).

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..

IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

This change adds support for "clustered", "noclustered", "shuffle" and
"noshuffle" hints in CTAS statement.

Example:
create /*+ clustered,noshuffle */ table t partitioned by (year, month)
as select * from functional.alltypes

The effect of these hints are the same as in insert statements:

clustered:
Sort locally by partition expression before insert to ensure that only
one partition is written at a time. The goal is to reduce the number of
files kept open / buffers kept in memory simultaneously.

noclustered:
Do not sort by primary key before insert to Kudu table. No effect on HDFS
tables currently, as this is their default behavior.

shuffle:
Forces the planner to add an exchange node that repartitions by the
partition expression of the output table. This means that a partition
will be written only by a single node, which minimizes the global
number of simultaneous writes.
If only one partition is written (because all partitioning columns
are constant or the target table is not partitioned), then the shuffle
hint leads to a plan where all rows are merged at the coordinator where
the table sink is executed.

noshuffle:
Do not add exchange node before insert to partitioned tables.

The parser needed some modifications to be able to differentiate between
CREATE statements that allow hints (CTAS), and CREATE statements that
do not (every other type of CREATE statements). As a result, KW_CREATE
was moved from tbl_def_without_col_defs to statement rules.

The tests mainly mirror the analysis / planner / parser tests of INSERT.
Query tests are not created, as the hints have no effect on
the DDL part of CTAS, and the actual query ran is the same as in
the insert case.

Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
9 files changed, 333 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/13
--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-02-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup@1065
PS11, Line 1065: create_tbl_as_select_stmt_inner ::=
> let's call this create_tbl_as_select_params to match what is produced
Done


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69
PS6, Line 69:* Helper class for parsing.
> I looked at this closely and both approaches seem fine to me. Here are my t
Thanks!


http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@76
PS11, Line 76: public CreateTableStmt createStmt_;
> We don't use the "_" for public members.
Done


http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1642
PS11, Line 1642:   private String[][] getHintStyles() {
> Can you move this to FrontendTestBase, make it protected, and use it in the
I have moved the member variable version found in other files instead of the 
function - I prefer the member, because no new arrays have to be created on 
every call.


http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1651
PS11, Line 1651:   public void TestCreateTableAsSelectWithHints() throws 
AnalysisException {
> * I think the more important testing is in ParserTest#TestPlanHints similar
I have added the CTAS version of insert tests to ParserTest#TestPlanHints. I 
think that some of these tests should move there, like the ones about the case 
insensitivity of hints, while the ones with errors or warnings should remain 
here.

I am unsure about the rest, maybe they can be deleted, as planner tests already 
check most of these cases.

About combining with TestInsertHints(): I have created combined 
insert/upsert/ctas tests in ParserTest#TestPlanHints, something similar could 
be done here. What makes this more tricky is that the actual tables are 
checked, so the source table must exist, and the target table in insert or the 
partitioning parameters in CTAS must match with the result of the select + 
there are some cases where insert is possible, but CTAS is not (HBase tables).


http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207
PS11, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has 
partitions and sort columns.
> * Any ideas to trim down the tests? Seems like a lot of new tests and we sh
Maybe create /*+ clustered,noshuffle */ + the default case without hints is 
enough to show that both type of hints can take effect. If the other tests are 
deleted, then some of them should be moved to the insert tests, because they 
cover some scenarios that are not tested there. What do you think about this 
solution?

I agree that both plans are unnecessary - I kept the DISTRIBUTEDPLAN, because 
the effect of (no)shuffle can be only checked there.



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 07 Feb 2018 17:50:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-08 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9063

to look at the new patch set (#4).

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..

IMPALA-5801: Clean up codegen GetType() interface

Several functions that return llvm::(Pointer)Type were renamed
to make them shorter or indicate their roles more clearly.

The renamed functions can be found in llvm-codegen.h/cc. Changes
in other files contain only renamed calls with the same functinality.

Testing:
No new tests are necessery, as no functianility was changed.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
23 files changed, 325 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/4
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9063 )

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@263
PS1, Line 263: GetType
> I'm not sure that the templated version would improve readability enough to
GetNamed(Ptr)Type is ok for me, even if it it a bit longer than I would prefer.

I was curious about the error message, so I created an implementation and added 
a call with parameter int (which should lead to an error, as it doesn't have 
LLVM_CLASS_NAME).

The error message contains some useful clues (first half) but also some strange 
stuff (second half), so I must agree that it could lead to confusion:

In file included from be/src/codegen/llvm-codegen.cc:18:0:
be/src/codegen/llvm-codegen.h: In instantiation of ‘llvm::Type* 
impala::LlvmCodeGen::GetType() [with T = int]’:
be/src/codegen/llvm-codegen.cc:484:33:   required from here
be/src/codegen/llvm-codegen.h:259:65: error: ‘LLVM_CLASS_NAME’ is not a member 
of ‘int’
   llvm::Type* GetType() { return GetNamedType(T::LLVM_CLASS_NAME); }
 ^
be/src/codegen/llvm-codegen.h: In member function ‘llvm::Type* 
impala::LlvmCodeGen::GetType() [with T = int]’:
be/src/codegen/llvm-codegen.h:259:68: error: control reaches end of non-void 
function [-Werror=return-type]
   llvm::Type* GetType() { return GetNamedType(T::LLVM_CLASS_NAME); }
^
be/src/codegen/llvm-codegen.cc: In function ‘void boost::throw_exception(const 
std::exception&)’:
be/src/codegen/llvm-codegen.cc:1719:1: error: ‘noreturn’ function does return 
[-Werror]
 }


http://gerrit.cloudera.org:8080/#/c/9063/1/be/src/codegen/llvm-codegen.h@497
PS1, Line 497:   llvm::Type* i8_type() { return 
GetInternalRepresentation(TYPE_TINYINT); }
> Changing the interface first makes sense, we should switch over the impleme
Done



--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 08 Feb 2018 22:12:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

2018-02-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 09 Feb 2018 19:42:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

2018-02-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test:

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@55
PS2, Line 55: # IMPALA-6077: unsupported BIT_PACKED encoding fails when 
materializing columns.
: select count(id), count(tinyint_col), count(smallint_col), 
count(int_col),
:   count(bigint_col), count(float_col), count(double_col), 
count(date_string_col),
:   count(string_col), count(timestamp_col), count(year), 
count(month), count(day)
: from alltypesagg_bitpacked
This query seems to be the same as the next query, and should not materialize 
columns.


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@64
PS2, Line 64: materializing
I am not 100% sure about this, but I think that if a column is not complex, and 
the stats are filled, then count can be served from column chunk stats without 
reading any data page, so this error will not be returned.

This may not be a problem for this specific parquet file, but I would mention 
it in a comment, or replace the query with something that has to read the data 
pages.


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@65
PS2, Line 65: select count(id), count(tinyint_col), count(smallint_col), 
count(int_col),
:   count(bigint_col), count(float_col), count(double_col), 
count(date_string_col),
:   count(string_col), count(timestamp_col), count(year), 
count(month), count(day)
Is it necessary to list every column here? If one column is enough for the 
test, then I would prefer if it were shorter (for the sake of readability).



--
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 09 Feb 2018 14:58:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9063 )

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..


Patch Set 7:

I have uploaded a patch with the template approach. (please ignore patch set 6, 
which contains some file by accident)

The patch also contains some simplification around the changed call sites (like 
replacing GetTypes(...) with convenience functions), so it is not "rename only".

I have also noticed several places where codegen is used in over complicated 
ways, for example 
"llvm::ConstantInt::get(llvm::Type::getInt1Ty(codegen->context()), false)" 
could be simply "codegen->false_value()". I can do some GetConst cleanup here, 
or I can create a ticket for it.


--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 14 Feb 2018 18:22:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-14 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9063

to look at the new patch set (#7).

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..

IMPALA-5801: Clean up codegen GetType() interface

Several functions that return llvm::(Pointer)Type were renamed
to make them shorter or indicate their roles more clearly. Some
additional convenience functions were created to make some common
codegen tasks simpler.

The renamed functions can be found in llvm-codegen.h/cc. Changes
in other files are mainly renamed calls with the same functionality.

Testing:
No new tests are necessary, as no functionality was changed.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
26 files changed, 339 insertions(+), 361 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/7
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-19 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Laszlo Gaal, Gabor Kaszab, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9063

to look at the new patch set (#9).

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..

IMPALA-5801: Clean up codegen GetType() interface

Several functions that return llvm::(Pointer)Type were renamed
to make them shorter or indicate their roles more clearly. Some
additional convenience functions were created to make some common
codegen tasks simpler:

- Get(Ptr)Type functions with string parameter are renamed to
  GetNamed(Ptr)Type
- GetStruct(Ptr)Type template functions are created to make
  GetNamedType(MyStruct::LLVM_CLASS_NAME) calls simpler (some
  classes had LLVM_CLASS_NAME as private, these are changed to
  public)
- integer type convenience functions are renamed to indicate
  bit width instead of matching SQL type (e.g. int_type->i32_type)
- similar convenience functions were created for ptr to primitive
  types, int_ptr_type
- Get(Ptr)Type functions with ColumnType parameter are renamed
  to GetSlot(Ptr)Type
- GetIntConstant function is split to several functions depending
  on the type of the integer e.g. GetI32Constant

The renamed functions can be found in llvm-codegen.h/cc. Changes
in other files are mainly renamed calls with the same functionality.

Testing:
No new tests are necessary, as no functionality was changed.

Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/exec-node.cc
M be/src/exec/filter-context.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/util/tuple-row-compare.cc
26 files changed, 357 insertions(+), 393 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/9063/9
--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5801: Clean up codegen GetType() interface

2018-02-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9063 )

Change subject: IMPALA-5801: Clean up codegen GetType() interface
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9063/9/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/9063/9/be/src/codegen/llvm-codegen.h@529
PS9, Line 529:   llvm::Constant* GetI8Constant(uint64_t val) {
 : return llvm::ConstantInt::get(context(), llvm::APInt(8, 
val));
 :   }
 :   llvm::Constant* GetI16Constant(uint64_t val) {
 : return llvm::ConstantInt::get(context(), llvm::APInt(16, 
val));
 :   }
 :   llvm::Constant* GetI32Constant(uint64_t val) {
 : return llvm::ConstantInt::get(context(), llvm::APInt(32, 
val));
 :   }
 :   llvm::Constant* GetI64Constant(uint64_t val) {
I have written the val arguments type to uint64_t to minimize the change 
compared to the original behavior.


http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@272
PS8, Line 272: G
> Nit: extra whitespace
Done


http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@277
PS8, Line 277: G
> Nit: extra whitespace
Done


http://gerrit.cloudera.org:8080/#/c/9063/8/be/src/codegen/llvm-codegen.h@506
PS8, Line 506: Consta
> Being curious: is there any specific reason for not changing null_ptr_value
There was no specific reason. I had to change true/false_value to Constant 
because of GetBoolConstant, so I also changed the function that returns them.

Note that some other functions' return type could be changed from Value* to 
Constant*, for example CastPtrToLlvmPtr(). It could be also nice to change 
several variable's type on the call sites of these functions.



--
To view, visit http://gerrit.cloudera.org:8080/9063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib146ca21af82023b0b460f813eae3435b4b2eb77
Gerrit-Change-Number: 9063
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 19 Feb 2018 14:32:39 +
Gerrit-HasComments: Yes


  1   2   3   4   5   6   7   8   9   10   >