[Impala-ASF-CR] IMPALA-3504: [DOCS] Document utc timestamp()

2017-10-09 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8190 )

Change subject: IMPALA-3504: [DOCS] Document utc_timestamp()
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8190/2/docs/topics/impala_datetime_functions.xml
File docs/topics/impala_datetime_functions.xml:

http://gerrit.cloudera.org:8080/#/c/8190/2/docs/topics/impala_datetime_functions.xml@2543
PS2, Line 2543: For working with date/time values represented as 
integer values, you can convert
The most typical integer timestamp representation is the number of seconds (not 
microsecs) elapsed since the Unix epoch. It may be worth including specific 
instructions for dealing with these. If there are no conversion functions for 
these, then I suggest providing examples incorporating multiplication and 
division.


http://gerrit.cloudera.org:8080/#/c/8190/2/docs/topics/impala_datetime_functions.xml@2545
PS2, Line 2545: with the 
unix_micros_to_utc_timestamp() and
Shouldn't the unix_micros_to_utc_timestamp and utc_to_unix_micros functions be 
briefly documented separately as well, just like the from_unixtime and 
unix_timestamp functions are? At least an indexterm should be added in my 
opinion.


http://gerrit.cloudera.org:8080/#/c/8190/2/docs/topics/impala_datetime_functions.xml@2557
PS2, Line 2557: select now(), utc_timestamp();
(nit, optional) This pairing of functions for the queries seems quite arbitrary 
to me and makes it hard to compare all three functions. I would suggest 
examples showing either each function separately (3 queries) or all of them in 
a single query:

select now(), current_timestamp(), utc_timestamp();



--
To view, visit http://gerrit.cloudera.org:8080/8190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2662fc79d588f22a24a5067429a57b3c0d0f0f0
Gerrit-Change-Number: 8190
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-Comment-Date: Mon, 09 Oct 2017 10:22:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-03-02 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5939/3//COMMIT_MSG
Commit Message:

Line 23: global flag --prevent_parquet_mr_zone_adjustment is set to true.
> Not sure if the name of this flag was already decided upon, but imo, it's n
It is also important that it is only set on new tables and that it is set to 
UTC. I would suggest --set_parquet_mr_int96_write_zone_to_utc_on_new_tables


http://gerrit.cloudera.org:8080/#/c/5939/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 123: "Impala, Hive, and Spark to not apply timestamp timezone 
adjustments for parquet "
> Mention that this property also makes Hive write in UTC.
Hive always writes in UTC (that's the problem). It makes Hive write as if it 
was located in UTC instead of the actual local timezone, which makes the 
written value match the original value.

I think this effect is so complex that we can not describe it in a few lines. I 
would replace the last sentence with "This changes the behavior of recent 
versions of Hive and Spark as well. The writing of timestamps is affected in 
Hive and Spark but not in Impala. The reading of timestamps that were written 
by Hive, Spark, or any other parquet-mr writer is affected in Hive, Spark and 
Impala. You can find details in the documentation."


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-20 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 2:

> > (1 comment)
 > 
 > Apologies for the delayed reply. Hive writes timestamps using 12
 > bytes using little endian. Then it passes them to parquet-mr as a
 > BINARY string, which means it is hitting PARQUET-251. This explains
 > why I saw the odd values for min/max in my tests.
 > 
 > Internally parquet-mr orders BINARY values using byte comparison,
 > potentially leading to a min/max value not being the semantically
 > smallest/largest value of a set of values. I am inclined to call
 > this a bug in hive, but I'm curious to hear what you think about
 > this.

I don't think it's a bug that the min/max corresponds to the binary ordering, 
since at Parquet's level timestamps are just meaningless bytes. If we were 
using a proper Parquet logical type then it would be different, but when saving 
12 bytes, I think the proper order is the binary ordering. In any case, I think 
we should aim for Hive-compatibility in this.

The bug that causes the last row to be both the min and max values is a major 
pain though that will make column statistics for byte arrays totally useless. I 
don't see how we could handle that other than ignoring any such min/max values 
written by affected Hive versions.

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-20 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by 
Impala, relative to UTC,
> That's concerning if Hive and parquet-mr are inconsistent in how they defin
It's not that Hive and parquet-mr do it differently, it's simply that there is 
no such type as our timestamp in Parquet (there is a separate timestamp_millis 
type), so parquet-mr doesn't handle this type at all, while Hive does. From 
Parquet's perspective it's just 96 bits without meaning, so probably the min 
and max should be calculated accordingly as well and not based on the base type.

On the other hand, if Hive already has some implementation, we should probably 
use the same logic. But if I remember correctly, Lars looked at Hive's min/max 
values for timestamps and found that they were not the min/max values in any 
way, neither as instants on a timeline, neither as the 96 bits in any byte 
order.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-19 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by 
Impala, relative to UTC,
> I think we convert them to INT96 for parquet. I think the INT96 order is eq
Lars, if I remember correctly you found that Hive does not write statistics 
according to these rules. Did you find out what logic they follow? Is it 
different than this implementation? If so, will the parsing logic contain a 
condition that checks whether the stats were written by Hive?


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().

2017-01-09 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr().
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5589/2/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS2, Line 395: negative position argument
> That particular circumstance causes an error instead of a zero return value
That's intentional. In fact, initially I returned 0 in this case as well, but 
then acting on Dan Hecht's recommendation I checked this behavior in Oracle and 
found that Oracle gives an error in this case, so we do the same. Related 
discussion can be found on 
https://gerrit.cloudera.org/#/c/4094/13/be/src/exprs/string-functions-ir.cc@292


-- 
To view, visit http://gerrit.cloudera.org:8080/5589
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-Reviewer: zi+z...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

2016-11-03 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer
..


Patch Set 2:

> Please see https://issues.cloudera.org/browse/IMPALA-4371

It seems that Gerrit auto-linkifies JIRA entries even inside URLs, which 
results in invalid links. I should just probably refer to it as IMPALA-4371.

-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

2016-11-03 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer
..


Patch Set 2:

> Oh, oops, I wasn't noticing this was the writier.  In that case,
 > why are these DCHECKs not valid?  Why does our code create pages
 > with zero values (other than the last)? Or were we hitting the
 > non-zero compressed size dcheck (and if so, that also doesn't seem
 > to make sense).

Originally I explained the reason in the commit message, but when I was asked 
to open a JIRA about it, I moved the explanation there. Please see 
https://issues.cloudera.org/browse/IMPALA-4371

-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

2016-11-03 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer
..


Patch Set 2:

> Do you have access to the file to repro'ed this in the wild?  And
 > given that DCHECKs are compiled out of release builds, how did this
 > cause a problem in the wild?

In the wild, some `insert into ... select from ...` queries have hit this check 
when converting CSV to Parquet, but I do not have access to those CSV files. A 
debug build was used while investigating an unrelated issue, that's how the 
DCHECK failure was noticed.

-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

2016-11-03 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer
..


Patch Set 2:

> Is it reasonably possible to create a file that demonstrates this
 > and can be added to the regression test suite?

Dan, it has just occurred to me that since this condition happens while writing 
a file and not while reading it, I don't see how we could use a test file to 
test it. Additionally, I don't see a real risk of regression, as I don't expect 
anyone to put in DCHECK-s based on the same false assumptions again.

So considering the value of such a test and the difficulty of reproducing the 
exact conditions triggering the DCHECK-s, I suggest not creating a test file 
for the regression suite.

-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

2016-10-26 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new patch set (#2).

Change subject: IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer
..

IMPALA-4371: Incorrect DCHECK-s in hdfs-parquet-table-writer

Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/4835/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Remove seemingly incorrect DCHECK-s.

2016-10-25 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4835

Change subject: Remove seemingly incorrect DCHECK-s.
..

Remove seemingly incorrect DCHECK-s.

The first conditional DCHECK means that if a page's size is 0 then it's
compressed size is also 0. This, however, seems to be a false
assumption, as the compressed output may include metadata, such as
length or checksum.

The GZIP compressor, for example, states that an input of 0 bytes
requires 23 bytes when compressed. The Snappy compressor also mentions
storing length information in the compressed output. The compressed size
estimation in the LZ4 compressor also contains a constant part.

The "Last page might be empty" comment and the second DCHECK also seems
to be based on a false assumption. If a value doesn't fit on the current
page, AppendRow creates a possible bigger new page and tries writing the
data in the new page instead. This means that if the data is bigger than
the page size, then the current page is finalized and a new page is
added, even if the original page was empty. In other words, empty pages
can occur in the middle of the pages_ array as well, not only at the end
of it.

Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/4835/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-10-18 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has abandoned this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Abandoned

Abandoning the review until I can work on it again.

-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-10-04 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java:

Line 76: return 
Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident;
> I think we can still make progress here, I'm not sure I completely follow y
If I remember correctly, I encountered a problem with the 
com.cloudera.impala.analysis.TableName class. It stores the table name as a 
String, but the table name itself can be qualified in case of complex types, 
thus it can not be quoted by simply putting it between backticks.

It also seems that com.cloudera.impala.analysis.InsertStmt and 
com.cloudera.impala.catalog.View constructors take a List parameter for 
columns and store it in the object.

It seems to me that in order to properly quote these, we would need to refactor 
the code to use a data structure capable of representing hierarchical names 
unambiguously to pass the table and column references around and to store them. 
I thought that the ongoing effort you mentioned is about addressing this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-29 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java:

Line 76: return 
Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident;
> Those are two separate issues. In your example, we are guarding against a l
I added splitting because when I just put the whole string between backticks 
then Impala ended up escaping qualified identifiers of complex types 
incorrectly. I noticed that these references were already in a string form by 
the time getIdentSql gets called, so there is no way to properly quote them 
without refactoring to pass them in a structured format to the affected 
functions.

The "Invalid column/field name" error message mislead me to believe that dots 
are not allowed in any identifier, which lead me to come up with my approach 
which would work if identifiers really couldn't contain dots. Since this turned 
out to be a false assumption, it seems that I have to discard this change as 
this task can only be implemented properly after IMPALA-2287 and probably some 
other refactorings.


-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-28 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java:

Line 69:   public static String getIdentSql(String ident) {
> So we are in agreement then?
I agree that it's slighly less pleasing to the eye but I think 
correctness/unambiguity is more important than aesthetics. Also Hive does the 
same, so I think people are used to it.


Line 76: return 
Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident;
> select `a.b.c`.`x.y.z` from (select 1 `x.y.z`) `a.b.c`
Hmmm. Interesting. On the other hand:

> create view test as select `a.b.c`.`x.y.z` from (select 1 `x.y.z`) `a.b.c`;
ERROR: AnalysisException: Invalid column/field name: x.y.z

This is a serious problem, because it means that identifier names are ambigous. 
I checked that in case of a struct, functions get "column_name.field_name" as 
the column name. I haven't checked your example yet, but my guess is that in 
this case the column name parameter will become "x.y.z". Still, the former 
should be quoted as `column_name`.`field_name` and latter as `x.y.z`. It seems 
that it's too late to fix this problem by the time getIdentSql gets called.


-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-28 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java:

Line 69:   public static String getIdentSql(String ident) {
> The main issue with this approach is that all invocations of toSql() will n
Initially I was also worried that identifiers become "ugly" and considered 
making this behavior optional. I was thinking of a query option that the user 
can control using the SET statement. Then I checked how Hive works and I found 
that it already quotes every identifier (I checked the table and view 
definitions if I remember correctly), so I came to the conclusion that quoting 
everything should be okay. Of course I expected some suggestions to how it 
should work, that's why I didn't fix the styling issues in the test.


Line 76: return 
Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident;
> I don't think splitting here is the right fix. An identifier itself could c
I checked that identifiers can't contain dots, or at least users can't create 
such identifiers. If you try specifying `a.b` as an identifier, Impala will 
complain that dots are not allowed in identifier names. Do you know of a code 
piece that gets around this mechanism and creates identifiers with dots in some 
other way?

Qualified table or column references are exactly the reason why I only made 
this version public and the other one private, as only this is safe to use to 
quote table or column references. In case of complex types, dots can become a 
part of either the table reference or the column reference.

In case of a struct, a column reference might look like column_name.field_name, 
which should be quoted as `column_name`.`field_name`. In case of a map or 
array, table references might look like table_name.map_or_array_name, which 
should be quoted `table_name`.`map_or_array_name`.

This is my understanding based on the short amount of time I spent on Impala, 
so please correct me if I'm wrong.


Line 88: if (ident.equals("*")) {
> This doesn't seem right or necessary. First, "*" is not an identifier (it i
Without this check a SELECT * became SELECT `*` which is invalid.


-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-26 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new patch set (#2).

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..

IMPALA-784: Use `-s in SHOW CREATE TABLE output

Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
---
M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
M fe/src/main/java/com/cloudera/impala/analysis/DeleteStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/TableName.java
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
M fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
8 files changed, 339 insertions(+), 338 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4527/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-23 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..


Patch Set 1:

The functional tests still fail with false positives like
- CREATE TABLE functional_kudu.dimtbl (id BIGINT, name STRING, zip INT)
+ CREATE TABLE `functional_kudu`.`dimtbl` (`id` BIGINT, `name` STRING, `zip` 
INT)

I haven't fixed these yet, because I still have to figure out where the 
expected strings come from, but first I wanted to make sure that we agree on 
the approach to be taken.

The same applies to the overly long lines in the unit test. That test is fixed 
functionally but not stylistically, as I didn't want to spend too much time on 
it until I get feedback about whether this is the right approach.

Thanks,

Zoltan

-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output

2016-09-23 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4527

Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output
..

IMPALA-784: Use `-s in SHOW CREATE TABLE output

Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
---
M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
M fe/src/main/java/com/cloudera/impala/analysis/DeleteStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/TableName.java
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
M fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
8 files changed, 339 insertions(+), 338 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4527/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-13 Thread Zoltan Ivanfi (Code Review)
Hello Lars Volker, Matthew Jacobs, Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#16).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 343 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi