[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2281
PS3, Line 2281: if (rowFormat.getFieldDelimiter() != null) {
  : sd.getSerdeInfo().putToParameters(
  : "serialization.format", 
rowFormat.getFieldDelimiter());
  : sd.getSerdeInfo().putToParameters("field.delim",
  : rowFormat.getFieldDelimiter());
  : }
  : if (rowFormat.getEscapeChar() != null) {
  : sd.getSerdeInfo().putToParameters("escape.delim", 
rowFormat.getEscapeChar());
  : }
  : if (rowFormat.getLineDelimiter() != null) {
  : sd.getSerdeInfo().putToParameters("line.delim", 
rowFormat.getLineDelimiter());
  : }
> That block is similar to the block between L2305-2318. I believe you should
Sorry, I meant SerDeInfo not StorageDescriptor. For the case of HdfsPartition 
you get it by calling getSerdeInfo() and for the case of table you get it from 
the StorageDescriptor.



--
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 3
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Sat, 06 Jan 2018 05:09:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] PREVIEW: IMPALA-6372: Go parallel for Hive dataload

2018-01-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8894 )

Change subject: PREVIEW: IMPALA-6372: Go parallel for Hive dataload
..


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@a114
PS6, Line 114:
 :
 :
 :
 :
 :
 :
 :
 :
 :
This comment is still needed with Impyla HS2, add to exec_hive_query_from_file.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@42
PS6, Line 42: num_processes = multiprocessing.cpu_count()
It might make sense for this to be a command line argument.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@151
PS6, Line 151:   with connect(host=hs2_host, port=hs2_port, 
auth_mechanism="PLAIN", \
 :user=getpass.getuser(), database="default") 
as conn:
Currently only works for non-secure cluster. This a regression compared to the 
Beeline execution, but even the current Beeline command doesn't work if the 
cluster has SSL. We never use this script on a secure cluster now, but we will 
want to in future.

Revert to Beeline or add long comment about making this work with secure.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@163
PS6, Line 163: out_file.write("ERROR: {0}\n{1}\n".format(query, str(e)))
Might be nice to rename the file to be distinctive in the case of an error. 
This would make debugging an error easier.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@275
PS6, Line 275: print "Execution of {0} SQL file {1} 
failed.".format(execution_type, query_file)
This should include information about the log file to look at.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@280
PS6, Line 280: def get_files_by_pattern(directory_contents, file_pattern):
 :   return [f for f in directory_contents if file_pattern in f]
Unused. Remove.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@315
PS6, Line 315: #os.chdir(sql_dir)
Remove and clarify the need for absolute paths rather than relative paths.


http://gerrit.cloudera.org:8080/#/c/8894/6/bin/load-data.py@336
PS6, Line 336: # Use maximum parallelism for this part, as these are almost 
entirely metadata
 : # operations.
Comment out of date. Remove


http://gerrit.cloudera.org:8080/#/c/8894/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/8894/6/testdata/bin/generate-schema-statements.py@627
PS6, Line 627: invalidate_table_stmt = "INVALIDATE METADATA 
{0}.{1};\n".format(db, table_name)
 :   impala_invalidate.create.append(invalidate_table_stmt)
Some tables should be able to use refresh instead. I think only Hive-created 
tables need invalidate.


http://gerrit.cloudera.org:8080/#/c/8894/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/8894/6/testdata/datasets/functional/functional_schema_template.sql@330
PS6, Line 330: USE {db_name}{db_suffix};
Not needed, remove.



--
To view, visit http://gerrit.cloudera.org:8080/8894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
Gerrit-Change-Number: 8894
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Sat, 06 Jan 2018 02:15:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test

2018-01-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8898 )

Change subject: IMPALA-6231: Implement decimal_v2 fuzz test
..

IMPALA-6231: Implement decimal_v2 fuzz test

Implement a test that generates random decimal numbers in the pytest
framework, performs a random mathemtaical operation in Impala and
verifies that the result is correct by doing the same operating using
the Python decimal module. We try to generate not only completely random
decimal numbers, but also numbers that have interesting properties, such
as the number being a power of two.

Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
---
A tests/query_test/test_decimal_fuzz.py
1 file changed, 249 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8898/2
--
To view, visit http://gerrit.cloudera.org:8080/8898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Gerrit-Change-Number: 8898
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test

2018-01-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8898 )

Change subject: IMPALA-6231: Implement decimal_v2 fuzz test
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py
File tests/query_test/test_decimal_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@33
PS1, Line 33:   # Set the Python decimal context to a large precision 
initially, so that the
:   # mathematical operations are performed at a high precision.
:   decimal.getcontext().prec = 80
:   decimal.getcontext().rounding = decimal.ROUND_HALF_UP
> Use def setup_method and teardown_method for this?
I used local decimal context to solve the issue.


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@46
PS1, Line 46: create_exec_option_dimension_from_dict({'_': ['_']})
> Sorry, what does this do?
I'm just trying to remove all test dimensions that we normally use (such as 
file format). I'm just adding a useless dimension here, because the test 
doesn't run with zero dimensions. Is there a better way to do this?


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@65
PS1, Line 65: Returns a decimal with a random precision, scale and value.
> What about something a little more precise, like "Return a 3-tuple with str
Done


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@70
PS1, Line 70: 38
> I think we want 38
Yes, I think we want 38. It's ok that 38 shows up in both extreme precision and 
random precision tests. We want random to be completely random and all values 
should be possible.


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@134
PS1, Line 134: decimal_value
> It was a little confusing at first to know what the intention was here. May
Done


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@204
PS1, Line 204:   truncated_expected = expected.quantize(
> Nice.  I didn't know you could do this.  I tried a few of the more bizarre
Yeah, it's pretty cool. I learned about this while developing this test.


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@230
PS1, Line 230:   if op == '+':
 : expected_result = decimal.Decimal(value1) + 
decimal.Decimal(value2)
 :   elif op == '-':
 : expected_result = decimal.Decimal(value1) - 
decimal.Decimal(value2)
 :   elif op == '*':
 : expected_result = decimal.Decimal(value1) * 
decimal.Decimal(value2)
 :   elif op == '/':
 : expected_result = decimal.Decimal(value1) / 
decimal.Decimal(value2)
 :   elif op == '%':
 : expected_result = decimal.Decimal(value1) % 
decimal.Decimal(value2)
> Up to you, but you could consider the operator module so you don't have to
Are you suggesting to create a mapping m that maps, for example, "+" to 
operator.add()? Then calculate result like this m[op](value1, value2)?

I don't think doing this is much better than the current approach.


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@248
PS1, Line 248: def test_fuzz(self, vector):
> You could parallelize the test by making the iteration a test parameter.
Cool tip, but I decided against making each iteration a separate test. Too much 
stuff gets printed to screen.



--
To view, visit http://gerrit.cloudera.org:8080/8898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Gerrit-Change-Number: 8898
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 06 Jan 2018 02:14:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] PREVIEW: IMPALA-6372: Go parallel for Hive dataload

2018-01-05 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8894

to look at the new patch set (#6).

Change subject: PREVIEW: IMPALA-6372: Go parallel for Hive dataload
..

PREVIEW: IMPALA-6372: Go parallel for Hive dataload

This changes generate-schema-statements.py to produce
separate SQL files for different file formats for Hive.
This changes load-data.py to go parallel on these
separate Hive SQL files. For correctness, the text
version of all tables must be loaded before any
of the other file formats.

load-data.py runs DDLs to create the tables in Impala
and goes parallel. Currently, there are some minor
dependencies so that text tables must be created
prior to creating the other table formats. This
changes the definitions of some tables in
testdata/datasets/functional/functional_schema_template.sql
to remove these dependencies. Now, the DDLs for the
text tables can run in parallel to the other file formats.

To unify the parallelism for Impala and Hive, load-data.py
now uses a single fixed-size pool of processes to run all
SQL files rather than spawning a thread per SQL file.

This currently switches Hive execution to go through
Impyla's HS2 support rather than Beeline. This part is
in flux.

Speeding up Hive causes TPC-H to finish very quickly,
while TPC-DS and functional are still doing DDLs.
TPC-H's invalidate metadata can cause errors in TPC-DS
or functional due IMPALA-5087. To avoid this,
generate-schema-statements.py generates a SQL file
to invalidate metadata for each table individually
and load-data.py uses this rather than a universal
invalidate.

This saves about 15-20 minutes on dataload (including
for GVO).

Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/datasets/functional/functional_schema_template.sql
3 files changed, 137 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8894/6
--
To view, visit http://gerrit.cloudera.org:8080/8894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
Gerrit-Change-Number: 8894
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..

IMPALA-2248: Make idle_session_timeout a query option

This commit makes idle_session_timeout a query option.

idle_session_timeout currently can be set as a command line
option, which will be the default timeout for sessions.
HS2 sessions can override it with a smaller value by setting
it in the configuration overlay of HS2 OpenSession().

However, we can't override idle_session_timeout for JDBC/ODBC
connections, because we cannot put this in the connection string.

This commit is a workaround for this problem, it allows JDBC/ODBC
connections to set the session timeout as a query option
with the SET statement.

After this commit, the session timeout can be overridden to
any value, i.e. the command line flag idle_session_timeout
doesn't limit this option anymore.

I created an automated test case in JdbcTest.java based on
test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also
extended the test_session_expiration and test_set_and_unset
test suites.

Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Reviewed-on: http://gerrit.cloudera.org:8080/8490
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/service/client-request-state.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
A fe/src/test/java/org/apache/impala/util/Metrics.java
M tests/custom_cluster/test_session_expiration.py
M tests/custom_cluster/test_set_and_unset.py
M tests/hs2/hs2_test_suite.py
M tests/hs2/test_hs2.py
14 files changed, 391 insertions(+), 60 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 22
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 21: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 06 Jan 2018 01:47:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3651: Adds murmur hash() built-in function

2018-01-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8893 )

Change subject: IMPALA-3651: Adds murmur_hash() built-in function
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8893/4/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/8893/4/testdata/workloads/functional-query/queries/QueryTest/exprs.test@2922
PS4, Line 2922: select murmur_hash(NULL)
Tests like this belong in expr-test.cc

We typically prefer expr-test.cc whenever possible. We rely on this exprs.test 
for cases where we want to test the expression in a "real" query that scans 
table data. The new test cases do not fall into that category so are better 
added to expr-test.cc



--
To view, visit http://gerrit.cloudera.org:8080/8893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29
Gerrit-Change-Number: 8893
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 06 Jan 2018 01:43:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 21: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 21
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 06 Jan 2018 01:30:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..

IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Currently DictDecoder class and DictEncoder class uses std::vector
to store the tables mapping codeword to value and vice-versa. It is
hard to detect the memory usage by these tables when they becomes
very large, since this memory is not accounted by Impala's memory
mangement infrastructure.

This patch uses the memory tracker of HdfsScanner to track the memory used
by dictionary in DictDecoder class. Similary it uses memory tracker of
HdfsTableSink to track the memory used by dictionary in DictEncoder class.

Memory for the dictionary, stored as std::vector is still allocated
from std:allocator but the amount allocated is accounted by
introducing a counter which is incremented and decremented as the
memory is consumed and released by vector.

Testing
---
Ran all the backend and end-end tests with no failures.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Reviewed-on: http://gerrit.cloudera.org:8080/8034
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
5 files changed, 173 insertions(+), 24 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 22
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase

2018-01-05 Thread Alex Behm (Code Review)
Alex Behm has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8958


Change subject: IMPALA-5152: Introduce metadata loading phase
..

IMPALA-5152: Introduce metadata loading phase

Reworks the collection and loading of missing metadata
when compiling a statement. Introduces a new
metadata-loading phase between parsing and analysis.
Summary of the new compilation flow:
1. Parse statement.
2. Collect all table references from the parsed
   statement and generate a list of tables that need
   to be loaded for analysis to succeed.
3. Request missing metadata and wait for it to arrive.
   As views become loaded we expand the set of required
   tables based on the view definitions.
   This step populates a statement-local table cache
   that contains all loaded tables relevant to the
   statement.
4. Create a new Analyzer with the table cache and
   analyze the statement. During analysis only the
   table cache is consulted for table metadata, the
   ImpaladCatalog is not used for that purpose anymore.
5. Authorize the statement.
6. Plan generation as usual.

The intent of the existing code was to collect all tables
missing metadata during analysis, load the metadata, and then
re-analyze the statement (and repeat those steps until all
metadata is loaded).
Unfortunately, the relevant code was hard-to-follow, subtle
and not well tested, and therefore it was broken in several
ways over the course of time. For example, the introduction
of path analysis for nested types subtly broke the intended
behavior, and there are other similar examples.

The serial table loading observed in the JIRA was caused by the
following code in the resolution of table references:
for (all path interpretations) {
  try {
// Try to resolve the path; might call getTable() which
// throws for nonexistent tables.
  } catch (AnalysisException e) {
if (analyzer.hasMissingTbls()) throw e;
  }
}

The following example illustrates the problem:
SELECT * FROM a.b, x.y
When resolving the path "a.b" we consider that "a" could be a
database or a table. Similarly, "b" could be a table or a
nested collection.
If the path resolution for "a.b" adds a missing table entry,
then the path resolution for "x.y" could exit prematurely,
without trying the other path interpretations that would
lead to adding the expected missing table. So effectively,
the tables end up being loaded one-by-one.

Testing:
- A core/hdfs run succeeded
- No new tests were added because the existing functional tests
  provide good coverage of various metadata loading scenarios.
- The issue reported in IMPALA-5152 is basically impossible now.
  Adding FE unit tests for that bug specifically would require
  ugly changes to the new code to enable such testing.

Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LimitElement.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetStmt.java
M 

[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8930 )

Change subject: IMPALA-6307: CTAS statement fails with duplicate column 
exception.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@223
PS1, Line 223: Preconditions.checkState(colDefs.size() + 
partitionColDefs.size() == types.size());
 : for (int i = 0; i < colDefs.size(); ++i) 
colDefs.get(i).setType(types.get(i));
 : for (int i = 0; i < partitionColDefs.size(); ++i) {
 :   partitionColDefs.get(i).setType(types.get(i + 
colDefs.size()));
 : }
I believe it would be nice to add a comment in AnalysisContext.java L405 to 
indicate what these types represent for the case of a CTAS. Then it will be 
more clear why this block is needed.


http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java@160
PS1, Line 160: dataLayout_.getPartitionColumnDefs().clear();
I would add a reset() function in TableDataLayout() and put that there.


http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1510
PS1, Line 1510: AnalyzesOk("create table functional.ctas_tbl partitioned by 
(year) as " +
  : "with tmp as (select a.timestamp_col, a.year from 
functional.alltypes a " +
  : "left join functional.alltypes b " +
  : "on b.timestamp_col between a.timestamp_col and 
a.timestamp_col) " +
  : "select a.timestamp_col, a.year from tmp a");
Add a comment. It will not be clear to everyone what this thing is testing.



--
To view, visit http://gerrit.cloudera.org:8080/8930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Gerrit-Change-Number: 8930
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Sat, 06 Jan 2018 01:13:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@124
PS9, Line 124: if (!Table.isExternalTable(table_.getMetaStoreTable()) &&
 : !setsToExternal) {
nit: check if they fit in a single line (I could be wrong)


http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@315
PS5, Line 315: "Cannot create tab
> I kind of disagree here: The underlying function is 5 lines long, and readi
In principal yes, in this case no. The name of this function doesn't reveal 
that you're actually analyzing this tbl property (checking for its existence) 
and setting its value at the same time, so as a reader I have to check the 
definition of this function. So in my mind this is pure overhead and doesn't 
really help me understand the code. Feel free to keep it if you want though, 
it's not that important :)


http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@67
PS9, Line 67: // Kudu table name generated during analysis for managed Kudu 
tables
:   private String generatedKuduTableNameProperty_ = "";
Since all the tblproperties live in TableDef, I believe it makes more sense to 
put that generated field there as well.


http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py@327
PS9, Line 327: try:
 :   cursor.execute("ALTER TABLE %s.foo SET 
TBLPROPERTIES('kudu.table_name'='blah')"
 :   % (unique_database))
 : except Exception as e:
 :   expected_error = "AnalysisException: Not allowed to set 
'kudu.table_name' " + \
 :   "manually for managed Kudu tables"
 :   assert expected_error in str(e)
That should be an analysis test.



--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 06 Jan 2018 01:02:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3942: Fix wronly escaped string literal in front-end

2018-01-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8818 )

Change subject: IMPALA-3942: Fix wronly escaped string literal in front-end
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8818/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8818/2//COMMIT_MSG@7
PS2, Line 7: wronly
wrongly


http://gerrit.cloudera.org:8080/#/c/8818/2//COMMIT_MSG@10
PS2, Line 10: There are some holes in escaping the string literal
Can you expand on this a little more, eg mention that:

- toSql() always returns strings that are single quoted, resulting in improper 
escaping in the output if the original string was actually double quoted
- Its not always possible to determine if a string "should" be single or double 
quoted, eg concat('a', "b") and that's why we just choose one to always 
normalize to.


http://gerrit.cloudera.org:8080/#/c/8818/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8818/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@39
PS2, Line 39:   private final String cached_escaped_value_;
Add a comment explaining what this is.

Also, per my comments below, let's get rid of the concept of this being a 
'cached' value, and name it something like 'escaped_single_quoted_value_'.


http://gerrit.cloudera.org:8080/#/c/8818/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@104
PS2, Line 104: getEscapedValue
> Escaping and unescaping the value happen here. Do you have more intuitive n
You might name it something like "getEscapedSingleQuotedValue" to make it more 
clear what its doing.


http://gerrit.cloudera.org:8080/#/c/8818/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@105
PS2, Line 105: if (cached_escaped_value_ != null) return cached_escaped_value_;
Since getEscapedValue() is called in the constructor, this will always be true 
outside the constructor.

So, let's simplify the logic here by removing this and only calling 
getEscapedValue() in the constructor, then using the cached_escaped_value_ 
directly in the other places, eg. toSqlImpl().


http://gerrit.cloudera.org:8080/#/c/8818/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@121
PS2, Line 121: i++
nit: ++i



--
To view, visit http://gerrit.cloudera.org:8080/8818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec
Gerrit-Change-Number: 8818
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 06 Jan 2018 00:44:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6348: Redact only sensitive fields in runtime profiles

2018-01-05 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8934 )

Change subject: IMPALA-6348: Redact only sensitive fields in runtime profiles
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8934/2/tests/custom_cluster/test_redaction.py
File tests/custom_cluster/test_redaction.py:

http://gerrit.cloudera.org:8080/#/c/8934/2/tests/custom_cluster/test_redaction.py@336
PS2, Line 336: self.assert_query_profile_contains(self.find_last_query_id(), 
user_profile_pattern)
> You mean L315/L317?  We ran a different query in L335, find_last_query_id()
My bad. Ignore.



--
To view, visit http://gerrit.cloudera.org:8080/8934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf
Gerrit-Change-Number: 8934
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 23:55:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 9: Code-Review+2

Can you also an example of two of the new allowed syntax? Thanks


--
To view, visit http://gerrit.cloudera.org:8080/8676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Gerrit-Change-Number: 8676
Gerrit-PatchSet: 9
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 05 Jan 2018 23:53:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8928/3//COMMIT_MSG@10
PS3, Line 10: metadatad.
typo: metadata

Also, you don't need to outline the changes at that level. Instead, for this 
change you can add an example or the new syntax allowed.


http://gerrit.cloudera.org:8080/#/c/8928/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/8928/3/common/thrift/JniCatalog.thrift@269
PS3, Line 269: file
Did you mean 'row' here?


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup@1007
PS3, Line 1007: // row_format_val cannot be used as it conflicts with 
cache_op_val.
Hm, why would it conflict with cache_op_val? Can you post the error message if 
you use row_format_val here?


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@37
PS3, Line 37: this.
nit: you can remove 'this'. All the private fields are ending with '_' (see 
rowFormat_), so it's pretty clear what this variable represents.


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@49
PS3, Line 49: rowFormatParams.setPartition_set(getPartitionSet().toThrift());
:   }
nit: indentation off by 2


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@59
PS3, Line 59: tbl instanceof KuduTable
I think there are a few more cases to consider here. How about HBase tables? 
Also, what happens if the file format is not text (e.g. parquet)? Should we 
still allow this to be set?


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@463
PS3, Line 463: if (rowFormatParams.isSetPartition_set()) {
 :   resultColVal.setString_val(
 :   "Updated " + numUpdatedPartitions.getRef() + " 
partition(s).");
 :   } else {
 :   resultColVal.setString_val("Updated table.");
 :   }
nit: indentation is off


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2269
PS3, Line 2269: boolean
Add comment about the return value. Also, fix indentation of this function.


http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2281
PS3, Line 2281: if (rowFormat.getFieldDelimiter() != null) {
  : sd.getSerdeInfo().putToParameters(
  : "serialization.format", 
rowFormat.getFieldDelimiter());
  : sd.getSerdeInfo().putToParameters("field.delim",
  : rowFormat.getFieldDelimiter());
  : }
  : if (rowFormat.getEscapeChar() != null) {
  : sd.getSerdeInfo().putToParameters("escape.delim", 
rowFormat.getEscapeChar());
  : }
  : if (rowFormat.getLineDelimiter() != null) {
  : sd.getSerdeInfo().putToParameters("line.delim", 
rowFormat.getLineDelimiter());
  : }
That block is similar to the block between L2305-2318. I believe you should be 
able to create a utility function that takes a RowFormat and a 
StorageDescriptor and updates the latter.


http://gerrit.cloudera.org:8080/#/c/8928/3/tests/query_test/test_delimited_text.py
File tests/query_test/test_delimited_text.py:

http://gerrit.cloudera.org:8080/#/c/8928/3/tests/query_test/test_delimited_text.py@73
PS3, Line 73: def test_delimited_text_alter(self, vector, unique_database):
I believe most of these tests should be in test_ddl.py (alter-table.test).


http://gerrit.cloudera.org:8080/#/c/8928/3/tests/query_test/test_delimited_text.py@148
PS3, Line 148: def test_delimited_text_alter_kudu_fail(self, vector, 
unique_database):
 : """ Test alter row format statement.  IMPALA-4323.  Ensure 
alter statement fails
 : for Kudu. """
 : 

[Impala-ASF-CR] IMPALA-6348: Redact only sensitive fields in runtime profiles

2018-01-05 Thread Bharath Vissapragada (Code Review)
Hello Sailesh Mukil, anujphadke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8934

to look at the new patch set (#3).

Change subject: IMPALA-6348: Redact only sensitive fields in runtime profiles
..

IMPALA-6348: Redact only sensitive fields in runtime profiles

Without this patch, redaction is applied to every field in the
runtime profile. This approach has an undesired side effect when
Kerberos auth + email redaction is in place.

Since the redaction applies to every field, even principals
(from Connected/Delegated User fields) are redacted, as the Kerberos
principal format generally pattern matches with an email redactor
template.

This is particularly problematic for monitoring tools that consume
runtime profiles and use these fields to group the queries by user.

This patch fixes the problem by redacting only the following sensitive
fields.

- Query Statement
- Error logs (since they can contain column references etc.)
- Query Status
- Query Plan

Other fields in the runtime profile are left unredacted.

Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M tests/custom_cluster/test_redaction.py
6 files changed, 56 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8934/3
--
To view, visit http://gerrit.cloudera.org:8080/8934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae3b6726009bf458a7ec73131e5d659b12ab73cf
Gerrit-Change-Number: 8934
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] Draft: Add JSON output to MemTracker::LogUsage()

2018-01-05 Thread Lars Volker (Code Review)
Lars Volker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8955


Change subject: Draft: Add JSON output to MemTracker::LogUsage()
..

Draft: Add JSON output to MemTracker::LogUsage()

TODO: Function comments in mem-tracker.h may need updating.

Change-Id: I0a59c39b0a85c78f1e8db10a93cbb814dc6adfa0
---
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/util/default-path-handlers.cc
M tests/verifiers/mem_usage_verifier.py
4 files changed, 83 insertions(+), 45 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8955/1
--
To view, visit http://gerrit.cloudera.org:8080/8955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a59c39b0a85c78f1e8db10a93cbb814dc6adfa0
Gerrit-Change-Number: 8955
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 22:57:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..

IMPALA-6362: avoid Reservation/MemTracker deadlock

Avoid the circular dependency between ReservationTracker::lock_ and
MemTracker::child_trackers_lock_ by not acquiring
ReservationTracker::lock_ in GetReservation(), where an atomic
operation is sufficient.

Testing:
Added a unit test that reproed the deadlock.

Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Reviewed-on: http://gerrit.cloudera.org:8080/8933
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
5 files changed, 75 insertions(+), 15 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-01-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8950 )

Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
..


Patch Set 1:

Can you please add a BE test for it ?


--
To view, visit http://gerrit.cloudera.org:8080/8950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
Gerrit-Change-Number: 8950
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 05 Jan 2018 22:46:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3193: Show table's comment on show tables

2018-01-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8851 )

Change subject: IMPALA-3193: Show table's comment on show tables
..


Patch Set 4:

(1 comment)

Thanks for looking into it. I just wanted to iterate that it's a good practice 
to respond to high level comments so that reviewers have a better understanding 
of what kind of changes are in the patch that they are reviewing. In most 
cases, a simple "DONE" is sufficient, but you can use your judgement to see 
what kind of response is needed depending on the ask.

http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java@304
PS4, Line 304: for (Table table: tables) {
 : org.apache.hadoop.hive.metastore.api.Table msTable
 :   = table.getMetaStoreTable(msClient);
 : if (msTable != null) {
 :   String comment = 
msTable.getParameters().get("comment");
 :   comments.add(comment != null ? comment : "");
 : }
 :   }
This is not acceptable from a performance point of view. You're making a call 
to HMS for every table that satisfies the params.pattern just to get the 
comment. You should return the comment (if any) only for already loaded tables 
(i.e. no table loading and no external calls to either the catalog or the HMS).



--
To view, visit http://gerrit.cloudera.org:8080/8851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44f814af05db6f3c027718ade9f474f8b8153bcd
Gerrit-Change-Number: 8851
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Fri, 05 Jan 2018 22:44:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-01-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8950 )

Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
Gerrit-Change-Number: 8950
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 05 Jan 2018 22:07:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:55:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1682/


--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:56:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 21:

This solution and the behaviour seem reasonable to me. It's a bit unfortunate 
that we have to special-case this but it seems unavoidable.


--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:56:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1681/


--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 21
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:52:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 21
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:52:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3942: Fix wronly escaped string literal in front-end

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8818 )

Change subject: IMPALA-3942: Fix wronly escaped string literal in front-end
..


Patch Set 2:

(1 comment)

We should get this moving along again now that people are getting back from 
holidays.

http://gerrit.cloudera.org:8080/#/c/8818/2/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/8818/2/tests/query_test/test_queries.py@315
PS2, Line 315: self.prepare()
We should use the unique_database test fixture instead of this approach. E.g. 
test_scanners.py has a bunch of tests that create tables in temporary databases.



--
To view, visit http://gerrit.cloudera.org:8080/8818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec
Gerrit-Change-Number: 8818
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:50:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8508 )

Change subject: IMPALA-5237: Support a quoted string in date/time format
..


Patch Set 6:

Sorry for the slow response. If I understand correctly, you can't add the test 
case because of IMPALA-3942. It looks like you have a fix for IMPALA-3942 so I 
think we should finish the review on that and merge it before this patch.


--
To view, visit http://gerrit.cloudera.org:8080/8508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178
Gerrit-Change-Number: 8508
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:38:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:21:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..

IMPALA-6364: Bypass file handle cache for ineligible files

Currently, all HdfsFileHandles are owned and constructed
by the file handle cache. When the file handle cache
is disabled or the file handle is not eligible for
caching, the HdfsFileHandle is stored exclusively in
ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still
comes from the file handle cache. It is created via a call to
DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle'
set to true and destroyed via
DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle'
set to true.

Recent testing has revealed that the lock on the file handle
cache is a bottleneck for workloads with many small remote
files. There is no benefit to storing these exclusive file
handles in the file handle cache, as they do not participate
in the caching.

This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle()
and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are
equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except
they bypass the file handle cache and create/destroy the
file handle directly. ScanRange::Open()/Close(), which
populates and frees ScanRange::exclusive_hdfs_fh_, now uses
these new calls rather than accessing the file handle cache.
This avoids the locking entirely, solving the bottleneck.

To draw a distinction between the two codepaths, HdfsFileHandle
is now an abstract class with two subclasses:
 - CachedHdfsFileHandles cover all handles that live in file handle
   cache. Get/ReleaseCachedHdfsFileHandle() use this subclass.
 - ExclusiveHdfsFileHandles cover all cases where a file handle
   does not come from the cache. The new
   Get/ReleaseExclusiveHdfsFileHandle() use this subclass.

Separately, testing revealed that increasing the number of
partitions for the file handle cache also fixes the contention
problem. This changes the file handle cache to make the number
of partitions configurable via startup parameter
num_file_handle_cache_partitions. This allows mitigation of
future bottlenecks without a patch.

Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Reviewed-on: http://gerrit.cloudera.org:8080/8945
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
6 files changed, 138 insertions(+), 87 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


Re: [Impala-ASF-CR] IMPALA-6296: Avoid crash caused by DCHECK in Codegen in debug mode

2018-01-05 Thread Manaswini Maharana
Thank you, everyone, for all the support and help. It was an amazing
learning experience. Looking forward to making much more contributions in
future.

Regards!
Mansi

On Fri, Jan 5, 2018 at 3:09 PM, Tim Armstrong (Code Review) <
ger...@cloudera.org> wrote:

> Tim Armstrong *posted comments* on this change.
>
> View Change 
>
> Patch set 6:
>
> Thank you for the contribution
>
>
> To view, visit change 8923 . To
> unsubscribe, visit settings .
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I93e2b1efb325100d01d398e68e789d87b877167e
> Gerrit-Change-Number: 8923
> Gerrit-PatchSet: 6
> Gerrit-Owner: Manaswini Maharana 
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: Manaswini Maharana 
> Gerrit-Reviewer: Thomas Tauber-Marshall 
> Gerrit-Reviewer: Tim Armstrong 
> Gerrit-Comment-Date: Fri, 05 Jan 2018 20:09:33 +
> Gerrit-HasComments: No
>


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-05 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

Can we test all the cases fixed this with 100% reproducible cases?
For Ex:
I commented out the change in  decompress.cc and ran the fuzz test and it 
passed.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 20:45:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3651: Adds murmur hash() built-in function

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8893 )

Change subject: IMPALA-3651: Adds murmur_hash() built-in function
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1677/


--
To view, visit http://gerrit.cloudera.org:8080/8893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29
Gerrit-Change-Number: 8893
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 20:44:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8947 )

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..

IMPALA-6370: fix partitioned parquet tables with nested types

When materialising a nested collection, has_template_tuple() should use
the template tuple for the collection, not the top-level tuple.

Testing:
Added tests based on nested-types-basic.test that operate on a simple
partitioned table. The tests reliably crashed Impala before the fix.

Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Reviewed-on: http://gerrit.cloudera.org:8080/8947
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-scanner.h
A 
testdata/workloads/functional-query/queries/QueryTest/nested-types-basic-partitioned.test
M tests/query_test/test_nested_types.py
3 files changed, 385 insertions(+), 3 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8947 )

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 20:44:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6296: Avoid crash caused by DCHECK in Codegen in debug mode

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8923 )

Change subject: IMPALA-6296: Avoid crash caused by DCHECK in Codegen in debug 
mode
..


Patch Set 6:

Thank you for the contribution


--
To view, visit http://gerrit.cloudera.org:8080/8923
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93e2b1efb325100d01d398e68e789d87b877167e
Gerrit-Change-Number: 8923
Gerrit-PatchSet: 6
Gerrit-Owner: Manaswini Maharana 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Manaswini Maharana 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 20:09:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1680/


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:16:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 5: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:16:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:14:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 2: Code-Review+1

(4 comments)

Thanks, this is looking good. Couple of minor comments then I'll let tianyi +2

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310:   if line_split[0] == "id":
Can you make this an elif chain and assert that the column name is one of the 
expected column name. Just to make it less likely that an error in the tests 
results in it silently failing. I.e. something like

  if line_split[0] == 'id':
...
  elif line_split[0] == 'int':
  ...
  else:
assert line_split[0] == 'bigint_col'
...


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339:   if line_split[0] == "tinyint_col":
Same here


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349: %src_tbl)
nit: missing space after %


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: %
same here



--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:09:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-01-05 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8950


Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
..

IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and
then call recvr->TakeOverEarlySender() for all contexts.
recvr->TakeOverEarlySender() then calls
recvr_->mgr_->EnqueueDeserializeTask((), which can block if the
deserialize pool queue is full. The next thread to become available
in that queue will also have to acquire lock_, thus leading to a
deadlock.

We fix this by moving the EarlySendersList out of the
EarlySendersMap and dropping the lock before taking any actions on
the RPC contexts in the EarlySendersList. All functions called after
dropping 'lock_' do not require the lock to protect them as they are
thread safe.

Testing: Manually verified that queries work. It's not easy to write
a deterministic test case to catch this.

Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
---
M be/src/runtime/krpc-data-stream-mgr.cc
1 file changed, 18 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8950/1
--
To view, visit http://gerrit.cloudera.org:8080/8950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
Gerrit-Change-Number: 8950
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 4:

I'm confident it fixes the problem we saw - the deadlock was clear from the 
backtraces we got.


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 18:43:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6362: avoid Reservation/MemTracker deadlock

2018-01-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8933 )

Change subject: IMPALA-6362: avoid Reservation/MemTracker deadlock
..


Patch Set 4:

Tim, thanks for this. Do you feel confident this fixes IMPALA-6362, or should 
we prepare a stress test run of a private build on a downstream cluster before 
merging the patch?


--
To view, visit http://gerrit.cloudera.org:8080/8933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Gerrit-Change-Number: 8933
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 18:20:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6075: Add Impala daemon metric for catalog version.

2018-01-05 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8949


Change subject: IMPALA-6075: Add Impala daemon metric for catalog version.
..

IMPALA-6075: Add Impala daemon metric for catalog version.

This patch adds a new metric that shows the current version of catalog
which is currently used by impala daemon.

Testing:
Verified manually that the new metric for catalog version is displayed and it
corresponds to the latest version in catalogd.INFO log file.

Change-Id: Id2307eb434561ed74ff058106541c0ebda017d97
---
M be/src/service/impala-server.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
4 files changed, 21 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8949/1
--
To view, visit http://gerrit.cloudera.org:8080/8949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id2307eb434561ed74ff058106541c0ebda017d97
Gerrit-Change-Number: 8949
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1679/


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:33:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:32:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8947 )

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1678/


--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:07:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8947 )

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:02:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8861 )

Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling 
to disk CFB mode is a stream cipher and is secure when used with a different 
nonce/IV for every message. However it can be a performance bottleneck. CTR 
mode is also stream cipher and is secure
..


Patch Set 4:

We have some internal infra at cloudera that can build Impala on multiple OSes. 
I'll run this patch through that before merging to make sure that it works as 
expected on older versions.


--
To view, visit http://gerrit.cloudera.org:8080/8861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff
Gerrit-Change-Number: 8861
Gerrit-PatchSet: 4
Gerrit-Owner: Xianda Ke 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Xianda Ke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 16:55:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8861 )

Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling 
to disk CFB mode is a stream cipher and is secure when used with a different 
nonce/IV for every message. However it can be a performance bottleneck. CTR 
mode is also stream cipher and is secure
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff
Gerrit-Change-Number: 8861
Gerrit-PatchSet: 4
Gerrit-Owner: Xianda Ke 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Xianda Ke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 16:54:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3651: Adds murmur hash() built-in function

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8893 )

Change subject: IMPALA-3651: Adds murmur_hash() built-in function
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29
Gerrit-Change-Number: 8893
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 16:53:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8947 )

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8947/2/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/8947/2/tests/query_test/test_nested_types.py@126
PS2, Line 126:   self.run_stmt_in_hive("""
> Hive+refresh is pretty slow, could we instead use a custom LOCATION pointin
Cool, that shaved a bit of time off of it (13s total vs 7s)



--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 16:50:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6370: fix partitioned parquet tables with nested types

2018-01-05 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8947

to look at the new patch set (#3).

Change subject: IMPALA-6370: fix partitioned parquet tables with nested types
..

IMPALA-6370: fix partitioned parquet tables with nested types

When materialising a nested collection, has_template_tuple() should use
the template tuple for the collection, not the top-level tuple.

Testing:
Added tests based on nested-types-basic.test that operate on a simple
partitioned table. The tests reliably crashed Impala before the fix.

Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
---
M be/src/exec/hdfs-scanner.h
A 
testdata/workloads/functional-query/queries/QueryTest/nested-types-basic-partitioned.test
M tests/query_test/test_nested_types.py
3 files changed, 385 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8947/3
--
To view, visit http://gerrit.cloudera.org:8080/8947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic808b824ce3b31af0539036d8ca23d17b18deab4
Gerrit-Change-Number: 8947
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 21:

There are several ways to fix that test case.
I chose the one that involves the least code modifications.

Other way of fixing it would be to add an extra param to 
impala::DebugQueryOptions(). This param would contain the default query 
options. Callers could pass ImpalaServer::default_query_options. After that the 
result of DebugQueryOptions() would change, because 
ImpalaServer::default_query_options is not equal to the default constructed 
TQueryOptions. Therefore, we would also need to modify some test cases that 
check the result of DebugQueryOptions(), e.g. 
TestAdmissionController::test_set_request_pool.

Other fix could be to not set 
ImpalaServer::default_query_options.idle_session_timeout. With this solution 
idle_session_timeout would be more aligned with TQueryOptions::query_timeout_s 
/ FLAGS_idle_query_timeout, but the result of the SET command would not show 
the effective value of the session timeout when it's not set explicitly.

Please tell me if you prefer the other solutions, or if I'm missing some point.


--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jan 2018 15:27:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2018-01-05 Thread Zoltan Borok-Nagy (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Laszlo Gaal, Gabor Kaszab, Attila 
Jeges, Tim Armstrong, Csaba Ringhofer, Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8490

to look at the new patch set (#20).

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..

IMPALA-2248: Make idle_session_timeout a query option

This commit makes idle_session_timeout a query option.

idle_session_timeout currently can be set as a command line
option, which will be the default timeout for sessions.
HS2 sessions can override it with a smaller value by setting
it in the configuration overlay of HS2 OpenSession().

However, we can't override idle_session_timeout for JDBC/ODBC
connections, because we cannot put this in the connection string.

This commit is a workaround for this problem, it allows JDBC/ODBC
connections to set the session timeout as a query option
with the SET statement.

After this commit, the session timeout can be overridden to
any value, i.e. the command line flag idle_session_timeout
doesn't limit this option anymore.

I created an automated test case in JdbcTest.java based on
test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also
extended the test_session_expiration and test_set_and_unset
test suites.

Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
---
M be/src/service/client-request-state.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
A fe/src/test/java/org/apache/impala/util/Metrics.java
M tests/custom_cluster/test_session_expiration.py
M tests/custom_cluster/test_set_and_unset.py
M tests/hs2/hs2_test_suite.py
M tests/hs2/test_hs2.py
14 files changed, 391 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/8490/20
--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 20
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-05 Thread Gabor Kaszab (Code Review)
Hello Laszlo Gaal, Zoltan Borok-Nagy, Attila Jeges, Dimitris Tsirogiannis, Tim 
Armstrong, Csaba Ringhofer, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8820

to look at the new patch set (#9).

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..

IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

This change disallows explicitly setting the Kudu table name property
for managed Kudu tables in a CREATE TABLE statement. The Kudu table
name property gets a generated value as the following:
'impala::db_name.table_name' where table_name is the one given in
the CREATE TABLE statement.
Providing the Kudu table name property when creating a managed Kudu
table results in an error without creating the table. E.g.:
CREATE TABLE t (i INT) STORED AS KUDU
  TBLPROPERTIES('kudu.table_name'='some_name');

Alongside the CREATE TABLE statement also the ALTER TABLE statement
is changed not to allow the modification of Kudu.table_name of
managed Kudu tables.

Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
M tests/query_test/test_kudu.py
6 files changed, 159 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8820/9
--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-05 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 8: Code-Review-1

There is a test failing in the core test suite. Please hold on with the 
reviewing until I figure out what happens.


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 05 Jan 2018 09:14:07 +
Gerrit-HasComments: No