[jira] [Commented] (IMPALA-7169) TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file

2018-06-13 Thread Tianyi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511888#comment-16511888
 ] 

Tianyi Wang commented on IMPALA-7169:
-

1. This test claims that Hadoop 2.8+ exhibits certain behavior, but in my 
experiments Hadoop 2.6 behaved the same way. That is not why the build failed, 
though.
2. The real problem is that Impala removed a partition, but the file never 
appeared in the trash. I don't know what consistency guarantees the namenode 
makes, but I suspect the move of an encrypted file to the trash is 
asynchronous. I will look into the HDFS code.
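
If the trash move for files in an encryption zone really is asynchronous, a 
single exists() assertion is inherently racy, and polling the trash path with a 
timeout would make the test robust. A minimal sketch, assuming that hypothesis; 
poll_until and FakeHdfsClient are illustrative stand-ins, not Impala test code:

```python
import time

def poll_until(predicate, timeout_s=30.0, interval_s=0.5):
    """Poll predicate() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Stand-in for the real hdfs_client: the file only becomes visible in
# trash on the third check, mimicking an asynchronous move.
class FakeHdfsClient:
    def __init__(self, visible_after):
        self.calls = 0
        self.visible_after = visible_after

    def exists(self, path):
        self.calls += 1
        return self.calls >= self.visible_after

client = FakeHdfsClient(visible_after=3)
trash_path = ('/user/jenkins/.Trash/Current/test-warehouse/'
              'test_encryption_db.db/t1/j=1/j1.txt')
found = poll_until(lambda: client.exists(trash_path),
                   timeout_s=5, interval_s=0.01)
```

The test would then fail only if the file never shows up within the timeout, 
rather than whenever the namenode is momentarily behind.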

> TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file
> --
>
> Key: IMPALA-7169
> URL: https://issues.apache.org/jira/browse/IMPALA-7169
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Lars Volker
>Assignee: Tianyi Wang
>Priority: Critical
>  Labels: broken-build, flaky
>
> {noformat}
> F 
> metadata/test_hdfs_encryption.py::TestHdfsEncryption::()::test_drop_partition_encrypt
>  metadata/test_hdfs_encryption.py:202: in test_drop_partition_encrypt
>  assert self.hdfs_client.exists(
>  E   assert   0xba6ee90>>('/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt')
>  E+  where  > = 
> .exists
>  E+where  0xba6ee90> =  0xba6ed50>.hdfs_client
>  E+  and   
> '/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt'
>  = ('jenkins', 
> 'test_encryption_db')
>  E+where  = 
> '/user/{0}/.Trash/Current/test-warehouse/{1}.db/t1/j=1/j1.txt'.format
>  E+and   'jenkins' = ()
>  E+  where  = getpass.getuser
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7170) "tests/comparison/data_generator.py populate" is broken

2018-06-13 Thread Michael Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Brown updated IMPALA-7170:
--
Summary: "tests/comparison/data_generator.py populate" is broken  (was: 
test/comparison is broken)

> "tests/comparison/data_generator.py populate" is broken
> ---
>
> Key: IMPALA-7170
> URL: https://issues.apache.org/jira/browse/IMPALA-7170
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.0
>Reporter: Tianyi Wang
>Priority: Major
>
> test/comparison in Impala 3.x is broken, presumably by the switch to Hadoop 3.
> First, to run the tests in Impala 3.x, the mini-cluster needs to be started 
> with YARN, which is not documented anywhere. 
> Then, data_generator.py will exit with the following error:
> {noformat}
> 2018-04-23 23:15:46,065 INFO:db_connection[752]:Dropping database randomness
> 2018-04-23 23:15:46,095 INFO:db_connection[234]:Creating database randomness
> 2018-04-23 23:15:52,390 INFO:data_generator[235]:Starting MR job to generate 
> data for randomness
> Traceback (most recent call last):
>   File "tests/comparison/data_generator.py", line 339, in 
> populator.populate_db(args.table_count, postgresql_conn=postgresql_conn)
>   File "tests/comparison/data_generator.py", line 134, in populate_db
> self._run_data_generator_mr_job([g for _, g in table_and_generators], 
> self.db_name)
>   File "tests/comparison/data_generator.py", line 244, in 
> _run_data_generator_mr_job
> % (reducer_count, ','.join(files), mapper_input_file, hdfs_output_dir))
>   File "/home/impdev/projects/impala/tests/comparison/cluster.py", line 476, 
> in run_mr_job
> stderr=subprocess.STDOUT, env=env)
>   File "/home/impdev/projects/impala/tests/util/shell_util.py", line 113, in 
> shell
> "\ncmd: %s\nstdout: %s\nstderr: %s") % (retcode, cmd, output, err))
> Exception: Command returned non-zero exit code: 1
> cmd: set -euo pipefail
> hadoop jar 
> /home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar
>  -D mapred.reduce.tasks=36 \
> -D stream.num.map.output.key.fields=2 \
> -files 
> tests/comparison/common.py,tests/comparison/db_types.py,tests/comparison/data_generator_mapred_common.py,tests/comparison/data_generator_mapper.py,tests/comparison/data_generator_reducer.py,tests/comparison/random_val_generator.py
>  \
> -input /tmp/data_gen_randomness_mr_input_1524525348 \
> -output /tmp/data_gen_randomness_mr_output_1524525348 \
> -mapper data_generator_mapper.py \
> -reducer data_generator_reducer.py
> stdout: packageJobJar: [] 
> [/home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar]
>  /tmp/streamjob2990195923122538287.jar tmpDir=null
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 18/04/23 23:15:54 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: 
> /tmp/hadoop-yarn/staging/impdev/.staging/job_1524519161700_0002
> 18/04/23 23:15:54 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 18/04/23 23:15:54 INFO lzo.LzoCodec: Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev 2b3bd7731ff3ef5d8585a004b90696630e5cea96]
> 18/04/23 23:15:54 INFO mapred.FileInputFormat: Total input files to process : 
> 1
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: number of splits:2
> 18/04/23 23:15:54 INFO Configuration.deprecation: mapred.reduce.tasks is 
> deprecated. Instead, use mapreduce.job.reduces
> 18/04/23 23:15:54 INFO Configuration.deprecation: 
> yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, 
> use yarn.system-metrics-publisher.enabled
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 18/04/23 23:15:54 INFO conf.Configuration: resource-types.xml not found
> 18/04/23 23:15:54 INFO resource.ResourceUtils: Unable to find 
> 'resource-types.xml'.
> 18/04/23 23:15:54 INFO impl.YarnClientImpl: Submitted application 
> application_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.Job: The url to track the job: 
> http://c37e0835e988:8088/proxy/application_1524519161700_0002/
> 18/04/23 23:15:54 INFO mapreduce.Job: Running job: job_1524519161700_0002
> 18/04/23 23:16:00 INFO mapreduce.Job: Job job_1524519161700_0002 running in 
> uber mode : false
> 18/04/23 23:16:00 INFO mapreduce.Job:  map 0% reduce 0%
> 18/04/23 23:16:06 INFO mapreduce.Job: Job 

[jira] [Updated] (IMPALA-7170) test/comparison is broken

2018-06-13 Thread Tianyi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyi Wang updated IMPALA-7170:

Component/s: Infrastructure

> test/comparison is broken
> -
>
> Key: IMPALA-7170
> URL: https://issues.apache.org/jira/browse/IMPALA-7170
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.0
>Reporter: Tianyi Wang
>Priority: Major
>
> test/comparison in Impala 3.x is broken, presumably by the switch to Hadoop 3.
> First, to run the tests in Impala 3.x, the mini-cluster needs to be started 
> with YARN, which is not documented anywhere. 
> Then, data_generator.py will exit with the following error:
> {noformat}
> 2018-04-23 23:15:46,065 INFO:db_connection[752]:Dropping database randomness
> 2018-04-23 23:15:46,095 INFO:db_connection[234]:Creating database randomness
> 2018-04-23 23:15:52,390 INFO:data_generator[235]:Starting MR job to generate 
> data for randomness
> Traceback (most recent call last):
>   File "tests/comparison/data_generator.py", line 339, in 
> populator.populate_db(args.table_count, postgresql_conn=postgresql_conn)
>   File "tests/comparison/data_generator.py", line 134, in populate_db
> self._run_data_generator_mr_job([g for _, g in table_and_generators], 
> self.db_name)
>   File "tests/comparison/data_generator.py", line 244, in 
> _run_data_generator_mr_job
> % (reducer_count, ','.join(files), mapper_input_file, hdfs_output_dir))
>   File "/home/impdev/projects/impala/tests/comparison/cluster.py", line 476, 
> in run_mr_job
> stderr=subprocess.STDOUT, env=env)
>   File "/home/impdev/projects/impala/tests/util/shell_util.py", line 113, in 
> shell
> "\ncmd: %s\nstdout: %s\nstderr: %s") % (retcode, cmd, output, err))
> Exception: Command returned non-zero exit code: 1
> cmd: set -euo pipefail
> hadoop jar 
> /home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar
>  -D mapred.reduce.tasks=36 \
> -D stream.num.map.output.key.fields=2 \
> -files 
> tests/comparison/common.py,tests/comparison/db_types.py,tests/comparison/data_generator_mapred_common.py,tests/comparison/data_generator_mapper.py,tests/comparison/data_generator_reducer.py,tests/comparison/random_val_generator.py
>  \
> -input /tmp/data_gen_randomness_mr_input_1524525348 \
> -output /tmp/data_gen_randomness_mr_output_1524525348 \
> -mapper data_generator_mapper.py \
> -reducer data_generator_reducer.py
> stdout: packageJobJar: [] 
> [/home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar]
>  /tmp/streamjob2990195923122538287.jar tmpDir=null
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 18/04/23 23:15:54 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: 
> /tmp/hadoop-yarn/staging/impdev/.staging/job_1524519161700_0002
> 18/04/23 23:15:54 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 18/04/23 23:15:54 INFO lzo.LzoCodec: Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev 2b3bd7731ff3ef5d8585a004b90696630e5cea96]
> 18/04/23 23:15:54 INFO mapred.FileInputFormat: Total input files to process : 
> 1
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: number of splits:2
> 18/04/23 23:15:54 INFO Configuration.deprecation: mapred.reduce.tasks is 
> deprecated. Instead, use mapreduce.job.reduces
> 18/04/23 23:15:54 INFO Configuration.deprecation: 
> yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, 
> use yarn.system-metrics-publisher.enabled
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 18/04/23 23:15:54 INFO conf.Configuration: resource-types.xml not found
> 18/04/23 23:15:54 INFO resource.ResourceUtils: Unable to find 
> 'resource-types.xml'.
> 18/04/23 23:15:54 INFO impl.YarnClientImpl: Submitted application 
> application_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.Job: The url to track the job: 
> http://c37e0835e988:8088/proxy/application_1524519161700_0002/
> 18/04/23 23:15:54 INFO mapreduce.Job: Running job: job_1524519161700_0002
> 18/04/23 23:16:00 INFO mapreduce.Job: Job job_1524519161700_0002 running in 
> uber mode : false
> 18/04/23 23:16:00 INFO mapreduce.Job:  map 0% reduce 0%
> 18/04/23 23:16:06 INFO mapreduce.Job: Job job_1524519161700_0002 failed with 
> state FAILED due to: Application application_1524519161700_0002 failed 2 
> times due to AM Container 

[jira] [Resolved] (IMPALA-5216) Admission control queuing should be asynchronous

2018-06-13 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5216.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0
   Impala 2.13.0

Resolving on Bikram's behalf; he's AFK for a bit.

> Admission control queuing should be asynchronous
> 
>
> Key: IMPALA-5216
> URL: https://issues.apache.org/jira/browse/IMPALA-5216
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Dan Hecht
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: admission-control, resource-management
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, admission control queuing occurs synchronously w.r.t. the 
> {{ExecuteStatement}} client RPC. That is, a query handle is not returned 
> until the query is admitted.
> Instead, the queuing should occur on the asynchronous path.  That way, the 
> client gets a query handle back immediately and can e.g. cancel a query that 
> is in the admission control queue.
> We'll also need a way to better expose the progress of a query handle 
> (related to IMPALA-124). E.g. that the query is queued for admission and what 
> resource(s) it's waiting on.
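
The queued-then-admitted lifecycle described above can be modeled in miniature: 
the handle is created and returned before admission happens, so cancel() can 
reach a query that is still queued. This is only an illustrative Python sketch 
(AdmissionController and admit_next() are hypothetical names), not Impala's 
actual C++ admission controller:

```python
import collections
import itertools

class AdmissionController:
    """Sketch of asynchronous admission: execute_statement() returns a
    query handle immediately; admission happens later via admit_next(),
    which stands in for a background admission thread."""
    _ids = itertools.count(1)

    def __init__(self):
        self._queue = collections.deque()
        self._state = {}

    def execute_statement(self, stmt):
        handle = next(self._ids)
        self._state[handle] = "QUEUED"
        self._queue.append(handle)
        return handle  # client gets the handle before admission

    def cancel(self, handle):
        # Works on a still-queued query -- the point of the async design.
        if self._state.get(handle) == "QUEUED":
            self._state[handle] = "CANCELLED"
            return True
        return False

    def admit_next(self):
        # Skip cancelled entries; admit the first still-queued query.
        while self._queue:
            h = self._queue.popleft()
            if self._state[h] == "QUEUED":
                self._state[h] = "ADMITTED"
                return h
        return None

    def status(self, handle):
        return self._state.get(handle)

ac = AdmissionController()
handle = ac.execute_statement("select 1")  # returns while still queued
```

In the synchronous design, cancel() could not run until execute_statement() 
returned, i.e. until after admission; here the status() call also shows the 
kind of progress exposure the description asks for.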






[jira] [Updated] (IMPALA-5937) Docs are missing some query options

2018-06-13 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-5937:

Description: 
I noticed that the following options show up in "SET" in impala-shell but don't 
have corresponding documentation entries. I know BUFFER_POOL_LIMIT is mentioned 
in IMPALA-5655.

--BUFFER_POOL_LIMIT--
 -DECIMAL_V2-
 -DEFAULT_SPILLABLE_BUFFER_SIZE-
 DISABLE_CODEGEN_ROWS_THRESHOLD
 ENABLE_EXPR_REWRITES
 -MAX_ROW_SIZE-
 -MIN_SPILLABLE_BUFFER_SIZE-
 -PARQUET_ARRAY_RESOLUTION-
 PARQUET_DICTIONARY_FILTERING
 PARQUET_READ_STATISTICS
 -STRICT_MODE  /Dev option that Greg and Tim recommended not to doc-

  was:
I noticed that the following options show up in "SET" in impala-shell but don't 
have corresponding documentation entries. I know BUFFER_POOL_LIMIT is mentioned 
in IMPALA-5655.

--BUFFER_POOL_LIMIT--
 -DECIMAL_V2-
 -DEFAULT_SPILLABLE_BUFFER_SIZE-
 DISABLE_CODEGEN_ROWS_THRESHOLD
 ENABLE_EXPR_REWRITES
 -MAX_ROW_SIZE-
 -MIN_SPILLABLE_BUFFER_SIZE-
 -PARQUET_ARRAY_RESOLUTION-
 PARQUET_DICTIONARY_FILTERING
 PARQUET_READ_STATISTICS
 STRICT_MODE


> Docs are missing some query options
> ---
>
> Key: IMPALA-5937
> URL: https://issues.apache.org/jira/browse/IMPALA-5937
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Philip Zeyliger
>Assignee: Alex Rodoni
>Priority: Major
>
> I noticed that the following options show up in "SET" in impala-shell but 
> don't have corresponding documentation entries. I know BUFFER_POOL_LIMIT is 
> mentioned in IMPALA-5655.
> --BUFFER_POOL_LIMIT--
>  -DECIMAL_V2-
>  -DEFAULT_SPILLABLE_BUFFER_SIZE-
>  DISABLE_CODEGEN_ROWS_THRESHOLD
>  ENABLE_EXPR_REWRITES
>  -MAX_ROW_SIZE-
>  -MIN_SPILLABLE_BUFFER_SIZE-
>  -PARQUET_ARRAY_RESOLUTION-
>  PARQUET_DICTIONARY_FILTERING
>  PARQUET_READ_STATISTICS
>  -STRICT_MODE  /Dev option that Greg and Tim recommended not to doc-






[jira] [Commented] (IMPALA-6532) NullPointerException in HiveContextAwareRecordReader.initIOContext() when executing Hive query

2018-06-13 Thread Joe McDonnell (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511619#comment-16511619
 ] 

Joe McDonnell commented on IMPALA-6532:
---

Created a change on 2.x that makes the tests that call run_stmt_in_hive() run 
serially. I am running tests against this change now.
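
Serializing only the Hive-dependent tests while the rest stay parallel can be 
modeled with a tiny scheduler. This is an illustrative sketch, assuming a 
made-up run_suite() helper and serial_names set, not the actual Impala/pytest 
test runner:

```python
from concurrent.futures import ThreadPoolExecutor

def run_suite(tests, serial_names, workers=4):
    """Run tests concurrently, except those named in serial_names
    (e.g. tests that shell out to Hive), which run one at a time
    after the parallel batch finishes."""
    parallel = [t for t in tests if t.__name__ not in serial_names]
    serial = [t for t in tests if t.__name__ in serial_names]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {t.__name__: pool.submit(t) for t in parallel}
        for name, fut in futures.items():
            results[name] = fut.result()
    for t in serial:
        results[t.__name__] = t()
    return results

def test_fast():
    return "ok"

def test_hive_insert():  # pretend this calls run_stmt_in_hive()
    return "hive ok"

results = run_suite([test_fast, test_hive_insert], {"test_hive_insert"})
```

The trade-off is wall-clock time: serial tests no longer overlap, but they also 
no longer race on the shared local-Hadoop Hive execution path.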

> NullPointerException in HiveContextAwareRecordReader.initIOContext() when 
> executing Hive query
> --
>
> Key: IMPALA-6532
> URL: https://issues.apache.org/jira/browse/IMPALA-6532
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Bikramjeet Vig
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1273/console]
> {noformat}
> 02:48:03 [gw15] FAILED 
> metadata/test_partition_metadata.py::TestPartitionMetadata::test_partition_metadata_compatibility[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] 
> {noformat}
> {noformat}
> 03:29:11 === FAILURES 
> ===
> 03:29:11  
> TestPartitionMetadata.test_partition_metadata_compatibility[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] 
> 03:29:11 [gw15] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> 03:29:11 metadata/test_partition_metadata.py:127: in 
> test_partition_metadata_compatibility
> 03:29:11 "insert into %s partition (x) values(1,1)" % FQ_TBL_HIVE)
> 03:29:11 common/impala_test_suite.py:684: in run_stmt_in_hive
> 03:29:11 raise RuntimeError(stderr)
> 03:29:11 E   RuntimeError: SLF4J: Class path contains multiple SLF4J bindings.
> 03:29:11 E   SLF4J: Found binding in 
> [jar:file:/home/ubuntu/Impala/toolchain/cdh_components/hbase-1.2.0-cdh5.15.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 03:29:11 E   SLF4J: Found binding in 
> [jar:file:/home/ubuntu/Impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.15.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 03:29:11 E   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for 
> an explanation.
> 03:29:11 E   SLF4J: Actual binding is of type 
> [org.slf4j.impl.Log4jLoggerFactory]
> 03:29:11 E   scan complete in 3ms
> 03:29:11 E   Connecting to jdbc:hive2://localhost:11050
> 03:29:11 E   Connected to: Apache Hive (version 1.1.0-cdh5.15.0-SNAPSHOT)
> 03:29:11 E   Driver: Hive JDBC (version 1.1.0-cdh5.15.0-SNAPSHOT)
> 03:29:11 E   Transaction isolation: TRANSACTION_REPEATABLE_READ
> 03:29:11 E   No rows affected (0.045 seconds)
> 03:29:11 E   INFO  : Compiling 
> command(queryId=ubuntu_20180215024848_d80982c5-a75c-4441-ab43-68d238eb69ba): 
> insert into 
> test_partition_metadata_compatibility_b2ac5e.part_parquet_tbl_hive partition 
> (x) values(1,1)
> 03:29:11 E   INFO  : Semantic Analysis Completed
> 03:29:11 E   INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), 
> FieldSchema(name:_col1, type:string, comment:null)], properties:null)
> 03:29:11 E   INFO  : Completed compiling 
> command(queryId=ubuntu_20180215024848_d80982c5-a75c-4441-ab43-68d238eb69ba); 
> Time taken: 0.123 seconds
> 03:29:11 E   INFO  : Executing 
> command(queryId=ubuntu_20180215024848_d80982c5-a75c-4441-ab43-68d238eb69ba): 
> insert into 
> test_partition_metadata_compatibility_b2ac5e.part_parquet_tbl_hive partition 
> (x) values(1,1)
> 03:29:11 E   INFO  : Query ID = 
> ubuntu_20180215024848_d80982c5-a75c-4441-ab43-68d238eb69ba
> 03:29:11 E   INFO  : Total jobs = 3
> 03:29:11 E   INFO  : Launching Job 1 out of 3
> 03:29:11 E   INFO  : Starting task [Stage-1:MAPRED] in serial mode
> 03:29:11 E   INFO  : Number of reduce tasks is set to 0 since there's no 
> reduce operator
> 03:29:11 E   INFO  : number of splits:1
> 03:29:11 E   INFO  : Submitting tokens for job: job_local1716894275_0002
> 03:29:11 E   INFO  : The url to track the job: http://localhost:8080/
> 03:29:11 E   INFO  : Job running in-process (local Hadoop)
> 03:29:11 E   INFO  : 2018-02-15 02:48:03,500 Stage-1 map = 0%,  reduce = 0%
> 03:29:11 E   ERROR : Ended Job = job_local1716894275_0002 with errors
> 03:29:11 E   ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 03:29:11 E   INFO  : MapReduce Jobs Launched: 
> 03:29:11 E   INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 

[jira] [Commented] (IMPALA-7111) ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled

2018-06-13 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511591#comment-16511591
 ] 

Tim Armstrong commented on IMPALA-7111:
---

I'm going to take another look at the code. The error message we saw on the 
DEBUG build seems to confirm that both FLAGS_enabled_hdfs_text_scanner_plugins 
and plugin_name have the right values.

> ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled
> -
>
> Key: IMPALA-7111
> URL: https://issues.apache.org/jira/browse/IMPALA-7111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Lars Volker
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: asan, broken-build
>
>  [~tarmstr...@cloudera.com] - I'm assigning this to you since you added this 
> file in IMPALA-6941.
> {noformat}
> ==4582==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x603000e8aa28 at pc 0x017ab9b4 bp 0x7f67e5f6b650 sp 0x7f67e5f6b648
> READ of size 1 at 0x603000e8aa28 thread T9236
> #0 0x17ab9b3 in bool 
> __gnu_cxx::__ops::_Iter_pred 
> >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/predefined_ops.h:231:24
> #1 0x17ab745 in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >, 
> std::random_access_iterator_tag) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:140:8
> #2 0x17ab2dc in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:161:14
> #3 0x17aaf6c in __gnu_cxx::__normal_iterator 
> std::find_if<__gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:3803:14
> #4 0x17aaba1 in boost::iterator_range<__gnu_cxx::__normal_iterator std::string> > 
> boost::algorithm::detail::token_finderF
>  >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/finder.hpp:565:41
> #5 0x17ac118 in 
> boost::function2 std::string> >, __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator 
> >::operator()(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #6 0x17abf8d in 
> boost::algorithm::detail::find_iterator_base<__gnu_cxx::__normal_iterator  std::string> >::do_find(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/find_iterator.hpp:63:32
> #7 0x17aa00c in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> >::increment() 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:305:44
> #8 0x17a95a5 in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> 
> >::split_iterator
>  > >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:265:21
> #9 0x17a8d5e in std::vector >& 
> boost::algorithm::iter_split std::allocator >, std::string, 
> boost::algorithm::detail::token_finderF
>  > >(std::vector >&, std::string&, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/iter_find.hpp:170:21
> #10 0x179754f in std::vector >& 

[jira] [Resolved] (IMPALA-6929) Create Kudu table syntax does not allow multi-column range partitions

2018-06-13 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-6929.

   Resolution: Fixed
Fix Version/s: Impala 3.1.0
   Impala 2.13.0

> Create Kudu table syntax does not allow multi-column range partitions
> -
>
> Key: IMPALA-6929
> URL: https://issues.apache.org/jira/browse/IMPALA-6929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Dan Burkert
>Assignee: Thomas Tauber-Marshall
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> The Impala CREATE TABLE syntax guide includes this bit of grammar in the Kudu 
> partitioning section:
> {code:java}
> range_clause ::=
>   RANGE [ (pk_col [, ...]) ]
>   (
> {
>   PARTITION constant_expression range_comparison_operator VALUES 
> range_comparison_operator constant_expression
>   | PARTITION VALUE = constant_expression_or_tuple
> }
>[, ...]
>   ){code}
> This is suspicious because {{constant_expression}} is used in the range 
> clause, and {{constant_expression_or_tuple}} is used in the single-value 
> clause.  I believe both should allow for tuples.
> In other words, today a CREATE TABLE statement such as
> {code:java}
> CREATE TABLE t (a BIGINT, b BIGINT, PRIMARY KEY (a, b))
> PARTITION BY RANGE (a, b) (
>     PARTITION (0, 0) <= VALUES < (10, 0)
> ) STORED AS KUDU;{code}
> results in a syntax error, and it should not.  CC [~twmarshall]
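
The intended semantics behind the tuple bounds above are lexicographic tuple 
comparison, which a one-line sketch captures; in_partition is a hypothetical 
helper for illustration, not Impala or Kudu code:

```python
def in_partition(row_key, lower, upper):
    """True if tuple row_key lies in [lower, upper) under lexicographic
    ordering -- the comparison a multi-column range partition implies."""
    return lower <= row_key < upper

# Rows against the partition (0, 0) <= VALUES < (10, 0) from the
# CREATE TABLE example above:
inside = in_partition((5, 123), (0, 0), (10, 0))
outside = in_partition((10, 1), (0, 0), (10, 0))
```

Note that (9, 999) still falls inside the partition while (10, 1) does not, 
because only the first column differs at the upper bound.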






[jira] [Commented] (IMPALA-7111) ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled

2018-06-13 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511588#comment-16511588
 ] 

Tim Armstrong commented on IMPALA-7111:
---

TSAN and UBSAN didn't show anything when running the data errors test.

> ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled
> -
>
> Key: IMPALA-7111
> URL: https://issues.apache.org/jira/browse/IMPALA-7111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Lars Volker
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: asan, broken-build
>
>  [~tarmstr...@cloudera.com] - I'm assigning this to you since you added this 
> file in IMPALA-6941.
> {noformat}
> ==4582==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x603000e8aa28 at pc 0x017ab9b4 bp 0x7f67e5f6b650 sp 0x7f67e5f6b648
> READ of size 1 at 0x603000e8aa28 thread T9236
> #0 0x17ab9b3 in bool 
> __gnu_cxx::__ops::_Iter_pred 
> >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/predefined_ops.h:231:24
> #1 0x17ab745 in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >, 
> std::random_access_iterator_tag) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:140:8
> #2 0x17ab2dc in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:161:14
> #3 0x17aaf6c in __gnu_cxx::__normal_iterator 
> std::find_if<__gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:3803:14
> #4 0x17aaba1 in boost::iterator_range<__gnu_cxx::__normal_iterator std::string> > 
> boost::algorithm::detail::token_finderF
>  >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/finder.hpp:565:41
> #5 0x17ac118 in 
> boost::function2 std::string> >, __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator 
> >::operator()(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #6 0x17abf8d in 
> boost::algorithm::detail::find_iterator_base<__gnu_cxx::__normal_iterator  std::string> >::do_find(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/find_iterator.hpp:63:32
> #7 0x17aa00c in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> >::increment() 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:305:44
> #8 0x17a95a5 in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> 
> >::split_iterator
>  > >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:265:21
> #9 0x17a8d5e in std::vector >& 
> boost::algorithm::iter_split std::allocator >, std::string, 
> boost::algorithm::detail::token_finderF
>  > >(std::vector >&, std::string&, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/iter_find.hpp:170:21
> #10 0x179754f in std::vector >& 
> boost::algorithm::split 
> >, std::string, boost::algorithm::detail::is_any_ofF 
> >(std::vector >&, std::string&, 
> 

[jira] [Assigned] (IMPALA-7169) TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file

2018-06-13 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker reassigned IMPALA-7169:
---

Assignee: Tianyi Wang

> TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file
> --
>
> Key: IMPALA-7169
> URL: https://issues.apache.org/jira/browse/IMPALA-7169
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Lars Volker
>Assignee: Tianyi Wang
>Priority: Critical
>  Labels: broken-build, flaky
>
> {noformat}
> F 
> metadata/test_hdfs_encryption.py::TestHdfsEncryption::()::test_drop_partition_encrypt
>  metadata/test_hdfs_encryption.py:202: in test_drop_partition_encrypt
>  assert self.hdfs_client.exists(
>  E   assert   0xba6ee90>>('/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt')
>  E+  where  > = 
> .exists
>  E+where  0xba6ee90> =  0xba6ed50>.hdfs_client
>  E+  and   
> '/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt'
>  = ('jenkins', 
> 'test_encryption_db')
>  E+where  = 
> '/user/{0}/.Trash/Current/test-warehouse/{1}.db/t1/j=1/j1.txt'.format
>  E+and   'jenkins' = ()
>  E+  where  = getpass.getuser
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7169) TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file

2018-06-13 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511561#comment-16511561
 ] 

Lars Volker commented on IMPALA-7169:
-

[~tianyiwang] - I'm assigning this to you randomly; feel free to find another 
person or assign back to me if you're swamped.

> TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file
> --
>
> Key: IMPALA-7169
> URL: https://issues.apache.org/jira/browse/IMPALA-7169
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky
>
> {noformat}
> F 
> metadata/test_hdfs_encryption.py::TestHdfsEncryption::()::test_drop_partition_encrypt
>  metadata/test_hdfs_encryption.py:202: in test_drop_partition_encrypt
>  assert self.hdfs_client.exists(
>  E   assert   0xba6ee90>>('/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt')
>  E+  where  > = 
> .exists
>  E+where  0xba6ee90> =  0xba6ed50>.hdfs_client
>  E+  and   
> '/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt'
>  = ('jenkins', 
> 'test_encryption_db')
>  E+where  = 
> '/user/{0}/.Trash/Current/test-warehouse/{1}.db/t1/j=1/j1.txt'.format
>  E+and   'jenkins' = ()
>  E+  where  = getpass.getuser
> {noformat}






[jira] [Created] (IMPALA-7169) TestHdfsEncryption::()::test_drop_partition_encrypt fails to find file

2018-06-13 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-7169:
---

 Summary: TestHdfsEncryption::()::test_drop_partition_encrypt fails 
to find file
 Key: IMPALA-7169
 URL: https://issues.apache.org/jira/browse/IMPALA-7169
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 2.13.0
Reporter: Lars Volker


{noformat}
F 
metadata/test_hdfs_encryption.py::TestHdfsEncryption::()::test_drop_partition_encrypt
 metadata/test_hdfs_encryption.py:202: in test_drop_partition_encrypt
 assert self.hdfs_client.exists(
 E   assert >('/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt')
 E+  where > = 
.exists
 E+where  = .hdfs_client
 E+  and   
'/user/jenkins/.Trash/Current/test-warehouse/test_encryption_db.db/t1/j=1/j1.txt'
 = ('jenkins', 
'test_encryption_db')
 E+where  = 
'/user/{0}/.Trash/Current/test-warehouse/{1}.db/t1/j=1/j1.txt'.format
 E+and   'jenkins' = ()
 E+  where  = getpass.getuser
{noformat}
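For context, a minimal sketch of how the failing assertion constructs the expected trash path (hypothetical helper name; the template string and the use of getpass.getuser are taken from the failure output above):

```python
import getpass

# Template from the failing assertion: the trash location of a dropped
# partition file for the current user and the per-test database.
TRASH_PATH_TEMPLATE = '/user/{0}/.Trash/Current/test-warehouse/{1}.db/t1/j=1/j1.txt'

def expected_trash_path(db_name):
    # getpass.getuser() resolved to 'jenkins' in the failing Jenkins build.
    return TRASH_PATH_TEMPLATE.format(getpass.getuser(), db_name)

# The test then asserts hdfs_client.exists(expected_trash_path(...)),
# which failed because the dropped partition file never appeared in trash.
```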






[jira] [Commented] (IMPALA-5058) Improve concurrency of DDL/DML operations during catalog updates

2018-06-13 Thread tim geary (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511498#comment-16511498
 ] 

tim geary commented on IMPALA-5058:
---

Do we know which CDH version this will be in?

> Improve concurrency of DDL/DML operations during catalog updates
> 
>
> Key: IMPALA-5058
> URL: https://issues.apache.org/jira/browse/IMPALA-5058
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Dimitris Tsirogiannis
>Priority: Critical
>  Labels: catalog-server, performance, usability
> Fix For: Impala 2.12.0
>
> Attachments: sample-refresh-duration-graph.png
>
>
> Currently, long-running DDL/DML operations can block other operations from 
> making progress if they run concurrently with the getCatalogObjects() call 
> that creates catalog updates. The reason is that getCatalogObjects() holds 
> the catalog lock for its entire duration and also tries to acquire the locks 
> for the tables it processes. If that operation is blocked by another 
> operation on a table, then any other, unrelated catalog write operation 
> cannot make progress, as it cannot acquire the catalog lock held by 
> getCatalogObjects().
> From a user's point of view, concurrent DDL/DML operations are executed 
> serially and, consequently, their latency may vary significantly. With the 
> fix for this issue, concurrent DDL/DML operations should be able to run 
> concurrently and their throughput should increase significantly. At the same 
> time, the latency of a DDL/DML operation should not depend on any other 
> operations running at the same time. Note that latency here is measured at 
> the coordinator that initiates the operation; the fix does nothing to 
> improve the latency of broadcasting metadata changes through the statestore. 
> Some common use cases where this fix applies are the following:
>  # Concurrent REFRESH operations on different tables. 
>  # Concurrent ALTER TABLE operations on different tables.
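The serialization described above can be illustrated with a toy sketch (hypothetical model, not Impala's actual catalog code): one coarse lock held for the whole update blocks unrelated per-table operations, while independent per-table locks let them proceed concurrently.

```python
import threading
import time

class CoarseCatalog:
    """Toy pre-fix model: a single global lock serializes every
    operation, as when getCatalogObjects() holds the catalog lock
    for its entire duration."""
    def __init__(self):
        self.lock = threading.Lock()

    def table_op(self, table, duration):
        with self.lock:           # unrelated tables still contend here
            time.sleep(duration)

class PerTableCatalog:
    """Toy post-fix model: unrelated tables use independent locks, so
    concurrent DDL/DML on different tables do not block each other."""
    def __init__(self):
        self.locks = {}
        self.meta = threading.Lock()

    def table_op(self, table, duration):
        with self.meta:           # held only briefly to look up the lock
            lock = self.locks.setdefault(table, threading.Lock())
        with lock:                # per-table critical section
            time.sleep(duration)

def run_concurrent(catalog, n=4, duration=0.1):
    """Run n operations on n distinct tables and return wall time."""
    threads = [threading.Thread(target=catalog.table_op,
                                args=('t%d' % i, duration))
               for i in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

With four 0.1 s operations on distinct tables, the coarse-locked model takes roughly 0.4 s while the per-table model takes roughly 0.1 s, mirroring the throughput improvement the fix aims for.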






[jira] [Commented] (IMPALA-7111) ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled

2018-06-13 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511484#comment-16511484
 ] 

Tim Armstrong commented on IMPALA-7111:
---

Tried looping again overnight, this time with some extra load on the system to 
produce some interesting races. Couldn't reproduce. I'll try TSAN to see if it 
finds a data race.

> ASAN heap-use-after-free in impala::HdfsPluginTextScanner::CheckPluginEnabled
> -
>
> Key: IMPALA-7111
> URL: https://issues.apache.org/jira/browse/IMPALA-7111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Lars Volker
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: asan, broken-build
>
>  [~tarmstr...@cloudera.com] - I'm assigning this to you since you added this 
> file in IMPALA-6941.
> {noformat}
> ==4582==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x603000e8aa28 at pc 0x017ab9b4 bp 0x7f67e5f6b650 sp 0x7f67e5f6b648
> READ of size 1 at 0x603000e8aa28 thread T9236
> #0 0x17ab9b3 in bool 
> __gnu_cxx::__ops::_Iter_pred 
> >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/predefined_ops.h:231:24
> #1 0x17ab745 in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >, 
> std::random_access_iterator_tag) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:140:8
> #2 0x17ab2dc in __gnu_cxx::__normal_iterator 
> std::__find_if<__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred > 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__ops::_Iter_pred >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:161:14
> #3 0x17aaf6c in __gnu_cxx::__normal_iterator 
> std::find_if<__gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::is_any_ofF) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../include/c++/4.9.2/bits/stl_algo.h:3803:14
> #4 0x17aaba1 in boost::iterator_range<__gnu_cxx::__normal_iterator std::string> > 
> boost::algorithm::detail::token_finderF
>  >::operator()<__gnu_cxx::__normal_iterator 
> >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/finder.hpp:565:41
> #5 0x17ac118 in 
> boost::function2 std::string> >, __gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator 
> >::operator()(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #6 0x17abf8d in 
> boost::algorithm::detail::find_iterator_base<__gnu_cxx::__normal_iterator  std::string> >::do_find(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator) const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/detail/find_iterator.hpp:63:32
> #7 0x17aa00c in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> >::increment() 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:305:44
> #8 0x17a95a5 in 
> boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator std::string> 
> >::split_iterator
>  > >(__gnu_cxx::__normal_iterator, 
> __gnu_cxx::__normal_iterator, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/find_iterator.hpp:265:21
> #9 0x17a8d5e in std::vector >& 
> boost::algorithm::iter_split std::allocator >, std::string, 
> boost::algorithm::detail::token_finderF
>  > >(std::vector >&, std::string&, 
> boost::algorithm::detail::token_finderF
>  >) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/algorithm/string/iter_find.hpp:170:21
> #10 0x179754f in std::vector >& 
> 

[jira] [Work stopped] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-06-13 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7070 stopped by Lars Volker.
---
> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}
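The check_call failure above surfaces a transient S3 rename error directly to the test. One possible workaround (a hypothetical sketch, not the fix actually adopted for IMPALA-7070) is to retry the flaky command a few times before giving up:

```python
import subprocess
import time

def check_call_with_retry(cmd, attempts=3, delay=0.1):
    """Retry a flaky shell command, re-raising the last
    CalledProcessError if every attempt fails.
    Hypothetical helper for illustration only."""
    for attempt in range(attempts):
        try:
            subprocess.check_call(cmd, shell=False)
            return attempt + 1  # number of attempts actually used
        except subprocess.CalledProcessError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# In the failing test this would wrap the upload, e.g.:
# check_call_with_retry(['hadoop', 'fs', '-put', local_path, location])
```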






[jira] [Work started] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-06-13 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7070 started by Lars Volker.
---
> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}






[jira] [Updated] (IMPALA-7168) DML query may hang if CatalogUpdateCallback() encounters repeated error

2018-06-13 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh updated IMPALA-7168:
-
Description: 
DML queries such as INSERT will hang if 
exec_env_->frontend()->UpdateCatalogCache() in 
ImpalaServer::CatalogUpdateCallback encounters a repeated error such as 
ENOMEM. 

This happens with SYNC_DDL set to 1 when the coordinator node is waiting for 
its catalog version to become current.

The scenario plays out as follows. Say there are two coordinator nodes, 
Node A and Node B, and catalogd and statestored are running on Node C.

a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1; the thread 
running the query blocks in 
impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog 
version to become current.

b) Meanwhile, statestored on Node C calls 
ImpalaServer::CatalogUpdateCallback on Node B via a Thrift RPC to do a delta 
topic update, which will not happen if the call keeps failing, e.g. because 
the frontend is low on memory (low JVM heap).

c) In that case Node A waits indefinitely for its catalog version to become 
current, until Node B is shut down voluntarily.
Note: this is a case where Node B is reachable (its heartbeat is fine) but 
the node is in a bad, non-working state.




  was:
DML queries or INSERT  will encounter a hang, if 
exec_env_->frontend()->UpdateCatalogCache() in 
ImpalaServer::CatalogUpdateCallback encounters repeated error like ENOMEM. 

This happens with SYNC_DDL set to 1 when the coordinator node is waiting for 
it's catalog version to become current.

The scenario shows up like this, lets say there are two coordinator nodes , 
Node A, Node B
and catalogd and statestored are running on Node C.

a) CREATE TABLE is executed on Node A, with SYNC_DDL set to 1, the thread 
running the query is going to block in 
impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for it's catalog 
version to become current.

b) Meanwhile statestored running on Node C would call 
ImpalaServer::CatalogUpdateCallback on Node B via thrift RPC to do a delta 
topic update, which would not happen if we encounter repeated errors, say front 
end is low on memory (low JVM heap situation).

c) In such case Node A will wait indefinitely waiting for it's catalog version 
to become current, till Node B is shutdown voluntarily.
Note: This is a case where Node B is reachable (hearbeat is fine, but bad node) 
since 





> DML query may hang if CatalogUpdateCallback() encounters repeated error
> ---
>
> Key: IMPALA-7168
> URL: https://issues.apache.org/jira/browse/IMPALA-7168
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0
>Reporter: Pranay Singh
>Priority: Major
>
> DML queries such as INSERT will hang if 
> exec_env_->frontend()->UpdateCatalogCache() in 
> ImpalaServer::CatalogUpdateCallback encounters a repeated error such as 
> ENOMEM.
> This happens with SYNC_DDL set to 1 when the coordinator node is waiting for 
> its catalog version to become current.
> The scenario plays out as follows. Say there are two coordinator nodes, 
> Node A and Node B, and catalogd and statestored are running on Node C.
> a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1; the thread 
> running the query blocks in 
> impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog 
> version to become current.
> b) Meanwhile, statestored on Node C calls 
> ImpalaServer::CatalogUpdateCallback on Node B via a Thrift RPC to do a delta 
> topic update, which will not happen if the call keeps failing, e.g. because 
> the frontend is low on memory (low JVM heap).
> c) In that case Node A waits indefinitely for its catalog version to become 
> current, until Node B is shut down voluntarily.
> Note: this is a case where Node B is reachable (its heartbeat is fine) but 
> the node is in a bad, non-working state.
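A toy sketch of the wait described above (hypothetical Python model, not Impala's C++ implementation): the coordinator blocks on a condition variable until its catalog version catches up, and hangs forever if updates stop arriving; a bounded wait turns the indefinite hang into a detectable failure.

```python
import threading

class CatalogVersionWaiter:
    """Toy model of a coordinator waiting (SYNC_DDL=1) for its
    catalog version to become current."""
    def __init__(self):
        self.version = 0
        self.cond = threading.Condition()

    def update(self, version):
        # Delivered by a statestore topic update. If updates stop
        # arriving (e.g. repeated ENOMEM on another coordinator),
        # waiters stall indefinitely.
        with self.cond:
            self.version = max(self.version, version)
            self.cond.notify_all()

    def wait_for(self, target, timeout=None):
        # With timeout=None this models the indefinite hang; a finite
        # timeout lets the caller detect a stalled catalog instead.
        with self.cond:
            return self.cond.wait_for(lambda: self.version >= target,
                                      timeout)
```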






[jira] [Work stopped] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-06-13 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7070 stopped by Lars Volker.
---
> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}






[jira] [Work started] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-06-13 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7070 started by Lars Volker.
---
> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2018-06-13 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511473#comment-16511473
 ] 

Lars Volker commented on IMPALA-6910:
-

I keep hitting this when testing possible workarounds for IMPALA-7070.

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: s3
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7168) DML query may hang if CatalogUpdateCallback() encounters repeated error

2018-06-13 Thread Pranay Singh (JIRA)
Pranay Singh created IMPALA-7168:


 Summary: DML query may hang if CatalogUpdateCallback() encounters 
repeated error
 Key: IMPALA-7168
 URL: https://issues.apache.org/jira/browse/IMPALA-7168
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 2.12.0, Impala 3.0, Impala 2.11.0, Impala 2.10.0, 
Impala 2.9.0
Reporter: Pranay Singh


A DML query or INSERT will hang if 
exec_env_->frontend()->UpdateCatalogCache() in 
ImpalaServer::CatalogUpdateCallback() encounters a repeated error such as 
ENOMEM. 

This happens with SYNC_DDL set to 1, while the coordinator node is waiting for 
its catalog version to become current.

The scenario plays out as follows. Say there are two coordinator nodes, 
Node A and Node B, and catalogd and statestored are running on Node C.

a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1. The thread 
running the query blocks in 
impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog 
version to become current.

b) Meanwhile, statestored running on Node C calls 
ImpalaServer::CatalogUpdateCallback() on Node B via a thrift RPC to do a delta 
topic update, which will not happen if repeated errors are encountered, e.g. 
the frontend is low on memory (a low JVM heap situation).

c) In that case Node A will wait indefinitely, until Node B is shut down 
voluntarily. Note that this is a case where Node B is reachable (its heartbeat 
is fine), but it is a bad node.
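
The hang can be modeled as a wait on the minimum catalog version across all coordinators: one coordinator that never applies its topic updates pins the minimum forever. A minimal sketch with illustrative names (not Impala's API):

```python
def sync_ddl_satisfied(ddl_catalog_version, coordinator_versions):
    """SYNC_DDL may only return once every coordinator has applied the
    catalog update produced by the DDL, i.e. the minimum version across
    coordinators has caught up. (Illustrative model, not Impala's
    implementation.)"""
    return min(coordinator_versions.values()) >= ddl_catalog_version

# Node B's topic updates keep failing, so it is pinned at version 10
# while Node A's CREATE TABLE produced catalog version 12: the minimum
# never advances and Node A waits forever.
versions = {"node_a": 12, "node_b": 10}
node_a_blocked = not sync_ddl_satisfied(12, versions)
```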






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Issue Comment Deleted] (IMPALA-5937) Docs are missing some query options

2018-06-13 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-5937:

Comment: was deleted

(was: [~tarmstrong] STRICT_MODE is not listed in the SET output. Is it a 
special internal option? Do we need to document it?)

> Docs are missing some query options
> ---
>
> Key: IMPALA-5937
> URL: https://issues.apache.org/jira/browse/IMPALA-5937
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Philip Zeyliger
>Assignee: Alex Rodoni
>Priority: Major
>
> I noticed that the following options show up in "SET" in impala-shell but 
> don't have corresponding documentation entries. I know BUFFER_POOL_LIMIT is 
> mentioned in IMPALA-5655.
> --BUFFER_POOL_LIMIT--
>  -DECIMAL_V2-
>  -DEFAULT_SPILLABLE_BUFFER_SIZE-
>  DISABLE_CODEGEN_ROWS_THRESHOLD
>  ENABLE_EXPR_REWRITES
>  -MAX_ROW_SIZE-
>  -MIN_SPILLABLE_BUFFER_SIZE-
>  -PARQUET_ARRAY_RESOLUTION-
>  PARQUET_DICTIONARY_FILTERING
>  PARQUET_READ_STATISTICS
>  STRICT_MODE



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2751) quote in WITH block's comment breaks shell

2018-06-13 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-2751.
--
Resolution: Fixed

> quote in WITH block's comment breaks shell
> --
>
> Key: IMPALA-2751
> URL: https://issues.apache.org/jira/browse/IMPALA-2751
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
> Environment: CDH5.4.8
>Reporter: Marcell Szabo
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: impala-shell, shell, usability
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Steps to reproduce:
> $ cat > test.sql
> with a as (
> select 'a'
> -- shouldn't matter
> ) 
> select * from a; 
> $ impala-shell -f test.sql 
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> Starting Impala Shell without Kerberos authentication
> Connected to host:21000
> Server version: impalad version 2.2.0-cdh5 RELEASE (build 
> 1d0b017e2441dd8950924743d839f14b3995e259)
> Traceback (most recent call last):
>   File "/usr/lib/impala-shell/impala_shell.py", line 1006, in 
> execute_queries_non_interactive_mode(options)
>   File "/usr/lib/impala-shell/impala_shell.py", line 922, in 
> execute_queries_non_interactive_mode
> if shell.onecmd(query) is CmdStatus.ERROR:
>   File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
> return func(arg)
>   File "/usr/lib/impala-shell/impala_shell.py", line 762, in do_with
> tokens = list(lexer)
>   File "/usr/lib64/python2.6/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib64/python2.6/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib64/python2.6/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> Also, copy-pasting the query interactively, the line never closes.
> Strangely, the issue only seems to occur in the presence of the WITH block.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6816) Statestore spends a lot of time in GetMinSubscriberTopicVersion()

2018-06-13 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6816:
-

Assignee: Tim Armstrong

> Statestore spends a lot of time in GetMinSubscriberTopicVersion()
> -
>
> Key: IMPALA-6816
> URL: https://issues.apache.org/jira/browse/IMPALA-6816
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: admission-control, statestore
>
> {noformat}
> Samples: 13K of event 'cycles', Event count (approx.): 1200870513
>   20.23%  statestored  impalad  [.] 
> impala::Statestore::GetMinSubscriberTopicVersion(std::string const&, 
> std::string*)
>7.68%  statestored  [kernel.kallsyms][k] find_busiest_group
>3.46%  statestored  impalad  [.] 
> impala::Statestore::Subscriber::LastTopicVersionProcessed(std::string const&) 
> const
>3.26%  statestored  libc-2.12.so [.] __memcmp_sse4_1
>1.41%  statestored  [kernel.kallsyms][k] find_next_bit
>1.40%  statestored  [kernel.kallsyms][k] cpumask_next_and
>1.21%  statestored  libpthread-2.12.so   [.] pthread_mutex_lock
>1.04%  statestored  libc-2.12.so [.] memcpy
>1.01%  statestored  [kernel.kallsyms][k] _spin_lock
>0.98%  statestored  impalad  [.] 0x0088f903
>0.93%  statestored  impalad  [.] 0x0088f8f5
>0.91%  statestored  impalad  [.] 0x0088f8ea
>0.85%  statestored  [kernel.kallsyms][k] ixgbe_xmit_frame_ring
>0.77%  statestored  impalad  [.] 0x0088f8e3
>0.75%  statestored  impalad  [.] 0x0088f900
>0.75%  statestored  impalad  [.] 
> impala::Statestore::IsPrioritizedTopic(std::string const&)
>0.73%  statestored  impalad  [.] 0x0088f8fa
>0.72%  statestored  impalad  [.] operator new[](unsigned 
> long)
>0.68%  statestored  [kernel.kallsyms][k] tcp_recvmsg
>0.67%  statestored  impalad  [.] 0x0088f8fd
>0.66%  statestored  impalad  [.] 
> impala::Statestore::Topic::BuildDelta(std::string const&, long, 
> impala::TTopicDelta*)
>0.61%  statestored  [kernel.kallsyms][k] thread_return
>0.60%  statestored  impalad  [.] 0x0088f8f2
>0.60%  statestored  libstdc++.so.6   [.] 
> std::string::compare(std::string const&) const
>0.59%  statestored  impalad  [.] 0x0088f8e6
>0.56%  statestored  impalad  [.] 0x0088f8ee
>0.56%  statestored  libcrypto.so.1.0.1e  [.] aesni_encrypt
>0.55%  statestored  impalad  [.] 0x0088f8e0
>0.55%  statestored  [kernel.kallsyms][k] tcp_transmit_skb
>0.53%  statestored  [kernel.kallsyms][k] fget_light
>0.51%  statestored  impalad  [.] std::_Rb_tree std::pair >, 
> std::_Select1st0.50%  statestored  impalad  [.] 
> apache::thrift::transport::TVirtualTransport  apache::thrift::transport::TBufferBase>::readAll_virt(unsigned char*
>0.50%  statestored  impalad  [.] 
> impala::Statestore::DoSubscriberUpdate(impala::Statestore::UpdateKind, int, 
> impala::Statestore::ScheduledSubscriberUpdate const&)
>0.49%  statestored  libssl.so.1.0.1e [.] tls1_enc
>0.48%  statestored  libssl.so.1.0.1e [.] ssl3_read_bytes
> {noformat}
> We are spending most of our time computing this for non-catalog topics, where 
> it's not even used.
> There are a couple of ways we could fix this that I can think of:
> * Avoid including this information for topics where we're not interested in it
> * Cache or precompute the value somehow to avoid iterating over all 
> subscribers every time
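
The second option above can be sketched as keeping the minimum up to date on writes instead of scanning every subscriber per read (an illustrative structure, not Impala's code):

```python
class MinVersionCache:
    """Caches the minimum topic version over all subscribers.

    Sketch of the 'cache or precompute' idea from the ticket: instead of
    scanning every subscriber on each GetMinSubscriberTopicVersion()
    call, the minimum is recomputed only when a subscriber's version
    changes. Recompute-on-write is still O(n) per update; a heap or
    sorted multiset would make writes cheaper as well.
    """

    def __init__(self):
        self._versions = {}   # subscriber id -> last processed version
        self._min = None      # cached minimum, None when empty

    def update(self, subscriber_id, version):
        self._versions[subscriber_id] = version
        self._min = min(self._versions.values())

    def min_version(self):
        # O(1) read; this is the hot path in the profile above.
        return self._min
```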



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6816) Statestore spends a lot of time in GetMinSubscriberTopicVersion()

2018-06-13 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6816:
--
Labels: admission-control statestore  (was: )

> Statestore spends a lot of time in GetMinSubscriberTopicVersion()
> -
>
> Key: IMPALA-6816
> URL: https://issues.apache.org/jira/browse/IMPALA-6816
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: admission-control, statestore
>
> {noformat}
> Samples: 13K of event 'cycles', Event count (approx.): 1200870513
>   20.23%  statestored  impalad  [.] 
> impala::Statestore::GetMinSubscriberTopicVersion(std::string const&, 
> std::string*)
>7.68%  statestored  [kernel.kallsyms][k] find_busiest_group
>3.46%  statestored  impalad  [.] 
> impala::Statestore::Subscriber::LastTopicVersionProcessed(std::string const&) 
> const
>3.26%  statestored  libc-2.12.so [.] __memcmp_sse4_1
>1.41%  statestored  [kernel.kallsyms][k] find_next_bit
>1.40%  statestored  [kernel.kallsyms][k] cpumask_next_and
>1.21%  statestored  libpthread-2.12.so   [.] pthread_mutex_lock
>1.04%  statestored  libc-2.12.so [.] memcpy
>1.01%  statestored  [kernel.kallsyms][k] _spin_lock
>0.98%  statestored  impalad  [.] 0x0088f903
>0.93%  statestored  impalad  [.] 0x0088f8f5
>0.91%  statestored  impalad  [.] 0x0088f8ea
>0.85%  statestored  [kernel.kallsyms][k] ixgbe_xmit_frame_ring
>0.77%  statestored  impalad  [.] 0x0088f8e3
>0.75%  statestored  impalad  [.] 0x0088f900
>0.75%  statestored  impalad  [.] 
> impala::Statestore::IsPrioritizedTopic(std::string const&)
>0.73%  statestored  impalad  [.] 0x0088f8fa
>0.72%  statestored  impalad  [.] operator new[](unsigned 
> long)
>0.68%  statestored  [kernel.kallsyms][k] tcp_recvmsg
>0.67%  statestored  impalad  [.] 0x0088f8fd
>0.66%  statestored  impalad  [.] 
> impala::Statestore::Topic::BuildDelta(std::string const&, long, 
> impala::TTopicDelta*)
>0.61%  statestored  [kernel.kallsyms][k] thread_return
>0.60%  statestored  impalad  [.] 0x0088f8f2
>0.60%  statestored  libstdc++.so.6   [.] 
> std::string::compare(std::string const&) const
>0.59%  statestored  impalad  [.] 0x0088f8e6
>0.56%  statestored  impalad  [.] 0x0088f8ee
>0.56%  statestored  libcrypto.so.1.0.1e  [.] aesni_encrypt
>0.55%  statestored  impalad  [.] 0x0088f8e0
>0.55%  statestored  [kernel.kallsyms][k] tcp_transmit_skb
>0.53%  statestored  [kernel.kallsyms][k] fget_light
>0.51%  statestored  impalad  [.] std::_Rb_tree std::pair >, 
> std::_Select1st0.50%  statestored  impalad  [.] 
> apache::thrift::transport::TVirtualTransport  apache::thrift::transport::TBufferBase>::readAll_virt(unsigned char*
>0.50%  statestored  impalad  [.] 
> impala::Statestore::DoSubscriberUpdate(impala::Statestore::UpdateKind, int, 
> impala::Statestore::ScheduledSubscriberUpdate const&)
>0.49%  statestored  libssl.so.1.0.1e [.] tls1_enc
>0.48%  statestored  libssl.so.1.0.1e [.] ssl3_read_bytes
> {noformat}
> We are spending most of our time computing this for non-catalog topics, where 
> it's not even used.
> There are a couple of ways we could fix this that I can think of:
> * Avoid including this information for topics where we're not interested in it
> * Cache or precompute the value somehow to avoid iterating over all 
> subscribers every time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5216) Admission control queuing should be asynchronous

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511338#comment-16511338
 ] 

ASF subversion and git services commented on IMPALA-5216:
-

Commit 2de9db8fc64dc1054ac3ca41ab0f0047db670da6 in impala's branch 
refs/heads/master from [~bikram.sngh91]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=2de9db8 ]

IMPALA-5216: Make admission control queuing async

Implement asynchronous admission control queuing. This is achieved by
running the admission control code-path in a separate thread. Major
changes include: propagating cancellation to the admission control
thread and dequeuing thread, and adding a new Query Operation State
called "PENDING" that represents the state between completion of
planning and starting of query execution.

Testing:
- Added a deterministic end to end test and a session expiry test.
- Ran multiple stress tests successfully with a cancellation probability
of 60% and with different values for the following parameters:
max_requests, queue_wait_timeout_ms. Ensured that the impalad was in a
valid state afterwards (no orphan fragments or wrong metrics).
- Ran all exhaustive tests and ASAN core tests successfully.
- Ran data load successfully.

Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Reviewed-on: http://gerrit.cloudera.org:8080/10060
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
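
The shape of the change can be sketched as follows: the client gets a handle back immediately in the PENDING state, admission waits in a background thread, and cancellation wakes a queued query. This is a simplified model with illustrative names, not Impala's C++ code:

```python
import threading

class QueryHandle:
    """Handle returned to the client immediately; admission proceeds in
    the background. PENDING/RUNNING/CANCELLED mirror the new query
    operation state described in the commit (names illustrative)."""
    def __init__(self):
        self.state = "PENDING"
        self.cancelled = threading.Event()
        self.done = threading.Event()

def submit(admission_slot):
    handle = QueryHandle()

    def admit():
        # Queued: wait for an admission slot, waking periodically so a
        # cancellation issued while queued is observed.
        while not handle.cancelled.is_set():
            if admission_slot.wait(timeout=0.01):
                break
        handle.state = "CANCELLED" if handle.cancelled.is_set() else "RUNNING"
        handle.done.set()

    threading.Thread(target=admit, daemon=True).start()
    return handle  # returned before admission completes
```

Because `submit()` returns before admission finishes, the client can cancel a query that is still sitting in the admission queue, which is exactly the behavior the synchronous path could not provide.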


> Admission control queuing should be asynchronous
> 
>
> Key: IMPALA-5216
> URL: https://issues.apache.org/jira/browse/IMPALA-5216
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Dan Hecht
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: admission-control, resource-management
>
> Currently, admission control queuing occurs synchronously w.r.t. the 
> {{ExecuteStatement}} client RPC. That is, a query handle is not returned 
> until the query is admitted.
> Instead, the queuing should occur on the asynchronous path.  That way, the 
> client gets a query handle back immediately and can e.g. cancel a query that 
> is in the admission control queue.
> We'll also need a way to better expose the progress of a query handle 
> (related to IMPALA-124). E.g. that the query is queued for admission and what 
> resource(s) it's waiting on.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2751) quote in WITH block's comment breaks shell

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511337#comment-16511337
 ] 

ASF subversion and git services commented on IMPALA-2751:
-

Commit 2b05c3c3ca6cfa7a41215c84185a12d6ad97ff19 in impala's branch 
refs/heads/master from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=2b05c3c ]

IMPALA-2751: Matching quotes are not required in comments

This patch fixes the issue where non-matching quotes inside comments
will cause the shell to not terminate.

The fix is to strip any SQL comments before sending to shlex since shlex
does not understand SQL comments and will raise an exception when it
sees unmatched quotes regardless of whether the quotes are in the comments
or not.

Testing:
- Added new shell tests
- Ran all end-to-end shell tests on Python 2.6 and Python 2.7

Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Reviewed-on: http://gerrit.cloudera.org:8080/10541
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> quote in WITH block's comment breaks shell
> --
>
> Key: IMPALA-2751
> URL: https://issues.apache.org/jira/browse/IMPALA-2751
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
> Environment: CDH5.4.8
>Reporter: Marcell Szabo
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: impala-shell, shell, usability
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Steps to reproduce:
> $ cat > test.sql
> with a as (
> select 'a'
> -- shouldn't matter
> ) 
> select * from a; 
> $ impala-shell -f test.sql 
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> Starting Impala Shell without Kerberos authentication
> Connected to host:21000
> Server version: impalad version 2.2.0-cdh5 RELEASE (build 
> 1d0b017e2441dd8950924743d839f14b3995e259)
> Traceback (most recent call last):
>   File "/usr/lib/impala-shell/impala_shell.py", line 1006, in 
> execute_queries_non_interactive_mode(options)
>   File "/usr/lib/impala-shell/impala_shell.py", line 922, in 
> execute_queries_non_interactive_mode
> if shell.onecmd(query) is CmdStatus.ERROR:
>   File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
> return func(arg)
>   File "/usr/lib/impala-shell/impala_shell.py", line 762, in do_with
> tokens = list(lexer)
>   File "/usr/lib64/python2.6/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib64/python2.6/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib64/python2.6/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> Also, copy-pasting the query interactively, the line never closes.
> Strangely, the issue only seems to occur in the presence of the WITH block.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6929) Create Kudu table syntax does not allow multi-column range partitions

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511333#comment-16511333
 ] 

ASF subversion and git services commented on IMPALA-6929:
-

Commit b1a57f692da9b40e6d0ca5c331246be6f1d74cac in impala's branch 
refs/heads/2.x from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b1a57f6 ]

IMPALA-6929: Support multi-column range partitions for Kudu

Kudu allows specifying range partitions over multiple columns. Impala
already has support for doing this when the partitions are specified
with '=', but if the partitions are specified with '<' or '<=', the
parser would return an error.

This patch modifies the parser to allow for creating Kudu tables like:
create table kudu_test (a int, b int, primary key(a, b))
  partition by range(a, b) (partition (0, 0) <= values < (1, 1));
and similarly to alter partitions like:
alter table kudu_test add range partition (1, 1) <= values < (2, 2);

Testing:
- Modified functional_kudu.jointbl's schema so that we have a table
  in functional with a multi-column range partition to test things
  against.
- Added FE and E2E tests for CREATE and ALTER.

Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Reviewed-on: http://gerrit.cloudera.org:8080/10441
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Create Kudu table syntax does not allow multi-column range partitions
> -
>
> Key: IMPALA-6929
> URL: https://issues.apache.org/jira/browse/IMPALA-6929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Dan Burkert
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> The Impala CREATE TABLE syntax guide includes this bit of grammar in the Kudu 
> partitioning section:
> {code:java}
> range_clause ::=
>   RANGE [ (pk_col [, ...]) ]
>   (
> {
>   PARTITION constant_expression range_comparison_operator VALUES 
> range_comparison_operator constant_expression
>   | PARTITION VALUE = constant_expression_or_tuple
> }
>[, ...]
>   ){code}
> This is suspicious because {{constant_expression}} is used in the range 
> clause, and {{constant_expression_or_tuple}} is used in the single-value 
> clause.  I believe both should allow for tuples.
> In other words, today a CREATE TABLE statement such as
> {code:java}
> CREATE TABLE t (a BIGINT, b BIGINT, PRIMARY KEY (a, b))
> PARTITION BY RANGE (a, b) (
>     PARTITION (0, 0) <= VALUES < (10, 0)
> ) STORED AS KUDU;{code}
> results in a syntax error, and it should not.  CC [~twmarshall]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5706) Parallelise read I/O in sorter

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511327#comment-16511327
 ] 

ASF subversion and git services commented on IMPALA-5706:
-

Commit ab7ac5b6108646f98a9dcfcfb3a17f5ab5861586 in impala's branch 
refs/heads/2.x from [~gaborkaszab]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=ab7ac5b ]

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing the
spilling sort mechanism:
  - Remove the hard-coded maximum limit of buffers that can be used
for merging the sorted runs. Instead this number is calculated
based on the available memory through buffer pool.
  - The already sorted runs are distributed more optimally between
the last intermediate merge and the final merge to avoid that a
heavy intermediate merge is followed by a light final merge.
  - Right before starting the merging phase Sorter tries to allocate
additional memory through the buffer pool.
  - An output run is not allocated anymore for the final merge.

Note, double-buffering the runs during a merge was also planned with
this patch. However, performance testing showed that, except for some
exotic queries with an unreasonably small amount of buffer pool memory
available, double-buffering doesn't add to the overall performance.
This is basically because half of the available buffers have to be
sacrificed to do double-buffering, and as a result the merge tree can
get deeper. In addition, the amount of I/O wait time does not reach
the level where double-buffering could offset the reduced number
of runs during a particular merge.

Performance measurements were made during manual testing to verify
that this is in fact an optimization:
  - In case doing a sort on top of a join when working with a
restricted amount of memory then the Sort node successfully
allocates additional memory right before the merging phase. This
is feasible because once Join finishes sending new input data and
calls InputDone() then it releases memory that can be picked up
by the Sorter. This results in shallower merging trees (more runs
grabbed for a merge).
  - On a multi-node cluster I verified that in cases when at least one
merging step is done then this change reduces the execution time
for sorts.
  - The more merging steps are done the bigger the performance gain is
compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Reviewed-on: http://gerrit.cloudera.org:8080/9943
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Parallelise read I/O in sorter
> --
>
> Key: IMPALA-5706
> URL: https://issues.apache.org/jira/browse/IMPALA-5706
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Tim Armstrong
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: resource-management, spill
> Fix For: Impala 3.1.0
>
>
> IMPALA-3200 offers an opportunity to improve the spilling sort algorithm:
> * Use the reliability of reservations to select the most efficient order to 
> conduct merges in (rather than greedily trying to maximise the fan-in of the 
> current merge). We want to minimise the depth of the merge tree, then 
> structure the tree based on the preferred fan-in.
> * Do multiple-buffering of the stream being written (this happens 
> automatically if there are free buffers in the BufferPool client).
> * Do multiple-buffering of the streams being read, instead of blocking on 
> read I/O frequently.
> More concretely, the idea is to implement double-buffering of spilled input 
> runs by calling BufferPool::Pin() early to prefetch the second page in each 
> input Run. Currently only one page per input run is pinned, which means that 
> the sorter frequently blocks on I/O.
> I'd suggest doing this in two steps.
> The first step is to change how the fan-in of each merge run is selected. We 
> know the number of runs to be merged and the buffer reservation that is 
> available, so we can compute the maximum possible fan-in of each merge step 
> (assuming 1 buffer for the output run and 1 buffer for each input run to the 
> merge). We can then calculate the minimum number of rounds of merging 
> required and, based on that, decide how the runs should be merged (you could 
> think about it as a tree of merge operations). I think we want to reduce the 
> number of bytes written to disk. E.g. if we have 5 buffers and 7 input runs, 
> we should merge input runs (1,2,3,4) and then merge that intermediate run 
> with runs (5,6,7). It's reasonable to assume that the input runs are all 
> approximately the same size.
> ee53ddb389549247f5bfe760d446dc7b3b963a29 actually removed some logic along 
> 
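
The planning step described above — computing the minimum number of merge rounds from the available reservation — can be modeled as follows, assuming one buffer is reserved for the output run and each input run needs one buffer (an illustrative model, not the sorter's actual code):

```python
import math

def min_merge_rounds(num_runs, num_buffers):
    """Minimum number of merge rounds for a spilling sort, given that
    the maximum fan-in per merge step is num_buffers - 1 (one buffer is
    reserved for the output run)."""
    max_fan_in = num_buffers - 1
    rounds = 0
    while num_runs > 1:
        # Each round can reduce the run count by at most a factor of
        # max_fan_in.
        num_runs = math.ceil(num_runs / max_fan_in)
        rounds += 1
    return rounds
```

From the minimum round count, the planner can then size each merge so the deepest level does as little rewriting as possible.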

[jira] [Commented] (IMPALA-7157) Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511335#comment-16511335
 ] 

ASF subversion and git services commented on IMPALA-7157:
-

Commit 8d49797194f5e2f2c6128945fc4fecd5054163da in impala's branch 
refs/heads/master from [~sailesh]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=8d49797 ]

IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most
verbose logging level, but is higher than default), we pretty print
the profile for every fragment instance, which is a very expensive
operation, as serializing the profile is non-trivial (look at
RuntimeProfile::PrettyPrint()), and printing large amounts of information
to the logs isn't cheap either. Lastly, it is very noisy.

This seems unnecessary since this will not benefit us, as all the
profiles are merged at the coordinator side. We could argue that this
might be necessary when an executor fails to send the profile to the
coordinator, but that signifies a network issue which will not be
reflected in the profile of any fragment instance.

This will help reduce noise in the logs when the log level is
bumped up to find other real issues that VLOG_FILE can help with.

Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Reviewed-on: http://gerrit.cloudera.org:8080/10669
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins 


> Avoid unnecessarily pretty printing profiles per fragment instance
> --
>
> Key: IMPALA-7157
> URL: https://issues.apache.org/jira/browse/IMPALA-7157
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Minor
>  Labels: logs
>
> In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most verbose 
> logging level, but is higher than the default), we pretty print the profile 
> for every fragment instance. This is a very expensive operation: serializing 
> the profile is non-trivial (see RuntimeProfile::PrettyPrint()), and printing 
> large amounts of information to the logs isn't cheap either. Lastly, it is 
> very noisy.
> This seems unnecessary, since all the profiles are merged on the coordinator 
> side. One could argue that it might be necessary when an executor fails to 
> send the profile to the coordinator, but that signifies a network issue which 
> would not be reflected in the profile of any fragment instance.
> This will help reduce noise in the logs when the log level is bumped up to 
> find other real issues that VLOG_FILE can help with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2751) quote in WITH block's comment breaks shell

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511334#comment-16511334
 ] 

ASF subversion and git services commented on IMPALA-2751:
-

Commit d3362bd43c1e8a9f36fd48379ed04170cf162081 in impala's branch 
refs/heads/2.x from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=d3362bd ]

IMPALA-2751: Matching quotes are not required in comments

This patch fixes the issue where a non-matching quote inside a comment
caused the shell not to terminate.

The fix is to strip any SQL comments before sending the statement to
shlex, since shlex does not understand SQL comments and will raise an
exception when it sees an unmatched quote, regardless of whether the
quote is inside a comment.

Testing:
- Added new shell tests
- Ran all end-to-end shell tests on Python 2.6 and Python 2.7

Change-Id: I2feae34026a7e63f3d31489f757f093a73ca5d2c
Reviewed-on: http://gerrit.cloudera.org:8080/10541
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> quote in WITH block's comment breaks shell
> --
>
> Key: IMPALA-2751
> URL: https://issues.apache.org/jira/browse/IMPALA-2751
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
> Environment: CDH5.4.8
>Reporter: Marcell Szabo
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: impala-shell, shell, usability
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Steps to reproduce:
> $ cat > test.sql
> with a as (
> select 'a'
> -- shouldn't matter
> ) 
> select * from a; 
> $ impala-shell -f test.sql 
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> Starting Impala Shell without Kerberos authentication
> Connected to host:21000
> Server version: impalad version 2.2.0-cdh5 RELEASE (build 
> 1d0b017e2441dd8950924743d839f14b3995e259)
> Traceback (most recent call last):
>   File "/usr/lib/impala-shell/impala_shell.py", line 1006, in 
> execute_queries_non_interactive_mode(options)
>   File "/usr/lib/impala-shell/impala_shell.py", line 922, in 
> execute_queries_non_interactive_mode
> if shell.onecmd(query) is CmdStatus.ERROR:
>   File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
> return func(arg)
>   File "/usr/lib/impala-shell/impala_shell.py", line 762, in do_with
> tokens = list(lexer)
>   File "/usr/lib64/python2.6/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib64/python2.6/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib64/python2.6/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> Also, copy-pasting the query interactively, the line never closes.
> Strangely, the issue only seems to occur in presence of the WITH block.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6929) Create Kudu table syntax does not allow multi-column range partitions

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511336#comment-16511336
 ] 

ASF subversion and git services commented on IMPALA-6929:
-

Commit bf2124bf30bb768f06a614bdfa260a6877b1da35 in impala's branch 
refs/heads/master from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=bf2124b ]

IMPALA-6929: Support multi-column range partitions for Kudu

Kudu allows specifying range partitions over multiple columns. Impala
already has support for doing this when the partitions are specified
with '=', but if the partitions are specified with '<' or '<=', the
parser would return an error.

This patch modifies the parser to allow for creating Kudu tables like:
create table kudu_test (a int, b int, primary key(a, b))
  partition by range(a, b) (partition (0, 0) <= values < (1, 1));
and similarly to alter partitions like:
alter table kudu_test add range partition (1, 1) <= values < (2, 2);

Testing:
- Modified functional_kudu.jointbl's schema so that we have a table
  in functional with a multi-column range partition to test things
  against.
- Added FE and E2E tests for CREATE and ALTER.

Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Reviewed-on: http://gerrit.cloudera.org:8080/10441
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Create Kudu table syntax does not allow multi-column range partitions
> -
>
> Key: IMPALA-6929
> URL: https://issues.apache.org/jira/browse/IMPALA-6929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Dan Burkert
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> The Impala CREATE TABLE syntax guide includes this bit of grammar in the Kudu 
> partitioning section:
> {code:java}
> range_clause ::=
>   RANGE [ (pk_col [, ...]) ]
>   (
> {
>   PARTITION constant_expression range_comparison_operator VALUES 
> range_comparison_operator constant_expression
>   | PARTITION VALUE = constant_expression_or_tuple
> }
>[, ...]
>   ){code}
> This is suspicious because {{constant_expression}} is used in the range 
> clause, and {{constant_expression_or_tuple}} is used in the single-value 
> clause.  I believe both should allow for tuples.
> In other words, today a CREATE TABLE statement such as
> {code:java}
> CREATE TABLE t (a BIGINT, b BIGINT, PRIMARY KEY (a, b))
> PARTITION BY RANGE (a, b) (
>     PARTITION (0, 0) <= VALUES < (10, 0)
> ) STORED AS KUDU;{code}
> results in a syntax error, and it should not.  CC [~twmarshall]






[jira] [Commented] (IMPALA-7158) Incorrect init of HdfsScanNodeBase::progress_

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511329#comment-16511329
 ] 

ASF subversion and git services commented on IMPALA-7158:
-

Commit 3a727a217212f44f953ab0461adf9b9f87dbf9a2 in impala's branch 
refs/heads/2.x from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3a727a2 ]

IMPALA-7158: Fix HdfsScanNodeBase::progress_'s init

(Testing) Verified that the correct node id is being logged
with this patch and --v=2.

Change-Id: Id2a738edea80ff3fb13ff368b4093c8b4ef34df7
Reviewed-on: http://gerrit.cloudera.org:8080/10672
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Incorrect init of HdfsScanNodeBase::progress_
> -
>
> Key: IMPALA-7158
> URL: https://issues.apache.org/jira/browse/IMPALA-7158
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: bharath v
>Assignee: bharath v
>Priority: Minor
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> I was digging into something else and came across this. The initialization of 
> the scan split progress is incorrect 
> {noformat}
> Status HdfsScanNodeBase::Open(RuntimeState* state) {
>   RETURN_IF_ERROR(ScanNode::Open(state));
>   .
>   .
>   progress_.Init(Substitute("Splits complete (node=$0)", 
> total_splits),total_splits);
>   return Status::OK();
> }{noformat}
> It should be {{progress_.Init(Substitute("Splits complete (node=$0)", 
> id_),total_splits);}} instead






[jira] [Commented] (IMPALA-7157) Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511332#comment-16511332
 ] 

ASF subversion and git services commented on IMPALA-7157:
-

Commit 4cad55077c5fc2a630b9177bc3799d7bad7087ad in impala's branch 
refs/heads/2.x from [~sailesh]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=4cad550 ]

IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most
verbose logging level, but is higher than the default), we pretty print
the profile for every fragment instance. This is a very expensive
operation: serializing the profile is non-trivial (see
RuntimeProfile::PrettyPrint()), and printing large amounts of
information to the logs isn't cheap either. Lastly, it is very noisy.

This seems unnecessary, since all the profiles are merged on the
coordinator side. One could argue that it might be necessary when an
executor fails to send the profile to the coordinator, but that
signifies a network issue which would not be reflected in the profile
of any fragment instance.

This will help reduce noise in the logs when the log level is
bumped up to find other real issues that VLOG_FILE can help with.

Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Reviewed-on: http://gerrit.cloudera.org:8080/10669
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins 


> Avoid unnecessarily pretty printing profiles per fragment instance
> --
>
> Key: IMPALA-7157
> URL: https://issues.apache.org/jira/browse/IMPALA-7157
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Minor
>  Labels: logs
>
> In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most verbose 
> logging level, but is higher than the default), we pretty print the profile 
> for every fragment instance. This is a very expensive operation: serializing 
> the profile is non-trivial (see RuntimeProfile::PrettyPrint()), and printing 
> large amounts of information to the logs isn't cheap either. Lastly, it is 
> very noisy.
> This seems unnecessary, since all the profiles are merged on the coordinator 
> side. One could argue that it might be necessary when an executor fails to 
> send the profile to the coordinator, but that signifies a network issue which 
> would not be reflected in the profile of any fragment instance.
> This will help reduce noise in the logs when the log level is bumped up to 
> find other real issues that VLOG_FILE can help with.






[jira] [Commented] (IMPALA-6408) [DOCS] Description of "shuffle" hint does not mention changes in IMPALA-3930

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511328#comment-16511328
 ] 

ASF subversion and git services commented on IMPALA-6408:
-

Commit 342b2f4ba826c560f40da57a4108aaac090e81d9 in impala's branch 
refs/heads/2.x from [~arodoni_cloudera]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=342b2f4 ]

IMPALA-6408: [DOCS] Add a missing info about SHUFFLE

Change-Id: I5738557354c384aab983f64722dde5944037aed7
Reviewed-on: http://gerrit.cloudera.org:8080/10685
Reviewed-by: Csaba Ringhofer 
Reviewed-by: Alex Rodoni 
Tested-by: Impala Public Jenkins 


> [DOCS] Description of "shuffle" hint does not mention changes in IMPALA-3930
> 
>
> Key: IMPALA-6408
> URL: https://issues.apache.org/jira/browse/IMPALA-6408
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Csaba Ringhofer
>Assignee: Alex Rodoni
>Priority: Minor
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> The change in IMPALA-3930 states that if only one partition is written 
> (because all partitioning columns are constant or the target table is not 
> partitioned), then the "shuffle" hint leads to a plan where all rows are 
> merged at the coordinator where the table sink is executed.
> The documentation of the "shuffle" hint does not mention this behavior. 
>  






[jira] [Commented] (IMPALA-2746) Backend tests should pass with leak sanitizer enabled

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511331#comment-16511331
 ] 

ASF subversion and git services commented on IMPALA-2746:
-

Commit 7c5eb933b810c476505c9f5a044b749ed630da86 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=7c5eb93 ]

IMPALA-7145: fix leak of OpenSSL context when spilling

Add a RAII wrapper for the OpenSSL context that automatically frees
on all exit paths from the function.

Add a backend test wrapper that enables LeakSanitizer for an individual
test. This is a step towards IMPALA-2746.

Fix version check bug in asan.h.

Testing:
Enable LeakSanitizer for openssl-util-test. This reliably found the bug.

Ran core tests under ASAN.

Change-Id: I98760ed8f31b18b489a156f945c29c95c9bf3184
Reviewed-on: http://gerrit.cloudera.org:8080/10666
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Backend tests should pass with leak sanitizer enabled
> -
>
> Key: IMPALA-2746
> URL: https://issues.apache.org/jira/browse/IMPALA-2746
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Martin Grund
>Priority: Minor
>  Labels: resource-management, test-infra
>
> Currently, when running the backend tests with ASAN, the build will fail if 
> memory leak detection is enabled. We should investigate where leaks occur and 
> fix them to make sure we can benefit from the leak detection as well.






[jira] [Commented] (IMPALA-5931) Don't synthesize block metadata in the catalog for S3/ADLS

2018-06-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511325#comment-16511325
 ] 

ASF subversion and git services commented on IMPALA-5931:
-

Commit 11554a17c75b242767d5a50d66bc2874aa545c77 in impala's branch 
refs/heads/2.x from [~vercego]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=11554a1 ]

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example filesystems for which this is done
include S3, ADLS, and the local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition, a bug is avoided where non-splittable files were being
split anyway to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Reviewed-on: http://gerrit.cloudera.org:8080/8523
Reviewed-by: Vuk Ercegovac 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10692
Reviewed-by: Impala Public Jenkins 


> Don't synthesize block metadata in the catalog for S3/ADLS
> --
>
> Key: IMPALA-5931
> URL: https://issues.apache.org/jira/browse/IMPALA-5931
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Dan Hecht
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Today, the catalog synthesizes block metadata for S3/ADLS by just breaking up 
> splittable files into "blocks" with the FileSystem's default block size. 
> Rather than carrying these blocks around in the catalog and distributing them 
> to all impalads, we might as well generate the scan ranges on the fly during 
> planning. That would save the memory and network bandwidth the blocks consume.
> That does mean that the planner will have to instantiate and call the 
> filesystem to get the default block size, but for these filesystems, that's 
> just a matter of reading the config.
> Perhaps the same can be done for HDFS erasure coding, though that depends on 
> what a block location actually means in that context and whether they contain 
> useful info.
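Generating scan ranges on the fly, as described above, amounts to slicing a file by the filesystem's default block size. The following sketch is illustrative only; `generate_scan_ranges`, its parameters, and the `(offset, length)` tuple output are invented stand-ins, not Impala's actual API.

```python
def generate_scan_ranges(file_len, block_size, splittable=True):
    # For filesystems without real block metadata (e.g. S3, ADLS, local fs),
    # derive scan ranges directly from the file length and the filesystem's
    # default block size instead of synthesizing blocks in the catalog.
    # Non-splittable files (e.g. gzip-compressed text) become a single range.
    if not splittable:
        return [(0, file_len)]
    ranges, offset = [], 0
    while offset < file_len:
        length = min(block_size, file_len - offset)
        ranges.append((offset, length))  # (start offset, length) pairs
        offset += length
    return ranges
```

Because only the file length and block size are needed, the coordinator can produce these ranges at query time from a compact split specification, rather than shipping per-block metadata from the catalog.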






[jira] [Closed] (IMPALA-5655) Update docs for max_block_mgr_memory/buffer_pool_limit

2018-06-13 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-5655.
---
Resolution: Fixed

> Update docs for max_block_mgr_memory/buffer_pool_limit
> --
>
> Key: IMPALA-5655
> URL: https://issues.apache.org/jira/browse/IMPALA-5655
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Minor
>
> The undocumented max_block_mgr_memory query option will be replaced with a 
> new query option (probably buffer_pool_limit, but TBD). max_block_mgr_memory 
> doesn't show up in the docs output but there is a .xml file in the docs/ 
> directory.
> We should update the docs to reflect this change.
> It may be worth documenting buffer_pool_limit, but only as a very advanced 
> option that may be removed or have its behaviour changed.






[jira] [Updated] (IMPALA-5937) Docs are missing some query options

2018-06-13 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-5937:

Description: 
I noticed that the following options show up in "SET" in impala-shell but don't 
have corresponding documentation entries. I know BUFFER_POOL_LIMIT is mentioned 
in IMPALA-5655.

--BUFFER_POOL_LIMIT--
 -DECIMAL_V2-
 -DEFAULT_SPILLABLE_BUFFER_SIZE-
 DISABLE_CODEGEN_ROWS_THRESHOLD
 ENABLE_EXPR_REWRITES
 -MAX_ROW_SIZE-
 -MIN_SPILLABLE_BUFFER_SIZE-
 -PARQUET_ARRAY_RESOLUTION-
 PARQUET_DICTIONARY_FILTERING
 PARQUET_READ_STATISTICS
 STRICT_MODE

  was:
I noticed that the following options show up in "SET" in impala-shell but don't 
have corresponding documentation entries. I know BUFFER_POOL_LIMIT is mentioned 
in IMPALA-5655. 


--BUFFER_POOL_LIMIT--
DECIMAL_V2
-DEFAULT_SPILLABLE_BUFFER_SIZE-
DISABLE_CODEGEN_ROWS_THRESHOLD
ENABLE_EXPR_REWRITES
-MAX_ROW_SIZE-
-MIN_SPILLABLE_BUFFER_SIZE-
PARQUET_ARRAY_RESOLUTION
PARQUET_DICTIONARY_FILTERING
PARQUET_READ_STATISTICS
STRICT_MODE



> Docs are missing some query options
> ---
>
> Key: IMPALA-5937
> URL: https://issues.apache.org/jira/browse/IMPALA-5937
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Philip Zeyliger
>Assignee: Alex Rodoni
>Priority: Major
>
> I noticed that the following options show up in "SET" in impala-shell but 
> don't have corresponding documentation entries. I know BUFFER_POOL_LIMIT is 
> mentioned in IMPALA-5655.
> --BUFFER_POOL_LIMIT--
>  -DECIMAL_V2-
>  -DEFAULT_SPILLABLE_BUFFER_SIZE-
>  DISABLE_CODEGEN_ROWS_THRESHOLD
>  ENABLE_EXPR_REWRITES
>  -MAX_ROW_SIZE-
>  -MIN_SPILLABLE_BUFFER_SIZE-
>  -PARQUET_ARRAY_RESOLUTION-
>  PARQUET_DICTIONARY_FILTERING
>  PARQUET_READ_STATISTICS
>  STRICT_MODE


