[jira] [Commented] (IMPALA-12419) TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build

2024-02-08 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815658#comment-17815658
 ] 

Steve Loughran commented on IMPALA-12419:
-

hey, you still seeing this?

> TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build
> --
>
> Key: IMPALA-12419
> URL: https://issues.apache.org/jira/browse/IMPALA-12419
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Wenzhe Zhou
>Priority: Major
>
> TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build
> {code:java}
> query_test/test_iceberg.py:246: in test_migrated_table_field_id_resolution
> "iceberg_migrated_alter_test", "parquet")
> common/file_utils.py:58: in create_iceberg_table_from_directory
> check_call(['hdfs', 'dfs', '-rm', '-f', '-r', hdfs_dir])
> /data/jenkins/workspace/impala-cdw-master-core-s3/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
>  in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hdfs', 'dfs', '-rm', '-f', '-r', 
> '/test-warehouse/iceberg_migrated_alter_test']' returned non-zero exit status 
> 1
> {code}
> Standard Error
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_eb4581e8` 
> CASCADE;
> -- 2023-09-04 03:37:39,538 INFO MainThread: Started query 
> 4149ca931eb6d16c:97d680a8
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_eb4581e8`;
> -- 2023-09-04 03:37:45,054 INFO MainThread: Started query 
> 3d4215586e7766ad:333cb5bd
> -- 2023-09-04 03:37:45,356 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_eb4581e8" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up JAVA_TOOL_OPTIONS:  
> -javaagent:/data/jenkins/workspace/impala-cdw-master-core-s3/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
> 23/09/04 03:37:46 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 23/09/04 03:37:46 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 23/09/04 03:37:46 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 23/09/04 03:37:46 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.
> 23/09/04 03:37:48 WARN fs.FileSystem: Failed to initialize fileystem 
> s3a://impala-test-uswest2-2: java.nio.file.AccessDeniedException: 
> impala-test-uswest2-2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: 
> No AWS Credentials provided by TemporaryAWSCredentialsProvider 
> SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
> IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to 
> load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or 
> AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
> rm: impala-test-uswest2-2: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> com.amazonaws.SdkClientException: Unable to load AWS credentials from 
> environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and 
> AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
> {code}





[jira] [Commented] (IMPALA-12419) TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build

2023-09-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763691#comment-17763691
 ] 

Steve Loughran commented on IMPALA-12419:
-

The test run hasn't included any credentials; it's not running in EC2 either:
{code}
rm: impala-test-uswest2-2: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
com.amazonaws.SdkClientException: Unable to load AWS credentials from 
environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY 
(or AWS_SECRET_ACCESS_KEY))

{code}
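
For reference, the first two providers in that chain look for static keys in
the hadoop configuration; a minimal sketch of the standard settings
(placeholder values, not anything from this build):

{code:xml}
<!-- picked up by SimpleAWSCredentialsProvider -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
{code}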




[jira] (IMPALA-11629) Support Huawei OBS (Object Storage Service) FileSystem

2023-02-20 Thread Steve Loughran (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-11629 ]


Steve Loughran deleted comment on IMPALA-11629:
-

was (Author: ste...@apache.org):
Why was this done? We had problems with the COS stuff, and as it's not
supported by CDH we should just cut it from our distributions, even if the
source is there for consistency with Apache Impala.

> Support Huawei OBS (Object Storage Service) FileSystem
> --
>
> Key: IMPALA-11629
> URL: https://issues.apache.org/jira/browse/IMPALA-11629
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: yx91490
>Assignee: yx91490
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: hdfs_ee_test.log, hdfs_ee_test_patch10.log, 
> obs_ee_test.log, obs_ee_test_core_results.csv, obs_ee_test_patch10.log
>
>







[jira] [Commented] (IMPALA-11629) Support Huawei OBS (Object Storage Service) FileSystem

2023-02-20 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691216#comment-17691216
 ] 

Steve Loughran commented on IMPALA-11629:
-

Hadn't noticed this was an external jira; I'll cut that comment.

The problem was HADOOP-18159 with old libraries, plus the *detail* that even
shaded SDKs (including Amazon's) weren't shading the list of public domains,
which broke s3a interaction with newer AWS regions. The first fix was
HADOOP-18307; the second is actually upgrading the JAR.

Just be careful when pulling in shaded dependencies that they don't do the
wrong thing here. (Oh, and cloudstore storediag now prints out where the file
came from, to help track it down in future:
https://github.com/steveloughran/cloudstore )








[jira] [Commented] (IMPALA-11629) Support Huawei OBS (Object Storage Service) FileSystem

2023-02-16 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689660#comment-17689660
 ] 

Steve Loughran commented on IMPALA-11629:
-

Why was this done? We had problems with the COS stuff, and as it's not
supported by CDH we should just cut it from our distributions, even if the
source is there for consistency with Apache Impala.








[jira] [Commented] (IMPALA-11662) Improve "refresh iceberg_tbl_on_oss;" performance

2023-02-13 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688040#comment-17688040
 ] 

Steve Loughran commented on IMPALA-11662:
-

It came with HDFS-14478; you just need to set the baseline hadoop version to
one with the API.

> Improve "refresh iceberg_tbl_on_oss;" performance
> -
>
> Key: IMPALA-11662
> URL: https://issues.apache.org/jira/browse/IMPALA-11662
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Li Penglin
>Assignee: Li Penglin
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.3.0
>
>
> Since Iceberg provides rich metadata, the cost of directory listing on OSS 
> service e.g. S3A is higher than the cost on HDFS, we could create the file 
> descriptors from Iceberg metadata instead of using 
> org.apache.hadoop.fs.FileSystem#listFiles. 
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189.
> The only thing missing there is the last_modification_time of the files. But 
> since Iceberg files are immutable, maybe we could just come up with a special 
> timestamp for these files.






[jira] [Commented] (IMPALA-11807) TestIcebergTable.test_avro_file_format and TestIcebergTable.test_mixed_file_format failed

2022-12-20 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649714#comment-17649714
 ] 

Steve Loughran commented on IMPALA-11807:
-

Are you using the right filesystem in the openFile() call? The file being
opened has an s3a URL, but the exception has the same path under
hdfs://localhost:20500/, and there's no HDFS daemon there.
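
A minimal sketch of binding to the path's own filesystem before the
openFile() call (names here are illustrative, not Impala's actual code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFromRightFs {
  public static void main(String[] args) throws Exception {
    // Resolve the filesystem from the path's own scheme (s3a://...), rather
    // than the process-wide default filesystem (hdfs://localhost:20500/).
    Path metadataFile = new Path(args[0]);
    FileSystem fs = metadataFile.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.openFile(metadataFile).build().get()) {
      System.out.println("first byte: " + in.read());
    }
  }
}
{code}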

> TestIcebergTable.test_avro_file_format and 
> TestIcebergTable.test_mixed_file_format failed
> -
>
> Key: IMPALA-11807
> URL: https://issues.apache.org/jira/browse/IMPALA-11807
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Wenzhe Zhou
>Assignee: Noemi Pap-Takacs
>Priority: Major
>
> TestIcebergTable.test_avro_file_format failed after merging patch 
> IMPALA-11708 (Add support for mixed Iceberg tables with AVRO file format).
> {code:java}
> *Error Message*
> query_test/test_iceberg.py:906: in test_avro_file_format 
> self.run_test_case('QueryTest/iceberg-avro', vector, unique_database) 
> common/impala_test_suite.py:712: in run_test_case result = exec_fn(query, 
> user=test_section.get('USER', '').strip() or None) 
> common/impala_test_suite.py:650: in __exec_in_impala result = 
> self.__execute_query(target_impalad_client, query, user=user) 
> common/impala_test_suite.py:986: in __execute_query return 
> impalad_client.execute(query, user=user) common/impala_connection.py:212: in 
> execute return self.__beeswax_client.execute(sql_stmt, user=user) 
> beeswax/impala_beeswax.py:189: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:365: in __execute_query handle = 
> self.execute_query_async(query_string, user=user) 
> beeswax/impala_beeswax.py:359: in execute_query_async handle = 
> self.__do_rpc(lambda: self.imp_service.query(query,)) 
> beeswax/impala_beeswax.py:522: in __do_rpc raise 
> ImpalaBeeswaxException(self.__build_error_message(b), b) E   
> ImpalaBeeswaxException: ImpalaBeeswaxException: E   INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> E   MESSAGE: AnalysisException: Failed 
> to load metadata for table: 'functional_parquet.iceberg_avro_format' E   
> CAUSED BY: TableLoadingException: IcebergTableLoadingException: Error loading 
> metadata for Iceberg table 
> s3a://impala-test-uswest2-2/test-warehouse/functional_parquet.db/iceberg_avro_format
>  E   CAUSED BY: RuntimeIOException: Failed to open input stream for file: 
> hdfs://localhost:20500/test-warehouse/functional_parquet.db/iceberg_avro_format/metadata/snap-5594844384179945437-1-6b11ef63-7b9a-48a5-a448-7cc329eb85ec.avro
>  E   CAUSED BY: ConnectException: Call From 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1b22.vpc.cloudera.com/127.0.0.1 to 
> localhost:20500 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused E   CAUSED BY: 
> ConnectException: Connection refused
> *Stacktrace*
> query_test/test_iceberg.py:906: in test_avro_file_format
> self.run_test_case('QueryTest/iceberg-avro', vector, unique_database)
> common/impala_test_suite.py:712: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:650: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:986: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E   MESSAGE: AnalysisException: Failed to load metadata for table: 
> 'functional_parquet.iceberg_avro_format'
> E   CAUSED BY: TableLoadingException: IcebergTableLoadingException: Error 
> loading metadata for Iceberg table 
> s3a://impala-test-uswest2-2/test-warehouse/functional_parquet.db/iceberg_avro_format
> E   CAUSED BY: RuntimeIOException: Failed to open input stream for file: 
> 

[jira] [Commented] (IMPALA-11662) Improve "refresh iceberg_tbl_on_oss;" performance

2022-11-02 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627987#comment-17627987
 ] 

Steve Loughran commented on IMPALA-11662:
-

If you use the iterator APIs to list files (listStatusIterator(),
listFiles(), listLocatedStatus(), ...), you get an iterator back which, on
both abfs and s3a, will do background fetches of pages of data rather than
block until all the pages are back. That hides a lot of the latency.
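
A minimal sketch of that incremental-listing pattern (illustrative, not
Impala's actual loader code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class IncrementalListing {
  public static void main(String[] args) throws Exception {
    Path tableDir = new Path(args[0]);  // e.g. s3a://bucket/warehouse/table
    FileSystem fs = tableDir.getFileSystem(new Configuration());
    // On s3a/abfs the iterator prefetches later pages in the background,
    // so processing the first page overlaps with the remote listing.
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(tableDir, true);
    while (files.hasNext()) {
      LocatedFileStatus st = files.next();
      System.out.println(st.getPath() + " " + st.getLen());
    }
  }
}
{code}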

Note that listing is a lot slower on versioned buckets where older versions
of files have been overwritten/deleted; even deleted directory markers cause
problems. Are you testing on versioned buckets? If so, turning off directory
marker deletion makes a big difference.
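
The marker policy is a filesystem option; a sketch of the relevant setting
(available from Hadoop 3.3.1):

{code:xml}
<!-- keep directory markers rather than deleting them as files are created -->
<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>keep</value>
</property>
{code}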

> Improve "refresh iceberg_tbl_on_oss;" performance
> -
>
> Key: IMPALA-11662
> URL: https://issues.apache.org/jira/browse/IMPALA-11662
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: LiPenglin
>Priority: Major
>  Labels: impala-iceberg
>
> Since Iceberg provides rich metadata, the cost of directory listing on OSS 
> service e.g. S3A is higher than the cost on HDFS, we could create the file 
> descriptors from Iceberg metadata instead of using 
> org.apache.hadoop.fs.FileSystem#listFiles. 
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189.
> The only thing missing there is the last_modification_time of the files. But 
> since Iceberg files are immutable, maybe we could just come up with a special 
> timestamp for these files.






[jira] [Commented] (IMPALA-11592) TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build

2022-09-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606596#comment-17606596
 ] 

Steve Loughran commented on IMPALA-11592:
-

Is this reproducible? Our guess is that it can only be triggered by a GC
happening at a certain point, which we will add resilience for (keeping a
strong reference in the method, safety checks afterwards).

> TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
> ---
>
> Key: IMPALA-11592
> URL: https://issues.apache.org/jira/browse/IMPALA-11592
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: catalogd.INFO.gz, impalad.INFO.gz, 
> impalad_node1.INFO.gz, impalad_node2.INFO.gz
>
>
> custom_cluster.test_local_catalog.TestLocalCatalogRetries.test_fetch_metadata_retry
>  fails in a S3 build:
> {noformat}
> custom_cluster/test_local_catalog.py:317: in test_fetch_metadata_retry
> seen = self._check_metadata_retries(queries)
> custom_cluster/test_local_catalog.py:293: in _check_metadata_retries
> assert failed_queries.empty(),\
> E   AssertionError: Failed query count non zero: [('refresh 
> functional.alltypes', 'ImpalaBeeswaxException:\n Query 
> aborted:TableLoadingException: Refreshing file and block metadata for 24 
> paths for table functional.alltypes: failed to load 1 paths. Check the 
> catalog server log for more details.\n\n')]
> E   assert <bound method Queue.empty of <Queue.Queue instance at 
> 0x7fe3fb710128>>()
> E    +  where <bound method Queue.empty of <Queue.Queue instance at 
> 0x7fe3fb710128>> = <Queue.Queue instance at 0x7fe3fb710128>.empty
> {noformat}
> Looking into the catalog server log, there is a NullPointerException:
> {noformat}
> E0916 21:15:27.469425 25508 ParallelFileMetadataLoader.java:171] Refreshing 
> file and block metadata for 24 paths for table functional.alltypes 
> encountered an error loading data for path 
> s3a://impala-test-uswest2-2/test-warehouse/alltypes/year=2010/month=9
> Java exception follows:
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
> at 
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:781)
> at org.apache.impala.catalog.HdfsTable.access$100(HdfsTable.java:153)
> at 
> org.apache.impala.catalog.HdfsTable$PartitionDeltaUpdater.apply(HdfsTable.java:1534)
> at 
> org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1411)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1254)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1179)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2551)
> at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6158)
> at 
> org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:287)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing.java:621)
> at 
> org.apache.hadoop.fs.s3a.Listing.createObjectListingIterator(Listing.java:163)
> at 
> org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:144)
> at 
> org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:212)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4790)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$37(S3AFileSystem.java:4732)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2382)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4731)
> at 
> org.apache.impala.common.FileSystemUtil.listFiles(FileSystemUtil.java:754)
> at 
> org.apache.impala.common.FileSystemUtil.listStatus(FileSystemUtil.java:729)
> at 
> 

[jira] [Commented] (IMPALA-11592) TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build

2022-09-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606534#comment-17606534
 ] 

Steve Loughran commented on IMPALA-11592:
-

While we try to fix this, could you set
org.apache.hadoop.fs.statistics.impl.IOStatisticsContextIntegration to log at
debug, to see if it notes inference of GC events?
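
With log4j that would be something along these lines (a sketch; adapt to
however the JVM logging is configured in your build):

{code}
# enable debug logging for the IOStatistics context integration
log4j.logger.org.apache.hadoop.fs.statistics.impl.IOStatisticsContextIntegration=DEBUG
{code}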


[jira] [Created] (IMPALA-11207) use hadoop-cloud-storage as the cloud store connector dependency

2022-03-29 Thread Steve Loughran (Jira)
Steve Loughran created IMPALA-11207:
---

 Summary: use hadoop-cloud-storage as the cloud store connector 
dependency
 Key: IMPALA-11207
 URL: https://issues.apache.org/jira/browse/IMPALA-11207
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.0.1
Reporter: Steve Loughran


Use hadoop-cloud-storage as a dependency to get the full set of cloud store
dependencies of a release, with all the dependencies you don't need stripped
out. In particular, hadoop-common is cut, so there's no need to repeat
whatever excludes you have there.
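
A sketch of the corresponding maven dependency (the version should track
whatever hadoop version the build already uses):

{code:xml}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
</dependency>
{code}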






[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2021-01-08 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261474#comment-17261474
 ] 

Steve Loughran commented on IMPALA-8544:



Going to say "no point collecting S3Guard metrics" given it is now surplus.
But the IOStatistics API will be supported by both S3A and ABFS for
individual streams, RemoteIterators for listings, etc.

If you can use that API, you can collect statistics across all the stores.
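
A minimal sketch of pulling the statistics off a stream (assuming the
HADOOP-16830 API; names may shift as it stabilises):

{code:java}
import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsToPrettyString;
import static org.apache.hadoop.fs.statistics.IOStatisticsSupport.retrieveIOStatistics;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatistics;

public class StreamStats {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      in.read();
      // returns null if the stream doesn't implement IOStatisticsSource
      IOStatistics stats = retrieveIOStatistics(in);
      System.out.println(ioStatisticsToPrettyString(stats));
    }
  }
}
{code}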

> Expose additional S3A / S3Guard metrics
> ---
>
> Key: IMPALA-8544
> URL: https://issues.apache.org/jira/browse/IMPALA-8544
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should 
> consider exposing to Impala users. The full list of statistics can be found 
> in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
> performed (put, get, etc.), invocation counts for various {{FileSystem}} 
> methods, stream statistics (bytes read, written, etc.), etc.
> Some interesting stats that stand out:
>  * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
> of TCP connection aborts, a high value would indicate performance issues
>  * "stream_read_exceptions" : "Number of exceptions invoked on input streams" 
> - incremented whenever an {{IOException}} is caught while reading (these 
> exception don't always get propagated to Impala because they trigger a retry)
>  * "store_io_throttled": "Requests throttled and retried" - looks like it 
> tracks the number of times the fs retries an operation because the original 
> request hit a throttling exception
>  * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - 
> looks like it tracks the number of times the fs retries S3Guard operations
>  * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
> events" - similar to "store_io_throttled" but looks like it is specific to 
> S3Guard
> We should consider how to expose these metrics via Impala logs / runtime 
> profiles.
> There are a few options:
>  * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
> via the {{FileSystem#getStorageStatistics}} method; the 
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
> however, I think the stats might be aggregated globally, which would make it 
> hard to create per-query specific metrics
>  * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it 
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some 
> API (haven't looked into this yet)
>  * {{S3AInputStream#toString}} dumps the statistics from 
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
> {{S3AFileSystem#toString}} dumps them all as well
>  * {{S3AFileSystem}} updates the stats in 
> {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
> etc.)
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared 
> across threads.






[jira] [Commented] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-09-08 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192126#comment-17192126
 ] 

Steve Loughran commented on IMPALA-10073:
-

Which bits of the AWS SDK have you cut?



> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a similar technique used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.






[jira] [Commented] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-08-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175003#comment-17175003
 ] 

Steve Loughran commented on IMPALA-10028:
-

I just closed HADOOP-17197 as WONTFIX. Maybe consider cloud-enabled vs.
on-prem docker images.

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.






[jira] [Commented] (IMPALA-9759) Revisit integration of snapshot dataload with s3guard

2020-07-17 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159820#comment-17159820
 ] 

Steve Loughran commented on IMPALA-9759:


+1 for unique keys. Otherwise: if you configure S3Guard to use etag version
tracking, it will detect the mismatch when opening a file and retry, waiting
for the version it knows about to become available.
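
A sketch of the change-detection options being referred to:

{code:xml}
<!-- track objects by etag and have the server reject mismatched versions -->
<property>
  <name>fs.s3a.change.detection.source</name>
  <value>etag</value>
</property>
<property>
  <name>fs.s3a.change.detection.mode</name>
  <value>server</value>
</property>
{code}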

> Revisit integration of snapshot dataload with s3guard
> -
>
> Key: IMPALA-9759
> URL: https://issues.apache.org/jira/browse/IMPALA-9759
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Critical
>  Labels: broken-build, flaky
>
> Sometimes, the s3 jobs (which use s3guard for consistency) sees test failures 
> due to missing files from the dataload snapshot (see bottom). This may be 
> related to the interaction of snapshot loading with s3guard. We should nail 
> down exactly the right procedure for loading the snapshot. Currently, we do 
> the following:
> 1. Remove any data from the s3bucket via the s3 commandline
> 2. Create the s3guard dynamodb table (or reuse existing one if a previous job 
> failed without deleting the old dynamodb table)
> 3. Prune any existing entries from that table
> 4. Load the snapshot to the s3 bucket
> In theory, this leave s3guard with an empty dynamodb table and an s3bucket 
> with data. As tests progress and try to access the s3 bucket, s3guard would 
> see that there is no entry in the dynamodb table and then check the 
> underlying s3 bucket.
> We need to revisit these steps and verify that everything is being done 
> correctly.
> {noformat}
> metadata/test_metadata_query_statements.py:70: in test_show_stats
> self.run_test_case('QueryTest/show-stats', vector, "functional")
> common/impala_test_suite.py:687: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1
>  == '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1
> E '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10
>  == '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10
> E '2009','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11
>  == '2009','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11
> E '2009','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12
>  == '2009','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12
> E '2009','2',280,1,'18.12KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2
>  == '2009','2',280,1,'18.12KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2
> E '2009','3',310,1,'20.06KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3
>  == '2009','3',310,1,'20.06KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3
> E '2009','4',300,1,'19.61KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4
>  == '2009','4',300,1,'19.61KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4
> E '2009','5',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5
>  != '2009','5',0,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5
> E '2009','6',300,1,'19.71KB','NOT CACHED','NOT 
> 

[jira] [Commented] (IMPALA-3717) Additional s3 setting to allow encryption algorithm

2020-05-18 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110232#comment-17110232
 ] 

Steve Loughran commented on IMPALA-3717:


This has long been done in s3a; please close.
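
For reference, a sketch of the s3a options (Hadoop 3.x key names; the key ARN
below is a placeholder):

{code:xml}
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <!-- placeholder value; omit to use the account's default KMS key -->
  <name>fs.s3a.server-side-encryption.key</name>
  <value>arn:aws:kms:us-west-2:123456789012:key/example-key-id</value>
</property>
{code}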

> Additional s3 setting to allow encryption algorithm
> ---
>
> Key: IMPALA-3717
> URL: https://issues.apache.org/jira/browse/IMPALA-3717
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Pavas Garg
>Priority: Minor
>  Labels: s3
>
> distcp and impala requires an additional s3 setting on the configuration 
> 1. To allow not only the selection of encryption algorithm but 
> 2. Also the master key name (which will be held within the AWS KMS).
> The S3 API has the following option on the rest service to achieve this 
> "x-amz-server-side-encryption-aws-kms-key-id". 
> This should just be a case of adding the config option and passing this onto 
> the S3 call.
> Please see Server-Side Encryption Specific Request Headers on -
> http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html.






[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2020-05-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104467#comment-17104467
 ] 

Steve Loughran commented on IMPALA-8544:


HADOOP-16830 adds a public statistics API. Please review.







[jira] [Commented] (IMPALA-9702) Incoherent data read issues on S3

2020-05-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101561#comment-17101561
 ] 

Steve Loughran commented on IMPALA-9702:


bq. The second issue is that we intermittently see some files missing. These 
are files copied from the snapshot into s3 at the beginning of the test. This 
is likely to be some consistency issue. I think I'm going to split that out 
into its own issue.

S3 load balancers cache 404s of objects looked for, *even after the object is
created*; there was lots of work in hadoop 3.3.0 to try and nail this down,
with create(path, overwrite=true) skipping HEAD probes, and work on app code
to remove calls to exists(), getFileStatus(), isFile(), etc.

A 404 will stay in the cache while clients issue HEAD/GET requests; you need
to allow 20-30+s after the failing HEAD before the cache entry appears to
expire. When S3Guard is enabled and we find an entry in DDB, we use that
knowledge to spin for a while (90+s with linear backoff) waiting for it to
appear. Unguarded: we just try to eliminate those 404s.

What is the interval between test setup and execution? Could you implement
the probe with backoff yourself?
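
A sketch of such a probe (illustrative only; the long sleeps matter, since
each failing HEAD can keep the 404 cached):

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AwaitVisible {
  /** Probe for a newly written object with backoff; true if it appeared. */
  public static boolean awaitVisible(FileSystem fs, Path path, long timeoutMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long sleepMs = 2000;
    while (System.currentTimeMillis() < deadline) {
      if (fs.exists(path)) {
        return true;
      }
      Thread.sleep(sleepMs);
      sleepMs = Math.min(sleepMs + 2000, 15000);  // linear backoff, capped
    }
    return false;
  }
}
{code}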


> Incoherent data read issues on S3
> -
>
> Key: IMPALA-9702
> URL: https://issues.apache.org/jira/browse/IMPALA-9702
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Bikramjeet Vig
>Assignee: Joe McDonnell
>Priority: Critical
>
> A bunch of tests with extra rows returned or wrong metadata:
> {noformat}
> metadata/test_ddl.py:445: in test_alter_table
> multiple_impalad=self._use_multiple_impalad(vector))
> common/impala_test_suite.py:687: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 1,1,'2012/withslash' == 1,1,'2012/withslash'
> E 2,1,'2012' == 2,1,'2012'
> E 2,1,'2012' == 2,1,'2012'
> E 3,1,'2013' == 3,1,'2013'
> E 3,1,'2013' == 3,1,'2013'
> E 3,1,'2013' == 3,1,'2013'
> E 4,NULL,'NULL' == 4,NULL,'NULL'
> E 4,NULL,'NULL' == 4,NULL,'NULL'
> E 4,NULL,'NULL' == 4,NULL,'NULL'
> E 4,NULL,'NULL' == 4,NULL,'NULL'
> E 4,NULL,'NULL' == 4,NULL,'NULL'
> E None != 5,NULL,'2013'
> E None != 5,NULL,'2013'
> E None != 5,NULL,'2013'
> E None != 5,NULL,'2013'
> E None != 5,NULL,'2013'
> E Number of rows returned (expected vs actual): 11 != 16
> {noformat}
> {noformat}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates
> check_cardinality(result.data, '7.30K')
> metadata/test_explain.py:98: in check_cardinality
> query_result, expected_cardinality=expected_cardinality)
> metadata/test_explain.py:86: in check_row_size_and_cardinality
> assert m.groups()[1] == expected_cardinality
> E   assert '6.99K' == '7.30K'
> E - 6.99K
> E + 7.30K
> {noformat}
> {noformat}
> ERROR:test_configuration:Comparing QueryTestResults (expected vs actual):
> 1,false,2,3,4,5,6,7,'1985-07-15','c2','my va',1 == 
> 1,false,2,3,4,5,6,7,'1985-07-15','c2','my va',1
> None != 1,false,2,3,4,5,6,7,'1985-07-15','c2','my va',1
> Number of rows returned (expected vs actual): 1 != 2
> {noformat}
> {noformat}
> metadata/test_metadata_query_statements.py:70: in test_show_stats
> self.run_test_case('QueryTest/show-stats', vector, "functional")
> common/impala_test_suite.py:687: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
>  == '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
> E '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10'
>  == '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> 

[jira] [Commented] (IMPALA-9702) Incoherent data read issues on S3

2020-05-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100164#comment-17100164
 ] 

Steve Loughran commented on IMPALA-9702:


does this test overwrite an existing file with a new one?


[jira] [Commented] (IMPALA-8577) Crash during OpenSSLSocket.read

2020-02-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034609#comment-17034609
 ] 

Steve Loughran commented on IMPALA-8577:


which openssl version? 1.0.4 doesn't seem to work with openssl 1.1.1 *at all*, 
forcing some backport fun

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9316) Consider coalescing S3 scans

2020-02-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030531#comment-17030531
 ] 

Steve Loughran commented on IMPALA-9316:


FYI, [~omalley] has a PR for vectored IO
https://github.com/apache/hadoop/pull/1830

For S3 & Azure, they would both skip any unnecessary requests against the store 
and be able to do ranged reads in parallel:

* the coalescing code would be done in the store client, based on its knowledge 
of store characteristics
* the client can do parallel reads which you couldn't do yourself
* but we will need a new API.
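
To make the shape concrete, here is a minimal caller-side sketch of a vectored 
read. The names follow the readVectored() API as it eventually landed in Hadoop, 
not necessarily this in-review PR, so treat the specifics as assumptions:

{code:java}
// Sketch only: ask for several ranges at once and let the store client
// coalesce them and issue ranged GETs in parallel. Exception handling elided.
List<FileRange> ranges = Arrays.asList(
    FileRange.createFileRange(0, 8192),          // e.g. a footer probe
    FileRange.createFileRange(1 << 20, 65536));  // e.g. a column chunk
try (FSDataInputStream in = fs.open(path)) {
  in.readVectored(ranges, ByteBuffer::allocate); // reads run asynchronously
  for (FileRange r : ranges) {
    ByteBuffer data = r.getData().get();         // one future per range
  }
}
{code}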

> Consider coalescing S3 scans
> 
>
> Key: IMPALA-9316
> URL: https://issues.apache.org/jira/browse/IMPALA-9316
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> We should consider coalescing S3 reads. IIUC the current {{DiskIoMgr}} code 
> for S3A does not do anything special for scheduling S3 scan ranges. It simply 
> round-robin assigns scans to IO threads.
> I think there might be a smarter algorithm we could employ when scheduling S3 
> reads. A few things to consider:
> * With the migration to {{hdfsPreadFully}}, each S3 scan range should 
> correspond to a single HTTP GET request (assuming the 8 MB limit is not hit, 
> see below)
> * {{read_size}} limits the size of a read to 8 MB (I believe if a scan range 
> exceeds this limit, the reads are just done on the same IO thread, but 
> sequentially - they are broken up into multiple HTTP GET requests)
> * S3A has a readahead option that defaults to 64 KB, however, it only applies 
> in certain situations
> ** If {{fs.s3a.experimental.input.fadvise=random}} (which is the recommended 
> value when reading Parquet / ORC data), the readahead applies if (1) it won't 
> cause the read to go past the end of the file, and (2) the request read 
> length is under 64 KB (it reads up to Math.max(requested-read-length, 64 KB)) 
> (so the readahead most likely applies for small reads)
> Coalescing reads would allow Impala to combine multiple, small HTTP GET 
> requests into fewer, larger HTTP GET requests. There may be some data that 
> needs to be skipped over, but the cost of reading that extra data might 
> outweigh the cost of issuing multiple HTTP requests. Since each HTTP request 
> requires a round-trip to S3, issuing a lot of GET requests can be costly, 
> especially if each only reads a small amount of data.
> Some implementation factors to consider:
> * There should probably be a limit on the maximum size of a read request (is 
> 8 MB the right value for S3?)
> * Since S3A uses a default of 64 KB for their readahead, we can probably use 
> a similar value
> * Should the number of disk IO threads be considered when coalescing reads? 
> e.g. by default there are 16 IO threads, if there are 16 small scan ranges, 
> does it make more sense to coalesce them into a single large scan range, or 
> would we get better throughput by issuing all 16 in parallel



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9058) S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION

2020-02-03 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028936#comment-17028936
 ] 

Steve Loughran commented on IMPALA-9058:


actually, this could be a table-not-found error

> S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION
> --
>
> Key: IMPALA-9058
> URL: https://issues.apache.org/jira/browse/IMPALA-9058
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>
> I've seen this happen several times now, S3 tests intermittently fail with an 
> error such as:
> {code:java}
> Query aborted:InternalException: Error adding partitions E   CAUSED BY: 
> MetaException: java.io.IOException: Got exception: 
> java.io.FileNotFoundException getVersionMarkerItem on ../VERSION: 
> com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested 
> resource not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> 8T9IS939MDI7ASOB0IJCC34J3NVV4KQNSO5AEMVJF66Q9ASUAAJG) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9058) S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION

2020-01-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022993#comment-17022993
 ] 

Steve Loughran commented on IMPALA-9058:


please - a full stack trace would make it easier to validate the error change.

Maybe ResourceNotFound here means the table is missing...

> S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION
> --
>
> Key: IMPALA-9058
> URL: https://issues.apache.org/jira/browse/IMPALA-9058
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>
> I've seen this happen several times now, S3 tests intermittently fail with an 
> error such as:
> {code:java}
> Query aborted:InternalException: Error adding partitions E   CAUSED BY: 
> MetaException: java.io.IOException: Got exception: 
> java.io.FileNotFoundException getVersionMarkerItem on ../VERSION: 
> com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested 
> resource not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> 8T9IS939MDI7ASOB0IJCC34J3NVV4KQNSO5AEMVJF66Q9ASUAAJG) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9112) Consider removing hdfsExists calls when writing files to S3

2020-01-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022503#comment-17022503
 ] 

Steve Loughran commented on IMPALA-9112:


you can use createFile(path, false) to say "overwrite is not allowed"; on HDFS 
and other native filesystems this is an atomic create-no-overwrite call, while 
for S3A and ABFS we do a HEAD.

saying overwrite=false means that the S3A client will do that HEAD (and so may 
end up caching a 404), but you at least save the overhead of your own round-trip 
call to the store
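
A minimal sketch of what this looks like through the builder API (the path, 
variable names, and payload are illustrative, not from Impala's code):

{code:java}
// Create-no-overwrite: atomic on HDFS; on S3A/ABFS it costs one HEAD, but
// saves the caller's own hdfsExists/getFileStatus round trip.
Path dest = new Path("s3a://bucket/table/data.0.parq");  // illustrative path
try (FSDataOutputStream out = fs.createFile(dest)
        .overwrite(false)    // fail if the file already exists
        .build()) {
  out.write(bytes);
} catch (FileAlreadyExistsException e) {
  // handle the collision here instead of probing with exists() beforehand
}
{code}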

> Consider removing hdfsExists calls when writing files to S3
> ---
>
> Key: IMPALA-9112
> URL: https://issues.apache.org/jira/browse/IMPALA-9112
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are a few places in the backend where we call {{hdfsExists}} before 
> writing out a file. This can cause issues when writing data to S3, because S3 
> can cache 404 Not Found errors. This issue manifests itself with errors such 
> as:
> {code:java}
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
>  TO 
> s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq)
>  failed, error was: 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
> Error(5): Input/output error
> Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 
> 404; Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: 
> []){code}
> HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying 
> an "overwrite" option when creating a file; this can avoid doing any HEAD 
> requests when opening a file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9316) Consider coalescing S3 scans

2020-01-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021978#comment-17021978
 ] 

Steve Loughran commented on IMPALA-9316:


Hadoop 3.3 (and to be backported to 3.2.x) has a new openFile() builder API 
which lets you add specific options (and mandatory options) when opening a 
file. You can set hints for the seek policy on s3a (todo: standard option name 
for all stores) and the readahead range there:

{code}
fs.openFile(testpath)
    .withFileStatus(listingStatus)
    .opt("fs.s3a.experimental.input.fadvise", "random")
    .build()
    .get()
{code}

You get back a future where any HEAD op is done async; if you pass in a file 
status/located file status then the FS *may* use that to bypass any probes for 
the file existing. This speeds up going from listFiles() to open; there's no IO 
to S3 until the first byte is actually read.




> Consider coalescing S3 scans
> 
>
> Key: IMPALA-9316
> URL: https://issues.apache.org/jira/browse/IMPALA-9316
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> We should consider coalescing S3 reads. IIUC the current {{DiskIoMgr}} code 
> for S3A does not do anything special for scheduling S3 scan ranges. It simply 
> round-robin assigns scans to IO threads.
> I think there might be a smarter algorithm we could employ when scheduling S3 
> reads. A few things to consider:
> * With the migration to {{hdfsPreadFully}}, each S3 scan range should 
> correspond to a single HTTP GET request (assuming the 8 MB limit is not hit, 
> see below)
> * {{read_size}} limits the size of a read to 8 MB (I believe if a scan range 
> exceeds this limit, the reads are just done on the same IO thread, but 
> sequentially - they are broken up into multiple HTTP GET requests)
> * S3A has a readahead option that defaults to 64 KB, however, it only applies 
> in certain situations
> ** If {{fs.s3a.experimental.input.fadvise=random}} (which is the recommended 
> value when reading Parquet / ORC data), the readahead applies if (1) it won't 
> cause the read to go past the end of the file, and (2) the request read 
> length is under 64 KB (it reads up to Math.max(requested-read-length, 64 KB)) 
> (so the readahead most likely applies for small reads)
> Coalescing reads would allow Impala to combine multiple, small HTTP GET 
> requests into fewer, larger HTTP GET requests. There may be some data that 
> needs to be skipped over, but the cost of reading that extra data might 
> outweigh the cost of issuing multiple HTTP requests. Since each HTTP request 
> requires a round-trip to S3, issuing a lot of GET requests can be costly, 
> especially if each only reads a small amount of data.
> Some implementation factors to consider:
> * There should probably be a limit on the maximum size of a read request (is 
> 8 MB the right value for S3?)
> * Since S3A uses a default of 64 KB for their readahead, we can probably use 
> a similar value
> * Should the number of disk IO threads be considered when coalescing reads? 
> e.g. by default there are 16 IO threads, if there are 16 small scan ranges, 
> does it make more sense to coalesce them into a single large scan range, or 
> would we get better throughput by issuing all 16 in parallel



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9316) Consider coalescing S3 scans

2020-01-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021974#comment-17021974
 ] 

Steve Loughran commented on IMPALA-9316:


readahead is set by fs.s3a.readahead.range.

The new openFile() builder lets you set the seek policy and readahead range 
when you open the file; the seek policy is certainly something to set 
explicitly when opening Parquet/ORC files *unless they are gzip-encoded*
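
As a sketch (the readahead value is illustrative; the fadvise key is the one 
used elsewhere in this thread):

{code:java}
// Open a columnar file with an explicit seek policy and readahead range.
FSDataInputStream in = fs.openFile(path)
    .opt("fs.s3a.experimental.input.fadvise", "random")  // Parquet/ORC reads
    .opt("fs.s3a.readahead.range", "256K")               // illustrative value
    .build()
    .get();  // future; no S3 IO needed until the first byte is read
{code}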

> Consider coalescing S3 scans
> 
>
> Key: IMPALA-9316
> URL: https://issues.apache.org/jira/browse/IMPALA-9316
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> We should consider coalescing S3 reads. IIUC the current {{DiskIoMgr}} code 
> for S3A does not do anything special for scheduling S3 scan ranges. It simply 
> round-robin assigns scans to IO threads.
> I think there might be a smarter algorithm we could employ when scheduling S3 
> reads. A few things to consider:
> * With the migration to {{hdfsPreadFully}}, each S3 scan range should 
> correspond to a single HTTP GET request (assuming the 8 MB limit is not hit, 
> see below)
> * {{read_size}} limits the size of a read to 8 MB (I believe if a scan range 
> exceeds this limit, the reads are just done on the same IO thread, but 
> sequentially - they are broken up into multiple HTTP GET requests)
> * S3A has a readahead option that defaults to 64 KB, however, it only applies 
> in certain situations
> ** If {{fs.s3a.experimental.input.fadvise=random}} (which is the recommended 
> value when reading Parquet / ORC data), the readahead applies if (1) it won't 
> cause the read to go past the end of the file, and (2) the request read 
> length is under 64 KB (it reads up to Math.max(requested-read-length, 64 KB)) 
> (so the readahead most likely applies for small reads)
> Coalescing reads would allow Impala to combine multiple, small HTTP GET 
> requests into fewer, larger HTTP GET requests. There may be some data that 
> needs to be skipped over, but the cost of reading that extra data might 
> outweigh the cost of issuing multiple HTTP requests. Since each HTTP request 
> requires a round-trip to S3, issuing a lot of GET requests can be costly, 
> especially if each only reads a small amount of data.
> Some implementation factors to consider:
> * There should probably be a limit on the maximum size of a read request (is 
> 8 MB the right value for S3?)
> * Since S3A uses a default of 64 KB for their readahead, we can probably use 
> a similar value
> * Should the number of disk IO threads be considered when coalescing reads? 
> e.g. by default there are 16 IO threads, if there are 16 small scan ranges, 
> does it make more sense to coalesce them into a single large scan range, or 
> would we get better throughput by issuing all 16 in parallel



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9112) Consider removing hdfsExists calls when writing files to S3

2020-01-14 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015013#comment-17015013
 ] 

Steve Loughran commented on IMPALA-9112:


please, do this. You also save the latency of a round trip. 

Without this, rename is brittle. Without S3Guard it can fail; with S3Guard, we 
can see from the DB that the file is there, so we do exponential retries 
waiting for it to appear, for up to 90s by default.

> Consider removing hdfsExists calls when writing files to S3
> ---
>
> Key: IMPALA-9112
> URL: https://issues.apache.org/jira/browse/IMPALA-9112
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are a few places in the backend where we call {{hdfsExists}} before 
> writing out a file. This can cause issues when writing data to S3, because S3 
> can cache 404 Not Found errors. This issue manifests itself with errors such 
> as:
> {code:java}
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
>  TO 
> s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq)
>  failed, error was: 
> s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq
> Error(5): Input/output error
> Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 
> 404; Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: 
> []){code}
> HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying 
> an "overwrite" option when creating a file; this can avoid doing any HEAD 
> requests when opening a file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

2019-11-20 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978688#comment-16978688
 ] 

Steve Loughran commented on IMPALA-8525:


Nice!

> preads should use hdfsPreadFully rather than hdfsPread
> --
>
> Key: IMPALA-8525
> URL: https://issues.apache.org/jira/browse/IMPALA-8525
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
> {{hdfsPread}} API from libhdfs, which ultimately invokes 
> {{PositionedReadable#read(long position, byte[] buffer, int offset, int 
> length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, 
> byte[] buffer, int offset, int length)}}. The difference is that {{#read}} 
> will "Read up to the specified number of bytes" whereas {{#readFully}} will 
> "Read the specified number of bytes". So there is no guarantee that {{#read}} 
> will read *all* of the request bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it 
> inside a while loop until all the requested bytes have been read from the 
> file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
> (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will 
> allocate a Java array equal in size to specified length of the buffer; the 
> call to {{PositionedReadable#read}} may only fill up the buffer partially; 
> Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, 
> which will cause another large array allocation; this can result in a lot of 
> wasted time doing unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point 
> in continuously calling {{hdfsPread}} when a single call to 
> {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect 
> performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to 
> Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related 
> changes for S3). However, with the migration to {{hdfsPreadFully}} the 
> chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once 
> (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller 
> chunks (typically 128K). For example, {{DFSInputStream#read(long position, 
> byte[] buffer, int offset, int length)}} opens up remote block readers with a 
> byte range determined by the value of {{length}} passed into the {{#read}} 
> call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request 
> with the size of the read specified by the given {{length}} (although fadvise 
> must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9058) S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION

2019-10-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956153#comment-16956153
 ] 

Steve Loughran commented on IMPALA-9058:


Happens if the DDB table is present but doesn't contain an S3Guard version 
marker. HADOOP-16520 has just done a lot of work handling race conditions in 
cluster setup, and the ability to recover from unintentional deletion of these 
markers.

If you saw the stack trace on a build of Hadoop without this patch, I'd say 
retry with a patched version to see if that fixes it.

If you saw it on a build which does include it (and it was only committed 
anywhere in the last week), then you may have found a regression. 

> S3 tests failing with FileNotFoundException getVersionMarkerItem on ../VERSION
> --
>
> Key: IMPALA-9058
> URL: https://issues.apache.org/jira/browse/IMPALA-9058
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>
> I've seen this happen several times now, S3 tests intermittently fail with an 
> error such as:
> {code:java}
> Query aborted:InternalException: Error adding partitions E   CAUSED BY: 
> MetaException: java.io.IOException: Got exception: 
> java.io.FileNotFoundException getVersionMarkerItem on ../VERSION: 
> com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested 
> resource not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> 8T9IS939MDI7ASOB0IJCC34J3NVV4KQNSO5AEMVJF66Q9ASUAAJG) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8754) S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB

2019-08-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914206#comment-16914206
 ] 

Steve Loughran commented on IMPALA-8754:


The DDB table wasn't found:
# the table doesn't exist
# the table does exist, but it is in a different region

S3Guard infers the region of the table to be that of the bucket; if you are 
reading data from buckets in other regions, the inference will be wrong.

There's an option to fix the table region, {{fs.s3a.s3guard.ddb.region}}; you 
need to set this to the region where the table is
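
For example (the region value is illustrative; the bucket name is the one from 
the test logs above; exception handling elided):

{code:java}
// Pin the S3Guard DynamoDB table region explicitly, so it is not inferred
// from the (possibly different) region of the bucket being read.
Configuration conf = new Configuration();
conf.set("fs.s3a.s3guard.ddb.region", "us-west-2");  // where the table lives
FileSystem fs = FileSystem.get(
    new URI("s3a://impala-test-uswest2-1/"), conf);
{code}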

> S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB
> -
>
> Key: IMPALA-8754
> URL: https://issues.apache.org/jira/browse/IMPALA-8754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> When running tests on s3 with s3guard, various tests can encounter the 
> following error coming from the DynamoDB:
> {noformat}
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-02c8.vpc.cloudera.com:22002: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451718/6843d8a91fc5ae1d-88b2af4b0004_156969840_data.0.parq
> E   Error(2): No such file or directory
> E   Root cause: ResourceNotFoundException: Requested resource not found 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> XXX){noformat}
> Tests that have seen this (this is flaky):
>  * TestTpcdsQuery.test_tpcds_count
>  * TestHdfsFdCaching.test_caching_disabled_by_param
>  * TestMtDop.test_compute_stats
>  * TestScanRangeLengths.test_scan_ranges



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2019-06-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867113#comment-16867113
 ] 

Steve Loughran commented on IMPALA-8544:


BTW, S3 throttling is per-shard within a bucket, and all requests (GET, HEAD, 
PUT, etc.) can be throttled. This isn't always detected by us; things just get 
slower

> Expose additional S3A / S3Guard metrics
> ---
>
> Key: IMPALA-8544
> URL: https://issues.apache.org/jira/browse/IMPALA-8544
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should 
> consider exposing to Impala users. The full list of statistics can be found 
> in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
> performed (put, get, etc.), invocation counts for various {{FileSystem}} 
> methods, stream statistics (bytes read, written, etc.), etc.
> Some interesting stats that stand out:
>  * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
> of TCP connection aborts, a high value would indicate performance issues
>  * "stream_read_exceptions" : "Number of exceptions invoked on input streams" 
> - incremented whenever an {{IOException}} is caught while reading (these 
> exception don't always get propagated to Impala because they trigger a retry)
>  * "store_io_throttled": "Requests throttled and retried" - looks like it 
> tracks the number of times the fs retries an operation because the original 
> request hit a throttling exception
>  * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - 
> looks like it tracks the number of times the fs retries S3Guard operations
>  * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
> events" - similar to "store_io_throttled" but looks like it is specific to 
> S3Guard
> We should consider how to expose these metrics via Impala logs / runtime 
> profiles.
> There are a few options:
>  * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
> via the {{FileSystem#getStorageStatistics}} method; the 
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
> however, I think the stats might be aggregated globally, which would make it 
> hard to create per-query specific metrics
>  * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it 
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some 
> API (haven't looked into this yet)
>  * {{S3AInputStream#toString}} dumps the statistics from 
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
> {{S3AFileSystem#toString}} dumps them all as well
>  * {{S3AFileSystem}} updates the stats in 
> {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
> etc.)
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared 
> across threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2019-06-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867110#comment-16867110
 ] 

Steve Loughran commented on IMPALA-8544:


With on-demand DDB (HADOOP-15563), DDB throttling is no longer an issue. Seek 
costs are more relevant and easier to tie to specific queries, so yes: collect 
them.

> Expose additional S3A / S3Guard metrics
> ---
>
> Key: IMPALA-8544
> URL: https://issues.apache.org/jira/browse/IMPALA-8544
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should 
> consider exposing to Impala users. The full list of statistics can be found 
> in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
> performed (put, get, etc.), invocation counts for various {{FileSystem}} 
> methods, stream statistics (bytes read, written, etc.), etc.
> Some interesting stats that stand out:
>  * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
> of TCP connection aborts, a high value would indicate performance issues
>  * "stream_read_exceptions" : "Number of exceptions invoked on input streams" 
> - incremented whenever an {{IOException}} is caught while reading (these 
> exception don't always get propagated to Impala because they trigger a retry)
>  * "store_io_throttled": "Requests throttled and retried" - looks like it 
> tracks the number of times the fs retries an operation because the original 
> request hit a throttling exception
>  * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - 
> looks like it tracks the number of times the fs retries S3Guard operations
>  * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
> events" - similar to "store_io_throttled" but looks like it is specific to 
> S3Guard
> We should consider how to expose these metrics via Impala logs / runtime 
> profiles.
> There are a few options:
>  * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
> via the {{FileSystem#getStorageStatistics}} method; the 
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
> however, I think the stats might be aggregated globally, which would make it 
> hard to create per-query specific metrics
>  * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it 
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some 
> API (haven't looked into this yet)
>  * {{S3AInputStream#toString}} dumps the statistics from 
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
> {{S3AFileSystem#toString}} dumps them all as well
>  * {{S3AFileSystem}} updates the stats in 
> {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
> etc.)
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared 
> across threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8577) Crash during OpenSSLSocket.read

2019-06-05 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856717#comment-16856717
 ] 

Steve Loughran commented on IMPALA-8577:


I've reverted the entire feature: too many problems right now. Sorry. It's just 
that all the issues were piling up on me. There's a new SDK update going in 
(HADOOP-16117); maybe you can try again with that.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8577) Crash during OpenSSLSocket.read

2019-05-31 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853293#comment-16853293
 ] 

Steve Loughran commented on IMPALA-8577:


s3a calls abort() on the AWS input stream when it needs to do a big seek; the 
AWS SDK then tries to close the TCP stream without flushing it, reading to the 
end, etc. If somehow that aborted stream was returned to the connection pool, I 
can see bad things happening.

I am starting to worry about the stability of the wildfly SSL + AWS SDK setup. 
Is it always automatic, or do we have a way of forcing use of the JDK version? 
I want to be able to field support calls with "try setting X" rather than "try 
removing JAR Y from all machines in your cluster"

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2019-05-21 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844719#comment-16844719
 ] 

Steve Loughran commented on IMPALA-8544:


The trouble with thread-local tracking is that operations span threads, e.g. 
writes are uploaded block-by-block in the thread pool, rename/copy will soon do 
the same, etc. This is why the current statistics underreport, while the 
aggregate value overreports on a per-query basis (see the aggregate stats in a 
_SUCCESS file for a Spark query as an example)

w.r.t. exposing our stream statistics method, -1 as-is:

# it removes the option for us to change that data structure.
# the fields are all non-atomic, non-volatile values so that the cost of 
incrementing them is ~0. If things are being collected, that may change.
# if people start wrapping/proxying streams, it ceases to be valid.

If you want per-input-stream statistics, it'd be better to have something which 
all input streams can implement, so HDFS, ABFS, etc. can support it too. I'll 
take suggestions as to the best design here, given the goal of keeping 
statistics increments low-cost.

As usual, anything which goes near the filesystem APIs will need spec updates 
and tests for all the stores to implement. We need that to stop us breaking 
your code later.

The other thing to consider is passing more of a stats context down to 
read/write/copy/commit operations so that the work done across threads can be 
tied back to the final operation, e.g. every async write would update some 
counters which, in outputstream.close(), would be merged back into the classic 
per-thread counters.
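
A sketch of that shape (purely hypothetical: none of these class or method 
names exist in Hadoop; only FileSystem.Statistics and its increment methods are 
real):

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.fs.FileSystem;

// Hypothetical per-operation stats context: upload pool threads update it
// cheaply; close() merges the totals back into the per-thread counters.
class WriteStatsContext {
  private final AtomicLong bytesUploaded = new AtomicLong();
  private final AtomicLong blocksUploaded = new AtomicLong();

  void onBlockUploaded(long bytes) {             // called from pool threads
    bytesUploaded.addAndGet(bytes);
    blocksUploaded.incrementAndGet();
  }

  void mergeInto(FileSystem.Statistics stats) {  // called in close()
    stats.incrementBytesWritten(bytesUploaded.get());
    stats.incrementWriteOps((int) blocksUploaded.get());
  }
}
{code}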

+[~DanielZhou] for his thoughts on ABFS stats gathering

> Expose additional S3A / S3Guard metrics
> ---
>
> Key: IMPALA-8544
> URL: https://issues.apache.org/jira/browse/IMPALA-8544
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should 
> consider exposing to Impala users. The full list of statistics can be found 
> in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
> performed (put, get, etc.), invocation counts for various {{FileSystem}} 
> methods, stream statistics (bytes read, written, etc.), etc.
> Some interesting stats that stand out:
>  * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
> of TCP connection aborts, a high value would indicate performance issues
>  * "stream_read_exceptions" : "Number of exceptions invoked on input streams" 
> - incremented whenever an {{IOException}} is caught while reading (these 
> exception don't always get propagated to Impala because they trigger a retry)
>  * "store_io_throttled": "Requests throttled and retried" - looks like it 
> tracks the number of times the fs retries an operation because the original 
> request hit a throttling exception
>  * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - 
> looks like it tracks the number of times the fs retries S3Guard operations
>  * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
> events" - similar to "store_io_throttled" but looks like it is specific to 
> S3Guard
> We should consider how to expose these metrics via Impala logs / runtime 
> profiles.
> There are a few options:
>  * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
> via the {{FileSystem#getStorageStatistics}} method; the 
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
> however, I think the stats might be aggregated globally, which would make it 
> hard to create per-query specific metrics
>  * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it 
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some 
> API (haven't looked into this yet)
>  * {{S3AInputStream#toString}} dumps the statistics from 
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
> {{S3AFileSystem#toString}} dumps them all as well
>  * {{S3AFileSystem}} updates the stats in 
> {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
> etc.)
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared 
> across threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8242) Support Iceberg on S3

2019-02-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779688#comment-16779688
 ] 

Steve Loughran commented on IMPALA-8242:


this could be really useful. And it doesn't just have to be S3, it's just that 
S3 is where you gain the most

> Support Iceberg on S3
> -
>
> Key: IMPALA-8242
> URL: https://issues.apache.org/jira/browse/IMPALA-8242
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Major
>
> http://iceberg.incubator.apache.org/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6544) Lack of S3 consistency leads to rare test failures

2019-01-02 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732496#comment-16732496
 ] 

Steve Loughran commented on IMPALA-6544:


yes: S3A create file does a check to see if a file is there before creation:

* if it's a directory: fail fast
* if it's a file and overwrite=false: fail

It's something we've discussed killing in the past, as when we know 
overwrite=true, all we care about is whether it's a directory or not: no need 
to HEAD the file.

The other thing is that with the newer createFile() API call, we can add an 
S3-specific option to say "skip all the existence checks". A bit dangerous, but 
very fast; you had better know what you are doing. The Flink team have asked 
for it already. 

* If you switch to using S3Guard, DynamoDB gives the consistency
* If you aren't using it, you have other consistency issues lurking

Looking at the rest of the stack (traces are always interesting), put is doing 
an upload to one path, then kicking off a rename; the rename needs its own 
source and destination checks. Eliminate that temp file (remember, PUT to an 
object store is the atomic operation you need) and that'll strip out most of 
that IO.



> Lack of S3 consistency leads to rare test failures
> --
>
> Key: IMPALA-6544
> URL: https://issues.apache.org/jira/browse/IMPALA-6544
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: S3, broken-build, consistency, flaky, test-framework
>
> Every now and then, we hit a flaky test on S3 runs due to files missing when 
> they should be present, and vice versa. We could consider running our tests 
> (or a subset of our tests) with S3Guard to avoid these problems, however rare 
> they are.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2018-11-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674009#comment-16674009
 ] 

Steve Loughran commented on IMPALA-6910:


Does the test overwrite a path which has been used for previous test runs, with 
the delete only kicking off before this test run? If so, you are seeing delayed 
delete consistency.

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, flaky, s3
> Fix For: Impala 3.1.0
>
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-30 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668927#comment-16668927
 ] 

Steve Loughran commented on IMPALA-7733:


It could be that your assertions are just brittle to change, in which case 
spinning briefly until the listings are consistent is a tactic... but it is a 
symptom of a problem.

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think its platform flakiness, then 
> we should skip it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667650#comment-16667650
 ] 

Steve Loughran commented on IMPALA-7733:


This looks like you are hitting S3 inconsistency; rename is usually the place 
where it surfaces.

* you shouldn't be using any commit algorithm which relies on rename; see 
HADOOP-13786
* unless you can implement resilience to inconsistency (e.g. spinning), you are 
going to have to embrace S3Guard with a consistent metadata store.

You can't just view this as a flaky test: this is probably a symptom of a 
problem which surfaces in production: *this test has successfully found it*

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: flaky-test
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   Query aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue (see the sketch below)... if we think it's 
> platform flakiness, then we should skip it.
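
The "check for this specific issue" idea could look roughly like the pytest sketch below; ImpalaBeeswaxException and the error text come from the trace above, while the wrapper itself is hypothetical:

{code:python}
import pytest

from beeswax.impala_beeswax import ImpalaBeeswaxException


def execute_or_skip_on_s3_flake(client, query):
    """Run a query, skipping the test only on the known S3 rename flake.

    Only the specific 'Error(s) moving partition files' signature on an
    s3a:// path is treated as platform flakiness; any other failure still
    fails the test.
    """
    try:
        return client.execute(query)
    except ImpalaBeeswaxException as e:
        msg = str(e)
        if 'Error(s) moving partition files' in msg and 's3a://' in msg:
            pytest.skip('known S3 rename inconsistency (IMPALA-7733)')
        raise
{code}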






[jira] [Commented] (IMPALA-7221) While reading from object store S3/ADLS at +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck

2018-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534724#comment-16534724
 ] 

Steve Loughran commented on IMPALA-7221:


you are free to provide the patch for that feature...

> While reading from object store S3/ADLS at +500MB/sec 
> TypeArrayKlass::allocate_common becomes a CPU bottleneck
> --
>
> Key: IMPALA-7221
> URL: https://issues.apache.org/jira/browse/IMPALA-7221
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Mostafa Mokhtar
>Assignee: Sailesh Mukil
>Priority: Major
> Attachments: s3_alloc_expensive_1_js.txt, s3_alloc_expensive_2_ps.txt
>
>
> From Perf
> {code}
> Samples: 1M of event 'cpu-clock', Event count (approx.): 32005850
>   Children   Self   Command  Shared Object      Symbol
> -   16.46%   0.04%  impalad  impalad            [.] hdfsRead
>    - 16.45% hdfsRead
>       - 9.71% jni_NewByteArray
>            9.63% TypeArrayKlass::allocate_common
>         6.57% __memmove_ssse3_back
> +    9.72%   0.03%  impalad  libjvm.so          [.] jni_NewByteArray
> +    9.67%   8.79%  impalad  libjvm.so          [.] TypeArrayKlass::allocate_common
> +    8.82%   0.00%  impalad  [unknown]          [.] 
> +    7.67%   0.04%  impalad  [kernel.kallsyms]  [k] system_call_fastpath
> +    7.19%   7.02%  impalad  impalad            [.] impala::ScalarColumnReader<
> +    7.18%   6.55%  impalad  libc-2.17.so       [.] __memmove_ssse3_back
> +    6.32%   0.00%  impalad  [unknown]          [.] 0x001a9458
> +    6.07%   0.00%  impalad  [kernel.kallsyms]  [k] do_softirq
> +    6.07%   0.00%  impalad  [kernel.kallsyms]  [k] call_softirq
> +    6.05%   0.24%  impalad  [kernel.kallsyms]  [k] __do_softirq
> +    5.98%   0.00%  impalad  [kernel.kallsyms]  [k] xen_hvm_callback_vector
> +    5.98%   0.00%  impalad  [kernel.kallsyms]  [k] xen_evtchn_do_upcall
> +    5.98%   0.00%  impalad  [kernel.kallsyms]  [k] irq_exit
> +    5.81%   0.03%  impalad  [kernel.kallsyms]  [k] net_rx_action
> {code}
> {code}
> #0  0x7ffa3d78d69b in TypeArrayKlass::allocate_common(int, bool, Thread*) () from /usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
> #1  0x7ffa3d3e22d2 in jni_NewByteArray () from /usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
> #2  0x020ec13c in hdfsRead ()
> #3  0x01100948 in impala::io::ScanRange::Read(unsigned char*, long, long*, bool*) ()
> #4  0x010fa294 in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
> #5  0x010fa3f4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
> #6  0x00d15193 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::Promise*) ()
> #7  0x00d158d4 in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, impala::Promise*), boost::_bi::list4, boost::_bi::value, boost::_bi::value >, boost::_bi::value*> > > >::run() ()
> #8  0x012919aa in thread_proxy ()
> #9  0x7ffa3b6a6e25 in start_thread () from /lib64/libpthread.so.0
> #10 0x7ffa3b3d0bad in clone () from /lib64/libc.so.6
> {code}
> There is also log4j contention in the JVM due to writing error messages to 
> impalad.ERROR like this
> {code}
> readDirect: FSDataInputStream#read error:
> UnsupportedOperationException: Byte-buffer read unsupported by input stream
> java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
> readDirect: FSDataInputStream#read error:
> UnsupportedOperationException: Byte-buffer read unsupported by input 
>