[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build
[ https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225068#comment-17225068 ] WangSheng commented on IMPALA-10308: Hi [~sql_forever], if you want to verify a specific test, you need to create these test tables manually before executing it. Alternatively, you can run {{$IMPALA_HOME/bin/run-all-tests.sh}} to execute the whole Impala test suite; the test tables are then created automatically, because all DDL statements in functional_schema_template.sql are executed before the tests run. For more details about Impala tests, see: https://cwiki.apache.org/confluence/display/IMPALA/How+to+load%2C+run%2C+and+create+new+Impala+tests > Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with > ASAN build > > > Key: IMPALA-10308 > URL: https://issues.apache.org/jira/browse/IMPALA-10308 > Project: IMPALA > Issue Type: Bug >Reporter: Qifan Chen >Priority: Major > > The following error was seen when running the scanner test against the ASAN > build. 
> {code:java} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'iceberg_partitioned' > E CAUSED BY: TableLoadingException: Error loading metadata for Iceberg > table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned > E CAUSED BY: IllegalArgumentException: Can not create a Path from a null > string > TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': > 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > [gw2] linux2 -- Python 2.7.16 > /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python > query_test/test_scanners.py:357: in test_iceberg_query > self.run_test_case('QueryTest/iceberg-query', vector) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:920: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > {code} > To reproduce, apply the following steps. > {code:java} > 1. Build: ${IMPALA_HOME}/buildall.sh -skiptests -ninja -asan > 2. 
Run test: > cd ${IMPALA_HOME} > tests/run-tests.py --exploration_strategy=exhaustive > tests/query_test/test_scanners.py > {code} > Branch info. > The master branch of https://github.com/apache/impala.git. The HEAD points > at 193c2e773fa9f6772e4a7c30ed3a4f75029863f1. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build
[ https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224611#comment-17224611 ] WangSheng commented on IMPALA-10308: Hi [~sql_forever], thanks for reporting this bug. It seems that this test failed when loading iceberg_partitioned. Have you put the test files into HDFS manually, like this?
{code:java}
// testdata/datasets/functional/functional_schema_template.sql
hadoop fs -mkdir -p /test-warehouse/iceberg_test && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/iceberg_partitioned /test-warehouse/iceberg_test/
{code}
I have rebuilt the code in my own environment with ninja and ASAN, but I can still create the external Iceberg table and query it normally, like this:
{code:java}
CREATE EXTERNAL TABLE functional_parquet.iceberg_partitioned (
  id INT,
  user STRING,
  action STRING,
  event_time TIMESTAMP
)
PARTITION BY SPEC (
  event_time HOUR,
  action IDENTITY
)
STORED AS ICEBERG
LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned'
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables', 'iceberg.file_format'='parquet');
select count(1) from functional_parquet.iceberg_partitioned;
{code}
> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with > ASAN build > > > Key: IMPALA-10308 > URL: https://issues.apache.org/jira/browse/IMPALA-10308 > Project: IMPALA > Issue Type: Bug >Reporter: Qifan Chen >Priority: Major > > The following error was seen when running the scanner test against the ASAN > build. 
> {code:java} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'iceberg_partitioned' > E CAUSED BY: TableLoadingException: Error loading metadata for Iceberg > table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned > E CAUSED BY: IllegalArgumentException: Can not create a Path from a null > string > TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': > 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > [gw2] linux2 -- Python 2.7.16 > /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python > query_test/test_scanners.py:357: in test_iceberg_query > self.run_test_case('QueryTest/iceberg-query', vector) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:920: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > {code} > To reproduce, apply the following
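The manual data-loading step suggested in the comment above can be checked before running a single test. The following is only a minimal sketch, assuming the `hadoop` CLI is on the PATH; the helper names `hdfs_dir_exists` and `missing_test_data` are hypothetical and are not part of the Impala test framework. It relies on `hadoop fs -test -d`, which exits with status 0 when the directory exists.

```python
import subprocess

# Directory taken from the error message and the commands quoted in this thread.
ICEBERG_TABLE_DIR = "/test-warehouse/iceberg_test/iceberg_partitioned"


def hdfs_dir_exists(path, run=subprocess.call):
    """Return True if `hadoop fs -test -d <path>` exits with status 0.

    `run` is injectable (hypothetical design choice) so the helper can be
    exercised without a cluster; by default it shells out to the hadoop CLI.
    """
    return run(["hadoop", "fs", "-test", "-d", path]) == 0


def missing_test_data(paths, run=subprocess.call):
    """Return the subset of `paths` that is not present as an HDFS directory."""
    return [p for p in paths if not hdfs_dir_exists(p, run=run)]
```

If `missing_test_data([ICEBERG_TABLE_DIR])` returns a non-empty list, the test files have probably not been uploaded yet, which would be consistent with the metadata loading failure reported here.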
[jira] [Commented] (IMPALA-10237) Support BUCKET and TRUNCATE partition transforms as built-in functions
[ https://issues.apache.org/jira/browse/IMPALA-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215128#comment-17215128 ] WangSheng commented on IMPALA-10237: Hi [~gaborkaszab], I don't understand this JIRA title. What do you mean by built-in functions for the BUCKET/TRUNCATE partition transforms? Could you please add a description of what this JIRA is intended to do? > Support BUCKET and TRUNCATE partition transforms as built-in functions > -- > > Key: IMPALA-10237 > URL: https://issues.apache.org/jira/browse/IMPALA-10237 > Project: IMPALA > Issue Type: Improvement > Components: Backend, Frontend >Reporter: Gabor Kaszab >Priority: Major > Labels: impala-iceberg >
[jira] [Resolved] (IMPALA-10159) Support ORC file format for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng resolved IMPALA-10159. Resolution: Fixed > Support ORC file format for Iceberg table > - > > Key: IMPALA-10159 > URL: https://issues.apache.org/jira/browse/IMPALA-10159 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala can now query Iceberg tables in the PARQUET file format. Since some work has > already been done in IMPALA-9741, we can continue the ORC file format > support work in this Jira.
[jira] [Work started] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10166 started by WangSheng. -- > ALTER TABLE for Iceberg tables > -- > > Key: IMPALA-10166 > URL: https://issues.apache.org/jira/browse/IMPALA-10166 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Assignee: WangSheng >Priority: Major > Labels: impala-iceberg > > Add support for ALTER TABLE operations for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212239#comment-17212239 ] WangSheng commented on IMPALA-10166: Hi [~boroknagyz], I will try to implement this as soon as possible. > ALTER TABLE for Iceberg tables > -- > > Key: IMPALA-10166 > URL: https://issues.apache.org/jira/browse/IMPALA-10166 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Assignee: WangSheng >Priority: Major > Labels: impala-iceberg > > Add support for ALTER TABLE operations for Iceberg tables.
[jira] [Assigned] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng reassigned IMPALA-10166: -- Assignee: WangSheng > ALTER TABLE for Iceberg tables > -- > > Key: IMPALA-10166 > URL: https://issues.apache.org/jira/browse/IMPALA-10166 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Assignee: WangSheng >Priority: Major > Labels: impala-iceberg > > Add support for ALTER TABLE operations for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211579#comment-17211579 ] WangSheng commented on IMPALA-10166: Hi [~boroknagyz], is anyone preparing to do this jira? If not, I'd like to try this. > ALTER TABLE for Iceberg tables > -- > > Key: IMPALA-10166 > URL: https://issues.apache.org/jira/browse/IMPALA-10166 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Add support for ALTER TABLE operations for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211579#comment-17211579 ] WangSheng edited comment on IMPALA-10166 at 10/10/20, 6:29 AM: --- Hi [~boroknagyz], is anyone preparing to do this jira? If not, please assign this Jira to me, I'd like to try this. was (Author: skyyws): Hi [~boroknagyz], is anyone preparing to do this jira? If not, I'd like to try this. > ALTER TABLE for Iceberg tables > -- > > Key: IMPALA-10166 > URL: https://issues.apache.org/jira/browse/IMPALA-10166 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Add support for ALTER TABLE operations for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng resolved IMPALA-10164. Resolution: Fixed > Support HadoopCatalog for Iceberg table > --- > > Key: IMPALA-10164 > URL: https://issues.apache.org/jira/browse/IMPALA-10164 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala currently supports only the HadoopTables API for creating Iceberg tables, which is > clearly not enough, so we are preparing to support HadoopCatalog. The main > design is to add a new table property named 'iceberg.catalog', whose default > value is 'hadoop.tables'; we will implement 'hadoop.catalog' to support the > HadoopCatalog API. We may even support 'hive.catalog' in the future.
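The property design described in this issue can be illustrated with a small sketch. This is not Impala's implementation; `resolve_catalog` is a hypothetical helper, and only the property name 'iceberg.catalog' and the values 'hadoop.tables' and 'hadoop.catalog' come from the issue text. Rejecting any other value is an assumption of this sketch.

```python
# Property name and values as named in the issue description.
CATALOG_PROPERTY = "iceberg.catalog"
DEFAULT_CATALOG = "hadoop.tables"
SUPPORTED_CATALOGS = {"hadoop.tables", "hadoop.catalog"}


def resolve_catalog(tbl_properties):
    """Return the catalog type for a table, applying the described default.

    `tbl_properties` is a plain dict standing in for TBLPROPERTIES.
    Raising on unknown values is an assumption, not documented behavior.
    """
    catalog = tbl_properties.get(CATALOG_PROPERTY, DEFAULT_CATALOG)
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError("Unsupported %s value: %r" % (CATALOG_PROPERTY, catalog))
    return catalog
```

With this sketch, a table created without the property resolves to 'hadoop.tables', while 'hive.catalog' is rejected until it is added to the supported set.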
[jira] [Resolved] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng resolved IMPALA-9741. --- Resolution: Fixed > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Labels: impala-iceberg > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in > IMPALA-9688, we are preparing to implement querying Iceberg tables from Impala. But > we need to read the Impala and Iceberg code in depth to determine how to do > this.
[jira] [Work started] (IMPALA-10159) Support ORC file format for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10159 started by WangSheng. -- > Support ORC file format for Iceberg table > - > > Key: IMPALA-10159 > URL: https://issues.apache.org/jira/browse/IMPALA-10159 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala can now query Iceberg tables in the PARQUET file format. Since some work has > already been done in IMPALA-9741, we can continue the ORC file format > support work in this Jira.
[jira] [Assigned] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng reassigned IMPALA-9967: - Assignee: (was: WangSheng) > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > Labels: impala-iceberg > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove the timestamp column from the table and regenerate the test data, the query > succeeds. By the way, my test data is generated by Spark.
[jira] [Work stopped] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9967 stopped by WangSheng. - > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove the timestamp column from the table and regenerate the test data, the query > succeeds. By the way, my test data is generated by Spark.
[jira] [Work started] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9967 started by WangSheng. - > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove the timestamp column from the table and regenerate the test data, the query > succeeds. By the way, my test data is generated by Spark.
[jira] [Resolved] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'
[ https://issues.apache.org/jira/browse/IMPALA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng resolved IMPALA-10221. Resolution: Fixed > Use 'iceberg.file_format' to replace 'iceberg_file_format' > -- > > Key: IMPALA-10221 > URL: https://issues.apache.org/jira/browse/IMPALA-10221 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > We introduced several new table properties in IMPALA-10164, such as > 'iceberg.catalog'. > To keep these properties consistent, we rename > 'iceberg_file_format' to > 'iceberg.file_format'.
[jira] [Resolved] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng resolved IMPALA-9688. --- Resolution: Fixed > Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Labels: impala-iceberg > > This sub-task mainly implements the creation of Iceberg tables through Impala.
[jira] [Work started] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'
[ https://issues.apache.org/jira/browse/IMPALA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10221 started by WangSheng. -- > Use 'iceberg.file_format' to replace 'iceberg_file_format' > -- > > Key: IMPALA-10221 > URL: https://issues.apache.org/jira/browse/IMPALA-10221 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > We introduced several new table properties in IMPALA-10164, such as > 'iceberg.catalog'. > To keep these properties consistent, we rename > 'iceberg_file_format' to > 'iceberg.file_format'.
[jira] [Created] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'
WangSheng created IMPALA-10221: -- Summary: Use 'iceberg.file_format' to replace 'iceberg_file_format' Key: IMPALA-10221 URL: https://issues.apache.org/jira/browse/IMPALA-10221 Project: IMPALA Issue Type: Sub-task Reporter: WangSheng Assignee: WangSheng We introduced several new table properties in IMPALA-10164, such as 'iceberg.catalog'. To keep these properties consistent, we rename 'iceberg_file_format' to 'iceberg.file_format'.
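For illustration only, the rename can be expressed as a transformation on a table-properties map. The helper `rename_file_format_property` is hypothetical and does not describe how Impala handles existing tables; the precedence rule when both keys are present is an assumption of this sketch.

```python
# Key names as given in the issue summary.
OLD_KEY = "iceberg_file_format"
NEW_KEY = "iceberg.file_format"


def rename_file_format_property(tbl_properties):
    """Return a copy of the properties with the old key renamed to the new one.

    If both keys are present, the value under the new key wins (an assumption
    of this sketch, not behavior described in the issue).
    """
    renamed = dict(tbl_properties)
    if OLD_KEY in renamed:
        old_value = renamed.pop(OLD_KEY)
        renamed.setdefault(NEW_KEY, old_value)
    return renamed
```

The input dict is left unmodified, so the function can be applied safely to properties read from an existing table definition.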
[jira] [Work started] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10164 started by WangSheng. -- > Support HadoopCatalog for Iceberg table > --- > > Key: IMPALA-10164 > URL: https://issues.apache.org/jira/browse/IMPALA-10164 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala currently supports only the HadoopTables API for creating Iceberg tables, > which is not enough, so we are preparing to support HadoopCatalog. The main > design is to add a new table property named 'iceberg.catalog', whose default > value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the > HadoopCatalog API. We may even support 'hive.catalog' in the future.
[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195455#comment-17195455 ] WangSheng commented on IMPALA-10164: Hi [~boroknagyz], I've done some work for this Jira, and the feature is not very difficult to implement. But HadoopCatalog is quite different from HadoopTables:
# Constructing HadoopTables only requires a Configuration, but HadoopCatalog needs an additional location parameter, such as hdfs://xxx/warehouse/, under which tables are stored. When using HadoopCatalog, we need to provide a TableIdentifier, which mainly contains the database and table names; Iceberg then creates the table at 'hdfs://xxx/warehouse/database/table' to store the table info.
# When creating an external table, we cannot load the table directly from 'hdfs://xxx/warehouse/database/table'; we need to use TableIdentifier.of(database, table) together with 'hdfs://xxx/warehouse/' instead.
So here is the problem: when creating an external table with HadoopCatalog, how should the location be defined?
* If we use 'hdfs://xxx/warehouse' in SQL, we can simply pass this location and TableIdentifier.of(database, table) to load the table, but this usage differs from HdfsTable and is a little weird.
* If we use 'hdfs://xxx/warehouse/database/table' in SQL, we need to extract 'hdfs://xxx/warehouse', 'database' and 'table' from the location and compare the database and table with those given in 'create external table xxx'. If they match, we can load the table; otherwise we should probably throw an exception.
What do you think? Here is my simple patch. It uses the first method, just for verification; with this patch, the table is created like this: {code:java} create external table database.table stored as ICEBERG location 'hdfs://test-warehouse'{code} Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16446/ > Support HadoopCatalog for Iceberg table > --- > > Key: IMPALA-10164 > URL: https://issues.apache.org/jira/browse/IMPALA-10164 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala currently supports only the HadoopTables API for creating Iceberg tables, > which is not enough, so we are preparing to support HadoopCatalog. The main > design is to add a new table property named 'iceberg.catalog', whose default > value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the > HadoopCatalog API. We may even support 'hive.catalog' in the future.
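For the second option in the comment above (putting the full 'hdfs://xxx/warehouse/database/table' path in the LOCATION clause), the extract-and-compare step could look roughly like the following. This is a minimal Python sketch for illustration only, not the patch on Gerrit: the function names `split_table_location` and `warehouse_for_table` are hypothetical, and it assumes the last two path segments are always the database and the table.

```python
from urllib.parse import urlsplit


def split_table_location(location):
    """Split '<scheme>://<authority>/<warehouse path>/<database>/<table>'
    into (warehouse_location, database, table).

    Assumption for this sketch: the last two path segments are the database
    and table names, and everything before them is the warehouse root.
    """
    parts = urlsplit(location)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("location must end with '<database>/<table>': %r" % location)
    database, table = segments[-2], segments[-1]
    warehouse_path = "/" + "/".join(segments[:-2])
    warehouse = "%s://%s%s" % (parts.scheme, parts.netloc, warehouse_path)
    return warehouse, database, table


def warehouse_for_table(location, database, table):
    """Return the warehouse location if the path matches the declared
    database and table; otherwise raise, as the comment suggests."""
    warehouse, loc_db, loc_table = split_table_location(location)
    if (loc_db, loc_table) != (database, table):
        raise ValueError(
            "location %r does not match table %s.%s" % (location, database, table))
    return warehouse
```

The returned warehouse location is what would then be handed to the catalog together with the database and table names. Whether names should be compared case-insensitively is left open here.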
[jira] [Updated] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-10164: --- Issue Type: Improvement (was: New Feature) > Support HadoopCatalog for Iceberg table > --- > > Key: IMPALA-10164 > URL: https://issues.apache.org/jira/browse/IMPALA-10164 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala currently supports only the HadoopTables API for creating Iceberg tables, > which is not enough, so we are preparing to support HadoopCatalog. The main > design is to add a new table property named 'iceberg.catalog', whose default > value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the > HadoopCatalog API. We may even support 'hive.catalog' in the future.
[jira] [Updated] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-10164: --- Parent: (was: IMPALA-9621) Issue Type: New Feature (was: Sub-task) > Support HadoopCatalog for Iceberg table > --- > > Key: IMPALA-10164 > URL: https://issues.apache.org/jira/browse/IMPALA-10164 > Project: IMPALA > Issue Type: New Feature >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala currently supports only the HadoopTables API for creating Iceberg tables, > which is not enough, so we are preparing to support HadoopCatalog. The main > design is to add a new table property named 'iceberg.catalog', whose default > value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the > HadoopCatalog API. We may even support 'hive.catalog' in the future.
[jira] [Created] (IMPALA-10164) Support HadoopCatalog for Iceberg table
WangSheng created IMPALA-10164: -- Summary: Support HadoopCatalog for Iceberg table Key: IMPALA-10164 URL: https://issues.apache.org/jira/browse/IMPALA-10164 Project: IMPALA Issue Type: Sub-task Reporter: WangSheng Assignee: WangSheng Impala currently supports only the HadoopTables API for creating Iceberg tables, which is not enough, so we are preparing to support HadoopCatalog. The main design is to add a new table property named 'iceberg.catalog', whose default value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the HadoopCatalog API. We may even support 'hive.catalog' in the future.
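The property design described above (a new 'iceberg.catalog' table property defaulting to 'hadoop.tables') can be sketched as follows. This is only an illustration in Python under stated assumptions: the function name `resolve_iceberg_catalog` and the rejection of unknown values are hypothetical, and 'hive.catalog' is deliberately left out because the issue only mentions it as a possible future addition.

```python
# Property name, default, and values as described in the issue.
ICEBERG_CATALOG_KEY = "iceberg.catalog"
DEFAULT_CATALOG = "hadoop.tables"
SUPPORTED_CATALOGS = ("hadoop.tables", "hadoop.catalog")


def resolve_iceberg_catalog(tbl_properties):
    """Pick the catalog type for a table from its properties.

    Hypothetical helper: falls back to the default when the property is
    absent and rejects values this sketch does not know about.
    """
    value = tbl_properties.get(ICEBERG_CATALOG_KEY, DEFAULT_CATALOG)
    if value not in SUPPORTED_CATALOGS:
        raise ValueError("unsupported %s value: %r" % (ICEBERG_CATALOG_KEY, value))
    return value
```

A table created without the property would therefore keep using the HadoopTables code path, which matches the stated default.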
[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192834#comment-17192834 ] WangSheng commented on IMPALA-10159: Hi [~boroknagyz], I used spark-shell to generate the test files. My Spark client version is 2.4.5 and the ORC jars in this client are 1.5.5; even after replacing these jars with ORC 1.6.3, it still doesn't work. Here is the code used to generate the test files:
{code:java}
// Imports needed in spark-shell (the Iceberg Spark runtime must be on the classpath)
import java.sql.Timestamp
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.types.Types
import org.apache.spark.sql.functions.to_timestamp
import org.apache.spark.sql.types._

val conf = new Configuration()
val tblLoc = "/test-warehouse/iceberg_test/iceberg_partitioned_orc"
val catalog = new HadoopTables(conf)
// The event_time field is what triggers the exception quoted below
val sparkSchema = StructType(List(StructField("id", IntegerType, true), StructField("user", StringType, false), StructField("action", StringType, false), StructField("event_time", SparkSchemaUtil.convert(Types.TimestampType.withoutZone()), false)))
val icebergSchema = SparkSchemaUtil.convert(sparkSchema)
val spec = PartitionSpec.builderFor(icebergSchema).hour("event_time").identity("action").build
val table = catalog.create(icebergSchema, spec, tblLoc)
val data_df = spark.createDataFrame(Seq((1, "Alex", "view", Timestamp.valueOf("2020-01-01 08:00:00")))).toDF("id", "user", "action", "ts")
var array = data_df.select(data_df("id"), data_df("user"), data_df("action"), to_timestamp(data_df("ts"))).collect()
val df = spark.createDataFrame(sc.makeRDD(array), sparkSchema)
df.write.format("iceberg").option("write-format", "orc").mode("append").save(tblLoc)
spark.read.format("iceberg").load(tblLoc).show
{code}
This code throws the exception "java.lang.UnsupportedOperationException: Spark does not support timestamp without time zone fields". If we replace "SparkSchemaUtil.convert(Types.TimestampType.withoutZone())" with "TimestampType", the test files are generated normally, but querying them in Impala runs into the problem described in IMPALA-9967.
And here is the create statement: {code:java} CREATE EXTERNAL TABLE default.iceberg_partitioned_orc STORED AS ICEBERG LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned_orc' TBLPROPERTIES('iceberg_file_format'='orc'); {code} > Support ORC file format for Iceberg table > - > > Key: IMPALA-10159 > URL: https://issues.apache.org/jira/browse/IMPALA-10159 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > Labels: impala-iceberg > > Impala can now query Iceberg tables in the PARQUET file format. Since some work has > already been done in IMPALA-9741, we can continue the ORC file format > support work in this Jira.
[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192751#comment-17192751 ] WangSheng commented on IMPALA-10159: Hi [~boroknagyz], [~tarmstrong], supporting the ORC file format for Iceberg tables is quite simple based on IMPALA-9741. The main difficulty is constructing test cases, where we hit the problem in IMPALA-9967. My previous test file was generated by Spark, and I found that Spark does not support timestamp without time zone fields. So I think we could generate test files without the Timestamp type and explain this in the code. What do you think? > Support ORC file format for Iceberg table > - > > Key: IMPALA-10159 > URL: https://issues.apache.org/jira/browse/IMPALA-10159 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Minor > > Impala can now query Iceberg tables in the PARQUET file format. Since some work has > already been done in IMPALA-9741, we can continue the ORC file format > support work in this Jira.
[jira] [Created] (IMPALA-10159) Support ORC file format for Iceberg table
WangSheng created IMPALA-10159: -- Summary: Support ORC file format for Iceberg table Key: IMPALA-10159 URL: https://issues.apache.org/jira/browse/IMPALA-10159 Project: IMPALA Issue Type: Sub-task Reporter: WangSheng Assignee: WangSheng Impala can now query Iceberg tables in the PARQUET file format. Since some work has already been done in IMPALA-9741, we can continue the ORC file format support work in this Jira.
[jira] [Comment Edited] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186208#comment-17186208 ] WangSheng edited comment on IMPALA-9967 at 8/28/20, 7:17 AM: - Hi [~boroknagyz], here are the data files: {code:java} create external table orc_test( id int, user string, action string, event_time timestamp) stored as orc location 'hdfs://localhost:20500/orc_table_test'; {code} This file contains a timestamp column; after creating an external table over it, SELECT throws an exception. [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] {code:java} create external table orc_test2( id int, user string, action string) stored as orc location 'hdfs://localhost:20500/orc_table_test2'; {code} This file does not contain a timestamp column; after creating an external table over it, SELECT succeeds. [^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] was (Author: skyyws): {code:java} create external table orc_test( id int, user string, action string, event_time timestamp) stored as orc location 'hdfs://localhost:20500/orc_table_test'; {code} This file contains timestamp column, create external table by this file, select will throw exception. [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] {code:java} create external table orc_test2( id int, user string, action string) stored as orc location 'hdfs://localhost:20500/orc_table_test2'; {code} This file does not contains timestamp column, and create external table by this file, select returns success. 
[^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 68436a6e0883be84:53877f720002] Error preparing scanner for scan range > 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove the timestamp column from the table and regenerate the test data, the query > succeeds. By the way, my test data is generated by Spark.
[jira] [Commented] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186208#comment-17186208 ] WangSheng commented on IMPALA-9967: --- {code:java} create external table orc_test( id int, user string, action string, event_time timestamp) stored as orc location 'hdfs://localhost:20500/orc_table_test'; {code} This file contains timestamp column, create external table by this file, select will throw exception. [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] {code:java} create external table orc_test2( id int, user string, action string) stored as orc location 'hdfs://localhost:20500/orc_table_test2'; {code} This file does not contains timestamp column, and create external table by this file, select returns success. [^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > 
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove timestamp colum from table, and generate test data, query > success. By the way, my test data is generated by spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9967: -- Attachment: 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove timestamp colum from table, and generate test data, query > success. By the way, my test data is generated by spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9967: -- Attachment: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] > 68436a6e0883be84:53877f720002] Error preparing scanner for 
scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove timestamp colum from table, and generate test data, query > success. By the way, my test data is generated by spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9967: -- Description: Recently, when I test impala query orc table, I found that scanning failed when table contains timestamp column, here is there exception: {code:java} I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind @ 0x1c9f753 impala::Status::Status() @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() @ 0x27a7fb3 impala::HdfsOrcScanner::Open() @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() @ 0x28caa7d impala::HdfsScanNode::ScannerThread() @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x205 boost::function0<>::operator()() @ 0x2675d93 impala::Thread::SuperviseThread() @ 0x267dd30 boost::_bi::list5<>::operator()<>() @ 0x267dc54 boost::_bi::bind_t<>::operator()() @ 0x267dc15 boost::detail::thread_data<>::run() @ 0x3e3c3c1 thread_proxy @ 0x7f32360336b9 start_thread @ 0x7f3232bfe41c clone I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f720002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind {code} When I remove timestamp colum from table, and generate test data, query success. 
By the way, my test data is generated by spark. was: Recently, when I test impala query orc table, I found that scanning failed when table contains timestamp column, here is there exception: {code:java} I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind @ 0x1c9f753 impala::Status::Status() @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() @ 0x27a7fb3 impala::HdfsOrcScanner::Open() @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() @ 0x28caa7d impala::HdfsScanNode::ScannerThread() @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x205 boost::function0<>::operator()() @ 0x2675d93 impala::Thread::SuperviseThread() @ 0x267dd30 boost::_bi::list5<>::operator()<>() @ 0x267dc54 boost::_bi::bind_t<>::operator()() @ 0x267dc15 boost::detail::thread_data<>::run() @ 0x3e3c3c1 thread_proxy @ 0x7f32360336b9 start_thread @ 0x7f3232bfe41c clone I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f720002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind {code} When I remove timestamp colum from table, and generate test data, query success. By the way, my test data is generated by spark. 
> Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717
[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9967: -- Description: Recently, when I test impala query orc table, I found that scanning failed when table contains timestamp column, here is there exception: {code:java} I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind @ 0x1c9f753 impala::Status::Status() @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() @ 0x27a7fb3 impala::HdfsOrcScanner::Open() @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() @ 0x28caa7d impala::HdfsScanNode::ScannerThread() @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x205 boost::function0<>::operator()() @ 0x2675d93 impala::Thread::SuperviseThread() @ 0x267dd30 boost::_bi::list5<>::operator()<>() @ 0x267dc54 boost::_bi::bind_t<>::operator()() @ 0x267dc15 boost::detail::thread_data<>::run() @ 0x3e3c3c1 thread_proxy @ 0x7f32360336b9 start_thread @ 0x7f3232bfe41c clone I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f720002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind {code} When I remove timestamp colum from table, and generate test data, query success. By the way, my test data is generated by spark. 
was: Recently, when I test impala query orc table, I found that scanning failed when table contains timestamp column, here is there exception: {code:java} I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind @ 0x1c9f753 impala::Status::Status() @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() @ 0x27a7fb3 impala::HdfsOrcScanner::Open() @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() @ 0x28caa7d impala::HdfsScanNode::ScannerThread() @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x205 boost::function0<>::operator()() @ 0x2675d93 impala::Thread::SuperviseThread() @ 0x267dd30 boost::_bi::list5<>::operator()<>() @ 0x267dc54 boost::_bi::bind_t<>::operator()() @ 0x267dc15 boost::detail::thread_data<>::run() @ 0x3e3c3c1 thread_proxy @ 0x7f32360336b9 start_thread @ 0x7f3232bfe41c clone I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f720002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind {code} When I remove timestamp colum from table, and generate test data, query success. By the way, my test data is generated by spark. 
> Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: WangSheng >Priority: Minor > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129]
[jira] [Created] (IMPALA-9967) Scan orc failed when table contains timestamp column
WangSheng created IMPALA-9967: -
Summary: Scan orc failed when table contains timestamp column
Key: IMPALA-9967
URL: https://issues.apache.org/jira/browse/IMPALA-9967
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.0
Reporter: WangSheng
Recently, while testing Impala queries against an ORC table, I found that the scan fails when the table contains a timestamp column. Here is the exception: {code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind
    @ 0x1c9f753 impala::Status::Status()
    @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail()
    @ 0x27a7fb3 impala::HdfsOrcScanner::Open()
    @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
    @ 0x28cb379 impala::HdfsScanNode::ProcessSplit()
    @ 0x28caa7d impala::HdfsScanNode::ScannerThread()
    @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
    @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @ 0x205 boost::function0<>::operator()()
    @ 0x2675d93 impala::Thread::SuperviseThread()
    @ 0x267dd30 boost::_bi::list5<>::operator()<>()
    @ 0x267dc54 boost::_bi::bind_t<>::operator()()
    @ 0x267dc15 boost::detail::thread_data<>::run()
    @ 0x3e3c3c1 thread_proxy
    @ 0x7f32360336b9 start_thread
    @ 0x7f3232bfe41c clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f720002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: Unknown type kind
{code} When I remove the timestamp column from the table and regenerate the test data, the query succeeds. By the way, my test data is generated by Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152439#comment-17152439 ] WangSheng edited comment on IMPALA-9741 at 7/7/20, 2:35 AM: Hi [~tarmstrong], [~boroknagyz], [~vihangk1], I have completed a first version of querying Iceberg tables from Impala. The main design treats an Iceberg table as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format from a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the data files that need to be scanned.
This is a simple version and some of the code may not be good; I hope you can give some advice. Thanks a lot. Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/
was (Author: skyyws): Hi [~tarmstrong], [~boroknagyz], [~vihangk1], I have completed a first version of querying Iceberg tables from Impala. The main design treats an Iceberg table as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format from a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the data files that need to be scanned.
This is a simple version and some of the code may not be good; I hope you can give some advice. Thanks a lot. Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152439#comment-17152439 ] WangSheng commented on IMPALA-9741: --- Hi [~tarmstrong], [~boroknagyz], [~vihangk1], I have completed a first version of querying Iceberg tables from Impala. The main design treats an Iceberg table as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format from a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the data files that need to be scanned.
This is a simple version and some of the code may not be good; I hope you can give some advice. Thanks a lot. Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
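The second function in the comment above, pushing partition-column predicates down so that only matching data files are scanned, can be illustrated with a small Python sketch. This is not the code from the Gerrit change: `DataFile`, `prune_files`, and the sample paths used with them are hypothetical names introduced here, and a plain dict stands in for the per-file partition metadata that Iceberg keeps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file entry of an Iceberg table."""
    path: str
    partition: Dict[str, object]  # partition column name -> value for this file


def prune_files(files: List[DataFile],
                predicates: Dict[str, Callable[[object], bool]]) -> List[DataFile]:
    """Keep only the files whose partition values satisfy every predicate.

    A predicate on a column the file has no partition value for cannot be
    used for pruning, so such a file is conservatively kept.
    """
    kept = []
    for f in files:
        if all(col not in f.partition or pred(f.partition[col])
               for col, pred in predicates.items()):
            kept.append(f)
    return kept
```

Files that survive pruning would then be scanned like the files of an ordinary unpartitioned HDFS table, which is the reuse the comment is aiming for.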
[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109448#comment-17109448 ] WangSheng commented on IMPALA-9741: --- [~tarmstrong] I see, and thanks for your advice. I will try to implement this function soon, based on IMPALA-9688, and will update here when there is any progress. > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158 ] WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:53 AM: -- Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
I have uploaded a very simple design picture as an attachment, but some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may decide the table's data format when creating the table, maybe through tblproperties, like this: 'iceberg_table_format'='parquet'; if so, we cannot select from an Iceberg table whose data files have different formats.
was (Author: skyyws): Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
I have uploaded a very simple design picture as an attachment, but some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may decide the table's data format when creating the table, maybe through tblproperties, like this: 'iceberg_table'='parquet'
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
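The second open question in the comment above (fixing one data format per table through a table property, which rules out tables with mixed-format files) can be made concrete with a short Python sketch. Everything here is an assumption for illustration: the helper names are hypothetical, the property key is the one floated in the comment rather than a confirmed Impala property, and the file format is guessed from the file extension only to keep the example self-contained.

```python
import os
from typing import Dict, List, Optional


def declared_format(tblproperties: Dict[str, str],
                    key: str = "iceberg_table_format") -> Optional[str]:
    """Return the lower-cased format declared in the table properties, if any."""
    value = tblproperties.get(key)
    return value.lower() if value else None


def mismatched_files(tblproperties: Dict[str, str],
                     data_file_paths: List[str]) -> List[str]:
    """List the data files whose extension disagrees with the declared format.

    With no declared format nothing can be checked, so the result is empty.
    """
    fmt = declared_format(tblproperties)
    if fmt is None:
        return []
    bad = []
    for path in data_file_paths:
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        if ext != fmt:
            bad.append(path)
    return bad
```

A non-empty result corresponds to the case the comment warns about: a table declared as Parquet that also contains, say, ORC data files could not be selected under this scheme.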
[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158 ] WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:39 AM: -- Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
I have uploaded a very simple design picture as an attachment, but some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may decide the table's data format when creating the table, maybe through tblproperties, like this: 'iceberg_table'='parquet'
was (Author: skyyws): Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
And here is a very simple design: !select-iceberg.jpg!
Some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may decide the table's data format when creating the table, maybe through tblproperties, like this: 'iceberg_table'='parquet'
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158 ] WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:37 AM: -- Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
And here is a very simple design: !select-iceberg.jpg!
Some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may decide the table's data format when creating the table, maybe through tblproperties, like this: 'iceberg_table'='parquet'
was (Author: skyyws): Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
And here is a very simple design: !select-iceberg.png!
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9741: -- Attachment: select-iceberg.jpg > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9741: -- Attachment: (was: select-iceberg.png) > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.jpg > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158 ] WangSheng commented on IMPALA-9741: --- Hi [~boroknagyz], [~tarmstrong], I have been thinking about how to implement querying Iceberg from Impala recently, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly provides these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push down some predicates to Iceberg;
* Get the specific data files, which are stored in HDFS, from Iceberg using these expressions;
* Use these data files to construct the related Thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend will then use these Thrift structs to construct "SCAN HDFS" to scan the data, so that we can reuse the existing backend code.
And here is a very simple design: !select-iceberg.png!
> Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.png > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
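The flow proposed in the comment above (translate conjuncts, let Iceberg pick the data files, then describe those files as splits for the existing HDFS scan path) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the planned IcebergScanNode.java: the `Conjunct` tuple, the operation names in `SUPPORTED_OPS`, and the `FileSplit` record are hypothetical simplifications and do not mirror the real Impala or Iceberg types. The file-selection step itself is left out here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (column, operator, literal): a simplified stand-in for a planner conjunct.
Conjunct = Tuple[str, str, object]

# Operators this sketch treats as pushable; the names on the right are illustrative.
SUPPORTED_OPS = {"=": "eq", "<": "lt", "<=": "lt_eq", ">": "gt", ">=": "gt_eq"}


def to_pushdown_expressions(conjuncts: List[Conjunct]):
    """Split conjuncts into (pushed, residual).

    Pushed entries are (operation, column, literal) triples; anything the
    sketch cannot translate stays in the residual list for normal evaluation.
    """
    pushed, residual = [], []
    for col, op, lit in conjuncts:
        if op in SUPPORTED_OPS:
            pushed.append((SUPPORTED_OPS[op], col, lit))
        else:
            residual.append((col, op, lit))
    return pushed, residual


@dataclass
class FileSplit:
    """Hypothetical, much-reduced analogue of a file split descriptor."""
    path: str
    offset: int
    length: int


def build_splits(files: List[Tuple[str, int]], max_split_bytes: int) -> List[FileSplit]:
    """Cut each (path, file_length) pair into byte ranges of bounded size."""
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    splits = []
    for path, size in files:
        offset = 0
        while offset < size:
            length = min(max_split_bytes, size - offset)
            splits.append(FileSplit(path, offset, length))
            offset += length
    return splits
```

Keeping the untranslatable conjuncts in a residual list reflects the wording "some predicates" in the comment: pushdown narrows the file list, while the remaining predicates are still evaluated by the scan as usual.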
[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9741: -- Attachment: select-iceberg.png > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > Attachments: select-iceberg.png > > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Work started] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9741 started by WangSheng. - > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9741: -- Description: Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this. > Support query iceberg table by impala > - > > Key: IMPALA-9741 > URL: https://issues.apache.org/jira/browse/IMPALA-9741 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > Since we have submitted a patch that supports creating Iceberg tables from Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. But we need to read the Impala and Iceberg code in depth to determine how to do this.
[jira] [Created] (IMPALA-9741) Support query iceberg table by impala
WangSheng created IMPALA-9741: - Summary: Support query iceberg table by impala Key: IMPALA-9741 URL: https://issues.apache.org/jira/browse/IMPALA-9741 Project: IMPALA Issue Type: Sub-task Reporter: WangSheng Assignee: WangSheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9621 started by WangSheng. - > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We have been investigating Iceberg recently and are preparing to implement selecting Iceberg data from Impala. Our production environment uses HDFS, so we will try to support Iceberg on HDFS.
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097812#comment-17097812 ] WangSheng edited comment on IMPALA-9621 at 5/2/20, 6:14 AM: Hi [~stakiar], [~rdblue], I recently discussed this with a colleague who is researching Iceberg, and we found that it is not necessary to specify the format when creating an Iceberg table. The main reason is that the file format is a file-level property rather than a table-level one, which means the data files of one Iceberg table in HDFS can each have a different format. In the file [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java], the function open() constructs the iterator according to the file format, which means different files may have different formats. So I did not consider the table format in IMPALA-9688 when creating Iceberg tables from Impala. We are now working on querying Iceberg tables from Impala; if there is any progress, I will update here as well.
was (Author: skyyws): Hi [~stakiar], [~rdblue], I recently discussed this with the colleague responsible for Iceberg, and we found that it is not necessary to specify the format when creating an Iceberg table. The main reason is that the file format is a file-level property rather than a table-level one, which means the data files of one Iceberg table in HDFS can each have a different format. In the file [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java], the function open() constructs the iterator according to the file format, which means different files may have different formats. So I did not consider the table format in IMPALA-9688 when creating Iceberg tables from Impala. We are now working on querying Iceberg tables from Impala; if there is any progress, I will update here as well.
> Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We have been investigating Iceberg recently and are preparing to implement selecting Iceberg data from Impala. Our production environment uses HDFS, so we will try to support Iceberg on HDFS.
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097812#comment-17097812 ] WangSheng commented on IMPALA-9621: --- Hi [~stakiar][~rdblue], I recently discussed this with the colleague responsible for iceberg, and we found that it is not necessary to specify the format when creating an iceberg table. The main reason is that the file format is a file-level property rather than a table-level one, which means a table in iceberg can have a different format for each data file in hdfs. In the file [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java], the function open() constructs its iterator according to the file format, which means different files may have different formats. So I didn't consider the table format in IMPALA-9688 when creating an iceberg table through impala. We are currently trying to implement querying iceberg tables through impala; if there is any progress, I will update here as well. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
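Editor's note: because the format is a per-file property, a reader has to choose its decoder file by file rather than once per table. The following is only a rough sketch of that idea in plain Java. The class name, the method names and the trick of inferring the format from the file extension are all assumptions made for illustration; they are not taken from Iceberg or from the Impala patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: not Iceberg or Impala code.
final class PerFileFormatSketch {
  enum Format { PARQUET, ORC, AVRO }

  // Assumption: the format is guessed from the file extension.
  static Format formatOf(String path) {
    String lower = path.toLowerCase(Locale.ROOT);
    for (Format f : Format.values()) {
      if (lower.endsWith("." + f.name().toLowerCase(Locale.ROOT))) {
        return f;
      }
    }
    throw new IllegalArgumentException("Unknown data file format: " + path);
  }

  // Resolves a format for every data file of one table, keeping file order.
  static Map<String, Format> plan(List<String> dataFiles) {
    Map<String, Format> byFile = new LinkedHashMap<>();
    for (String file : dataFiles) {
      byFile.put(file, formatOf(file));
    }
    return byFile;
  }
}
```

A single table can therefore resolve to a mix of formats, which is why a table-level format clause is not needed in this model.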
[jira] [Commented] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097795#comment-17097795 ] WangSheng commented on IMPALA-9688: --- Hi [~stakiar], thanks for your question. My patch already adds a new syntax, "partition by spec", for creating an iceberg table with partitions, so the create sql above executes successfully in my test environment. I designed it this way mainly because iceberg partitioning is very different from that of an hdfs table; it also follows the design of kudu table creation (partition by hash/range).
{code:java}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;

create table hdfs_test(
  level string,
  event_time string,
  message string)
partitioned by (
  dt string
)
stored as parquet;
{code}
> Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > This sub-task mainly realizes the creation of iceberg table through impala
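Editor's note: the "partition by spec" clause pairs each column with a transform such as identity. As a minimal sketch, assuming the text between the parentheses is already available as a string, the pairs could be extracted as below. The class, the nested Field type and the parse method are hypothetical names for illustration and do not come from the Impala patch, whose parser is grammar-based.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not the parser used by Impala.
final class PartitionSpecSketch {
  // One "<column> <transform>" entry, e.g. "level identity".
  static final class Field {
    final String column;
    final String transform;

    Field(String column, String transform) {
      this.column = column;
      this.transform = transform;
    }

    @Override
    public String toString() {
      return transform + "(" + column + ")";
    }
  }

  // Splits the text between the parentheses of "partition by spec( ... )".
  static List<Field> parse(String specBody) {
    List<Field> fields = new ArrayList<>();
    for (String entry : specBody.split(",")) {
      String[] parts = entry.trim().split("\\s+");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected '<column> <transform>': " + entry.trim());
      }
      fields.add(new Field(parts[0], parts[1].toLowerCase(Locale.ROOT)));
    }
    return fields;
  }

  public static void main(String[] args) {
    // Prints [identity(level), identity(event_time)]
    System.out.println(parse("level identity, event_time identity"));
  }
}
```

Keeping the transform next to the column name is what distinguishes this clause from the hdfs-style "partitioned by", which only declares extra partition columns.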
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972 ] WangSheng edited comment on IMPALA-9621 at 4/29/20, 1:37 AM: - Hi [~stakiar], thanks for your reply. Hive for iceberg is on progess now, you can find the related code: [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java, and here is another related project on github: [hiveberg|https://github.com/ExpediaGroup/hiveberg]. was (Author: skyyws): Hi [~stakiar], thanks for your replay. Hive for iceberg is on progess now, you can find the related code: [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java, and here is another related project on github: [hiveberg|https://github.com/ExpediaGroup/hiveberg]. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972 ] WangSheng edited comment on IMPALA-9621 at 4/29/20, 1:37 AM: - Hi [~stakiar], thanks for your reply. Hive for iceberg is on progess now, you can find the related code: [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java], and here is another related project on github: [hiveberg|https://github.com/ExpediaGroup/hiveberg]. was (Author: skyyws): Hi [~stakiar], thanks for your reply. Hive for iceberg is on progess now, you can find the related code: [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java, and here is another related project on github: [hiveberg|https://github.com/ExpediaGroup/hiveberg]. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972 ] WangSheng commented on IMPALA-9621: --- Hi [~stakiar], thanks for your reply. Hive support for iceberg is in progress now; you can find the related code here: [IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java], and here is another related project on github: [hiveberg|https://github.com/ExpediaGroup/hiveberg]. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
[jira] [Comment Edited] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091126#comment-17091126 ] WangSheng edited comment on IMPALA-9688 at 4/24/20, 3:20 AM: - Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version to create iceberg table by impala. We can use the following sql to create an iceberg table: {code:java} create table iceberg_test( level string, event_time string, message string) partition by spec( level identity, event_time identity ) stored as iceberg; {code} this query would be transformed as a iceberg table shcema like this: {code:java} Schema schema = new Schema( Types.NestedField.required(1, "level", Types.StringType.get()), Types.NestedField.required(2, "event_time", Types.StringType.get()), Types.NestedField.required(3, "message", Types.StringType.get())); PartitionSpec spec = PartitionSpec.builderFor(schema).identity("event_time").identity("level").build(); HadoopTables.create(schema, spec, location); {code} We can also use show create table xxx and show partitions xxx for iceberg table. I referred to the implementation of kudu table by defined a new IcebergTable and related classes. I know there are many places need to be revised and improved. The point is I'm not sure if this solution is feasible, so hope you guys can give me some suggestions, thanks a lot! And here is the gerrit url: https://gerrit.cloudera.org/#/c/15797 was (Author: skyyws): Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version to create iceberg table by impala. 
We can use the following sql to create an iceberg table: {code:java} create table iceberg_test( level string, event_time string, message string) partition by spec( level identity, event_time identity ) stored as iceberg; {code} this query would be transformed as a iceberg table shcema like this: {code:java} Schema schema = new Schema( Types.NestedField.required(1, "level", Types.StringType.get()), Types.NestedField.required(2, "event_time", Types.StringType.get()), Types.NestedField.required(3, "message", Types.StringType.get())); PartitionSpec spec = PartitionSpec.builderFor(schema).identity("event_time").identity("level").build(); HadoopTables.create(schema, spec, location); {code} We can also use show create table xxx and show partitions xxx for iceberg. I referred to the implementation of kudu table by defined a new IcebergTable and related classes. I know there are many places need to be revised and improved. The point is I'm not sure if this solution is feasible, so hope you guys can give me some suggestions, thanks a lot! And here is the gerrit url: https://gerrit.cloudera.org/#/c/15797 > Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > This sub-task mainly realizes the creation of iceberg table through impala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091126#comment-17091126 ] WangSheng commented on IMPALA-9688: --- Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version of iceberg table creation through impala. We can use the following sql to create an iceberg table:
{code:java}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;
{code}
This query is translated into an iceberg table schema like this:
{code:java}
Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.StringType.get()),
    Types.NestedField.required(3, "message", Types.StringType.get()));
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .identity("event_time")
    .identity("level")
    .build();
// create() is an instance method, so a HadoopTables object is needed
new HadoopTables().create(schema, spec, location);
{code}
We can also use show create table xxx and show partitions xxx for iceberg tables. I followed the implementation of kudu tables by defining a new IcebergTable and related classes. I know many places still need to be revised and improved. The main point is that I'm not sure whether this solution is feasible, so I hope you can give me some suggestions, thanks a lot! Here is the gerrit url: https://gerrit.cloudera.org/#/c/15797
> Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > This sub-task mainly realizes the creation of iceberg table through impala
[jira] [Updated] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangSheng updated IMPALA-9688: -- Description: This sub-task mainly realizes the creation of iceberg table through impala (was: This sub-task mainly implement iceberg table create by impala) > Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > This sub-task mainly realizes the creation of iceberg table through impala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9688) Support create iceberg table by impala
[ https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9688 started by WangSheng. - > Support create iceberg table by impala > -- > > Key: IMPALA-9688 > URL: https://issues.apache.org/jira/browse/IMPALA-9688 > Project: IMPALA > Issue Type: Sub-task >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > This sub-task mainly implement iceberg table create by impala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9688) Support create iceberg table by impala
WangSheng created IMPALA-9688: - Summary: Support create iceberg table by impala Key: IMPALA-9688 URL: https://issues.apache.org/jira/browse/IMPALA-9688 Project: IMPALA Issue Type: Sub-task Reporter: WangSheng Assignee: WangSheng This sub-task mainly implement iceberg table create by impala -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082186#comment-17082186 ] WangSheng edited comment on IMPALA-9621 at 4/13/20, 11:24 AM: -- [~tarmstrong]Thanks for your suggestion, Tim. We will continue to study to see if we can find a suitable implementation solution.If there is any progress, I will update here. was (Author: skyyws): [~tarmstrong]Thanks for your suggestion, Tim. I found that iceberg is not very similar to hive table. Each format has its own input and output like MapredParquetInputFormat/MapredParquetOutputFormat, KuduInputFormat/KuduOutputFormat and so on, but iceberg does not. We cannot just add ICEBERG as a new file format in HdfsFileFormat, whether implement it is as a new table or a special hdfstable. We will continue to study to see if we can find a suitable implementation solution.If there is any progress, I will update here. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082186#comment-17082186 ] WangSheng commented on IMPALA-9621: --- [~tarmstrong] Thanks for your suggestion, Tim. I found that iceberg is not very similar to a hive table. Each format has its own input and output formats, such as MapredParquetInputFormat/MapredParquetOutputFormat, KuduInputFormat/KuduOutputFormat and so on, but iceberg does not. We cannot simply add ICEBERG as a new file format in HdfsFileFormat, whether we implement it as a new table type or as a special HdfsTable. We will continue to study this to see if we can find a suitable implementation. If there is any progress, I will update here. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080176#comment-17080176 ] WangSheng edited comment on IMPALA-9621 at 4/10/20, 2:38 AM: - [~tarmstrong]Hi Tim, thanks for your reply again. Do you mean shared the code of HdfsScanNode, and treat iceberg as another HdfsTable? Or just implement icebergTable with HdfsScanNode? We planned to implement this by treating iceberg as ICEBERG_PARQUET, just like HUDI_PARQUET as first. But after read iceberg source code, we found that metadata structure is different with impala, iceberg manage metadata itself by referring a hdfs location. Even if we can use HiveCatalog api, we cannot read iceberg data on hdfs directly, it doesn't like normal hdfs table structure: hfs://xxx/db/table/partition=xxx/xxx. As you mentioned above, a lot of it might be very different, so I will study the iceberg code more deeply to see if I can find a better way. Hope for your more advice, thanks! was (Author: skyyws): [~tarmstrong]Hi Tim, thanks for your reply again. Do you mean shared the code of HdfsScanNode, and treat iceberg as another HdfsTable? Or just implement icebergTable with HdfsScanNode? We planned to implement this by treating iceberg as ICEBERG_PARQUET, just like HUDI_PARQUET as first. But after read iceberg source code, we found that metadata structure is different with impala, iceberg manage metadata itself by referring a hdfs location. Even if we can use HiveCatalog api, we cannot read iceberg data on hdfs directly, it doesn't like normal hdfs table structure: hfs://xxx/db/table/partition=xxx/xxx. As you mentioned above, a lot of it might be very different, so I will study the iceberg code more deeply to see if I can find a better way. Hope for your more advice, thanks! 
> Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080176#comment-17080176 ] WangSheng commented on IMPALA-9621: --- [~tarmstrong] Hi Tim, thanks for your reply again. Do you mean sharing the code of HdfsScanNode and treating iceberg as another HdfsTable? Or just implementing IcebergTable with HdfsScanNode? At first we planned to implement this by treating iceberg as ICEBERG_PARQUET, just like HUDI_PARQUET. But after reading the iceberg source code, we found that its metadata structure is different from what impala expects: iceberg manages its metadata itself by referring to an hdfs location. Even if we can use the HiveCatalog api, we cannot read iceberg data on hdfs directly, because it does not follow the normal hdfs table structure: hdfs://xxx/db/table/partition=xxx/xxx. As you mentioned above, a lot of it might be very different, so I will study the iceberg code more deeply to see if I can find a better way. I hope to get more advice from you, thanks! > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
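Editor's note: for contrast, in a conventional hdfs table the partition values can be recovered from the directory names alone, which is what the partition=xxx layout in the comment refers to. The sketch below shows that convention in plain Java. The class and method names are hypothetical, and the example path is made up. An iceberg table gives no such guarantee, since (as the comment notes) it manages its own metadata.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not Impala code.
final class HivePathSketch {
  // Collects key=value directory segments from a path, in path order.
  static Map<String, String> partitionValues(String path) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      // Require a non-empty key and a non-empty value around '='.
      if (eq > 0 && eq < segment.length() - 1) {
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return values;
  }

  public static void main(String[] args) {
    // Prints {dt=2020-04-10}
    System.out.println(
        partitionValues("hdfs://nn/db/table/dt=2020-04-10/part-0.parq"));
  }
}
```

A scanner that relies on this convention cannot be pointed at an iceberg table directory as-is, which is the mismatch described in the comment.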
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079038#comment-17079038 ] WangSheng commented on IMPALA-9621: --- [~tarmstrong] Hi Tim, here is the quick start for the iceberg api: [create-table|https://iceberg.apache.org/api-quickstart/#create-a-table]. I've also read the iceberg source code: when HiveCatalog is used to create a table, iceberg calls HiveMetaStoreClient to create the table in HMS. You can find the code in [HiveTableOperations.doCommit()|https://github.com/apache/incubator-iceberg/blob/master/hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java]. I will test this in my local environment soon, and will also try to stay consistent if possible. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079031#comment-17079031 ] WangSheng commented on IMPALA-9621: --- [~stakiar] Thanks for your suggestion, Sahil. I read the code in IMPALA-8778 several days ago. That patch lets impala read Hudi optimized tables by treating HUDI_PARQUET as a special kind of parquet. When handling HUDI_PARQUET, impala just applies a filter and then treats the data as normal parquet. My opinion is to treat iceberg as a new data source, like kudu or HBase, so that we can create/drop/alter/select iceberg tables through impala. > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs.
[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs
[ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1705#comment-1705 ] WangSheng edited comment on IMPALA-9621 at 4/8/20, 3:17 AM: Here are some of my thoughts: * Refer to the implementation of kudu related operation, we need to implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, iceberg-scan-node.cc and so on; * Use _ICEBERG_ as a new THdfsFileFormat, ICEBERG_TABLE as a new TTableType; * Iceberg support two kinds api to create table: HiveCatalog and HadoopTables. If we use HiveCatalog, we can directly call the API of iceberg to create the table, impala does't need to create hms table independently. If we use HadoopTables, impala should create hms table firstly, and then call HadoopTables to create iceberg table, just like create kudu table; * Iceberg now only support parquet file format, but may support orc in the future, so I'm not sure it is necessary to implement IcebergFileFormat like HdfsFileFormat, or just use 'STORED AS ICEBERG' at the beginning. And the second method is significantly simpler to implement. And I try to split this task as some sub-task: * Implement metadata related modification, such as create/drop/alter and so on; * Support query/insert iceberg table; * Do some query optimization to improve query performance; * Other related work; These are some of my simple ideas, more details still need to think. Hope you guys can give me some advice [~stigahuang][~tarmstrong]. Others are very welcome to give me more suggestions, thanks a lot! was (Author: skyyws): Here are some of my thoughts: * Refer to the implementation of kudu related operation, we need to implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, iceberg-scan-node.cc and so on; * Use _ICEBERG_ as a new THdfsFileFormat, ICEBERG_TABLE as a new TTableType; * Iceberg support two kinds api to create table: HiveCatalog and HadoopTables. 
If we use HiveCatalog, we can directly call the API of iceberg to create the table, impala does't need to create hms table independently. If we use HadoopTables, impala should create hms table firstly, and then call HadoopTables to create iceberg table, just like create kudu table; * Iceberg now only support parquet file format, but may support orc in the future, so I'm not sure it is necessary to implement IcebergFileFormat like HdfsFileFormat, or just use 'STORED AS ICEBERG' at the beginning. And the second method is significantly simpler to implement. And I try to split this task as some sub-task: * Implement metadata related modification, such as create/drop/alter and so on; * Support query/insert iceberg table; * Do some query optimization to improve query performance; * Other related work; These are some of my simple ideas, more details still need to think. Hope you guys can give me some advice [~stigahuang][~tarmstrong]. Others are very welcome to give me more suggestions, thanks a lot! > Support iceberg on hdfs > --- > > Key: IMPALA-9621 > URL: https://issues.apache.org/jira/browse/IMPALA-9621 > Project: IMPALA > Issue Type: Improvement >Reporter: WangSheng >Assignee: WangSheng >Priority: Major > > We are investigating iceberg recently, and preparing to implement select > iceberg data by impala. Our production use hdfs, so we will try to support > iceberg on hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9621) Support iceberg on hdfs
WangSheng created IMPALA-9621:

Summary: Support iceberg on hdfs
Key: IMPALA-9621
URL: https://issues.apache.org/jira/browse/IMPALA-9621
Project: IMPALA
Issue Type: Improvement
Reporter: WangSheng
Assignee: WangSheng

We have recently been investigating Iceberg and are preparing to implement selecting Iceberg data from Impala. Our production environment uses HDFS, so we will try to support Iceberg on HDFS.
[jira] [Commented] (IMPALA-9264) Support catalogd without HMS
[ https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028643#comment-17028643 ]

WangSheng commented on IMPALA-9264:

[~vihangk1] Sorry for my late reply, Vihang; I was away for the Spring Festival holiday. "catalogd connected to mysql/pg directly" is the "Local/Embedded Metastore Server" mode inside the catalogd server (described in the document you mentioned above), which is used in some situations in our production environment.

> Support catalogd without HMS
> ---
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.3.0
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through the
> metastore service) is a very common usage. We just need to configure
> hive-site.xml like this (metastore-related settings such as hive.metastore.uris
> are unnecessary):
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this situation, I found that creating a
> Kudu managed table failed
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've
> already fixed this.
> I guess there may be other functionality that has not been taken into
> consideration in this situation. So I created this Jira to collect that
> functionality, and I'm willing to continue contributing when I'm free.
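As a rough illustration of the configuration described in the issue, the following Python sketch parses a hive-site.xml fragment and checks that the four JDBC settings are present while hive.metastore.uris is absent. The helper names `load_properties` and `looks_like_direct_db_config` are hypothetical, the `<configuration>` wrapper is the usual Hadoop-style root element, and the check is only a heuristic for this example, not how catalogd decides anything.

```python
import xml.etree.ElementTree as ET

# Sample hive-site.xml content using the values from the issue description.
HIVE_SITE_XML = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>"""

REQUIRED_JDO_KEYS = (
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionPassword",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionUserName",
)


def load_properties(xml_text):
    """Parse Hadoop-style <property> elements into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}


def looks_like_direct_db_config(props):
    """Heuristic for this example only: JDBC settings set, no metastore URI."""
    has_jdbc = all(props.get(key) for key in REQUIRED_JDO_KEYS)
    return has_jdbc and not props.get("hive.metastore.uris")


if __name__ == "__main__":
    print(looks_like_direct_db_config(load_properties(HIVE_SITE_XML)))
```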
[jira] [Commented] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment
[ https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019174#comment-17019174 ]

WangSheng commented on IMPALA-9287:

Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/15057/

> test_kudu_table_create_without_hms fails on Hive-3 environment
> --
>
> Key: IMPALA-9287
> URL: https://issues.apache.org/jira/browse/IMPALA-9287
> Project: IMPALA
> Issue Type: Test
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: Vihang Karajgaonkar
> Assignee: WangSheng
> Priority: Blocker
> Labels: broken-build
>
> {{test_kudu_table_create_without_hms}}, which was added recently in
> IMPALA-9266, fails when Hive-3 is used. To reproduce the issue, build Impala
> after setting {{USE_CDP_HIVE=true}} and then run the test.
[jira] [Assigned] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment
[ https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng reassigned IMPALA-9287:

Assignee: WangSheng
[jira] [Commented] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment
[ https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016456#comment-17016456 ]

WangSheng commented on IMPALA-9287:

[~vihangk1] Sorry for my late reply. I will try to solve this problem as soon as possible.

> test_kudu_table_create_without_hms fails on Hive-3 environment
> --
>
> Key: IMPALA-9287
> URL: https://issues.apache.org/jira/browse/IMPALA-9287
> Project: IMPALA
> Issue Type: Test
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: Vihang Karajgaonkar
> Priority: Blocker
> Labels: broken-build
>
> {{test_kudu_table_create_without_hms}}, which was added recently in
> IMPALA-9266, fails when Hive-3 is used. To reproduce the issue, build Impala
> after setting {{USE_CDP_HIVE=true}} and then run the test.
[jira] [Commented] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log
[ https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005289#comment-17005289 ]

WangSheng commented on IMPALA-9266:

I wrote a new custom cluster test case to replace the original FE test case. Here is the URL: https://gerrit.cloudera.org/#/c/14962/

> TestLogFragments.test_log_fragments fails due to missing log
> ---
>
> Key: IMPALA-9266
> URL: https://issues.apache.org/jira/browse/IMPALA-9266
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: Joe McDonnell
> Assignee: WangSheng
> Priority: Blocker
> Labels: broken-build
>
> TestLogFragments.test_log_fragments is failing due to a missing log entry:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/observability/test_log_fragments.py:46: in test_log_fragments
>     "] Analysis and authorization finished.")
> common/impala_test_suite.py:1149: in assert_impalad_log_contains
>     self.assert_log_contains("impalad", level, line_regex, expected_count)
> common/impala_test_suite.py:1185: in assert_log_contains
>     (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/ee_tests/impalad.impala-ec2-centos74-m5-4xlarge-ondemand-088c.vpc.cloudera.com.jenkins.log.INFO.20191227-001949.23945 matching regex 'ce41d657e70d6890:6f0f227d] Analysis and authorization finished.', but found 0 lines. Last line was:
> E   Caught signal: SIGTERM. Daemon will exit.{noformat}
> This started happening after the "IMPALA-8974: Fixed a bug when create kudu
> managed table without HMS" commit went in. That commit adds a test that
> restarts Impala in a frontend test. The problem is that it runs
> start-impala-cluster.py without arguments, whereas bin/run-all-tests.sh runs
> start-impala-cluster.py specifying the --log_dir. This would put the log
> files in a different location (/tmp?).
> [https://github.com/apache/impala/blob/320f05852060c1027326ac20be7df340a7a5263f/fe/src/test/java/org/apache/impala/catalog/CreateKuduTableWithoutHMSTest.java#L98]
> [https://github.com/apache/impala/blob/master/bin/run-all-tests.sh#L165-L167]
> In one run that hit this issue, there are two sets of impalad logs in the
> ee_test directory. One set starts at 06:40:22 and ends at 07:11:28. The
> second set starts at 09:45:25 and ends at 09:47:30. So, this is missing 2.5
> hours of ee_test log files, which matches the theory.
> This is also likely to impact other things like erasure coding or tests that
> run against the data cache.
> GVO doesn't hit this because the job that runs frontend tests does not run
> end to end tests.
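Two points in the description above can be checked with a small Python sketch: the difference between starting the cluster with and without --log_dir, and the size of the gap between the two sets of impalad logs. The helper names, the relative script path, and the example log directory are assumptions made for illustration; only the --log_dir flag and the timestamps come from the issue text.

```python
from datetime import datetime


def start_cluster_command(log_dir=None):
    """Build an argv list for start-impala-cluster.py.

    Hypothetical helper: --log_dir is passed only when a directory is given,
    which mirrors the difference described between the frontend test and
    bin/run-all-tests.sh.
    """
    cmd = ["bin/start-impala-cluster.py"]
    if log_dir is not None:
        cmd.append("--log_dir=" + log_dir)
    return cmd


def gap_hours(end_of_first_set, start_of_second_set):
    """Hours between the end of one log set and the start of the next."""
    fmt = "%H:%M:%S"
    delta = (datetime.strptime(start_of_second_set, fmt)
             - datetime.strptime(end_of_first_set, fmt))
    return delta.total_seconds() / 3600.0


if __name__ == "__main__":
    # Without --log_dir the logs go to the script's default location.
    print(start_cluster_command())
    # The directory below is a made-up example value.
    print(start_cluster_command("logs/ee_tests"))
    # First set ends at 07:11:28, second set starts at 09:45:25.
    print(round(gap_hours("07:11:28", "09:45:25"), 2))
```

The gap works out to 2 h 33 min 57 s, which is consistent with the roughly 2.5 hours of missing ee_test logs mentioned in the description.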
[jira] [Work started] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log
[ https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-9266 started by WangSheng.
[jira] [Commented] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log
[ https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005083#comment-17005083 ]

WangSheng commented on IMPALA-9266:

Here is the commit for this jira: https://gerrit.cloudera.org/#/c/14957/
[jira] [Assigned] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log
[ https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng reassigned IMPALA-9266:

Assignee: WangSheng
[jira] [Closed] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed
[ https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng closed IMPALA-9268.

duplicated

> CreateKuduTableWithoutHMSTest caused TestLogFragments failed
> ---
>
> Key: IMPALA-9268
> URL: https://issues.apache.org/jira/browse/IMPALA-9268
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.3.0
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Major
> Fix For: Impala 3.4.0
[jira] [Comment Edited] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed
[ https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004618#comment-17004618 ]

WangSheng edited comment on IMPALA-9268 at 12/29/19 2:44 AM:

Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I should run all the tests on Jenkins until Quanlong suggested that I submit an exhaustive test. After two exhaustive test runs on Jenkins, I can be sure this is a bug. But IMPALA-8974 had already been merged into master, so I could only create a new Jira to handle this bug. It duplicates IMPALA-9266, so I will close this Jira and abandon my change on Gerrit.

was (Author: skyyws):
Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I should run all the tests on Jenkins until Quanlong suggested that I submit an exhaustive test. After two exhaustive test runs on Jenkins, I can be sure this is a bug. But IMPALA-8974 had already been merged into master, so I could only create a new Jira to handle this bug. It duplicates IMPALA-9268, so I will close this Jira and abandon my change on Gerrit.
[jira] [Commented] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed
[ https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004618#comment-17004618 ]

WangSheng commented on IMPALA-9268:

Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I should run all the tests on Jenkins until Quanlong suggested that I submit an exhaustive test. After two exhaustive test runs on Jenkins, I can be sure this is a bug. But IMPALA-8974 had already been merged into master, so I could only create a new Jira to handle this bug. It duplicates IMPALA-9268, so I will close this Jira and abandon my change on Gerrit.
[jira] [Commented] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed
[ https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004615#comment-17004615 ]

WangSheng commented on IMPALA-9268:

[~laszlog], you are right: I only ran the frontend tests, not all tests, so I didn't find this bug. I didn't notice your Jira, so I created a new Jira to solve this problem and submitted a patch to Gerrit. If you are planning to do this, I will close this Jira and abandon my change on Gerrit. Thanks for your attention!
[jira] [Created] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed
WangSheng created IMPALA-9268:

Summary: CreateKuduTableWithoutHMSTest caused TestLogFragments failed
Key: IMPALA-9268
URL: https://issues.apache.org/jira/browse/IMPALA-9268
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: WangSheng
Assignee: WangSheng
[jira] [Updated] (IMPALA-9264) Support catalogd without HMS
[ https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng updated IMPALA-9264:

Description:
In my company, connecting catalogd to mysql/pg directly (instead of through the metastore service) is a very common usage. We just need to configure hive-site.xml like this (metastore-related settings such as hive.metastore.uris are unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this situation, I found that creating a Kudu managed table failed ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've already fixed this.
I guess there may be other functionality that has not been taken into consideration in this situation. So I created this Jira to collect that functionality, and I'm willing to continue contributing when I'm free.

was:
In my company, connecting catalogd to mysql/pg directly (instead of through the metastore service) is a very common usage. We just need to configure hive-site.xml like this (metastore-related settings such as hive.metastore.uris are unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this situation, I found that creating a Kudu managed table failed ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've already fixed this.
I guess there may be other functions that have not been taken into consideration in this situation. So I created this Jira to collect those functions, and I'm willing to continue contributing when I'm free.
[jira] [Updated] (IMPALA-9264) Support catalogd without HMS
[ https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng updated IMPALA-9264:
    Description:
In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of going through the metastore service) is a very common setup. We only need to configure hive-site.xml like this (metastore-related config such as hive.metastore.uris is unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu managed table failed ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've already fixed this.
I guess there may be other functions that have not been taken into consideration in this situation. So I created this Jira to collect those functions, and I'm willing to continue contributing when I'm free.

  was:
In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of going through the metastore service) is a very common setup. We only need to configure hive-site.xml like this:
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu managed table failed ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've already fixed this.
I guess there may be other functions that have not been taken into consideration in this situation. So I created this Jira to collect those functions, and I'm willing to continue contributing when I'm free.
    Summary: Support catalogd without HMS (was: Support kinds of functions when catalogd connected to mysql/pg directly)

> Support catalogd without HMS
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.3.0
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Major
>
> In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of
> going through the metastore service) is a very common setup. We only need to
> configure hive-site.xml like this (metastore-related config such as
> hive.metastore.uris is unnecessary):
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu
> managed table failed
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've
> already fixed this.
> I guess there may be other functions that have not been taken into
> consideration in this situation. So I created this Jira to collect those
> functions, and I'm willing to continue contributing when I'm free.
[jira] [Updated] (IMPALA-9264) Support kinds of functions when catalogd connected to mysql/pg directly
[ https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng updated IMPALA-9264:
    Summary: Support kinds of functions when catalogd connected to mysql/pg directly (was: Support kinds of function when catalogd connected to mysql/pg directly)

> Support kinds of functions when catalogd connected to mysql/pg directly
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.3.0
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Major
>
> In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of
> going through the metastore service) is a very common setup. We only need to
> configure hive-site.xml like this:
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu
> managed table failed
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've
> already fixed this.
> I guess there may be other functions that have not been taken into
> consideration in this situation. So I created this Jira to collect those
> functions, and I'm willing to continue contributing when I'm free.
[jira] [Updated] (IMPALA-9264) Support kinds of function when catalogd connected to mysql/pg directly
[ https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

WangSheng updated IMPALA-9264:
    Description:
In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of going through the metastore service) is a very common setup. We only need to configure hive-site.xml like this:
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu managed table failed ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've already fixed this.
I guess there may be other functions that have not been taken into consideration in this situation. So I created this Jira to collect those functions, and I'm willing to continue contributing when I'm free.

  was: In my company,

> Support kinds of function when catalogd connected to mysql/pg directly
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.3.0
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Major
>
> In my company, connecting catalogd directly to MySQL/PostgreSQL (instead of
> going through the metastore service) is a very common setup. We only need to
> configure hive-site.xml like this:
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu
> managed table failed
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've
> already fixed this.
> I guess there may be other functions that have not been taken into
> consideration in this situation. So I created this Jira to collect those
> functions, and I'm willing to continue contributing when I'm free.