
[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build

2020-11-02 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225068#comment-17225068
 ] 

WangSheng commented on IMPALA-10308:


Hi [~sql_forever], if you want to verify a specific test, you need to create 
these test tables manually before executing it. Alternatively, you can run 
{{$IMPALA_HOME/bin/run-all-tests.sh}} to execute the whole Impala test suite. 
In that case the Impala server creates the test tables automatically, because 
all DDL statements in functional_schema_template.sql are executed before the 
tests run. For more details about Impala tests, see: 
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load%2C+run%2C+and+create+new+Impala+tests

> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with 
> ASAN build
> 
>
> Key: IMPALA-10308
> URL: https://issues.apache.org/jira/browse/IMPALA-10308
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Qifan Chen
>Priority: Major
>
> The following error was seen when running the scanner test against the ASAN 
> build.
> {code:java}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'iceberg_partitioned'
> E   CAUSED BY: TableLoadingException: Error loading metadata for Iceberg 
> table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> E   CAUSED BY: IllegalArgumentException: Can not create a Path from a null 
> string
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> [gw2] linux2 -- Python 2.7.16 
> /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:920: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
> {code}
> To reproduce, apply the following steps.
> {code:java}
> 1. Build: ${IMPALA_HOME}/buildall.sh -skiptests -ninja -asan
> 2. Run test: 
> cd ${IMPALA_HOME} 
> tests/run-tests.py --exploration_strategy=exhaustive 
> tests/query_test/test_scanners.py
> {code}
> Branch info.
> The master branch of https://github.com/apache/impala.git. The HEAD points 
> at 193c2e773fa9f6772e4a7c30ed3a4f75029863f1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build

2020-11-02 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224611#comment-17224611
 ] 

WangSheng commented on IMPALA-10308:


Hi [~sql_forever], thanks for reporting this bug. It seems that this test 
failed when loading iceberg_partitioned. Have you put the test files into HDFS 
manually, like this?
{code:java}
// testdata/datasets/functional/functional_schema_template.sql
hadoop fs -mkdir -p /test-warehouse/iceberg_test && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/iceberg_partitioned /test-warehouse/iceberg_test/
{code}
I have already rebuilt the code in my own environment with ninja and ASAN, and 
I can create the external Iceberg table and query it normally, like this:
{code:java}
CREATE EXTERNAL TABLE functional_parquet.iceberg_partitioned (
  id INT,
  user STRING,
  action STRING,
  event_time TIMESTAMP
)
PARTITION BY SPEC (
  event_time HOUR,
  action IDENTITY
)
STORED AS ICEBERG
LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned'
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables', 'iceberg.file_format'='parquet');

select count(1) from functional_parquet.iceberg_partitioned;
{code}
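
For anyone who wants to script the manual upload step quoted in this comment, here is a minimal Python sketch. It only builds the two `hadoop fs` command lines and prints them. The function name, its parameters, and the fallback path are hypothetical and not part of the Impala test framework; the HDFS directory and the local data path are the ones quoted in this comment.

```python
import os
import posixpath
import shlex


def iceberg_upload_commands(impala_home, table="iceberg_partitioned",
                            warehouse_dir="/test-warehouse/iceberg_test"):
    """Build the two `hadoop fs` invocations from the comment as argument lists.

    Hypothetical helper: it does not run anything, it only returns the commands.
    """
    local_dir = posixpath.join(impala_home, "testdata", "data", "iceberg_test", table)
    return [
        # Create the target directory in HDFS (no error if it already exists).
        ["hadoop", "fs", "-mkdir", "-p", warehouse_dir],
        # Copy the local test data into it, overwriting existing files.
        ["hadoop", "fs", "-put", "-f", local_dir, warehouse_dir + "/"],
    ]


if __name__ == "__main__":
    # "/path/to/Impala" is a placeholder used when IMPALA_HOME is not set.
    home = os.environ.get("IMPALA_HOME", "/path/to/Impala")
    for cmd in iceberg_upload_commands(home):
        print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the `hadoop` client is on the PATH.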
 

> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with 
> ASAN build
> 
>
> Key: IMPALA-10308
> URL: https://issues.apache.org/jira/browse/IMPALA-10308
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Qifan Chen
>Priority: Major
>
> The following error was seen when running the scanner test against the ASAN 
> build.
> {code:java}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'iceberg_partitioned'
> E   CAUSED BY: TableLoadingException: Error loading metadata for Iceberg 
> table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> E   CAUSED BY: IllegalArgumentException: Can not create a Path from a null 
> string
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> [gw2] linux2 -- Python 2.7.16 
> /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:920: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
> {code}
> To reproduce, apply the following

[jira] [Commented] (IMPALA-10237) Support BUCKET and TRUNCATE partition transforms as built-in functions

2020-10-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215128#comment-17215128
 ] 

WangSheng commented on IMPALA-10237:


Hi [~gaborkaszab], I don't understand the title of this Jira. What do you mean 
by built-in functions for the BUCKET/TRUNCATE partition transforms? Could you 
please add more description of what this Jira is intended to do?

> Support BUCKET and TRUNCATE partition transforms as built-in functions
> --
>
> Key: IMPALA-10237
> URL: https://issues.apache.org/jira/browse/IMPALA-10237
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
>





[jira] [Resolved] (IMPALA-10159) Support ORC file format for Iceberg table

2020-10-14 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-10159.

Resolution: Fixed

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables in the PARQUET file format. Since some 
> work has already been done in IMPALA-9741, we can continue the work on ORC 
> file format support in this Jira.




[jira] [Work started] (IMPALA-10166) ALTER TABLE for Iceberg tables

2020-10-12 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10166 started by WangSheng.
--
> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.




[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables

2020-10-12 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212239#comment-17212239
 ] 

WangSheng commented on IMPALA-10166:


Hi [~boroknagyz], I will try to implement this as soon as possible.

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.




[jira] [Assigned] (IMPALA-10166) ALTER TABLE for Iceberg tables

2020-10-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng reassigned IMPALA-10166:
--

Assignee: WangSheng

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.




[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables

2020-10-10 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211579#comment-17211579
 ] 

WangSheng commented on IMPALA-10166:


Hi [~boroknagyz], is anyone planning to work on this Jira? If not, I'd like to 
try it. 

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.




[jira] [Comment Edited] (IMPALA-10166) ALTER TABLE for Iceberg tables

2020-10-10 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211579#comment-17211579
 ] 

WangSheng edited comment on IMPALA-10166 at 10/10/20, 6:29 AM:
---

Hi [~boroknagyz], is anyone planning to work on this Jira? If not, please 
assign it to me; I'd like to try it. 


was (Author: skyyws):
Hi [~boroknagyz], is anyone preparing to do this jira? If not, I'd like to try 
this. 

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.




[jira] [Resolved] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-10164.

Resolution: Fixed

> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Currently Impala only supports the HadoopTables API for creating Iceberg 
> tables, which is apparently not enough, so we are preparing to support 
> HadoopCatalog. The main design is to add a new table property named 
> 'iceberg.catalog' whose default value is 'hadoop.tables'; we implement 
> 'hadoop.catalog' to support the HadoopCatalog API. We may even support 
> 'hive.catalog' in the future.
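
The property lookup described in this issue can be illustrated with a small Python sketch. This is not Impala's implementation (the frontend is written in Java); the function name and the error handling are assumptions. Only the property key, the default value, and the two supported values are taken from the description.

```python
# Key and values are taken from the issue description.
ICEBERG_CATALOG_KEY = "iceberg.catalog"
DEFAULT_CATALOG = "hadoop.tables"
# 'hive.catalog' is mentioned only as a possible future value, so it is not listed.
SUPPORTED_CATALOGS = {"hadoop.tables", "hadoop.catalog"}


def resolve_iceberg_catalog(tbl_properties):
    """Return the catalog type for a table, falling back to the default.

    Hypothetical helper: `tbl_properties` is a plain dict of TBLPROPERTIES.
    """
    value = tbl_properties.get(ICEBERG_CATALOG_KEY, DEFAULT_CATALOG)
    if value not in SUPPORTED_CATALOGS:
        # Rejecting unknown values is an assumption of this sketch.
        raise ValueError("Unsupported value for %s: %r" % (ICEBERG_CATALOG_KEY, value))
    return value


if __name__ == "__main__":
    print(resolve_iceberg_catalog({}))
    print(resolve_iceberg_catalog({"iceberg.catalog": "hadoop.catalog"}))
```

A table created without the property therefore behaves as a 'hadoop.tables' table, which matches the default stated in the description.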




[jira] [Resolved] (IMPALA-9741) Support query iceberg table by impala

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-9741.
---
Resolution: Fixed

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch for creating Iceberg tables from Impala in 
> IMPALA-9688, we are preparing to implement Iceberg table queries in Impala. 
> But we need to read the Impala and Iceberg code in depth to determine how to 
> do this.




[jira] [Work started] (IMPALA-10159) Support ORC file format for Iceberg table

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10159 started by WangSheng.
--
> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables in the PARQUET file format. Since some 
> work has already been done in IMPALA-9741, we can continue the work on ORC 
> file format support in this Jira.




[jira] [Assigned] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng reassigned IMPALA-9967:
-

Assignee: (was: WangSheng)

> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries on an ORC table, I found that scanning 
> fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.




[jira] [Work stopped] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9967 stopped by WangSheng.
-
> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries on an ORC table, I found that scanning 
> fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.




[jira] [Work started] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9967 started by WangSheng.
-
> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries on an ORC table, I found that scanning 
> fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.




[jira] [Resolved] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-10221.

Resolution: Fixed

> Use 'iceberg.file_format' to replace 'iceberg_file_format'
> --
>
> Key: IMPALA-10221
> URL: https://issues.apache.org/jira/browse/IMPALA-10221
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> We introduced several new table properties in IMPALA-10164, such as 
> 'iceberg.catalog'. To keep these properties consistent, we rename 
> 'iceberg_file_format' to 'iceberg.file_format'.




[jira] [Resolved] (IMPALA-9688) Support create iceberg table by impala

2020-10-09 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-9688.
---
Resolution: Fixed

> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
>
> This sub-task mainly implements the creation of Iceberg tables through Impala.




[jira] [Work started] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'

2020-10-05 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10221 started by WangSheng.
--
> Use 'iceberg.file_format' to replace 'iceberg_file_format'
> --
>
> Key: IMPALA-10221
> URL: https://issues.apache.org/jira/browse/IMPALA-10221
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> We introduced several new table properties in IMPALA-10164, such as 
> 'iceberg.catalog'. To keep these properties consistent, we rename 
> 'iceberg_file_format' to 'iceberg.file_format'.



-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Created] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'

2020-10-05 Thread WangSheng (Jira)

WangSheng created IMPALA-10221:
--

 Summary: Use 'iceberg.file_format' to replace 'iceberg_file_format'
 Key: IMPALA-10221
 URL: https://issues.apache.org/jira/browse/IMPALA-10221
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


IMPALA-10164 introduced several new table properties, such as 
'iceberg.catalog'. To keep the property names consistent, we rename 
'iceberg_file_format' to 'iceberg.file_format'.
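
A minimal sketch of how a reader of table properties might cope with the rename. The helper name `get_iceberg_file_format`, the fallback to the legacy key, and the 'parquet' default are all assumptions made for illustration; none of them is taken from the Impala patch.

```python
# Hypothetical helper: prefer the new dotted key, fall back to the legacy key.
NEW_KEY = "iceberg.file_format"
OLD_KEY = "iceberg_file_format"


def get_iceberg_file_format(tbl_properties, default="parquet"):
    """Return the lower-cased file format from a table-properties dict.

    The fallback to OLD_KEY and the default value are assumptions of this
    sketch; the actual Impala behavior may differ.
    """
    for key in (NEW_KEY, OLD_KEY):
        value = tbl_properties.get(key)
        if value:
            return value.lower()
    return default
```

Checking the new key first means a table that carries both properties resolves to the value stored under 'iceberg.file_format'.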



[jira] [Work started] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-14 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10164 started by WangSheng.
--
> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently supports only the HadoopTables API for creating Iceberg 
> tables, which is not enough, so we plan to support HadoopCatalog as well. The 
> main design is to add a new table property named 'iceberg.catalog' whose 
> default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
> HadoopCatalog API. We may also support 'hive.catalog' in the future.



[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-14 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195455#comment-17195455
 ] 

WangSheng commented on IMPALA-10164:


Hi [~boroknagyz], I've done some work on this Jira, and the feature is not very 
difficult to implement. But HadoopCatalog is quite different from 
HadoopTables:
 # HadoopTables can be constructed from a Configuration alone, but HadoopCatalog 
needs an additional location parameter, such as hdfs://xxx/warehouse/, under 
which the tables are stored. With HadoopCatalog we must provide a 
TableIdentifier, which mainly contains the database and table names, and Iceberg 
then uses the location 'hdfs://xxx/warehouse/database/table' to store the table info;
 # When creating an external table, we cannot load the table directly from 
'hdfs://xxx/warehouse/database/table'; we need to use 
TableIdentifier.of(database, table) together with 'hdfs://xxx/warehouse/' instead.

So here is the problem: when creating an external table with HadoopCatalog, how 
should we define the location?
 * If we use 'hdfs://xxx/warehouse' in SQL, we can simply pass this location and 
TableIdentifier.of(database, table) to load the table, but this usage differs 
from HdfsTable, which is a little weird;
 * If we use 'hdfs://xxx/warehouse/database/table' in SQL, we need to extract 
'hdfs://xxx/warehouse', 'database', and 'table' from this location and compare 
them with the database and table from 'create external table xxx'. If they 
match, we can load the table; otherwise we would probably throw an exception.

What do you think? Here is my simple patch. It uses the first method, just to 
verify the idea; with this patch, the table is created like this:
{code:java}
create external table database.table 
stored as ICEBERG
location 'hdfs://test-warehouse'{code}
Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16446/
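
To make the second option concrete, here is a rough sketch of splitting a full table location into warehouse, database, and table, and comparing the result with the names from the create statement. The function names and the choice of ValueError are hypothetical, and the real frontend code would be Java; this only illustrates the check.

```python
from urllib.parse import urlparse


def split_catalog_location(location):
    """Split '<warehouse>/<database>/<table>' into its three parts."""
    parsed = urlparse(location)
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("Location has no database/table part: %s" % location)
    database, table = segments[-2], segments[-1]
    # Everything before the last two path segments is the warehouse root.
    warehouse_path = "/" + "/".join(segments[:-2]) if segments[:-2] else ""
    warehouse = "%s://%s%s" % (parsed.scheme, parsed.netloc, warehouse_path)
    return warehouse, database, table


def check_location_matches(location, database, table):
    """Return the warehouse root if the location agrees with database.table."""
    warehouse, loc_db, loc_table = split_catalog_location(location)
    if (loc_db, loc_table) != (database, table):
        # Raising is an assumption of this sketch ("maybe throw exception").
        raise ValueError("Location %s does not match %s.%s"
                         % (location, database, table))
    return warehouse
```

For 'hdfs://xxx/warehouse/database/table' the split yields 'hdfs://xxx/warehouse', 'database', and 'table', which is the warehouse plus TableIdentifier pair that HadoopCatalog needs.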

> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently supports only the HadoopTables API for creating Iceberg 
> tables, which is not enough, so we plan to support HadoopCatalog as well. The 
> main design is to add a new table property named 'iceberg.catalog' whose 
> default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
> HadoopCatalog API. We may also support 'hive.catalog' in the future.



[jira] [Updated] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-10164:
---
Issue Type: Improvement  (was: New Feature)

> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently supports only the HadoopTables API for creating Iceberg 
> tables, which is not enough, so we plan to support HadoopCatalog as well. The 
> main design is to add a new table property named 'iceberg.catalog' whose 
> default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
> HadoopCatalog API. We may also support 'hive.catalog' in the future.



[jira] [Updated] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-10164:
---
Parent: (was: IMPALA-9621)
Issue Type: New Feature  (was: Sub-task)

> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently supports only the HadoopTables API for creating Iceberg 
> tables, which is not enough, so we plan to support HadoopCatalog as well. The 
> main design is to add a new table property named 'iceberg.catalog' whose 
> default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
> HadoopCatalog API. We may also support 'hive.catalog' in the future.



[jira] [Created] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-10 Thread WangSheng (Jira)

WangSheng created IMPALA-10164:
--

 Summary: Support HadoopCatalog for Iceberg table
 Key: IMPALA-10164
 URL: https://issues.apache.org/jira/browse/IMPALA-10164
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


Impala currently supports only the HadoopTables API for creating Iceberg 
tables, which is not enough, so we plan to support HadoopCatalog as well. The 
main design is to add a new table property named 'iceberg.catalog' whose 
default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
HadoopCatalog API. We may also support 'hive.catalog' in the future.
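
The design can be sketched as a lookup with a default. The property name and its values come from the description; the function name and the error raised for an unknown value are assumptions of this sketch, not the actual Impala code.

```python
# Values named in the description; 'hive.catalog' is only a possible future option.
SUPPORTED_CATALOGS = ("hadoop.tables", "hadoop.catalog")
DEFAULT_CATALOG = "hadoop.tables"


def resolve_iceberg_catalog(tbl_properties):
    """Pick the catalog type from the 'iceberg.catalog' table property."""
    catalog = tbl_properties.get("iceberg.catalog", DEFAULT_CATALOG)
    if catalog not in SUPPORTED_CATALOGS:
        # Rejecting unknown values is an assumption made for illustration.
        raise ValueError("Unsupported iceberg.catalog value: %s" % catalog)
    return catalog
```

A table without the property therefore keeps the existing HadoopTables behavior, and only tables that opt in with 'hadoop.catalog' take the new path.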



[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192834#comment-17192834
 ] 

WangSheng commented on IMPALA-10159:


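Hi [~boroknagyz], I used spark-shell to generate the test files. My Spark client 
version is 2.4.5, and the ORC jars in this client are 1.5.5; even after I replaced 
these ORC jars with 1.6.3, it doesn't work. Here is the code that generates the 
test files:

{code:java}
// Imports for spark-shell (package names assumed for the Iceberg and Spark versions in use).
import java.sql.Timestamp
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.types.Types
import org.apache.spark.sql.functions.to_timestamp
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val conf = new Configuration()
val tblLoc = "/test-warehouse/iceberg_test/iceberg_partitioned_orc"
val catalog = new HadoopTables(conf)
// event_time uses Iceberg's timestamp-without-zone type converted to a Spark type.
val sparkSchema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("user", StringType, false),
  StructField("action", StringType, false),
  StructField("event_time", SparkSchemaUtil.convert(Types.TimestampType.withoutZone()), false)))
val icebergSchema = SparkSchemaUtil.convert(sparkSchema)
// Partition by hour(event_time) and by the identity of action.
val spec = PartitionSpec.builderFor(icebergSchema).hour("event_time").identity("action").build
val table = catalog.create(icebergSchema, spec, tblLoc)
val data_df = spark.createDataFrame(
  Seq((1, "Alex", "view", Timestamp.valueOf("2020-01-01 08:00:00")))
).toDF("id", "user", "action", "ts")
val array = data_df.select(data_df("id"), data_df("user"), data_df("action"),
  to_timestamp(data_df("ts"))).collect()
val df = spark.createDataFrame(sc.makeRDD(array), sparkSchema)
df.write.format("iceberg").option("write-format", "orc").mode("append").save(tblLoc)
spark.read.format("iceberg").load(tblLoc).show
{code}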
This code throws the exception "java.lang.UnsupportedOperationException: Spark 
does not support timestamp without time zone fields".
If we replace "SparkSchemaUtil.convert(Types.TimestampType.withoutZone())" with 
"TimestampType", the test files are generated normally, but querying them in 
Impala hits the problem described in IMPALA-9967.
And here is the create statement:

{code:java}
CREATE EXTERNAL TABLE default.iceberg_partitioned_orc
STORED AS ICEBERG
LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned_orc'
TBLPROPERTIES('iceberg_file_format'='orc');
{code}
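
The partition spec in the Scala snippet uses hour("event_time"). As far as I understand the Iceberg specification, the hour transform maps a timestamp to the number of whole hours since 1970-01-01 00:00:00. A minimal sketch for the sample row, with a hypothetical function name and the timestamp treated as zone-less:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def hour_transform(ts):
    """Whole hours between the Unix epoch and a naive (zone-less) timestamp."""
    return int((ts - EPOCH).total_seconds() // 3600)


print(hour_transform(datetime(2020, 1, 1, 8, 0, 0)))  # 438296
```

So the single test row above would land in one hour partition, combined with the identity partition on action = 'view'.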



> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables in the PARQUET file format. Since some 
> work was already done in IMPALA-9741, we can continue the ORC file format 
> support work in this Jira.



[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192751#comment-17192751
 ] 

WangSheng commented on IMPALA-10159:


Hi [~boroknagyz], [~tarmstrong], supporting the ORC file format for Iceberg tables 
is quite simple based on IMPALA-9741. The main work is constructing test cases, and 
there we hit the problem in IMPALA-9967. My previous test file was generated by 
Spark, and I found that Spark does not support timestamp without time zone fields. 
So I think we could generate test files without the Timestamp type and explain this 
in the code. What do you think?

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>
> Impala can now query Iceberg tables in the PARQUET file format. Since some 
> work was already done in IMPALA-9741, we can continue the ORC file format 
> support work in this Jira.



[jira] [Created] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)

WangSheng created IMPALA-10159:
--

 Summary: Support ORC file format for Iceberg table
 Key: IMPALA-10159
 URL: https://issues.apache.org/jira/browse/IMPALA-10159
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


Impala can now query Iceberg tables in the PARQUET file format. Since some work 
was already done in IMPALA-9741, we can continue the ORC file format support 
work in this Jira.



[jira] [Comment Edited] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-08-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186208#comment-17186208
 ] 

WangSheng edited comment on IMPALA-9967 at 8/28/20, 7:17 AM:
-

Hi [~boroknagyz], here are the data files:
{code:java}
create external table orc_test(
id int, user string, action string, event_time timestamp) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test';
{code}
This file contains a timestamp column. After creating an external table over 
this file, a select throws an exception.
 [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] 


{code:java}
create external table orc_test2(
id int, user string, action string) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test2';
{code}
This file does not contain a timestamp column. After creating an external table 
over this file, a select succeeds.
 [^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] 




was (Author: skyyws):
{code:java}
create external table orc_test(
id int, user string, action string, event_time timestamp) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test';
{code}
This file contains timestamp column, create external table by this file, select 
will throw exception.
 [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] 


{code:java}
create external table orc_test2(
id int, user string, action string) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test2';
{code}
This file does not contains timestamp column, and create external table by this 
file, select returns success.
 [^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] 



> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.



[jira] [Commented] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-08-27 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186208#comment-17186208
 ] 

WangSheng commented on IMPALA-9967:
---

{code:java}
create external table orc_test(
id int, user string, action string, event_time timestamp) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test';
{code}
This file contains a timestamp column. After creating an external table over 
this file, a select throws an exception.
 [^00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc] 


{code:java}
create external table orc_test2(
id int, user string, action string) 
stored as orc 
location 'hdfs://localhost:20500/orc_table_test2';
{code}
This file does not contain a timestamp column. After creating an external table 
over this file, a select succeeds.
 [^00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc] 



> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.



[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-08-27 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9967:
--
Attachment: 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc

> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.



[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-08-27 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9967:
--
Attachment: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc

> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. By the way, my test data is generated by Spark.



[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-07-17 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9967:
--
Description: 
Recently, while testing Impala queries against an ORC table, I found that 
scanning fails when the table contains a timestamp column. Here is the exception: 

{code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
@  0x1c9f753  impala::Status::Status()
@  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
@  0x27a7fb3  impala::HdfsOrcScanner::Open()
@  0x27365fe  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
@  0x28cb379  impala::HdfsScanNode::ProcessSplit()
@  0x28caa7d  impala::HdfsScanNode::ScannerThread()
@  0x28c9de5  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x28cc19e  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x205  boost::function0<>::operator()()
@  0x2675d93  impala::Thread::SuperviseThread()
@  0x267dd30  boost::_bi::list5<>::operator()<>()
@  0x267dc54  boost::_bi::bind_t<>::operator()()
@  0x267dc15  boost::detail::thread_data<>::run()
@  0x3e3c3c1  thread_proxy
@ 0x7f32360336b9  start_thread
@ 0x7f3232bfe41c  clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
 Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
{code}

When I remove the timestamp column from the table and regenerate the test 
data, the query succeeds. By the way, my test data is generated by Spark.

  was:
Recently, when I test impala query orc table, I found that scanning failed when 
table contains timestamp column, here is there exception: 

{code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
@  0x1c9f753  impala::Status::Status()
@  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
@  0x27a7fb3  impala::HdfsOrcScanner::Open()
@  0x27365fe  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
@  0x28cb379  impala::HdfsScanNode::ProcessSplit()
@  0x28caa7d  impala::HdfsScanNode::ScannerThread()
@  0x28c9de5  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x28cc19e  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x205  boost::function0<>::operator()()
@  0x2675d93  impala::Thread::SuperviseThread()
@  0x267dd30  boost::_bi::list5<>::operator()<>()
@  0x267dc54  boost::_bi::bind_t<>::operator()()
@  0x267dc15  boost::detail::thread_data<>::run()
@  0x3e3c3c1  thread_proxy
@ 0x7f32360336b9  start_thread
@ 0x7f3232bfe41c  clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
 Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
{code}

When I remove the timestamp column from the table and regenerate the test data, 
the query succeeds. By the way, my test data is generated by Spark.


> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
>
> Recently, while testing Impala queries against an ORC table, I found that the 
> scan fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717

[jira] [Updated] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-07-17 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9967:
--
Description: 
Recently, while testing Impala queries against an ORC table, I found that the 
scan fails when the table contains a timestamp column. Here is the exception: 

{code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
@  0x1c9f753  impala::Status::Status()
@  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
@  0x27a7fb3  impala::HdfsOrcScanner::Open()
@  0x27365fe  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
@  0x28cb379  impala::HdfsScanNode::ProcessSplit()
@  0x28caa7d  impala::HdfsScanNode::ScannerThread()
@  0x28c9de5  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x28cc19e  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x205  boost::function0<>::operator()()
@  0x2675d93  impala::Thread::SuperviseThread()
@  0x267dd30  boost::_bi::list5<>::operator()<>()
@  0x267dc54  boost::_bi::bind_t<>::operator()()
@  0x267dc15  boost::detail::thread_data<>::run()
@  0x3e3c3c1  thread_proxy
@ 0x7f32360336b9  start_thread
@ 0x7f3232bfe41c  clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
 Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
{code}

When I remove the timestamp column from the table and regenerate the test data, 
the query succeeds. By the way, my test data is generated by Spark.

  was:
Recently, while testing Impala queries against an ORC table, I found that the 
scan fails when the table contains a timestamp column. Here is the exception: 

{code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
@  0x1c9f753  impala::Status::Status()
@  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
@  0x27a7fb3  impala::HdfsOrcScanner::Open()
@  0x27365fe  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
@  0x28cb379  impala::HdfsScanNode::ProcessSplit()
@  0x28caa7d  impala::HdfsScanNode::ScannerThread()
@  0x28c9de5  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x28cc19e  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x205  boost::function0<>::operator()()
@  0x2675d93  impala::Thread::SuperviseThread()
@  0x267dd30  boost::_bi::list5<>::operator()<>()
@  0x267dc54  boost::_bi::bind_t<>::operator()()
@  0x267dc15  boost::detail::thread_data<>::run()
@  0x3e3c3c1  thread_proxy
@ 0x7f32360336b9  start_thread
@ 0x7f3232bfe41c  clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
 Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
{code}

When I remove the timestamp column from the table and regenerate the test data, 
the query succeeds. By the way, my test data is generated by Spark.


> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
>
> Recently, while testing Impala queries against an ORC table, I found that the 
> scan fails when the table contains a timestamp column. Here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129]

[jira] [Created] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-07-17 Thread WangSheng (Jira)

WangSheng created IMPALA-9967:
-

 Summary: Scan orc failed when table contains timestamp column
 Key: IMPALA-9967
 URL: https://issues.apache.org/jira/browse/IMPALA-9967
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.0
Reporter: WangSheng


Recently, while testing Impala queries against an ORC table, I found that the 
scan fails when the table contains a timestamp column. Here is the exception: 

{code:java}
I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
@  0x1c9f753  impala::Status::Status()
@  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
@  0x27a7fb3  impala::HdfsOrcScanner::Open()
@  0x27365fe  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
@  0x28cb379  impala::HdfsScanNode::ProcessSplit()
@  0x28caa7d  impala::HdfsScanNode::ScannerThread()
@  0x28c9de5  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x28cc19e  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x205  boost::function0<>::operator()()
@  0x2675d93  impala::Thread::SuperviseThread()
@  0x267dd30  boost::_bi::list5<>::operator()<>()
@  0x267dc54  boost::_bi::bind_t<>::operator()()
@  0x267dc15  boost::detail::thread_data<>::run()
@  0x3e3c3c1  thread_proxy
@ 0x7f32360336b9  start_thread
@ 0x7f3232bfe41c  clone
I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
hdfs://localhost:20500/test-warehouse/iceberg_test/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
 Encountered parse error in tail of ORC file 
hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
 Unknown type kind
{code}

When I remove the timestamp column from the table and regenerate the test data, 
the query succeeds. By the way, my test data is generated by Spark.
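
When triaging a report like this one, it can help to pull the file path and 
the error text out of the email-wrapped log so that the paths in the different 
log lines can be compared. The following is a minimal sketch, assuming the log 
text has been copied out of the message as shown above; the function name 
extract_orc_tail_errors is hypothetical and is not part of Impala.

```python
import re

# Matches the scanner error once the wrapped log text has been flattened to
# single spaces. The path is the non-space token before the final ": ".
_TAIL_ERROR = re.compile(
    r"Encountered parse error in tail of ORC file (\S+): (.*?)(?= @|$)")


def extract_orc_tail_errors(log_text):
    """Return (file_path, message) pairs for each ORC tail parse error.

    Assumption: the message ends either at the first stack frame marker
    (" @") or at the end of the pasted text.
    """
    # The mail client wrapped long log lines, so collapse all whitespace.
    flat = " ".join(log_text.split())
    return [(m.group(1), m.group(2).strip())
            for m in _TAIL_ERROR.finditer(flat)]
```

Collecting the paths into a set makes it easy to see whether two log lines 
refer to the same file location.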



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala

2020-07-06 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152439#comment-17152439
 ] 

WangSheng edited comment on IMPALA-9741 at 7/7/20, 2:35 AM:


Hi [~tarmstrong],[~boroknagyz],[~vihangk1], I have completed a first version 
of querying Iceberg tables from Impala. The main design treats an Iceberg table 
as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format by a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the 
data files that need to be scanned;

This is a simple version, and some of the code may not be good. I hope you can 
give some advice, thanks a lot.
Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/
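
The first function in the list above, identifying the file format from a 
table property, could look roughly like the sketch below. It is an 
illustration only, not the code in the Gerrit change: the function name 
resolve_file_format is hypothetical, the property key 'iceberg_table_format' 
is the one proposed earlier in this thread and may differ from what was 
finally merged, and the set of accepted formats is an assumption.

```python
# Assumption: only these two formats are considered, since they are the ones
# named in this thread.
SUPPORTED_FORMATS = {"parquet", "orc"}


def resolve_file_format(tbl_properties, key="iceberg_table_format"):
    """Return the normalized file format declared in the table properties.

    Raises ValueError when the property is missing or names an
    unsupported format, so the caller can fail during analysis rather
    than during the scan.
    """
    raw = tbl_properties.get(key)
    if raw is None:
        raise ValueError("table property %r is not set" % key)
    fmt = raw.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported Iceberg file format: %r" % raw)
    return fmt
```

Failing early on an unknown value keeps the decision in one place instead of 
letting each scanner discover the mismatch on its own.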


was (Author: skyyws):
Hi [~tarmstrong],[~boroknagyz],[~vihangk1], I have completed a first version 
of querying Iceberg tables from Impala. The main design treats an Iceberg table 
as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format by a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the 
data files that need to be scanned;
This is a simple version, and some of the code may not be good. I hope you can 
give some advice, thanks a lot.

Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala

2020-07-06 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152439#comment-17152439
 ] 

WangSheng edited comment on IMPALA-9741 at 7/7/20, 2:32 AM:


Hi [~tarmstrong],[~boroknagyz],[~vihangk1], I have completed a first version 
of querying Iceberg tables from Impala. The main design treats an Iceberg table 
as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format by a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the 
data files that need to be scanned;
This is a simple version, and some of the code may not be good. I hope you can 
give some advice, thanks a lot.

Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/


was (Author: skyyws):
Hi [~tarmstrong],[~boroknagyz],[~vihangk1], I have completed a first version 
of querying Iceberg tables from Impala. The main design treats an Iceberg table 
as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format by a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the 
data files that need to be scanned;
This is a simple version, and some of the code may not be good. I hope you can 
give some advice, thanks a lot.
Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala

2020-07-06 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152439#comment-17152439
 ] 

WangSheng commented on IMPALA-9741:
---

Hi [~tarmstrong],[~boroknagyz],[~vihangk1], I have completed a first version 
of querying Iceberg tables from Impala. The main design treats an Iceberg table 
as an unpartitioned HDFS table and includes these functions:
# identify the Iceberg file format by a table property;
# push down predicates on Iceberg partition columns to Iceberg, to filter the 
data files that need to be scanned;
This is a simple version, and some of the code may not be good. I hope you can 
give some advice, thanks a lot.
Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/16143/

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala

2020-05-17 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109448#comment-17109448
 ] 

WangSheng commented on IMPALA-9741:
---

[~tarmstrong] I see, and thanks for your advice. I will try to implement this 
function soon, based on IMPALA-9688, and I will post an update here when there 
is any progress.

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158
 ] 

WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:53 AM:
--

Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

I have uploaded a very simple design picture as an attachment, but some 
questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the 
backend handle these files?
# If not, we may fix the table data format when creating the table, maybe via 
tblproperties, like this: 'iceberg_table_format'='parquet'. If so, we cannot 
select from an Iceberg table whose data files have different formats.
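
The steps listed in this comment can be modelled in a few lines. This is only 
an illustrative sketch, not the Impala frontend or the Iceberg API: Conjunct, 
DataFile, to_partition_filter, plan_files and to_file_splits are hypothetical 
names, conjuncts are limited to simple comparisons on integer partition 
values, and plain dicts stand in for Thrift structs such as THdfsFileSplit.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical stand-ins for illustration; none of these names are taken
# from Impala or Iceberg.
_OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass
class Conjunct:
    column: str
    op: str      # one of the keys of _OPS
    value: int


@dataclass
class DataFile:
    path: str
    length: int
    partition: Dict[str, int]   # partition column -> partition value


def to_partition_filter(conjuncts: Iterable[Conjunct],
                        partition_columns: Set[str]
                        ) -> Callable[[DataFile], bool]:
    """Step 1: keep only the conjuncts that can be pushed down.

    Conjuncts on non-partition columns are ignored here; the scanner
    would still have to evaluate them on the rows it reads.
    """
    pushable = [c for c in conjuncts
                if c.column in partition_columns and c.op in _OPS]

    def matches(data_file: DataFile) -> bool:
        return all(_OPS[c.op](data_file.partition[c.column], c.value)
                   for c in pushable)

    return matches


def plan_files(files: Iterable[DataFile],
               matches: Callable[[DataFile], bool]) -> List[DataFile]:
    """Step 2: select only the data files whose partition values match."""
    return [f for f in files if matches(f)]


def to_file_splits(files: Iterable[DataFile]) -> List[dict]:
    """Step 3: describe each selected file as one whole-file split."""
    return [{"path": f.path, "offset": 0, "length": f.length} for f in files]
```

The backend step is not modelled: in the design above it would receive the 
split descriptions and scan them with the existing HDFS scan code.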



was (Author: skyyws):
Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

I have uploaded a very simple design picture as an attachment, but some 
questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the 
backend handle these files?
# If not, we may fix the table data format when creating the table, maybe via 
tblproperties, like this: 'iceberg_table'='parquet'


> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158
 ] 

WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:39 AM:
--

Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

I have uploaded a very simple design picture as an attachment, but some 
questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the 
backend handle these files?
# If not, we may fix the table data format when creating the table, maybe via 
tblproperties, like this: 'iceberg_table'='parquet'



was (Author: skyyws):
Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

And here is a very simple design:
 !select-iceberg.jpg! 

Some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the 
backend handle these files?
# If not, we may fix the table data format when creating the table, maybe via 
tblproperties, like this: 'iceberg_table'='parquet'


> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Comment Edited] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158
 ] 

WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:37 AM:
--

Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

And here is a very simple design:
 !select-iceberg.jpg! 

Some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the 
backend handle these files?
# If not, we may fix the table data format when creating the table, maybe via 
tblproperties, like this: 'iceberg_table'='parquet'



was (Author: skyyws):
Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

And here is a very simple design:
 !select-iceberg.png! 

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9741:
--
Attachment: select-iceberg.jpg

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9741:
--
Attachment: (was: select-iceberg.png)

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108158#comment-17108158
 ] 

WangSheng commented on IMPALA-9741:
---

Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to 
implement Iceberg queries in Impala, and here is my initial design. I will 
write a class named IcebergScanNode.java in the frontend, and this class will 
mainly provide these functions:
* Transform Impala conjuncts into Iceberg expressions, which means we can push 
down some predicates to Iceberg;
* Use these expressions to get the specific data files, which are stored in 
HDFS, from Iceberg;
* Use these data files to construct the related Thrift structs, such as 
THdfsFileSplit/TScanRangerSpec;
* The backend will then use these Thrift structs to construct a "SCAN HDFS" 
node to scan the data, and this way we can reuse the existing backend code.

And here is a very simple design:
 !select-iceberg.png! 

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.png
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala

2020-05-15 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9741:
--
Attachment: select-iceberg.png

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Attachments: select-iceberg.png
>
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Work started] (IMPALA-9741) Support query iceberg table by impala

2020-05-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9741 started by WangSheng.
-
> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> Since we have submitted a patch that supports creating Iceberg tables from 
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in 
> Impala. But we need to read the Impala and Iceberg code closely to determine 
> how to do this.




[jira] [Updated] (IMPALA-9741) Support query iceberg table by impala

2020-05-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9741:
--
Description: Since we have submitted a patch for creating Iceberg tables from 
Impala in IMPALA-9688, we are preparing to implement querying Iceberg tables 
from Impala. But we need to read the Impala and Iceberg code more deeply to 
determine how to do this.

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> Since we have submitted a patch for creating Iceberg tables from Impala in 
> IMPALA-9688, we are preparing to implement querying Iceberg tables from Impala. 
> But we need to read the Impala and Iceberg code more deeply to determine how to 
> do this.




[jira] [Created] (IMPALA-9741) Support query iceberg table by impala

2020-05-10 Thread WangSheng (Jira)

WangSheng created IMPALA-9741:
-

 Summary: Support query iceberg table by impala
 Key: IMPALA-9741
 URL: https://issues.apache.org/jira/browse/IMPALA-9741
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng







[jira] [Work started] (IMPALA-9621) Support iceberg on hdfs

2020-05-10 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9621 started by WangSheng.
-
> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-05-02 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097812#comment-17097812
 ] 

WangSheng edited comment on IMPALA-9621 at 5/2/20, 6:14 AM:


Hi [~stakiar] [~rdblue], I recently discussed this with a colleague who is 
researching Iceberg, and we found that it is not necessary to specify the file 
format when creating an Iceberg table. The main reason is that the file format 
is a file-level property rather than a table-level one, which means a table in 
Iceberg can have a different format for each data file in HDFS. In 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 the function open() constructs its iterator based on the file format, which 
means different files may have different formats. So I didn't consider the 
table format in IMPALA-9688 when creating Iceberg tables from Impala. We are 
also trying to implement querying Iceberg tables from Impala; if there is any 
progress, I will update here as well.


was (Author: skyyws):
Hi [~stakiar] [~rdblue], I recently discussed this with the colleague 
responsible for Iceberg, and we found that it is not necessary to specify the 
file format when creating an Iceberg table. The main reason is that the file 
format is a file-level property rather than a table-level one, which means a 
table in Iceberg can have a different format for each data file in HDFS. In 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 the function open() constructs its iterator based on the file format, which 
means different files may have different formats. So I didn't consider the 
table format in IMPALA-9688 when creating Iceberg tables from Impala. We are 
also trying to implement querying Iceberg tables from Impala; if there is any 
progress, I will update here as well.

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-05-01 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097812#comment-17097812
 ] 

WangSheng commented on IMPALA-9621:
---

Hi [~stakiar] [~rdblue], I recently discussed this with the colleague 
responsible for Iceberg, and we found that it is not necessary to specify the 
file format when creating an Iceberg table. The main reason is that the file 
format is a file-level property rather than a table-level one, which means a 
table in Iceberg can have a different format for each data file in HDFS. In 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 the function open() constructs its iterator based on the file format, which 
means different files may have different formats. So I didn't consider the 
table format in IMPALA-9688 when creating Iceberg tables from Impala. We are 
also trying to implement querying Iceberg tables from Impala; if there is any 
progress, I will update here as well.
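
To make the file-level point concrete, here is a minimal, self-contained Java 
sketch. It is not Iceberg or Impala code: PerFileFormatSketch, DataFileEntry, 
readerFor, planReaders, the reader names and the example paths are all 
hypothetical, and the three enum values are only illustrative. It models the 
idea that a reader is chosen per data file rather than once per table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: each data file, not the table, records its own format. */
final class PerFileFormatSketch {
    enum FileFormat { PARQUET, ORC, AVRO }

    /** Stand-in for one entry in a table's file listing (not the Iceberg DataFile class). */
    static final class DataFileEntry {
        final String path;
        final FileFormat format;

        DataFileEntry(String path, FileFormat format) {
            this.path = path;
            this.format = format;
        }
    }

    /** Chooses a reader by looking only at the format recorded for this file. */
    static String readerFor(DataFileEntry file) {
        switch (file.format) {
            case PARQUET: return "parquet-reader";
            case ORC:     return "orc-reader";
            case AVRO:    return "avro-reader";
            default: throw new UnsupportedOperationException("Unknown format: " + file.format);
        }
    }

    /** Plans one reader per data file; a single table may need several reader kinds. */
    static List<String> planReaders(List<DataFileEntry> files) {
        List<String> readers = new ArrayList<>();
        for (DataFileEntry file : files) {
            readers.add(readerFor(file));
        }
        return readers;
    }

    public static void main(String[] args) {
        // Made-up paths: two files of the same table stored in different formats.
        List<DataFileEntry> files = Arrays.asList(
            new DataFileEntry("hdfs://nn/warehouse/t/data/00000.parquet", FileFormat.PARQUET),
            new DataFileEntry("hdfs://nn/warehouse/t/data/00001.orc", FileFormat.ORC));
        System.out.println(planReaders(files));
    }
}
```

Under this model there is nothing for a table-level format setting to decide, 
which matches the reasoning above for leaving it out of the create statement.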

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9688) Support create iceberg table by impala

2020-05-01 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097795#comment-17097795
 ] 

WangSheng commented on IMPALA-9688:
---

Hi [~stakiar], thanks for your question. In my patch I have already added a new 
syntax, "partition by spec", for creating partitioned Iceberg tables, so the 
create statement above executes successfully in my test environment. I designed 
it this way mainly because Iceberg partitioning is very different from that of 
HDFS tables; it also follows the design of Kudu table creation (partition by 
hash/range):
{code:sql}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;

create table hdfs_test(
  level string,
  event_time string,
  message string)
partitioned by (
  dt string
)
stored as parquet;
{code}
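
The difference between the two statements can be sketched in plain Java. This 
is a minimal illustration, not Impala or Iceberg code: IdentityPartitionSketch 
and partitionOf are hypothetical names, and the row values are made up. It 
assumes only that an identity partition field carries the value of its source 
column over unchanged, so the partition tuple is derived from ordinary table 
columns instead of from a separate partition column such as dt.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of "partition by spec(level identity, event_time identity)". */
final class IdentityPartitionSketch {
    /** Derives a row's partition tuple: identity keeps each source column's value as-is. */
    static Map<String, String> partitionOf(Map<String, String> row, List<String> identityColumns) {
        Map<String, String> partition = new LinkedHashMap<>();
        for (String column : identityColumns) {
            if (!row.containsKey(column)) {
                throw new IllegalArgumentException("Row has no column: " + column);
            }
            partition.put(column, row.get(column));
        }
        return partition;
    }

    public static void main(String[] args) {
        // Made-up row with the three columns of iceberg_test.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("level", "ERROR");
        row.put("event_time", "2020-05-01");
        row.put("message", "disk full");
        System.out.println(partitionOf(row, Arrays.asList("level", "event_time")));
    }
}
```

In hdfs_test, by contrast, dt is not one of the declared data columns, so no 
such derivation from the row is possible.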


> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> This sub-task mainly realizes the creation of iceberg table through impala




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972
 ] 

WangSheng edited comment on IMPALA-9621 at 4/29/20, 1:37 AM:
-

Hi [~stakiar], thanks for your reply. Hive support for Iceberg is in progress 
now; you can find the related code in 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 and here is another related project on GitHub: 
[hiveberg|https://github.com/ExpediaGroup/hiveberg].


was (Author: skyyws):
Hi [~stakiar], thanks for your replay. Hive for iceberg is on progess now, you 
can find the related code: 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java,
 and here is another related project on github: 
[hiveberg|https://github.com/ExpediaGroup/hiveberg].

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972
 ] 

WangSheng edited comment on IMPALA-9621 at 4/29/20, 1:37 AM:
-

Hi [~stakiar], thanks for your reply. Hive support for Iceberg is in progress 
now; you can find the related code in 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 and here is another related project on GitHub: 
[hiveberg|https://github.com/ExpediaGroup/hiveberg].


was (Author: skyyws):
Hi [~stakiar], thanks for your reply. Hive for iceberg is on progess now, you 
can find the related code: 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java,
 and here is another related project on github: 
[hiveberg|https://github.com/ExpediaGroup/hiveberg].

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094972#comment-17094972
 ] 

WangSheng commented on IMPALA-9621:
---

Hi [~stakiar], thanks for your reply. Hive support for Iceberg is in progress 
now; you can find the related code in 
[IcebergInputFormat.java|https://github.com/apache/incubator-iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java],
 and here is another related project on GitHub: 
[hiveberg|https://github.com/ExpediaGroup/hiveberg].

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9688) Support create iceberg table by impala

2020-04-23 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091126#comment-17091126
 ] 

WangSheng edited comment on IMPALA-9688 at 4/24/20, 3:20 AM:
-

Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version of 
creating Iceberg tables from Impala. We can use the following SQL to create an 
Iceberg table:
{code:sql}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;
{code}
This statement would be translated into an Iceberg table schema like this:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.StringType.get()),
    Types.NestedField.required(3, "message", Types.StringType.get()));
PartitionSpec spec =
    PartitionSpec.builderFor(schema).identity("event_time").identity("level").build();
// create() is an instance method; location is the table path supplied by the caller.
new HadoopTables(new Configuration()).create(schema, spec, location);
{code}
We can also use show create table xxx and show partitions xxx for Iceberg 
tables. I followed the Kudu table implementation by defining a new 
IcebergTable and related classes. I know there are many places that need to be 
revised and improved. The point is that I'm not sure whether this solution is 
feasible, so I hope you can give me some suggestions, thanks a lot!
And here is the Gerrit URL: https://gerrit.cloudera.org/#/c/15797



was (Author: skyyws):
Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version of 
creating Iceberg tables from Impala. We can use the following SQL to create an 
Iceberg table:
{code:sql}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;
{code}
This statement would be translated into an Iceberg table schema like this:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.StringType.get()),
    Types.NestedField.required(3, "message", Types.StringType.get()));
PartitionSpec spec =
    PartitionSpec.builderFor(schema).identity("event_time").identity("level").build();
// create() is an instance method; location is the table path supplied by the caller.
new HadoopTables(new Configuration()).create(schema, spec, location);
{code}
We can also use show create table xxx and show partitions xxx for Iceberg. I 
followed the Kudu table implementation by defining a new IcebergTable and 
related classes. I know there are many places that need to be revised and 
improved. The point is that I'm not sure whether this solution is feasible, so 
I hope you can give me some suggestions, thanks a lot!
And here is the Gerrit URL: https://gerrit.cloudera.org/#/c/15797


> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> This sub-task mainly realizes the creation of iceberg table through impala




[jira] [Commented] (IMPALA-9688) Support create iceberg table by impala

2020-04-23 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091126#comment-17091126
 ] 

WangSheng commented on IMPALA-9688:
---

Hi [~tarmstrong], [~stigahuang], I've already implemented a simple version of 
creating Iceberg tables from Impala. We can use the following SQL to create an 
Iceberg table:
{code:sql}
create table iceberg_test(
  level string,
  event_time string,
  message string)
partition by spec(
  level identity,
  event_time identity
)
stored as iceberg;
{code}
This statement would be translated into an Iceberg table schema like this:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.StringType.get()),
    Types.NestedField.required(3, "message", Types.StringType.get()));
PartitionSpec spec =
    PartitionSpec.builderFor(schema).identity("event_time").identity("level").build();
// create() is an instance method; location is the table path supplied by the caller.
new HadoopTables(new Configuration()).create(schema, spec, location);
{code}
We can also use show create table xxx and show partitions xxx for Iceberg. I 
followed the Kudu table implementation by defining a new IcebergTable and 
related classes. I know there are many places that need to be revised and 
improved. The point is that I'm not sure whether this solution is feasible, so 
I hope you can give me some suggestions, thanks a lot!
And here is the Gerrit URL: https://gerrit.cloudera.org/#/c/15797


> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> This sub-task mainly realizes the creation of iceberg table through impala




[jira] [Updated] (IMPALA-9688) Support create iceberg table by impala

2020-04-23 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9688:
--
Description: This sub-task mainly realizes the creation of iceberg table 
through impala  (was: This sub-task mainly implement iceberg table create by 
impala)

> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> This sub-task mainly realizes the creation of iceberg table through impala




[jira] [Work started] (IMPALA-9688) Support create iceberg table by impala

2020-04-23 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9688 started by WangSheng.
-
> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> This sub-task mainly implement iceberg table create by impala




[jira] [Created] (IMPALA-9688) Support create iceberg table by impala

2020-04-23 Thread WangSheng (Jira)

WangSheng created IMPALA-9688:
-

 Summary: Support create iceberg table by impala
 Key: IMPALA-9688
 URL: https://issues.apache.org/jira/browse/IMPALA-9688
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


This sub-task mainly implement iceberg table create by impala




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-13 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082186#comment-17082186
 ] 

WangSheng edited comment on IMPALA-9621 at 4/13/20, 11:24 AM:
--

[~tarmstrong] Thanks for your suggestion, Tim. We will keep studying this to 
see if we can find a suitable implementation. If there is any progress, I will 
update here.


was (Author: skyyws):
[~tarmstrong] Thanks for your suggestion, Tim. I found that Iceberg is not very 
similar to a Hive table. Each format has its own input and output format 
classes, such as MapredParquetInputFormat/MapredParquetOutputFormat, 
KuduInputFormat/KuduOutputFormat and so on, but Iceberg does not. We cannot 
just add ICEBERG as a new file format in HdfsFileFormat, whether we implement 
it as a new table type or as a special HdfsTable. We will keep studying this to 
see if we can find a suitable implementation. If there is any progress, I will 
update here.

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-13 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082186#comment-17082186
 ] 

WangSheng commented on IMPALA-9621:
---

[~tarmstrong] Thanks for your suggestion, Tim. I found that Iceberg is not very 
similar to a Hive table. Each format has its own input and output format 
classes, such as MapredParquetInputFormat/MapredParquetOutputFormat, 
KuduInputFormat/KuduOutputFormat and so on, but Iceberg does not. We cannot 
just add ICEBERG as a new file format in HdfsFileFormat, whether we implement 
it as a new table type or as a special HdfsTable. We will keep studying this to 
see if we can find a suitable implementation. If there is any progress, I will 
update here.

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080176#comment-17080176
 ] 

WangSheng edited comment on IMPALA-9621 at 4/10/20, 2:38 AM:
-

[~tarmstrong] Hi Tim, thanks again for your reply. Do you mean sharing the code 
of HdfsScanNode and treating Iceberg as another HdfsTable? Or just implementing 
an IcebergTable on top of HdfsScanNode? 
At first we planned to implement this by treating Iceberg as ICEBERG_PARQUET, 
just like HUDI_PARQUET. But after reading the Iceberg source code, we found 
that its metadata structure is different from Impala's: Iceberg manages its 
metadata itself by referring to an HDFS location. Even if we can use the 
HiveCatalog API, we cannot read Iceberg data on HDFS directly, because it does 
not follow the normal HDFS table structure: 
hdfs://xxx/db/table/partition=xxx/xxx. 
As you mentioned above, a lot of it might be very different, so I will study 
the Iceberg code more deeply to see if I can find a better way. I look forward 
to more of your advice, thanks!


was (Author: skyyws):
[~tarmstrong] Hi Tim, thanks again for your reply. Do you mean sharing the code 
of HdfsScanNode and treating Iceberg as another HdfsTable? Or just implementing 
an IcebergTable on top of HdfsScanNode? At first we planned to implement this 
by treating Iceberg as ICEBERG_PARQUET, just like HUDI_PARQUET. But after 
reading the Iceberg source code, we found that its metadata structure is 
different from Impala's: Iceberg manages its metadata itself by referring to an 
HDFS location. Even if we can use the HiveCatalog API, we cannot read Iceberg 
data on HDFS directly, because it does not follow the normal HDFS table 
structure: hdfs://xxx/db/table/partition=xxx/xxx. As you mentioned above, a lot 
of it might be very different, so I will study the Iceberg code more deeply to 
see if I can find a better way. I look forward to more of your advice, thanks!

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080176#comment-17080176
 ] 

WangSheng commented on IMPALA-9621:
---

[~tarmstrong] Hi Tim, thanks again for your reply. Do you mean sharing the code 
of HdfsScanNode and treating Iceberg as another HdfsTable? Or just implementing 
an IcebergTable on top of HdfsScanNode? At first we planned to implement this 
by treating Iceberg as ICEBERG_PARQUET, just like HUDI_PARQUET. But after 
reading the Iceberg source code, we found that its metadata structure is 
different from Impala's: Iceberg manages its metadata itself by referring to an 
HDFS location. Even if we can use the HiveCatalog API, we cannot read Iceberg 
data on HDFS directly, because it does not follow the normal HDFS table 
structure: hdfs://xxx/db/table/partition=xxx/xxx. As you mentioned above, a lot 
of it might be very different, so I will study the Iceberg code more deeply to 
see if I can find a better way. I look forward to more of your advice, thanks!
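
The contrast with a normal HDFS table can be sketched as follows. This is a 
minimal, self-contained Java example with hypothetical names 
(HivePathPartitionSketch, partitionFromPath) and made-up paths, not Impala 
code. It only shows that the partition values of a Hive-style table can be 
recovered from key=value directory names, which is the assumption that does 
not carry over to a table that tracks its files in its own metadata.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads partition values out of a Hive-style directory layout. */
final class HivePathPartitionSketch {
    /** Extracts key=value directory segments from a path such as db/table/dt=x/file. */
    static Map<String, String> partitionFromPath(String path) {
        Map<String, String> partition = new LinkedHashMap<>();
        String[] segments = path.split("/");
        // The last segment is the file name, so only directory segments are inspected.
        for (int i = 0; i < segments.length - 1; i++) {
            int eq = segments[i].indexOf('=');
            if (eq > 0) {
                partition.put(segments[i].substring(0, eq), segments[i].substring(eq + 1));
            }
        }
        return partition;
    }

    public static void main(String[] args) {
        // Hive-style layout: the directory name carries the partition value.
        System.out.println(partitionFromPath("hdfs://nn/db/table/dt=2020-04-09/part-0.parq"));
        // Made-up layout without key=value directories: nothing can be inferred.
        System.out.println(partitionFromPath("hdfs://nn/db/table/data/00000-0.parquet"));
    }
}
```

For the second, made-up path the method returns an empty map, so a reader that 
relies on directory names alone learns nothing about the partition and would 
need the table's own metadata instead.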

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079038#comment-17079038
 ] 

WangSheng commented on IMPALA-9621:
---

[~tarmstrong] Hi Tim, here is the quick start for the Iceberg API: 
[create-table|https://iceberg.apache.org/api-quickstart/#create-a-table]. I've 
also read the Iceberg source code: when HiveCatalog is used to create a table, 
Iceberg calls HiveMetaStoreClient to create the table in HMS. You can find the 
code in 
[HiveTableOperations.doCommit()|https://github.com/apache/incubator-iceberg/blob/master/hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java].
 I will test this in my local environment later, and will also try to stay 
consistent if possible.

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-09 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079031#comment-17079031
 ] 

WangSheng commented on IMPALA-9621:
---

[~stakiar] Thanks for your suggestion, Sahil. I read the code in IMPALA-8778 
several days ago. That patch lets Impala read Hudi optimized tables by treating 
HUDI_PARQUET as a special kind of Parquet. When handling HUDI_PARQUET, Impala 
just filters and then treats it as normal Parquet. My opinion is to treat 
Iceberg as a new data source, like Kudu or HBase, so that we could 
create/drop/alter/select Iceberg tables from Impala.

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> querying Iceberg data from Impala. Our production environment uses HDFS, so we 
> will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-07 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1705#comment-1705
 ] 

WangSheng edited comment on IMPALA-9621 at 4/8/20, 3:17 AM:


Here are some of my thoughts:
* Following the implementation of the Kudu-related operations, we need to 
implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat and ICEBERG_TABLE as a new TTableType;
* Iceberg supports two kinds of API for creating tables: HiveCatalog and 
HadoopTables. If we use HiveCatalog, we can call the Iceberg API directly to 
create the table, and Impala doesn't need to create the HMS table separately. 
If we use HadoopTables, Impala should first create the HMS table and then call 
HadoopTables to create the Iceberg table, just like creating a Kudu table;
* Iceberg currently supports only the Parquet file format but may support ORC 
in the future, so I'm not sure whether it is necessary to implement an 
IcebergFileFormat like HdfsFileFormat, or just to use 'STORED AS ICEBERG' at 
the beginning. The second approach is significantly simpler to implement.

I would split this task into several sub-tasks:
* Implement metadata-related changes, such as create/drop/alter and so on;
* Support querying and inserting into Iceberg tables;
* Do some query optimization to improve query performance;
* Other related work.

These are just some simple ideas of mine; more details still need thought. I 
hope you guys can give me some advice, [~stigahuang] [~tarmstrong]. Others are 
very welcome to give more suggestions. Thanks a lot!
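
To make the difference between the two creation paths concrete, here is a minimal sketch in Python. It is not Impala or Iceberg code: `CatalogKind`, `plan_create_table`, and the step descriptions are hypothetical names chosen only to show which component creates the HMS table in each case.

```python
from enum import Enum


class CatalogKind(Enum):
    """Hypothetical labels for the two Iceberg table-creation APIs."""
    HIVE_CATALOG = "HiveCatalog"
    HADOOP_TABLES = "HadoopTables"


def plan_create_table(kind):
    """Return the ordered (actor, action) steps for creating an Iceberg table.

    The step descriptions are illustrative placeholders, not real Impala
    or Iceberg function names.
    """
    if kind is CatalogKind.HIVE_CATALOG:
        # With HiveCatalog the Iceberg API call also takes care of the HMS
        # table, so Impala does not create one separately.
        return [("iceberg", "create table via HiveCatalog")]
    if kind is CatalogKind.HADOOP_TABLES:
        # With HadoopTables, Impala creates the HMS table first and then
        # asks Iceberg to create the table, as it does for Kudu tables.
        return [
            ("impala", "create HMS table"),
            ("iceberg", "create table via HadoopTables"),
        ]
    raise ValueError("unknown catalog kind: %r" % (kind,))


if __name__ == "__main__":
    for kind in CatalogKind:
        print(kind.value, plan_create_table(kind))
```

The point of the sketch is only the ordering: one step for HiveCatalog, two steps (HMS first) for HadoopTables.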


was (Author: skyyws):
Here are some of my thoughts:
* Refer to the implementation of kudu related operation, we need to implement 
IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat, ICEBERG_TABLE as a new TTableType;
* Iceberg support two kinds api to create table: HiveCatalog and HadoopTables. 
If we use HiveCatalog, we can directly call the API of iceberg to create the 
table, impala does't need to create hms table independently. If we use 
HadoopTables, impala should create hms table firstly, and then call 
HadoopTables to create iceberg table, just like create kudu table;
* Iceberg now only support parquet file format, but may support orc in the 
future, so I'm not sure it is necessary to implement IcebergFileFormat like 
HdfsFileFormat, or just use 'STORED AS ICEBERG' at the beginning. And the 
second method is significantly simpler to implement.

And I try to split this task as some sub-task:
* Implement metadata related modification, such as create/drop/alter and so on;
* Support query/insert iceberg table;
* Do some query optimization to improve query performance；
* Other related work；
These are some of my simple ideas, more details still need to think. Hope you 
guys can give me some advice [~stigahuang][~tarmstrong]. Others are very 
welcome to give me more suggestions, thanks a lot!

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> selecting Iceberg data from Impala. Our production environment uses HDFS, so 
> we will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-07 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1705#comment-1705
 ] 

WangSheng edited comment on IMPALA-9621 at 4/8/20, 3:17 AM:


Here are some of my thoughts:
* Following the implementation of the Kudu-related operations, we need to 
implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat and ICEBERG_TABLE as a new TTableType;
* Iceberg supports two kinds of API for creating tables: HiveCatalog and 
HadoopTables. If we use HiveCatalog, we can call the Iceberg API directly to 
create the table, and Impala doesn't need to create the HMS table separately. 
If we use HadoopTables, Impala should first create the HMS table and then call 
HadoopTables to create the Iceberg table, just like creating a Kudu table;
* Iceberg currently supports only the Parquet file format but may support ORC 
in the future, so I'm not sure whether it is necessary to implement an 
IcebergFileFormat like HdfsFileFormat, or just to use 'STORED AS ICEBERG' at 
the beginning. The second approach is significantly simpler to implement.

I would split this task into several sub-tasks:
* Implement metadata-related changes, such as create/drop/alter and so on;
* Support querying and inserting into Iceberg tables;
* Do some query optimization to improve query performance;
* Other related work.
These are just some simple ideas of mine; more details still need thought. I 
hope you guys can give me some advice, [~stigahuang] [~tarmstrong]. Others are 
very welcome to give more suggestions. Thanks a lot!


was (Author: skyyws):
Here are some of my thoughts:
* Refer to the implementation of kudu related operation, we need to implement 
IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat, ICEBERG_TABLE as a new TTableType;
* Iceberg support two kinds api to create table: HiveCatalog and HadoopTables. 
If we use HiveCatalog, we can directly call the API of iceberg to create the 
table, impala does't need to create hms table independently. If we use 
HadoopTables, impala should create hms table firstly, and then call 
HadoopTables to create iceberg table, just like create kudu table;
* Iceberg now only support parquet file format, but may support orc in the 
future, so I'm not sure it is necessary to implement IcebergFileFormat like 
HdfsFileFormat, or just use 'STORED AS ICEBERG' at the beginning. And the 
second method is significantly simpler to implement.
And I try to split this task as some sub-task:
* Implement metadata related modification, such as create/drop/alter and so on;
* Support query/insert iceberg table;
* Do some query optimization to improve query performance；
* Other related work；
These are some of my simple ideas, more details still need to think. Hope you 
guys can give me some advice [~stigahuang][~tarmstrong]. Others are very 
welcome to give me more suggestions, thanks a lot!

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> selecting Iceberg data from Impala. Our production environment uses HDFS, so 
> we will try to support Iceberg on HDFS.




[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

2020-04-07 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1705#comment-1705
 ] 

WangSheng edited comment on IMPALA-9621 at 4/8/20, 3:17 AM:


Here are some of my thoughts:
* Following the implementation of the Kudu-related operations, we need to 
implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat and ICEBERG_TABLE as a new TTableType;
* Iceberg supports two kinds of API for creating tables: HiveCatalog and 
HadoopTables. If we use HiveCatalog, we can call the Iceberg API directly to 
create the table, and Impala doesn't need to create the HMS table separately. 
If we use HadoopTables, Impala should first create the HMS table and then call 
HadoopTables to create the Iceberg table, just like creating a Kudu table;
* Iceberg currently supports only the Parquet file format but may support ORC 
in the future, so I'm not sure whether it is necessary to implement an 
IcebergFileFormat like HdfsFileFormat, or just to use 'STORED AS ICEBERG' at 
the beginning. The second approach is significantly simpler to implement.

I would split this task into several sub-tasks:
* Implement metadata-related changes, such as create/drop/alter and so on;
* Support querying and inserting into Iceberg tables;
* Do some query optimization to improve query performance;
* Other related work.

These are just some simple ideas of mine; more details still need thought. I 
hope you guys can give me some advice, [~stigahuang], [~tarmstrong]. Others are 
very welcome to give more suggestions. Thanks a lot!


was (Author: skyyws):
Here are some of my thoughts:
* Refer to the implementation of kudu related operation, we need to implement 
IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat, ICEBERG_TABLE as a new TTableType;
* Iceberg support two kinds api to create table: HiveCatalog and HadoopTables. 
If we use HiveCatalog, we can directly call the API of iceberg to create the 
table, impala does't need to create hms table independently. If we use 
HadoopTables, impala should create hms table firstly, and then call 
HadoopTables to create iceberg table, just like create kudu table;
* Iceberg now only support parquet file format, but may support orc in the 
future, so I'm not sure it is necessary to implement IcebergFileFormat like 
HdfsFileFormat, or just use 'STORED AS ICEBERG' at the beginning. And the 
second method is significantly simpler to implement.

And I try to split this task as some sub-task:
* Implement metadata related modification, such as create/drop/alter and so on;
* Support query/insert iceberg table;
* Do some query optimization to improve query performance；
* Other related work；

These are some of my simple ideas, more details still need to think. Hope you 
guys can give me some advice [~stigahuang][~tarmstrong]. Others are very 
welcome to give me more suggestions, thanks a lot!

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> selecting Iceberg data from Impala. Our production environment uses HDFS, so 
> we will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9621) Support iceberg on hdfs

2020-04-07 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1705#comment-1705
 ] 

WangSheng commented on IMPALA-9621:
---

Here are some of my thoughts:
* Following the implementation of the Kudu-related operations, we need to 
implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, 
iceberg-scan-node.cc and so on;
* Use _ICEBERG_ as a new THdfsFileFormat and ICEBERG_TABLE as a new TTableType;
* Iceberg supports two kinds of API for creating tables: HiveCatalog and 
HadoopTables. If we use HiveCatalog, we can call the Iceberg API directly to 
create the table, and Impala doesn't need to create the HMS table separately. 
If we use HadoopTables, Impala should first create the HMS table and then call 
HadoopTables to create the Iceberg table, just like creating a Kudu table;
* Iceberg currently supports only the Parquet file format but may support ORC 
in the future, so I'm not sure whether it is necessary to implement an 
IcebergFileFormat like HdfsFileFormat, or just to use 'STORED AS ICEBERG' at 
the beginning. The second approach is significantly simpler to implement.
I would split this task into several sub-tasks:
* Implement metadata-related changes, such as create/drop/alter and so on;
* Support querying and inserting into Iceberg tables;
* Do some query optimization to improve query performance;
* Other related work.
These are just some simple ideas of mine; more details still need thought. I 
hope you guys can give me some advice, [~stigahuang] [~tarmstrong]. Others are 
very welcome to give more suggestions. Thanks a lot!

> Support iceberg on hdfs
> ---
>
> Key: IMPALA-9621
> URL: https://issues.apache.org/jira/browse/IMPALA-9621
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> We have been investigating Iceberg recently and are preparing to implement 
> selecting Iceberg data from Impala. Our production environment uses HDFS, so 
> we will try to support Iceberg on HDFS.




[jira] [Created] (IMPALA-9621) Support iceberg on hdfs

2020-04-07 Thread WangSheng (Jira)

WangSheng created IMPALA-9621:
-

 Summary: Support iceberg on hdfs
 Key: IMPALA-9621
 URL: https://issues.apache.org/jira/browse/IMPALA-9621
 Project: IMPALA
  Issue Type: Improvement
Reporter: WangSheng
Assignee: WangSheng


We have been investigating Iceberg recently and are preparing to implement 
selecting Iceberg data from Impala. Our production environment uses HDFS, so 
we will try to support Iceberg on HDFS.




[jira] [Commented] (IMPALA-9264) Support catalogd without HMS

2020-02-02 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028643#comment-17028643
 ] 

WangSheng commented on IMPALA-9264:
---

[~vihangk1] Sorry for the late reply, Vihang; I was away for the Spring 
Festival vacation. "catalogd connected to mysql/pg directly" is the 
"Local/Embedded Metastore Server" mode (from the document you mentioned above) 
running inside catalogd, which is used in some situations in our production 
environment.

> Support catalogd without HMS
> 
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through 
> the metastore service) is a very common usage. We just need to configure 
> hive-site.xml like this (metastore-related config such as hive.metastore.uris 
> is unnecessary):
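> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}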
> Recently, when I tested impala-3.3 in this situation, I found that creating 
> a Kudu managed table failed 
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
> already fixed this.
> I guess there may be other functionality that has not been taken into 
> consideration in this situation. So I created this Jira to collect such 
> functionality, and I'm willing to continue contributing when I'm free.
>  
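
One way to check whether a given hive-site.xml is set up for this direct-connection mode is to look for the JDO connection URL and the absence of hive.metastore.uris. The Python sketch below assumes the usual Hadoop `<configuration>/<property>/<name>/<value>` layout; the function names and the `SAMPLE` document are hypothetical, and this is only an illustration, not how catalogd itself decides.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def uses_direct_db_connection(props):
    """True if a JDO connection URL is set and no remote metastore URI is."""
    has_jdbc = bool(props.get("javax.jdo.option.ConnectionURL"))
    has_remote = bool(props.get("hive.metastore.uris"))
    return has_jdbc and not has_remote


# Hypothetical sample using two of the properties from the description.
SAMPLE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(uses_direct_db_connection(load_properties(SAMPLE)))
```

Adding a hive.metastore.uris entry to the parsed properties makes the check return False, which corresponds to the remote metastore setup.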




[jira] [Commented] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment

2020-01-19 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019174#comment-17019174
 ] 

WangSheng commented on IMPALA-9287:
---

Here is the Gerrit URL: https://gerrit.cloudera.org/#/c/15057/

> test_kudu_table_create_without_hms fails on Hive-3 environment
> --
>
> Key: IMPALA-9287
> URL: https://issues.apache.org/jira/browse/IMPALA-9287
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Vihang Karajgaonkar
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> {{test_kudu_table_create_without_hms}} which was added recently in 
> IMPALA-9266 fails when Hive-3 is used. To reproduce the issue build Impala 
> after setting {{USE_CDP_HIVE=true}} and then run the test.




[jira] [Assigned] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment

2020-01-15 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng reassigned IMPALA-9287:
-

Assignee: WangSheng

> test_kudu_table_create_without_hms fails on Hive-3 environment
> --
>
> Key: IMPALA-9287
> URL: https://issues.apache.org/jira/browse/IMPALA-9287
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Vihang Karajgaonkar
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> {{test_kudu_table_create_without_hms}} which was added recently in 
> IMPALA-9266 fails when Hive-3 is used. To reproduce the issue build Impala 
> after setting {{USE_CDP_HIVE=true}} and then run the test.




[jira] [Commented] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment

2020-01-15 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016456#comment-17016456
 ] 

WangSheng commented on IMPALA-9287:
---

[~vihangk1] Sorry for the late reply. I will try to solve this problem as soon 
as possible.

> test_kudu_table_create_without_hms fails on Hive-3 environment
> --
>
> Key: IMPALA-9287
> URL: https://issues.apache.org/jira/browse/IMPALA-9287
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Vihang Karajgaonkar
>Priority: Blocker
>  Labels: broken-build
>
> {{test_kudu_table_create_without_hms}} which was added recently in 
> IMPALA-9266 fails when Hive-3 is used. To reproduce the issue build Impala 
> after setting {{USE_CDP_HIVE=true}} and then run the test.




[jira] [Commented] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log

2019-12-30 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005289#comment-17005289
 ] 

WangSheng commented on IMPALA-9266:
---

I wrote a new custom cluster test case to replace the original FE test case. 
Here is the URL: https://gerrit.cloudera.org/#/c/14962/

> TestLogFragments.test_log_fragments fails due to missing log
> 
>
> Key: IMPALA-9266
> URL: https://issues.apache.org/jira/browse/IMPALA-9266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> TestLogFragments.test_log_fragments is failing due to missing a log entry:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/observability/test_log_fragments.py:46:
>  in test_log_fragments
> "] Analysis and authorization finished.")
> common/impala_test_suite.py:1149: in assert_impalad_log_contains
> self.assert_log_contains("impalad", level, line_regex, expected_count)
> common/impala_test_suite.py:1185: in assert_log_contains
> (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file 
> /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/ee_tests/impalad.impala-ec2-centos74-m5-4xlarge-ondemand-088c.vpc.cloudera.com.jenkins.log.INFO.20191227-001949.23945
>  matching regex 'ce41d657e70d6890:6f0f227d] Analysis and 
> authorization finished.', but found 0 lines. Last line was: 
> E   Caught signal: SIGTERM. Daemon will exit.{noformat}
> This started happening after the "IMPALA-8974: Fixed a bug when create kudu 
> managed table without HMS" commit went in. That commit adds a test that 
> restarts Impala in a frontend test. The problem is that it runs 
> start-impala-cluster.py without arguments, whereas bin/run-all-tests.sh runs 
> start-impala-cluster.py specifying the --log_dir. This would put the log 
> files in a different location (/tmp?).
> [https://github.com/apache/impala/blob/320f05852060c1027326ac20be7df340a7a5263f/fe/src/test/java/org/apache/impala/catalog/CreateKuduTableWithoutHMSTest.java#L98]
> [https://github.com/apache/impala/blob/master/bin/run-all-tests.sh#L165-L167]
> In one run that hit this issue, there are two sets of impalad logs in the 
> ee_test directory. One set starts at 06:40:22 and ends at 07:11:28. The 
> second set starts at 09:45:25 and ends at 09:47:30. So, this is missing 2.5 
> hours of ee_test log files, which matches the theory.
> This is also likely to impact other things like erasure coding or tests that 
> run against the data cache.
> GVO doesn't hit this because the job that runs frontend tests does not run 
> end to end tests.
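
The description above suggests that any test restarting the cluster should pass the same log directory that bin/run-all-tests.sh uses. A minimal Python sketch of that idea follows. Only `start-impala-cluster.py` and `--log_dir` come from the description; the helper name `build_restart_command` is hypothetical, and reading the directory from an `IMPALA_EE_TEST_LOGS_DIR` environment variable is an assumption made for illustration.

```python
import os


def build_restart_command(env=None):
    """Build argv for restarting the cluster while keeping the log location.

    Assumption: the desired log directory is exported in the
    IMPALA_EE_TEST_LOGS_DIR environment variable. If it is not set, the
    command is returned without --log_dir, which is the situation that
    sent the logs to a different location in this issue.
    """
    env = os.environ if env is None else env
    cmd = ["start-impala-cluster.py"]
    log_dir = env.get("IMPALA_EE_TEST_LOGS_DIR")
    if log_dir:
        # Forward the log directory so later tests find the logs they expect.
        cmd.append("--log_dir=" + log_dir)
    return cmd


if __name__ == "__main__":
    print(build_restart_command({"IMPALA_EE_TEST_LOGS_DIR": "/tmp/ee_tests"}))
```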




[jira] [Work started] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log

2019-12-29 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9266 started by WangSheng.
-
> TestLogFragments.test_log_fragments fails due to missing log
> 
>
> Key: IMPALA-9266
> URL: https://issues.apache.org/jira/browse/IMPALA-9266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> TestLogFragments.test_log_fragments is failing due to missing a log entry:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/observability/test_log_fragments.py:46:
>  in test_log_fragments
> "] Analysis and authorization finished.")
> common/impala_test_suite.py:1149: in assert_impalad_log_contains
> self.assert_log_contains("impalad", level, line_regex, expected_count)
> common/impala_test_suite.py:1185: in assert_log_contains
> (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file 
> /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/ee_tests/impalad.impala-ec2-centos74-m5-4xlarge-ondemand-088c.vpc.cloudera.com.jenkins.log.INFO.20191227-001949.23945
>  matching regex 'ce41d657e70d6890:6f0f227d] Analysis and 
> authorization finished.', but found 0 lines. Last line was: 
> E   Caught signal: SIGTERM. Daemon will exit.{noformat}
> This started happening after the "IMPALA-8974: Fixed a bug when create kudu 
> managed table without HMS" commit went in. That commit adds a test that 
> restarts Impala in a frontend test. The problem is that it runs 
> start-impala-cluster.py without arguments, whereas bin/run-all-tests.sh runs 
> start-impala-cluster.py specifying the --log_dir. This would put the log 
> files in a different location (/tmp?).
> [https://github.com/apache/impala/blob/320f05852060c1027326ac20be7df340a7a5263f/fe/src/test/java/org/apache/impala/catalog/CreateKuduTableWithoutHMSTest.java#L98]
> [https://github.com/apache/impala/blob/master/bin/run-all-tests.sh#L165-L167]
> In one run that hit this issue, there are two sets of impalad logs in the 
> ee_test directory. One set starts at 06:40:22 and ends at 07:11:28. The 
> second set starts at 09:45:25 and ends at 09:47:30. So, this is missing 2.5 
> hours of ee_test log files, which matches the theory.
> This is also likely to impact other things like erasure coding or tests that 
> run against the data cache.
> GVO doesn't hit this because the job that runs frontend tests does not run 
> end to end tests.
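
The "2.5 hours" figure can be checked from the timestamps quoted above. The short Python sketch below computes the gap between the end of the first log set and the start of the second, assuming both fall on the same day; `gap_hours` is a hypothetical helper name.

```python
from datetime import datetime


def gap_hours(end_of_first, start_of_second, fmt="%H:%M:%S"):
    """Hours between two same-day wall-clock timestamps."""
    start = datetime.strptime(start_of_second, fmt)
    end = datetime.strptime(end_of_first, fmt)
    return (start - end).total_seconds() / 3600.0


if __name__ == "__main__":
    # First log set ends at 07:11:28; second set starts at 09:45:25.
    print(round(gap_hours("07:11:28", "09:45:25"), 2))
```

The gap is 2 h 33 min 57 s, or roughly 2.57 hours, which is consistent with the description.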




[jira] [Commented] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log

2019-12-29 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005083#comment-17005083
 ] 

WangSheng commented on IMPALA-9266:
---

Here is the commit for this jira: https://gerrit.cloudera.org/#/c/14957/

> TestLogFragments.test_log_fragments fails due to missing log
> 
>
> Key: IMPALA-9266
> URL: https://issues.apache.org/jira/browse/IMPALA-9266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> TestLogFragments.test_log_fragments is failing due to missing a log entry:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/observability/test_log_fragments.py:46:
>  in test_log_fragments
> "] Analysis and authorization finished.")
> common/impala_test_suite.py:1149: in assert_impalad_log_contains
> self.assert_log_contains("impalad", level, line_regex, expected_count)
> common/impala_test_suite.py:1185: in assert_log_contains
> (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file 
> /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/ee_tests/impalad.impala-ec2-centos74-m5-4xlarge-ondemand-088c.vpc.cloudera.com.jenkins.log.INFO.20191227-001949.23945
>  matching regex 'ce41d657e70d6890:6f0f227d] Analysis and 
> authorization finished.', but found 0 lines. Last line was: 
> E   Caught signal: SIGTERM. Daemon will exit.{noformat}
> This started happening after the "IMPALA-8974: Fixed a bug when create kudu 
> managed table without HMS" commit went in. That commit adds a test that 
> restarts Impala in a frontend test. The problem is that it runs 
> start-impala-cluster.py without arguments, whereas bin/run-all-tests.sh runs 
> start-impala-cluster.py specifying the --log_dir. This would put the log 
> files in a different location (/tmp?).
> [https://github.com/apache/impala/blob/320f05852060c1027326ac20be7df340a7a5263f/fe/src/test/java/org/apache/impala/catalog/CreateKuduTableWithoutHMSTest.java#L98]
> [https://github.com/apache/impala/blob/master/bin/run-all-tests.sh#L165-L167]
> In one run that hit this issue, there are two sets of impalad logs in the 
> ee_test directory. One set starts at 06:40:22 and ends at 07:11:28. The 
> second set starts at 09:45:25 and ends at 09:47:30. So, this is missing 2.5 
> hours of ee_test log files, which matches the theory.
> This is also likely to impact other things like erasure coding or tests that 
> run against the data cache.
> GVO doesn't hit this because the job that runs frontend tests does not run 
> end to end tests.




[jira] [Assigned] (IMPALA-9266) TestLogFragments.test_log_fragments fails due to missing log

2019-12-29 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng reassigned IMPALA-9266:
-

Assignee: WangSheng

> TestLogFragments.test_log_fragments fails due to missing log
> 
>
> Key: IMPALA-9266
> URL: https://issues.apache.org/jira/browse/IMPALA-9266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: WangSheng
>Priority: Blocker
>  Labels: broken-build
>
> TestLogFragments.test_log_fragments is failing due to missing a log entry:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/observability/test_log_fragments.py:46:
>  in test_log_fragments
> "] Analysis and authorization finished.")
> common/impala_test_suite.py:1149: in assert_impalad_log_contains
> self.assert_log_contains("impalad", level, line_regex, expected_count)
> common/impala_test_suite.py:1185: in assert_log_contains
> (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file 
> /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/ee_tests/impalad.impala-ec2-centos74-m5-4xlarge-ondemand-088c.vpc.cloudera.com.jenkins.log.INFO.20191227-001949.23945
>  matching regex 'ce41d657e70d6890:6f0f227d] Analysis and 
> authorization finished.', but found 0 lines. Last line was: 
> E   Caught signal: SIGTERM. Daemon will exit.{noformat}
> This started happening after the "IMPALA-8974: Fixed a bug when create kudu 
> managed table without HMS" commit went in. That commit adds a test that 
> restarts Impala in a frontend test. The problem is that it runs 
> start-impala-cluster.py without arguments, whereas bin/run-all-tests.sh runs 
> start-impala-cluster.py specifying the --log_dir. This would put the log 
> files in a different location (/tmp?).
> [https://github.com/apache/impala/blob/320f05852060c1027326ac20be7df340a7a5263f/fe/src/test/java/org/apache/impala/catalog/CreateKuduTableWithoutHMSTest.java#L98]
> [https://github.com/apache/impala/blob/master/bin/run-all-tests.sh#L165-L167]
> In one run that hit this issue, there are two sets of impalad logs in the 
> ee_test directory. One set starts at 06:40:22 and ends at 07:11:28. The 
> second set starts at 09:45:25 and ends at 09:47:30. So, this is missing 2.5 
> hours of ee_test log files, which matches the theory.
> This is also likely to impact other things like erasure coding or tests that 
> run against the data cache.
> GVO doesn't hit this because the job that runs frontend tests does not run 
> end to end tests.




[jira] [Closed] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed

2019-12-28 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng closed IMPALA-9268.
-

duplicated

> CreateKuduTableWithoutHMSTest caused TestLogFragments failed
> 
>
> Key: IMPALA-9268
> URL: https://issues.apache.org/jira/browse/IMPALA-9268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Fix For: Impala 3.4.0
>
>





[jira] [Comment Edited] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed

2019-12-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004618#comment-17004618
 ] 

WangSheng edited comment on IMPALA-9268 at 12/29/19 2:44 AM:
-

Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I needed to 
run all the tests on Jenkins until Quanlong suggested that I submit an 
exhaustive test. After two exhaustive test runs on Jenkins, I'm sure this is a 
bug. But IMPALA-8794 has already been merged into master, so I could only 
create this new Jira to handle the bug. It duplicates IMPALA-9266, so I will 
close this Jira and abandon my change on Gerrit.


was (Author: skyyws):
Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I needed to 
run all the tests on Jenkins until Quanlong suggested that I submit an 
exhaustive test. After two exhaustive test runs on Jenkins, I'm sure this is a 
bug. But IMPALA-8794 has already been merged into master, so I could only 
create this new Jira to handle the bug. It duplicates IMPALA-9268, so I will 
close this Jira and abandon my change on Gerrit.

> CreateKuduTableWithoutHMSTest caused TestLogFragments failed
> 
>
> Key: IMPALA-9268
> URL: https://issues.apache.org/jira/browse/IMPALA-9268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Fix For: Impala 3.4.0
>
>





[jira] [Commented] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed

2019-12-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004618#comment-17004618
 ] 

WangSheng commented on IMPALA-9268:
---

Hi [~tarmstrong], I'm sorry for causing this bug. I didn't realize I needed to 
run all the tests on Jenkins until Quanlong suggested that I submit an 
exhaustive test. After two exhaustive test runs on Jenkins, I'm sure this is a 
bug. But IMPALA-8794 has already been merged into master, so I could only 
create this new Jira to handle the bug. It duplicates IMPALA-9268, so I will 
close this Jira and abandon my change on Gerrit.

> CreateKuduTableWithoutHMSTest caused TestLogFragments failed
> 
>
> Key: IMPALA-9268
> URL: https://issues.apache.org/jira/browse/IMPALA-9268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Fix For: Impala 3.4.0
>
>





[jira] [Commented] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed

2019-12-28 Thread WangSheng (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004615#comment-17004615
 ] 

WangSheng commented on IMPALA-9268:
---

[~laszlog], you are right. I only ran the frontend tests, not all the tests, so 
I didn't find this bug. I didn't notice your Jira, so I created a new Jira to 
solve this problem and submitted a patch to Gerrit. If you are preparing to do 
this, I will close this Jira and abandon my change on Gerrit. Thanks for your 
attention!

> CreateKuduTableWithoutHMSTest caused TestLogFragments failed
> 
>
> Key: IMPALA-9268
> URL: https://issues.apache.org/jira/browse/IMPALA-9268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
> Fix For: Impala 3.4.0
>
>





[jira] [Created] (IMPALA-9268) CreateKuduTableWithoutHMSTest caused TestLogFragments failed

2019-12-28 Thread WangSheng (Jira)

WangSheng created IMPALA-9268:
-

 Summary: CreateKuduTableWithoutHMSTest caused TestLogFragments 
failed
 Key: IMPALA-9268
 URL: https://issues.apache.org/jira/browse/IMPALA-9268
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: WangSheng
Assignee: WangSheng







[jira] [Updated] (IMPALA-9264) Support catalogd without HMS

2019-12-25 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9264:
--
Description: 
In my company, connecting catalogd to mysql/pg directly (instead of through the 
metastore service) is a very common usage. We just need to configure 
hive-site.xml like this (metastore-related config such as hive.metastore.uris 
is unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
managed table failed 
([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
already fixed this.
I guess there may be other functionality that has not been taken into 
consideration in this setup. So I created this Jira to collect that 
functionality, and I'm willing to continue contributing when I'm free.

 

  was:
In my company, connecting catalogd to mysql/pg directly (instead of through the 
metastore service) is a very common usage. We just need to configure 
hive-site.xml like this (metastore-related config such as hive.metastore.uris 
is unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
managed table failed 
([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
already fixed this.
I guess there may be other functions that have not been taken into 
consideration in this setup. So I created this Jira to collect those 
functions, and I'm willing to continue contributing when I'm free.

 


> Support catalogd without HMS
> 
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through 
> the metastore service) is a very common usage. We just need to configure 
> hive-site.xml like this (metastore-related config such as hive.metastore.uris 
> is unnecessary):
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
> managed table failed 
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
> already fixed this.
> I guess there may be other functionality that has not been taken into 
> consideration in this setup. So I created this Jira to collect that 
> functionality, and I'm willing to continue contributing when I'm free.
>  
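
A configuration like the one described can be sanity-checked before starting catalogd. The following is a minimal sketch, not part of Impala: it assumes the properties sit inside the usual Hadoop-style <configuration> root element, and the helper names (load_properties, check_direct_db_config) are hypothetical. It verifies that the four JDO connection properties from the description are present and reports whether hive.metastore.uris is set.

```python
import xml.etree.ElementTree as ET

# Sample hive-site.xml content; values are the ones quoted in the description.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>"""

# Properties the description lists for a direct database connection.
REQUIRED = (
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionPassword",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionUserName",
)


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_direct_db_config(props):
    """Return (missing required names, whether hive.metastore.uris is set)."""
    missing = [key for key in REQUIRED if not props.get(key)]
    uses_metastore_service = bool(props.get("hive.metastore.uris"))
    return missing, uses_metastore_service


props = load_properties(HIVE_SITE)
missing, uses_service = check_direct_db_config(props)
print(missing, uses_service)
```

For a real deployment the XML text would be read from the hive-site.xml file on the catalogd host instead of the inline sample.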




[jira] [Updated] (IMPALA-9264) Support catalogd without HMS

2019-12-25 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9264:
--
Description: 
In my company, connecting catalogd to mysql/pg directly (instead of through the 
metastore service) is a very common usage. We just need to configure 
hive-site.xml like this (metastore-related config such as hive.metastore.uris 
is unnecessary):
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
managed table failed 
([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
already fixed this.
I guess there may be other functions that have not been taken into 
consideration in this setup. So I created this Jira to collect those 
functions, and I'm willing to continue contributing when I'm free.

 

  was:
In my company, connecting catalogd to mysql/pg directly (instead of through the 
metastore service) is a very common usage. We just need to configure 
hive-site.xml like this:
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
managed table failed 
([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
already fixed this.
I guess there may be other functions that have not been taken into 
consideration in this setup. So I created this Jira to collect those 
functions, and I'm willing to continue contributing when I'm free.

 

Summary: Support catalogd without HMS  (was: Support kinds of functions 
when catalogd connected to mysql/pg directly)

> Support catalogd without HMS
> 
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through 
> the metastore service) is a very common usage. We just need to configure 
> hive-site.xml like this (metastore-related config such as hive.metastore.uris 
> is unnecessary):
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
> managed table failed 
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
> already fixed this.
> I guess there may be other functions that have not been taken into 
> consideration in this setup. So I created this Jira to collect those 
> functions, and I'm willing to continue contributing when I'm free.
>  




[jira] [Updated] (IMPALA-9264) Support kinds of functions when catalogd connected to mysql/pg directly

2019-12-25 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9264:
--
Summary: Support kinds of functions when catalogd connected to mysql/pg 
directly  (was: Support kinds of function when catalogd connected to mysql/pg 
directly)

> Support kinds of functions when catalogd connected to mysql/pg directly
> ---
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through 
> the metastore service) is a very common usage. We just need to configure 
> hive-site.xml like this:
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
> managed table failed 
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
> already fixed this.
> I guess there may be other functions that have not been taken into 
> consideration in this setup. So I created this Jira to collect those 
> functions, and I'm willing to continue contributing when I'm free.
>  




[jira] [Updated] (IMPALA-9264) Support kinds of function when catalogd connected to mysql/pg directly

2019-12-25 Thread WangSheng (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng updated IMPALA-9264:
--
Description: 
In my company, connecting catalogd to mysql/pg directly (instead of through the 
metastore service) is a very common usage. We just need to configure 
hive-site.xml like this:
{code:java}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
{code}
Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
managed table failed 
([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
already fixed this.
I guess there may be other functions that have not been taken into 
consideration in this setup. So I created this Jira to collect those 
functions, and I'm willing to continue contributing when I'm free

 

  was:In my company, 


> Support kinds of function when catalogd connected to mysql/pg directly
> --
>
> Key: IMPALA-9264
> URL: https://issues.apache.org/jira/browse/IMPALA-9264
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>
> In my company, connecting catalogd to mysql/pg directly (instead of through 
> the metastore service) is a very common usage. We just need to configure 
> hive-site.xml like this:
> {code:java}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.postgresql.Driver</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>password</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:postgresql://localhost:5432/HMS_home_impala</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>hiveuser</value>
> </property>
> {code}
> Recently, when I tested impala-3.3 in this setup, I found that creating a Kudu 
> managed table failed 
> ([IMPALA-8974|https://issues.apache.org/jira/browse/IMPALA-8974]), and I've 
> already fixed this.
> I guess there may be other functions that have not been taken into 
> consideration in this setup. So I created this Jira to collect those 
> functions, and I'm willing to continue contributing when I'm free
>  



