RE: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table
I won't be the best source for explaining why the flag worked, but this thread should help explain why performance is expected to be better when using the native Parquet reader:
https://lists.apache.org/thread.html/6429051d5babb87d3b03494524c1802f75d572d630cb5690fd616741@

That said, there is work in progress to improve performance for the Parquet readers in general, including for complex data.

-----Original Message-----
From: Anup Tiwari [mailto:anup.tiw...@games24x7.com]
Sent: Wednesday, February 14, 2018 2:09 AM
To: user@drill.apache.org
Subject: Re: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table
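For readers following the thread: the option discussed above is toggled per session or cluster-wide with Drill's standard `ALTER` syntax. A sketch (the `sys.options` column names below are as they existed around Drill 1.10 and are an assumption, not confirmed by this thread):

```sql
-- Enable the alternative ("new"/complex) Parquet reader for this session only
ALTER SESSION SET `store.parquet.use_new_reader` = true;

-- Or enable it for the whole cluster
ALTER SYSTEM SET `store.parquet.use_new_reader` = true;

-- Inspect the current value (column layout assumed for Drill 1.x)
SELECT name, bool_val
FROM sys.options
WHERE name = 'store.parquet.use_new_reader';
```

Session scope is usually preferable for a workaround like this, since the performance trade-off discussed above then only affects the queries that need it.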
Re: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table
Hi Kunal,

That issue was related to container size; it is resolved and working now. However, I was trying the reverse: a table created in Hive (2.1.1)/Hadoop (2.7.3) and stored on S3, which I am trying to read via Drill (1.10.0).

Initially, when querying the Parquet data stored on S3, I was getting the error below, which I resolved by setting `store.parquet.use_new_reader` = true:

ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: don't know what type: 15

Fragment 1:2

[Error Id: 43369db3-532a-4004-b966-7fbf42b84cc8 on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: don't know what type: 15

Fragment 1:2

[Error Id: 43369db3-532a-4004-b966-7fbf42b84cc8 on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.10.0.jar:1.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.

While searching for the above issue, I found mention that setting `store.parquet.use_new_reader` = true can impact query performance. Can you provide any details on this?

Also, after setting this I am able to query the files created by Hive, but when I execute a big query on those files I get the error below:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for connection from pool

Fragment 3:14

[Error Id: 0564e2e4-c917-489c-8a54-2a623401563c on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.10.0.jar:1.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet reader (complex).
Message: Failure in setting up reader
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) ~[aws-java-sdk-1.7.4.jar:na]
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.s3a.S3AInputStream.seek(S3AInputStream.java:115) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.store.dfs.DrillFSDataInputStream.seek(DrillFSDataInputStream.java:57) ~[drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.parquet.hadoop.ColumnChunkIncReadStore.addColumn(ColumnChunkIncReadStore.java:245) ~[drill-java-exec-1.10.0.jar:1.8.1-drill-r0]
    at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:261) ~[drill-java-exec-1.10.0.jar:1.10.0]
    ... 16 common frames omitted
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.h
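For readers hitting the same `ConnectionPoolTimeoutException`: the stack trace shows the S3A filesystem client running out of pooled HTTP connections while many Drill fragments read concurrently. A common mitigation (not confirmed as the fix in this thread) is to raise the S3A connection pool size in `core-site.xml`; `fs.s3a.connection.maximum` defaults to 15 in Hadoop 2.7.x, and the value 100 below is only illustrative:

```xml
<!-- core-site.xml: raise the S3A HTTP connection pool size so that many
     concurrent readers do not exhaust it (default is 15 in Hadoop 2.7.x;
     100 is an illustrative value, tune to your fragment parallelism). -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100</value>
</property>
```

Reducing query parallelism on the Drill side (so fewer fragments open S3 streams at once) is the other lever, at the cost of throughput.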
RE: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table
Can you share what the error is? Without that, it is anybody's guess what the issue is.

-----Original Message-----
From: Anup Tiwari [mailto:anup.tiw...@games24x7.com]
Sent: Tuesday, February 13, 2018 6:19 AM
To: user@drill.apache.org
Subject: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table

Hi Team,

I am trying to read a Drill (1.10.0)-created Parquet table in Hive (2.1.1) using an external table, and I am getting an error that seems unrelated to Drill. Has anyone tried this? If yes, do we have any best practices or links for it?

Regards,
Anup Tiwari