RE: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table

2018-02-14 Thread Kunal Khatua
I won't be the best source for explaining why the flag worked, but this thread 
should help explain why performance is expected to be better when using the 
native Parquet reader: 

https://lists.apache.org/thread.html/6429051d5babb87d3b03494524c1802f75d572d630cb5690fd616741@

That said, there is work in progress to improve performance for Parquet readers 
in general, including complex data. 
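
If it helps while comparing the readers, you can also check how the 
Parquet-related options are currently set on a Drillbit with a standard query 
against the sys.options table, e.g.:

    SELECT * FROM sys.options WHERE name LIKE '%parquet%';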


Re: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table

2018-02-14 Thread Anup Tiwari
Hi Kunal,
That issue was related to container size; it is resolved and working now. 
However, I was trying the reverse: a table created in hive(2.1.1)/hadoop(2.7.3) 
and stored on S3, which I am trying to read via drill(1.10.0).
Initially, when querying the parquet data stored on S3, I was getting the below 
error, which I resolved by setting `store.parquet.use_new_reader` = true:


ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: don't know what type: 15
Fragment 1:2
[Error Id: 43369db3-532a-4004-b966-7fbf42b84cc8 on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: don't know what type: 15
Fragment 1:2
[Error Id: 43369db3-532a-4004-b966-7fbf42b84cc8 on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.10.0.jar:1.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
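
For reference, I set the option per session like this (ALTER SYSTEM would make 
it cluster-wide instead):

    -- switch this session to Drill's "new" (complex) Parquet reader
    ALTER SESSION SET `store.parquet.use_new_reader` = true;

    -- setting it back to false restores Drill's default native reader
    ALTER SESSION SET `store.parquet.use_new_reader` = false;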


But while searching for the above issue, I found somewhere that setting 
`store.parquet.use_new_reader` = true impacts query performance. Can you 
provide any details on this? Also, after setting this I am able to query the 
files created by Hive, but when I execute a big query on those files I get the 
below error:
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for connection from pool
Fragment 3:14
[Error Id: 0564e2e4-c917-489c-8a54-2a623401563c on prod-hadoop-102.bom-prod.aws.games24x7.com:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.10.0.jar:1.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet reader (complex).
Message: Failure in setting up reader
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) ~[aws-java-sdk-1.7.4.jar:na]
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.s3a.S3AInputStream.seek(S3AInputStream.java:115) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.store.dfs.DrillFSDataInputStream.seek(DrillFSDataInputStream.java:57) ~[drill-java-exec-1.10.0.jar:1.10.0]
    at org.apache.parquet.hadoop.ColumnChunkIncReadStore.addColumn(ColumnChunkIncReadStore.java:245) ~[drill-java-exec-1.10.0.jar:1.8.1-drill-r0]
    at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:261) ~[drill-java-exec-1.10.0.jar:1.10.0]
    ... 16 common frames omitted
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.h
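
While digging into this, the most common suggestion I have found (not yet 
verified on my cluster) is that hadoop-aws 2.7.x caps the S3A HTTP connection 
pool at 15 connections by default, which a wide parallel Parquet scan can 
exhaust; raising fs.s3a.connection.maximum in core-site.xml on each Drillbit 
is the usual workaround:

    <property>
      <!-- Max simultaneous S3A connections; the hadoop-aws 2.7.x default of 15
           is easy to exhaust under Drill's parallel Parquet column readers.
           100 is only an illustrative value, to be tuned per workload. -->
      <name>fs.s3a.connection.maximum</name>
      <value>100</value>
    </property>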

RE: Reading drill(1.10.0) created parquet table in hive(2.1.1) using external table

2018-02-13 Thread Kunal Khatua
Can you share what the error is? Without that, it's anybody's guess what the 
issue is.

-Original Message-
From: Anup Tiwari [mailto:anup.tiw...@games24x7.com] 
Sent: Tuesday, February 13, 2018 6:19 AM
To: user@drill.apache.org
Subject: Reading drill(1.10.0) created parquet table in hive(2.1.1) using 
external table

Hi Team,
I am trying to read a drill(1.10.0)-created parquet table in hive(2.1.1) using 
an external table, and I am getting an error which seems unrelated to Drill. 
Has anyone tried this? If yes, are there any best practices or links for this?
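To be concrete, what I am doing is roughly the following; the table, column, 
and path names here are only illustrative, not my actual schema:

    -- In Drill (1.10.0): write Parquet via CTAS into the dfs.tmp workspace
    CREATE TABLE dfs.tmp.`events_parquet` AS
    SELECT event_id, event_name FROM dfs.`/data/events`;

    -- In Hive (2.1.1): point an external table at the same directory
    CREATE EXTERNAL TABLE events_parquet (
      event_id BIGINT,
      event_name STRING
    )
    STORED AS PARQUET
    LOCATION '/tmp/events_parquet';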
Regards,
Anup Tiwari