Hi,

I am running a CTAS query to convert JSON data stored in S3 into Parquet,
written back to S3; both the input and the output are S3 locations. Some of
the Parquet files get created in S3, but not all of them. After some time I
receive this error message:

*Error: DATA_READ ERROR: Failure reading JSON file - Unable to execute HTTP
request: Timeout waiting for connection from pool*
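Roughly, the query is shaped like this (plugin names, workspaces, and paths
below are placeholders, not my actual ones):

    -- Parquet is the default CTAS output format, just being explicit
    ALTER SESSION SET `store.format` = 'parquet';

    CREATE TABLE s3.output.`events_parquet` AS
    SELECT * FROM s3.input.`events/`;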

Input JSON data set: 93 GB

Number of rows in input data set: ~131 million

From searching around, this error seems to indicate some kind of resource
leak while reading/writing data to S3, typically caused by not calling
close() on an S3 object, so connections are never returned to the pool.
Since I am able to run plain SELECT queries against the same JSON data set
without any such issue, I suspect the leak, if there is one, is on the S3
write path.
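One workaround I am considering (an assumption on my part, not something I
have verified fixes this) is raising the S3A connection pool limit in
core-site.xml, since the default looks quite low (15 in hadoop-aws 2.7.x,
if I am reading the source right):

    <property>
      <name>fs.s3a.connection.maximum</name>
      <!-- default is 15; raise it so many parallel readers/writers
           don't exhaust the pool -->
      <value>100</value>
    </property>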

Has anyone encountered a similar issue before?

Also, I am able to create the table from the S3 data and store it on local
disk using the dfs storage plugin (sketch below). But queries against that
dfs data then return a partial view, only the files stored locally on the
node being queried rather than the whole data set across the cluster, which
makes this option unviable for my use case.
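For reference, the local-write variant looks roughly like this (again,
names and paths are placeholders; dfs.tmp is just a writable workspace):

    CREATE TABLE dfs.tmp.`events_parquet_local` AS
    SELECT * FROM s3.input.`events/`;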


Detailed stack trace from one of the drillbits:

at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.6.0.jar:1.6.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_99]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_99]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_99]
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: null
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976) ~[aws-java-sdk-1.7.4.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956) ~[aws-java-sdk-1.7.4.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892) ~[hadoop-aws-2.7.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77) ~[hadoop-aws-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424) ~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:400) ~[hadoop-aws-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) ~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) ~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) ~[hadoop-common-2.7.1.jar:na]
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:183) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord(ParquetRecordWriter.java:364) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:65) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:106) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.6.0.jar:1.6.0]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_99]
at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_99]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251) [drill-java-exec-1.6.0.jar:1.6.0]
... 4 common frames omitted
Caused by: java.io.InterruptedIOException: null
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:459) ~[httpclient-4.2.5.jar:4.2.5]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.5.jar:4.2.5]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.5.jar:4.2.5]
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384) ~[aws-java-sdk-1.7.4.jar:na]
... 30 common frames omitted

2016-04-15 07:25:51,722 [28ef693e-604b-cb6e-6562-a18377d3b10c:frag:1:127] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28ef693e-604b-cb6e-6562-a18377d3b10c:1:127: State change requested FAILED --> FINISHED

Appreciate any response from the community.

-- 
Thanks,
Ashish
