Hi, I am running a CTAS query to convert JSON data stored in S3 into Parquet and write the result back to S3; both the input and output are S3 locations.
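For context, the query has the general shape of the sketch below; the storage plugin name (s3), the writable workspace (tmp), and both paths are placeholders rather than my actual ones:

    -- Parquet is Drill's default CTAS output format; set explicitly here for clarity
    ALTER SESSION SET `store.format` = 'parquet';

    -- Read the JSON files from one S3 location, write Parquet to another
    CREATE TABLE s3.tmp.`events_parquet` AS
    SELECT *
    FROM s3.`input/json`;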
Some of the Parquet files are created in S3, but not all of them. After some time the query fails with this error:

*Error: DATA_READ ERROR: Failure reading JSON file - Unable to execute HTTP request: Timeout waiting for connection from pool*

Input JSON data set: 93 GB
Number of rows in input data set: ~131 million

From a Google search, this indicates some kind of resource leak while reading/writing data to S3, caused by not calling close() on an S3 object. Since I am able to run plain SELECT queries against the same JSON data set without any such issues, I suspect the leak, if there is one, is around the S3 writes. Has anyone encountered a similar issue before?

Also, I am able to create the table from the S3 data and store it on the local filesystem using the dfs storage plugin. But queries against that dfs data then return only a partial view (the data stored locally on the queried node, not the data spread across the entire cluster), which makes this option unviable for my use case.
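Going back to the connection-pool theory for a moment: if the pool is simply being exhausted rather than leaked, the S3A client's pool size is controlled by fs.s3a.connection.maximum in core-site.xml (the default in hadoop-aws 2.7.x is quite small, 15 if I am reading it right). A sketch of raising it; the value of 100 here is arbitrary, and I have not confirmed whether this helps in my case:

    <property>
      <name>fs.s3a.connection.maximum</name>
      <value>100</value>
    </property>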
Detailed stack trace from one of the drillbits:

    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.6.0.jar:1.6.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_99]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_99]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_99]
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: null
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976) ~[aws-java-sdk-1.7.4.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956) ~[aws-java-sdk-1.7.4.jar:na]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:400) ~[hadoop-aws-2.7.1.jar:na]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:183) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord(ParquetRecordWriter.java:364) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:65) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:106) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.6.0.jar:1.6.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_99]
    at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_99]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251) [drill-java-exec-1.6.0.jar:1.6.0]
    ... 4 common frames omitted
Caused by: java.io.InterruptedIOException: null
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:459) ~[httpclient-4.2.5.jar:4.2.5]
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.5.jar:4.2.5]
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.5.jar:4.2.5]
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384) ~[aws-java-sdk-1.7.4.jar:na]
    ... 30 common frames omitted

2016-04-15 07:25:51,722 [28ef693e-604b-cb6e-6562-a18377d3b10c:frag:1:127] INFO o.a.d.e.w.fragment.FragmentExecutor - 28ef693e-604b-cb6e-6562-a18377d3b10c:1:127: State change requested FAILED --> FINISHED

Appreciate any response from the community.

--
Thanks,
Ashish
