Can you take a look at https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3 ? It could be an issue of the connection to S3 timing out.
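If it is the connection pool being exhausted, one thing worth trying is raising the S3A client limits in core-site.xml under Drill's conf directory. A sketch along these lines (the property names are from hadoop-aws 2.7.x, which the stack trace shows in use; the values here are illustrative, not tuned for your cluster):

```xml
<!-- core-site.xml in Drill's conf directory -->
<configuration>
  <property>
    <!-- maximum number of simultaneous connections in the S3A pool;
         the hadoop-aws 2.7.x default is small, so a wide CTAS write
         with many fragments can exhaust it -->
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>
  <property>
    <!-- socket/connection timeout, in milliseconds -->
    <name>fs.s3a.connection.timeout</name>
    <value>200000</value>
  </property>
</configuration>
```

You would need to restart the drillbits after changing this for it to take effect.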
On Fri, Apr 15, 2016 at 1:03 AM, Ashish Goel <[email protected]> wrote:
> Hi,
>
> I am running a CTAS query to convert JSON data stored in S3 into Parquet
> and store it back into S3. Both the input and output are S3 locations.
> Some of the Parquet files are created in S3, but not all of them. After
> some time I receive this error message:
>
> *Error: DATA_READ ERROR: Failure reading JSON file - Unable to execute HTTP
> request: Timeout waiting for connection from pool*
>
> Input JSON data set - 93 GB
> Number of rows in input data set - ~131 million
>
> From a Google search, this indicates some kind of resource leak while
> reading/writing data to S3, caused by not calling the close() method on the
> S3 object. Since I am able to run SELECT queries on the same JSON data set
> without any such issues, I suspect the leak, if there is one, to be around
> S3 writes.
>
> Has anyone encountered a similar issue before?
>
> Also, I am able to create a table from the S3 data and store it in my local
> filesystem using the dfs storage plugin. But queries against the dfs data
> then return only a partial view of the data — just what is stored locally
> on that node, not the entire cluster — which makes this option unviable for
> my use case.
>
> Detailed stack trace from one of the drillbits:
>
>   at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.6.0.jar:1.6.0]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_99]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_99]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_99]
> Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: null
>   at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976) ~[aws-java-sdk-1.7.4.jar:na]
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956) ~[aws-java-sdk-1.7.4.jar:na]
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892) ~[hadoop-aws-2.7.1.jar:na]
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77) ~[hadoop-aws-2.7.1.jar:na]
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424) ~[hadoop-common-2.7.1.jar:na]
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:400) ~[hadoop-aws-2.7.1.jar:na]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) ~[hadoop-common-2.7.1.jar:na]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) ~[hadoop-common-2.7.1.jar:na]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) ~[hadoop-common-2.7.1.jar:na]
>   at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>   at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:183) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>   at org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord(ParquetRecordWriter.java:364) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:65) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:106) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.6.0.jar:1.6.0]
>   at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_99]
>   at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_99]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251) [drill-java-exec-1.6.0.jar:1.6.0]
>   ... 4 common frames omitted
> Caused by: java.io.InterruptedIOException: null
>   at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:459) ~[httpclient-4.2.5.jar:4.2.5]
>   at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.5.jar:4.2.5]
>   at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.5.jar:4.2.5]
>   at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384) ~[aws-java-sdk-1.7.4.jar:na]
>   ... 30 common frames omitted
>
> 2016-04-15 07:25:51,722 [28ef693e-604b-cb6e-6562-a18377d3b10c:frag:1:127] INFO o.a.d.e.w.fragment.FragmentExecutor - 28ef693e-604b-cb6e-6562-a18377d3b10c:1:127: State change requested FAILED --> FINISHED
>
> Appreciate any response from the community.
>
> --
> Thanks,
> Ashish
