Thanks Abhishek. This helped.

On Fri, Apr 15, 2016 at 3:13 PM, Abhishek Girish <[email protected]> wrote:
> Can you take a look at
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
> ? It could be an issue of the connection to S3 timing out.
>
> On Fri, Apr 15, 2016 at 1:03 AM, Ashish Goel <[email protected]> wrote:
>
> > Hi,
> >
> > I am running a CTAS query to convert JSON data stored in S3 into Parquet
> > and store it back in S3. Both the input and output are S3 locations.
> > Some of the Parquet files are created in S3, but not all of them, and
> > after some time I receive this error message:
> >
> > *Error: DATA_READ ERROR: Failure reading JSON file - Unable to execute
> > HTTP request: Timeout waiting for connection from pool*
> >
> > Input JSON data set: 93 GB
> > Number of rows in input data set: ~131 million
> >
> > From a Google search, this indicates some kind of resource leak while
> > reading or writing data to S3, caused by not calling the close() method
> > on an S3 object. As I am able to run SELECT queries on the same JSON
> > data set without any such issue, I suspect the leak, if there is one,
> > to be around S3 writes.
> >
> > Has anyone encountered a similar issue before?
> >
> > I am also able to create a table from the S3 data and store it on my
> > local filesystem using the dfs storage plugin. But queries against the
> > dfs data then return only a partial view of the data: just what is
> > stored locally on that node, not what is spread across the entire
> > cluster. That makes this option unviable for my use case.
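(For readers following along, a minimal sketch of the kind of CTAS
described above. This is not the query from the thread: the plugin name
"s3", the workspaces "tmp" and "root", and the paths are hypothetical,
assuming an S3 storage plugin configured as in the Drill docs linked
above, with "tmp" marked writable.)

    -- CTAS output format; parquet is Drill's default, set explicitly here.
    ALTER SESSION SET `store.format` = 'parquet';

    -- Read JSON from one S3 location and write Parquet to another.
    CREATE TABLE s3.tmp.`events_parquet` AS
    SELECT * FROM s3.root.`raw/events_json`;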
> > Detailed stack trace from one of the drillbits:
> >
> > at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.6.0.jar:1.6.0]
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_99]
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_99]
> > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_99]
> > Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: null
> > at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454) ~[aws-java-sdk-1.7.4.jar:na]
> > at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) ~[aws-java-sdk-1.7.4.jar:na]
> > at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) ~[aws-java-sdk-1.7.4.jar:na]
> > at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976) ~[aws-java-sdk-1.7.4.jar:na]
> > at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956) ~[aws-java-sdk-1.7.4.jar:na]
> > at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892) ~[hadoop-aws-2.7.1.jar:na]
> > at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77) ~[hadoop-aws-2.7.1.jar:na]
> > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424) ~[hadoop-common-2.7.1.jar:na]
> > at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:400) ~[hadoop-aws-2.7.1.jar:na]
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) ~[hadoop-common-2.7.1.jar:na]
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) ~[hadoop-common-2.7.1.jar:na]
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) ~[hadoop-common-2.7.1.jar:na]
> > at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> > at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:183) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> > at org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord(ParquetRecordWriter.java:364) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:65) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:106) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.6.0.jar:1.6.0]
> > at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_99]
> > at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_99]
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
> > at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251) [drill-java-exec-1.6.0.jar:1.6.0]
> > ... 4 common frames omitted
> > Caused by: java.io.InterruptedIOException: null
> > at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:459) ~[httpclient-4.2.5.jar:4.2.5]
> > at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.5.jar:4.2.5]
> > at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.5.jar:4.2.5]
> > at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384) ~[aws-java-sdk-1.7.4.jar:na]
> > ... 30 common frames omitted
> >
> > 2016-04-15 07:25:51,722 [28ef693e-604b-cb6e-6562-a18377d3b10c:frag:1:127]
> > INFO  o.a.d.e.w.fragment.FragmentExecutor -
> > 28ef693e-604b-cb6e-6562-a18377d3b10c:1:127: State change requested
> > FAILED --> FINISHED
> >
> > Appreciate any response from the community.
> >
> > --
> > Thanks,
> > Ashish

--
Thanks,
Ashish
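(A closing note for anyone who finds this thread with the same "Timeout
waiting for connection from pool" error: in hadoop-aws 2.7.x, the S3A
connector caps its HTTP connection pool with fs.s3a.connection.maximum,
which defaults to 15. Below is a minimal sketch of raising it in Drill's
conf/core-site.xml; the value 100 is an illustrative assumption, not a
setting confirmed anywhere in this thread.)

    <configuration>
      <property>
        <!-- Size of the S3A HTTP connection pool (default 15 in Hadoop
             2.7). A heavy CTAS that reads from and writes to S3 at the
             same time can exhaust a small pool; 100 is an example. -->
        <name>fs.s3a.connection.maximum</name>
        <value>100</value>
      </property>
    </configuration>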
