Hi Navin, Thank you for the detailed information. Very helpful.
I may be confused about what "ECS" stands for in your case. I had assumed it is the Amazon Elastic Container Service. However, I'm struggling to understand how that ECS provides an S3 interface. Is it instead the Dell EMC Elastic Cloud Storage storage layer from Isilon? [1]

The stack trace shows that the delay/problem occurs when communicating with the S3 endpoint. Assuming my sources match the version you are using, the problem occurs when Drill tries to open the Parquet file footer:

    private ParquetMetadata readFooter(Configuration conf, Path path,
        ParquetReaderConfig readerConfig) throws IOException {
      // Error is in the following line
      try (ParquetFileReader reader = ParquetFileReader.open(
          HadoopInputFile.fromPath(path, readerConfig.addCountersToConf(conf)),
          readerConfig.toReadOptions())) {

We can see from the code above, and from the stack trace, that Drill is blissfully ignorant of the fact that the S3 API is connecting to ECS. That is, Drill does nothing differently for the ECS S3 case than it does for the Amazon S3 case or the HDFS case. In all cases, it calls the same fromPath() function, which goes through the HDFS client API.

Given this, my suspicion is that there is a problem with the Dell ECS implementation of the S3 API. A previous note suggested that you check this outside of Drill:

1. Use the HDFS client to download a Parquet (or any) file from ECS.
2. Use an S3 client to download the same file from ECS.

Do the above repeatedly in a loop to determine if the operations are stable under load.

There is also a Parquet client tool that lets you inspect Parquet files. [2] I think (but am not certain) that it uses the HDFS client API as well. Try using that client to inspect your Parquet files. Again, run the operations in a loop to test load. Does that tool hit the same issues?

If the problem is somehow related to Dell's implementation of the S3 API, then there is little Drill can do to fix it.
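As a rough illustration of the loop test described above, here is a minimal Java sketch of a harness that runs one storage operation many times on a few threads and counts successes and failures. The class name LoadLoop, the Result type, and the placeholder operation in main() are all hypothetical and are not part of Drill or Hadoop; the real operation would be whatever call downloads the file through the HDFS client or an S3 client in your environment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness (not part of Drill or Hadoop): repeats one
// operation on a fixed thread pool and tallies successes and failures.
final class LoadLoop {

  // Summary of one run.
  static final class Result {
    final int succeeded;
    final int failed;
    final long maxMillis; // slowest successful call

    Result(int succeeded, int failed, long maxMillis) {
      this.succeeded = succeeded;
      this.failed = failed;
      this.maxMillis = maxMillis;
    }
  }

  static Result run(Callable<?> operation, int iterations, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Wrap the operation so each call reports its own elapsed time.
      Callable<Long> timed = () -> {
        long start = System.nanoTime();
        operation.call();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      };
      List<Future<Long>> pending = new ArrayList<>();
      for (int i = 0; i < iterations; i++) {
        pending.add(pool.submit(timed));
      }
      int ok = 0;
      int failed = 0;
      long max = 0;
      for (Future<Long> f : pending) {
        try {
          max = Math.max(max, f.get());
          ok++;
        } catch (ExecutionException e) {
          // The operation threw; record it and keep going.
          failed++;
          System.err.println("call failed: " + e.getCause());
        }
      }
      return new Result(ok, failed, max);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Placeholder operation (assumption): replace the sleep with the
    // actual download of one Parquet file from ECS.
    Result r = run(() -> { Thread.sleep(5); return null; }, 100, 8);
    System.out.printf("ok=%d failed=%d slowest=%dms%n",
        r.succeeded, r.failed, r.maxMillis);
  }
}
```

If timeouts show up in a loop like this, with Drill out of the picture, that would point toward the storage layer or the client settings rather than toward Drill itself.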
If instead the Dell implementation requires certain properties or settings to work well, then we can figure out how to configure that in HDFS so that Drill can pick up those settings. Information about Dell's S3 implementation is at [3].

Please let us know if the above suggestions are off the mark; all we have to go on is the information which you've kindly shared. Perhaps there are other key facts we do not yet know.

Thanks,
- Paul

[1] http://doc.isilon.com/ECS/3.1/DataAccessGuide/index.html#ecs_c_docs_landing_page_content.html
[2] https://github.com/apache/parquet-mr/tree/master/parquet-cli
[3] https://www.emc.com/techpubs/api/ecs/v2-2-0-0/S3ObjectOperations_ba672412ac371bb6cf4e69291344510e_overview.htm

On Saturday, March 28, 2020, 1:39:00 AM PDT, Navin Bhawsar <navin.bhaw...@gmail.com> wrote:

Thanks Paul.

To add more details, we are comparing Drill performance using the two storage options below:

1. dfs plugin pointing to a single-node HDFS cluster
2. S3 plugin pointing to an ECS bucket, no HDFS

In both storage options the data is stored in Parquet files. For example, in this query we are querying a directory with 19 Parquet files, close to 2 GB in total; the same set is on S3 and HDFS.

Drillbits are running on 2 Unix machines with (6 cores, 32 GB) each. On one of the Unix machines we have the HDFS single-node cluster + ZooKeeper + a Drillbit running. The other Unix machine is running a Drillbit.

On both HDFS and S3 storage we have created a Parquet metadata file; additionally, we have statistics created for dfs. Based on analysis so far, dfs is performing better when compared to S3. The same query which completes in 2.121s on dfs times out on S3.
Looking at the plan, the "parquet row group scan" is taking most of the time (99%). The stack trace shows the error "unable to execute http request: Timeout waiting for connection from pool":

(org.apache.drill.common.exceptions.ExecutionSetupException) java.io.InterruptedIOException: getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():261
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0
  Caused By (java.lang.Exception) getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException():352
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():177
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():151
    org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus():2242
    org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus():2204
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():2143
    org.apache.parquet.hadoop.util.HadoopInputFile.fromPath():39
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.readFooter():353
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():149
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0

Thanks & Regards,
Navin

On Sat, 28 Mar 2020, 09:27 Paul Rogers, <par0...@yahoo.com> wrote:

Hi Navin,

You had mentioned your ECS solution in an earlier note. What are you using to access data in your container? Is your ECS container running HDFS? Or do you have some other API? Do you have Drill running in a container on ECS, or is that where your data is located?

It would be helpful if you could describe your setup in a bit more detail so we can offer suggestions about where to look for an issue.

By the way: the query profile is often a good place to start. You'll find the profiles in the Drill Web Console. Looking at each operator, you can see how much memory was used and how long things took.
Specifically, look at the time taken by the scan: is the slowness due to reading the data, or is some other part of the query taking the time?

When you get the error, what is the stack trace? Is the error coming from some particular HDFS client? In some particular operation?

Thanks,
- Paul

On Friday, March 27, 2020, 6:59:42 AM PDT, Navin Bhawsar <navin.bhaw...@gmail.com> wrote:

Hi,

We are facing a performance issue where an Apache Drill query on ECS times out with the error below:

"ConnectionPoolTimeoutException: Timeout waiting for connection from pool"

However, the same query works fine on a single-node HDFS with an execution time of 2.1 sec (planning = .483s).

Parquet file size < 1.5 GB
Total Parquet files scanned = 8 (total 19 in directory)
Apache Drill version 1.17
JDK 1.8.0_74
Total rows returned from query = 71000

There are 2 Drillbits running in distributed mode. 13 GB default allocated per Drillbit.

Any ideas why ECS performance is so bad when compared with HDFS for Drill? Please advise if Drill provides options to optimize ECS querying.

Please let me know if you need more details.

Thanks & Regards,
Navin