Re: Single Hdfs block per parquet file

2017-03-24 Thread François Méthot
…being written to. Could you file a JIRA for this? Thanks, Kunal [quoted header] From: François Méthot <fmetho...@gmail.com> Sent: Thursday, March 23, 2017 9:08:51 AM To: dev@drill.apache.org Subject: Re: Single…

Re: Single Hdfs block per parquet file

2017-03-23 Thread Kunal Khatua
…March 23, 2017 9:08:51 AM To: dev@drill.apache.org Subject: Re: Single Hdfs block per parquet file [quoted] After further investigation, Drill uses the Hadoop ParquetFileWriter (https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java). This is…

Re: Single Hdfs block per parquet file

2017-03-23 Thread François Méthot
After further investigation, Drill uses the Hadoop ParquetFileWriter ( https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java ). This is where the file creation occurs, so it might be tricky after all. However, ParquetRecordWriter.java (…

Re: Single Hdfs block per parquet file

2017-03-22 Thread Padma Penumarthy
Yes, it seems possible to create files with different block sizes. We could potentially pass the configured store.parquet.block-size to the create call. I will try it out and see, and will let you know. Thanks, Padma > On Mar 22, 2017, at 4:16 PM, François Méthot…
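
The idea above (handing the configured store.parquet.block-size to the file-create call) can be sketched in plain Java. This is a hypothetical helper, not Drill's actual code; it assumes HDFS requires a per-file block size to be a multiple of the checksum chunk size (io.bytes.per.checksum, default 512 bytes), so the configured value is rounded up before being passed along:

```java
// Hypothetical helper: turn the configured store.parquet.block-size into a
// value HDFS will accept as a per-file block size (a multiple of the
// 512-byte checksum chunk). The Hadoop call it would feed is shown in a
// comment in main(), since this sketch has no cluster to run against.
public class BlockSizeChooser {
    static final long BYTES_PER_CHECKSUM = 512L;

    // Round the requested size up to the nearest multiple of the checksum chunk.
    static long chooseBlockSize(long parquetBlockSize) {
        long chunks = (parquetBlockSize + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        return chunks * BYTES_PER_CHECKSUM;
    }

    public static void main(String[] args) {
        // Drill's default store.parquet.block-size (512 MB) is already valid.
        System.out.println(chooseBlockSize(536870912L)); // 536870912
        // An odd size gets rounded up to the next 512-byte boundary.
        System.out.println(chooseBlockSize(1000L));      // 1024
        // The result would then become the last argument of
        // fs.create(path, overwrite, bufferSize, replication, blockSize).
    }
}
```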

Re: Single Hdfs block per parquet file

2017-03-22 Thread François Méthot
Here are 2 links I could find: http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)

Re: Single Hdfs block per parquet file

2017-03-22 Thread Padma Penumarthy
I think we create one file for each parquet block. If the underlying HDFS block size is 128 MB and the parquet block size is > 128 MB, it will create more blocks on HDFS. Can you let me know which HDFS API would allow you to do otherwise? Thanks, Padma > On Mar 22, 2017, at 11:54 AM,…
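
The block arithmetic being discussed can be made concrete. A minimal sketch (pure Java, no Hadoop dependency) of how many HDFS blocks a file occupies for a given per-file block size:

```java
// Illustrates the point above: with the cluster default of 128 MB HDFS
// blocks, a parquet file larger than 128 MB spans several HDFS blocks,
// unless the file was created with a larger per-file block size.
public class HdfsBlockCount {
    // Number of HDFS blocks a file of the given size occupies (ceiling division).
    static long hdfsBlocks(long fileSize, long hdfsBlockSize) {
        return (fileSize + hdfsBlockSize - 1) / hdfsBlockSize;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        // A 160 MB parquet file on 128 MB HDFS blocks spans two blocks.
        System.out.println(hdfsBlocks(160 * MB, 128 * MB)); // 2
        // The same file created with a 256 MB per-file block size fits in one.
        System.out.println(hdfsBlocks(160 * MB, 256 * MB)); // 1
    }
}
```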

Single Hdfs block per parquet file

2017-03-22 Thread François Méthot
Hi, Is there a way to force Drill to store a CTAS-generated parquet file as a single block when using HDFS? The Java HDFS API allows this: files can be created with the Parquet block-size. We are using Drill on HDFS configured with a block size of 128 MB. Changing this size is not an option at…
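
The HDFS capability referred to here is the FileSystem.create overload (linked later in the thread) that accepts a per-file block size as its last argument. A sketch, with the Hadoop call shown in a comment (no cluster to run against here) and the single-block condition as a runnable check:

```java
// Sketch, not Drill's actual code. The Hadoop API call in question is:
//
//   FSDataOutputStream out = fs.create(
//       new Path("/data/part-0.parquet"),  // hypothetical path
//       true,          // overwrite
//       4096,          // io buffer size
//       (short) 3,     // replication factor
//       536870912L);   // per-file HDFS block size (512 MB here)
//
// A file lands in a single HDFS block whenever its final size fits within
// the block size it was created with, regardless of the cluster default:
public class SingleBlockCheck {
    static boolean isSingleBlock(long fileSize, long perFileBlockSize) {
        return fileSize <= perFileBlockSize;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        // A 300 MB parquet file on the 128 MB cluster default spans blocks.
        System.out.println(isSingleBlock(300 * MB, 128 * MB)); // false
        // The same file created with a 512 MB per-file block size is one block.
        System.out.println(isSingleBlock(300 * MB, 512 * MB)); // true
    }
}
```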