being written to.
>
>
> Could you file a JIRA for this?
>
>
> Thanks
>
> Kunal
>
>
> From: François Méthot <fmetho...@gmail.com>
> Sent: Thursday, March 23, 2017 9:08:51 AM
> To: dev@drill.apache.org
> Subject: Re: Single Hdfs block per parquet file
After further investigation, Drill uses the hadoop ParquetFileWriter (
https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java
).
This is where the file creation occurs so it might be tricky after all.
However ParquetRecordWriter.java (
Yes, it seems like it is possible to create files with different block sizes.
We could potentially pass the configured store.parquet.block-size to the create
call.
I will try it out and see, and will let you know.
Thanks,
Padma
> On Mar 22, 2017, at 4:16 PM, François Méthot
Here are 2 links I could find:
http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
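For illustration (a sketch, not code from the thread, and it assumes a Hadoop client on the classpath and a reachable cluster): the linked FileSystem.create overload takes the block size as its last argument, so a writer can pin the HDFS block size per file; the path and the 512 MB size below are made-up examples.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleBlockCreate {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical: use the configured parquet block size (e.g. the value
        // of Drill's store.parquet.block-size) as the HDFS block size, so the
        // whole parquet file fits in a single HDFS block.
        long parquetBlockSize = 512L * 1024 * 1024;
        short replication = fs.getDefaultReplication(new Path("/"));
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // create(Path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
                new Path("/tmp/example.parquet"), true,
                bufferSize, replication, parquetBlockSize);
        out.close();
    }
}
```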
I think we create one file for each parquet block.
If the underlying HDFS block size is 128 MB and the parquet block size is
> 128 MB, it will create more blocks on HDFS.
Can you let me know which HDFS API would allow you to do otherwise?
Thanks,
Padma
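The block count in question follows from ceiling division of file size by HDFS block size; a standalone sketch (the 256 MB and 300 MB figures are assumed examples, not from the thread):

```java
public class BlockMath {
    // Number of HDFS blocks a file of fileSize bytes occupies,
    // i.e. ceil(fileSize / hdfsBlockSize) using integer arithmetic.
    public static long blocksFor(long fileSize, long hdfsBlockSize) {
        return (fileSize + hdfsBlockSize - 1) / hdfsBlockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With 128 MB HDFS blocks, a 256 MB parquet file spans 2 blocks
        // and a 300 MB file spans 3 blocks.
        System.out.println(blocksFor(256 * mb, 128 * mb)); // 2
        System.out.println(blocksFor(300 * mb, 128 * mb)); // 3
    }
}
```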
> On Mar 22, 2017, at 11:54 AM,
Hi,
Is there a way to force Drill to store a CTAS-generated parquet file as a
single block when using HDFS? The Java HDFS API allows this: files can
be created with the Parquet block size.
We are using Drill on HDFS configured with a block size of 128 MB. Changing
this size is not an option at