The reason it is not commonly used is that, typically, the goal with Parquet
is to ensure a Parquet row group is always contained within a single block
replica set (to guarantee the possibility of total locality). The easiest
way to guarantee this is to keep your Parquet row group size at or slightly
smaller than your HDFS block size.
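
For example (a sketch, not from the original thread; it assumes a default
128 MB HDFS block size, so check dfs.blocksize in hdfs-site.xml for your
cluster), you could align the Parquet block size with the HDFS block size
like this:

ALTER SESSION SET `store.format` = 'parquet';
ALTER SESSION SET `store.parquet.block-size` = 134217728;  -- 128MB, matching dfs.blocksize
-- then run the CTAS as usual; each row group can stay within one HDFS block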

On Wed, Apr 8, 2015 at 5:52 PM, Steven Phillips <[email protected]>
wrote:

> No, this is currently not possible with Drill.
>
> It's generally not recommended to do that anyway, so I don't know if this
> will ever be supported by Drill.
>
> On Wed, Apr 8, 2015 at 4:32 PM, Hao Zhu <[email protected]> wrote:
>
> > Hi Team,
> >
> > "store.parquet.block-size" can control the parquet block size in Drill.
> > When creating a table like this:
> >
> > ALTER SESSION SET `store.format` = 'parquet';
> > ALTER SESSION SET `store.parquet.block-size` = 10485760;  -- 10MB block size
> > CREATE TABLE dfs.root.`hao/parquet_tables/parq_10m` AS
> > (SELECT * FROM hive.`sometable`);
> >
> > All resulting files are 10 MB in size (the same as the Parquet block size).
> >
> > My question is:
> > Is there any way to create a parquet file with multiple parquet blocks?
> >
> > Thanks,
> > Hao
> >
>
>
>
> --
>  Steven Phillips
>  Software Engineer
>
>  mapr.com
>
