Thanks, Steven and Jacques.

I found Impala cannot do this either.
Actually, I know that setting parquet.block.size to the same size as the
HDFS chunk/block size is the best practice.
My goal was to use Drill to generate Parquet files that contain multiple
Parquet row groups/blocks.

It seems I should use Pig to do this.
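For reference, a minimal Pig sketch of what I have in mind (untested; the
jar version, paths, and column names are placeholders, and newer parquet-pig
releases use org.apache.parquet.pig.ParquetStorer instead):

REGISTER parquet-pig-bundle-1.6.0.jar;  -- version is a placeholder

-- Row groups of ~10MB: with, say, a 128MB HDFS block, each output file
-- written by a task should then contain multiple row groups.
SET parquet.block.size 10485760;

raw = LOAD '/hao/csv/sometable' USING PigStorage(',')
      AS (id:int, name:chararray);  -- hypothetical columns
STORE raw INTO '/hao/parquet_tables/parq_multi_rg'
      USING parquet.pig.ParquetStorer();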

Thanks,
Hao



On Wed, Apr 8, 2015 at 6:05 PM, Jacques Nadeau <[email protected]> wrote:

> The reason it is not commonly used is that the goal with Parquet is
> typically to have each Parquet row group contained within a single block
> replica set (to guarantee the possibility of total locality). The easiest
> way to guarantee this is to keep your Parquet row group size at or
> slightly smaller than your HDFS block size.
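> For example, to line the two up (a sketch; 134217728 bytes assumes the
> common 128MB HDFS block size, so adjust it to match your cluster):
>
> ALTER SESSION SET `store.parquet.block-size` = 134217728;  -- 128MB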
>
> On Wed, Apr 8, 2015 at 5:52 PM, Steven Phillips <[email protected]>
> wrote:
>
> > No, this is currently not possible with Drill.
> >
> > It's generally not recommended to do that anyway, so I don't know if this
> > will ever be supported by Drill.
> >
> > On Wed, Apr 8, 2015 at 4:32 PM, Hao Zhu <[email protected]> wrote:
> >
> > > Hi Team,
> > >
> > > "store.parquet.block-size" can control the parquet block size in Drill.
> > > When creating a table like this:
> > >
> > > ALTER SESSION SET `store.format` = 'parquet';
> > > ALTER SESSION SET `store.parquet.block-size` = 10485760;  -- 10MB block size
> > > CREATE TABLE dfs.root.`hao/parquet_tables/parq_10m` AS
> > > (SELECT * FROM hive.`sometable`);
> > >
> > > All resulting files are 10MB in size (the same as the Parquet block size).
> > >
> > > My question is:
> > > Is there any way to create a Parquet file with multiple Parquet blocks?
> > >
> > > Thanks,
> > > Hao
> > >
> >
> >
> >
> > --
> >  Steven Phillips
> >  Software Engineer
> >
> >  mapr.com
> >
>
