Thanks Steven and Jacques. I found Impala cannot do this either. Actually, I know that setting parquet.block.size equal to the HDFS chunk/block size is the best practice. My goal was to use Drill to generate Parquet files with multiple Parquet row groups/blocks.

Seems I should use Pig to do this.
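Something along these lines is what I have in mind (an untested sketch: the jar path, input path, and schema are placeholders, and depending on the Parquet version the storer class may be parquet.pig.ParquetStorer or org.apache.parquet.pig.ParquetStorer):

REGISTER /path/to/parquet-pig-bundle.jar;  -- hypothetical jar path

-- hypothetical input path and schema, just for illustration
data = LOAD '/hao/source_data' USING PigStorage(',') AS (id:int, name:chararray);

-- parquet.block.size is the Parquet-Hadoop row group size property;
-- setting it well below the HDFS block size should give several row
-- groups per output file
SET parquet.block.size 10485760;

STORE data INTO '/hao/parquet_tables/parq_multi_rg'
    USING parquet.pig.ParquetStorer;

If that works, running parquet-tools meta on one of the output files should show more than one row group.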
Thanks,
Hao

On Wed, Apr 8, 2015 at 6:05 PM, Jacques Nadeau <[email protected]> wrote:

> The reason it is not commonly used is that with Parquet the goal is
> typically for a Parquet row group to always be contained within a
> single block replica set (to guarantee the possibility of total
> locality). The easiest way to guarantee this is to keep your Parquet
> row group size at or slightly smaller than your HDFS block size.
>
> On Wed, Apr 8, 2015 at 5:52 PM, Steven Phillips <[email protected]>
> wrote:
>
> > No, this is currently not possible with Drill.
> >
> > It's generally not recommended to do that anyway, so I don't know if
> > this will ever be supported by Drill.
> >
> > On Wed, Apr 8, 2015 at 4:32 PM, Hao Zhu <[email protected]> wrote:
> >
> > > Hi Team,
> > >
> > > "store.parquet.block-size" can control the parquet block size in Drill.
> > > When creating a table like this:
> > >
> > > ALTER SESSION SET `store.format` = 'parquet';
> > > ALTER SESSION SET `store.parquet.block-size` = 10485760; -- 10MB block size
> > > CREATE TABLE dfs.root.`hao/parquet_tables/parq_10m` AS
> > > (SELECT * FROM hive.`sometable`);
> > >
> > > All resulting files are 10MB in size (the same as the parquet block size).
> > >
> > > My question is:
> > > Is there any way to create a parquet file with multiple parquet blocks?
> > >
> > > Thanks,
> > > Hao
> > >
> >
> > --
> > Steven Phillips
> > Software Engineer
> >
> > mapr.com
>
