Re: Parquet Partitions

rahul challapalli Wed, 05 Aug 2015 13:39:32 -0700

John,

Drill has no idea about the names of your partitions, since that information
is part of the Hive metastore. You can still get partition pruning if you
modify your query like below:

select * from dfs.hive_parq where dir0='part1=val1';

(dir0 is equivalent to part1, and dir1 would be equivalent to part2. Their
values are the literal directory names, such as 'part1=val1', not just 'val1'.)
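
A rough Python model of the dir0/dir1 mapping described here. It is a sketch,
not Drill's code. It assumes that each dirN column holds the literal directory
name, so a Hive-style directory yields 'part1=val1' rather than 'val1'. The
helpers implicit_dir_columns and prune and the file names are hypothetical,
and only equality filters are modeled.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical model, not Drill's implementation. Assumption: each directory
# level below the table root becomes an implicit column dir0, dir1, ... whose
# value is the literal directory name (e.g. "part1=val1", not "val1").


def implicit_dir_columns(table_root: str, file_path: str) -> Dict[str, str]:
    """Map the directories between table_root and a data file to dir0, dir1, ..."""
    relative = PurePosixPath(file_path).relative_to(table_root)
    directories = relative.parts[:-1]  # the last part is the file name itself
    return {f"dir{i}": name for i, name in enumerate(directories)}


def prune(table_root: str, files: List[str], **equals: str) -> List[str]:
    """Keep only files whose dirN values satisfy every equality filter.

    In this model, files under non-matching directories are dropped by their
    path alone, without being read, which is the effect pruning aims for.
    """
    return [
        f
        for f in files
        if all(implicit_dir_columns(table_root, f).get(col) == value
               for col, value in equals.items())
    ]


# Hypothetical layout matching the hive_parq table discussed in this thread.
example_files = [
    "/hive_parq/part1=val1/part2=val1/000000_0",
    "/hive_parq/part1=val1/part2=val2/000000_0",
    "/hive_parq/part1=val2/part2=val1/000000_0",
]

if __name__ == "__main__":
    # Roughly: select * from dfs.hive_parq where dir0 = 'part1=val1'
    print(prune("/hive_parq", example_files, dir0="part1=val1"))
    # Roughly: ... where dir0 = 'part1=val1' and dir1 = 'part2=val2'
    print(prune("/hive_parq", example_files, dir0="part1=val1", dir1="part2=val2"))
```

This also shows why the filter names dir0/dir1 rather than part1/part2:
without the metastore, the directory names are the only partition information
available.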

- Rahul

On Wed, Aug 5, 2015 at 1:21 PM, John Omernik <[email protected]> wrote:

> So, what I am getting at is: say a table was created in Hive with Parquet
> files:
>
> CREATE TABLE hive_parq (field1 STRING, field2 STRING)
> PARTITIONED BY (part1 STRING, part2 STRING) STORED AS PARQUET;
>
> That creates a directory named hive_parq. Under that there will be
> directories such as part1=val1, under each of those directories such as
> part2=val1 and part2=val2, and then the actual Parquet files.
>
> Without the Hive metastore, will Drill know that the table is partitioned
> based on the directory names? If I say select * from dfs.hive_parq where
> part1=val1, will it only look in the /hive_parq/part1=val1 folder, or will
> it look at all subdirectories? The partition fields are not part of the
> Parquet files, and we don't have metastore information to work with.
>
> Thanks!
>
>
>
> On Wed, Aug 5, 2015 at 3:13 PM, Ramana I N <[email protected]> wrote:
>
> > Yes. You can use the dfs plugin in this case.
> >
> > Regards
> > Ramana
> >
> >
> > On Wed, Aug 5, 2015 at 1:02 PM, John Omernik <[email protected]> wrote:
> >
> > > Would Drill know to partition prune based on directories if it didn't have
> > > the hive metastore to define the partitions at the directory level?
> > >
> > >
> > > On Wed, Aug 5, 2015 at 11:01 AM, Neeraja Rentachintala <[email protected]> wrote:
> > >
> > > > John,
> > > > Both would work, i.e. you can query the partitioned directories directly
> > > > using the file system storage plugin or via the Hive table.
> > > >
> > > > On Wed, Aug 5, 2015 at 8:58 AM, John Omernik <[email protected]> wrote:
> > > >
> > > > > After reading about Parquet partition pruning in Drill 1.1, I was wondering
> > > > > if pruning still works with "Hive-like" partitions, i.e. I have a process
> > > > > that is making a Hive table with Parquet files. It's using partitions
> > > > > (directories). Do I need Drill to read that data using the Hive plugin so
> > > > > it's aware of the partitions and can prune, or can I just use the DFS
> > > > > plugin, point it at the root of the table in Hive, and let it go,
> > > > > inferring the schema and partitions based on the directories that exist?
> > > > >
> > > > > John
> > > > >
> > > >
> > >
> >
>
