Re: Parquet Partitions

John Omernik Thu, 06 Aug 2015 05:18:07 -0700

Interesting. I wonder if that would/could be an addition: you don't need
the metastore to infer those partitions, since you can see them in the
directory listing.  I will play around and let you know what I find.


Thanks!

John

On Wed, Aug 5, 2015 at 3:37 PM, rahul challapalli <[email protected]> wrote:

> John,
>
> Drill has no idea about the names of your partitions, since that information
> is part of the Hive metastore. You can get partition pruning if you modify
> your query like below:
>
> select * from dfs.hive_parq where dir0=val1; (dir0 is equivalent to part1,
> and dir1 would be equivalent to part2)
>
> - Rahul
>
> On Wed, Aug 5, 2015 at 1:21 PM, John Omernik <[email protected]> wrote:
>
> > So, what I am getting at is this: say a table was created in Hive with
> > Parquet files.
> >
> > CREATE TABLE hive_parq (field1 STRING, field2 STRING)
> > PARTITIONED BY (part1 STRING, part2 STRING) STORED AS PARQUET;
> >
> > That creates a directory named hive_parq. Under it there will be
> > directories named part1=val1, under those part2=val1, part2=val2, and then
> > the actual Parquet files.
> >
> > Without the Hive Metastore, will Drill know that it's partitioned based on
> > the directory name? If I say, select * from dfs.hive_parq where
> > part1=val1, will it only look in the /hive_parq/part1=val1 folder, or
> > will it look at all subdirectories, given that the partition fields are not
> > part of the Parquet files and we don't have metastore information to work
> > with?
> >
> > Thanks!
> >
> >
> >
> > On Wed, Aug 5, 2015 at 3:13 PM, Ramana I N <[email protected]> wrote:
> >
> > > Yes. You can use the dfs plugin in this case.
> > >
> > > Regards
> > > Ramana
> > >
> > >
> > > On Wed, Aug 5, 2015 at 1:02 PM, John Omernik <[email protected]> wrote:
> > >
> > > > > Would Drill know to partition prune based on directories if it didn't
> > > > > have the Hive metastore to define the partitions at the directory
> > > > > level?
> > > >
> > > >
> > > > > On Wed, Aug 5, 2015 at 11:01 AM, Neeraja Rentachintala <[email protected]> wrote:
> > > >
> > > > > John
> > > > > > Both would work, i.e. you can query the partitioned directories
> > > > > > directly using the file system storage plugin, or via the Hive table.
> > > > >
> > > > > > On Wed, Aug 5, 2015 at 8:58 AM, John Omernik <[email protected]> wrote:
> > > > >
> > > > > > > After reading about Parquet Partition Pruning in Drill 1.1, I was
> > > > > > > wondering if there is still partitioning based on "hive like"
> > > > > > > partitions. I.e. I have a process that is making a Hive table with
> > > > > > > Parquet files.  It's using Partitions (Directories).  Do I need
> > > > > > > Drill to read that data using the Hive Plugin so it's aware of the
> > > > > > > partitions and can prune, or can I just use the DFS plugin, point
> > > > > > > it at the root of the table in Hive, and let it go, inferring
> > > > > > > schema and partitions based on the directories that exist?
> > > > > >
> > > > > > John
> > > > > >
> > > > >
> > > >
> > >
> >
>
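
The dir0/dir1 mapping discussed in this thread can be sketched in a few lines
of Python. This is only an illustration of the idea, not Drill's
implementation: the helper names (dir_columns, hive_partitions, prune) and the
Parquet file names are made up for the example. As far as I know, a dirN
column holds the literal directory name, so with Hive-style directories the
value to filter on would be the whole name (for example 'part1=val1') rather
than just val1. The second helper shows the inference John suggests, reading
the partition names straight out of the directory listing.

```python
from pathlib import PurePosixPath

# Hypothetical file listing for the hive_parq table from the thread.
FILES = [
    "/hive_parq/part1=val1/part2=val1/000000_0.parquet",
    "/hive_parq/part1=val1/part2=val2/000000_0.parquet",
    "/hive_parq/part1=val2/part2=val1/000000_0.parquet",
]


def dir_columns(table_root, file_path):
    """Name each directory level below table_root as dir0, dir1, ..."""
    rel = PurePosixPath(file_path).relative_to(table_root)
    # The last path component is the file itself, so skip it.
    return {f"dir{i}": name for i, name in enumerate(rel.parts[:-1])}


def hive_partitions(table_root, file_path):
    """Infer partition columns from Hive-style key=value directory names."""
    cols = {}
    for name in dir_columns(table_root, file_path).values():
        key, sep, value = name.partition("=")
        if sep:  # ignore directories that are not key=value
            cols[key] = value
    return cols


def prune(table_root, files, **wanted):
    """Keep only files whose inferred partition values match the filter."""
    return [
        f for f in files
        if all(hive_partitions(table_root, f).get(k) == v
               for k, v in wanted.items())
    ]


if __name__ == "__main__":
    print(dir_columns("/hive_parq", FILES[0]))
    print(hive_partitions("/hive_parq", FILES[0]))
    print(prune("/hive_parq", FILES, part1="val1"))
```

With this layout, a filter on part1 only needs the two files under
/hive_parq/part1=val1, which is the pruning behaviour the original question
was asking about.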
