Interesting. I wonder whether that could be added as a feature: you don't need the metastore to infer those partitions, since you can see them in the directory listing. I will play around and let you know what I find.
Thanks!

John

On Wed, Aug 5, 2015 at 3:37 PM, rahul challapalli <[email protected]> wrote:

> John,
>
> Drill has no idea about the names of your partitions since that
> information is part of the hive metastore. You can get partition pruning
> if you modify your query like below
>
> select * from dfs.hive_parq where dir0=val1;
>
> (dir0 is equivalent to part1, and dir1 would be equivalent to part2)
>
> - Rahul
>
> On Wed, Aug 5, 2015 at 1:21 PM, John Omernik <[email protected]> wrote:
>
> > So , what I am getting at is say a table was created in Hive with
> > PArquet files
> >
> > CREATE table hive_parq(field1 STRING, field2 STRING) Partitioned by
> > part1 STRING, part2 STRING STORED as Parquet.
> >
> > That creates a directory named hive_part, then there will be
> > directories in under that part1=val1, then under that part2=val1,
> > part2=val2 , then the actual parquet files.
> >
> > Without the Hive Metastore, will Drill know that it's partitioned
> > based on the directory name, and I if I say, select * from
> > dfs.hive_parq where part1=val1 will it only look in the
> > /hive_parq/part1=val1 one folders or will it look at all
> > subdirectories, because the partitioned fields are not part of the
> > parquet files and we don't have metastore information to work with.
> >
> > Thanks!
> >
> > On Wed, Aug 5, 2015 at 3:13 PM, Ramana I N <[email protected]> wrote:
> >
> > > Yes. You can use the dfs plugin in this case.
> > >
> > > Regards
> > > Ramana
> > >
> > > On Wed, Aug 5, 2015 at 1:02 PM, John Omernik <[email protected]> wrote:
> > >
> > > > Would Drill know to partition prune based on directories if it
> > > > didn't have the hive metastore to define the partitions at the
> > > > directory level?
> > > >
> > > > On Wed, Aug 5, 2015 at 11:01 AM, Neeraja Rentachintala
> > > > <[email protected]> wrote:
> > > >
> > > > > John
> > > > > Both would work i.e query partitioned directories directly
> > > > > using file system storage plug in or via Hive table.
> > > > >
> > > > > On Wed, Aug 5, 2015 at 8:58 AM, John Omernik <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > After reading about Parquet Partition Pruning in Drill 1.1, I
> > > > > > was wondering if there is still partitioning based on "hive
> > > > > > like" partitions. I.e. I have a process that is making a hive
> > > > > > table with Parquet files. It's using Partitions (Directories).
> > > > > > Do I need Drill to read that data using the Hive Plugin so
> > > > > > it's aware of the partitions and can prune, or can I just use
> > > > > > the DFS plugin, point it at the root of the table in Hive, and
> > > > > > let it go, inferring Schema and partitions based on the
> > > > > > directories that exist?
> > > > > >
> > > > > > John
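
The idea raised at the top of the thread, that partition names and values can be read straight from a Hive-style directory listing without a metastore, can be sketched in a few lines of Python. This is only an illustration of the concept and not how Drill implements pruning; the function names `parse_partitions` and `prune` and the file names in the listing are hypothetical, and the layout follows the `part1=val1/part2=val1` structure described above.

```python
from pathlib import PurePosixPath


def parse_partitions(path, table_root):
    """Return {key: value} for each key=value directory between table_root and the file."""
    relative = PurePosixPath(path).relative_to(table_root)
    partitions = {}
    for part in relative.parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep:  # only directories shaped like key=value count as partitions
            partitions[key] = value
    return partitions


def prune(paths, table_root, **filters):
    """Keep only the files whose directory-derived partition values match every filter."""
    kept = []
    for path in paths:
        parts = parse_partitions(path, table_root)
        if all(parts.get(key) == value for key, value in filters.items()):
            kept.append(path)
    return kept


# Hypothetical listing (assumed file names), following the layout in the thread.
files = [
    "/hive_parq/part1=val1/part2=val1/0.parquet",
    "/hive_parq/part1=val1/part2=val2/0.parquet",
    "/hive_parq/part1=val2/part2=val1/0.parquet",
]

# Only the two files under part1=val1 survive the filter.
print(prune(files, "/hive_parq", part1="val1"))
```

One related point to verify on your own Drill version: the `dir0` and `dir1` columns are generally understood to hold the literal directory name, so with this layout the filter may need to be written as `dir0 = 'part1=val1'` and not as `dir0 = val1`.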
