Hi Eyal,

For just loading Parquet files, the Parquet Pig loader is okay, although I don't
think it lets you use partition values in the dataset later. I know plain old
PigStorage has a trick with the -tagFile option, but I'm not sure that would be
enough in Michael's case, or whether the Parquet loader supports anything similar.
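For reference, a rough sketch of the two loaders being compared (paths, delimiter, and jar version are illustrative, not taken from Michael's setup):

```pig
-- Parquet Pig loader: reads the Parquet files themselves, but the
-- partition directory (some_flag=true) is not surfaced as a column.
REGISTER parquet-pig-bundle-1.10.0.jar;
data = LOAD 's3://path/to/files/*/*.parquet'
       USING org.apache.parquet.pig.ParquetLoader();

-- PigStorage trick (text input only): -tagFile prepends the source
-- file name as the first field of each tuple; -tagPath prepends the
-- full path, from which a some_flag=... partition value could be parsed.
tagged = LOAD 's3://path/to/textfiles'
         USING PigStorage(',', '-tagPath');
```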
Thanks

On Thu, 30 Aug 2018 at 16:10, Eyal Allweil <[email protected]> wrote:
> Hi Michael,
> You can also use the Parquet Pig loader (especially if you're not working
> with Hive). Here's a link to the Maven repository for it.
>
> https://mvnrepository.com/artifact/org.apache.parquet/parquet-pig/1.10.0
>
> Regards,
> Eyal
>
>
> On Tuesday, August 28, 2018, 2:40:36 PM GMT+3, Adam Szita
> <[email protected]> wrote:
>
> Hi Michael,
>
> Yes, you can use HCatLoader to do this.
> The requirement is that you have a Hive table defined on top of your data
> (probably pointing to s3://path/to/files), and that the Hive MetaStore has
> all the relevant meta/schema information.
> If you do not have a Hive table yet, you can go ahead and define it in Hive
> by manually specifying the schema, and after that partitions can be
> added automatically via Hive's 'msck repair' function.
>
> Hope this helps,
> Adam
>
>
> On Mon, 27 Aug 2018 at 19:18, Michael Doo <[email protected]> wrote:
>
> > Hello,
> >
> > I’m trying to read Parquet data into Pig that is partitioned (so it’s
> > stored in S3 like
> > s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> > I’d like to load it into Pig and add the partitions as columns. I’ve read
> > some resources suggesting using the HCatLoader, but so far haven’t had
> > success.
> >
> > Any advice would be welcome.
> >
> > ~ Michael
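To make Adam's suggestion concrete, the flow would look roughly like this. The table name, column names, and types below are made-up placeholders; only the partition column (some_flag) and the S3 location come from Michael's path layout. First, in Hive:

```sql
-- Define an external table over the existing Parquet files.
-- Columns id/payload are hypothetical; use your actual schema.
CREATE EXTERNAL TABLE my_events (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (some_flag BOOLEAN)
STORED AS PARQUET
LOCATION 's3://path/to/files';

-- Scan the location and register the some_flag=... directories
-- as partitions in the MetaStore.
MSCK REPAIR TABLE my_events;
```

Then, in Pig (started with -useHCatalog so HCatLoader and its dependencies are on the classpath):

```pig
-- HCatLoader reads the schema from the MetaStore; the partition
-- column some_flag shows up as an ordinary field.
events = LOAD 'default.my_events'
         USING org.apache.hive.hcatalog.pig.HCatLoader();
flagged = FILTER events BY some_flag == true;
```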
