Mohit,

I am suggesting setting up a whole Hive warehouse. This way your folders will look like:

/user/hive/warehouse/yourdataset/date=2012-09-11
/user/hive/warehouse/yourdataset/date=2012-09-12
...

All the partitions' metadata will be kept in an RDBMS (the Hive metastore), so when you query them with Hive it will look like:

select * from yourdataset where date = '2012-09-11'

and it will be fast, because Hive reads only the matching partition folder instead of scanning the whole dataset.
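As a rough sketch of the setup described above (not something I have run against your data): the single `line` column is a placeholder I made up, and your real schema will differ. The partition column is backtick-quoted because newer Hive releases treat `date` as a reserved word.

```sql
-- Hypothetical schema: one string column per record; replace with your own.
-- The partition column becomes a folder level (date=...), not a field in the files.
CREATE TABLE yourdataset (
  line STRING
)
PARTITIONED BY (`date` STRING);

-- Register one partition in the metastore. With a managed table this maps to
-- /user/hive/warehouse/yourdataset/date=2012-09-11 under the default warehouse dir.
ALTER TABLE yourdataset ADD PARTITION (`date` = '2012-09-11');

-- Only the folder for this partition is read.
SELECT * FROM yourdataset WHERE `date` = '2012-09-11';
```

If your data already lives somewhere else, an external table with an explicit LOCATION per partition is probably the better fit, so that dropping the table does not delete the files.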
HCatalog is a layer that provides this Hive functionality to Pig and MapReduce, so in Pig you can FILTER by those dates.
http://incubator.apache.org/hcatalog/docs/r0.4.0/loadstore.html#Load+Examples

Best Regards

On Tue, Sep 11, 2012 at 3:29 AM, Mohit Anchlia <[email protected]> wrote:

> On Mon, Sep 10, 2012 at 4:17 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>
>> Mohit,
>>
>> I guess you could use parameters substitution here
>> http://wiki.apache.org/pig/ParameterSubstitution
>>
>> thanks this works.
>
>
>> Also, a note about your architecture:
>>
>
> Are you suggesting change to the path names or your suggestion is to use
> HCatalog with pig?
>
>
>> You can consider using Hive partitions to effectively select
>> appropriate dates in the folder names. But as your tool is Pig, not
>> Hive, you can use HCatalog as a layer
>>
>> Best Regards
>>
>> On Tue, Sep 11, 2012 at 3:11 AM, Mohit Anchlia <[email protected]>
>> wrote:
>> > Our input path is something like YYYY/MM/DD/HH/input and we like to write
>> > to YYYY/MM/DD/HH/output . Is it possible to get the input path as a
>> > String and convert it to YYYY/MM/DD/HH/output that I can use in
>> > "store into" clause?
>>
