You can instruct HBaseStorage to load only a subset of the rows, selected by row key, using its "-gt" and "-lt" options, documented here [1].
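
For example, here is a minimal sketch of a key-range load. The table name, column family, column names and key bounds are all made up for illustration, and I'm assuming your Pig version's HBaseStorage also accepts "-loadKey" (check the options table in [1] for what your release supports):

```pig
-- Hypothetical table 'mytable' with column family 'cf'; adjust to your schema.
-- '-gt' / '-lt' restrict the scan to row keys strictly between the two bounds.
-- '-loadKey true' returns the row key as the first field of each tuple.
rows = LOAD 'hbase://mytable'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'cf:col1 cf:col2',
           '-loadKey true -gt key_aaa -lt key_zzz')
       AS (rowkey:chararray, col1:chararray, col2:chararray);

DUMP rows;
```

Since the bounds are handed to the HBase scan, only the matching key range should be read, rather than loading the whole table and filtering afterwards in Pig.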

I don't believe querying by timestamp is currently supported in Pig, based on the comments to [2]. There is a standalone JIRA that's been created [3].

Norbert

[1] http://ofps.oreilly.com/titles/9781449302641/community.html#hbase_options_table
[2] https://issues.apache.org/jira/browse/PIG-1782
[3] https://issues.apache.org/jira/browse/PIG-1832

On Thu, Jul 28, 2011 at 6:18 AM, Vincent Barat <[email protected]> wrote:

> Hi,
>
> I'd like to make PIG load only a subset of an HBase table, based on the
> timestamp of the records, or on the key of the rows.
>
> As an example, I'd like to load only records that have a timestamp > N, or
> a key > "something".
>
> I know that HBase can handle scanners that are highly optimized to perform
> this kind of things, and it would greatly improve the time needed to load my
> data.
>
> Is there any way to do this ?
> If not, it is planned to be added in the HBase loader ?
> If not, is it technically possible to do it ?
> If yes, can I contribute and propose a patch on that ?
>
> Thank a lot !
>
