You could save some time by using http://hbase.apache.org/book.html#copytable
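For the record, CopyTable accepts the same time-range options as Export, so the as-of copy can be made in a single job. A minimal sketch, assuming a source table named mytable (the table names and the end timestamp below are placeholder values, not from the thread):

```shell
# Copy only cells written before the as-of timestamp (exclusive upper
# bound) into a new table. "mytable", "mytable_asof" and the timestamp
# are placeholders; the destination table must already exist with the
# same column families.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --endtime=1367859600000 \
    --new.name=mytable_asof \
    mytable
```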
J-D

On Mon, May 6, 2013 at 11:19 AM, Gaurav Pandit <[email protected]> wrote:

> Thanks for your inputs, J-D, Shahab.
>
> Sorry if I was ambiguous in stating what I wanted to do. Just to restate
> the goal in one line: "Extract all rows (with rowkey and columns) from an
> HBase table as of a given time, using the HBase timestamps/versions, in a
> plain-text file format."
>
> J-D, we have about 5 million rows (but each could have multiple versions)
> for now, so I think scanning the whole table is okay for now, though it
> may not be the best option for a big table. Also, as I mentioned earlier,
> I think Hive/Pig does not let you query HBase as of a timestamp. If they
> can do that, that's the approach I wanted to take.
>
> But your suggestion of using *export* got me thinking, and the following
> may work out well:
> 1. Export the HBase table for a given timestamp using the "*export*"
>    utility.
> 2. Import the file into another "temp" HBase table.
> 3. Use Pig/Hive to extract the temp table to an HDFS file in plain text
>    (or into an RDBMS).
> 4. Let the client retrieve the file.
>
> Shahab, in my case I was talking about using the internal timestamp. But
> thanks for your input - I was unaware of the Pig DBStorage loader! It may
> come in handy in some other scenario.
>
> Thanks,
> Gaurav
>
> On Mon, May 6, 2013 at 1:50 PM, Shahab Yunus <[email protected]> wrote:
>
> > Gaurav, when you say that you want older versions of the data, are you
> > talking about filtering on the internal timestamps (and hence the
> > internal versioning mechanism), or does your data have a separate column
> > for versioning (basically, custom versioning)? If the latter, then you
> > can use Pig. It can also dump your data directly into an RDBMS like
> > MySQL, since a DBStorage loader/store is available.
> >
> > Might not be totally applicable to your issue, but I just wanted to
> > share a thought.
> > Regards,
> > Shahab
> >
> > On Mon, May 6, 2013 at 1:40 PM, Gaurav Pandit <[email protected]> wrote:
> >
> > > Thanks J-D.
> > >
> > > Wouldn't the export utility export the data in SequenceFile format? My
> > > goal is to generate data in some sort of delimited plain-text file and
> > > hand it over to the caller.
> > >
> > > - Gaurav
> > >
> > > On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > >
> > > > You can use the Export MR job provided with HBase; it lets you set a
> > > > time range: http://hbase.apache.org/book.html#export
> > > >
> > > > J-D
> > > >
> > > > On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit <[email protected]> wrote:
> > > >
> > > > > Hi HBase users,
> > > > >
> > > > > We have a use case where we need to know how the data looked at a
> > > > > given time in the past.
> > > > >
> > > > > The data is stored in HBase, of course, with multiple versions. And
> > > > > the goal is to be able to extract all records (rowkey, columns) as
> > > > > of a given timestamp, to a file.
> > > > >
> > > > > I am trying to figure out the best way to achieve this.
> > > > >
> > > > > The options I know of are:
> > > > > 1. Write a *Java* client using the HBase Java API, and scan the
> > > > >    HBase table.
> > > > > 2. Do the same, but over the *Thrift* HBase API using Perl (since
> > > > >    our environment is mostly Perl).
> > > > > 3. Use *Hive* to point to the HBase table, and use Sqoop to extract
> > > > >    the data from the Hive table onto the client / into an RDBMS.
> > > > > 4. Use *Pig* to extract the data from the HBase table, dump it on
> > > > >    HDFS, and move the file over to the client.
> > > > >
> > > > > So far, I have successfully implemented option (2). I am still
> > > > > running some tests to see how it performs, but it works fine as
> > > > > such.
> > > > >
> > > > > My questions are:
> > > > > 1. Is option (3) or (4) even possible? I am not sure if we can
> > > > >    access the table as of a given timestamp over Pig or Hive.
> > > > > 2. Is there any other, better way of achieving this?
> > > > >
> > > > > Thanks!
> > > > > Gaurav
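Steps 1 and 2 of the export/import plan discussed in the thread map directly onto the stock MapReduce utilities shipped with HBase. A sketch, assuming a table named mytable and an HDFS staging path (both placeholders); Export's optional positional arguments are <versions> <starttime> <endtime>:

```shell
# 1. Export the newest version of each cell with a timestamp in
#    [0, endtime) to SequenceFiles on HDFS. Table name, output path
#    and the end timestamp are placeholder values.
hbase org.apache.hadoop.hbase.mapreduce.Export \
    mytable /tmp/mytable_export 1 0 1367859600000

# 2. Re-import the dump into a temp table (it must already exist with
#    the same column families); Pig/Hive can then flatten it to
#    delimited text.
hbase org.apache.hadoop.hbase.mapreduce.Import \
    mytable_temp /tmp/mytable_export
```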

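As a footnote to the thread: for a quick look (rather than a full extract), the HBase shell's scan command also accepts TIMERANGE and VERSIONS, which should give the same as-of view interactively. The table name and timestamp are again placeholders; note that TIMERANGE's upper bound is exclusive:

```shell
# Newest version of each cell written before the as-of timestamp.
# 'mytable' and the timestamp are placeholder values.
echo "scan 'mytable', {TIMERANGE => [0, 1367859600000], VERSIONS => 1}" \
    | hbase shell
```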