Discouraging setting timestamps seems to make sense. In our situation we bulk import every 'x' minutes, and if for some reason one of the older imports fails and has to be restarted after a later import has already run, we would like to import the older records with the appropriate timestamp, i.e. one earlier than the timestamp of the later import. It sounds like that may be one of the situations that could trigger some of the internal edge cases, correct?
Also, as a separate note: since the timestamp is set in the Mapper, an import that runs with more than one mapper wouldn't give me a consistent timestamp for all the records in a given load. For our use case it is helpful to be able to identify all records associated with a given import.

I went ahead and added a JIRA (HBASE-3705) and uploaded the basic patch. I'll update the documentation as well.

Thanks

Andy

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Monday, March 28, 2011 10:51 AM
To: [email protected]
Subject: Re: passing timestamp into importtsv...

I have two thoughts about it:

1- We generally discourage users setting their own timestamps since it messes with the internals in some edge cases. Adding this functionality goes against that.

2- Almost every interface we offer lets users set their own timestamps, so to be more consistent we should indeed offer it for importtsv.

So I think you should open a jira and post your patch.

J-D

On Mon, Mar 28, 2011 at 9:36 AM, Andy Sautins <[email protected]> wrote:
>
> We have been having a lot of success using the importtsv utility to load
> data into HBase as described in the wiki
> (http://hbase.apache.org/bulk-loads.html). The one issue we have run into is
> that we would like to assign a specific timestamp to the records associated
> with the import. The current ImportTsv.java class sets the timestamp to the
> current time ( ts = System.currentTimeMillis() ). We have a patch we have
> been using that if a system property is set ( importtsv.timestamp ) to set
> the timestamp from the property. If the property is not set to use the
> current time. This has been very helpful for us and allows for more control
> in setting the timestamps for imported records.
>
> My question is is this useful functionality in general? If so I'd be happy
> to submit a JIRA and patch with the appropriate changes.
>
> Thanks
>
> Andy
>
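
For anyone following the thread, here is a minimal sketch of the behaviour described above: use the value of importtsv.timestamp when it is set, otherwise fall back to the current time, and resolve that value once so every mapper sees the same timestamp. This is not the HBASE-3705 patch itself. The class name ImportTimestamp and the method resolveTimestamp are hypothetical, and java.util.Properties stands in for the job configuration only so that the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

public class ImportTimestamp {
    // Property name taken from the thread; everything else here is illustrative.
    static final String TIMESTAMP_KEY = "importtsv.timestamp";

    /**
     * Returns the timestamp from the property if it is set,
     * otherwise the supplied fallback (normally the current time).
     */
    static long resolveTimestamp(Properties conf, long fallback) {
        String value = conf.getProperty(TIMESTAMP_KEY);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        if (args.length > 0) {
            // An explicit timestamp, e.g. when re-running an older failed import.
            conf.setProperty(TIMESTAMP_KEY, args[0]);
        }
        // Resolve once, before any mapper starts.
        long ts = resolveTimestamp(conf, System.currentTimeMillis());
        // Store the resolved value back so that every mapper reading the
        // configuration gets the same timestamp for this load.
        conf.setProperty(TIMESTAMP_KEY, Long.toString(ts));
        System.out.println("import timestamp = " + ts);
    }
}
```

Resolving the value once up front, rather than calling System.currentTimeMillis() inside each mapper, is what would let all records of one load share a single timestamp, which is the consistency concern raised in the reply.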
