Actually, I want to update every row of a table each day. No new data is needed; only some values will be changed by recalculation. It looks like every time I do this, the data in the table doubles, even though it is only an update. I believe even an update results in new HFiles, and the cluster is then very busy splitting regions and doing related work. It takes about an hour to update only about 250 million rows.

I only need one version. So I think it might be faster to store the calculated results in HFiles, truncate the original table, and then bulk load the HFiles into the empty table.

Thanks,
On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[email protected]> wrote:

> bq. What I want is to read from some hbase table and create hfiles directly
>
> Can you describe your use case in more detail ?
>
> Thanks
>
> On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[email protected]> wrote:
>
> > Hi,
> > I wonder if there is a tool similar to
> > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a tsv
> > file and creates HFiles which are ready to be loaded into the
> > corresponding region by another tool,
> > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
> > is to read from some hbase table and create hfiles directly. I think I
> > know how to write up such a class by following the steps in the ImportTsv
> > class, but I wonder if someone already did this.
> > Thanks,
> > Jack
> >
> > --
> > -- Jinyuan (Jack) Zhou
>

--
-- Jinyuan (Jack) Zhou
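
For reference, the approach discussed in this thread (scan an HBase table in a MapReduce job, write the recalculated values as HFiles, then bulk load them) might look roughly like the sketch below. It assumes the 0.94-era client API that was current at the time of this thread. The class name `TableToHFiles`, the column family `cf`, the qualifier `value`, and the `recalculate` method are hypothetical placeholders, not anything from the original posts, and the code has not been run against a cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableToHFiles {
  // Hypothetical column family and qualifier; replace with the real schema.
  static final byte[] CF = Bytes.toBytes("cf");
  static final byte[] QUAL = Bytes.toBytes("value");

  // Placeholder for the daily recalculation; here it returns the value unchanged.
  static byte[] recalculate(byte[] oldValue) {
    return oldValue;
  }

  // Reads each row from the source table and emits a Put with the new value.
  public static class RecalcMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] old = result.getValue(CF, QUAL);
      if (old == null) {
        return; // row has no value in this column, nothing to recalculate
      }
      Put put = new Put(result.getRow());
      put.add(CF, QUAL, recalculate(old));
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    String tableName = args[0];          // table to read and later load into
    Path hfileDir = new Path(args[1]);   // HDFS directory for the generated HFiles

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-hfiles");
    job.setJarByClass(TableToHFiles.class);

    Scan scan = new Scan();
    scan.addColumn(CF, QUAL);
    scan.setMaxVersions(1);      // only one version is needed
    scan.setCaching(500);        // fetch rows in batches for a full-table scan
    scan.setCacheBlocks(false);  // avoid churning the block cache

    TableMapReduceUtil.initTableMapperJob(tableName, scan, RecalcMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    // Sets the output format, partitioner and sort reducer so that the job
    // writes HFiles aligned to the table's current region boundaries.
    HTable table = new HTable(conf, tableName);
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, hfileDir);

    if (!job.waitForCompletion(true)) {
      System.exit(1);
    }

    // Move the generated HFiles into the table's regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    table.close();
  }
}
```

The truncate step from the plan is left out of this sketch. If the table is truncated between the job and the load (for example with `truncate` in the HBase shell), the `HTable` handle would probably need to be re-opened before calling `doBulkLoad`.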
