If I understood your use case correctly, and you don't need to maintain older versions of the data, why not set the 'max versions' parameter for your table to 1? I believe the growth in data, even on updates, is due to old versions being kept. Have you tried that?
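For reference, setting max versions to 1 is a shell-level change; a rough sketch, assuming a table named 'mytable' with a column family 'cf' (both placeholder names):

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', VERSIONS => 1}
hbase> enable 'mytable'
hbase> # older versions are physically dropped only after a major compaction:
hbase> major_compact 'mytable'
```

Note that even with VERSIONS => 1, an update still writes a new cell to a new HFile; the older copy only disappears from disk once compaction runs.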
Regards,
Shahab

On Fri, May 17, 2013 at 11:49 AM, Jinyuan Zhou <[email protected]> wrote:

> Actually, I want to update each row of a table each day. No new data is
> needed; only some values will be changed by recalculation. It looks like
> every time I do this, the data in the table is doubled, even though it is
> an update. I believe even an update results in new HFiles, and the
> cluster then gets very busy splitting regions and related work. It needs
> about an hour to update only about 250 million rows. I only need one
> version. So I think it might be faster to store the calculated results in
> HFiles, truncate the original table, and then bulk load the HFiles into
> the empty table.
> Thanks,
>
>
> On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[email protected]> wrote:
>
> > bq. What I want is to read from some hbase table and create hfiles
> > directly
> >
> > Can you describe your use case in more detail?
> >
> > Thanks
> >
> > On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[email protected]> wrote:
> >
> > > Hi,
> > > I wonder if there is a tool similar to
> > > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a
> > > tsv file and creates HFiles, which are then ready to be loaded into
> > > the corresponding regions by another tool,
> > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
> > > is to read from some HBase table and create HFiles directly. I think
> > > I know how to write such a class by following the steps in the
> > > ImportTsv class, but I wonder if someone has already done this.
> > > Thanks,
> > > Jack
> > >
> > > --
> > > -- Jinyuan (Jack) Zhou
> >
>
> --
> -- Jinyuan (Jack) Zhou
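For what it's worth, the table-to-HFiles job Jack describes can be sketched along the lines of ImportTsv's driver: scan the source table with a TableMapper, let HFileOutputFormat.configureIncrementalLoad() set up partitioning to match the target table's regions, and then bulk load the result. This is an untested sketch against the 0.94-era API, not a drop-in implementation; the class name, table names, and the output path are placeholders, and the mapper just copies cells (the recalculation logic would replace that loop):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SourceTableToHFiles {

  /** Emits one Put per scanned row; put your recalculation here. */
  static class ToHFilesMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // This sketch just copies each cell; replace with recalculated values.
      for (KeyValue kv : result.raw()) {
        put.add(kv);
      }
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-hfiles");
    job.setJarByClass(SourceTableToHFiles.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner batches for a full-table scan
    scan.setCacheBlocks(false);  // don't pollute the block cache

    TableMapReduceUtil.initTableMapperJob(
        "source_table", scan, ToHFilesMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    // Sets the reducer, total-order partitioner, and compression so the
    // HFiles line up with the target table's current region boundaries.
    HTable targetTable = new HTable(conf, "target_table");
    HFileOutputFormat.configureIncrementalLoad(job, targetTable);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

After the job finishes, the HFiles under /tmp/hfiles-out can be loaded with LoadIncrementalHFiles (the `completebulkload` tool), as with ImportTsv's output. Note that configureIncrementalLoad partitions by the *target* table's regions, so truncating and pre-splitting the target before running the job matters for parallelism.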
