Re: bulk load skipping tsv files

Jinyuan Zhou Fri, 17 May 2013 08:49:34 -0700

Actually, I want to update every row of a table once a day. No new data is
needed; only some values change through recalculation. It looks like
every time I do this, the data in the table doubles, even though it is only
an update. I believe even an update results in new HFiles, and the cluster
is then very busy splitting regions and doing related work. It takes about
an hour to update only about 250 million rows. I only need one version, so
I think it might be faster to store the recalculated results in HFiles,
truncate the original table, and then bulk load the HFiles into the empty
table.
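
The reasoning above can be illustrated with a toy model. This is a sketch
only, not HBase code: ToyTable and its methods are hypothetical names made
up for the illustration. It assumes that an update is written as a new cell
alongside the old one, and that old versions are only dropped by a major
compaction.

```python
# Toy model only, not HBase code. Each "store file" is an immutable list of
# (row, timestamp, value) cells, loosely mimicking how written data piles up.

class ToyTable:
    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.store_files = []  # every write batch or bulk load adds one file

    def write_batch(self, cells):
        """Append a new immutable file; existing files are never modified."""
        self.store_files.append(list(cells))

    def cells_on_disk(self):
        return sum(len(f) for f in self.store_files)

    def major_compact(self):
        """Merge all files, keeping only the newest max_versions per row."""
        by_row = {}
        for f in self.store_files:
            for row, ts, value in f:
                by_row.setdefault(row, []).append((ts, value))
        merged = []
        for row in sorted(by_row):
            newest = sorted(by_row[row], reverse=True)[: self.max_versions]
            merged.extend((row, ts, value) for ts, value in newest)
        self.store_files = [merged]

    def truncate(self):
        self.store_files = []


rows = [f"row{i:03d}" for i in range(5)]

# Path A: update every row in place.
in_place = ToyTable()
in_place.write_batch((r, 1, "old") for r in rows)
in_place.write_batch((r, 2, "new") for r in rows)
cells_after_update = in_place.cells_on_disk()  # old and new cells coexist
in_place.major_compact()
cells_after_compaction = in_place.cells_on_disk()

# Path B: truncate, then load only the recalculated values.
reloaded = ToyTable()
reloaded.write_batch((r, 1, "old") for r in rows)
reloaded.truncate()
reloaded.write_batch((r, 2, "new") for r in rows)
cells_after_reload = reloaded.cells_on_disk()

print(cells_after_update, cells_after_compaction, cells_after_reload)
```

In this model the in-place path holds twice as many cells until the
compaction runs, while the truncate-and-load path never holds more than one
copy. Whether the real cluster behaves this way depends on its version and
compaction settings.
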
Thanks,




On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[email protected]> wrote:

> bq. What I want is to read from some hbase table and create hfiles directly
>
> Can you describe your use case in more detail ?
>
> Thanks
>
> On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[email protected]> wrote:
>
> > Hi,
> > I wonder if there is a tool similar to
> > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a tsv
> > file and creates HFiles which are ready to be loaded into the
> > corresponding regions by another tool,
> > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
> > is to read from some hbase table and create hfiles directly. I think I
> > know how to write such a class by following the steps in the ImportTsv
> > class, but I wonder if someone has already done this.
> > Thanks,
> > Jack
> >
> > --
> > -- Jinyuan (Jack) Zhou
> >
>



-- 
-- Jinyuan (Jack) Zhou
