If I understood your use case correctly, and you don't need to maintain older versions of the data, why not set the 'max versions' parameter for your table to 1? I believe the growth in data, even on updates, is due to old versions being kept. Have you tried that?
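For reference, setting max versions to 1 is a shell-level change; a rough sketch, assuming a table named 'mytable' with a column family 'cf' (both placeholder names):

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', VERSIONS => 1}
hbase> enable 'mytable'
hbase> # older versions are physically dropped only after a major compaction:
hbase> major_compact 'mytable'
```

Note that even with VERSIONS => 1, an update still writes a new cell to a new HFile; the older copy only disappears from disk once compaction runs.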
Regards,
Shahab

On Fri, May 17, 2013 at 11:49 AM, Jinyuan Zhou <[email protected]> wrote:

> Actually, I want to update each row of a table each day. No new data is
> needed; only some values will be changed by recalculation. It looks like
> every time I do this, the data in the table is doubled, even though it is
> an update. I believe even an update results in new HFiles, and the
> cluster then gets very busy splitting regions and related work. It needs
> about an hour to update only about 250 million rows. I only need one
> version. So I think it might be faster to store the calculated results in
> HFiles, truncate the original table, and then bulk load the HFiles into
> the empty table.
> Thanks,
>
>
> On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[email protected]> wrote:
>
> > bq. What I want is to read from some hbase table and create hfiles
> > directly
> >
> > Can you describe your use case in more detail?
> >
> > Thanks
> >
> > On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[email protected]> wrote:
> >
> > > Hi,
> > > I wonder if there is a tool similar to
> > > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a
> > > tsv file and creates HFiles, which are then ready to be loaded into
> > > the corresponding regions by another tool,
> > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
> > > is to read from some HBase table and create HFiles directly. I think
> > > I know how to write such a class by following the steps in the
> > > ImportTsv class, but I wonder if someone has already done this.
> > > Thanks,
> > > Jack
> > >
> > > --
> > > -- Jinyuan (Jack) Zhou
> >
>
> --
> -- Jinyuan (Jack) Zhou
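For what it's worth, the table-to-HFiles job Jack describes can be sketched along the lines of ImportTsv's driver: scan the source table with a TableMapper, let HFileOutputFormat.configureIncrementalLoad() set up partitioning to match the target table's regions, and then bulk load the result. This is an untested sketch against the 0.94-era API, not a drop-in implementation; the class name, table names, and the output path are placeholders, and the mapper just copies cells (the recalculation logic would replace that loop):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SourceTableToHFiles {

  /** Emits one Put per scanned row; put your recalculation here. */
  static class ToHFilesMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // This sketch just copies each cell; replace with recalculated values.
      for (KeyValue kv : result.raw()) {
        put.add(kv);
      }
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-hfiles");
    job.setJarByClass(SourceTableToHFiles.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner batches for a full-table scan
    scan.setCacheBlocks(false);  // don't pollute the block cache

    TableMapReduceUtil.initTableMapperJob(
        "source_table", scan, ToHFilesMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    // Sets the reducer, total-order partitioner, and compression so the
    // HFiles line up with the target table's current region boundaries.
    HTable targetTable = new HTable(conf, "target_table");
    HFileOutputFormat.configureIncrementalLoad(job, targetTable);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

After the job finishes, the HFiles under /tmp/hfiles-out can be loaded with LoadIncrementalHFiles (the `completebulkload` tool), as with ImportTsv's output. Note that configureIncrementalLoad partitions by the *target* table's regions, so truncating and pre-splitting the target before running the job matters for parallelism.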
