I had thought about coprocessors, but my impression is that a coprocessor is the last option one should try, because it is so invasive to the JVM running HBase. I am not sure about the current status, though. In any case, what a coprocessor would give me here is less network load, while my problem is HBase's housekeeping workload caused by the increased data volume. For that part, a coprocessor may not help.
Thanks,

On Fri, May 17, 2013 at 10:05 AM, Ted Yu <[email protected]> wrote:

> Jinyuan:
>
> bq. no new data needed, only some value will be changed by recalculation.
>
> Have you considered using a coprocessor to fulfill the above task?
>
> Cheers
>
> On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus <[email protected]> wrote:
>
> > If I understood your use case correctly, then if you don't need to
> > maintain older versions of the data, why don't you set the 'max versions'
> > parameter for your table to 1? I believe the increase in data, even in
> > the case of updates, is due to that (?). Have you tried that?
> >
> > Regards,
> > Shahab
> >
> > On Fri, May 17, 2013 at 11:49 AM, Jinyuan Zhou <[email protected]> wrote:
> >
> > > Actually, I want to update each row of a table each day. No new data is
> > > needed; only some values will be changed by recalculation. It looks like
> > > every time I do this, the data in the table doubles, even though it is an
> > > update. I believe even an update results in new HFiles, and the cluster
> > > then gets very busy splitting regions and doing related work. It needs
> > > about an hour to update only about 250 million rows. I only need one
> > > version, so I think it might be faster to just store the calculated
> > > results in HFiles, then truncate the original table and bulk load the
> > > HFiles into the empty table.
> > > Thanks,
> > >
> > >
> > > On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > bq. What I want is to read from some hbase table and create hfiles
> > > > directly
> > > >
> > > > Can you describe your use case in more detail?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > I wonder if there is a tool similar to
> > > > > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a
> > > > > tsv file and creates HFiles that are ready to be loaded into the
> > > > > corresponding regions by another tool,
> > > > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
> > > > > is to read from some HBase table and create HFiles directly. I think
> > > > > I know how to write such a class by following the steps in the
> > > > > ImportTsv class, but I wonder if someone has already done this.
> > > > > Thanks,
> > > > > Jack
> > > > >
> > > > > --
> > > > > -- Jinyuan (Jack) Zhou
> > > >
> > >
> > > --
> > > -- Jinyuan (Jack) Zhou
> >
>

--
-- Jinyuan (Jack) Zhou
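
For reference, a rough sketch of the kind of table-to-HFiles job discussed above, following the same pattern ImportTsv uses (0.94-era APIs; the class name, column family, qualifier, and the recalc step are placeholder assumptions, not anything specified in the thread):

// Sketch of a table-to-HFiles job, loosely following the ImportTsv pattern.
// Class, table, family and qualifier names below are placeholders.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableToHFiles {

  // Reads each row from the source table, recalculates a value and emits
  // KeyValues that HFileOutputFormat will sort and write out as HFiles.
  static class RecalcMapper extends TableMapper<ImmutableBytesWritable, KeyValue> {
    private static final byte[] CF = Bytes.toBytes("cf");      // placeholder family
    private static final byte[] QUAL = Bytes.toBytes("value"); // placeholder qualifier

    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] old = result.getValue(CF, QUAL);
      byte[] recalculated = recalc(old);                       // real recalculation goes here
      KeyValue kv = new KeyValue(result.getRow(), CF, QUAL, recalculated);
      context.write(row, kv);
    }

    private byte[] recalc(byte[] old) {
      return old; // placeholder for the daily recalculation
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-hfiles");
    job.setJarByClass(TableToHFiles.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false); // avoid polluting the block cache from an MR scan

    // args[0] = source/target table name, args[1] = output dir for the HFiles
    TableMapReduceUtil.initTableMapperJob(args[0], scan, RecalcMapper.class,
        ImmutableBytesWritable.class, KeyValue.class, job);

    HTable table = new HTable(conf, args[0]);
    // Sets up TotalOrderPartitioner, reducer and output format so the HFiles
    // line up with the table's current region boundaries.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

After the job completes, the output directory would be handed to org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles (the completebulkload tool), as in the workflow described above. One caveat: configureIncrementalLoad partitions output against the region boundaries as they exist when the job is configured, and truncating from the shell recreates the table with a single region, so the bulk loader may have to split the HFiles to fit the new boundaries; recreating the table pre-split to match the data avoids that extra work.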

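On the max-versions suggestion, a minimal sketch of capping an existing column family at one version (0.94-era admin API; the table name "my_table" and family "cf" are placeholders). Note that cells beyond the version limit are only physically dropped at compaction time, so on-disk size can still look doubled between the daily rewrite and the next compaction.

// Minimal sketch: cap an existing column family at one version.
// "my_table" and "cf" are placeholder names, not from this thread.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SetMaxVersions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("my_table");

    // Reuse the existing descriptor so other family settings are preserved.
    HTableDescriptor desc = admin.getTableDescriptor(table);
    HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
    cf.setMaxVersions(1); // keep only the latest cell per column

    admin.disableTable(table);     // schema changes need the table offline
    admin.modifyColumn(table, cf);
    admin.enableTable(table);
    admin.close();
  }
}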