Re: hbase schema design and retrieving values through REST interface

Andrew Purtell Wed, 16 Mar 2011 15:13:06 -0700

>  This facility is not exposed in the REST API at the moment
> (not that I know of -- please someone correct me if I'm
> wrong).


Wrong. :-)

See ScannerModel in the rest package: 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/model/ScannerModel.html

ScannerModel#setBatch
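
For anyone following along, here is a rough sketch of how a batched scan might be driven over REST from a script. It assumes the gateway accepts a scanner description as XML (a Scanner element with a batch attribute and base64-encoded startRow, endRow and column values, mirroring ScannerModel), answers the PUT with a Location header for the new scanner, and returns 204 once the scanner is exhausted. The helper names and the "urls" family name are made up for illustration; check the details against your HBase version before relying on this.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def _b64(text):
    # Row keys and column names are assumed to travel base64-encoded in the XML.
    return base64.b64encode(text.encode()).decode()


def scanner_spec(batch, start_row=None, end_row=None, columns=()):
    """Build the XML scanner description; `batch` caps the cells per response."""
    attrs = {"batch": str(batch)}
    if start_row is not None:
        attrs["startRow"] = _b64(start_row)
    if end_row is not None:
        attrs["endRow"] = _b64(end_row)
    root = ET.Element("Scanner", attrs)
    for col in columns:
        ET.SubElement(root, "column").text = _b64(col)
    return ET.tostring(root, encoding="unicode")


def open_scanner(base_url, table, spec_xml):
    """PUT the description to <table>/scanner and return the new scanner's URL."""
    req = urllib.request.Request(
        f"{base_url}/{table}/scanner",
        data=spec_xml.encode(),
        method="PUT",
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Location"]


def next_batch(scanner_url):
    """GET the next chunk of cells as XML, or None once the scanner is exhausted."""
    req = urllib.request.Request(scanner_url, headers={"Accept": "text/xml"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode() if resp.status == 200 else None


def close_scanner(scanner_url):
    """DELETE the scanner so the server can release its resources."""
    req = urllib.request.Request(scanner_url, method="DELETE")
    urllib.request.urlopen(req).close()
```

With the schema from this thread, the first 500 URLs for a keyword would come from something like scanner_spec(500, start_row="united states", columns=["urls"]) followed by a single next_batch() call. The same three HTTP requests can be issued from PHP with cURL.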

   - Andy



--- On Wed, 3/16/11, Stack <[email protected]> wrote:

> From: Stack <[email protected]>
> Subject: Re: hbase schema design and retrieving values through REST interface
> To: [email protected]
> Date: Wednesday, March 16, 2011, 10:47 AM
> You can limit the return when scanning from the java api; see
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int)
> This facility is not exposed in the REST API at the moment (not that
> I know of -- please someone correct me if I'm wrong).   So, yes, wide
> rows, if thousands of elements of some size, since they need to be
> composed all in RAM, could bring on an OOME if the composed size >
> available heap.
> 
> St.Ack
> 
> 
> On Wed, Mar 16, 2011 at 2:41 AM, sreejith P. K. <[email protected]> wrote:
> > With this schema, if I can limit the column family to a particular range,
> > I can manage everything else (like selecting the first n columns of a
> > column family).
> >
> > Sreejith
> >
> >
> > On Wed, Mar 16, 2011 at 12:33 PM, sreejith P. K. <[email protected]> wrote:
> >
> >> @ Jean-Daniel,
> >>
> >> As I said, each row key contains thousands of column family values (maybe
> >> I am wrong with the schema design). I started REST and tried to cURL
> >> http://localhost/tablename/rowname. It seems it works only with a limited
> >> amount of data (maybe I can limit the cURL output). How can I limit the
> >> column values for a particular row?
> >> Suppose I have two thousand URLs under a keyword and I need to fetch the
> >> URLs but limit the result to five hundred. How is that possible?
> >>
> >> @ tsuna,
> >>
> >> It seems http://www.elasticsearch.org/ is using CouchDB, right?
> >>
> >>
> >> On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >>
> >>> Can you tell why it's not able to get the bigger rows? Why would you
> >>> try another schema if you don't even know what's going on right now?
> >>> If you have the same issue with the new schema, you're back to square
> >>> one right?
> >>>
> >>> Looking at the logs should give you some hints.
> >>>
> >>> J-D
> >>>
> >>> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <[email protected]>
> >>> wrote:
> >>> > Hello experts,
> >>> >
> >>> > I have a scenario as follows.
> >>> > I need to maintain a huge table for a 'web crawler' project in HBase.
> >>> > Basically it contains thousands of keywords, and for each keyword I
> >>> > need to maintain a list of URLs (again numbering in the thousands).
> >>> > Corresponding to each URL, I need to store a number, which represents
> >>> > the priority value the keyword holds.
> >>> > To explain a bit: suppose I have a keyword 'united states'; I need
> >>> > to store about ten thousand URLs corresponding to that keyword. Each
> >>> > keyword will hold a priority value, which is an integer. Again, I have
> >>> > thousands of keywords like that. The unusual thing about this is that
> >>> > I need to do the project in PHP.
> >>> >
> >>> > I have configured a Hadoop-HBase cluster consisting of three machines.
> >>> > My plan was to design the schema by taking the keyword as the 'row
> >>> > key'. The URLs I will keep as a column family. The schema looked fine
> >>> > at first. I have done a lot of research on how to retrieve the URL
> >>> > list if I know the keyword. Anyway, I managed a way out by
> >>> > preg-matching the XML output from the URL
> >>> > http://localhost:8080/tablename/rowkey (I used the REST interface). It
> >>> > works fine if the URL list has a limited number of URLs. When it runs
> >>> > into the thousands, it seems I cannot fetch the XML data itself!
> >>> > Now I am in a do-or-die situation. Please correct me if my schema
> >>> > design needs any changes (I do believe it should change!) and please
> >>> > help me retrieve the column family values (URLs) corresponding to each
> >>> > row key in an efficient way. Please guide me on how I can do the same
> >>> > using the PHP-REST interface.
> >>> > Thanks in advance.
> >>> > Thanks in advance.
> >>> >
> >>> > Sreejith
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> Sreejith PK
> >> Nesote Technologies (P) Ltd
> >>
> >>
> >>
> >
> >
> > --
> > Sreejith PK
> > Nesote Technologies (P) Ltd
> >
>
