Re: HBase schema design and retrieving values through REST interface

sreejith P. K. Wed, 16 Mar 2011 02:42:25 -0700

With this schema, if I can limit the columns of a column family to a
particular range (for example, select the first n columns of a column
family), I can manage everything else.
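A minimal Python sketch of the "first n columns" idea follows. It assumes the REST server returns the XML CellSet format, in which row keys, column names (`family:qualifier`) and cell values are base64-encoded. It parses the response with an XML parser instead of preg-matching, and keeps only cells `offset` to `offset + n` of one family. The helper names, the family name `urls` and the sample data are hypothetical. This trims the result on the client after the whole row has been downloaded, so it does not by itself solve the problem of very large rows. In the Java client API, `ColumnPaginationFilter(limit, offset)` does this kind of paging on the server side. Whether your REST server version exposes an equivalent is worth checking.

```python
import base64
import xml.etree.ElementTree as ET
from itertools import islice


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string, as the REST XML format does for keys and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def first_n_cells(cellset_xml: str, n: int, family: str, offset: int = 0):
    """Return up to n (qualifier, value) pairs from one column family, skipping `offset` cells."""
    root = ET.fromstring(cellset_xml)

    def cells():
        for cell in root.iter("Cell"):
            # The column attribute holds base64("family:qualifier").
            column = base64.b64decode(cell.get("column")).decode("utf-8")
            fam, _, qualifier = column.partition(":")
            if fam == family:
                value = base64.b64decode(cell.text or "").decode("utf-8")
                yield qualifier, value

    # islice stops pulling cells once offset + n matching cells have been seen.
    return list(islice(cells(), offset, offset + n))


# Hypothetical sample response: one row ("united states") with five URL columns
# in a family named "urls", each storing a priority as its value.
urls = [f"http://example.com/page{i}" for i in range(5)]
cells_xml = "".join(
    f'<Cell column="{b64("urls:" + u)}" timestamp="1">{b64(str(i))}</Cell>'
    for i, u in enumerate(urls)
)
sample = f'<CellSet><Row key="{b64("united states")}">{cells_xml}</Row></CellSet>'

print(first_n_cells(sample, 3, "urls"))
```

`partition(":")` splits only on the first colon, so URL qualifiers that themselves contain colons survive intact.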


Sreejith


On Wed, Mar 16, 2011 at 12:33 PM, sreejith P. K. <[email protected]> wrote:

> @ Jean-Daniel,
>
> As I said, each row key contains thousands of values in the column family
> (maybe I am wrong with the schema design). I started REST and tried to cURL
> http://localhost/tablename/rowname. It seems to work only with a limited
> amount of data (maybe I can limit the cURL output), but how can I limit the
> column values for a particular row?
> Suppose I have two thousand URLs under a keyword and I need to fetch the
> URLs, limiting the result to five hundred. How is that possible?
>
> @ tsuna,
>
> It seems http://www.elasticsearch.org/ is using CouchDB, right?
>
>
> On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans 
> <[email protected]> wrote:
>
>> Can you tell why it's not able to get the bigger rows? Why would you
>> try another schema if you don't even know what's going on right now?
>> If you have the same issue with the new schema, you're back to square
>> one right?
>>
>> Looking at the logs should give you some hints.
>>
>> J-D
>>
>> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <[email protected]>
>> wrote:
>> > Hello experts,
>> >
>> > I have the following scenario.
>> > I need to maintain a huge table in HBase for a 'web crawler' project.
>> > Basically it contains thousands of keywords, and for each keyword I need
>> > to maintain a list of URLs (which again will number in the thousands).
>> > Corresponding to each URL, I need to store a number, which represents the
>> > priority value the keyword holds.
>> > Let me explain a bit. Suppose I have the keyword 'united states'; I need
>> > to store about ten thousand URLs corresponding to that keyword. Each
>> > keyword will hold a priority value, which is an integer. Again, I have
>> > thousands of keywords like that. The unusual thing about this is that I
>> > need to do the project in PHP.
>> >
>> > I have configured a Hadoop/HBase cluster consisting of three machines. My
>> > plan was to design the schema by taking the keyword as the row key. I will
>> > keep the URLs in a column family. The schema looked fine at first. I have
>> > done a lot of research on how to retrieve the URL list if I know the
>> > keyword. Anyway, I managed a workaround by preg-matching the XML output
>> > from the URL http://localhost:8080/tablename/rowkey (the REST interface I
>> > used). It works fine if the URL list has a limited number of URLs. When it
>> > comes to thousands, it seems I cannot fetch the XML data at all!
>> > Now I am in a do-or-die situation. Please correct me if my schema design
>> > needs any changes (I do believe it should change!), and please help me
>> > retrieve the column family values (URLs) corresponding to each row key in
>> > an efficient way. Please guide me on how I can do the same using the
>> > PHP-REST interface.
>> > Thanks in advance.
>> >
>> > Sreejith
>> >
>>
>
>
>
> --
> Sreejith PK
> Nesote Technologies (P) Ltd
>
>
>

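One alternative to the wide-row design, sketched here as an assumption rather than as the thread's answer, is a "tall" table. It has one row per (keyword, URL) pair, with the row key built as keyword + separator + URL and the priority stored as the cell value. HBase keeps rows sorted by key bytes, so all URLs for a keyword are contiguous. "First 500 URLs for a keyword" then becomes a range scan with a row limit. The sketch simulates that with a sorted in-memory list. The NUL separator, `row_key` and `TallTable` are hypothetical names for illustration, not HBase APIs. Against a real cluster, the same start and stop rows would be handed to a scanner. The REST interface has a scanner resource that takes start and end rows; check your version's documentation for the exact request format.

```python
from bisect import bisect_left, insort

SEP = b"\x00"  # hypothetical separator; assumes keywords never contain a NUL byte


def row_key(keyword: str, url: str) -> bytes:
    """Tall-table row key: keyword bytes, separator, URL bytes."""
    return keyword.encode("utf-8") + SEP + url.encode("utf-8")


class TallTable:
    """In-memory stand-in for an HBase table whose rows are sorted by key bytes."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> priority

    def put(self, keyword: str, url: str, priority: int) -> None:
        key = row_key(keyword, url)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = priority

    def scan_keyword(self, keyword: str, limit: int):
        """Return up to `limit` (url, priority) pairs for one keyword, in key order."""
        prefix = keyword.encode("utf-8") + SEP
        # Every key starting with keyword + b"\x00" sorts before keyword + b"\x01".
        stop = keyword.encode("utf-8") + b"\x01"
        results = []
        i = bisect_left(self._keys, prefix)  # first key >= start row
        while i < len(self._keys) and self._keys[i] < stop and len(results) < limit:
            key = self._keys[i]
            results.append((key[len(prefix):].decode("utf-8"), self._values[key]))
            i += 1
        return results


# Two thousand URLs under one keyword, limited to five hundred per request.
table = TallTable()
for i in range(2000):
    table.put("united states", f"http://example.com/{i:04d}", i % 10)
table.put("united kingdom", "http://example.co.uk/", 7)

first_page = table.scan_keyword("united states", 500)
print(len(first_page), first_page[0])
```

Rows come back in URL byte order, not by priority. Returning the highest-priority URLs first would require encoding the priority into the row key as well.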

-- 
Sreejith PK
Nesote Technologies (P) Ltd
