@ Jean-Daniel,

As I said, each row key contains thousands of column family values (maybe my
schema design is wrong). I started the REST server and tried to cURL
http://localhost/tablename/rowname. It seems to work only with a limited
amount of data (maybe I can limit the cURL output). How can I limit the
number of column values returned for a particular row?
Suppose I have two thousand URLs under a keyword and I need to fetch the
URLs but limit the result to five hundred. How is that possible?
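
To make the question concrete, here is a rough sketch of the only workaround I
can think of so far: parse the row XML on the client and keep just the first
five hundred cells. It is in Python for brevity (my project is in PHP, so this
is only an illustration). I am assuming the REST response is a CellSet
document with Row and Cell elements whose column names and values are
base64-encoded; the 'urls:' column family, the example.com URLs and the helper
names are made up for the example.

```python
import base64
import xml.etree.ElementTree as ET
from itertools import islice


def first_cells(row_xml, limit):
    """Return up to `limit` (column, value) pairs from a REST row response.

    Assumes <CellSet><Row><Cell column="...">value</Cell>...</Row></CellSet>
    with base64-encoded column names and values.
    """
    root = ET.fromstring(row_xml)
    cells = islice(root.iter("Cell"), limit)  # stop after `limit` cells
    return [
        (
            base64.b64decode(cell.get("column")).decode("utf-8"),
            base64.b64decode(cell.text or "").decode("utf-8"),
        )
        for cell in cells
    ]


def _b64(text):
    """Helper used only to build the hypothetical sample response below."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical sample: one keyword row holding 2000 URL columns,
# each with an integer priority as its value.
sample = "<CellSet><Row key='%s'>%s</Row></CellSet>" % (
    _b64("united states"),
    "".join(
        "<Cell column='%s'>%s</Cell>"
        % (_b64("urls:http://example.com/%d" % i), _b64(str(i)))
        for i in range(2000)
    ),
)

top = first_cells(sample, 500)
print(len(top), top[0])
```

This still downloads the whole row before trimming it, so it does not help
when the fetch itself fails for big rows. That is why I am asking whether the
limit can be applied on the server side.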

@ tsuna,

It seems http://www.elasticsearch.org/ uses CouchDB, right?

On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Can you tell why it's not able to get the bigger rows? Why would you
> try another schema if you don't even know what's going on right now?
> If you have the same issue with the new schema, you're back to square
> one right?
>
> Looking at the logs should give you some hints.
>
> J-D
>
> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <[email protected]>
> wrote:
> > Hello experts,
> >
> > I have a scenario as follows.
> > I need to maintain a huge table for a 'web crawler' project in HBase.
> > Basically it contains thousands of keywords, and for each keyword I need
> > to maintain a list of URLs (which will again number in the thousands).
> > For each URL, I need to store a number that represents the priority value
> > the keyword holds.
> > To explain a bit: suppose I have a keyword 'united states'; I need to
> > store about ten thousand URLs corresponding to that keyword. Each keyword
> > will hold a priority value, which is an integer. Again, I have thousands
> > of keywords like that. The unusual thing about this is that I need to do
> > the project in PHP.
> >
> > I have configured a Hadoop-HBase cluster consisting of three machines. My
> > plan was to design the schema with the keyword as the 'row key'. The URLs
> > I will keep as a column family. The schema looked fine at first. I have
> > done a lot of research on how to retrieve the URL list if I know the
> > keyword. Anyway, I managed a workaround by preg-matching the XML output
> > from the URL http://localhost:8080/tablename/rowkey (I used the REST
> > interface). It also works fine if the URL list has a limited number of
> > URLs. When it runs into the thousands, it seems I cannot fetch the XML
> > data itself!
> > Now I am in a do-or-die situation. Please correct me if my schema design
> > needs any changes (I do believe it should change!) and please help me
> > retrieve the column family values (URLs) corresponding to each row key in
> > an efficient way. Please guide me on how I can do the same using the
> > PHP-REST interface.
> > Thanks in advance.
> >
> > Sreejith
> >
>



-- 
Sreejith PK
Nesote Technologies (P) Ltd
