With this schema, if I can limit the columns returned from a column family to a particular range (for example, select the first n columns of a column family), I can manage everything else.
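
To make "select the first n columns" concrete, here is a minimal client-side sketch in Python (the same idea carries over to PHP's XML parser). It assumes the REST gateway returns a CellSet XML document with Row and Cell elements whose key, column attribute, and cell value are base64-encoded, and that element names carry no XML namespace. The helper names, the "urls" family name, and the sample data are hypothetical. Note that this only trims the result after the whole row has been fetched, so it does not reduce what the server sends.

```python
import base64
import xml.etree.ElementTree as ET
from itertools import islice


def _b64(text):
    """Base64-encode a string, as the REST gateway is assumed to do."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_sample_cellset(keyword, url_priorities):
    """Build a small CellSet XML document for illustration only (hypothetical data)."""
    cells = "".join(
        '<Cell column="%s" timestamp="1">%s</Cell>'
        % (_b64("urls:" + url), _b64(str(priority)))
        for url, priority in url_priorities
    )
    return '<CellSet><Row key="%s">%s</Row></CellSet>' % (_b64(keyword), cells)


def first_n_columns(cellset_xml, n):
    """Return up to n (column, value) pairs from the first row of a CellSet document."""
    root = ET.fromstring(cellset_xml)
    row = root.find("Row")
    if row is None:
        return []
    pairs = []
    # islice stops after n cells instead of decoding every column in the row.
    for cell in islice(row.iter("Cell"), n):
        column = base64.b64decode(cell.get("column")).decode("utf-8")
        value = base64.b64decode(cell.text or "").decode("utf-8")
        pairs.append((column, value))
    return pairs


if __name__ == "__main__":
    sample = build_sample_cellset(
        "united states",
        [("http://a.example/", 3), ("http://b.example/", 7), ("http://c.example/", 1)],
    )
    for column, value in first_n_columns(sample, 2):
        print(column, value)
```

Using a real XML parser instead of preg-matching the output avoids breaking on attribute order or whitespace. If the row is too large to fetch at all, the limit would have to be applied on the server side instead.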
Sreejith

On Wed, Mar 16, 2011 at 12:33 PM, sreejith P. K. <[email protected]> wrote:

> @ Jean-Daniel,
>
> As i told, each row key contains thousands of column family values (may be
> i am wrong with the schema design). I started REST and tried to cURL
> http:/localhost/tablename/rowname. It seems it will work only with limited
> amount of data (may be i can limit the cURL output), and how i can limit the
> column values for a particular row?
> Suppose i have two thousand urls under a keyword and i need to fetch the
> urls and should limit the result to five hundred. How it is possible??
>
> @ tsuna,
>
> It seems http://www.elasticsearch.org/ using CouchDB right?
>
>
> On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Can you tell why it's not able to get the bigger rows? Why would you
>> try another schema if you don't even know what's going on right now?
>> If you have the same issue with the new schema, you're back to square
>> one right?
>>
>> Looking at the logs should give you some hints.
>>
>> J-D
>>
>> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <[email protected]>
>> wrote:
>> > Hello experts,
>> >
>> > I have a scenario as follows,
>> > I need to maintain a huge table for a 'web crawler' project in HBASE.
>> > Basically it contains thousands of keywords and for each keyword i
>> > need to maintain a list of urls (it again will count in thousands).
>> > Corresponding to each url, i need to store a number, which will in
>> > turn resemble the priority value the keyword holds.
>> > Let me explain you a bit, Suppose i have a keyword 'united states', i
>> > need to store about ten thousand urls corresponding to that keyword.
>> > Each keyword will be holding a priority value which is an integer.
>> > Again i have thousands of keywords like that. The rare thing about
>> > this is i need to do the project in PHP.
>> >
>> > I have configured a hadoop-hbase cluster consists of three machines.
>> > My plan was to design the schema by taking the keyword as 'row key'.
>> > The urls i will keep as column family. The schema looked fine at
>> > first. I have done a lot of research on how to retrieve the url list
>> > if i know the keyword. Any ways i managed a way out by preg-matching
>> > the xml data out put using the url
>> > http://localhost:8080/tablename/rowkey (REST interface i used). It
>> > also works fine if the url list has a limited number of urls. When it
>> > comes in thousands, it seems i cannot fetch the xml data itself!
>> > Now I am in a do or die situation. Please correct me if my schema
>> > design needs any changes (I do believe it should change!) and please
>> > help me up to retrieve the column family values (urls)
>> > corresponding to each row-key in an efficient way. Please guide me
>> > how i can do the same using PHP-REST interface.
>> > Thanks in advance.
>> >
>> > Sreejith
>> >
>>
>
>
> --
> Sreejith PK
> Nesote Technologies (P) Ltd
>

--
Sreejith PK
Nesote Technologies (P) Ltd
