@Jean-Daniel, as I said, each row key holds thousands of column values in one column family (maybe my schema design is wrong). I started the REST server and tried to cURL http://localhost/tablename/rowname. It seems to work only with a limited amount of data (maybe I can limit the cURL output). How can I limit the number of column values returned for a particular row? Suppose I have two thousand URLs under a keyword and need to fetch only five hundred of them. How is that possible?
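
To make the question concrete, here is a minimal sketch of the client-side limiting I mean, written in Python rather than PHP only to keep it short. It assumes the REST response for a row is XML shaped like CellSet/Row/Cell, with column names and values base64-encoded, and that values decode as UTF-8 text. The function name, the `urls:` family name, and the sample data are hypothetical. It parses the XML with a real parser instead of preg-matching and stops after the first N columns. Note that this only trims the result after the whole row has been transferred, so it does not solve the problem of the big rows failing to download.

```python
import base64
import xml.etree.ElementTree as ET


def first_columns(row_xml, limit=500):
    """Return up to `limit` (column, value) pairs from one row's XML.

    Assumes <Cell column="base64">base64 value</Cell> elements.
    """
    root = ET.fromstring(row_xml)
    pairs = []
    for cell in root.iter("Cell"):
        if len(pairs) >= limit:
            break  # stop once enough columns are collected
        column = base64.b64decode(cell.get("column")).decode("utf-8")
        value = base64.b64decode(cell.text or "").decode("utf-8")
        pairs.append((column, value))
    return pairs


def _b64(text):
    """Base64-encode a string the way the assumed response encodes it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical sample shaped like the assumed response; not real crawler data.
SAMPLE = (
    '<CellSet><Row key="%s">' % _b64("united states")
    + "".join(
        '<Cell column="%s" timestamp="1">%s</Cell>'
        % (_b64("urls:http://example.com/%d" % i), _b64(str(i)))
        for i in range(3)
    )
    + "</Row></CellSet>"
)

if __name__ == "__main__":
    # Keep only the first two of the three sample columns.
    print(first_columns(SAMPLE, limit=2))
```
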

@tsuna, it seems http://www.elasticsearch.org/ is using CouchDB, right?

On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Can you tell why it's not able to get the bigger rows? Why would you
> try another schema if you don't even know what's going on right now?
> If you have the same issue with the new schema, you're back to square
> one right?
>
> Looking at the logs should give you some hints.
>
> J-D
>
> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <[email protected]> wrote:
> > Hello experts,
> >
> > I have a scenario as follows.
> > I need to maintain a huge table for a 'web crawler' project in HBase.
> > Basically it contains thousands of keywords, and for each keyword I
> > need to maintain a list of URLs (which will again number in the
> > thousands). Corresponding to each URL, I need to store a number, which
> > will in turn represent the priority value the keyword holds.
> > Let me explain a bit. Suppose I have a keyword 'united states'; I need
> > to store about ten thousand URLs corresponding to that keyword. Each
> > keyword will hold a priority value, which is an integer. Again, I have
> > thousands of keywords like that. The unusual thing is that I need to
> > do the project in PHP.
> >
> > I have configured a hadoop-hbase cluster consisting of three machines.
> > My plan was to design the schema by taking the keyword as the 'row
> > key'. The URLs I will keep as a column family. The schema looked fine
> > at first. I have done a lot of research on how to retrieve the URL
> > list if I know the keyword. Anyway, I managed a way out by
> > preg-matching the XML output from the URL
> > http://localhost:8080/tablename/rowkey (I used the REST interface). It
> > also works fine if the URL list has a limited number of URLs. When it
> > runs into the thousands, it seems I cannot fetch the XML data itself!
> > Now I am in a do or die situation. Please correct me if my schema
> > design needs any changes (I do believe it should change!) and please
> > help me retrieve the column family values (URLs) corresponding to each
> > row key in an efficient way. Please guide me on how I can do the same
> > using the PHP-REST interface.
> > Thanks in advance.
> >
> > Sreejith

--
Sreejith PK
Nesote Technologies (P) Ltd
