sreejith: I'll leave your second question to other experts. Let me try to answer the schema question.
You didn't mention how the URLs and keywords scale (there are 1 trillion URLs in the world), so I'll base my suggestion on what you outlined.

First, use a hash/index to represent each URL. You can then use a unique separator to concatenate each (keyword, URL index) pair into a row key. E.g., assuming "yahoo.com" carries index 1, "google.com" carries index 2, etc., for yahoo.com you would have row "a-1" with a value of 2, row "b-1" with a value of 4, and row "c-1" with a value of 1. The same goes for the other (keyword, URL index) pairs.

After receiving a query, you can use PrefixFilter to retrieve the relevant (keyword, URL index) pairs. Then you can easily establish the (keyword, priority) mapping for each URL.

Hope this helps.

On Fri, Mar 18, 2011 at 10:15 PM, sreejith P. K. <[email protected]> wrote:

> Thank you all for the response,
>
> I have been stuck on the same issue for quite a long time.
>
> I need to maintain a huge amount of data for a web crawler. I have a list of
> thousands of URLs. For each URL, I again have thousands of keywords. Each
> keyword holds a priority value.
>
> Suppose I have the data:
>
> For url "yahoo.com",
> I have keywords a(priority=2), b(priority=4), c(priority=1)
>
> For url "google.com",
> I have keywords a(priority=1), b(priority=3), c(priority=2)
>
> For url "facebook.com",
> I have keywords g(priority=1), b(priority=4), h(priority=2)
>
> And if I type "a b" in my web crawler, I need to fetch the URLs in this order:
>
> yahoo.com (where priority is 2+4=6, that is, priority of a + priority of b)
> google.com (where priority is 1+3=4, that is, priority of a + priority of b)
> facebook.com (where priority is 4, that is, priority of b)
>
> I have to do the same in PHP. Please help me create a good schema for this
> and suggest which connector I should use (Thrift or REST).
>
> If I use REST: I have googled a lot to find a solution. Stargate seems to
> work to some extent, but I need a clear-cut method.
>
> Thanks.
>
> On Sat, Mar 19, 2011 at 2:50 AM, Andrew Purtell <[email protected]> wrote:
>
> > Whether to use REST or Thrift or Avro connectors is a matter of
> > architecture, depends what you are trying to do.
> >
> > In all cases, we are here to help you if the system does not appear to
> > function normally. We rely on volunteer effort for this. It is unlikely
> > someone will volunteer time to help you code your application, however.
> > After all, it is your application.
> >
> > - Andy
> >
> > > From: Stack <[email protected]>
> > > Subject: Re: Stargate and Hbase
> > > To: [email protected]
> > > Date: Friday, March 18, 2011, 2:14 PM
> > >
> > > What Ted says and we use thrift for going from php to hbase.
> > > St.Ack
> > >
> > > On Fri, Mar 18, 2011 at 5:11 AM, sreejith P. K. <[email protected]> wrote:
> > > > Can anybody help me on coding PHP-hbase using Stargate interface?
> > > > I have posted queries on the same in many forums, Unfortunately no
> > > > replies yet. Nobody seems interested in PHP!
> > > >
> > > > --
>
> --
> Sreejith PK
> Nesote Technologies (P) Ltd
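
P.S. To make the row-key suggestion at the top of this reply more concrete, here is a minimal sketch in plain Python. It makes no HBase calls: an ordinary dict stands in for the table, and prefix_scan() is only an in-memory stand-in for a scan with PrefixFilter. The index assigned to facebook.com (3), the "-" separator, and all function names are my own assumptions for illustration, so treat this as a sketch of the idea rather than a finished design.

```python
from collections import defaultdict

# Assumed URL index table. The reply only fixes yahoo.com -> 1 and
# google.com -> 2; the value for facebook.com is an assumption.
URL_INDEX = {"yahoo.com": 1, "google.com": 2, "facebook.com": 3}
INDEX_URL = {idx: url for url, idx in URL_INDEX.items()}

SEP = "-"  # assumed separator; it must never occur inside a keyword

# Sample data from the original question: url -> {keyword: priority}
DATA = {
    "yahoo.com": {"a": 2, "b": 4, "c": 1},
    "google.com": {"a": 1, "b": 3, "c": 2},
    "facebook.com": {"g": 1, "b": 4, "h": 2},
}


def build_rows(data):
    """Build row key -> priority, with row key = keyword + SEP + URL index."""
    rows = {}
    for url, keywords in data.items():
        for keyword, priority in keywords.items():
            rows[f"{keyword}{SEP}{URL_INDEX[url]}"] = priority
    return rows


def prefix_scan(rows, prefix):
    """In-memory stand-in for a prefix scan: rows whose key starts with prefix."""
    return [(key, value) for key, value in sorted(rows.items())
            if key.startswith(prefix)]


def search(rows, query):
    """Sum the priorities of the queried keywords per URL, highest total first."""
    totals = defaultdict(int)
    for keyword in query.split():
        # Including SEP in the prefix keeps keyword "a" from matching "ab-1".
        for key, priority in prefix_scan(rows, keyword + SEP):
            url_index = int(key.rsplit(SEP, 1)[1])
            totals[INDEX_URL[url_index]] += priority
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for url, total in search(build_rows(DATA), "a b"):
        print(url, total)
```

For the query "a b" this gives yahoo.com a total of 6, and google.com and facebook.com a total of 4 each, matching the numbers in the question. Because the keyword comes first in the row key, each keyword in a query costs one prefix scan, and the per-URL summing and sorting happen on the client side.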
