Hi

On 15 July 2014 11:31, yeshwanth kumar <[email protected]> wrote:

> Hi,
>
> I am using HBase 0.94.10 on top of Hadoop 2.2.
>
> Now I need to crawl websites and store the results in HBase.
> I saw that Nutch doesn't have integration with Gora 0.4 and higher versions
> of HBase.
>

Use the 2.x branch instead (https://github.com/apache/nutch/tree/2.x)

> I went through the Nutch Java API documentation for the possibility of crawling
> through custom code,
> where I found that Nutch is totally dependent on Gora.
> I don't see any other possible ways here.
>
> Can someone suggest a way to store the crawled data using Nutch into
> HBase?
>

Are there any specific reasons why you are using Nutch 2 instead of Nutch 1?
If not, then you could simply write a custom IndexWriter to index the
documents into HBase and use GORA (or not) to define how to deserialize the
fields.

HTH

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
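A minimal sketch of the custom-writer idea, in Java since both Nutch and HBase are Java projects. It deliberately avoids the real Nutch and HBase APIs so it runs on its own: `SimpleDoc` (a stand-in for `NutchDocument`), `InMemoryTable` (a stand-in for an HBase table) and `HBaseIndexWriterSketch` are hypothetical names, and the column family `f` and the use of the `id` field as the row key are assumptions. The writer only mirrors the open/write/delete/commit/close lifecycle of the Nutch 1.x `IndexWriter` extension point; check the exact method signatures in the Nutch version you build against.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for NutchDocument: field name -> one or more values. */
class SimpleDoc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    SimpleDoc add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }
}

/** Hypothetical in-memory stand-in for an HBase table: row key -> "family:qualifier" -> bytes. */
class InMemoryTable {
    final Map<String, Map<String, byte[]>> rows = new TreeMap<>();

    /** Like an HBase Put: new cells are added, existing cells with the same column are overwritten. */
    void put(String row, Map<String, byte[]> cells) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).putAll(cells);
    }

    void delete(String row) {
        rows.remove(row);
    }
}

/** Sketch of a writer that buffers documents and flushes them to the table on commit(). */
class HBaseIndexWriterSketch {
    private static final String FAMILY = "f"; // hypothetical column family
    private final InMemoryTable table;
    private final Map<String, Map<String, byte[]>> pendingPuts = new LinkedHashMap<>();
    private final List<String> pendingDeletes = new ArrayList<>();

    HBaseIndexWriterSketch(InMemoryTable table) {
        this.table = table;
    }

    void open() {
        pendingPuts.clear();
        pendingDeletes.clear();
    }

    /** Maps every document field to column f:<field>; multiple values are joined with '\n'. */
    void write(SimpleDoc doc) {
        List<String> ids = doc.fields.get("id"); // assumption: the URL lives in the "id" field
        if (ids == null || ids.isEmpty()) {
            throw new IllegalArgumentException("document has no 'id' field");
        }
        Map<String, byte[]> cells = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : doc.fields.entrySet()) {
            String joined = String.join("\n", e.getValue());
            cells.put(FAMILY + ":" + e.getKey(), joined.getBytes(StandardCharsets.UTF_8));
        }
        pendingPuts.computeIfAbsent(ids.get(0), k -> new TreeMap<>()).putAll(cells);
    }

    void delete(String key) {
        pendingPuts.remove(key); // drop any buffered write for this row
        pendingDeletes.add(key);
    }

    /** Applies deletes first, then puts, so a delete followed by a write replaces the row. */
    void commit() {
        for (String key : pendingDeletes) {
            table.delete(key);
        }
        for (Map.Entry<String, Map<String, byte[]>> e : pendingPuts.entrySet()) {
            table.put(e.getKey(), e.getValue());
        }
        pendingDeletes.clear();
        pendingPuts.clear();
    }

    void close() {
        commit();
    }
}
```

In a real plugin, `commit()` would send the buffered rows through the HBase client (for 0.94, `HTable` with `Put` objects) rather than into an in-memory map; buffering until `commit()` avoids one round trip to the cluster per document.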

