Hi

On 15 July 2014 11:31, yeshwanth kumar <[email protected]> wrote:

> Hi,
>
> I am using HBase 0.94.10 on top of Hadoop 2.2.
>
> Now I need to crawl websites and store the results in HBase.
> I saw that Nutch doesn't have integration with Gora 0.4 and higher versions
> of HBase.
>

Use the 2.x branch instead (https://github.com/apache/nutch/tree/2.x)


>
> I went through the Nutch Java API documentation to see whether crawling
> through custom code is possible.
> There I found that Nutch is totally dependent on Gora.
> I don't see any other possible way here.
>
> Can someone suggest a way to store the data crawled by Nutch in
> HBase?
>

Are there any specific reasons why you are using Nutch 2 instead of Nutch
1? If not, then you could use Nutch 1 and simply write a custom IndexWriter
to index the documents into HBase, and use Gora (or not) to define how to
serialize the fields.
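
As a rough illustration of the field mapping such a writer would need, here
is a minimal sketch in plain Java with no Nutch or HBase dependencies. All
names in it (HBaseRowSketch, Cell, the "doc" column family, using the URL as
the row key, joining multi-valued fields with a newline) are assumptions
made for the example, not part of either API. In a real IndexWriter you
would build these cells in the method that receives each document and hand
them to the HBase client; the exact signatures depend on the versions you
use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Nutch or HBase APIs.
final class HBaseRowSketch {

    // One cell of a row: column family, qualifier, and the serialized value.
    static final class Cell {
        final String family;
        final String qualifier;
        final byte[] value;

        Cell(String family, String qualifier, byte[] value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Assumed layout: a single column family with one column per document field.
    static final String FAMILY = "doc";

    // Assumed row key: the document URL encoded as UTF-8 bytes.
    static byte[] rowKey(String url) {
        return url.getBytes(StandardCharsets.UTF_8);
    }

    // Turn the fields of one document into cells, one cell per field.
    // Multi-valued fields are joined with a newline (an assumed convention).
    static List<Cell> toCells(Map<String, List<String>> fields) {
        List<Cell> cells = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            String joined = String.join("\n", entry.getValue());
            byte[] value = joined.getBytes(StandardCharsets.UTF_8);
            cells.add(new Cell(FAMILY, entry.getKey(), value));
        }
        return cells;
    }
}
```

Keeping the mapping in a small class like this, separate from the HBase
calls, makes it easy to test without a running cluster.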

HTH

Julien

-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
