Hi Amit,

Nutch 2.1 with HBase is more stable than with MySQL as the backend. Please
check the link here [0] for how to set up HBase as the backend.
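
Roughly, the switch comes down to pointing Gora at the HBase store in two
places. This is a sketch from memory of the tutorial, so please verify the
details (including enabling the gora-hbase dependency in ivy/ivy.xml and
rebuilding) against [0] for your version.

In conf/nutch-site.xml:

    <property>
      <name>storage.data.store.class</name>
      <value>org.apache.gora.hbase.store.HBaseStore</value>
    </property>

In conf/gora.properties:

    gora.datastore.default=org.apache.gora.hbase.store.HBaseStore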

[0] - http://wiki.apache.org/nutch/Nutch2Tutorial


On Mon, Feb 18, 2013 at 8:07 AM, Amit Sela <[email protected]> wrote:

> Hi all,
>
> I installed Nutch 2.1 with Gora and MySQL, and when I tried running the
> inject job I got the following exception:
>
>  org.apache.gora.util.GoraException: java.io.IOException:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length
> too big for column 'text' (max = 16383); use BLOB or TEXT instead
>
> Then I found out it's a known BUG
> NUTCH-970<https://issues.apache.org/jira/browse/NUTCH-970>
>
> So what version should I use for a stable crawler to parse about 12MM URLs?
> I want to try it first on my laptop (with far fewer URLs to parse...) and
> then deploy on an existing Hadoop cluster.
>
> Any suggestions ?
>
> Thanks,
>
> Amit.
>



-- 
Kiran Chitturi
