Hi Amit,

Nutch 2.1 with HBase is more stable than with MySQL as the backend. Please see the link [0] for how to set up HBase as the backend.
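
As a rough sketch of what the tutorial in [0] walks through (please verify against the page itself, since I am writing this from memory and details may differ for your version), the main change is to point the Gora data store at HBase in conf/nutch-site.xml:

```xml
<!-- conf/nutch-site.xml: sketch only, check the tutorial for your Nutch version -->
<property>
  <name>storage.data.store.class</name>
  <!-- Use the Gora HBase store instead of the SQL store -->
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
```

The tutorial also has you set gora.datastore.default=org.apache.gora.hbase.store.HBaseStore in conf/gora.properties and enable the gora-hbase dependency in ivy/ivy.xml before rebuilding.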
[0] - http://wiki.apache.org/nutch/Nutch2Tutorial

On Mon, Feb 18, 2013 at 8:07 AM, Amit Sela <[email protected]> wrote:

> Hi all,
>
> I installed Nutch 2.1 with Gora and MySQL and I tried running the inject
> job i got the following exception:
>
> org.apache.gora.util.GoraException: java.io.IOException:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length
> too big for column 'text' (max = 16383); use BLOB or TEXT instead
>
> Then I found out it's a known BUG
> NUTCH-970<https://issues.apache.org/jira/browse/NUTCH-970>
>
> So what version should I use for a stable crawler to parse about 12MM urls
> ?
> I want to try it first on my laptop (with much less urls to parse...) and
> then deploy on an existing Hadoop cluster.
>
> Any suggestions ?
>
> Thanks,
>
> Amit.
>

--
Kiran Chitturi

