Hi Rui,

Yes, you are completely free to move to the 1.x trunk. Admittedly, it is more stable.
I would advise you to try to have the fetcher running (on one fetch task) for less than 2 hours (maybe around an hour, or even less if possible). This will prevent you from losing too much data, time and effort should a fetch turn bad.

Lewis

On Thu, Jan 3, 2013 at 8:31 PM, 高睿 <[email protected]> wrote:

> Failed again with Hsql 2.2.8 after 2 hours' crawling. Should I go back to
> Nutch 1.5 or 1.6? It seems there are too many issues in Nutch 2.1. What a
> pity.
>
> console:
> Skipping http://blog.sina.com.cn/s/blog_blog_557f024c010.html; different batch id (null)
> Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0008
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
> hadoop.log
> 2013-01-04 02:42:53,292 INFO parse.ParserJob - Skipping http://blog.sina.com.cn/s/blog_70b99cd80102ebqv.html; different batch id (null)
> 2013-01-04 02:43:07,412 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-01-04 02:43:07,436 WARN mapred.LocalJobRunner - job_local_0008
> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>     at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>     at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
>     at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>     ... 6 more
>
> At 2013-01-03 21:52:35,"Renato Marroquín Mogrovejo" <[email protected]> wrote:
> >Hi Rui,
> >
> >The way this works is that Nutch uses the gora-sql-mapping.xml file to
> >automatically create the necessary tables and then uses them. Anyway,
> >IMHO you are hitting [1], which means you could try changing
> >the gora-sql-mapping.xml file to what has been discussed on JIRA and
> >then let us know so we can narrow it down.
> >Thanks!
> >
> >Renato M.
> >
> >[1] https://issues.apache.org/jira/browse/GORA-24
> >
> >2013/1/3 高睿 <[email protected]>:
> >> BTW, could you please share the schema of the webpage table or its
> >> creation script with me?
> >> It seems the table auto-generated by nutch2.1 has problems.
> >>
> >> At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:
> >>
> >> I'm using this command:
> >> bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
> >> I guess the exception occurs when it tries to store a webpage into HSql.
> >> I tried to increase the column size, but it fails again.
> >> Here's the schema for HSql:
> >> sql> \d webpage
> >> NAME               DATATYPE  WIDTH     NO-NULLS  PRECISION  SCALE
> >> -----------------  --------  --------  --------  ---------  -----
> >> ID                 VARCHAR   767       *         767
> >> HEADERS            BLOB      16777216            16777216
> >> TEXT               VARCHAR   16777216            16777216
> >> STATUS             INTEGER   11                  32
> >> MARKERS            BLOB      16777216            16777216
> >> PARSESTATUS        BLOB      16777216            16777216
> >> MODIFIEDTIME       BIGINT    20                  64
> >> SCORE              DOUBLE    23                  64
> >> TYP                VARCHAR   32                  32
> >> BASEURL            VARCHAR   767                 767
> >> CONTENT            BLOB      16777216            16777216
> >> TITLE              VARCHAR   2048                2048
> >> REPRURL            VARCHAR   767                 767
> >> FETCHINTERVAL      INTEGER   11                  32
> >> PREVFETCHTIME      BIGINT    20                  64
> >> INLINKS            BLOB      16777216            16777216
> >> PREVSIGNATURE      BLOB      16777216            16777216
> >> OUTLINKS           BLOB      16777216            16777216
> >> FETCHTIME          BIGINT    20                  64
> >> RETRIESSINCEFETCH  INTEGER   11                  32
> >> PROTOCOLSTATUS     BLOB      16777216            16777216
> >> SIGNATURE          BLOB      16777216            16777216
> >> METADATA           BLOB      16777216            16777216
> >>
> >> At 2013-01-03 21:06:04,"Lewis John Mcgibbney" <[email protected]> wrote:
> >>>Hi Rui,
> >>>
> >>>The gora-sql backend is not stable so please do not be surprised if
> >>>things do not work flawlessly.
> >>>
> >>>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
> >>>check the respective field values for the columns you are attempting to
> >>>map.
> >>>
> >>>This aside, I would use the following SQL Store implementations if I
> >>>were going to use this backend
> >>>
> >>>HSQLDB - 2.2.8
> >>>MySQL - 5.1.18
> >>>
> >>>At which stage (in your Nutch processes) does this exception occur?
> >>>
> >>>Lewis
> >>>
> >>>[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
> >>>
> >>>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I can't run Nutch 2.1 with Mysql. Then I tried Hsql, and it failed
> >>>> again. So, which database are you using for nutch 2.1? I have spent
> >>>> too much time on this and cannot make it work.
> >>>>
> >>>> 2013-01-03 16:12:06,812 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> >>>> 2013-01-03 16:12:06,835 WARN mapred.LocalJobRunner - job_local_0008
> >>>> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
> >>>>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
> >>>>     at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
> >>>>     at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
> >>>>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
> >>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> >>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >>>> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
> >>>>     at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
> >>>>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
> >>>>     ... 6 more
> >>>>
> >>>> Regards,
> >>>> Rui
> >>>>
> >>>
> >>>--
> >>>*Lewis*
> >>
>

--
*Lewis*
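
A note on the error itself: "string data, right truncation" is the standard SQL complaint that a string value was longer than the declared width of the column it was being written to. The trace does not say which column overflowed, so the following is only a rough sketch, in Python, of how one might compare field lengths against the VARCHAR widths from the `\d webpage` listing quoted above. The helper `find_overflows` and the sample row are hypothetical; they are not part of Nutch, Gora or HSQLDB.

```python
# VARCHAR widths copied from the `\d webpage` output quoted in this thread.
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def find_overflows(row, widths=VARCHAR_WIDTHS):
    """Return (column, actual_length, declared_width) for each over-long value.

    `row` is a hypothetical dict of column name -> string value; it is not
    a Nutch or Gora data structure. Columns without a known width are skipped.
    """
    overflows = []
    for column, value in row.items():
        width = widths.get(column.upper())
        if width is not None and value is not None and len(value) > width:
            overflows.append((column, len(value), width))
    return overflows


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "id": "http://example.com/page.html",
        "title": "x" * 3000,  # longer than TITLE's declared 2048
        "typ": "text/html",
    }
    for column, length, width in find_overflows(sample):
        print(f"{column}: {length} characters, column allows {width}")
```

If a check like this points at one particular field, that would be the column whose length to look at in gora-sql-mapping.xml.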

