Hi Rui,

Nutch uses the gora-sql-mapping.xml file to create the necessary tables automatically and then uses them. IMHO you are hitting [1]. Could you try changing gora-sql-mapping.xml to what has been discussed on that JIRA issue and then let us know the result, so we can narrow it down?

Thanks!

Renato M.

[1] https://issues.apache.org/jira/browse/GORA-24

2013/1/3 高睿 <[email protected]>:
> BTW, could you please share me the schema of webpage table or creation script?
> It seems the table auto-generated by nutch2.1 have problems.
>
> At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:
>
> I'm using this command:
> bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
> I guess the exception occurs when it try to store webpage into HSql. I tried
> to increase the column size, but it fails again. Here's the schema for HSql:
> sql> \d webpage
> NAME              DATATYPE WIDTH    NO-NULLS PRECISION SCALE
> ----------------- -------- -------- -------- --------- -----
> ID                VARCHAR  767      *        767
> HEADERS           BLOB     16777216          16777216
> TEXT              VARCHAR  16777216          16777216
> STATUS            INTEGER  11                32
> MARKERS           BLOB     16777216          16777216
> PARSESTATUS       BLOB     16777216          16777216
> MODIFIEDTIME      BIGINT   20                64
> SCORE             DOUBLE   23                64
> TYP               VARCHAR  32                32
> BASEURL           VARCHAR  767               767
> CONTENT           BLOB     16777216          16777216
> TITLE             VARCHAR  2048              2048
> REPRURL           VARCHAR  767               767
> FETCHINTERVAL     INTEGER  11                32
> PREVFETCHTIME     BIGINT   20                64
> INLINKS           BLOB     16777216          16777216
> PREVSIGNATURE     BLOB     16777216          16777216
> OUTLINKS          BLOB     16777216          16777216
> FETCHTIME         BIGINT   20                64
> RETRIESSINCEFETCH INTEGER  11                32
> PROTOCOLSTATUS    BLOB     16777216          16777216
> SIGNATURE         BLOB     16777216          16777216
> METADATA          BLOB     16777216          16777216
>
> At 2013-01-03 21:06:04,"Lewis John Mcgibbney" <[email protected]> wrote:
>>Hi Rui,
>>
>>The gora-sql backend is not stable so please do not be surprised if things
>>do not work flawlessly.
>>
>>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
>>check the respective field values for the columns you are attempting to map.
>>
>>This aside, I would use the following SQL Store implementations if I were
>>going to use this backend
>>
>>HSQLDB - 2.2.8
>>MySQL - 5.1.18
>>
>>Which stage (in your Nutch processes) does this Exception occur?
>>
>>Lewis
>>
>>[0]
>>http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
>>
>>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I can't run Nutch 2.1 with Mysql, then I tried Hsql, failed again. So,
>>> which database are you using for nutch 2.1. I spent too much time on this
>>> and can not make it work.
>>>
>>> 2013-01-03 16:12:06,812 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>> 2013-01-03 16:12:06,835 WARN mapred.LocalJobRunner - job_local_0008
>>> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
>>>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>>>     at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>>>     at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>>>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
>>>     at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
>>>     at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>>>     ... 6 more
>>>
>>> Regards,
>>> Rui
>>>
>>
>>--
>>*Lewis*
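
A "string data, right truncation" error means some string value is longer than the column it is being written to. The thread does not establish which column is affected, so one possible way to narrow it down is to compare the values of a failing record against the VARCHAR widths from the `\d webpage` output quoted above. The following is only a minimal sketch in Python: the widths are copied from that output, while `find_overflows` and the sample record are hypothetical and not part of Nutch or Gora. It also assumes the column widths are counted in characters.

```python
# VARCHAR widths copied from the `\d webpage` output quoted in this thread.
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def find_overflows(record, widths=VARCHAR_WIDTHS):
    """Return (column, value_length, column_width) for every string value
    that is longer than the width of its VARCHAR column."""
    overflows = []
    for column, value in record.items():
        width = widths.get(column.upper())
        # Only string values in known VARCHAR columns are checked.
        if width is not None and isinstance(value, str) and len(value) > width:
            overflows.append((column.upper(), len(value), width))
    return overflows


# Hypothetical record for illustration only: an overly long row key.
sample = {
    "id": "com.example:http/" + "a" * 800,
    "title": "Example page",
    "typ": "text/html",
}

for column, length, width in find_overflows(sample):
    print(f"{column}: value has {length} characters, column holds {width}")
```

If a check like this flags a column, the corresponding length in gora-sql-mapping.xml is the first place to look, as suggested earlier in the thread.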

