BTW, could you please share the schema of the webpage table, or its creation script?
It seems the table auto-generated by Nutch 2.1 has problems.






At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:

I'm using this command:
bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
I guess the exception occurs when it tries to store a webpage into HSQLDB. I tried
increasing the column size, but it failed again. Here's the schema in HSQLDB:
sql> \d webpage
NAME               DATATYPE     WIDTH  NO-NULLS  PRECISION  SCALE
-----------------  --------  --------  --------  ---------  -----
ID                 VARCHAR        767  *               767
HEADERS            BLOB      16777216             16777216
TEXT               VARCHAR   16777216             16777216
STATUS             INTEGER         11                   32
MARKERS            BLOB      16777216             16777216
PARSESTATUS        BLOB      16777216             16777216
MODIFIEDTIME       BIGINT          20                   64
SCORE              DOUBLE          23                   64
TYP                VARCHAR         32                   32
BASEURL            VARCHAR        767                  767
CONTENT            BLOB      16777216             16777216
TITLE              VARCHAR       2048                 2048
REPRURL            VARCHAR        767                  767
FETCHINTERVAL      INTEGER         11                   32
PREVFETCHTIME      BIGINT          20                   64
INLINKS            BLOB      16777216             16777216
PREVSIGNATURE      BLOB      16777216             16777216
OUTLINKS           BLOB      16777216             16777216
FETCHTIME          BIGINT          20                   64
RETRIESSINCEFETCH  INTEGER         11                   32
PROTOCOLSTATUS     BLOB      16777216             16777216
SIGNATURE          BLOB      16777216             16777216
METADATA           BLOB      16777216             16777216
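
The "string data, right truncation" error quoted further down in this thread normally means that a value is longer than the declared width of its character column. As a rough way to narrow down which column is involved, here is a minimal Python sketch. It assumes the VARCHAR widths from the listing above; the helper name `find_oversized` and the sample row are hypothetical and are not part of Nutch or Gora.

```python
# VARCHAR widths copied from the `\d webpage` listing above.
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def find_oversized(record, widths=VARCHAR_WIDTHS):
    """Return {column: (actual_length, allowed_width)} for values that are too long."""
    oversized = {}
    for column, width in widths.items():
        value = record.get(column)
        # Columns missing from the record are skipped; only real values are measured.
        if value is not None and len(value) > width:
            oversized[column] = (len(value), width)
    return oversized


# Hypothetical row: the title is longer than the 2048-character TITLE column.
sample = {
    "ID": "com.example:http/",
    "BASEURL": "http://example.com/",
    "TITLE": "x" * 3000,
}
print(find_oversized(sample))
```

Running a check like this over a few fetched pages before they are written would show whether a field such as the URL or title exceeds its column, which is the usual cause of this kind of error.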







At 2013-01-03 21:06:04,"Lewis John Mcgibbney" <[email protected]> wrote:
>Hi Rui,
>
>The gora-sql backend is not stable so please do not be surprised if things
>do not work flawlessly.
>
>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
>check the respective field values for the columns you are attempting to map.
>
>This aside, I would use the following SQL store implementations if I were
>going to use this backend:
>
>HSQLDB - 2.2.8
>MySQL - 5.1.18
>
>At which stage (in your Nutch processes) does this exception occur?
>
>Lewis
>
>[0]
>http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
>
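
To compare the mapping file against the actual table, one option is to list the length declared for each mapped column. The Python sketch below is only an illustration: the embedded XML is an assumed excerpt modeled on the general layout of gora-sql-mapping.xml (`primarykey` and `field` elements with `column` and optional `length` attributes), and the function name `column_lengths` is hypothetical. The real file at [0] is authoritative and may differ.

```python
import xml.etree.ElementTree as ET

# Assumed excerpt for illustration only; check the real mapping file at [0].
SAMPLE_MAPPING = """
<gora-orm>
  <class name="org.apache.nutch.storage.WebPage" keyClass="java.lang.String" table="webpage">
    <primarykey column="id" length="767"/>
    <field name="baseUrl" column="baseUrl" length="767"/>
    <field name="title" column="title" length="2048"/>
    <field name="content" column="content"/>
  </class>
</gora-orm>
"""


def column_lengths(mapping_xml):
    """Map each column name to its declared length, or None when no length is given."""
    root = ET.fromstring(mapping_xml)
    lengths = {}
    for element in root.iter():
        if element.tag in ("primarykey", "field"):
            length = element.get("length")
            lengths[element.get("column")] = int(length) if length else None
    return lengths


for column, length in column_lengths(SAMPLE_MAPPING).items():
    print(column, length)
```

Any column whose declared length is smaller than the data being stored is a candidate for the failing write.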
>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
>
>> Hi,
>>
>> I can't run Nutch 2.1 with MySQL. I then tried HSQLDB, and it failed again.
>> Which database are you using for Nutch 2.1? I have spent too much time on
>> this and cannot make it work.
>>
>> 2013-01-03 16:12:06,812 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
>> 2013-01-03 16:12:06,835 WARN  mapred.LocalJobRunner - job_local_0008
>> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>>         at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>>         at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
>>         at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>>         ... 6 more
>>
>> Regards,
>> Rui
>>
>
>
>
>-- 
>*Lewis*


