Hi Rui,

Nutch uses the gora-sql-mapping.xml file to create the necessary tables
automatically and then works against those tables. In any case, I think
you are hitting [1], so you could try changing the gora-sql-mapping.xml
file along the lines discussed on JIRA and then let us know how it goes
so we can narrow it down.
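
In case it helps with narrowing it down: "string data, right truncation"
generally means a string value was longer than the VARCHAR column it was
written to. Below is a minimal Python sketch of that check. The widths are
copied from the `\d webpage` output quoted further down; the helper name
find_truncations and the sample row are hypothetical, and I am assuming the
width is counted in characters. It does not talk to the database, it only
illustrates the comparison.

```python
# VARCHAR widths copied from the `\d webpage` output quoted in this thread.
# BLOB and numeric columns are left out because they are not strings.
VARCHAR_WIDTHS = {
    "ID": 767,
    "TEXT": 16777216,
    "TYP": 32,
    "BASEURL": 767,
    "TITLE": 2048,
    "REPRURL": 767,
}


def find_truncations(row, widths=VARCHAR_WIDTHS):
    """Return (column, length, width) for each string value too long for its column.

    Hypothetical helper: assumes the column width is measured in characters.
    """
    problems = []
    for column, value in row.items():
        width = widths.get(column.upper())
        if width is not None and isinstance(value, str) and len(value) > width:
            problems.append((column.upper(), len(value), width))
    return problems


# Hypothetical row: a key that is longer than the 767-character ID column.
sample_row = {
    "id": "com.example:http/" + "a" * 800,
    "title": "Example page",
    "typ": "text/html",
}

for column, length, width in find_truncations(sample_row):
    print(f"{column}: value has {length} characters, column allows {width}")
```

If a check like this flags a column for the pages you are crawling, that
column's length in gora-sql-mapping.xml is the first thing I would look at.
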
Thanks!


Renato M.

[1] https://issues.apache.org/jira/browse/GORA-24

2013/1/3 高睿 <[email protected]>:
> BTW, could you please share the schema of the webpage table, or its creation script?
> It seems the table auto-generated by Nutch 2.1 has problems.
>
>
>
>
>
>
> At 2013-01-03 21:43:26,"高睿" <[email protected]> wrote:
>
> I'm using this command:
> bin/nutch crawl urls -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 1000
> I guess the exception occurs when it tries to store a webpage into HSQLDB. I tried
> to increase the column size, but it fails again. Here's the schema for HSQLDB:
> sql> \d webpage
> NAME               DATATYPE     WIDTH  NO-NULLS  PRECISION  SCALE
> -----------------  --------  --------  --------  ---------  -----
> ID                 VARCHAR        767  *               767
> HEADERS            BLOB      16777216             16777216
> TEXT               VARCHAR   16777216             16777216
> STATUS             INTEGER         11                   32
> MARKERS            BLOB      16777216             16777216
> PARSESTATUS        BLOB      16777216             16777216
> MODIFIEDTIME       BIGINT          20                   64
> SCORE              DOUBLE          23                   64
> TYP                VARCHAR         32                   32
> BASEURL            VARCHAR        767                  767
> CONTENT            BLOB      16777216             16777216
> TITLE              VARCHAR       2048                 2048
> REPRURL            VARCHAR        767                  767
> FETCHINTERVAL      INTEGER         11                   32
> PREVFETCHTIME      BIGINT          20                   64
> INLINKS            BLOB      16777216             16777216
> PREVSIGNATURE      BLOB      16777216             16777216
> OUTLINKS           BLOB      16777216             16777216
> FETCHTIME          BIGINT          20                   64
> RETRIESSINCEFETCH  INTEGER         11                   32
> PROTOCOLSTATUS     BLOB      16777216             16777216
> SIGNATURE          BLOB      16777216             16777216
> METADATA           BLOB      16777216             16777216
>
>
>
>
>
>
>
> At 2013-01-03 21:06:04,"Lewis John Mcgibbney" <[email protected]> 
> wrote:
>>Hi Rui,
>>
>>The gora-sql backend is not stable so please do not be surprised if things
>>do not work flawlessly.
>>
>>I would urge you to have a look at the gora-sql-mapping.xml file [0] and
>>check the respective field values for the columns you are attempting to map.
>>
>>This aside, I would use the following SQL store implementations if I were
>>going to use this backend:
>>
>>HSQLDB - 2.2.8
>>MySQL - 5.1.18
>>
>>At which stage of your Nutch process does this exception occur?
>>
>>Lewis
>>
>>[0]
>>http://svn.apache.org/repos/asf/nutch/branches/2.x/conf/gora-sql-mapping.xml
>>
>>On Thu, Jan 3, 2013 at 9:34 AM, 高睿 <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I can't run Nutch 2.1 with MySQL. I then tried HSQLDB, and it failed again.
>>> So, which database are you using for Nutch 2.1? I have spent too much time
>>> on this and cannot make it work.
>>>
>>> 2013-01-03 16:12:06,812 WARN  mapred.FileOutputCommitter - Output path is
>>> null in cleanup
>>> 2013-01-03 16:12:06,835 WARN  mapred.LocalJobRunner - job_local_0008
>>> java.io.IOException: java.sql.BatchUpdateException: data exception: string
>>> data, right truncation
>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>>>         at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>>>         at
>>> org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>         at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>> Caused by: java.sql.BatchUpdateException: data exception: string data,
>>> right truncation
>>>         at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown
>>> Source)
>>>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>>>         ... 6 more
>>>
>>> Regards,
>>> Rui
>>>
>>
>>
>>
>>--
>>*Lewis*
>
>
>
