gora-sql has a few bugs. It's recommended to use HBase with Nutch. I had a
problem with fetching and parsing data.
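
For reference, here is a minimal sketch of the marker state that the quoted
messages below expect after updatedb. It is not Nutch's or Gora's actual code:
a plain map stands in for the markers column, the class and method names are
hypothetical, and the marker keys are my reading of the MySQL output quoted
below.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a plain map stands in for the WebPage markers column. */
final class MarkerSketch {
    // Marker keys as they appear in the MySQL output quoted in this thread
    // (assumed split of the raw column value).
    static final String GENERATE_MARK = "_gnmrk_";
    static final String FETCH_MARK = "_ftcmrk_";

    /** Builds a row resembling the one in the quoted MySQL output. */
    static Map<String, String> sampleRow() {
        String batchId = "1359699678-1110220041";
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("_injmrk_", "y");
        markers.put("_updmrk_", batchId);
        markers.put("__prsmrk__", batchId);
        markers.put(GENERATE_MARK, batchId);
        markers.put(FETCH_MARK, batchId);
        return markers;
    }

    /** Hypothetical helper: the marker state expected after updatedb. */
    static Map<String, String> afterUpdateDb(Map<String, String> markers) {
        Map<String, String> updated = new LinkedHashMap<>(markers);
        updated.remove(GENERATE_MARK); // no-op if the marker is absent
        updated.remove(FETCH_MARK);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> before = sampleRow();
        Map<String, String> after = afterUpdateDb(before);
        System.out.println("before: " + before.keySet());
        System.out.println("after:  " + after.keySet());
    }
}
```

With HBase the stored row reportedly matches the "after" state. With gora-sql
the row in MySQL still looks like the "before" state, which suggests the
removal is not being persisted by the SQL back end.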


On Fri, Feb 1, 2013 at 2:31 AM, feng lu <[email protected]> wrote:

> Hi vetus.
>
> I found the same problem when I ran the crawl process
> inject->generate->parse->updatedb. The MySQL DB output is:
>
> mysql> SELECT convert(markers using utf8),baseUrl FROM `webpage` WHERE 1;
>
>
> dist0_injmrk_y_updmrk_*1359699678-1110220041__prsmrk__*1359699678-1110220041_gnmrk_*1359699678-1110220041
> _ftcmrk_*1359699678-1110220041  | http://www.apache.org/ |
>
> The generate and fetch marks are still in the DB.
>
> But when I use HBase as the back-end DB, with the same crawled URL and the
> same crawl process, the Generate and Fetch marks are all removed after
> running the updatedb command.
>
> So maybe it's a bug in the gora-sql module.
>
>
> On Thu, Jan 31, 2013 at 5:42 PM, amuseme <[email protected]> wrote:
>
> > Hi vetus.
> >
> > Why doesn't the updater delete the values from the database? I see that
> > in the DbUpdateReducer class the Generate and Fetcher markers are already
> > removed from the WebPage if they exist.
> >
> >
> >
> > -----
> > Don't Grow Old, Grow Up.
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Mysql-don-t-save-Markers-properly-tp4037310p4037651.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
Kiran Chitturi
