gora-sql has a few bugs. It's recommended to use HBase with Nutch. I had problems fetching and parsing data.

On Fri, Feb 1, 2013 at 2:31 AM, feng lu <[email protected]> wrote:
> Hi vetus.
>
> I found the same problem when I ran the crawl process
> inject->generate->parse->updatedb. The MySQL db output is:
>
> mysql> SELECT convert(markers using utf8),baseUrl FROM `webpage` WHERE 1;
>
> dist0_injmrk_y_updmrk_*1359699678-1110220041__prsmrk__*1359699678-1110220041_gnmrk_*1359699678-1110220041
> _ftcmrk_*1359699678-1110220041 | http://www.apache.org/ |
>
> The generate and fetch marks are still in the db.
>
> But when I use HBase as the back-end DB, with the same crawled URL and the
> same crawl process, the Generate and Fetch marks are all removed after
> running the updatedb command.
>
> So maybe it's a bug in the Gora-sql module.
>
>
> On Thu, Jan 31, 2013 at 5:42 PM, amuseme <[email protected]> wrote:
>
> > Hi vetus.
> >
> > Why doesn't the updater delete the values from the database? I see that in
> > the DbUpdateReducer class, WebPage has already removed the Generate and
> > Fetcher markers if they exist.
> >
> >
> >
> > -----
> > Don't Grow Old, Grow Up.
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Mysql-don-t-save-Markers-properly-tp4037310p4037651.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>

--
Kiran Chitturi

