[ https://issues.apache.org/jira/browse/NUTCH-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-493.
-------------------------------

Resolution: Invalid
Assignee: Doğacan Güney

This is not a bug. When the fetcher was unable to fetch a page, it created empty content for it. Such empty content is not parseable, hence the warnings you are seeing in your log. After NUTCH-443, the fetcher will not create empty content for such pages, so you should not see these warnings in your log anymore.

Also, please use the nutch-user mailing list to ask questions.

> contentType parse not correctly,,,,got empty content using readseg -get
> -----------------------------------------------------------------------
>
> Key: NUTCH-493
> URL: https://issues.apache.org/jira/browse/NUTCH-493
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.9.0
> Environment: java version "1.5.0_04"
> Linux localhost 2.6.8-2-386 #1 Tue Aug 16 12:46:35 UTC 2005 i686 GNU/Linux
> Reporter: wangxu
> Assignee: Doğacan Güney
>
> I am using nutch0.9.
> I found lots of my crawled pages's contents are empty.
> then I checked the log,and find the warnning accordingly:the ContentType is
> said to be "url=http://......",and cannot
> find a suitable parser for the page:
> parser not found for contentType=
> url=http://product.dangdang.com/product.aspx?product_id=490321
> then most of this kind of pages's contents are empty.
> but I didnot find any warn or error other than "timeout" from the fetcher log.
> Can somebody explain me why?
> many thanks!

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers