[
https://issues.apache.org/jira/browse/NUTCH-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney closed NUTCH-493.
-------------------------------
Resolution: Invalid
Assignee: Doğacan Güney
This is not a bug. When the fetcher was unable to fetch a page, it created empty
content for it. Such empty content is not parseable, which is what you are seeing
in your log.
After NUTCH-443, the fetcher will not create empty content for such pages, so you
should not see these messages in your log anymore.
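As a rough illustration of why an empty record ends up with no parser, here is
a minimal Python sketch. It is not Nutch code: FetchedPage, find_parser,
pages_to_parse and the parser table are hypothetical names made up for this
example, and it only mimics the behaviour described above (a failed fetch
leaves no bytes and no content type, so parser lookup fails).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FetchedPage:
    """Hypothetical stand-in for a fetched record; not a Nutch class."""
    url: str
    content_type: str = ""
    content: bytes = b""


# Assumed content-type -> parser mapping, for illustration only.
PARSERS = {"text/html": "html-parser", "text/plain": "text-parser"}


def find_parser(page: FetchedPage) -> Optional[str]:
    """Return a parser name for the page, or None if nothing can parse it."""
    if not page.content or not page.content_type:
        # A failed fetch (e.g. a timeout) leaves no bytes and no content
        # type, so there is nothing to select a parser by.
        return None
    return PARSERS.get(page.content_type)


def pages_to_parse(pages: List[FetchedPage]) -> List[FetchedPage]:
    """Drop records with empty content before they reach the parse step."""
    return [p for p in pages if p.content]
```

With a record like FetchedPage(url="http://example.com/a") and nothing else
set, find_parser returns None, which corresponds to the "parser not found"
message. Skipping such records up front, as pages_to_parse does, is one way to
keep them out of the parse step; how NUTCH-443 actually achieves this is not
shown here.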
Also, please use the nutch-user mailing list to ask questions.
> contentType parse not correctly,,,,got empty content using readseg -get
> -----------------------------------------------------------------------
>
> Key: NUTCH-493
> URL: https://issues.apache.org/jira/browse/NUTCH-493
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.9.0
> Environment: java version "1.5.0_04"
> Linux localhost 2.6.8-2-386 #1 Tue Aug 16 12:46:35 UTC 2005 i686 GNU/Linux
> Reporter: wangxu
> Assignee: Doğacan Güney
>
> I am using Nutch 0.9.
> I found that the contents of many of my crawled pages are empty.
> I then checked the log and found the corresponding warning: the contentType is
> reported as "url=http://......", and no suitable parser can be
> found for the page:
> parser not found for contentType=
> url=http://product.dangdang.com/product.aspx?product_id=490321
> The contents of most pages of this kind are empty,
> but I did not find any warning or error other than "timeout" in the fetcher log.
> Can somebody explain why?
> Many thanks!
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers