contentType not parsed correctly; got empty content using readseg -get
----------------------------------------------------------------------

                 Key: NUTCH-493
                 URL: https://issues.apache.org/jira/browse/NUTCH-493
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0
         Environment: java version "1.5.0_04"

Linux localhost 2.6.8-2-386 #1 Tue Aug 16 12:46:35 UTC 2005 i686 GNU/Linux
            Reporter: wangxu


I am using Nutch 0.9.
I found that the content of many of my crawled pages is empty.
When I checked the log, I found a matching warning: the content type is
reported as "url=http://......", so no suitable parser can be found for
the page:


parser not found for contentType=
url=http://product.dangdang.com/product.aspx?product_id=490321
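The warning suggests that the stored content type is not a MIME type at all. As an illustration only, here is a minimal sketch, assuming that parser selection needs a well-formed `type/subtype` string. `ContentTypeCheck` and `normalize` are hypothetical names, not Nutch APIs. The sketch shows how a value like the one in the log fails such a check, while an ordinary header value passes.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Nutch: illustrates validating a raw
// Content-Type value before it is used to look up a parser.
public class ContentTypeCheck {
    // "type/subtype" where both halves use HTTP token characters (RFC 7230 tchar).
    private static final Pattern MIME_TYPE =
        Pattern.compile("[a-z0-9!#$%&'*+.^_`|~-]+/[a-z0-9!#$%&'*+.^_`|~-]+");

    /** Returns the bare lower-case MIME type, or null if raw is not one. */
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        // Drop parameters such as "; charset=gb2312", then trim and lower-case.
        int semicolon = raw.indexOf(';');
        String type = (semicolon >= 0 ? raw.substring(0, semicolon) : raw)
            .trim().toLowerCase(Locale.ENGLISH);
        return MIME_TYPE.matcher(type).matches() ? type : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html; charset=gb2312",
            "url=http://product.dangdang.com/product.aspx?product_id=490321"
        };
        for (String s : samples) {
            // A null result means the value cannot name any parser.
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```

Stripping parameters before matching keeps a charset hint such as `gb2312` from causing a false rejection, while a value like `url=http://...` is rejected because `=` and `:` are not valid token characters.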


Most pages of this kind end up with empty content,
yet the fetcher log shows no warning or error other than "timeout".

Can somebody explain why?
Many thanks!



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
