Hi,
I just tested both links with the commands you mentioned, and they both
worked. So ...
On 01/27/2012 03:53 PM, Xiao Li wrote:
I am using Nutch to parse a university website, and I found that the HTML
parser does not work properly. Here is the problem.
Consider these two student web pages:
1. http://kdd.csd.uwo.ca/doku.php/people/yan_luo
2. http://kdd.csd.uwo.ca/doku.php/people/xiao_li
I used the following commands to extract the text:
  nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/yan_luo
  nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/xiao_li
However, Nutch extracts text only from the first page and returns nothing
for the second one. I do not understand why, since both pages are generated
from the same template.
--
Kaveh Minooie
www.plutoz.com