 1.  What version of Nutch/JDK/OS are you using?
 2.  Do you have some log information that you can show to determine whether the
parse-rss or feed plugin is being called?
 3.  Have you activated those plugins in your nutch-default.xml conf file?

Let me know on 1-3 and then maybe I can help more.

1) I have Nutch 1.2, Windows 7 Ultimate, and Java 1.6.0_21.

2) I tried with the "regex-urlfilter" file and with this plugin:

2010-11-02 20:20:25,694 ERROR api.RegexURLFilterBase - Invalid first character: # Licensed to the Apache Software Foundation (ASF) under one or more
2010-11-02 20:20:25,698 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
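
The ERROR line above complains about the first character of a line in the rule file. As a rough way to inspect that, here is a minimal sketch, assuming rule lines must begin with '+' or '-' and that '#' starts a comment. The class RuleLineCheck and its classify method are hypothetical names for illustration, not part of Nutch. The byte order mark in the second call is only a guess at one possible cause (for example, an editor saving the file as UTF-8 with a BOM); the log does not confirm it.

```java
// Hypothetical helper, not part of Nutch: reports how a rule-file line
// would be classified by looking only at its first character.
class RuleLineCheck {
    static String classify(String line) {
        if (line.isEmpty()) {
            return "blank";
        }
        char first = line.charAt(0);
        switch (first) {
            case '+': return "include";
            case '-': return "exclude";
            case '#': return "comment";
            case ' ': return "blank";
            default:
                // Print the code point so an invisible character becomes visible.
                return String.format("invalid (U+%04X)", (int) first);
        }
    }

    public static void main(String[] args) {
        // A plain comment line is fine.
        System.out.println(classify("# Licensed to the Apache Software Foundation"));
        // Assumption: the same line preceded by an invisible byte order mark.
        System.out.println(classify("\uFEFF# Licensed to the Apache Software Foundation"));
    }
}
```

If the first line of the real file is reported as invalid by a check like this, saving the file again without the leading character may be worth trying.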

3) Yes, I have the plugins activated.
     I tried to index these pages:

        http://www.edutube.org/en/taxonomy/term/7/feed
        http://ocw.mit.edu/rss/all/mit-allcourses-21A.xml
        http://cnx.org/lenses/cnxhcc/affiliation/atom

http://www.merlot.org/merlot/materials.xml?category=2788&materialType=&keywords=&qstringrss=category%3D2788%26sort.property%3DoverallRating&sort.property=overallRating&sortbutton.x=18&sortbutton.y=7&sortbutton=Sort

    but the result is always the same. I think that I have to configure the
"regex-urlfilter" file to index the pages linked from the feed, but not the RSS
main page itself (http://www.edutube.org/en/taxonomy/term/7/feed, for example). I
don't know.
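
To make the idea of include and exclude rules concrete, here is a minimal sketch, assuming each rule is a '+' or '-' followed by a regular expression, that the first matching rule decides, and that a URL matching no rule is rejected. FirstMatchFilter is a hypothetical stand-in rather than Nutch's implementation, and the two rules and the /en/node/1 URL are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex URL filter; not Nutch's implementation.
class FirstMatchFilter {
    private final List<Boolean> signs = new ArrayList<Boolean>();
    private final List<Pattern> patterns = new ArrayList<Pattern>();

    // Each rule is '+' (accept) or '-' (reject) followed by a regex.
    FirstMatchFilter(String... rules) {
        for (String rule : rules) {
            signs.add(rule.charAt(0) == '+');
            patterns.add(Pattern.compile(rule.substring(1)));
        }
    }

    // The first rule whose regex is found in the URL decides the outcome.
    boolean accepts(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return signs.get(i);
            }
        }
        return false; // no rule matched
    }

    public static void main(String[] args) {
        FirstMatchFilter filter = new FirstMatchFilter(
            "-/feed$",                        // made-up rule: reject feed URLs
            "+^http://www\\.edutube\\.org/"); // made-up rule: accept the rest of the site
        System.out.println(filter.accepts("http://www.edutube.org/en/taxonomy/term/7/feed"));
        System.out.println(filter.accepts("http://www.edutube.org/en/node/1"));
    }
}
```

One thing to verify before relying on rules like these: rejecting the feed URL in the filter may also keep the feed from being fetched at all, in which case its links would never be discovered.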
