I have enabled logging and now get the following error from the subcollections plugin. Any ideas what I can do to fix this?
Thanks,
Ed.

2008-10-02 09:37:49,401 INFO collection.CollectionManager - Instantiating CollectionManager
2008-10-02 09:37:49,401 INFO collection.CollectionManager - initializing CollectionManager
2008-10-02 09:37:49,405 WARN collection.CollectionManager - Error occured:java.lang.ClassCastException: org.apache.xerces.dom.DeferredCommentImpl
2008-10-02 09:37:49,405 WARN collection.CollectionManager - java.lang.ClassCastException: org.apache.xerces.dom.DeferredCommentImpl
2008-10-02 09:37:49,405 WARN collection.CollectionManager - at org.apache.nutch.util.DomUtil.getDom(DomUtil.java:63)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.parse(CollectionManager.java:85)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.init(CollectionManager.java:75)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.<init>(CollectionManager.java:56)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.getCollectionManager(CollectionManager.java:115)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter.addSubCollectionField(SubcollectionIndexingFilter.java:66)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter.filter(SubcollectionIndexingFilter.java:72)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:111)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:249)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:52)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
2008-10-02 09:37:49,406 WARN collection.CollectionManager - at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:201)
2008-10-02 09:37:49,407 DEBUG collection.CollectionManager - subcollections:

>
> Hi,
>
> I'm trying to get subcollections working in Nutch 1.0-dev, and have crawled
> our intranet with subcollection.xml configured as below. However, when I
> submit a query to search.jsp, e.g.
>
> subcollection:im database
>
> I don't get any results (as opposed to submitting the same query without
> subcollection:im).
>
> Is this configured wrongly? I realise that subcollection.xml doesn't support
> regular expressions, but I wasn't sure whether I could just put in part of the
> URL, or had to put in the full stem pattern, e.g. http://planet.somdomain.com/level1/
>
> Thanks,
> Ed.
>
> <subcollections>
>   <subcollection>
>     <name>default</name>
>     <id>default</id>
>     <whitelist>
>     </whitelist>
>     <blacklist>
>       planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
>       /aptprop.nsf/Content/Americas+
>       /aptprop.nsf/Content/AB+CityFlyer+
>       /aptprop.nsf/Content/CityFlyer+
>       /im/barch/
>       /im/dms/
>       /im/tech/
>     </blacklist>
>   </subcollection>
>
>   <subcollection>
>     <name>im</name>
>     <id>im</id>
>     <whitelist>
>       planet.somedomain.com/general/aptrix/aptim.nsf/
>       planet.somedomain.com/im/barch/
>       planet.somedomain.com/im/dms/
>       planet.somedomain.com/im/tech/
>     </whitelist>
>     <blacklist />
>   </subcollection>
>
>   <subcollection>
>     <name>news</name>
>     <id>news</id>
>     <whitelist>
>       planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
>     </whitelist>
>     <blacklist />
>   </subcollection>
>
> </subcollections>
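One plausible reading of the trace (an assumption, not checked against the Nutch source): `DomUtil.getDom` received a Xerces comment node (`DeferredCommentImpl`) where it expected an `Element`, which would happen if the subcollection XML file has a comment, such as a licence header, before the `<subcollections>` root element and the code treats the document's first child node as the root. The sketch below reproduces that pattern with the JDK's own `javax.xml.parsers` DOM API rather than Nutch's `DomUtil`; the class name `CommentBeforeRoot` and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CommentBeforeRoot {

    // Hypothetical sample file: a comment (for example a licence header)
    // appears before the <subcollections> root element.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!-- header comment -->\n"
        + "<subcollections>\n"
        + "  <subcollection><name>im</name><id>im</id></subcollection>\n"
        + "</subcollections>\n";

    // Parses the XML with the JDK's built-in DOM parser (comments are kept by default).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Fragile: assumes the first child of the Document is the root element.
    // With a leading comment, item(0) is a Comment node and the cast throws.
    static Element firstChildAsElement(Document doc) {
        return (Element) doc.getChildNodes().item(0);
    }

    // Robust: the DOM returns the root element directly, skipping any
    // comments or processing instructions that precede it.
    static Element rootElement(Document doc) {
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Node first = doc.getChildNodes().item(0);
        System.out.println("first child is a comment: "
            + (first.getNodeType() == Node.COMMENT_NODE));
        try {
            firstChildAsElement(doc);
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("root element: " + rootElement(doc).getTagName());
    }
}
```

If this is what is happening here, removing any comment that precedes `<subcollections>` in the file, or changing the lookup to use `getDocumentElement()`, should avoid the cast; both are suggestions to verify against your own Nutch build.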
