Creating new linked entries in crawlDB
Hi everyone,

I'm not sure exactly where to post this question, so sorry for the double post.

I've been using Nutch for a while now and I've hit a snag. I'm trying to find where newly discovered linked pages are added to the segment as their own entries. To be clear, I've been through the Fetcher class, the CrawlDBFilter and the reducer, but I'm looking for the initial point where, for a given page, its links are turned into segment entries.

My objective is to pass the initial injected URL down to all of its linked pages. So when I create an entry for the linked URLs of a web page, I'll add metadata to their definition giving them this originating URL. By the time I get to CrawlDBFilter I already have entries for the linked pages and have lost track of which seed URL brought us there.

I thought the job would be done in the Fetcher, maybe in the output function, but I can't find where it happens. If anyone knows and could point me in the right direction, I'd appreciate it.

Thanks

--
View this message in context: http://old.nabble.com/Creating-new-linked-entries-in-crawlDB-tp27864424p27864424.html
Sent from the Nutch - User mailing list archive at Nabble.com.
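To make the goal above concrete, here is a minimal sketch of the propagation idea in plain Java. It deliberately does not use any Nutch classes: `CrawlEntry`, `SeedPropagation`, `inject`, `linkedEntries` and the `origin.seed` key are all hypothetical names made up for illustration, and where the equivalent step lives in Nutch is exactly the open question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a crawl entry; this is NOT a Nutch class.
class CrawlEntry {
    final String url;
    final Map<String, String> metadata = new HashMap<>();

    CrawlEntry(String url) {
        this.url = url;
    }
}

class SeedPropagation {
    // Hypothetical metadata key, chosen for this sketch only.
    static final String SEED_KEY = "origin.seed";

    // At inject time a seed URL is recorded as its own origin.
    static CrawlEntry inject(String seedUrl) {
        CrawlEntry entry = new CrawlEntry(seedUrl);
        entry.metadata.put(SEED_KEY, seedUrl);
        return entry;
    }

    // When the outlinks of a page become new entries, copy the parent's
    // seed into each of them so the origin survives every hop.
    static List<CrawlEntry> linkedEntries(CrawlEntry parent, List<String> outlinks) {
        List<CrawlEntry> children = new ArrayList<>();
        String seed = parent.metadata.get(SEED_KEY);
        for (String link : outlinks) {
            CrawlEntry child = new CrawlEntry(link);
            if (seed != null) {
                child.metadata.put(SEED_KEY, seed);
            }
            children.add(child);
        }
        return children;
    }
}
```

The point of the sketch is only that the copy has to happen at the moment the parent entry and its outlinks are both in hand; once the linked entries are written out on their own, the parent's metadata is no longer reachable.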
RE: nutch-1.0.war deploying error
Yep, you were right: I had edited JAVA_HOME, but not everywhere. Anyway, thanks a bunch, it was a JVM version problem.

Arkadi wrote:

Hi,
It looks like you have to upgrade your JVM.
Arkadi

-----Original Message-----
From: nikinch [mailto:maill...@qwamci.com]
Sent: Tuesday, October 13, 2009 1:20 AM
To: nutch-user@lucene.apache.org
Subject: nutch-1.0.war deploying error

Hello,

I have been playing around with nutch-1.0 recently and, while looking into the carrot2 feature, I tried to deploy the .war to test it. I copied the war file into my Apache webapps folder, but I get this error when trying to start the service. I can't seem to get it fixed; if anyone has some input as to what the problem is, I'd appreciate it.

12 oct. 2009 15:03:35 org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.apache.nutch.searcher.NutchBean$NutchBeanConstructor
java.lang.UnsupportedClassVersionError: Bad version number in .class file (unable to load class org.apache.hadoop.io.VersionedWritable) (unable to load class org.apache.nutch.parse.ParseData)
    at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1854)
    at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at org.apache.nutch.util.NutchConfiguration.<clinit>(NutchConfiguration.java:42)
    at org.apache.nutch.searcher.NutchBean$NutchBeanConstructor.contextInitialized(NutchBean.java:420)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3934)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4429)
    at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1249)
    at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:612)
    at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:136)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:595)

--
View this message in context: http://www.nabble.com/nutch-1.0.war-deploying-error-tp25856641p25869254.html
Sent from the Nutch - User mailing list archive at Nabble.com.
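For anyone who lands on the same UnsupportedClassVersionError: one way to confirm a JVM version mismatch is to read the version number stored in the header of the class file that fails to load and compare it with what the running JVM supports. The sketch below is only an illustration; the `ClassVersion` class and its `majorVersion` method are names invented here, not part of Nutch or Tomcat.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for this sketch; not part of Nutch or Tomcat.
class ClassVersion {
    // Reads the class file header and returns its major version number.
    // Major 49 corresponds to Java 5 and major 50 to Java 6.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // Every class file starts with the magic number 0xCAFEBABE.
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version
    }
}
```

Feeding it a stream opened on the offending .class file (extracted from the jar inside the war) gives the major version; if that number is higher than the one your JVM accepts, the servlet container is running on an older Java than the classes were compiled for, which is what a stale JAVA_HOME would cause.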