Hi Lewis,

I downloaded the Nutch copy from http://apache.techartifact.com/mirror/nutch/1.6/

lewis john mcgibbney wrote
> Hi,
> Once I get access to my office I am going to build the patches from trunk.
> Is it trunk that you are using?
> Thanks
> Lewis
>
> On Fri, Feb 8, 2013 at 9:00 PM, peterbarretto <peterbarretto08@> wrote:
>
>> Hi Lewis,
>>
>> I managed to get the code working by adding the below function to
>> MongodbWriter.java in the public class MongodbWriter implements
>> NutchIndexWriter :-
>>
>> public void delete(String key) throws IOException{
>>     return;
>> }
>>
>> And the crawled data was getting stored in mongodb.
>> The only issue was it was storing only the text of the page and not the
>> full html content of the page.
>> How do i store the full html content of the page also?
>> Hope to see the patches soon.
>> Thanks
>>
>> lewis john mcgibbney wrote
>> > Certainly.
>> > I am currently reviewing the code and will hopefully have patches for
>> > Nutch trunk cooked up for tomorrow.
>> > I'll update this thread likewise.
>> > Thanks
>> > Lewis
>> >
>> > On Wed, Jan 30, 2013 at 10:02 PM, peterbarretto <peterbarretto08@> wrote:
>> >> Hi Lewis,
>> >>
>> >> I am new to java and i dont know how to inherit all public methods from
>> >> NutchIndexWriter
>> >> Can you help me with that? Then i can rebuild and check if it works.
>> >>
>> >> lewis john mcgibbney wrote
>> >>> As you will see the code has not been amended in a year or so.
>> >>> The positive side is that you only seem to be getting one issue with
>> >>> javac
>> >>>
>> >>> On Tue, Jan 29, 2013 at 8:39 PM, peterbarretto <peterbarretto08@> wrote:
>> >>>
>> >>>> C:\nutch-16\src\java\org\apache\nutch\indexer\mongodb\MongodbWriter.java:18:
>> >>>> error: MongodbWriter is not abstract and does not override abstract
>> >>>> method delete(String) in NutchIndexWriter
>> >>>> [javac] public class MongodbWriter implements NutchIndexWriter{
>> >>>
>> >>> Sort this error out by inheriting all public methods from
>> >>> NutchIndexWriter for starts. I take it you are not developing from
>> >>> within Eclipse? As this would have been flagged up immediately. This
>> >>> should at least enable you to compile the code.
>> >>>
>> >>>> I have already crawled some urls now and i need to move those to
>> >>>> mongodb.
>> >>>> Is there a easy to use code to do that?
>> >>>
>> >>> Not apart from hacking the code as you are already doing. The code you
>> >>> are pulling is not part of the official nutch codebase and to be honest
>> >>> a few of us didn't even know about it until you brought it to our
>> >>> attention :0)
>> >>>
>> >>> There is no silver bullet here, just take your time and we will get it
>> >>> working.
>> >>> Lewis
>> >>
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/How-to-get-page-content-of-crawled-pages-tp1944330p4037621.html
>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >
>> > --
>> > Lewis
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-to-get-page-content-of-crawled-pages-tp1944330p4039401.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> *Lewis*

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-page-content-of-crawled-pages-tp1944330p4039613.html
Sent from the Nutch - User mailing list archive at Nabble.com.
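
For anyone hitting the same javac error quoted above, here is a minimal sketch of why the empty delete(String) made the class compile. It does not use the real Nutch classes: IndexWriterSketch is a hypothetical stand-in for NutchIndexWriter (whose actual method signatures may differ), and InMemoryWriter and DeleteStubDemo are invented names for illustration only. The point is just that a non-abstract class has to implement every method its interface declares, even if the body does nothing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Entry point comes first so the file can also be launched directly as a single source file.
class DeleteStubDemo {
    public static void main(String[] args) throws IOException {
        InMemoryWriter writer = new InMemoryWriter();
        writer.write("http://example.com/ page text");
        writer.delete("http://example.com/"); // no-op, mirrors the workaround in the thread
        writer.close();
        System.out.println("stored documents: " + writer.size()); // prints "stored documents: 1"
    }
}

// HYPOTHETICAL stand-in for NutchIndexWriter; the real interface has its own signatures.
interface IndexWriterSketch {
    void write(String document) throws IOException;
    void delete(String key) throws IOException;
    void close() throws IOException;
}

// HYPOTHETICAL writer that keeps documents in a list instead of MongoDB.
class InMemoryWriter implements IndexWriterSketch {
    private final List<String> stored = new ArrayList<>();

    @Override
    public void write(String document) throws IOException {
        stored.add(document);
    }

    // Removing this method reproduces the error from the thread:
    // "is not abstract and does not override abstract method delete(String)".
    @Override
    public void delete(String key) throws IOException {
        // Intentionally empty: satisfies the interface, deletes nothing.
    }

    @Override
    public void close() throws IOException {
        // Nothing to release in this sketch.
    }

    int size() {
        return stored.size();
    }
}
```

Note that a stub like this only gets the code compiling. Any delete request the indexer sends is silently ignored, so removed or gone pages would stay in the store until delete is given a real implementation.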