Hi,

Once I get access to my office I am going to build the patches from trunk.
Is it trunk that you are using?

Thanks
Lewis

On Fri, Feb 8, 2013 at 9:00 PM, peterbarretto <peterbarrett...@gmail.com> wrote:

> Hi Lewis,
>
> I managed to get the code working by adding the function below to
> MongodbWriter.java, inside the class
> "public class MongodbWriter implements NutchIndexWriter":
>
> public void delete(String key) throws IOException{
>     return;
> }
>
> The crawled data was then stored in MongoDB.
> The only issue is that it stores only the text of the page and not the
> full HTML content of the page.
> How do I store the full HTML content of the page as well?
> Hope to see the patches soon.
> Thanks
>
>
> lewis john mcgibbney wrote
> > Certainly.
> > I am currently reviewing the code and will hopefully have patches for
> > Nutch trunk cooked up for tomorrow.
> > I'll update this thread likewise.
> > Thanks
> > Lewis
> >
> > On Wed, Jan 30, 2013 at 10:02 PM, peterbarretto <peterbarretto08@> wrote:
> >> Hi Lewis,
> >>
> >> I am new to Java and I don't know how to inherit all public methods from
> >> NutchIndexWriter.
> >> Can you help me with that? Then I can rebuild and check if it works.
> >>
> >>
> >> lewis john mcgibbney wrote
> >>> As you will see the code has not been amended in a year or so.
> >>> The positive side is that you only seem to be getting one issue with
> >>> javac.
> >>>
> >>> On Tue, Jan 29, 2013 at 8:39 PM, peterbarretto <peterbarretto08@> wrote:
> >>>
> >>>> C:\nutch-16\src\java\org\apache\nutch\indexer\mongodb\MongodbWriter.java:18:
> >>>> error: MongodbWriter is not abstract and does not override abstract
> >>>> method delete(String) in NutchIndexWriter
> >>>> [javac] public class MongodbWriter implements NutchIndexWriter{
> >>>
> >>> Sort this error out by inheriting all public methods from
> >>> NutchIndexWriter for starts. I take it you are not developing from
> >>> within Eclipse? As this would have been flagged up immediately. This
> >>> should at least enable you to compile the code.
> >>>
> >>>> I have already crawled some URLs and I need to move those to
> >>>> MongoDB. Is there any easy-to-use code to do that?
> >>>
> >>> Not apart from hacking the code as you are already doing. The code you
> >>> are pulling is not part of the official Nutch codebase and to be honest
> >>> a few of us didn't even know about it until you brought it to our
> >>> attention :0)
> >>>
> >>> There is no silver bullet here, just take your time and we will get it
> >>> working.
> >>> Lewis
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/How-to-get-page-content-of-crawled-pages-tp1944330p4037621.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >
> > --
> > Lewis
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-get-page-content-of-crawled-pages-tp1944330p4039401.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
*Lewis*
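
For anyone following the thread, here is a minimal sketch of why the javac error quoted above appears and why the no-op delete(String) makes it go away. It does not use the real Nutch or MongoDB classes: IndexWriterSketch is a hypothetical, simplified stand-in for NutchIndexWriter (the write signature in particular is an assumption), and InMemoryWriterSketch is a hypothetical stand-in for MongodbWriter that keeps documents in a map instead of a database.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Demo entry point, declared first so the file can be launched directly.
class DeleteSketchDemo {
    public static void main(String[] args) throws IOException {
        InMemoryWriterSketch writer = new InMemoryWriterSketch();
        writer.write("http://example.com/", "page text");
        writer.delete("http://example.com/"); // no-op: nothing is removed
        System.out.println("documents stored: " + writer.size());
        writer.close();
    }
}

// Hypothetical stand-in for NutchIndexWriter (assumed, simplified signatures).
interface IndexWriterSketch {
    void write(String key, String text) throws IOException;
    void delete(String key) throws IOException;
    void close() throws IOException;
}

// Hypothetical stand-in for MongodbWriter, backed by a map instead of MongoDB.
class InMemoryWriterSketch implements IndexWriterSketch {
    private final Map<String, String> docs = new LinkedHashMap<>();

    @Override
    public void write(String key, String text) throws IOException {
        docs.put(key, text);
    }

    // A non-abstract class must implement every method of the interface.
    // Removing this method reproduces the "is not abstract and does not
    // override abstract method delete(String)" compile error.
    @Override
    public void delete(String key) throws IOException {
        // Intentionally empty, mirroring the workaround from the thread.
    }

    @Override
    public void close() throws IOException {
        // Nothing to release in this in-memory sketch.
    }

    int size() {
        return docs.size();
    }
}
```

Note that an empty delete only satisfies the compiler: any delete request passed to the writer is silently ignored, so previously stored documents stay where they are. A fuller implementation would presumably remove the matching document from the store instead.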