Hi Prashant,

I think the fastest approach is to use Nutch 2.1, as Tejas says. It lets you
plug in your own back-end DB through Apache Gora, but Gora currently supports
only a limited set of stores (HBase, Cassandra, etc.).

But if you want to modify the Nutch 1.x source code to meet your needs, look
at the ParseOutputFormat class. It is used to output the parsed data,
including content, outlinks, metadata, etc. You can implement your own
ParseOutputFormat to direct that information to your DB.
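
To give a rough idea of the shape such a writer could take, here is a minimal
sketch. It deliberately does not use the real Nutch or Hadoop classes, so it
runs on its own: ParsedPage, PageSink, InMemorySink and Writer are all
hypothetical stand-ins that I made up for illustration. In Nutch 1.x the
record writer is handed the URL as the key and the parse result as the value;
a real implementation would replace InMemorySink with something that wraps
your JDBC connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DbDirectedWriter {

    /** Hypothetical stand-in for the parsed data: text, outlinks, metadata. */
    static final class ParsedPage {
        final String text;
        final List<String> outlinks;
        final Map<String, String> metadata;

        ParsedPage(String text, List<String> outlinks, Map<String, String> metadata) {
            this.text = text;
            this.outlinks = outlinks;
            this.metadata = metadata;
        }
    }

    /** Hypothetical destination; a real one would issue INSERTs over JDBC. */
    interface PageSink {
        void save(String url, ParsedPage page);
        void close();
    }

    /** Keeps rows in memory so the sketch runs without a database. */
    static final class InMemorySink implements PageSink {
        final List<String> rows = new ArrayList<String>();
        boolean closed = false;

        public void save(String url, ParsedPage page) {
            rows.add(url + " | " + page.text
                    + " | outlinks=" + page.outlinks.size()
                    + " | " + page.metadata);
        }

        public void close() {
            closed = true;
        }
    }

    /** Mirrors the write-then-close life cycle of a record writer. */
    static final class Writer {
        private final PageSink sink;

        Writer(PageSink sink) {
            this.sink = sink;
        }

        void write(String url, ParsedPage page) {
            sink.save(url, page);   // one row per parsed page
        }

        void close() {
            sink.close();           // release the connection when the task ends
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        Writer writer = new Writer(sink);

        Map<String, String> meta = new LinkedHashMap<String, String>();
        meta.put("Content-Type", "text/html");

        writer.write("http://example.com/",
                new ParsedPage("hello", Arrays.asList("http://example.com/a"), meta));
        writer.close();

        for (String row : sink.rows) {
            System.out.println(row);
        }
    }
}
```

Keeping the DB code behind a small interface like PageSink means the part
that touches Nutch stays tiny, which should make it easier to carry across
Nutch upgrades.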

But I still do not recommend modifying the source code.


On Fri, Feb 22, 2013 at 2:45 PM, Prashant More (प्रशांत मोरे) <
[email protected]> wrote:

> Thank you Tejas.
>
> Your tips helped a lot.
>
> One more thing: after building, the plugin.folders property should point
> to build/plugins for executing the crawl.
>
> Now it is crawling fine. My concern is to locate the object that holds the
> content and its metadata, so that I can capture it and direct it to my DB,
> as mentioned earlier. How can I do that?
>
> Thanks,
>
> --
> Prashant More
>
>
> On Thu, Feb 7, 2013 at 11:40 AM, Tejas Patil <[email protected]> wrote:
>
> > On Wed, Feb 6, 2013 at 9:23 PM, Prashant More (प्रशांत मोरे) <
> > [email protected]> wrote:
> >
> > > Thank you Tejas.
> > > I have added all the libraries/jars mentioned in [1], along with my
> > > source jar and other required jars, to the classpath. The differences
> > > between the bin/nutch script and the tutorial [1] are that the script
> > > adds Java's tools.jar, and that we do not add Nutch's build directory
> > > in Eclipse, since we want to build Nutch from source.
> >
> > Ok.
> >
> >
> > > I have added the tools.jar and instead of
> > > build directory, I have added nutch's java source to the classpath.
> > >
> > > [1] http://wiki.apache.org/nutch/RunNutchInEclipse
> > >
> > > Still it is giving the same error.
> > >
> > What is the name of the package that you are adding: is it
> > org.apache.nutch.XXXX or something else?
> > How do you compile the code in Eclipse: by running the ant build file or
> > some other way?
> > These are the relevant chunks in build.xml [1] that might help you: lines
> > 86-100 and 455-460.
> > If you are running the ant build file, try printing the classpath formed
> > in the compile-core target ([2] explains how to do that).
> > There are 2 possibilities:
> > 1. The extra jars you added are not in the classpath: in this case, you
> > can debug the "copy-libs" target and check what is getting copied.
> > 2. The extra jars you added are in the classpath and yet you see a
> > compilation error: this would be strange, but it points towards an
> > Eclipse + ant issue and probably has nothing to do with Nutch.
> >
> > [1] : http://svn.apache.org/viewvc/nutch/trunk/build.xml?view=markup
> > [2] : http://www.javalobby.org/java/forums/t71033.html
> >
> >
> > >
> > > Thanks,
> > > Prashant More
> > >
> > > On Thu, Feb 7, 2013 at 5:30 AM, Tejas Patil <[email protected]> wrote:
> > >
> > > > If you look at the bin/nutch script, there are a lot of things that
> > > > are added to the classpath before the actual Nutch class is invoked.
> > > > Looking at the script will give you a hint about what is missing.
> > > > Also, beware of your package naming. The build script looks for source
> > > > files only in specific places, e.g.
> > > > includes="org/apache/nutch/**/*.java"
> > > > Tweaking the build file or placing your classes in the right place
> > > > might help you here.
> > > >
> > > > thanks,
> > > > Tejas Patil
> > > >
> > > >
> > > >
> > > > On Wed, Feb 6, 2013 at 12:30 AM, Prashant More (प्रशांत मोरे) <
> > > > [email protected]> wrote:
> > > >
> > > > > Thank you, Tejas.
> > > > >
> > > > > My DB is already in place. For processing, I have configured and
> > > > > used Nutch 1.0 from a shell script, but I want to configure and
> > > > > modify Nutch 1.5 using Eclipse. So at present I do not want to use
> > > > > 2.1.
> > > > >
> > > > > Thanks,
> > > > > Prashant More
> > > > >
> > > > > On Wed, Feb 6, 2013 at 12:41 PM, Tejas Patil <
> > [email protected]
> > > > > >wrote:
> > > > >
> > > > > > Have you considered using Nutch 2.x? It has support for doing
> > > > > > this. Search for "nutch 2.x mySQL" to find some good tutorials
> > > > > > like [0].
> > > > > >
> > > > > > [0] : http://nlp.solutions.asia/?p=180
> > > > > >
> > > > > > Thanks,
> > > > > > Tejas Patil
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 5, 2013 at 10:24 PM, Prashant More (प्रशांत मोरे) <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I am modifying the Nutch source to direct the crawled content
> > > > > > > to a MySQL DB, in my own database structure, for further
> > > > > > > processing. Initially, I configured the Nutch 1.5 source with
> > > > > > > Eclipse Juno, and it crawls the data on my file system as
> > > > > > > expected. Then I wrote some code for directing the crawled data
> > > > > > > to my DB.
> > > > > > >
> > > > > > > I added the code to the Nutch source and added the required
> > > > > > > libraries to the build path. But at build time it is unable to
> > > > > > > find the packages from my libraries or the Hadoop packages.
> > > > > > >
> > > > > > > I placed my jars/libraries in NUTCH_HOME/lib, as this is used by
> > > > > > > build.xml for compiling.
> > > > > > >
> > > > > > > The build shows a compile error; however, when I made changes
> > > > > > > in the Nutch source, it did not show any errors.
> > > > > > >
> > > > > > > Kindly let me know what I am missing.
> > > > > > > --
> > > > > > > More Prashant
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
