or use https://issues.apache.org/jira/browse/NUTCH-1047 and write your own
indexing backend. That's exactly what NUTCH-1047 is for.
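
To make the idea concrete, here is a minimal Java sketch of the shape such a
backend could take. Everything named in it (the DocumentSink interface, the
SqlInsertSink class, the "pages" table) is hypothetical and invented for
illustration; it is not the NUTCH-1047 API, and it assumes field names are
trusted column names. A real backend would hand the generated statements to
JDBC instead of keeping them in a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an indexing-backend contract; NOT the Nutch API. */
interface DocumentSink {
    void write(Map<String, String> fields);
    void commit();
}

/** Buffers documents and turns each one into a parameterised INSERT on commit. */
class SqlInsertSink implements DocumentSink {
    private final String table;
    private final List<Map<String, String>> buffer = new ArrayList<>();
    private final List<String> statements = new ArrayList<>();

    SqlInsertSink(String table) {
        this.table = table;
    }

    @Override
    public void write(Map<String, String> fields) {
        // Copy the map so later changes by the caller do not affect the buffer.
        buffer.add(new LinkedHashMap<>(fields));
    }

    @Override
    public void commit() {
        for (Map<String, String> doc : buffer) {
            // Assumption: field names are trusted and map 1:1 to column names.
            String columns = String.join(", ", doc.keySet());
            String marks = String.join(", ", Collections.nCopies(doc.size(), "?"));
            statements.add("INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")");
        }
        buffer.clear();
    }

    /** Statements generated so far; a real sink would execute them via JDBC. */
    List<String> statements() {
        return statements;
    }
}

class SinkDemo {
    public static void main(String[] args) {
        SqlInsertSink sink = new SqlInsertSink("pages");
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("title", "Example");
        sink.write(doc);
        sink.commit();
        sink.statements().forEach(System.out::println);
    }
}
```

The point of the split between write() and commit() is that the crawler side
only hands over documents, while everything specific to the database stays
behind one small interface.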

On 22 February 2013 09:10, feng lu <[email protected]> wrote:

> Hi Prashant
>
> I think the fastest method is to use Nutch 2.1, as Tejas says; it can be
> extended to your own back-end DB through Apache Gora. But it currently only
> supports HBase, Cassandra, etc.
>
> But if you want to modify the source code of Nutch 1.x to meet your needs,
> you can look at the ParseOutputFormat class. It is used to output the parsed
> data, including content, outlinks, metadata, etc. You can implement your own
> ParseOutputFormat to direct that information to your DB.
>
> But I still do not recommend modifying the source code.
>
>
> On Fri, Feb 22, 2013 at 2:45 PM, Prashant More (प्रशांत मोरे) <
> [email protected]> wrote:
>
> > Thank you Tejas.
> >
> > Your tips helped a lot.
> >
> > One more thing: after building, the plugin.folders property should point
> > to build/plugins for executing the crawl.
> >
> > Now it is crawling fine. My concern is to locate the object which holds the
> > content and its metadata, so that I can capture it and direct it to my DB,
> > as mentioned earlier. How do I do that?
> >
> > Thanks,
> >
> > --
> > Prashant More
> >
> >
> > On Thu, Feb 7, 2013 at 11:40 AM, Tejas Patil <[email protected]> wrote:
> >
> > > On Wed, Feb 6, 2013 at 9:23 PM, Prashant More (प्रशांत मोरे) <
> > > [email protected]> wrote:
> > >
> > > > Thank you Tejas.
> > > > I have added all the libraries/jars mentioned in [1], along with my
> > > > source jar and other required jars, to the classpath. The difference
> > > > between the bin/nutch script and the tutorial [1] is adding Java's
> > > > tools.jar in the script, and not adding Nutch's build directory in
> > > > Eclipse, as we want to use the source for building Nutch.
> > >
> > > Ok.
> > >
> > >
> > > > I have added tools.jar and, instead of the build directory, I have
> > > > added Nutch's Java source to the classpath.
> > > >
> > > > [1] http://wiki.apache.org/nutch/RunNutchInEclipse
> > > >
> > > > Still it is giving the same error.
> > > >
> > > What is the name of the package that you are adding: is it
> > > org.apache.nutch.XXXX or something else?
> > > How do you compile the code in Eclipse: by running the ant build file or
> > > some other way?
> > > These are the relevant chunks in build.xml [1] that might help you: lines
> > > 86-100, 455-460.
> > > If you are running the ant build file, try to print the classpath formed
> > > in the compile-core target ([2] tells how to do that).
> > > There are 2 possibilities:
> > > 1. The extra jars you added are not in the classpath: in this case, you
> > > can debug the "copy-libs" target and check what is getting copied.
> > > 2. The extra jars you added are in the classpath and yet you see a
> > > compilation error: this would be strange, but it points towards an
> > > Eclipse + ant issue and probably has nothing to do with Nutch.
> > >
> > > [1] : http://svn.apache.org/viewvc/nutch/trunk/build.xml?view=markup
> > > [2] : http://www.javalobby.org/java/forums/t71033.html
> > >
> > >
> > > >
> > > > Thanks,
> > > > Prashant More
> > > >
> > > > On Thu, Feb 7, 2013 at 5:30 AM, Tejas Patil <[email protected]> wrote:
> > > >
> > > > > If you look at the bin/nutch script, there are a lot of things that
> > > > > are added to the CP before the actual nutch class is invoked. Looking
> > > > > at the script, you will get a hint about what is missing. Also, beware
> > > > > of your package naming. The build script looks only at specific places
> > > > > for source files, e.g.
> > > > > includes="org/apache/nutch/**/*.java"
> > > > > Tweaking the build file or placing your classes in the right place
> > > > > might help you here.
> > > > >
> > > > > thanks,
> > > > > Tejas Patil
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Feb 6, 2013 at 12:30 AM, Prashant More (प्रशांत मोरे) <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Thank you, Tejas.
> > > > > >
> > > > > > My DB is already in place. For processing, I have configured and
> > > > > > used Nutch 1.0 from a shell script, but I want to configure and
> > > > > > modify Nutch 1.5 using Eclipse. So at present I do not want to use
> > > > > > 2.1.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant More
> > > > > >
> > > > > > On Wed, Feb 6, 2013 at 12:41 PM, Tejas Patil <[email protected]> wrote:
> > > > > >
> > > > > > > Have you considered using Nutch 2.x? It has support for doing
> > > > > > > this. Google "nutch 2.x mySQL" to get some good tutorials like [0].
> > > > > > >
> > > > > > > [0] : http://nlp.solutions.asia/?p=180
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tejas Patil
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Feb 5, 2013 at 10:24 PM, Prashant More (प्रशांत मोरे) <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >    I am modifying the Nutch source to direct the crawled content
> > > > > > > > to a MySQL DB, in my own database structure, for further
> > > > > > > > processing. Initially, I configured the Nutch 1.5 source with
> > > > > > > > Eclipse Juno, and it crawls the data onto my file system, as
> > > > > > > > expected. Then I wrote some code for directing the crawled data
> > > > > > > > to my DB.
> > > > > > > >
> > > > > > > > I added the code to the Nutch source and added the required
> > > > > > > > libraries to the build path. But at build time it is unable to
> > > > > > > > find the packages in my libraries or the Hadoop packages.
> > > > > > > >
> > > > > > > > I placed my jars/libraries in NUTCH_HOME/lib, as this is used by
> > > > > > > > build.xml for compiling.
> > > > > > > >
> > > > > > > > It shows a compile error while building; however, when I made
> > > > > > > > the changes in the Nutch source, it did not show any errors.
> > > > > > > >
> > > > > > > > Kindly let me know what I am missing.
> > > > > > > > --
> > > > > > > > More Prashant
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
