Or use NUTCH-1047 (https://issues.apache.org/jira/browse/NUTCH-1047) and write your own indexing backend. That is exactly what that issue is for.
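To give a rough idea of what "your own indexing backend" could look like, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the names CrawledDoc, IndexingBackend and InMemoryBackend are hypothetical and are not the interface defined in NUTCH-1047, and the in-memory list merely stands in for a real database table (a MySQL-backed version would issue a batched INSERT in commit() instead).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one crawled document: a URL plus named fields.
final class CrawledDoc {
    final String url;
    final Map<String, String> fields = new LinkedHashMap<>();

    CrawledDoc(String url) { this.url = url; }

    CrawledDoc with(String name, String value) {
        fields.put(name, value);
        return this;
    }
}

// Hypothetical backend contract (an assumption, NOT the Nutch API):
// open once, write many documents, commit batches, close at the end.
interface IndexingBackend {
    void open();
    void write(CrawledDoc doc);
    void commit();
    void close();
}

// Buffers documents and flushes them in batches. The "store" list is a
// placeholder for a database table.
final class InMemoryBackend implements IndexingBackend {
    private final int batchSize;
    private final List<CrawledDoc> buffer = new ArrayList<>();
    private final List<CrawledDoc> store = new ArrayList<>();
    private boolean open;

    InMemoryBackend(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    @Override public void open() { open = true; }

    @Override public void write(CrawledDoc doc) {
        if (!open) throw new IllegalStateException("backend is not open");
        buffer.add(doc);
        if (buffer.size() >= batchSize) commit(); // flush a full batch
    }

    @Override public void commit() {
        store.addAll(buffer); // a real backend would run a batched INSERT here
        buffer.clear();
    }

    @Override public void close() {
        commit(); // flush whatever is still buffered
        open = false;
    }

    List<CrawledDoc> stored() { return new ArrayList<>(store); }

    int pending() { return buffer.size(); }
}

class BackendDemo {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend(2);
        backend.open();
        backend.write(new CrawledDoc("http://example.com/a").with("title", "A"));
        backend.write(new CrawledDoc("http://example.com/b").with("title", "B"));
        backend.write(new CrawledDoc("http://example.com/c").with("title", "C"));
        backend.close();
        System.out.println(backend.stored().size() + " documents stored");
    }
}
```

The point of the shape is that the crawler code only ever sees the small interface, so swapping the storage (files, MySQL, something else) does not require touching the rest of the source.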
On 22 February 2013 09:10, feng lu <[email protected]> wrote:

> Hi Prashant
>
> I think the fastest method is to use Nutch 2.1, as Tejas says; it lets
> you plug in your own back-end DB through Apache Gora. But it currently
> only supports HBase, Cassandra etc.
>
> But if you want to modify the source code of Nutch 1.x to meet your needs,
> you can look at the ParseOutputFormat class. It is used to output the
> parsed data, including content, outlinks, metadata etc. You can implement
> your own ParseOutputFormat to direct the information to your DB.
>
> But I still do not recommend modifying the source code.
>
>
> On Fri, Feb 22, 2013 at 2:45 PM, Prashant More (प्रशांत मोरे) <
> [email protected]> wrote:
>
> > Thank you Tejas.
> >
> > Your tips helped a lot.
> >
> > One more thing: after building, the plugin.folder property should
> > point to build/plugins for executing the crawl.
> >
> > Now it is crawling fine. My concern is to locate the object which has
> > the content and its metadata, so that I can capture that and direct it
> > to my DB, as mentioned earlier. How to do that?
> >
> > Thanks,
> >
> > --
> > Prashant More
> >
> >
> > On Thu, Feb 7, 2013 at 11:40 AM, Tejas Patil <[email protected]>
> > wrote:
> >
> > > On Wed, Feb 6, 2013 at 9:23 PM, Prashant More (प्रशांत मोरे) <
> > > [email protected]> wrote:
> > >
> > > > Thank you Tejas.
> > > > I have added all the libraries/jars mentioned in [1], along with my
> > > > source jar and other required jars, to the classpath. The difference
> > > > between the bin/nutch script and the tutorial [1] is adding Java's
> > > > tools.jar in the script, and not adding Nutch's build directory in
> > > > Eclipse, as we want to use the source for building Nutch.
> > >
> > > Ok.
> > >
> > > > I have added the tools.jar and, instead of the build directory, I
> > > > have added Nutch's Java source to the classpath.
> > > >
> > > > [1] http://wiki.apache.org/nutch/RunNutchInEclipse
> > > >
> > > > Still it is giving the same error.
> > > >
> > > What is the name of the package that you are adding: is it
> > > org.apache.nutch.XXXX or something else?
> > > How do you compile the code in Eclipse: by running the ant build file
> > > or some other way?
> > > These are the relevant chunks in build.xml [1] that might help you:
> > > lines 86-100, 455-460.
> > > If you are running the ant build file, try to print the classpath
> > > formed in the compile-core target ([2] tells how to do that).
> > > There are 2 possibilities:
> > > 1. The extra jars you added are not in the classpath: in this case,
> > > you can debug the "copy-libs" target and check what is getting copied.
> > > 2. The extra jars you added are in the classpath and yet you see a
> > > compilation error: this would be strange, but it points towards an
> > > Eclipse + ant issue and probably won't have to do with Nutch.
> > >
> > > [1] : http://svn.apache.org/viewvc/nutch/trunk/build.xml?view=markup
> > > [2] : http://www.javalobby.org/java/forums/t71033.html
> > >
> > > > Thanks,
> > > > Prashant More
> > > >
> > > > On Thu, Feb 7, 2013 at 5:30 AM, Tejas Patil <[email protected]>
> > > > wrote:
> > > >
> > > > > If you look at the bin/nutch script, there are a lot of things
> > > > > that are to be added to the CP before the actual Nutch class is
> > > > > invoked. Looking at the script you will get a hint about what is
> > > > > missing. Also, beware of your package naming. The build script
> > > > > looks at specific places only for source files, e.g.
> > > > > includes="org/apache/nutch/**/*.java"
> > > > > Tweaking the build file or placing your classes at the right place
> > > > > might help you here.
> > > > >
> > > > > thanks,
> > > > > Tejas Patil
> > > > >
> > > > >
> > > > > On Wed, Feb 6, 2013 at 12:30 AM, Prashant More (प्रशांत मोरे) <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Thank you, Tejas.
> > > > > >
> > > > > > My DB is already in place. For processing, I have configured and
> > > > > > used Nutch 1.0 from a shell script, but I want to configure and
> > > > > > modify Nutch 1.5 using Eclipse. So at present I do not want to
> > > > > > use 2.1.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant More
> > > > > >
> > > > > > On Wed, Feb 6, 2013 at 12:41 PM, Tejas Patil <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Have you considered using Nutch 2.x? It has support for doing
> > > > > > > this. Search for "nutch 2.x mySQL" to get some good tutorials
> > > > > > > like [0].
> > > > > > >
> > > > > > > [0] : http://nlp.solutions.asia/?p=180
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tejas Patil
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Feb 5, 2013 at 10:24 PM, Prashant More (प्रशांत मोरे) <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I am modifying the Nutch source to direct the crawled
> > > > > > > > content to a MySQL DB in my own database structure for
> > > > > > > > further processing. Initially, I configured the Nutch 1.5
> > > > > > > > source with Eclipse Juno and it crawls the data onto my file
> > > > > > > > system, as expected. Then I wrote some code for directing
> > > > > > > > the crawled data to my DB.
> > > > > > > >
> > > > > > > > I added the code to the Nutch source and added the required
> > > > > > > > libraries to the build path. But at build time it is unable
> > > > > > > > to find the packages in my libraries, or the hadoop packages.
> > > > > > > >
> > > > > > > > I placed my jars/libraries in NUTCH_HOME/lib, as this is used
> > > > > > > > by build.xml for compiling.
> > > > > > > >
> > > > > > > > It shows a compile error while building; however, when I
> > > > > > > > made changes in the Nutch source it did not show any errors.
> > > > > > > >
> > > > > > > > Kindly let me know what I am missing.
> > > > > > > > --
> > > > > > > > More Prashant
> > > > > > > >
>
> --
> Don't Grow Old, Grow Up... :-)

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

