Re: fetcher : some doubts

2007-01-02 Thread Sean Dean
Okay, I actually just wrote you a long email of what to do, step by step, but when I tried to send it, my web mail session timed out and forced me to re-login, losing it all... I'm not happy :( But straight to the point: since you're using the older 0.7 code-base you can use partially fetched

Re: fetcher : some doubts

2007-01-02 Thread Justin Hartman
On 1/2/07, Sean Dean wrote: There actually isn't much of a reason to generate huge multi-million page fetch lists when you can create lots of smaller ones and merge them together. This allows for more of a ladder-style approach, and in some cases reduces the risk of errors in

Re: fetcher : some doubts

2007-01-02 Thread shrinivas patwardhan
Thank you, Sean Dean, that sounds good. I will try it out. Tell me if I am right: in case a dmoz index file is injected into the db, I generate only a few segments by using -subset and then fetch them, and then go on and generate the next set of segments. I hope I am heading in the right

Re: fetcher : some doubts

2007-01-02 Thread Sean Dean
You need to delete the old index before you re-index when working within the same directory structure. This is the procedure I follow, which is pretty much what you're doing. This assumes you already have at least one active segment and index. Edit as needed. bin/nutch generate crawl/crawldb

Re: fetcher : some doubts

2007-01-02 Thread Sean Dean
I'm glad you got the slowness issue straightened out. When you import the dmoz urls into your Nutch DB, the -subset command isn't really meant to limit the size of your fetch lists. This becomes even more true when you start re-fetching. You can actually skip the subset command and allow all

Re: fetcher : some doubts

2007-01-02 Thread shrinivas patwardhan
Oh, I understand it now. Thanks again for your help, Sean. I was wondering if anyone would be interested in making a GUI to set up and run the crawl, say for novice users. I don't know if there is one already. I would be glad to help if people are keen on making one. Thanks and regards, Shrinivas

Re: fetcher : some doubts

2007-01-02 Thread Sean Dean
There is currently open development on a Nutch administration GUI for version 0.9. I have not tested or even really looked at it myself, but apparently most of the features work. This will not work on your version, but here is the link to JIRA where you can find the patches and ongoing

Re: fetcher : some doubts

2007-01-02 Thread Justin Hartman
On 1/2/07, Sean Dean wrote: You need to delete the old index before you re-index when working within the same directory structure. This is the procedure I follow, which is pretty much what you're doing. This assumes you already have at least one active segment and index. Edit as

RE: fetcher : some doubts

2007-01-02 Thread Alan Tanaman
As an interim solution when using the Nutch front end, what we did is generate the new index in a temporary folder. Then our script (Ant actually) would turn off the web server (Tomcat in our case) to free the existing index from the Nutch bean, and do a quick switcheroo using OS rename commands.
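
The swap described above can be sketched roughly as follows. This is only an illustration in Python, not the Ant script from the message: the function name `swap_index`, the `.old` backup suffix, and the `stop_server` / `start_server` callbacks (standing in for whatever stops and restarts Tomcat) are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Callable


def swap_index(live: Path, staged: Path,
               stop_server: Callable[[], None],
               start_server: Callable[[], None]) -> Path:
    """Replace the live index directory with a freshly built one.

    Assumes `live` and `staged` sit on the same filesystem, so each
    rename is a quick metadata operation rather than a copy.
    Returns the path where the previous index was parked.
    """
    backup = live.with_name(live.name + ".old")  # hypothetical naming scheme
    if backup.exists():
        raise FileExistsError(f"stale backup present: {backup}")

    stop_server()                    # release the handles held on the live index
    try:
        os.rename(live, backup)      # park the old index
        try:
            os.rename(staged, live)  # promote the new index
        except OSError:
            os.rename(backup, live)  # roll back so the old index is served again
            raise
    finally:
        start_server()               # restart whether or not the swap succeeded
    return backup
```

The old index is parked rather than deleted, so it can be restored by hand if the new one turns out to be bad; the downtime is limited to the two renames plus the server restart.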

Re: fetcher : some doubts

2007-01-02 Thread Sean Dean
Looking at what I wrote, yes, it will not be acceptable for a production environment. What I failed to mention is that I copy the completed crawl directory somewhere else, and point Tomcat to look there instead via nutch-site.xml. When you have completed all those steps, and have your new index

Re: Error on convert to 0.9 during mergesegs step

2007-01-02 Thread Alan Tanaman
I'm getting a similar problem with 0.9, but during the injector: 2007-01-02 21:30:25,797 INFO crawl.Injector - Injector: Merging injected urls into crawl db. 2007-01-02 21:30:25,906 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java

Re: Error on convert to 0.9 during mergesegs step

2007-01-02 Thread Andrzej Bialecki
Alan Tanaman wrote: I'm getting a similar problem with 0.9, but during the injector: 2007-01-02 21:30:25,797 INFO crawl.Injector - Injector: Merging injected urls into crawl db. 2007-01-02 21:30:25,906 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform...

RE: Error on convert to 0.9 during mergesegs step

2007-01-02 Thread Alan Tanaman
Thanks for pointing that out, Andrzej. I think that the problem is actually that our native Java Nutch launcher needs to be rebuilt with the Hadoop 0.9 jar. Best regards, Alan