-numFetchers in generate command

2006-08-03 Thread Murat Ali Bayir
Hi everybody, Although we give the number of fetchers in the generate command, our system always produces a fixed number of parts in the reduce process. What could be the reason for this? Do we have to change anything in the Hadoop configuration file?

Recrawl urls

2006-08-03 Thread Nahuel ANGELINETTI
Hello, I was searching for a way to add new URLs to the crawl URL list and to recrawl all URLs... Can you help me? Thanks, -- Nahuel ANGELINETTI

Re: Recrawl urls

2006-08-03 Thread Lourival Júnior
Hi Nahuel! You could use the command bin/nutch inject $nutch-dir/db -urlfile urlfile.txt. To recrawl your WebDB you can use this script: http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html Take a look at the adddays argument and at the configuration property

Re: 0.8 Recrawl script updated

2006-08-03 Thread Lourival Júnior
Hi Matthew! Could you update the script for version 0.7.2 with the same functionality? I wrote a script that does this, but it doesn't work very well... Regards! On 8/2/06, Matthew Holt [EMAIL PROTECTED] wrote: Just letting everyone know that I updated the recrawl script on the Wiki. It now

Re: Recrawl urls

2006-08-03 Thread Nahuel ANGELINETTI
Hi, Thanks a lot for this information, it is very helpful. Happy to see the community is active! Regards, -- Nahuel ANGELINETTI On Thu, 3 Aug 2006 08:31:22 -0300, Lourival Júnior [EMAIL PROTECTED] wrote: Hi Nahuel! You could use the command bin/nutch inject $nutch-dir/db

configuration property fetcher.store.content

2006-08-03 Thread Timo Scheuer
Hi, I am not sure what setting the property fetcher.store.content to false means. Is the only consequence that there is no cached page data available to be accessed from search results, or does this property play an important role at any other stage in the overall index and search

Re: Howto deploy a ROOT.war (if needed)

2006-08-03 Thread Timo Scheuer
On Wednesday, 26 July 2006 16:09, NG-Marketing, M.Schneider wrote: after moving the ROOT.war file into the Tomcat5 webapps directory and restarting Tomcat, the ROOT.war will be extracted into the ROOT directory automatically. Then, after changing, for example, my search.jsp and restarting the

Re: Recrawl urls

2006-08-03 Thread Nahuel ANGELINETTI
But the websites just added haven't been crawled yet... And they're not crawled during the recrawl... Does bin/nutch purge restart everything? On Thu, 3 Aug 2006 09:21:04 -0300, Lourival Júnior [EMAIL PROTECTED] wrote: In the Nutch conf/nutch-default.xml configuration file there is a property called

Re: Recrawl urls

2006-08-03 Thread Lourival Júnior
Which version are you using? On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote: But the websites just added haven't been crawled yet... And they're not crawled during the recrawl... Does bin/nutch purge restart everything? On Thu, 3 Aug 2006 09:21:04 -0300, Lourival Júnior [EMAIL PROTECTED] wrote

Re: Recrawl urls

2006-08-03 Thread Nahuel ANGELINETTI
Nutch 0.7.2. On Thu, 3 Aug 2006 09:37:24 -0300, Lourival Júnior [EMAIL PROTECTED] wrote: Which version are you using? On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote: But the websites just added haven't been crawled yet... And they're not crawled during the recrawl... Does

Re: Recrawl urls

2006-08-03 Thread Lourival Júnior
The command bin/nutch purge doesn't exist. I can't tell you what is happening. Please send me the output you get when you run the recrawl. On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote: Nutch 0.7.2. On Thu, 3 Aug 2006 09:37:24 -0300, Lourival Júnior [EMAIL PROTECTED] wrote: Which version

Re: How to add database to an existing nutch index?

2006-08-03 Thread Timo Scheuer
On Thursday, 27 July 2006 12:15, Patrick Kratzenstein wrote: If the next search is now started with a term contained in the database, it doesn't give me any results! Why?! I've tried it several times and I have really started to believe that there must be something about the fetch tools in

NullPointException

2006-08-03 Thread Lourival Júnior
Why, when I delete some segments that have reached db.default.fetch.interval, does the search application throw a NullPointerException? Periodically I have to recrawl my site, and deleting old segments is a problem. Does anyone have a suggestion? Regards -- Lourival Junior Universidade Federal do Pará Curso

Re: 0.8 Recrawl script updated

2006-08-03 Thread Matthew Holt
I'm currently pretty busy at work. If I have time I'll do it later. The version 0.8 recrawl script has a working version online now. I temporarily modified it on the website yesterday when I ran into some problems, but I tested it further and the actual working code is what is posted now. So if you got

Re: -numFetchers in generate command

2006-08-03 Thread Andrzej Bialecki
Murat Ali Bayir wrote: Hi everybody, Although we give the number of fetchers in the generate command, our system always produces a fixed number of parts in the reduce process. What could be the reason for this? Do we have to change anything in the Hadoop configuration file? Most probably you put the numbers of

Re: NullPointException

2006-08-03 Thread Marko Bauhardt
Hi, if you delete segments, be sure that you don't have an index built from those segments. The segment contains the parsed content, and the index is the index of this content. If you delete the segment and then run a search on this index, an NPE occurs because no summaries (parsed content) are

Re: NullPointException

2006-08-03 Thread Lourival Júnior
/segments/20060717150815 060803 132736 true 20060718-09:02:56 20060718-09:03:10 5 crawl-legislacao_copia/segments/20060718090250 060803 132736 true 20060803-10:55:18 20060803-12:53:49 1541 crawl-legislacao_copia/segments/20060803105509 060803 132736 true 20060803

Re: NullPointException

2006-08-03 Thread Marko Bauhardt
On 03.08.2006 at 18:52, Lourival Júnior wrote: My questions: Why does it occur? How can I know which segments can be deleted? You must know which segments are indexed. You cannot index all segments and then delete those segments. The Indexer indexes the name of the segment that the
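The rule described in this reply (only delete a segment if no index still refers to it) can be illustrated with a small sketch. This is not part of Nutch: the function name deletable_segments and the idea of passing in the set of indexed segment names by hand are assumptions made for illustration, and how you obtain that set depends on your own setup.

```python
import os
import tempfile
from pathlib import Path


def deletable_segments(segments_dir, indexed_names):
    """Return segment directories that no index refers to, oldest name first.

    segments_dir  -- directory holding one sub-directory per segment
    indexed_names -- names of segments that are still indexed (assumed to be
                     supplied by the caller; this sketch does not read an index)
    """
    indexed = set(indexed_names)
    return sorted(
        p for p in Path(segments_dir).iterdir()
        if p.is_dir() and p.name not in indexed
    )


if __name__ == "__main__":
    # Demo on a throwaway directory, using segment names in the usual
    # timestamp style. Nothing is deleted; candidates are only listed.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("20060717150815", "20060718090250", "20060803105509"):
            os.mkdir(os.path.join(tmp, name))
        still_indexed = {"20060718090250", "20060803105509"}
        for segment in deletable_segments(tmp, still_indexed):
            print("safe to delete:", segment.name)
```

The sketch only lists candidates and never removes anything, so the result can be checked before any segment is actually deleted.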

ZIP plugin in nutch 0.7.2

2006-08-03 Thread Lourival Júnior
Hi all!! Could I use the zip plugin from nutch 0.8 in nutch 0.7.2? Is there any problem? Regards. -- Lourival Junior Universidade Federal do Pará Curso de Bacharelado em Sistemas de Informação http://www.ufpa.br/cbsi Msn: [EMAIL PROTECTED]

Re: 0.8 Recrawl script updated

2006-08-03 Thread Matthew Holt
Last email regarding this script. I found a bug in it that is sporadic (I think it only affected certain setups). However, since it would sometimes be a problem, I refactored the script. I'd suggest you redownload the script if you are using it. Matt Matthew Holt wrote: I'm currently

Re: 0.8 Recrawl script updated

2006-08-03 Thread Lukas Vlcek
Hi Matthew, I am curious about one thing. How do you know you can just drop the $depth oldest segments at the end? I haven't studied the Nutch code regarding this topic yet, but I thought that a segment can be dropped once you are sure that all its content is already crawled in some