Hi everybody, although we specify the number of fetchers in the generate
command, our system always produces a fixed number of parts in the reduce
process. What can be the reason for this? Do we have to change anything in
the Hadoop configuration file?
Hello,
I was searching for a way to add new URLs to the crawling URL list,
and how to recrawl all URLs...
Can you help me?
thanks,
--
Nahuel ANGELINETTI
Hi Nahuel!
You could use the command bin/nutch inject $nutch-dir/db -urlfile
urlfile.txt. To recrawl your WebDB you can use this script:
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
Take a look at the adddays argument and at the configuration property
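For reference, the generate/fetch/update loop at the heart of such a recrawl script can be sketched as below. This is a minimal sketch assuming the 0.7-era command set; the paths, the depth, and the -adddays value are placeholders you would adapt:

```sh
#!/bin/sh
# Minimal recrawl sketch for Nutch 0.7.x -- all paths are placeholders.
nutch_dir=crawl        # directory containing db/ and segments/
depth=3                # number of generate/fetch/update rounds
adddays=5              # treat pages due within this many days as refetchable

i=1
while [ $i -le $depth ]; do
  # Select due pages into a new segment, honoring -adddays.
  bin/nutch generate $nutch_dir/db $nutch_dir/segments -adddays $adddays
  segment=`ls -d $nutch_dir/segments/* | tail -1`   # newest segment
  bin/nutch fetch $segment                          # fetch its pages
  bin/nutch updatedb $nutch_dir/db $segment         # fold results into WebDB
  i=`expr $i + 1`
done
```

After the loop, the new segments still have to be indexed (bin/nutch index) and de-duplicated before the search application will see them.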
Hi Matthew!
Could you update the script to version 0.7.2 with the same
functionality? I wrote a script that does this, but it doesn't work
very well...
Regards!
On 8/2/06, Matthew Holt [EMAIL PROTECTED] wrote:
Just letting everyone know that I updated the recrawl script on the
Wiki. It now
Hi,
Thanks a lot for all this information, it is very helpful.
Happy to see the community is active !
Regards,
--
Nahuel ANGELINETTI
On Thu, 3 Aug 2006 08:31:22 -0300,
Lourival Júnior [EMAIL PROTECTED] wrote:
Hi Nahuel!
You could use the command bin/nutch inject $nutch-dir/db
Hi,
I am not sure what setting the property fetcher.store.content to false
means. Is the only consequence that there is no cached page data
available which can be accessed from search results, or does this property
play an important role at any other stage in the overall index and search
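For context, the property lives in conf/nutch-default.xml; the fragment below is a paraphrase from memory rather than a verbatim copy, so check your own file for the exact description text:

```xml
<property>
  <name>fetcher.store.content</name>
  <value>true</value>
  <description>If true, the fetcher stores the raw page content in the
  segment. Setting it to false saves disk space, at the cost of features
  that read the stored content back, such as the cached-page view.
  </description>
</property>
```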
On Wednesday, 26 July 2006 16:09, NG-Marketing, M.Schneider wrote:
after moving the ROOT.war file in the Tomcat5 webapps directory and
restarting Tomcat, the ROOT.war will be extracted into the ROOT directory
automatically.
Then, after changing, for example, my search.jsp and restarting the
But the websites I just added haven't been crawled yet... And they're not
crawled during the recrawl...
Will bin/nutch purge restart everything?
On Thu, 3 Aug 2006 09:21:04 -0300,
Lourival Júnior [EMAIL PROTECTED] wrote:
In the Nutch conf/nutch-default.xml configuration file there exists a
property called
Which version are you using?
On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote:
But the websites I just added haven't been crawled yet... And they're not
crawled during the recrawl...
Will bin/nutch purge restart everything?
On Thu, 3 Aug 2006 09:21:04 -0300,
Lourival Júnior [EMAIL PROTECTED] wrote:
0.7.2 of nutch
On Thu, 3 Aug 2006 09:37:24 -0300,
Lourival Júnior [EMAIL PROTECTED] wrote:
Which version are you using?
On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote:
But the websites I just added haven't been crawled yet... And they're
not crawled during the recrawl...
Does
The command bin/nutch purge doesn't exist. Well, I can't tell you what is
happening. Give me the output you get when you run the recrawl.
On 8/3/06, Nahuel ANGELINETTI [EMAIL PROTECTED] wrote:
0.7.2 of nutch
On Thu, 3 Aug 2006 09:37:24 -0300,
Lourival Júnior [EMAIL PROTECTED] wrote:
Which version
On Thursday, 27 July 2006 12:15, Patrick Kratzenstein wrote:
If the next search is now started with a term contained in the database,
it doesn't give me any results! Why?!
I've tried it several times and I really started to believe that there must
be something about the fetch tools in
Why, when I delete some segments that have reached the
db.default.fetch.interval, does the search application get a
NullPointerException? Periodically I have to
recrawl my site, and deleting old segments is a problem. Does someone have a
suggestion?
Regards
--
Lourival Junior
Universidade Federal do Pará
Curso
I'm currently pretty busy at work. If I have time, I'll do it later.
The version 0.8 recrawl script has a working version online now. I
temporarily modified it on the website yesterday when I ran into some
problems, but I have since tested it further and the working code is
up now. So if you got
Murat Ali Bayir wrote:
Hi everybody, although we specify the number of fetchers in the generate
command, our system always produces a fixed number of parts in the reduce
process. What can be the reason for this? Do we have to change anything in
the Hadoop configuration file?
Most probably you put the numbers of
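If I remember correctly, the number of part files is fixed by the number of reduce tasks, not by the number of fetcher threads or map tasks. In the Hadoop configuration of that era this was the mapred.reduce.tasks property, which you can override in hadoop-site.xml (the value below is only an example):

```xml
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
  <description>The default number of reduce tasks per job. Each reduce
  task writes one part-NNNNN file, so this setting determines how many
  parts the reduce phase produces.</description>
</property>
```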
Hi,
if you delete segments, then be sure that you don't have an index
built from those segments.
The segment contains the parsed content, and the index is the index
built from this content. If you delete the segment and you do a search
on this index, an NPE occurs because no summary (parsed content) is
/segments/20060717150815
060803 132736 true 20060718-09:02:56 20060718-09:03:10
5 crawl-legislacao_copia/segments/20060718090250
060803 132736 true 20060803-10:55:18 20060803-12:53:49
1541 crawl-legislacao_copia/segments/20060803105509
060803 132736 true 20060803
On 03.08.2006 at 18:52, Lourival Júnior wrote:
My questions:
Why does it occur? How can I know which segments can be deleted?
You must know which segments are indexed. You cannot index all
segments and afterwards delete those segments.
The Indexer indexes the name of the segment that the
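Given that constraint, the delete-then-reindex order can be sketched as follows; the command names are the 0.7-era ones and every path is a placeholder, so treat this as an outline rather than a ready-made script:

```sh
# Sketch: prune a superseded segment, then rebuild the index (Nutch 0.7.x).
rm -rf crawl/segments/20060718090250     # only if you know it is superseded
for segment in crawl/segments/*; do
  bin/nutch index $segment               # re-index each remaining segment
done
bin/nutch dedup crawl/segments dedup.tmp # then remove duplicate documents
```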
Hi all!!
Could I use the zip plugin from Nutch 0.8 in Nutch 0.7.2? Is there any
problem?
Regards.
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
Last email regarding this script. I found a sporadic bug in it
(I think it only affected certain setups). However, since it could be
a problem sometimes, I refactored the script. I'd suggest you redownload
the script if you are using it.
Matt
Matthew Holt wrote:
I'm currently
Hi Matthew,
I am curious about one thing. How do you know you can just drop the $depth
oldest segments at the end? I haven't studied the Nutch
code regarding this topic yet, but I thought that a segment can be
dropped once you are sure that all its content is already crawled in
some