Hi.
How can I resume the fetch process? I want to continue fetching without
re-downloading pages that have already been fetched.
--
Best regards,
Sergey Levickiy
ICQ: 283616567
tel: +38(067)6483250
Hi there, I wonder if there is a way, after I have crawled,
to specify more than one URL to look for,
e.g.
I have the following in my crawl-urlfilter.txt:
http://www.firstUrl.com
http://www.secondUrl.com
http://*.thirdUrl.com
but I don't want results from all of these when I enter a search term
in the web interface,
so my
Hi,
Does anybody know how I can set Nutch to cache the results of searches?
I've heard about this feature, but I can't find the information.
Thanks,
Marco
Marco Vanossi wrote:
Hi,
Does anybody know how I can set Nutch to cache the results of searches?
I've heard about this feature, but I can't find the information.
Trivial web-level caching is easy to implement: just download OSCache
and modify your web application settings according to
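For what it's worth, OSCache ships a servlet filter that can cache whole rendered pages, which is the simplest way to cache search results at the web tier. A hedged sketch of a web.xml fragment (the url-pattern and the 600-second lifetime are assumptions, not values from this thread):

```xml
<!-- Hypothetical web.xml fragment: cache the rendered search page
     for 10 minutes. Class name as documented for OSCache 2.x. -->
<filter>
  <filter-name>CacheFilter</filter-name>
  <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>
  <init-param>
    <param-name>time</param-name>
    <param-value>600</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>CacheFilter</filter-name>
  <url-pattern>/search.jsp</url-pattern>
</filter-mapping>
```

Note this caches per-URL, so two searches for the same term (same query string) hit the cache, while different terms still run a real search.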
Hi,
I am using Nutch 0.9 for crawling. I recollect that
mapred.tasktracker.tasks.maximum can be used to control the max # of
tasks executed in parallel by a tasktracker.
I am running a fetch with the following config:
3 machines
My mapred-default.xml contains:
mapred.map.tasks=13
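For reference, both knobs are ordinary Hadoop properties; in the Hadoop bundled with Nutch 0.9 you would normally override them in hadoop-site.xml rather than editing the defaults file. A sketch of the property syntax (the value of 2 is only an illustration, not a recommendation):

```xml
<!-- hadoop-site.xml: cap concurrent tasks per tasktracker.
     The value 2 below is an illustrative example. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>13</value>
</property>
```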
Is it possible on some pages to crawl only between tags, or have it not
crawl between tags?
i.e.
<nocrawl>blah blah blah</nocrawl>
<crawlhere>the content only that I want to crawl</crawlhere>
<nocrawl>blah blah blah</nocrawl>
appreciate any input
kind regards
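Nutch has no such tags out of the box. One workaround is to strip the marked regions from the raw HTML before it reaches the parser, e.g. in a custom HtmlParseFilter or a small pre-parse step. A minimal sketch, assuming hypothetical `<!-- nocrawl -->` comment markers (the class name and marker names are made up for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical pre-parse step: drop page regions wrapped in
// <!-- nocrawl --> ... <!-- /nocrawl --> comment markers before
// the HTML is parsed and indexed.
public class NoCrawlStripper {

    // (?s) lets '.' match newlines; the reluctant .*? keeps each
    // match confined to one marker pair.
    private static final Pattern NOCRAWL = Pattern.compile(
            "(?s)<!--\\s*nocrawl\\s*-->.*?<!--\\s*/nocrawl\\s*-->");

    // Remove every marked region from the raw HTML.
    public static String strip(String html) {
        return NOCRAWL.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "keep this<!-- nocrawl -->drop this<!-- /nocrawl -->and this";
        System.out.println(strip(page)); // prints "keep thisand this"
    }
}
```

The inverse ("crawlhere" regions) would collect the matches instead of deleting everything else; either way the markers must be comments so browsers ignore them.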
Marco,
We use a search caching system at Filangy: it uses Lucene to save the search
string, count, date, and top 20 IDs of the pages. So all you have to do is
search for those IDs.
Yes, it still involves a search, but we have a distributed system with the
ID as the hash key for specifying on which
Philip Brown wrote:
Is it possible on some pages to crawl only between tags, or have it not
crawl between tags?
i.e.
<nocrawl>blah blah blah</nocrawl>
<crawlhere>the content only that I want to crawl</crawlhere>
<nocrawl>blah blah blah</nocrawl>
appreciate any input
kind regards
You can modify
@Dennis,
Can you explain how to setup distributed search while storing the 2
indexes on the same local machine (if possible)?
@Feng,
We created a shell script to merge 2 runs, let us know if that works for
you.
http://wiki.apache.org/nutch/MergeCrawl
Renaud
Dennis Kubes wrote:
You can
Hi all!
Has anyone successfully implemented the ZIP plugin in Nutch version 0.7.2? How
can I do this?
Regards,
--
Lourival Junior
Universidade Federal do Pará
Bachelor's Program in Information Systems
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
Hi:
Assuming you have
index 1 at /data/crawl1
index 2 at /data/crawl2
In nutch-site.xml
searcher.dir = /data
Under /data you have a text file called search-servers.txt (I think; do
check the searcher.dir description in nutch-site.xml, please)
In the text file you will have the following
hostname1
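In case the truncated example is unclear: in Nutch 0.8/0.9 that file lists one distributed-search server per line as a host/port pair, and each box runs a search server process against its local index. A sketch (host names and port numbers here are made up):

```
# /data/search-servers.txt: one "host port" pair per line
hostname1 9990
hostname2 9990
```

If I remember correctly, each server is started with `bin/nutch server <port> <crawl-dir>`, and the front end pointed at searcher.dir = /data will fan queries out to every listed server.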
The task
---
I have less than 100 GB of diverse documents (.doc, .pdf, .ppt, .txt,
.xls, etc.) to index. Dozens, or even hundreds or thousands, of
documents can change their content, be created, or be deleted every day.
The crawler will run on an HP DL380 G4 server; I don't know the exact
specs
Zaheed,
Thank you, that works well. Do you know if there is a big performance
overhead in starting 2 servers? As an alternative, could we use
Lucene's MultiSearcher?
-- Renaud
Zaheed Haque wrote:
Hi:
Assuming you have
index 1 at /data/crawl1
index 2 at /data/crawl2
In nutch-site.xml
Renaud:
Yes and no! I have done some testing, as Dennis Kubes suggested, and got
results similar to his. In short: four Nutch search servers
on one box, but on 4 different disks, with (in my case) 0.75 million docs per
disk. I had about 4 GB of memory and 1 AMD64 processor, and it worked
out
thanks, Renaud:
I figured out the same scenario as your script; it works well.
Michael
On 9/5/06, Renaud Richardet [EMAIL PROTECTED] wrote:
@Dennis,
Can you explain how to setup distributed search while storing the 2
indexes on the same local machine (if possible)?
@Feng,
We created a shell
Hi there,
I want to filter out particular URLs from the search results,
and I tried to use the segment merger to do it.
First, I put the target URLs in regex-urlfilter.txt and automaton-urlfilter.txt,
as "-http://abc.com/".
Then I ran nutch/mergesegs and nutch/index, but the search page still
shows the URLs I
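For reference, the exclusion syntax in those filter files is a leading `-` followed by a regular expression, and rules are applied top-down with the first match winning, so exclusions must come before any catch-all accept rule. A sketch (the abc.com pattern follows the example above; the catch-all is an assumption about the rest of the file):

```
# regex-urlfilter.txt: first matching rule wins,
# so put exclusions before the catch-all accept.
-^http://abc\.com/
+.
```

Note also that URL filters are applied when URLs are generated, fetched, and indexed; pages already in an existing index stay searchable until that index is rebuilt without them.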
Are there any changes to writing plugins between Nutch 0.8 and
0.7? I have some problems following the plugin guide for Nutch 0.7.
Hi,
I followed all the steps in the 0.8 tutorial, except that I have only 2
URLs in the crawl list. When I do a search in Nutch in my browser, it
can't find anything, as if there were nothing in the db or index. Does
anyone know why?
Thanks.