Are there any plans to address the issue described in the post below? URLs
with spaces are still not possible in Nutch.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg01322.html
Andrzej,
This worked very well! Thanks!
> I meant setting the fs.default.name to local, and
> mapred.job.tracker to
> local.
>
> Set mapred.map.tasks to 1, and mapred.reduce.tasks to 1. You
> can achieve
> the same effect for "generate" if you use -numFetchers 1.
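
For reference, the settings quoted above could be written as a site configuration override along these lines. This is only a sketch: the property names and values are the ones named in the message, but the file they belong in (conf/hadoop-site.xml or conf/nutch-site.xml) is an assumption and depends on your Nutch 0.8 setup.

```xml
<?xml version="1.0"?>
<!-- Sketch of a local-mode override. The file location is an assumption;
     the property names and values come from the quoted message. -->
<configuration>
  <!-- Use the local filesystem instead of a distributed one -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <!-- Run map/reduce jobs in-process rather than via a job tracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- One map task and one reduce task, so only one fetchlist part is generated -->
  <property>
    <name>mapred.map.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
</configuration>
```

As the message notes, passing -numFetchers 1 to "generate" should have the same effect for that step alone.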
Teruhiko Kurosaka wrote:
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: 2006-10-24 18:27
There was a bug in some versions of 0.8, so that if you ran it with
"local" FS & jobtracker it would generate too many parts of the
fetchlist, and then process only one randomly selec
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: 2006-10-24 18:27
> There was a bug in some versions of 0.8, so that if you ran it with
> "local" FS & jobtracker it would generate too many parts of the
> fetchlist, and then process only one randomly selected part.
> If that's
> th
Eelco Lempsink wrote:
Of course, for high volumes of data, indexing first and removing
afterwards doesn't sound like a good option in my case, where only a
small part of the fetched data needs to be indexed.
Has anyone solved this problem (elegantly)? I mainly wonder if it's
feasible to d
Hello,
I'm looking for a solution to a problem typical of domain-specific
search engines: a way to prevent certain pages from being indexed, based
on their content, while keeping the outlinks of the page. When
searching this mailing list I noticed this question (or something
similar) being aske
2006/10/23, Andrzej Bialecki <[EMAIL PROTECTED]>:
Tomi NA wrote:
> 2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>
>> Btw we have some virtual local hosts, how does the
>> db.ignore.external.links
>> deal with that?
>
> Update:
> setting db.ignore.external.links to true in nutch-site (and l
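
For readers unfamiliar with the setting mentioned in the update above, a minimal sketch of the override in conf/nutch-site.xml might look like this. Only the property name and the value "true" come from the message; the rest of that message is truncated in the archive, so any further changes it describes are not shown here.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml with the single override named in the message -->
<configuration>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```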