Hello,
I'm using SegmentMergeTool to merge some large segments, and I see that
the final index optimization (below) takes a very long time. I think this
index creation and optimization is triggered by the -i param to
SegmentMergeTool. From what I saw in SegmentMergeTool.java, this
is an
Hi,
I'd like to see your presentation, but that server is down.
Otis
--- Chris Mattmann [EMAIL PROTECTED] wrote:
Hi there Jay,
Here are some numbers that a colleague and I presented in my graduate
computer science seminar class on search engines in the Spring '05
semester at USC. The
Hi,
Does anyone know why Chris Mattmann's RSS plugin (
http://issues.apache.org/jira/browse/NUTCH-30 ) wasn't put in the
repository, and whether there are plans to revive it and include it?
Thanks,
Otis
Hello Otis,
If you are only reading ParseData and FetcherOutput from nutch segment
you do not need lucene index at all. So you can safely skip -i switch.
Regards
Piotr
On 7/21/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Hello,
I'm using SegmentMergeTool to merge some large segments, and I
Hi
I'm getting a lot of the following errors when fetching a
segment:
050721 094100 fetch okay, but can't parse
http://www.sahunt.co.za/sahunter/recepies/biltongsoup.html,
reason: failed(2,203): Content-Type not application/msword:
The page above is a pure HTML page; however, the fetch is OK but
Matthias Jaekle wrote:
050721 071234 * Optimizing index...
... this takes a long time ...
Hello,
optimizing the index takes extremely long.
I have the feeling that in earlier versions this was much faster.
I am trying to index a segment of 7,000,000 pages.
It has been running for 10 days now.
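For scale, a back-of-envelope sketch of what a full optimize costs: it rewrites the entire index once, so the time is roughly the index size divided by the merge throughput. All the numbers below are assumptions for illustration, not measurements from this thread:

```python
# Back-of-envelope estimate of a full optimize (a single-pass rewrite of
# the whole index). The per-doc size and throughput are assumed values,
# not measurements from this thread.

def optimize_hours(num_docs, bytes_per_doc, throughput_mb_s):
    """Rough time to rewrite the entire index once during optimize."""
    total_mb = num_docs * bytes_per_doc / 1_000_000
    return total_mb / throughput_mb_s / 3600

# Assumed: ~5 KB of index data per page, 10 MB/s effective merge throughput.
hours = optimize_hours(7_000_000, 5_000, 10)
print(f"{hours:.1f} hours")
```

At these assumed rates a pure sequential rewrite of 7,000,000 pages would finish in about an hour, so a run measured in days suggests the bottleneck is elsewhere, e.g. many small files and seek-bound I/O rather than raw throughput.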
Dear Users!
How do I really remove deleted entries from the index?
I ran the 'prune' tool and the 'dedup' tool, and afterwards I would like to
remove the deleted entries from the index. How do I optimize the indexes?
Regards,
Ferenc
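For background on the question above: in Lucene, prune/dedup-style deletes only mark documents as deleted; the bytes actually disappear when segments are rewritten during a merge or an optimize. A toy Python model of that mark-then-compact behavior (an illustration of the idea, not the Lucene API):

```python
# Toy model of mark-and-sweep deletion as Lucene segments handle it:
# a delete only sets a flag; documents physically disappear when
# segments are rewritten (merged/optimized). Not the Lucene API.

class Segment:
    def __init__(self, docs):
        self.docs = list(docs)
        self.deleted = set()

    def delete(self, doc):
        self.deleted.add(doc)          # tombstone only; bytes stay on disk

    def live_docs(self):
        return [d for d in self.docs if d not in self.deleted]

def optimize(segments):
    """Merge all segments into one, dropping tombstoned docs for good."""
    merged = Segment(d for s in segments for d in s.live_docs())
    return [merged]

segs = [Segment(["a", "b"]), Segment(["c", "d"])]
segs[0].delete("b")                    # e.g. what prune/dedup would mark
segs = optimize(segs)
print(segs[0].docs)                    # ['a', 'c', 'd']
```

In other words, running an optimize after prune/dedup is what physically expunges the deleted entries.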
Hi Andrzej,
thanks for your response. I am not really familiar with the Lucene internals.
I am just running Nutch with the default parameters on a Debian sarge
system with an ext3 file system, a maximum of 1024 open files, and 1 GB RAM.
So is ext3 a bad file system for millions of files?
I could
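As a side note, the per-process open-file limit mentioned above can be checked, and raised up to the hard limit, from code; a small sketch using Python's resource module on Unix:

```python
# Inspect the per-process open-file limit (Unix only).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# A process may raise its own soft limit up to the hard limit without root.
if hard != resource.RLIM_INFINITY:
    target = min(max(soft, 4096), hard)
else:
    target = max(soft, 4096)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

Raising the soft limit only helps up to the hard limit, which the administrator sets (typically via pam_limits, e.g. /etc/security/limits.conf on Debian).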
Hi Andrzej,
At the time that I was working diligently on this plugin (April/May), I
had done some thorough research into finding what I felt would be the most
flexible, reliable way to parse RSS files. The RSS feed parser out of the
jakarta-commons sandbox was what I found, and I stand by it.
Hi,
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Matthias Jaekle wrote:
Hi Andrzej,
thanks for your response. I am not really familiar with the Lucene
internals.
I am just running Nutch with the default parameters on a Debian
sarge
system with ext3 file system, maximum 1024
Stefan - thanks for the reply. I'm still digesting Nutch and how to
work with it at a basic level, but it does make sense to allow
metadata to tag along with fetches - I certainly don't know enough
yet to say whether your patch fits into the long-term vision of Nutch
or not.
I've
Hi Erik,
Stefan - thanks for the reply. I'm still digesting Nutch and how
to work with it at a basic level, but it does make sense to allow
metadata to tag along with fetches - I certainly don't know enough
yet to say whether your patch fits into the long-term vision of
Nutch or not.
You probably don't want to touch indexer.termIndexInterval and
indexer.maxMergeDocs (determines the max size of an individual
segment).
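For reference, those two properties are set in Nutch's configuration (nutch-site.xml overrides nutch-default.xml); the values below are purely illustrative, not recommendations:

```xml
<!-- nutch-site.xml overrides; values here are illustrative only -->
<property>
  <name>indexer.termIndexInterval</name>
  <value>128</value>
</property>
<property>
  <name>indexer.maxMergeDocs</name>
  <value>50</value>
</property>
```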
Why is maxMergeDocs 50 by default? Shouldn't this value be much higher?
I found how to calculate the number of opened files
But how could I calculate the
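On the open-files calculation mentioned above: a commonly quoted estimate for Lucene's older, non-compound index format is 7 files per segment plus one norms file per indexed field, with a merge holding up to (mergeFactor + 1) segments open at once. The constants are assumptions about that file format, so treat this as a sketch:

```python
# Rough open-file estimate for a Lucene non-compound index during a merge.
# Assumed: 7 fixed files per segment plus one .f norms file per indexed
# field, and (mergeFactor + 1) segments open at once while merging.

def files_per_segment(indexed_fields):
    return 7 + indexed_fields

def open_files_during_merge(merge_factor, indexed_fields):
    return (merge_factor + 1) * files_per_segment(indexed_fields)

# With mergeFactor=50 and, say, 10 indexed fields:
print(open_files_during_merge(50, 10))  # 867
```

If the Lucene version in use supports the compound file format, enabling it reduces the per-segment file count dramatically, which is the usual remedy when the limit is only 1024.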