Yes, but not with core Lucene. This is a FAQ, I believe.
You need to write or get a crawler. For something simple see Lucene's
Powered By page. For something more complex see LARM in Lucene
Sandbox.
The links are on the site.
Otis
--- Hanasaki JiJi <[EMAIL PROTECTED]> wrote:
> Any quick easy way to index static files
Adding a document does not necessarily cause existing index files to be
modified. 'not necessarily' because sometimes adding a document
triggers segment merging. There is a recent (March 5th) article on
http://onjava.com about Lucene that talks more about that.
Otis
--- Avi Drissman <[EMAIL PRO
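The segment-merge behavior Otis describes can be illustrated with a toy simulation. This is an assumption-laden sketch, not Lucene code: it assumes every added document flushes a one-document segment and that, whenever mergeFactor segments of the same size accumulate, they merge into one larger segment (the logarithmic policy the OnJava article covers). Under those assumptions the segment count equals the digit sum of the document count in base mergeFactor, which is why most adds leave old files untouched while some adds trigger a cascade of rewrites.

```java
// Toy model of logarithmic segment merging (assumptions above; not Lucene API).
class MergeSim {
    // Number of segments after `docs` additions with the given mergeFactor:
    // the digit sum of `docs` in base mergeFactor.
    static int segmentsAfter(int docs, int mergeFactor) {
        int count = 0;
        while (docs > 0) {
            count += docs % mergeFactor;
            docs /= mergeFactor;
        }
        return count;
    }

    public static void main(String[] args) {
        // Most adds just append a new small segment...
        System.out.println(segmentsAfter(9, 10));   // -> 9
        // ...but the 10th add triggers a merge, rewriting index files.
        System.out.println(segmentsAfter(10, 10));  // -> 1
        // The 100th add cascades two levels of merging.
        System.out.println(segmentsAfter(100, 10)); // -> 1
    }
}
```

The takeaway matches the post: adding a document usually only creates a new segment, and only occasionally rewrites existing ones.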
Hi Pierre,
I did the same thing some time ago. Here are the
highlights :
1- Create a FrenchStemFilter class that extends
TokenFilter
import net.sf.snowball.ext.frenchStemmer;
/**
* Constructor for SnowballFrenchStemFilter.
*/
public FrenchStemFilter(TokenStream in)
{
stemmer = new frenchStemmer();
}
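For readers unfamiliar with the TokenFilter pattern this snippet relies on, here is a self-contained sketch in plain Java. The interface and the toy suffix rule are stand-ins of my own (none of this is Lucene API): the real FrenchStemFilter wraps Lucene's TokenStream and delegates each token to the Snowball French stemmer.

```java
import java.util.Iterator;
import java.util.List;

// Minimal stand-in for Lucene's TokenStream: returns null when exhausted.
interface TokenStream {
    String next();
}

// A source stream backed by a token list (plays the role of a tokenizer).
class ListTokens implements TokenStream {
    private final Iterator<String> it;
    ListTokens(List<String> tokens) { it = tokens.iterator(); }
    public String next() { return it.hasNext() ? it.next() : null; }
}

// Plays the role of FrenchStemFilter: wraps another stream and rewrites
// each token. The plural-stripping rule here is a toy; the real filter
// calls the Snowball stemmer instead.
class StemFilter implements TokenStream {
    private final TokenStream input;
    StemFilter(TokenStream input) { this.input = input; }
    public String next() {
        String t = input.next();
        if (t == null) return null;
        return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }
}
```

Chaining `new StemFilter(new ListTokens(...))` mirrors how an Analyzer composes filters around a tokenizer, which is the structure the post's FrenchStemFilter slots into.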
Morus Walter wrote:
Searches must be possible on any combination of collections.
A typical search includes ~40 collections.
Now the question is how best to implement this in Lucene.
Currently I see basically three possibilities:
- create a data field containing the collection name for each document
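The first option (a collection-name field on every document) can be sketched without Lucene at all. All names below are hypothetical; a real implementation would store the collection name in a Lucene Field and add the wanted collections as a required boolean clause, but the selection logic is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CollectionSearch {
    static class Doc {
        final String id, collection, text;
        Doc(String id, String collection, String text) {
            this.id = id; this.collection = collection; this.text = text;
        }
    }

    // Restrict a term search to an arbitrary subset of collections, the
    // way a required "+(collection:a collection:b ...)" clause would.
    static List<String> search(List<Doc> docs, String term, Set<String> wanted) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (wanted.contains(d.collection) && d.text.contains(term)) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

The appeal of this option is that any of the 2^80 collection subsets becomes a single query against one index, at the cost of one extra clause per wanted collection.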
Any quick easy way to index static files
(html/pdf/doc/
http://www.htdig.org/
Morus,
On Wednesday 19 March 2003 00:44, Morus Walter wrote:
> Hi,
>
> we are currently evaluating lucene.
>
> The data we'd like to index consists of ~ 80 collections of documents
> (a few hundred up to 20 documents per collection, ~ 1.5 million
> documents total; medium document size is in the order of 1 kB).
At 10:54 AM -0800 3/19/03, you wrote:
If your data will be changing frequently and all of the indices need
to be in sync all the time, then yes, probably, especially if the changes
are frequent but small.
Hmm...
I think I need to rephrase my first question. Suppose I have a big
index. I add a document
--- Avi Drissman <[EMAIL PROTECTED]> wrote:
> At 10:25 AM -0800 3/19/03, you wrote:
>
> >Haven't used it. Reported speed (by the author) was poor.
>
> Hm. Is that due to the implementation or possibly to the database?
Not sure. The author may know.
> >I've done that. I simply used scp to co
At 10:25 AM -0800 3/19/03, you wrote:
Haven't used it. Reported speed (by the author) was poor.
Hm. Is that due to the implementation or possibly to the database?
I've done that. I simply used scp to copy the index from the build
machine to a set of maybe a dozen servers.
Well, this data is going
Avi,
--- Avi Drissman <[EMAIL PROTECTED]> wrote:
> I've successfully used Lucene to do indexing of about 50-100K files,
> and have been keeping the index on a local disk. It's time to move
> up, and now I'm planning to index from 100-500K files.
>
> I'm trying to decide whether or not it pays t
Sorry. I wasn't verbose enough.
I use the default memory settings. But my issue was the core structure of
Lucene taking up (it seems to me) more memory than it would have to if it
had a different approach.
Correct me if I'm wrong, but it seems to me that BooleanQuery stores all hits
(as Bucket ob
On Wednesday 19 March 2003 01:44, Morus Walter wrote:
...
> Searches must be possible on any combination of collections.
> A typical search includes ~ 40 collections.
>
> Now the question is how best to implement this in Lucene.
>
> Currently I see basically three possibilities:
> - create a data fiel
I've successfully used Lucene to do indexing of about 50-100K files,
and have been keeping the index on a local disk. It's time to move
up, and now I'm planning to index from 100-500K files.
I'm trying to decide whether or not it pays to hold the index in our
database. Our database (FrontBase)
Robert,
I'm moving this to lucene-user, which is a more appropriate list for
this type of problem.
You are not saying whether you are using some of those handy -X (-Xms
-Xmx) command line switches when you invoke your application that dies
with OutOfMemoryError.
If you are not, try that, it may
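For example, the heap sizes and class name below are placeholders, not from the original thread:

```shell
# Start the JVM with a 128 MB initial heap (-Xms) and allow it to grow
# to 512 MB (-Xmx) before concluding the indexer itself leaks memory.
# MyIndexer stands in for your own main class.
java -Xms128m -Xmx512m MyIndexer
```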
Ok, thanks!!! That is exactly what I was looking for...
But how can I use it?
(Sorry, I'm kind of a noob in Java...)
The snowball JAR has been added to my project, but now I don't know how to
use it...
-Original Message-
From: Alex Murzaku [mailto:[EMAIL PROTECTED]
Sent: mercredi 19 mars 2003 15:49
You can find Danish, Dutch, English, Finnish, French, German, Italian,
Norwegian, Portuguese, Russian, Spanish and Swedish Snowball
stemmers/analyzers at:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
Doug or Otis, why don't you move these out of the sandbox and make them
integral
Heya all,
I'm looking for a full French Analyser, containing a FrenchPorterStemmer...
Does anyone know where I can find one?
And if I want to create my own FrenchAnalyser - I have the STOP_WORDS list -
can I remove the standard PorterStemFilter?
In fact, can I create a new Analyser without PorterStemFilter
Hello Morus,
Here is how wildcard query works:
1. First, it runs over the lexicon and collects a list of terms that
satisfy the specified pattern.
2. Then it makes a boolean query joining the collected terms with "or".
3. Then the constructed boolean query is used for searching.
So it seems
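The steps above can be sketched in plain Java. This is a toy expansion over an in-memory lexicon, not Lucene's actual WildcardQuery (which walks the term index), and it assumes the pattern contains no regex metacharacters other than `*` and `?`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WildcardExpand {
    // Step 1: collect the lexicon terms that satisfy the pattern
    // (* = any run of characters, ? = exactly one character).
    static List<String> expand(List<String> lexicon, String wildcard) {
        Pattern p = Pattern.compile(wildcard.replace("?", ".").replace("*", ".*"));
        List<String> terms = new ArrayList<>();
        for (String term : lexicon) {
            if (p.matcher(term).matches()) terms.add(term);
        }
        return terms;
    }

    // Step 2: join the collected terms with "or" into one boolean query.
    static String toQuery(String field, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        for (String t : terms) clauses.add(field + ":" + t);
        return String.join(" OR ", clauses);
    }
}
```

Step 3 would simply hand the resulting boolean query to the searcher. This structure also explains why a wildcard matching very many lexicon terms produces a very large (and slow, or limit-exceeding) boolean query.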
I'd actually be interested in hearing answers about this too, but from our
experience:
We do something similar. We have data that we have indexed per account id
(100 or so). We keep them separate in case one of them blows up; down time
is not acceptable, so we have the data partitioned. Unlik
Hi,
we are currently evaluating lucene.
The data we'd like to index consists of ~ 80 collections of documents
(a few hundred up to 20 documents per collection, ~ 1.5 million documents
total; medium document size is in the order of 1 kB).
Searches must be possible on any combination of collections