Re: newbie question: static page / file indexing

2003-03-19 Thread Otis Gospodnetic
Yes, but not with core Lucene. This is a FAQ, I believe. You need to write or get a crawler. For something simple, see Lucene's Powered By page. For something more complex, see LARM in the Lucene Sandbox. The links are on the site. Otis --- Hanasaki JiJi <[EMAIL PROTECTED]> wrote: > Any quick easy w

Re: Putting the Lucene index into a database

2003-03-19 Thread Otis Gospodnetic
Adding a document does not necessarily cause existing index files to be modified. 'not necessarily' because sometimes adding a document triggers segment merging. There is a recent (March 5th) article on http://onjava.com about Lucene that talks more about that. Otis --- Avi Drissman <[EMAIL PRO
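
A minimal sketch of the operation in question, assuming the Lucene 1.x IndexWriter API (the index path and field name are made up): most adds only append a new small segment, and only occasionally (governed by the writer's merge settings) are smaller segments merged, which is when existing files get rewritten.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class AddOneDocument {
        public static void main(String[] args) throws Exception {
            // Open an existing index (create == false) and add a single document.
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
            Document doc = new Document();
            doc.add(Field.Text("contents", "body of the new document"));
            writer.addDocument(doc);  // usually just writes a new small segment...
            writer.close();           // ...but occasionally a merge rewrites existing files
        }
    }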

RE: Full French Analyser ?

2003-03-19 Thread René Ferréro
Hi Pierre, I did the same thing some time ago. Here are the highlights : 1- Create a FrenchStemFilter class that extends TokenFilter import net.sf.snowball.ext.frenchStemmer; /** * Constructor for SnowballFrenchStemFilter. */ public FrenchStemFilter(TokenStream in) { stemmer = new fr
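
A rough sketch of the filter René describes, following the class and package names in his message (they may differ in your Snowball build, e.g. FrenchStemmer with a capital F); on newer Lucene versions where TokenFilter has a constructor, call super(in) instead of assigning the protected input field directly.

    import java.io.IOException;
    import net.sf.snowball.ext.frenchStemmer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class FrenchStemFilter extends TokenFilter {
        private final frenchStemmer stemmer = new frenchStemmer();

        public FrenchStemFilter(TokenStream in) {
            input = in;   // older Lucene: assign the protected field; newer: super(in)
        }

        public Token next() throws IOException {
            Token token = input.next();
            if (token == null) {
                return null;
            }
            // Stem the term text with the Snowball French stemmer, keep offsets and type.
            stemmer.setCurrent(token.termText());
            stemmer.stem();
            return new Token(stemmer.getCurrent(),
                             token.startOffset(), token.endOffset(), token.type());
        }
    }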

Re: multiple collections indexing

2003-03-19 Thread Doug Cutting
Morus Walter wrote: Searches must be possible on any combination of collections. A typical search includes ~ 40 collections. Now the question is how best to implement this in Lucene. Currently I see basically three possibilities: - create a data field containing the collection name for each document
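
A minimal sketch of the first possibility, a keyword field holding the collection name, with a search restricted to a chosen subset of collections; paths, field names, and query text are made up, and the BooleanQuery.add(required, prohibited) flags follow the Lucene 1.x API of that era.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class CollectionFieldExample {
        public static void main(String[] args) throws Exception {
            // Indexing: tag every document with its collection name (untokenized keyword).
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            Document doc = new Document();
            doc.add(Field.Keyword("collection", "manuals"));
            doc.add(Field.Text("contents", "document body text"));
            writer.addDocument(doc);
            writer.close();

            // Searching: the user query is required, and so is a match on one of the selected collections.
            BooleanQuery collections = new BooleanQuery();
            collections.add(new TermQuery(new Term("collection", "manuals")), false, false);
            collections.add(new TermQuery(new Term("collection", "faqs")), false, false);

            BooleanQuery full = new BooleanQuery();
            full.add(QueryParser.parse("body", "contents", new StandardAnalyzer()), true, false);
            full.add(collections, true, false);

            IndexSearcher searcher = new IndexSearcher("/path/to/index");
            Hits hits = searcher.search(full);
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }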

newbie question: static page / file indexing

2003-03-19 Thread Hanasaki JiJi
Any quick easy way to index static files (html/pdf/doc/...)? http://www.htdig.org/
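
For plain files on disk (no crawler needed), a minimal sketch of recursively feeding a directory to an IndexWriter; paths, field names, and the file-extension filter are made up, and real HTML/PDF/DOC content would first need a parser (e.g. the HTML parser in the Lucene demo, or a PDF library) rather than a raw FileReader.

    import java.io.File;
    import java.io.FileReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexDirectory {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            indexDir(writer, new File("/path/to/static/pages"));
            writer.optimize();
            writer.close();
        }

        static void indexDir(IndexWriter writer, File dir) throws Exception {
            File[] files = dir.listFiles();
            for (int i = 0; i < files.length; i++) {
                if (files[i].isDirectory()) {
                    indexDir(writer, files[i]);           // recurse into subdirectories
                } else if (files[i].getName().endsWith(".html")
                        || files[i].getName().endsWith(".txt")) {
                    Document doc = new Document();
                    doc.add(Field.Keyword("path", files[i].getPath()));
                    doc.add(Field.Text("contents", new FileReader(files[i])));  // indexed, not stored
                    writer.addDocument(doc);
                }
            }
        }
    }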

Re: multiple collections indexing

2003-03-19 Thread Ype Kingma
Morus, On Wednesday 19 March 2003 00:44, Morus Walter wrote: > Hi, > > we are currently evaluating lucene. > > The data we'd like to index consists of ~ 80 collections of documents > (a few hundred up to 20 documents per collection, ~ 1.5 million > documents total; medium document size is in t

Re: Putting the Lucene index into a database

2003-03-19 Thread Avi Drissman
At 10:54 AM -0800 3/19/03, you wrote: If your data will be changing frequently and indices on all of them need to be in sync all the time, then yes, probably, esp. if the changes are frequent but small. Hmm... I think I need to rephrase my first question. Suppose I have a big index. I add a document

Re: Putting the Lucene index into a database

2003-03-19 Thread Otis Gospodnetic
--- Avi Drissman <[EMAIL PROTECTED]> wrote: > At 10:25 AM -0800 3/19/03, you wrote: > > >Haven't used it. Reported speed (by the author) was poor. > > Hm. Is that due to the implementation or possibly to the database? Not sure. The author may know. > >I've done that. I simply used scp to co

Re: Putting the Lucene index into a database

2003-03-19 Thread Avi Drissman
At 10:25 AM -0800 3/19/03, you wrote: Haven't used it. Reported speed (by the author) was poor. Hm. Is that due to the implementation or possibly to the database? I've done that. I simply used scp to copy the index from the build machine to a set of maybe a dozen servers. Well, this data is going

Re: Putting the Lucene index into a database

2003-03-19 Thread Otis Gospodnetic
Avi, --- Avi Drissman <[EMAIL PROTECTED]> wrote: > I've successfully used Lucene to do indexing of about 50-100K files, > and have been keeping the index on a local disk. It's time to move > up, and now I'm planning to index from 100-500K files. > > I'm trying to decide whether or not it pays t

RE: OutOfMemoryError with boolean queries

2003-03-19 Thread Robert Wennström
Sorry. I wasn't verbose enough. I use the default memory settings. But my issue was the core structure of Lucene taking up (it seems to me) more memory than it would have to, if it had a different approach. Correct me if I'm wrong, but it seems to me that BooleanQuery stores all hits (as Bucket ob

Re: multiple collections indexing

2003-03-19 Thread Tatu Saloranta
On Wednesday 19 March 2003 01:44, Morus Walter wrote: ... > Searches must be possible on any combination of collections. > A typical search includes ~ 40 collections. > > Now the question is how best to implement this in Lucene. > > Currently I see basically three possibilities: > - create a data fiel

Putting the Lucene index into a database

2003-03-19 Thread Avi Drissman
I've successfully used Lucene to do indexing of about 50-100K files, and have been keeping the index on a local disk. It's time to move up, and now I'm planning to index from 100-500K files. I'm trying to decide whether or not it pays to hold the index in our database. Our database (FrontBase)

Re: OutOfMemoryError with boolean queries

2003-03-19 Thread Otis Gospodnetic
Robert, I'm moving this to lucene-user, which is a more appropriate list for this type of problem. You are not saying whether you are using some of those handy -X (-Xms -Xmx) command line switches when you invoke your application that dies with OutOfMemoryError. If you are not, try that, it may
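
For reference, those switches are passed straight to the JVM when launching the application, e.g. (the class name below is only a placeholder):

    java -Xms64m -Xmx512m com.example.SearchApp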

RE: Full French Analyser ?

2003-03-19 Thread Pierre Lacchini
Ok thx !!! That is exactly what I was looking for... But how can I use it? (sorry, I'm kind of a noob in Java)... The snowball JAR has been added to my project, but now I don't know how to use it... -Original Message- From: Alex Murzaku [mailto:[EMAIL PROTECTED] Sent: Wednesday, 19 March 2003 15:49
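
If the JAR in question is the Lucene Sandbox Snowball package Alex links to below, a usage sketch looks roughly like this; the stop-word list and index path are placeholders, and the exact package name and constructor may vary between the sandbox and later releases.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class FrenchIndexing {
        // Placeholder stop words; substitute a real French stop-word list.
        static final String[] FRENCH_STOP_WORDS = { "le", "la", "les", "de", "et" };

        public static void main(String[] args) throws Exception {
            // "French" selects the Snowball French stemmer inside the analyzer.
            Analyzer analyzer = new SnowballAnalyzer("French", FRENCH_STOP_WORDS);
            IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true);
            Document doc = new Document();
            doc.add(Field.Text("contents", "Les documents seront indexés avec le stemmer français."));
            writer.addDocument(doc);
            writer.close();
        }
    }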

RE: Full French Analyser ?

2003-03-19 Thread Alex Murzaku
You can find Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish Snowball stemmers/analyzers at: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/ Doug or Otis, why don't you move these out of the sandbox and make them integral

Full French Analyser ?

2003-03-19 Thread Pierre Lacchini
Heya all, I'm looking for a full French Analyser, containing a FrenchPorterStemmer... Does anyone know where I can find one? And if I want to create my own FrenchAnalyser - I have the STOP_WORDS list - can I remove the standard PorterStemFilter? In fact, can I create a new Analyser without Porte
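
A minimal sketch of an Analyser with a stop-word list but no stemming filter at all; the stop words stand in for the STOP_WORDS list mentioned above, and on older Lucene releases the tokenStream() signature may take only a Reader rather than a field name and a Reader.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class FrenchAnalyzer extends Analyzer {
        // Placeholder list; plug in your own STOP_WORDS.
        private static final String[] STOP_WORDS = { "le", "la", "les", "de", "et" };

        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new StandardTokenizer(reader);
            result = new StandardFilter(result);          // strip apostrophes and acronym dots
            result = new LowerCaseFilter(result);
            result = new StopFilter(result, STOP_WORDS);  // stop words only, no stemming filter
            return result;
        }
    }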

Re: multiple collections indexing

2003-03-19 Thread Vladimir Lukin
Hello Morus, I'll tell you how a wildcard query works: 1. First, it runs over the lexicon and collects a list of terms that satisfy the specified pattern. 2. Then it makes a boolean query joining the collected terms with "or". 3. Then the constructed boolean query is used for searching. So it seems
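
A rough sketch of that expansion for a simple prefix pattern, using IndexReader's term enumeration and a BooleanQuery; the field name and prefix are made up, and a real wildcard query also handles '?' and '*' in the middle of a term.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class ExpandPrefix {
        public static void main(String[] args) throws Exception {
            String field = "contents", prefix = "lucen";
            IndexReader reader = IndexReader.open("/path/to/index");
            // Enumeration starts at the first term >= the given term.
            TermEnum terms = reader.terms(new Term(field, prefix));
            BooleanQuery expanded = new BooleanQuery();
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix)) {
                    break;                                    // past the matching range
                }
                expanded.add(new TermQuery(t), false, false); // "or" the expansions together
            } while (terms.next());
            terms.close();
            reader.close();
            System.out.println("expanded query: " + expanded.toString(field));
        }
    }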

Re: multiple collections indexing

2003-03-19 Thread John L Cwikla
I'd actually be interested in hearing answers about this too, but from our experience: We do something similar. We have data that we have indexed per account id (100 or so). We keep them separate in case one of them blows up; downtime is not acceptable, so we have the data partitioned. Unlik
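
A minimal sketch of the partitioned-index approach at search time, combining a user-chosen subset of per-account (or per-collection) indexes with MultiSearcher; the index paths, field name, and query text are made up.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;

    public class SearchSomeCollections {
        public static void main(String[] args) throws Exception {
            // One index per collection; open only the ones the user selected.
            Searchable[] selected = {
                new IndexSearcher("/indexes/collectionA"),
                new IndexSearcher("/indexes/collectionB")
            };
            MultiSearcher searcher = new MultiSearcher(selected);
            Query query = QueryParser.parse("some words", "contents", new StandardAnalyzer());
            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " matching documents");
            searcher.close();
        }
    }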

multiple collections indexing

2003-03-19 Thread Morus Walter
Hi, we are currently evaluating Lucene. The data we'd like to index consists of ~ 80 collections of documents (a few hundred up to 20 documents per collection, ~ 1.5 million documents total; medium document size is on the order of 1 kB). Searches must be possible on any combination of collection