Using Lucene to index a web forum

2007-01-13 Thread Melange
Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how to best map the forum document model (topics and their messages) to the Lucene document model. Usually, some forum member creates a new topic with its first message text, then other members add reply messages to that topic. Me

Re: Using Lucene to index a web forum

2007-01-13 Thread Nicolas Lalevée
On Saturday 13 January 2007 at 10:49, Melange wrote: > Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how to > best map the forum document model (topics and their messages) to the Lucene > document model. > > Usually, some forum member creates a new topic with its first message te

Re: Using Lucene to index a web forum

2007-01-13 Thread Melange
Nicolas Lalevée-2 wrote: > > On Saturday 13 January 2007 at 10:49, Melange wrote: >> Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how to >> best map the forum document model (topics and their messages) to the >> Lucene >> document model. >> >> Usually, some forum member crea

Highlighting brackets bug ?

2007-01-13 Thread heikki doeleman
Hi there, I'm seeing some strange behaviour with the highlighter and I'm wondering whether it is a bug or whether I should take a different approach. I want to highlight the search terms that were used to execute a query. If the search terms end in a closing bracket or closing square bracket (so ')' or ']'), the

Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi! In a project where we want to use Lucene, we are running into performance problems when building filter sets. Let me give you a quick overview of what we need to do: We are indexing information about users (the index size ranges between 2 and 10 million documents). Each of th

Re: Making document numbers persistent

2007-01-13 Thread Erick Erickson
you say <<>> Before going off into modifying things, could you expand a bit on how you query to build up the filter? Perhaps providing a code snippet? Just to be sure we're talking about the same thing, when you say filter, are you talking about Lucene filters? I'm assuming you are, in which cas

Re: Making document numbers persistent

2007-01-13 Thread karl wettin
On 13 Jan 2007, at 19:14, Kay Roepke wrote: All of the users (documents we index) are "connected" to certain other users, in a network fashion. We must be able to restrict the query (or filter it after searching the complete index) to certain "levels of connectedness", i.e. you can search with

Re: Highlighting brackets bug ?

2007-01-13 Thread Mark Miller
Which version are you using? I believe that this is a bug that was fixed last August...but that the fix is only in the 2.1 Highlighter version. Try grabbing the latest highlighter code from the trunk. - Mark heikki doeleman wrote: Hi there, I'm having some strange behaviour using the highlig

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi Erick! On 13. Jan 2007, at 19:54 , Erick Erickson wrote: Before going off into modifying things, could you expand a bit on how you query to build up the filter? Perhaps providing a code snippet? We are passing in our unique ids from our database which we have to translate to lucene doc
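The translation step mentioned here (database unique ids to per-server document numbers) can be pictured with a small, Lucene-free sketch. Everything in it is an assumption made for illustration: the class name `PortableFilterSketch`, the method `toLocalBits`, and the idea of keeping an in-memory map from unique id to local document number are hypothetical and do not come from the thread or from the Lucene API. It only shows why the same list of unique ids yields different bit sets on servers whose document numbering differs.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the thread or from Lucene.
final class PortableFilterSketch {

    // Per-server lookup from the application's unique id to the local document number.
    // Assumption: a real setup would derive this from the server's own index.
    private final Map<String, Integer> docNumberByUniqueId = new HashMap<>();
    private final int maxDoc;

    // The list position of each unique id stands in for its local document number.
    PortableFilterSketch(List<String> uniqueIdsInLocalDocOrder) {
        for (int docNumber = 0; docNumber < uniqueIdsInLocalDocOrder.size(); docNumber++) {
            docNumberByUniqueId.put(uniqueIdsInLocalDocOrder.get(docNumber), docNumber);
        }
        this.maxDoc = uniqueIdsInLocalDocOrder.size();
    }

    // Translate a server-independent list of unique ids into a server-local bit set.
    // Ids that this server does not know are skipped.
    BitSet toLocalBits(List<String> allowedUniqueIds) {
        BitSet bits = new BitSet(maxDoc);
        for (String uniqueId : allowedUniqueIds) {
            Integer docNumber = docNumberByUniqueId.get(uniqueId);
            if (docNumber != null) {
                bits.set(docNumber);
            }
        }
        return bits;
    }
}
```

In this sketch the list of unique ids is the only thing that is meaningful on every server; the bit set is rebuilt locally from it and is valid only for the server that produced it.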

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi Karl! On 13. Jan 2007, at 20:12 , karl wettin wrote: On 13 Jan 2007, at 19:14, Kay Roepke wrote: All of the users (documents we index) are "connected" to certain other users, in a network fashion. We must be able to restrict the query (or filter it after searching the complete index) to ce

Re: Making document numbers persistent

2007-01-13 Thread Mark Miller
I can handle situations where this can take long once, since I'm really asking something that Lucene isn't designed for, but the culprit is that I can't really cache the resulting bitset. I can cache it on one of the Lucene servers, but can't share it among the rest of the servers (we will e

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
On 14. Jan 2007, at 2:40 , Mark Miller wrote: First, have you looked at SwarmCache? Cluster aware caching for java... No, I haven't come across that one. I'll take a look, thanks! As a matter of fact, we do have a network-wide caching mechanism, so that's what we use. Second...does it ma

Re: Making document numbers persistent

2007-01-13 Thread Mark Miller
Sorry Kay, I jumped in midstream...I should have read your first post more thoroughly. By the way, many of the experts rarely comment much on the weekend so you will probably get some good answers come Monday (lots of replies often attract their attention ). I do have one more whack though: I

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
On 14. Jan 2007, at 3:20 , Mark Miller wrote: Sorry Kay, I jumped in midstream...I should have read your first post more thoroughly. No problem, it was a bit lengthy, anyway...sorry about that. I just tried to give enough information so that people don't get confused too much. By the w

Re: Making document numbers persistent

2007-01-13 Thread Erick Erickson
A couple of things... 1> You're probably already aware that the IndexReader doesn't reflect updates until it is re-opened, so any filters you cached would be valid until you re-opened the reader. CachingWrapperFilter will store the Lucene filters for you. But this probably isn't germane to your p

Re: Making document numbers persistent

2007-01-13 Thread Chris Hostetter
: So what we want to do is to cache the filters, once created. Since : the document ids would not be the same across the Lucene : servers we'll be using, we can only cache the filters per server, : which is a big performance loss. We also cannot reasonably control : on which Lucene server the requ

Re: Making document numbers persistent

2007-01-13 Thread Chris Hostetter
: 4> It's playing with fire, but you say "in essence, we want persistent : Lucene document numbers". I believe they *are* persistent until and unless : you optimize *after* deleting documents. So you control when they change : (you'll get more information by searching the mail archive, but wha

Re: Making document numbers persistent

2007-01-13 Thread Doron Cohen
> > : - To keep the document ids from changing we could prevent segment > : merging - I'm not concerned with optimizing indices, this can be done > : offline, > :and I'm prepared to build the caches after that. What would be the > : ballpark figure for query time degradation, approximately? > :