Using Lucene to index a web forum

2007-01-13 Thread Melange

Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how to
best map the forum document model (topics and their messages) to the Lucene
document model.

Usually, some forum member creates a new topic with its first message text,
then other members add reply messages to that topic. Messages are sometimes
updated, but most of the time topics grow incrementally. There's no limit
on the number of replies; thousands are not unusual.

Currently, I see two options for my Lucene data model: a single document
type, or two document types (one for the topics and one for the messages).
With only a single document type, things are fairly clear, but there would
obviously be a lot of unnecessary index modifications (there would be one
field with all messages concatenated). To reduce the amount of index
updates, separating topics and messages seems to be the right thing to do.
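
To make the update cost concrete, here is a rough sketch (plain Python, not Lucene code) that counts how many characters would have to be re-indexed under each model as a topic grows. The function names and the sample message size are made up for illustration. It assumes that changing a document means re-indexing the whole document, since a Lucene update is a delete followed by an add.

```python
def reindexed_chars_single_doc(message_lengths):
    """One document per topic: each new reply re-indexes all messages so far."""
    total = 0
    running = 0  # size of the concatenated messages field after each reply
    for length in message_lengths:
        running += length
        total += running
    return total


def reindexed_chars_per_message(message_lengths):
    """One document per message: each new reply indexes only itself."""
    return sum(message_lengths)


# Hypothetical topic: 1000 messages of 500 characters each.
lengths = [500] * 1000
print(reindexed_chars_single_doc(lengths))   # 250250000
print(reindexed_chars_per_message(lengths))  # 500000
```

Under these assumptions the single-document model does about 500 times more indexing work for this topic, and the gap widens as the topic gets longer.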

So I'd like to use two document types for my document model, but I do not
understand how I could bring the two together when searching. I don't want
to list all messages individually; I want the messages grouped by topic.
How can I go about that?

The topic documents could be boosted, but perhaps that's not even necessary
because of their relatively short length (compared to message documents).

Thanks,
Melange.
-- 
View this message in context: 
http://www.nabble.com/Using-Lucene-to-index-a-web-forum-tf2970740.html#a8312744
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: Using Lucene to index a web forum

2007-01-13 Thread Melange



Nicolas Lalevée-2 wrote:
> 
> On Saturday 13 January 2007 at 10:49, Melange wrote:
>> Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how to
>> best map the forum document model (topics and their messages) to the
>> Lucene document model.
>>
>> Usually, some forum member creates a new topic with its first message
>> text, then other members add reply messages to that topic. Messages are
>> sometimes updated, but most of the time topics grow incrementally.
>> There's no limit on the number of replies; thousands are not unusual.
>>
>> Currently, I see two options for my Lucene data model: a single document
>> type, or two document types (one for the topics and one for the
>> messages). With only a single document type, things are fairly clear,
>> but there would obviously be a lot of unnecessary index modifications
>> (there would be one field with all messages concatenated). To reduce the
>> amount of index updates, separating topics and messages seems to be the
>> right thing to do.
>>
>> So I'd like to use two document types for my document model, but I do not
>> understand how I could bring the two together when searching. I don't
>> want to list all messages individually; I want the messages grouped by
>> topic. How can I go about that?
>>
>> The topic documents could be boosted, but perhaps that's not even
>> necessary because of their relatively short length (compared to message
>> documents).
> 
> Hi Melange,
> 
> The two-document-type design is only useful if you want to search for
> topics and search for messages. Here you want to search for messages
> grouped by topic, so you should have one kind of document: message
> documents. In each message document, you reference the topic's id, so
> you will be able to group by topic. To group search results by topic,
> you might be interested in Solr's [1] faceted search [2].
> 
> cheers,
> Nicolas
> 
> [1] http://incubator.apache.org/solr/
> [2] http://wiki.apache.org/solr/SimpleFacetParameters
> 

Thank you, Nicolas. Message documents are a good idea; I'll do that
instead.

Sorry, I couldn't really find anything at the Solr links you provided
regarding the grouping of search results (hits). Will I have to load all the
hits into RAM in order to perform the grouping myself or is there a way to
have Lucene do that for me? Or how is this to be done, roughly?

Thanks,
Christian.
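
One way the grouping asked about above could be done, sketched in plain Python rather than against the Lucene API: walk the hits once in score order and keep, per topic id, only the best score and the first few messages. The hit tuples, the `topic_id` field and the function `group_hits_by_topic` are hypothetical names for illustration; the sketch assumes each message document stores its topic's id so it can be read for every hit. Memory then grows with the number of topics kept, not with the total number of hits.

```python
def group_hits_by_topic(hits, max_topics=10, max_messages_per_topic=3):
    """Group message hits by topic in a single pass.

    hits: iterable of (score, topic_id, message_id), sorted by descending score.
    Returns a list of (topic_id, best_score, [message_id, ...]), ranked by
    each topic's best-scoring message.
    """
    groups = {}  # topic_id -> [best_score, message_ids]; dicts keep insertion order
    for score, topic_id, message_id in hits:
        group = groups.get(topic_id)
        if group is None:
            if len(groups) >= max_topics:
                # Enough topics for this result page; skip hits from new topics.
                continue
            # Hits arrive best-first, so the first hit of a topic is its best.
            group = groups[topic_id] = [score, []]
        if len(group[1]) < max_messages_per_topic:
            group[1].append(message_id)
    return [(tid, best, msgs) for tid, (best, msgs) in groups.items()]


# Made-up hits for illustration only.
hits = [
    (0.9, "t1", "m7"),
    (0.8, "t2", "m3"),
    (0.7, "t1", "m9"),
    (0.4, "t3", "m1"),
    (0.2, "t1", "m2"),
]
for row in group_hits_by_topic(hits, max_topics=2, max_messages_per_topic=2):
    print(row)
# ('t1', 0.9, ['m7', 'm9'])
# ('t2', 0.8, ['m3'])
```

Because `hits` can be any iterable, the results never have to be held in RAM all at once; only the small per-topic groups are kept.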