Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
But when I am searching, it only searches in the index. Stored fields are 
only used to display the results, not to search. Why would it lose the terms 
in the index when I retrieve the document?

The first solution is not possible (I can't create a new document) since I
only have the modified fields.

When I get a document, don't its fields have the indexed terms along with
them? Is there no way to get a full document (along with indexed terms),
clone it, and add it to the index?

Well, is there any way I can update a document with just one field (because
I only have data for that one field)?

Praveen
----- Original Message -----
From: Justin Swanhart [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, December 08, 2004 5:59 PM
Subject: Re: partial updating of lucene


Your unstored fields were not stored in the index, only their terms
were stored.  When you get the document from the index and modify it,
those terms are lost when you add the document again.
You can either simply create a new document and populate all the
fields and add that document to the index, or you can add the unstored
fields to the document retrieved in step 1.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
If I store all the fields I am indexing, is it safe to get the document,
update a field and add it again to the search index? I do not want to lose
anything and I want to make sure that the document is the same before and
after updating (except for the updated fields).

Praveen
----- Original Message -----
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 10:00 AM
Subject: Re: partial updating of lucene


On Dec 9, 2004, at 9:48 AM, Praveen Peddi wrote:
> But when I am searching, it only searches in the index. Stored fields are
> only used to display the results, not to search. Why would it lose the
> terms in the index when I retrieve the document?
>
> The first solution is not possible (I can't create a new document) since I
> only have the modified fields.
>
> When I get a document, don't its fields have the indexed terms along with
> them? Is there no way to get a full document (along with indexed terms),
> clone it, and add it to the index?
>
> Well, is there any way I can update a document with just one field (because
> I only have data for that one field)?

A Document only carries along its *stored* fields.  Fields that are 
indexed, but not stored, are not retrievable from Document.

Have a look at the tool Luke (Google for luke lucene :) and see how it 
does its Reconstruct and Edit facility.  It is possible, though 
potentially lossy, to reconstruct a document and add it again.

Erik


Re: partial updating of lucene

2004-12-09 Thread Luke Francl
On Thu, 2004-12-09 at 09:00, Erik Hatcher wrote:

> Have a look at the tool Luke (Google for luke lucene :) and see how
> it does its Reconstruct and Edit facility.  It is possible, though
> potentially lossy, to reconstruct a document and add it again.

Or look at LIMO's implementation of that feature, which to my eyes is a
little easier to read (of course that's probably because I wrote it...
;):

http://cvs.sourceforge.net/viewcvs.py/limo/limo/src/net/sourceforge/limo/LimoUtils.java?rev=1.6&view=markup

(check out LimoUtils.reconstructDocument())

However, if you're doing analysis on your text to remove stopwords and
stuff like that, this WILL be lossy. I consider it more of an aid for
debugging than a way to re-index documents, though I suppose it would
work for that as well. However, I believe the process would be highly
resource intensive so I wouldn't recommend it.
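
To see why reconstruction is lossy once stop words are removed, here is a
small toy sketch. It is not LIMO's or Luke's actual code and it does not use
Lucene; the class name, the stop word list and the analysis rules are all
assumptions made for the example. It keeps only what an index would keep
(surviving terms and their positions) and rebuilds text from that.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only, NOT Lucene, Luke or LIMO code: all names are hypothetical.
final class LossyReconstruct {
    // Hypothetical stop word list, chosen for the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "and", "the"));

    /** Analysis: lowercase, split on non-letters, drop stop words, keep positions. */
    static Map<Integer, String> analyze(String text) {
        Map<Integer, String> termsByPosition = new TreeMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!tokens[pos].isEmpty() && !STOP_WORDS.contains(tokens[pos])) {
                termsByPosition.put(pos, tokens[pos]);
            }
        }
        return termsByPosition;
    }

    /** Rebuilds text from the surviving terms, in position order. */
    static String reconstruct(Map<Integer, String> termsByPosition) {
        return String.join(" ", termsByPosition.values());
    }

    public static void main(String[] args) {
        String original = "A Boy and his Dog";
        // Case and the stop words "a" and "and" cannot be recovered.
        System.out.println(reconstruct(analyze(original))); // prints "boy his dog"
    }
}
```

The rebuilt text still matches the same term queries, but it is not the
original text, which is why this is better suited to debugging than to
re-indexing.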

The better solution is to add a stored keyword field that stores the
location of your document, and then re-index it from the source.
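
A rough sketch of that approach, again as a plain-Java toy rather than real
Lucene code: the index entry is keyed by the source location, and an update
changes the source and then rebuilds the whole entry from it. The class,
method names and the path "/docs/1.txt" are hypothetical, and the two maps
merely stand in for the real source store and the real index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Lucene's API: all names and the path below are hypothetical.
final class ReindexFromSource {
    // Stand-in for the system of record: location -> field name -> full text.
    final Map<String, Map<String, String>> source = new HashMap<>();
    // Stand-in for the index: location (the stored key) -> "field:term" entries.
    final Map<String, Set<String>> index = new HashMap<>();

    /** Replaces whatever is indexed under this location with a fresh analysis. */
    void reindex(String location) {
        Set<String> terms = new HashSet<>();
        source.get(location).forEach((field, text) -> {
            for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                terms.add(field + ":" + term);
            }
        });
        index.put(location, terms);
    }

    /** Changes one field at the source, then rebuilds the whole index entry. */
    void updateField(String location, String field, String newText) {
        source.get(location).put(field, newText);
        reindex(location);
    }

    /** Returns the locations of all documents whose field contains the term. */
    Set<String> search(String field, String term) {
        Set<String> hits = new TreeSet<>();
        index.forEach((location, terms) -> {
            if (terms.contains(field + ":" + term)) hits.add(location);
        });
        return hits;
    }

    public static void main(String[] args) {
        ReindexFromSource demo = new ReindexFromSource();
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "old title");
        fields.put("description", "a boy and his dog");
        demo.source.put("/docs/1.txt", fields);
        demo.reindex("/docs/1.txt");

        demo.updateField("/docs/1.txt", "title", "new title");
        System.out.println(demo.search("description", "boy")); // prints [/docs/1.txt]
        System.out.println(demo.search("title", "old"));       // prints []
    }
}
```

Because every field is re-analysed from the source, it does not matter
whether a field was stored in the index or not.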

Regards,
Luke Francl





partial updating of lucene

2004-12-08 Thread Praveen Peddi
Hi all,
I have a question about updating a Lucene document. I know that there is no
API to do that now. So this is what I am doing in order to update the title
field of a document.

1) Get the document from the Lucene index.
2) Remove the field called title and add the same field with a modified value.
3) Remove the document (based on one of our fields) using the Reader and then
close the Reader.
4) Add the document that was obtained in 1 and modified in 2.

I am not sure if this is the right way of doing it, but I am having problems
searching for that document after updating it. The problem is only with the
unstored fields.

For example, I search for description:boy, where description is an unstored,
indexed, tokenized field in the document. I find 1 document. Now I update the
document's title as described above and repeat the same search,
description:boy, and now I don't find any results. I have not touched the
description field at all. I just updated the title field.

Is this expected behaviour? If not, is it a bug?

If I change the description field to stored, indexed and tokenized, the search
works fine before and after updating.

Praveen
** 
Praveen Peddi
Sr Software Engg, Context Media, Inc. 
email:[EMAIL PROTECTED] 
Tel:  401.854.3475 
Fax:  401.861.3596 
web: http://www.contextmedia.com 
** 
Context Media- The Leader in Enterprise Content Integration 


Re: partial updating of lucene

2004-12-08 Thread Justin Swanhart
Your unstored fields were not stored in the index, only their terms
were stored.  When you get the document from the index and modify it,
those terms are lost when you add the document again.

You can either simply create a new document and populate all the
fields and add that document to the index, or you can add the unstored
fields to the document retrieved in step 1.
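
The behaviour described above can be imitated with a small toy model. This is
a plain-Java sketch, not Lucene's API: ToyIndex, ToyField and their methods
are made-up names, and the whitespace tokenizer is an assumption. It only
shows the mechanism: a retrieved document carries stored text, so an unstored
field's terms disappear on delete and re-add unless the field is supplied
again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only, NOT Lucene's API: every name here is made up for illustration.
final class ToyIndex {
    /** A field's text plus a flag saying whether the index keeps that text. */
    static final class ToyField {
        final String text;
        final boolean stored;
        ToyField(String text, boolean stored) { this.text = text; this.stored = stored; }
    }

    // Stored values per document id; a null slot marks a deleted document.
    private final List<Map<String, String>> storedValues = new ArrayList<>();
    // "field:term" -> ids of the live documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the terms of every field; keeps the text of stored fields only. */
    int add(Map<String, ToyField> fields) {
        int id = storedValues.size();
        Map<String, String> kept = new HashMap<>();
        fields.forEach((name, field) -> {
            for (String term : field.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                postings.computeIfAbsent(name + ":" + term, k -> new HashSet<>()).add(id);
            }
            if (field.stored) kept.put(name, field.text);
        });
        storedValues.add(kept);
        return id;
    }

    /** What a retrieved document carries: its stored fields and nothing else. */
    Map<String, ToyField> get(int id) {
        Map<String, ToyField> doc = new HashMap<>();
        storedValues.get(id).forEach((name, text) -> doc.put(name, new ToyField(text, true)));
        return doc;
    }

    /** Deletes a document: its stored values and all of its postings go away. */
    void delete(int id) {
        storedValues.set(id, null);
        postings.values().forEach(ids -> ids.remove(id));
    }

    boolean matches(String field, String term) {
        Set<Integer> ids = postings.get(field + ":" + term);
        return ids != null && !ids.isEmpty();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, ToyField> doc = new HashMap<>();
        doc.put("title", new ToyField("old title", true));
        doc.put("description", new ToyField("a boy and his dog", false)); // unstored
        int id = index.add(doc);

        // Steps 1-4 of the original post: get, change title, delete, re-add.
        Map<String, ToyField> retrieved = index.get(id);
        retrieved.put("title", new ToyField("new title", true));
        index.delete(id);
        id = index.add(retrieved);
        System.out.println(index.matches("description", "boy")); // prints false

        // Second option: supply the unstored text again before re-adding.
        retrieved = index.get(id);
        retrieved.put("description", new ToyField("a boy and his dog", false));
        index.delete(id);
        index.add(retrieved);
        System.out.println(index.matches("description", "boy")); // prints true
    }
}
```

Note that the second option still needs the original description text from
somewhere outside the index, since the index itself never kept it.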


On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi
[EMAIL PROTECTED] wrote:
> Hi all,
> I have a question about updating a Lucene document. I know that there is no
> API to do that now. So this is what I am doing in order to update the title
> field of a document.
>
> 1) Get the document from the Lucene index.
> 2) Remove the field called title and add the same field with a modified value.
> 3) Remove the document (based on one of our fields) using the Reader and then
> close the Reader.
> 4) Add the document that was obtained in 1 and modified in 2.
>
> I am not sure if this is the right way of doing it, but I am having problems
> searching for that document after updating it. The problem is only with the
> unstored fields.
>
> For example, I search for description:boy, where description is an unstored,
> indexed, tokenized field in the document. I find 1 document. Now I update the
> document's title as described above and repeat the same search,
> description:boy, and now I don't find any results. I have not touched the
> description field at all. I just updated the title field.
>
> Is this expected behaviour? If not, is it a bug?
>
> If I change the description field to stored, indexed and tokenized, the
> search works fine before and after updating.
>
> Praveen

