Re: partial updating of lucene
But when I am searching, it only searches the index. Stored fields are only used to display the results, not to search. Why would it lose the terms in the index when I retrieve the document?

The first solution is not possible (I can't create a new document) since I only have the modified fields. When I get a document, don't the fields carry their indexed terms along with them? Is there no way to get a full document (along with its indexed terms), clone it, and add it to the index?

Is there any way I can update a document with just one field (because I only have data for that one field)?

Praveen

- Original Message -
From: Justin Swanhart [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, December 08, 2004 5:59 PM
Subject: Re: partial updating of lucene

Your unstored fields were not stored in the index; only their terms were stored. When you get the document from the index and modify it, those terms are lost when you add the document again. You can either simply create a new document, populate all the fields, and add that document to the index, or you can add the unstored fields to the document retrieved in step 1.

On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi [EMAIL PROTECTED] wrote:

Hi all,

I have a question about updating a Lucene document. I know that there is no API to do that now, so this is what I am doing in order to update the document's title field:

1) Get the document from the Lucene index.
2) Remove the field called title and add the same field with a modified value.
3) Remove the document (based on one of our fields) using the Reader and then close the Reader.
4) Add the document that was obtained in 1 and modified in 2.

I am not sure if this is the right way of doing it, but I am having problems searching for that document after updating it. The problem is only with the unstored fields.

For example, I search for description:boy, where description is an unstored, indexed, tokenized field in the document. I find 1 document. Now I update the document's title as described above and repeat the same search, description:boy, and now I don't find any results. I have not touched the description field at all; I just updated the title field.

Is this expected behaviour? If not, is it a bug? If I change the description field to stored, indexed and tokenized, the search works fine before and after updating.

Praveen

Praveen Peddi
Sr Software Engg, Context Media, Inc.
email: [EMAIL PROTECTED]
Tel: 401.854.3475
Fax: 401.861.3596
web: http://www.contextmedia.com
Re: partial updating of lucene
If I store all the fields I am indexing, is it safe to get the document, update a field, and add it again to the search index? I do not want to lose anything, and I want to make sure that the document is the same before and after updating (except for the updated fields).

Praveen

- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 10:00 AM
Subject: Re: partial updating of lucene

On Dec 9, 2004, at 9:48 AM, Praveen Peddi wrote:

But when I am searching, it only searches the index. Stored fields are only used to display the results, not to search. Why would it lose the terms in the index when I retrieve the document? The first solution is not possible (I can't create a new document) since I only have the modified fields. When I get a document, don't the fields carry their indexed terms along with them? Is there no way to get a full document (along with its indexed terms), clone it, and add it to the index? Is there any way I can update a document with just one field (because I only have data for that one field)?

A Document only carries along its *stored* fields. Fields that are indexed, but not stored, are not retrievable from Document.

Have a look at the tool Luke (Google for luke lucene :) and see how it does its Reconstruct and Edit facility. It is possible, though potentially lossy, to reconstruct a document and add it again.

Erik
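To make "potentially lossy" concrete, here is a minimal sketch of reconstructing text from indexed terms. It is a toy model in plain Java, not Lucene's or Luke's code: the class name, the stop-word list, and the tokenizing rule are all assumptions chosen for illustration. The point is only that anything the analysis step discards (case, punctuation, dropped words) cannot be recovered from the terms afterwards.

```java
import java.util.*;

/** Toy illustration only; this is NOT Lucene's API or Luke's implementation. */
class LossyReconstruction {
    // Assumed stop-word list, purely for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "his", "the"));

    /** Toy analysis: lowercase, split on non-letters, drop stop words, remember positions. */
    static Map<String, List<Integer>> analyze(String text) {
        Map<String, List<Integer>> termPositions = new HashMap<>();
        String[] tokens = text.toLowerCase().split("[^a-z]+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String token = tokens[pos];
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // discarded here, so it never reaches the index
            }
            termPositions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
        }
        return termPositions;
    }

    /** Rebuild text by ordering the surviving terms by their recorded positions. */
    static String reconstruct(Map<String, List<Integer>> termPositions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : termPositions.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        String original = "A Boy and his Dog!";
        // The reconstructed text has lost case, punctuation, and the stop words.
        System.out.println(reconstruct(analyze(original)));
    }
}
```

Searches for the surviving terms would still match a re-added reconstruction, but the field no longer holds the original text.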
Re: partial updating of lucene
On Thu, 2004-12-09 at 09:00, Erik Hatcher wrote:

Have a look at the tool Luke (Google for luke lucene :) and see how it does its Reconstruct and Edit facility. It is possible, though potentially lossy, to reconstruct a document and add it again.

Or look at LIMO's implementation of that feature, which to my eyes is a little easier to read (of course that's probably because I wrote it... ;):

http://cvs.sourceforge.net/viewcvs.py/limo/limo/src/net/sourceforge/limo/LimoUtils.java?rev=1.6&view=markup

(check out LimoUtils.reconstructDocument())

However, if you're doing analysis on your text to remove stopwords and the like, this WILL be lossy. I consider it more of an aid for debugging than a way to re-index documents, though I suppose it would work for that as well. I believe the process would be highly resource intensive, though, so I wouldn't recommend it.

The better solution is to add a stored keyword field that stores the location of your document, and then re-index it from the source.

Regards,
Luke Francl
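The re-index-from-source approach can be sketched roughly as follows. This is a toy model in plain Java, not Lucene code: the class, its methods, and the "docs/42.txt" location are hypothetical, and the "source" map stands in for whatever system of record actually holds the full documents. The idea is that an update changes the source, deletes the old index entries by the stored location key, and rebuilds every field from the source, so nothing depends on what the index can give back.

```java
import java.util.*;

/** Toy illustration only; names and structure are assumptions, not Lucene's API. */
class ReindexFromSource {
    // Hypothetical system of record: location -> (field name -> full text).
    private final Map<String, Map<String, String>> source = new HashMap<>();
    // Toy inverted index: "field:term" -> locations of matching documents.
    private final Map<String, Set<String>> postings = new HashMap<>();

    void addSource(String location, String title, String description) {
        Map<String, String> fields = new HashMap<>();
        fields.put("title", title);
        fields.put("description", description);
        source.put(location, fields);
    }

    /** Index every field of the document by reading it from the source. */
    void indexFromSource(String location) {
        for (Map.Entry<String, String> field : source.get(location).entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new HashSet<>())
                        .add(location);
            }
        }
    }

    /** Remove every index entry for the document, using its location as the key. */
    void deleteByLocation(String location) {
        for (Set<String> locations : postings.values()) {
            locations.remove(location);
        }
    }

    /** Update one field in the source, then delete and fully re-index the document. */
    void updateField(String location, String field, String newValue) {
        source.get(location).put(field, newValue);
        deleteByLocation(location);
        indexFromSource(location);
    }

    boolean matches(String field, String term, String location) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet())
                .contains(location);
    }

    public static void main(String[] args) {
        ReindexFromSource index = new ReindexFromSource();
        index.addSource("docs/42.txt", "old title", "a boy and his dog");
        index.indexFromSource("docs/42.txt");
        index.updateField("docs/42.txt", "title", "new title");
        // The untouched description field is still searchable after the update.
        System.out.println(index.matches("description", "boy", "docs/42.txt"));
    }
}
```

The trade-off is that the source must remain reachable at update time, in exchange for never having to reconstruct anything from the index.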
Re: partial updating of lucene
Your unstored fields were not stored in the index; only their terms were stored. When you get the document from the index and modify it, those terms are lost when you add the document again. You can either simply create a new document, populate all the fields, and add that document to the index, or you can add the unstored fields to the document retrieved in step 1.

On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi [EMAIL PROTECTED] wrote:

Hi all,

I have a question about updating a Lucene document. I know that there is no API to do that now, so this is what I am doing in order to update the document's title field:

1) Get the document from the Lucene index.
2) Remove the field called title and add the same field with a modified value.
3) Remove the document (based on one of our fields) using the Reader and then close the Reader.
4) Add the document that was obtained in 1 and modified in 2.

I am not sure if this is the right way of doing it, but I am having problems searching for that document after updating it. The problem is only with the unstored fields.

For example, I search for description:boy, where description is an unstored, indexed, tokenized field in the document. I find 1 document. Now I update the document's title as described above and repeat the same search, description:boy, and now I don't find any results. I have not touched the description field at all; I just updated the title field.

Is this expected behaviour? If not, is it a bug? If I change the description field to stored, indexed and tokenized, the search works fine before and after updating.

Praveen
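The behaviour described in this exchange can be reproduced with a small toy model. The sketch below is plain Java and deliberately does not use Lucene's classes; ToyIndex, Field, and the whitespace tokenizer are hypothetical stand-ins. It keeps an inverted index for searching and a separate map of stored values for retrieval, then walks through the four steps: get, modify title, delete, re-add.

```java
import java.util.*;

/** Toy illustration only; this is NOT Lucene's API. */
class ToyIndex {
    /** A field value plus a flag saying whether the original text is kept for retrieval. */
    static final class Field {
        final String name;
        final String value;
        final boolean store;
        Field(String name, String value, boolean store) {
            this.name = name; this.value = value; this.store = store;
        }
    }

    // Inverted index used for searching: "field:term" -> ids of matching documents.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Stored values used for retrieval: doc id -> (field name -> original text).
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();
    private int nextId = 0;

    int add(List<Field> fields) {
        int id = nextId++;
        Map<String, String> kept = new HashMap<>();
        for (Field f : fields) {
            for (String term : f.value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(f.name + ":" + term, k -> new HashSet<>()).add(id);
            }
            if (f.store) kept.put(f.name, f.value); // unstored text is not kept anywhere
        }
        stored.put(id, kept);
        return id;
    }

    /** Retrieval returns only the stored fields; unstored fields are simply absent. */
    List<Field> get(int id) {
        List<Field> out = new ArrayList<>();
        for (Map.Entry<String, String> e : stored.get(id).entrySet()) {
            out.add(new Field(e.getKey(), e.getValue(), true));
        }
        return out;
    }

    void delete(int id) {
        stored.remove(id);
        for (Set<Integer> ids : postings.values()) ids.remove(Integer.valueOf(id));
    }

    int hits(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet()).size();
    }

    /** Runs the four-step update and returns the hit counts {before, after}. */
    static int[] demo() {
        ToyIndex index = new ToyIndex();
        int id = index.add(Arrays.asList(
                new Field("title", "old title", true),
                new Field("description", "a boy and his dog", false)));
        int before = index.hits("description", "boy");
        List<Field> doc = index.get(id);               // step 1: only title comes back
        List<Field> updated = new ArrayList<>();
        for (Field f : doc) {                          // step 2: replace the title
            updated.add(f.name.equals("title") ? new Field("title", "new title", true) : f);
        }
        index.delete(id);                              // step 3: delete the old document
        index.add(updated);                            // step 4: re-add without description
        return new int[] { before, index.hits("description", "boy") };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("before=" + counts[0] + " after=" + counts[1]); // before=1 after=0
    }
}
```

In this model the re-added document never contained a description field, so its terms are not in the index. Marking description as stored in the toy makes the search survive the round trip, which matches the observation at the end of the question.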