Re:can't delete from an index using IndexReader.delete()

2004-02-20 Thread Dhruba Borthakur
Sent: Sunday, June 22, 2003 9:42 PM
To: Lucene Users List
The code looks fine.  Unfortunately, the provided code is not a full,
self-sufficient class that I can run on my machine to verify the
behaviour that you are describing.
Otis

Hi folks,

I am using the latest and greatest Lucene jar file and am facing a problem
with deleting documents from the index. Browsing the mail archive, I found
that the following email (June 2003) describes the exact problem that I am
encountering.

In short: I am using Field.Text("id", value) to mark a document. Then I use
reader.delete(new Term("id", value)) to remove the document: this call
returns 0 and fails to delete the document. The attached sample program
shows this behaviour.

I would appreciate it a lot if anybody on this list has encountered this
problem before and would like to share his/her solution with me.

thanks,
dhruba

From: Robert Koberg [EMAIL PROTECTED]
Subject: can't delete from an index using IndexReader.delete()
Date: Mon, 23 Jun 2003 14:38:25 -0700
Here is a simple class that can reproduce the problem (happens with the last
stable release too). Let me know if you would prefer this as an attachment.

Call like this:
  java TestReaderDelete existing_id new_label
- or -
Try:
  java TestReaderDelete B724547 ppp
and then try:
  java TestReaderDelete a266122794 ppp

If an index has not been created, it will create one. Keep running one of
the above example commands (with and without deleting the index directory)
and see what happens to the System.out.println output.


import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.DateField;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.xml.sax.Attributes;
import javax.xml.parsers.*;
import java.io.*;
import java.util.*;
class TestReaderDelete {

  public static void main(String[] args)
    throws IOException
  {
    File index = new File("./testindex");
    // Build a small sample index on the first run only.
    if (!index.exists()) {
      HashMap test_map = new HashMap();
      test_map.put("preamble_content", "Preamble content bbb");
      test_map.put("art_01_section_01", "Article 1, Section 1");
      test_map.put("toc_tester", "Test TOC XML bbb");
      test_map.put("B724547", "bio example");
      test_map.put("a266122794", "tester");
      indexFiles(index, test_map);
    }
    String identifier = args[0];
    String new_label = args[1];
    testDeleteAndAdd(index, identifier, new_label);
  }

  public static void indexFiles(File index, HashMap test_map)
  {
    try {
      IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(),
                                           true);
      for (Iterator i = test_map.entrySet().iterator(); i.hasNext(); ) {
        Map.Entry e = (Map.Entry) i.next();
        System.out.println("Adding: " + e.getKey() + " = " + e.getValue());
        Document doc = new Document();
        // Both fields are added as text fields, so their values are analyzed.
        doc.add(Field.Text("id", (String) e.getKey()));
        doc.add(Field.Text("label", (String) e.getValue()));
        writer.addDocument(doc);
      }
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
                         "\n with message: " + e.getMessage());
    }
  }

  public static void testDeleteAndAdd(File index, String identifier,
                                      String new_label)
    throws IOException
  {
    IndexReader reader = IndexReader.open(index);
    System.out.println("!!! reader.numDocs() : " + reader.numDocs());
    System.out.println("reader.indexExists(): " + reader.indexExists(index));
    System.out.println("term field: " + new Term("id", identifier).field

Re: Re:can't delete from an index using IndexReader.delete()

2004-02-20 Thread Morus Walter
Dhruba Borthakur writes:
> Hi folks,
>
> I am using the latest and greatest Lucene jar file and am facing a problem
> with deleting documents from the index. Browsing the mail archive, I found
> that the following email (June 2003) listed the exact problem that I am
> encountering.
>
> In short: I am using Field.Text("id", value) to mark a document. Then I use
> reader.delete(new Term("id", value)) to remove the document: this call
> returns 0 and fails to delete the document. The attached sample program
> shows this behaviour.
 
You don't tell us what your ids look like, but Field.Text("id", value)
tokenizes value, that is, it splits value into whatever the analyzer
considers to be a token and creates a term for each token,
whereas new Term("id", value) creates one term containing value.

So I guess your ids are split into several tokens by the analyzer you use,
and therefore they won't be matched by the term you construct for the delete.

Using keyword fields instead of text fields for the id should help.
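
A rough sketch of that mismatch in plain Java, without Lucene on the
classpath. The analyze method below is a hypothetical stand-in that
lowercases the value and splits it on non-alphanumeric characters; the real
analyzer's rules differ in detail, and the class and method names are made up
for illustration. The ids are the ones from the test program earlier in the
thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TermMatchSketch {

    // Hypothetical stand-in for an analyzer: lowercase the value, then split
    // on anything that is not an ASCII letter or digit. This is NOT Lucene's
    // actual tokenizer, only a simplified model of one.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Terms an index would hold for a tokenized (text) field: one per token.
    static Set<String> textFieldTerms(String value) {
        return new HashSet<>(analyze(value));
    }

    // Terms an index would hold for an untokenized (keyword) field:
    // the value itself, unchanged.
    static Set<String> keywordFieldTerms(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // A delete by term only hits a document whose indexed terms contain
    // exactly the text of the term passed in.
    static boolean wouldDelete(Set<String> indexedTerms, String termText) {
        return indexedTerms.contains(termText);
    }

    public static void main(String[] args) {
        String[] ids = {"B724547", "a266122794", "art_01_section_01"};
        for (String id : ids) {
            System.out.println(id
                + "  text field: " + wouldDelete(textFieldTerms(id), id)
                + "  keyword field: " + wouldDelete(keywordFieldTerms(id), id));
        }
    }
}
```

With this stand-in, only a266122794 passes through analysis unchanged, so it
is the only id that a text field would match; stored untokenized, all three
ids match. In the Lucene API used in this thread, the untokenized variant is
Field.Keyword("id", value).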

Morus
