Re: search on colon ":" ending words

2007-02-23 Thread Erick Erickson
I'd *strongly* advise doing it the simple way, that is, your replace. 1> It's simple and understandable. 2> Next time you upgrade Lucene you, or the next poor programmer, will have to remember/reimplement your change to the parser. 3> How will you ensure that others in your organization (and you

Index maintainance

2007-02-23 Thread Kainth, Sachin
Hi all, Just wondering how one would perform index maintenance. I know how to add new documents: writer = new IndexWriter(IndexDirectory, new PorterAnalyzer(), false); (incidentally, I wrote PorterAnalyzer myself for the PorterStemFilter since I couldn't find an analyzer using it) But what I do

Re: Index maintainance

2007-02-23 Thread Erick Erickson
If you're using 2.1, see IndexModifier. If you're on a version before 2.1, IndexModifier is (I think) hanging around in the contrib area. You have to delete a document and re-add it; there's no such thing as "modify in place" in Lucene currently. Erick On 2/23/07, Kainth, Sachin <[EMAIL PROTECTED]>

Re: Index maintainance

2007-02-23 Thread Michael McCandless
Erick Erickson wrote: If you're using 2.1, see IndexModifier. If you're on a version before 2.1, IndexModifier is (I think) hanging around in the contrib area. You have to delete a document and re-add it; there's no such thing as "modify in place" in Lucene currently. Actually as of 2.1 you can now

RE: Index maintainance

2007-02-23 Thread Kainth, Sachin
I've just been looking at IndexReader and it seems you can do it using that, but I don't know which concrete implementation of IndexReader to use. -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 23 February 2007 15:07 To: java-user@lucene.apache.org Subject: R

Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Paul Taylor
Hi, I have a Java Swing application with a table, and I was considering using Lucene to index the data in the table. One task I'd like to do is for the user to select 'Find Duplicate records for Column X', then I would filter the table to show only records where there is more than one with the same val

Re: Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Erick Erickson
Sure, you can use the TermDocs/TermEnum classes. Basically, for a term (probably column value in your app) these let you quickly answer the question "which (and how many) documents does this term appear in". What you get is the Lucene doc id, which lets you fetch all the information about the doc

Re: Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Erik Hatcher
On Feb 23, 2007, at 10:16 AM, Paul Taylor wrote: Hi, I have a Java Swing application with a table, and I was considering using Lucene to index the data in the table. One task I'd like to do is for the user to select 'Find Duplicate records for Column X', then I would filter the table to show only

Index modification

2007-02-23 Thread Kainth, Sachin
Hi all, I am using the IndexModifier class to perform index modification. I have deleted 1 document from an index and the output indicates that 1 document does indeed get deleted. However, running the program again reveals that the document deleted has appeared again in the index. This despite

Re: Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Paul Taylor
Thanks, this might do it, but do I need to know the terms beforehand? I just want to return any terms with frequency more than one. Erick Erickson wrote: Sure, you can use the TermDocs/TermEnum classes. Basically, for a term (probably column value in your app) these let you quickly answer the q

Re: Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Paul Taylor
Yes, I've seen this before, thanks; it was an article that referred to this that pointed me towards Lucene in the first place :) Erik Hatcher wrote: On Feb 23, 2007, at 10:16 AM, Paul Taylor wrote: Hi, I have a Java Swing application with a table, and I was considering using Lucene to index the data i

RE: Leaking org.apache.lucene.index.* objects

2007-02-23 Thread Halsey, Stephen
Great, thanks a lot for that Hoss. Glad to hear it has been fixed. Steve. >-Original Message- >From: Chris Hostetter [mailto:[EMAIL PROTECTED] >Sent: 10 February 2007 06:14 >To: java-user@lucene.apache.org >Cc: Otis Gospodnetic >Subject: RE: Leaking org.apache.lucene.index.* objects >

Re: Leaking org.apache.lucene.index.* objects

2007-02-23 Thread Mark Miller
Are you flushing the session every so often with hibernate? If not you might not have been experiencing Otis's bug -- if so, never mind - Mark On 2/23/07, Halsey, Stephen <[EMAIL PROTECTED]> wrote: Great, thanks a lot for that Hoss. Glad to hear it has been fixed. Steve. >-Original Mes

Determining if index exists Lucene 2.1

2007-02-23 Thread Shane
Hi, Prior to Lucene 2.1, I was using FSDirectory.getDirectory(String path, boolean create) inside of a try block to determine whether or not a directory existed. With the deprecation of the above class call in Lucene 2.1, I need a new method for determining the existence of an index. I c

Re: TextMining.org Word extractor

2007-02-23 Thread Chris Hostetter
googling... TextMining.org licence ...turns up lots of useful info, some from the archive of this list. : Date: Fri, 23 Feb 2007 16:04:53 +1100 : From: Antony Bowesman <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: TextMining.org

Re: Determining if index exists Lucene 2.1

2007-02-23 Thread karl wettin
On 23 Feb 2007, at 19:53, Shane wrote: Is there a function call to determine whether or not an index already exists? HTH -- karl

Re: Determining if index exists Lucene 2.1

2007-02-23 Thread Michael McCandless
Shane wrote: Prior to Lucene 2.1, I was using FSDirectory.getDirectory(String path, boolean create) inside of a try block to determine whether or not a directory existed. With the deprecation of the above class call in Lucene 2.1, I need a new method for determining the existence of an in

Re: Multy Language documents indexing

2007-02-23 Thread Ivan Vasilev
Thanks Erik, Here I describe my research on this problem. It might be helpful for someone :) I will divide the problem of multi-language docs into some subproblems: 1. Determining the language in the text documents. 1.1. Determining the language in a document when the whole text is in on

filtering by first letter

2007-02-23 Thread Paul Sundling (Webdaddy)
I have a requirement to support filtering search results by first letter. This is relatively simple by adding a field to each index that represents the first letter for that relevant index and then adding a filter to the search. The hard part is that I need to list all the letters you can fil

Re: filtering by first letter

2007-02-23 Thread Erick Erickson
See TermEnum (I don't think you need TermDocs for this). If you instantiate a TermEnum(new Term("firstletterfield", "")), it'll enumerate all the terms in your 'firstletter' field and you can just collect them and go... For that matter, and assuming that your names are UN_TOKENIZED, you could do

Re: Can I use Lucene to retrieve a list of duplicates

2007-02-23 Thread Chris Hostetter
: Thanks this might do it, but do I need to know the terms beforehand, I : just want to return any terms with frequency more than one? no, TermEnum will let you iterate over all the terms ... you don't even need TermDocs if you just want the docFreq for each term (which would be 1 if there are no
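
The "frequency more than one" idea can be sketched without any Lucene calls. The block below is a plain-Java stand-in, not the TermEnum API: a map plays the role of the term dictionary and its counts play the role of docFreq. The class and method names are hypothetical, and it assumes each record contributes exactly one value for the column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateValues {
    // Returns the column values that occur in more than one record,
    // in order of first appearance.
    static List<String> valuesSeenMoreThanOnce(List<String> columnValues) {
        // Count how many records carry each value (the analogue of docFreq).
        Map<String, Integer> frequency = new LinkedHashMap<>();
        for (String value : columnValues) {
            frequency.merge(value, 1, Integer::sum);
        }
        // Keep only the values whose count is greater than one.
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> column = List.of("a", "b", "a", "c", "b");
        System.out.println(valuesSeenMoreThanOnce(column)); // prints [a, b]
    }
}
```

With a real index the counting step would not be needed, since the term enumeration already carries a per-term document frequency; only the "greater than one" filter remains.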

Re: search on colon ":" ending words

2007-02-23 Thread Chris Hostetter
: String newquery = query.replace(query, ": ", " "); you should be able to use a regex like so... String newquery = query.replaceAll(":\\b", ":"); ...(i may have some extra/missing backslashes) to ensure that literal ":" in your input which are followed by word boundaries are "escaped"
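
A minimal sketch of the escaping approach, using only java.util.regex. It is not the exact pattern from the message above. It assumes a "colon-ending word" means a colon preceded by a word character and followed by whitespace or the end of the input, and the class and method names are hypothetical. Matcher.quoteReplacement is used so the backslash count in the replacement does not have to be worked out by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonEscaper {
    // Assumption: a colon "ends a word" when it follows a word character
    // and is followed by whitespace or the end of the input.
    private static final Pattern TRAILING_COLON =
            Pattern.compile("(?<=\\w):(?=\\s|$)");

    // Replaces each such colon with a backslash-escaped colon and leaves
    // field-style colons such as title:lucene untouched.
    static String escapeTrailingColons(String query) {
        // quoteReplacement keeps the backslash literal in the output.
        return TRAILING_COLON.matcher(query)
                .replaceAll(Matcher.quoteReplacement("\\:"));
    }

    public static void main(String[] args) {
        System.out.println(escapeTrailingColons("warning: disk full")); // prints warning\: disk full
        System.out.println(escapeTrailingColons("title:lucene"));       // prints title:lucene
    }
}
```

Whether to escape the colon or replace it with a space depends on how the field was analyzed at index time; the sketch only shows the string handling.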

RE: Index maintainance

2007-02-23 Thread Chris Hostetter
: I've just been looking at IndexReader and it seems you can do it using : that, but I don't know which concrete implementation of IndexReader to : use. there is a static factory method for opening an IndexReader in the IndexReader class (you can't call the constructors directly) please go throu

Re: Index modification

2007-02-23 Thread Chris Hostetter
you should check the return count from deleteDocuments ... if i had to guess i would say that your analyzer is stemming the input in such a way that your indexed terms don't match the Term you are trying to delete on. : Date: Fri, 23 Feb 2007 16:48:27 - : From: "Kainth, Sachin" <[EMAIL PROTEC

Re: ConstantScoreQuery and MatchAllDocsQuery

2007-02-23 Thread Chris Hostetter
: I ask this because I need to return the frequency of the search terms : with each of my results, I tried using the TermFreqVector object but : unfortunately it was not fast enough, so I decided to modify Lucene to : be able to return the frequency the same way the score is returned by : org.ap

Re: QueryParser bug?

2007-02-23 Thread Doron Cohen
Hi Antony, Could you try the patch in http://issues.apache.org/jira/browse/LUCENE-813 Thanks, Doron Chris Hostetter <[EMAIL PROTECTED]> wrote on 22/02/2007 22:01:00: > > : than just on/off), but the original QP shows the problem with > : setAllowLeadingWildcard(true). The compiled JavaCC code