Re: Philosophy(??) question

2004-01-13 Thread Morus Walter
Scott Smith writes: I have some documents I'm indexing which have multiple languages in them (i.e., some fields in the document are always English; other fields may be other languages). Now, I understand why a query against a certain field must use the same analyzer as was used when that

Re: Philosophy(??) question

2004-01-13 Thread Erik Hatcher
On Jan 12, 2004, at 7:59 PM, Scott Smith wrote: I have some documents I'm indexing which have multiple languages in them (i.e., some fields in the document are always English; other fields may be other languages). Now, I understand why a query against a certain field must use the same analyzer

Getting word frequency?

2004-01-13 Thread ambiesense
Hello all, I would like to get a word frequency list from a text. How can I achieve this in the most direct way using Lucene classes? Example: I have a very long text. I parse this text with a WhitespaceAnalyzer. From this text I generate an index. From this index I get each word together

Re: Getting word frequency?

2004-01-13 Thread Erik Hatcher
On Jan 13, 2004, at 7:26 AM, [EMAIL PROTECTED] wrote: Example: I have a very long text. I parse this text with a WhitespaceAnalyzer. From this text I generate an index. From this index I get each word together with its absolute frequency / relative frequency. Can I do it without generating an

Re: Query question

2004-01-13 Thread Erik Hatcher
On Jan 12, 2004, at 7:49 PM, Scott Smith wrote: Does the following do that: BooleanQuery QA = new BooleanQuery(); Query qa1 = QueryParser.parse(A1, FieldA, analyzer()); Query qa2 = QueryParser.parse(A2, FieldA, analyzer()); QA.add(qa1, false, false); //

Re: Getting word frequency?

2004-01-13 Thread ambiesense
Hello Erik, I know that. However, I still wonder if this is already solved somehow in Lucene. I would prefer using Lucene methods instead of a workaround. On the other hand, generating an index only to get hold of words and their frequencies would make it too complicated. I basically want to transfer a

Re: Getting word frequency?

2004-01-13 Thread Doug Cutting
[EMAIL PROTECTED] wrote: I would like to get a word frequency list from a text. How can I achieve this in the most direct way using Lucene classes? Can I do it without generating an index? No, if you want Lucene to compute frequencies, then you need to create an index. Doug
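
For readers who only need the counts and not a searchable index, the frequency list asked about in this thread can also be computed outside Lucene. The following is a minimal plain-Java sketch under that assumption: it splits on whitespace (roughly what a whitespace analyzer would do) and tallies tokens. The class and method names (WordFrequency, count, relative) are hypothetical and are not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene class: counts whitespace-separated tokens.
final class WordFrequency {

    // Absolute frequency of each token, in order of first appearance.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Relative frequency: occurrences of `word` divided by the total token count.
    static double relative(Map<String, Integer> counts, String word) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : counts.getOrDefault(word, 0) / (double) total;
    }
}
```

Calling WordFrequency.count on a text gives the absolute counts, and WordFrequency.relative derives the relative frequency from them. No lowercasing or stemming is applied, so "Word" and "word" are counted separately.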

Peculiar (?) Indexing Performance

2004-01-13 Thread Terry Steichen
I just aborted a re-indexing operation (because it was taking too much time - will run it overnight instead). But I was surprised by what I found in the index directory, which contained a total of 1,402 index files! It started out with 36 files with the name of _I9a.*, followed by groups of

Re: Peculiar (?) Indexing Performance

2004-01-13 Thread Dror Matalon
Hi Terry, It's usually useful to give some information about your environment. How many documents you are indexing, what is the average size of a document, etc. But I'll answer anyway :-). For details about indexing see Otis' article at http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html

RE: Query question

2004-01-13 Thread Scott Smith
So I can write: Query q2 = new TermQuery(new Term(FieldA, a1)); And similar things for all of the QueryParser calls. This makes sense and I assume must be more efficient than using the QueryParser for simple terms. As you have guessed, there may be an arbitrary number of terms (not just 2)

RE: Philosophy(??) question

2004-01-13 Thread Scott Smith
I looked at PerFieldAnalyzerWrapper. Seems perfect for what I want. Thanks. Some day, I'd be interested to understand the deeper question. Thanks again Scott -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 13, 2004 3:19 AM To: Lucene Users List

StandardAnalyzer and numbers indexed as text

2004-01-13 Thread Patrick Kates
Hi all, I have a text field called ACTIVE_YEAR that stores (of course) a year like 2003. When I index this field I can see the number in my index (using Luke) but I can't search it. If I add a text character to the end of the field and index it (200x) I can then search and find 'x', but not any

Re: Query question

2004-01-13 Thread Erik Hatcher
On Jan 13, 2004, at 5:21 PM, Scott Smith wrote: I guess what is confusing me now is that the search code no longer references an analyzer???!!! How does it know how to tokenize, stem, etc. the search terms? It doesn't. A TermQuery is exactly as-is. If you need the analysis part, you can use

Re: StandardAnalyzer and numbers indexed as text

2004-01-13 Thread Erik Hatcher
On Jan 13, 2004, at 6:19 PM, Patrick Kates wrote: I have a text field called ACTIVE_YEAR that stores (of course) a year like 2003. When I index this field I can see the number in my index (using Luke) but I can't search it. If I add a text character to the end of the field and index it (200x)

Re: StandardAnalyzer and numbers indexed as text

2004-01-13 Thread Pawan preet
Hi, if you want numbers like 200x not to be filtered out by StandardAnalyzer, change the method isTokenChar(), which is called in CharTokenizer.java and implemented in LetterTokenizer.java, so that it does not filter numbers; for that, use Character.isDigit(char). Patrick Kates [EMAIL PROTECTED]
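
The effect of the token-character test described in this reply can be illustrated without Lucene. The sketch below is plain Java and only mirrors the idea of an isTokenChar() predicate; it is not Lucene's CharTokenizer or LetterTokenizer code, and the names CharSplit, tokenize, lettersOnly and lettersAndDigits are hypothetical. Characters that pass the predicate are collected into tokens, and all other characters act as separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration of predicate-based tokenizing, not Lucene code.
final class CharSplit {

    // Splits text into runs of characters accepted by isTokenChar.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Letters only: digits are treated as separators and dropped.
    static List<String> lettersOnly(String text) {
        return tokenize(text, Character::isLetter);
    }

    // Letters or digits: numbers survive as part of the token.
    static List<String> lettersAndDigits(String text) {
        return tokenize(text, c -> Character.isLetter(c) || Character.isDigit(c));
    }
}
```

With the letters-only predicate, the input "200x" yields the single token "x" and "2003" yields no tokens at all, while the letters-or-digits predicate keeps "200x" and "2003" intact.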