Scott Smith writes:
I have some documents I'm indexing which have multiple languages in them
(i.e., some fields in the document are always English; other fields may be
in other languages). Now, I understand why a query against a certain field
must use the same analyzer as was used when that
On Jan 12, 2004, at 7:59 PM, Scott Smith wrote:
I have some documents I'm indexing which have multiple languages in them
(i.e., some fields in the document are always English; other fields may be
in other languages). Now, I understand why a query against a certain field
must use the same analyzer
Hello all,
I would like to get a word frequency list from a text. How can I achieve
this in the most direct way using Lucene classes?
Example: I have a very long text. I parse this text with a
WhitespaceAnalyzer. From this text I generate an index. From this index I get each word
together
On Jan 13, 2004, at 7:26 AM, [EMAIL PROTECTED] wrote:
Example: I have a very long text. I parse this text with a
WhitespaceAnalyzer. From this text I generate an index. From this
index I get each word
together with its absolute frequency / relative frequency.
Can I do it without generating an
On Jan 12, 2004, at 7:49 PM, Scott Smith wrote:
Does the following do that:
BooleanQuery QA = new BooleanQuery();
Query qa1 = QueryParser.parse(A1, FieldA, analyzer());
Query qa2 = QueryParser.parse(A2, FieldA, analyzer());
QA.add(qa1, false, false); // required=false, prohibited=false: an optional (OR-style) clause
Hello Erik,
I know that. However, I still wonder whether this is already solved somehow
in Lucene; I would prefer using Lucene methods instead of a workaround. On the
other hand, generating an index only to get hold of words and their frequencies
would make it too complicated. I basically want to transfer a
[EMAIL PROTECTED] wrote:
I would like to get a word frequency list from a text. How can I achieve
this in the most direct way using Lucene classes?
Can I do it without generating an index?
No, if you want Lucene to compute frequencies, then you need to create
an index.
Doug
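Doug's point is that Lucene only computes frequencies as a by-product of indexing. If all you need is the counts themselves, a plain-Java sketch (no Lucene classes; whitespace splitting stands in for the WhitespaceAnalyzer mentioned above) shows how little is involved:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch: whitespace-split a text and count absolute term
// frequencies; relative frequency is count / total tokens. This is an
// illustration, not the Lucene API.
public class WordFreq {
    public static Map<String, Integer> absoluteFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // increment, starting at 1
            }
        }
        return counts;
    }

    public static double relativeFrequency(Map<String, Integer> counts, String term) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        return total == 0 ? 0.0 : counts.getOrDefault(term, 0) / (double) total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = absoluteFrequencies("to be or not to be");
        System.out.println(counts);                          // {to=2, be=2, or=1, not=1}
        System.out.println(relativeFrequency(counts, "to")); // 2 of 6 tokens
    }
}
```

If you also need the index for searching anyway, letting Lucene build it and reading the frequencies back out of it is the better trade, as Doug suggests.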
I just aborted a re-indexing operation (because it was taking too much time - will run
it overnight instead). But I was surprised by what I found in the index directory,
which contained a total of 1,402 index files! It started out with 36 files with the
name of _I9a.*, followed by groups of
Hi Terry,
It's usually useful to give some information about your environment: how
many documents you are indexing, what the average size of a document is,
etc. But I'll answer anyway :-).
For details about indexing see Otis' article at
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
So I can write:
Query q2 = new TermQuery(new Term(FieldA, a1)); // Term(field, text)
And similar things for each of the QueryParser calls. This makes sense, and I
assume it must be more efficient than using the QueryParser for simple terms.
As you have guessed, there may be an arbitrary number of terms (not just 2)
I looked at PerFieldAnalyzerWrapper. Seems perfect for what I want.
Thanks.
Some day, I'd be interested to understand the deeper question.
Thanks again
Scott
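Conceptually, PerFieldAnalyzerWrapper is just a field-name-to-analyzer dispatch table with a default, consulted identically at index time and query time. A plain-Java model of that idea (illustrative only — tokenizing functions stand in for real Analyzer instances, and the class and method names here are made up):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative model of per-field analyzer dispatch: pick a tokenizer
// by field name, falling back to a default. The real Lucene class
// (PerFieldAnalyzerWrapper) wraps Analyzer objects instead.
public class PerFieldDispatch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    public PerFieldDispatch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    public void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Using the same dispatch for indexing and for query parsing is what
    // keeps the two sides consistent per field.
    public List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldDispatch d = new PerFieldDispatch(
                s -> Arrays.asList(s.toLowerCase().split("\\s+"))); // lowercasing default
        d.addAnalyzer("title_de", s -> Arrays.asList(s.split("\\s+"))); // case-preserving
        System.out.println(d.tokenize("body", "Hello World"));    // [hello, world]
        System.out.println(d.tokenize("title_de", "Hallo Welt")); // [Hallo, Welt]
    }
}
```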
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 13, 2004 3:19 AM
To: Lucene Users List
Hi all,
I have a text field called ACTIVE_YEAR that stores (of course) a year like
2003. When I index this field I can see the number in my index (using Luke)
but I can't search it. If I add a text character to the end of the field
and index it (200x) I can then search and find 'x', but not any
On Jan 13, 2004, at 5:21 PM, Scott Smith wrote:
I guess what is confusing me now is that the search code no longer
references an analyzer???!!! How does it know how to tokenize, stem,
etc. the search terms?
It doesn't. A TermQuery is exactly as-is. If you need the analysis
part, you can use
On Jan 13, 2004, at 6:19 PM, Patrick Kates wrote:
I have a text field called ACTIVE_YEAR that stores (of course) a year like
2003. When I index this field I can see the number in my index (using Luke)
but I can't search it. If I add a text character to the end of the field
and index it (200x)
Hi
If you want a number like 200x not to be filtered out by StandardAnalyzer,
change the isTokenChar() method, which is called from CharTokenizer.java and
is implemented in LetterTokenizer.java, so that it does not filter out numbers;
for that, use Character.isDigit(char).
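The suggestion above amounts to widening isTokenChar so that digit runs survive tokenization. A self-contained sketch of the idea (this mirrors CharTokenizer's emit-maximal-runs loop, but does not extend the Lucene classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of CharTokenizer's approach: emit maximal runs of "token chars".
// LetterTokenizer's isTokenChar accepts only letters, which drops "2003";
// adding Character.isDigit keeps alphanumeric tokens like "200x".
public class AlnumTokenizer {
    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || Character.isDigit(c); // the suggested change
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);            // extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // separator ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("year: 2003, code 200x")); // [year, 2003, code, 200x]
    }
}
```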
Patrick Kates [EMAIL PROTECTED]