[mailto:gento...@gmail.com]
Sent: Tuesday, September 25, 2012 6:32 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
Mike,
On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary tmole...@uw.edu wrote:
Hi Chris,
So if I change my analyzer to inherit from AnalyzerWrapper
I am updating an analyzer that uses a particular configuration of the
PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the fields use a
custom analyzer and StandardTokenizer and the other fields use the
KeywordAnalyzer and KeywordTokenizer. The older version of the analyzer looks
like
in your code sample.
Are you able to expand on the problem you're encountering?
On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary tmole...@uw.edu wrote:
I am updating an analyzer that uses a particular configuration of the
PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the fields
use
by not extending Analyzer but instead
just instantiating a PerFieldAnalyzerWrapper instance directly instead of your
MyPerFieldAnalyzer.
On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary tmole...@uw.edu wrote:
Hi Chris,
In a nutshell, my question is, what should I put in place of ??? to
make
I was looking at IndexWriter.commit(commitUserData) and
IndexCommit.getUserData() as possible ways to save metadata about documents in
an index, but I realized that the metadata we are looking at could easily get
to have way too many map entries to work well. This pair of functions looks
for an effort to fix this
trap for google summer of code.
On Wed, Aug 22, 2012 at 5:23 PM, Mike O'Leary tmole...@uw.edu wrote:
I have one more question about term vector positions and offsets being
preserved. My co-worker is working on updating the documents in an index with
a field that contains
this doesn't occur?
Thanks,
Mike
-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Friday, July 20, 2012 5:59 PM
To: java-user@lucene.apache.org
Subject: Re: Problem with TermVector offsets and positions not being preserved
On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary tmole
I would like to know if anyone has ideas (or pointers to discussions) about
good ways to support advanced search options, such as the various kinds of
SpanQuery, in a search application user interface that is understandable to
non-expert users.
Thanks,
Mike
Subject: Re: Problem with TermVector offsets and positions not being preserved
On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary tmole...@uw.edu wrote:
Hi Robert,
I'm not trying to determine whether a document has term vectors, I'm trying
to determine whether the term vectors that are in the index have
and checking everything out. I couldn't find any problems.
Can you provide more information?
On Thu, Jul 19, 2012 at 7:16 PM, Mike O'Leary tmole...@uw.edu wrote:
I created an index using Lucene 3.6.0 in which I specified that a certain
text field in each document should be indexed, stored, analyzed
I neglected to mention that CreateTestIndex uses a collection of data files
with .properties extensions that are included in the Lucene In Action source
code download.
Mike
-----Original Message-----
From: Mike O'Leary [mailto:tmole...@uw.edu]
Sent: Friday, July 20, 2012 2:10 PM
To: java-user
be using something like IndexReader.getTermFreqVector for the
document to determine if it has term vectors.
On Fri, Jul 20, 2012 at 5:10 PM, Mike O'Leary tmole...@uw.edu wrote:
Hi Robert,
I put together the following two small applications to try to separate the
problem I am having from my own
I created an index using Lucene 3.6.0 in which I specified that a certain text
field in each document should be indexed, stored, analyzed with no norms, with
term vectors, offsets and positions. Later I looked at that index in Luke, and
it said that term vectors were created for this field, but
I sent this message to the Luke discussion forum, but there isn't a lot of
activity there these days, so I thought I would ask my question here too.
I was asked if Luke supports highlighting of matched terms in its search
results display. I looked through the code, and it doesn't look to me
If I have indexed a set of documents using term vectors, is there support in
Lucene to treat a list of query terms as a small document, create a term vector
for it, and find documents by computing similarity between the query's term
vector and the term vectors in the index? If so, what API
We have a large set of documents that we would like to index with a customized
stopword list. We have run tests by indexing a random set of about 10% of the
documents, and we'd like to generate a list of the terms in that smaller set
and their IDF values as a way to create a starter set of
for the terms in a document set
On Thu, Dec 15, 2011 at 6:33 PM, Mike O'Leary tmole...@uw.edu wrote:
We have a large set of documents that we would like to index with a
customized stopword list. We have run tests by indexing a random set of about
10% of the documents, and we'd like to generate a list
to do this? Thanks.
Mike O'Leary
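For the stopword-list question quoted above (listing terms with their IDF values from a sample of the documents), here is a minimal sketch of the arithmetic only. It assumes the documents have already been tokenized into lists of terms, and it does not read anything from a Lucene index. The class name IdfSketch and the method idfByTerm are hypothetical, made up for illustration. The formula used is 1 + ln(numDocs / (docFreq + 1)), which is the classic idf of Lucene's DefaultSimilarity; other IDF variants would rank terms similarly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes IDF per term from
// pre-tokenized documents held in memory.
final class IdfSketch {

    // Returns term -> 1 + ln(numDocs / (docFreq + 1)).
    static Map<String, Double> idfByTerm(List<List<String>> docs) {
        // Count how many documents contain each term (document frequency).
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // A set ensures a term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int numDocs = docs.size();
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(),
                    1.0 + Math.log(numDocs / (double) (e.getValue() + 1)));
        }
        return idf;
    }

    public static void main(String[] args) {
        // Toy sample standing in for the 10% subset of documents.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "quick", "fox"),
                Arrays.asList("the", "lazy", "dog"),
                Arrays.asList("the", "fox", "sleeps"));

        // Lowest IDF first: these are the stopword candidates.
        List<Map.Entry<String, Double>> ranked =
                new ArrayList<>(idfByTerm(docs).entrySet());
        ranked.sort(Map.Entry.comparingByValue());
        for (Map.Entry<String, Double> e : ranked) {
            System.out.printf("%-8s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

Terms that appear in nearly every document end up with the lowest values, so the head of the sorted list is a reasonable starter set to review by hand before adding anything to a stopword list.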
how to do something like this? Or is there a better way that I'm not
thinking of? Thanks.
Mike O'Leary
So if I wanted to record the length of each individual document, would it be
better to store that information with each document, perhaps as an unindexed
field? Or are there ways to refer to the indexed documents that don't change
through delete and optimize steps? Thanks.
Mike O'Leary
I have a collection of XML files that I would like to parse using Digester
in order to index them for Lucene. A DTD file has been supplied for the XML
files, but none of those files has a !DOCTYPE ... line associating them
with the DTD file. Can the Digester's register function be used to tell it