RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-26 Thread Mike O'Leary
[mailto:gento...@gmail.com] Sent: Tuesday, September 25, 2012 6:32 PM To: java-user@lucene.apache.org Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question Mike, On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary tmole...@uw.edu wrote: Hi Chris, So if I change my analyzer to inherit from AnalyzerWrapper

Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
I am updating an analyzer that uses a particular configuration of the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the fields use a custom analyzer and StandardTokenizer and the other fields use the KeywordAnalyzer and KeywordTokenizer. The older version of the analyzer looks like

RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
in your code sample. Are you able to expand on the problem you're encountering? On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary tmole...@uw.edu wrote: I am updating an analyzer that uses a particular configuration of the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the fields use

RE: Lucene 4.0 PerFieldAnalyzerWrapper question

2012-09-25 Thread Mike O'Leary
by not extending Analyzer but instead just instantiating a PerFieldAnalyzerWrapper instance directly instead of your MyPerFieldAnalyzer. On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary tmole...@uw.edu wrote: Hi Chris, In a nutshell, my question is, what should I put in place of ??? to make

Uses for IndexWriter.commit(commitUserData)/IndexCommit.getUserData()

2012-09-21 Thread Mike O'Leary
I was looking at IndexWriter.commit(commitUserData) and IndexCommit.getUserData() as possible ways to save metadata about documents in an index, but I realized that the metadata we are looking at could easily get to have way too many map entries to work well. This pair of functions looks

RE: Problem with TermVector offsets and positions not being preserved

2012-08-24 Thread Mike O'Leary
for an effort to fix this trap for google summer of code. On Wed, Aug 22, 2012 at 5:23 PM, Mike O'Leary tmole...@uw.edu wrote: I have one more question about term vector positions and offsets being preserved. My co-worker is working on updating the documents in an index with a field that contains

RE: Problem with TermVector offsets and positions not being preserved

2012-08-22 Thread Mike O'Leary
this doesn't occur? Thanks, Mike -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, July 20, 2012 5:59 PM To: java-user@lucene.apache.org Subject: Re: Problem with TermVector offsets and positions not being preserved On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary tmole

Supporting advanced search methods in a user interface

2012-08-16 Thread Mike O'Leary
I would like to know if anyone has ideas (or pointers to discussions) about good ways to support advanced search options, such as the various kinds of SpanQuery, in a search application user interface that is understandable to non-expert users. Thanks, Mike

RE: Problem with TermVector offsets and positions not being preserved

2012-07-26 Thread Mike O'Leary
Subject: Re: Problem with TermVector offsets and positions not being preserved On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary tmole...@uw.edu wrote: Hi Robert, I'm not trying to determine whether a document has term vectors, I'm trying to determine whether the term vectors that are in the index have

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
and checking everything out. I couldn't find any problems. Can you provide more information? On Thu, Jul 19, 2012 at 7:16 PM, Mike O'Leary tmole...@uw.edu wrote: I created an index using Lucene 3.6.0 in which I specified that a certain text field in each document should be indexed, stored, analyzed

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
I neglected to mention that CreateTestIndex uses a collection of data files with .properties extensions that are included in the Lucene In Action source code download. Mike -----Original Message----- From: Mike O'Leary [mailto:tmole...@uw.edu] Sent: Friday, July 20, 2012 2:10 PM To: java-user

RE: Problem with TermVector offsets and positions not being preserved

2012-07-20 Thread Mike O'Leary
be using something like IndexReader.getTermFreqVector for the document to determine if it has term vectors. On Fri, Jul 20, 2012 at 5:10 PM, Mike O'Leary tmole...@uw.edu wrote: Hi Robert, I put together the following two small applications to try to separate the problem I am having from my own

Problem with TermVector offsets and positions not being preserved

2012-07-19 Thread Mike O'Leary
I created an index using Lucene 3.6.0 in which I specified that a certain text field in each document should be indexed, stored, analyzed with no norms, with term vectors, offsets and positions. Later I looked at that index in Luke, and it said that term vectors were created for this field, but

Highlighting in Luke?

2012-03-13 Thread Mike O'Leary
I sent this message to the Luke discussion forum, but there isn't a lot of activity there these days, so I thought I would ask my question here too. I was asked if Luke supports highlighting of matched terms in its search results display. I looked through the code, and it doesn't look to me

Searching by similarity using term vectors

2012-02-14 Thread Mike O'Leary
If I have indexed a set of documents using term vectors, is there support in Lucene to treat a list of query terms as a small document, create a term vector for it, and find documents by computing similarity between the query's term vector and the term vectors in the index? If so, what API
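
For readers skimming this thread, the idea in the question (treat the query terms as a small document, build a term vector for it, and compare it with document term vectors) can be sketched in plain Java without any Lucene classes. This is only an illustration under stated assumptions: the class and method names (TermVectorSimilaritySketch, termVector, cosine) are hypothetical, the vectors hold raw term frequencies with no IDF weighting, and cosine similarity is one common choice of measure, not necessarily what Lucene or the poster used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: compares a query with a document
// by the cosine of the angle between their raw term-frequency vectors.
public class TermVectorSimilaritySketch {

    // Build a term -> frequency map from an already tokenized text.
    static Map<String, Integer> termVector(List<String> tokens) {
        Map<String, Integer> vector = new HashMap<>();
        for (String token : tokens) {
            vector.merge(token, 1, Integer::sum);
        }
        return vector;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity; returns 0.0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termVector(Arrays.asList("term", "vector"));
        Map<String, Integer> doc = termVector(
                Arrays.asList("term", "vector", "offsets", "and", "positions"));
        System.out.println(cosine(query, doc));
    }
}
```

In a real index the document-side frequencies would come from the stored term vectors rather than from re-tokenizing text, and scanning every document this way is linear in the size of the index.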

Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
We have a large set of documents that we would like to index with a customized stopword list. We have run tests by indexing a random set of about 10% of the documents, and we'd like to generate a list of the terms in that smaller set and their IDF values as a way to create a starter set of
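
As a rough illustration of the computation described above, the sketch below derives document frequencies and IDF values from a small in-memory sample and lists the lowest-IDF terms as stopword candidates. It uses no Lucene classes; the names (IdfSketch, docFreqs, idf, lowestIdfTerms) are hypothetical, the documents are assumed to be already tokenized, and the formula 1 + ln(numDocs / (docFreq + 1)) is modeled on the classic Lucene 3.x DefaultSimilarity rather than taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: computes IDF values for the
// terms of a sample document set held in memory.
public class IdfSketch {

    // Document frequency: number of documents containing each term at least once.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // De-duplicate so repeated terms count once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Assumed formula: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // The n terms with the lowest IDF, i.e. the most widespread terms.
    // Ties keep alphabetical order because the sort is stable.
    static List<String> lowestIdfTerms(List<List<String>> docs, int n) {
        Map<String, Integer> df = docFreqs(docs);
        List<String> terms = new ArrayList<>(df.keySet());
        terms.sort(Comparator.comparingDouble(
                (String term) -> idf(df.get(term), docs.size())));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<List<String>> sample = Arrays.asList(
                Arrays.asList("the", "cat"),
                Arrays.asList("the", "dog"),
                Arrays.asList("the", "cat", "sat"));
        for (String term : lowestIdfTerms(sample, 2)) {
            System.out.println(term);
        }
    }
}
```

Against an actual index the document frequencies would be read from the index's term dictionary instead of being recounted, but the ranking step stays the same.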

RE: Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
for the terms in a document set On Thu, Dec 15, 2011 at 6:33 PM, Mike O'Leary tmole...@uw.edu wrote: We have a large set of documents that we would like to index with a customized stopword list. We have run tests by indexing a random set of about 10% of the documents, and we'd like to generate a list

Indexing single words and marked phrases

2007-03-02 Thread Mike O'Leary
to do this? Thanks. Mike O'Leary

Storing extra data in index

2007-02-27 Thread Mike O'Leary
how to do something like this? Or is there a better way that I'm not thinking of? Thanks. Mike O'Leary

RE: Storing extra data in index

2007-02-27 Thread Mike O'Leary
So if I wanted to record the length of each individual document, would it be better to store that information with each document, perhaps as an unindexed field? Or are there ways to refer to the indexed documents that don't change through delete and optimize steps? Thanks. Mike O'Leary

Registering a local dtd file for use with Digester

2007-02-22 Thread Mike O'Leary
I have a collection of XML files that I would like to parse using Digester in order to index them for Lucene. A DTD file has been supplied for the XML files, but none of those files has a !DOCTYPE ... line associating them with the DTD file. Can the Digester's register function be used to tell it