Re: document diversity

2009-10-01 Thread Tricia Williams
Hi Mike, The first thing that comes to mind is to run a query for each document type (assuming that you have a field that stores the type) and qualify the document type: for example type:pdf. Then you would have to write something to combine the query results drawing an equal number of hits
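
A minimal sketch of the "combine the query results" step, assuming each per-type query (for example type:pdf) has already been run and its hits collected into a ranked list. The class name RoundRobinMerge and the method interleave are hypothetical helpers for illustration and are not part of Lucene or Solr. Hits are taken one at a time from each type in turn, so every type contributes an equal number until it runs out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not a Lucene/Solr class): merges per-type hit lists round-robin.
public class RoundRobinMerge {

    /**
     * Takes one hit from each per-type list in turn until `limit` hits
     * have been collected or every list is exhausted.
     */
    public static <T> List<T> interleave(List<List<T>> perTypeHits, int limit) {
        List<Iterator<T>> iterators = new ArrayList<>();
        for (List<T> hits : perTypeHits) {
            iterators.add(hits.iterator());
        }

        List<T> merged = new ArrayList<>();
        boolean progressed = true;
        while (merged.size() < limit && progressed) {
            progressed = false; // stays false if all lists are exhausted
            for (Iterator<T> it : iterators) {
                if (merged.size() >= limit) {
                    break;
                }
                if (it.hasNext()) {
                    merged.add(it.next());
                    progressed = true;
                }
            }
        }
        return merged;
    }
}
```

Within each type the original ranking order is preserved; only the mixing across types is changed.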

Payloads, Tokenizers, and Filters. Oh My!

2007-11-16 Thread Tricia Williams
Hi All, I'll explain what I'm working on, and then I'll ask my two questions. I'm working on the issue https://issues.apache.org/jira/browse/SOLR-380 which is a feature request that allows one to index a "Structured Document" which is anything that can be represented by XML in order to pr

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-17 Thread Tricia Williams
Hi Grant, Thanks for your response! Taking a closer look, the TokenFilter(s) that cause my problem with the Payload are all from org.apache.solr.analysis rather than org.apache.lucene.analysis. I had originally thought that all the TokenFilters available through Solr's TokenFilterFa

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-18 Thread Tricia Williams
I apologize for cross-posting but I believe both Solr and Lucene users and developers should be concerned with this. I am not aware of a better way to reach both communities. In this email I'm looking for comments on: * Do TokenFilters belong in the Solr code base at all? * How to deal

Re: applying patches (was [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery)

2008-04-29 Thread Tricia Williams
Hi Maurizio, I'm replying in java-user because I believe this is the appropriate place for a question like this. All the patches that I have encountered (including this one) are usually applied at the root. One should download the source code from http://svn.apache.org/repos/asf/lucen

Re: Term Based Meta Data

2008-08-05 Thread Tricia Williams
Hi Martin, Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). It might solve your problem, or at least give you a good starting point. Tricia Michael McCandless wrote: I think you could use payloads (= arbitrary/opaque byte[]) for this? You ca

Re: Term Based Meta Data

2008-08-08 Thread Tricia Williams
Martin Owens wrote: Dear Lucene Users and Tricia Williams, The way we're operating our lucene index is one where we index all the terms but not store the text. From your SOLR-380 patch example Tricia I was able to get a very good idea of how to set things up. Historically I have used

Re: Indexing sections of TEI XML files

2008-08-13 Thread Tricia Williams
Hi, Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). The part you might find particularly useful is the Tokenizer. Tricia [EMAIL PROTECTED] wrote: Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files.

Re: number of term occurrences

2006-10-24 Thread Tricia Williams
When you create a Document by adding Field(s) (http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html), consider the last constructor, which allows you to specify whether the field will have its TermVector stored. Also, Luke has a column in its document view wh

IndexReader.FieldOptions

2007-03-03 Thread Tricia Williams
Hi, I'm wondering why Stored isn't one of the IndexReader.FieldOption(s)? Stored is created at the same time and place as the other options (FieldOption.INDEXED and FieldOption.TERMVECTOR) so it doesn't make sense that it isn't retrieved in the same way. Tricia ---

Encountered "" using queryparser in XSP

2005-11-09 Thread Tricia Williams
Hi All, I'm using an HTML form to send a query to an XSP which uses Lucene to search and then returns the results as XML. Perhaps someone has experienced the problem that I'm currently experiencing. When the query is parsed, org.apache.lucene.queryParser.ParseException is thrown stating that

Re: BitSet in a HitCollector

2006-07-06 Thread Tricia Williams
Hi James, A paper was mentioned on this list in the last couple of months which presents a solution to your sampling problem without having to know the total results size in advance. The paper (http://www2005.org/cdrom/docs/p245.pdf) presents two solutions which utilize a random variable.
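
The paper's two solutions are not reproduced here. As a stand-in, the sketch below uses classic reservoir sampling, a well-known way to draw a uniform random sample from a stream whose total size is unknown in advance. The class name ReservoirSampler and its methods are hypothetical; in a Lucene setting one might call offer(docId) from inside a HitCollector's collect callback, but that wiring is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: keeps a uniform random sample of fixed size from a stream.
public class ReservoirSampler<T> {
    private final int capacity;
    private final Random random;
    private final List<T> reservoir;
    private long seen = 0; // number of items offered so far

    public ReservoirSampler(int capacity, Random random) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = random;
        this.reservoir = new ArrayList<>(capacity);
    }

    /** Offers the next item of the stream; it may or may not be kept. */
    public void offer(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            // The first `capacity` items always fill the reservoir.
            reservoir.add(item);
        } else {
            // Pick a slot in [0, seen); the item is kept only if the slot
            // falls inside the reservoir, i.e. with probability capacity/seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < capacity) {
                reservoir.set((int) slot, item);
            }
        }
    }

    /** Returns a copy of the current sample. */
    public List<T> sample() {
        return new ArrayList<>(reservoir);
    }
}
```

Memory use is bounded by the sample size, so the full result set never has to be held or counted before sampling.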

Storing HashMap as an UnIndexed Field

2005-09-20 Thread Tricia Williams
Hi, I'd like to store a HashMap for some extra data to be used when a given document is retrieved as a Hit for a query. To add an UnIndexed Field to an index takes only Strings as parameters. Does anyone have any suggestions on how I might convert the HashMap to a String that is efficiently r
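
One possible approach, sketched under assumptions: flatten the map to a single String of URL-encoded key=value pairs joined by '&', store that String in the UnIndexed field, and decode it when the Hit is retrieved. The class MapFieldCodec and its methods are hypothetical helpers, not Lucene API. The sketch assumes String keys and non-null String values, and uses the Charset overloads of URLEncoder and URLDecoder, which require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: converts a Map<String, String> to and from one String.
public class MapFieldCodec {

    /** Encodes the map as URL-encoded key=value pairs separated by '&'. */
    public static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            // URL-encoding escapes '&' and '=' inside keys and values,
            // so the separators stay unambiguous.
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Rebuilds the map from a String produced by encode(). */
    public static Map<String, String> decode(String stored) {
        Map<String, String> map = new HashMap<>();
        if (stored.isEmpty()) {
            return map;
        }
        for (String pair : stored.split("&")) {
            int eq = pair.indexOf('=');
            map.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                    URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return map;
    }
}
```

A text encoding like this keeps the stored value human-readable in tools that display field contents; Java serialization plus Base64 would handle arbitrary value types at the cost of readability.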

RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Tricia Williams
o a HashMap) > > -----Original Message----- > From: Tricia Williams [mailto:[EMAIL PROTECTED] > Sent: Tuesday, September 20, 2005 3:14 PM > To: java-user@lucene.apache.org > Subject: Storing HashMap as an UnIndexed Field > > Hi, > > I'd like to store a HashMap for some ex

TermDocs.freq()

2005-09-29 Thread Tricia Williams
I am finding that the TermDocs.freq() method is returning an incorrect value. I was wondering if anyone else had experienced this problem. I am using tp = IndexReader.termPositions( queryTerm ) to return an object which implements TermPositions. I then use tp.skipTo( docid ) to go directly to the docu

Re: TermDocs.freq()

2005-10-03 Thread Tricia Williams
? Is there an obvious work-around so that the frequency that I receive is correct for my document? Thank you for your consideration, Tricia On Thu, 29 Sep 2005, Tricia Williams wrote: > I am finding that TermDocs.freq() method is returning an incorrect value. > I was wondering if anyone el