Re: Indexing a single product in multiple categories.

2006-10-02 Thread Stuart Grimshaw
On Thursday 28 September 2006 10:12, Stuart Grimshaw wrote: We have an existing Lucene-based search, and a recent change to the way we organise our products has caused a bit of a problem for search results. Our products are arranged into subcategories, categories and stores. A product can only

SV: Indexing a single product in multiple categories.

2006-10-02 Thread Marcus Falck
Can't you just add several values to the Store field? I.e.: doc.addField(field.text(STOREFIELD, val1)); doc.addField(field.text(STOREFIELD, val2)); -Original Message- From: Stuart Grimshaw [mailto:[EMAIL PROTECTED] Sent: 2 October 2006 10:09 To: java-user@lucene.apache.org

Modifying the PrefixQuery

2006-10-02 Thread Marcus Falck
I want to modify the PrefixQuery so that, instead of throwing the TooManyBooleanClause exception, it takes the N most frequent terms matching the prefix and searches only for those. Is this possible? / Regards Marcus

Search in HTML code

2006-10-02 Thread John Bugger
Hello! I've indexed HTML pages and stored the HTML code as UN_TOKENIZED fields. Now I need to search for specific tags in those documents, for example: option name=test Do I need to write a custom analyzer or something like that? Please help me!

DateTools again

2006-10-02 Thread Volodymyr Bychkoviak
I'm using DateTools with Resolution.DAY. I know that dates internally are converted to GMT. Converting dates 2006-10-01 00:00 and 2006-10-01 15:00 from Etc/GMT-2 timezone will give us 20060930 and 20061001 respectively. But these dates are identical with day resolution. Is this a bug or I'm
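The arithmetic in the question can be checked without Lucene. The following is a minimal sketch, assuming the modern java.time API as a stand-in for DateTools (the class and method names here are hypothetical, not part of Lucene): it reads a wall-clock time in Etc/GMT-2, which is two hours ahead of GMT, and formats the day as it falls in GMT.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Lucene's DateTools: it only reproduces the
// "convert to GMT, then truncate to the day" step described in the post.
class GmtDayStamp {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Interpret a local wall-clock time in the given zone, then format its day in GMT.
    static String gmtDay(LocalDateTime local, String zoneId) {
        return local.atZone(ZoneId.of(zoneId))
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .format(DAY);
    }

    public static void main(String[] args) {
        // Etc/GMT-2 is UTC+2, so local midnight is 22:00 GMT on the previous day.
        System.out.println(gmtDay(LocalDateTime.of(2006, 10, 1, 0, 0), "Etc/GMT-2"));  // 20060930
        System.out.println(gmtDay(LocalDateTime.of(2006, 10, 1, 15, 0), "Etc/GMT-2")); // 20061001
    }
}
```

Two times on the same local day land on different GMT days, which matches the values reported in the question.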

Re: DateTools again

2006-10-02 Thread John Haxby
Volodymyr Bychkoviak wrote: I'm using DateTools with Resolution.DAY. I know that dates internally are converted to GMT. Converting dates 2006-10-01 00:00 and 2006-10-01 15:00 from Etc/GMT-2 timezone will give us 20060930 and 20061001 respectively. But these dates are identical with day

Re: DateTools again

2006-10-02 Thread John Haxby
John Haxby wrote: I ran across the problem with DateTools not using UTC when I tried to use an index created in California from the UK: I was looking for documents with a particular date stamp but I found documents with a date stamp from the wrong day. Even more interesting and bizarre

Re: Performing a like query

2006-10-02 Thread Chris Hostetter
: I have a custom-built Analyzer where I tokenize all non-whitespace : characters as well available in the field TERM (which is the only : field being tokenised). : If I now query my index file for a term 6/12 for instance, I get back : only ONE result : instead of TWO. There is another token in

Re: Very high fieldNorm for a field resulting in bad results

2006-10-02 Thread Chris Hostetter
: This should solve most of my heartache. : Whats the suggested way to use this ? Copy a solr jar ? Or just copy : the code for this 1 query ? that's entirely up to you, it depends on what kind of source management you want to have -- the suggested way to use it is to run Solr and use it via the

Re: Indexing a single product in multiple categories.

2006-10-02 Thread Chris Hostetter
: Is my only option here really going to be to add some more columns? I've slept : on it over the weekend, and not had any more bright ideas ... ? I have to admit, i don't really understand your problem ... you speak of Products and Stores and Categories and Primary Categories and wondering how

Re: Modifying the PrefixQuery

2006-10-02 Thread Chris Hostetter
: I want to modify the PrefixQuery so that it instead of throwing the : TooManyBooleanClause exception takes out the most frequent N terms : matching the prefix and only searches for those. Is this possible? It should be ... look at the rewrite method of PrefixQuery and the docFreq method of
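The selection step being discussed (keep only the N most frequent terms that match a prefix) can be sketched independently of Lucene. This is an illustration under stated assumptions, not the PrefixQuery implementation: a sorted map of term to document frequency stands in for the index's term dictionary, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical stand-in for the term-selection step; it does not touch Lucene.
class TopPrefixTerms {
    // Returns up to n terms starting with prefix, most frequent first.
    static List<String> topTerms(SortedMap<String, Integer> docFreqs, String prefix, int n) {
        // Terms are sorted, so all prefix matches form one contiguous range.
        // The exclusive upper bound skips a term whose next char is \uffff (ignored here).
        SortedMap<String, Integer> matches =
                docFreqs.subMap(prefix, prefix + Character.MAX_VALUE);

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(matches.entrySet());
        // Stable sort by descending document frequency; ties keep term order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

In a real rewrite the selected terms would presumably become the clauses of the boolean query, which keeps the clause count at N or fewer.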

Re: lucene newbie question

2006-10-02 Thread Erik Hatcher
On Oct 2, 2006, at 2:08 PM, Los Morales wrote: I'm new to Lucene and IR in general. I'm a bit confused on the concept of fields. From what I've read, a field does not have to be indexed but its value can be stored in an index. Likewise a field can be indexed but its value is not stored

Re: lucene newbie question

2006-10-02 Thread Doron Cohen
SSN actually is a common situation. Assume you have a (relational) database with a table of products with three columns : - SSN, which is also a primary key for that table, - DESCRIPTION, which has free text (i.e. unformatted text) describing the product. - OTHER - additional info. Also assume

Re: lucene newbie question

2006-10-02 Thread Erick Erickson
Another Erick (note the correct spelling <G>). See below.. On 10/2/06, Los Morales [EMAIL PROTECTED] wrote: Hi Erik, Thanks for the response. Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content

Re: Search in HTML code

2006-10-02 Thread Erick Erickson
I guess the thundering silence is rooted in the problem statement. I have a hard time understanding how this index is used. By storing things this way, you'll force the user to know the *exact* format of anything she's looking for. That is, it's hard to search for option name=test value=32 and

Changing Similarity on existing index

2006-10-02 Thread Shane Perry
I have an existing index which was created with DefaultSimilarity. I want to update the index to use my own Similarity class (need to change the lengthNorm). I wrote a quick script which creates a new index, calls setSimilarity(new MySimilarity()) for that index's IndexWriter, and then calls

get terms by positions

2006-10-02 Thread Renzo Scheffer
Hi, can anybody be so kind to tell me if it is possible to search a Term by its position? I search a term (for example soccer) and get back the DocId's and positions as follows: TermPositions termPos = reader.termPositions(new Term("contents", "soccer")); while(termPos.next()){ int

Re: Changing Similarity on existing index

2006-10-02 Thread Chris Hostetter
: Initially, I had anticipated that doing this would update the : Similarity as part of the add process. But after running some tests, : this does not appear to be the case. fieldNorms are computed when the document is added to the index ... merging indexes doesn't affect them. : Is there

Re: get terms by positions

2006-10-02 Thread Doron Cohen
You can store TermVectors with position info, but I don't think this would be enough for what you are asking, because it is not meant for direct access to a term by its position, and because TermVectors store tokens, i.e. the indexed form of the word, which I am not sure is what you need. It

A question about query syntax, has it changed?

2006-10-02 Thread Bill Taylor
I am indexing individual pages of books. I get no results from the query accurate AND book:first title Each lucene document which represents one page of one book gets a field book which is indexed, stored, and not tokenized to store the title of the book. The word accurate appears on page

Re: A question about query syntax, has it changed?

2006-10-02 Thread Doron Cohen
The problem stems from using the query parser for searching a non-tokenized field (book). You can either create a term query for searching in that field, like this: new TermQuery(new Term("book", "first title")); Or tokenize the field book and keep using QueryParser. Decision is based on how you

[Lucene 2.0]How to recover index?

2006-10-02 Thread zhu jiang
Hi all, In some situations, reading the index files throws a "read past EOF" exception, so the index cannot be used any more. How can the index files be recovered in such a situation? -- Thanks, Jiang

Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-02 Thread Scott
Hi, I have a question about ParallelMultiSearcher performance. I want to search documents on about 10 gigabytes of index. (The index has 10,000,000 documents.) I get very slow performance using IndexSearcher with ONE index normally. Then I tried to use ParallelMultiSearcher with 10 servers of

Multi-threaded IndexWriter

2006-10-02 Thread Antony Bowesman
Hi, I have a multi-threaded indexing application that indexes documents into a set of Lucene index databases (I have millions of documents to index, hence the split DB). When a thread gets an index request, it determines the index DB to index the data in. It grabs the IndexWriter for that