Using multiple analysers within a query

2004-11-21 Thread Kauler, Leto S
Hi Lucene list, We have the need for analysed and 'not analysed/not tokenised' clauses within one query. Imagine an unparsed query like: +title:Hello World +path:Resources\Live\1 In the above example we would want the first clause to use StandardAnalyzer and the second to use an analyser which

RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Hi again, Thanks for everyone who replied. The PerFieldAnalyzerWrapper was a good suggestion, and one I had overlooked, but for our particular requirements it wouldn't quite work so I went with overriding getFieldQuery(). You were right, Paul. In 1.4.2 a whole heap of QueryParser changes were
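For reference, the PerFieldAnalyzerWrapper approach suggested in this thread can be sketched as below. This is a sketch against the Lucene 1.4-era API (class names and the static QueryParser.parse signature from that release, not verified against a build); the field names come from the original question.

```java
// Sketch, Lucene 1.4-era API (unverified): PerFieldAnalyzerWrapper applies a
// default analyzer to every field except the ones registered explicitly.
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class PerFieldExample {
    public static void main(String[] args) throws Exception {
        // StandardAnalyzer for most fields; KeywordAnalyzer (no tokenising,
        // no lowercasing) for the literal "path" field.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("path", new KeywordAnalyzer());

        Query q = QueryParser.parse(
            "+title:\"Hello World\" +path:\"Resources\\Live\\1\"",
            "title", analyzer);
        System.out.println(q.toString("title"));
    }
}
```

The alternative the poster actually chose, overriding QueryParser.getFieldQuery() per field, achieves the same effect when a single wrapper analyzer does not fit the requirements.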

RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Actually, just realised a PhraseQuery is incorrect... I only want a single TermQuery but it just needs to be quoted, d'oh. -Original Message- Then I found that because that analyser always returns a single token (TermQuery) it would send through spaces into the final query string,

Exception: cannot determine sort type

2004-12-22 Thread Kauler, Leto S
We have been implementing Lucene as the datasource for our website--Lucene is exposed through a java web service which our ASP pages query and process. So far things have been going very well and in general tests everything has been fine. Interestingly though, under a small server stress test

RE: Exception: cannot determine sort type

2004-12-23 Thread Kauler, Leto S
Thanks for the replies! It would seem best for us to move to specifying the sort type--good practice anyway and prevents possible field problems. I plan to run the stress testing again today but turning off the sorting (just using default SCORE) and see how that goes. Seasons greetings to you
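Specifying the sort type explicitly, as suggested here, can be sketched as below (Lucene 1.4-era API, unverified; the field name "title" is hypothetical). Passing an explicit SortField type stops Lucene from inspecting the field's first term to guess between INT and STRING, which is where "cannot determine sort type" comes from.

```java
// Sketch, Lucene 1.4-era API (unverified): sort with an explicit type
// instead of relying on auto-detection.
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class ExplicitSortExample {
    static Hits search(IndexSearcher searcher, Query query) throws Exception {
        // SortField.STRING declares the type up front; no term inspection.
        Sort sort = new Sort(new SortField("title", SortField.STRING));
        return searcher.search(query, sort);
    }
}
```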

Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing? In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down
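The transient growth asked about here is expected: optimize() merges every segment into one new segment and only deletes the old files afterwards, so the index can briefly need roughly two to three times its final size on disk. A minimal sketch (Lucene 1.4-era API, unverified):

```java
// Sketch, Lucene 1.4-era API (unverified): optimizing an existing index.
// Peak disk usage occurs during optimize(), while old and new segment
// files coexist; the old files are removed once the writer closes.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class OptimizeExample {
    public static void optimise(String indexDir) throws Exception {
        // create=false: open the existing index rather than overwrite it
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.optimize(); // merges all segments into one
        writer.close();
    }
}
```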

RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Otis --- Kauler, Leto S [EMAIL PROTECTED] wrote: Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing? In our case the optimise grinds the disk, expanding

RE: which HTML parser is better?

2005-02-02 Thread Kauler, Leto S
We index the content from HTML files and because we only want the good text and do not care about the structure, well-formedness, etc we went with regular expressions similar to what Luke Shannon offered. Only real difference being that we firstly remove entire blocks of (script|style|csimport)
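The regex approach described above, remove whole script/style/csimport blocks first, then strip the remaining tags, can be sketched in plain Java as follows. The helper class and method names are hypothetical; only the general technique comes from the thread.

```java
import java.util.regex.Pattern;

public class HtmlStripper {
    // First pass: remove entire element blocks whose content is never
    // wanted as text (script, style, and the csimport tag mentioned in
    // the thread). The backreference \1 ensures matching close tags.
    private static final Pattern BLOCKS = Pattern.compile(
        "(?is)<(script|style|csimport)\\b[^>]*>.*?</\\1\\s*>");
    // Second pass: strip any remaining markup tags.
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]+>");

    public static String strip(String html) {
        String noBlocks = BLOCKS.matcher(html).replaceAll(" ");
        String text = TAGS.matcher(noBlocks).replaceAll(" ");
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style>"
            + "<script>var x=1;</script></head>"
            + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(strip(html)); // prints "Hello world"
    }
}
```

Note this is a pragmatic text extractor, not a parser: it assumes reasonably well-formed tags, which is the trade-off the thread is about.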

RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
First thing that jumps out is case-sensitivity. Does your olFaithFull field contain stillHere or stillhere? --Leto -Original Message- From: Luke Shannon [mailto:[EMAIL PROTECTED] This works: query1 = QueryParser.parse(jpg, kcfileupload, new StandardAnalyzer()); query2 =
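The case-sensitivity point raised here is that StandardAnalyzer lowercases tokens, so a parsed query term ends up as "stillhere" while an untokenized field stores "stillHere" verbatim, and the two never match. A minimal plain-Java illustration of the mismatch (no Lucene required):

```java
public class CaseSensitivityDemo {
    public static void main(String[] args) {
        String storedVerbatim = "stillHere";          // untokenized field keeps case
        String queryTerm = "stillHere".toLowerCase(); // analyzer lowercases -> "stillhere"

        // Exact term matching is case-sensitive, so these never match:
        System.out.println(storedVerbatim.equals(queryTerm));           // prints "false"
        System.out.println(storedVerbatim.equalsIgnoreCase(queryTerm)); // prints "true"
    }
}
```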

RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
stillHere: capital H. - Original Message - From: Kauler, Leto S [EMAIL PROTECTED] To: Lucene Users List lucene-user@jakarta.apache.org Sent: Thursday, February 03, 2005 6:40 PM Subject: RE: Parsing The Query: Every

Follow-up to sorting tokenised field

2005-02-09 Thread Kauler, Leto S
Have been reading this thread http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg11180.html. Praveen Peddi (or anyone else), did you ever try the patch? I would be interested to know what sort of performance difference it makes. I have been trying to create a most-simple solution

Tokenised and non-tokenised terms in one field

2005-02-09 Thread Kauler, Leto S
Hi all, Seeking some best practice advice, or even if there is an alternative solution. Sorry for the email length, just trying to explain succinctly. Currently we add fields to our index like this (for reference, Field booleans are STORE, INDEX, TOKENISE): doc.add(new Field(field, value,
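One common answer to the question this thread opens with is to index the same value twice: once tokenised for full-text search, once untokenised for exact matching and sorting. A sketch using the Lucene 1.4-era Field constructor referenced in the message (store, index, tokenise booleans); the field names are hypothetical and the API is not verified against a build:

```java
// Sketch, Lucene 1.4-era API (unverified): same value in two fields,
// one analysed, one kept as a single exact term.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class DualFieldExample {
    static Document build(String value) {
        Document doc = new Document();
        // store=true, index=true, tokenise=true -> analysed full-text field
        doc.add(new Field("title", value, true, true, true));
        // store=false, index=true, tokenise=false -> single exact term,
        // usable for exact match and sorting
        doc.add(new Field("title_exact", value, false, true, false));
        return doc;
    }
}
```

The cost is index size; the benefit is that queries can pick whichever behaviour they need per clause.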