Confusion over wildcard search logic

2003-09-22 Thread Dan Quaroni
Hi there. I've got an index of company names, and it's split up into separate indexes by state. I have a simple command line interface for testing. I'm getting some odd results, though, with certain logic of wildcard searches. It seems like depending on what order I put the fields of the query

Re: Is it possible in lucene for numeric search

2003-09-22 Thread Terry Steichen
You can also use a RangeQuery. If you index the field of numeric data, say 'score', as a string, then you can do things like: score:[75 TO 80]. The only extra work is that you need to pad the actual score with enough leading zeros (so that 9 becomes 09, etc.) to cover the expected range. Regards, Terry --
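
A minimal sketch of why the padding matters, assuming the range is compared as plain strings (lexicographically), as Terry describes. This is ordinary Python rather than Lucene code; `pad_score` and `lexicographic_range` are hypothetical helper names used only for illustration, and the width of 2 matches the "9 becomes 09" example.

```python
def pad_score(value: int, width: int = 2) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative values need a different encoding")
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text.zfill(width)


def lexicographic_range(terms, lower, upper):
    """Return the terms that fall in [lower, upper] under string comparison."""
    return [t for t in terms if lower <= t <= upper]


scores = [9, 10, 75, 80]

# Unpadded: "9" sorts after "80", so the string range ["9", "80"] is empty.
matches_unpadded = lexicographic_range([str(s) for s in scores], "9", "80")

# Padded to a fixed width: string order now agrees with numeric order.
padded_terms = [pad_score(s) for s in scores]
matches_padded = [
    int(t) for t in lexicographic_range(padded_terms, pad_score(9), pad_score(80))
]
```

The same padding has to be applied both when the field is indexed and when the range bounds are written in the query, otherwise the two sides will not compare consistently.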

Re: HTML Parsing problems...

2003-09-22 Thread Michael Giles
Yeah, I was using HTMLParser for a few days until I tried to parse a 400K document and it spun at 100% CPU for a very long time. It is tolerant of bad HTML, but does not appear to scale. TagSoup processed the same document in a second or less at <25% CPU. -Mike At 02:42 PM 9/22/2003 +0200, y

Re: per-field Analyzer (was Re: some requests)

2003-09-22 Thread hui
Good work, Erik. Hui - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Saturday, September 20, 2003 4:13 AM Subject: per-field Analyzer (was Re: some requests) > On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrot

Re: Is it possible in lucene for numeric search

2003-09-22 Thread Erik Hatcher
Yes, you can do numeric searches as long as you realize it's really just text that is indexed. You will need to ensure the Analyzer you use indexes numbers appropriately as well. Erik On Monday, September 22, 2003, at 02:06 AM, Senthil Kumar K wrote: Hi, I found that lucene is a full-feat

Re: HTML Parsing problems...

2003-09-22 Thread Andrzej Bialecki
Michael Giles wrote: Erik, Probably a good idea to swap something else in, although Neko introduces a dependency on Xerces. I didn't play with Neko because I am currently using a different XML parser and didn't want to deal with the conflicts (and also find dependencies on specific parsers ann

Distributed Indexing

2003-09-22 Thread Albert Vila Puig
Hi, I have to develop a distributed search engine for my company. I’m very interested in the Lucene index format, and I want to use it. The main problem is how to distribute the index across the different machines. The solution is not just to copy the index, because I have to manage 50 GB of data. I