Re: Lucene demo ideas?

2003-09-17 Thread Ben Litchfield
- Index text and HTML files. Any others? What, no PDF files!! Ben -- http://www.pdfbox.org

RE: Lucene demo ideas?

2003-09-17 Thread Killeen, Tom
I would suggest XML as well. Tom -Original Message- From: Ben Litchfield Sent: Wednesday, September 17, 2003 7:42 AM To: Lucene Users List Subject: Re: Lucene demo ideas? - Index text and HTML files. Any others? What, no PDF files!! Ben --

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
Please keep the discussions on the lucene-user e-mail list. Of course the source code will be available... What is there is already in Lucene's CVS, and I will just revamp it and commit it. And when we make Lucene releases it will be bundled and made available as a single download

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote: I would suggest XML as well. Again, I'd like to hear more about how you'd do this generically. Tell me what the field names and values would correspond to when presented with an XML file. Erik

Re: Lucene demo ideas?

2003-09-17 Thread Pete Lewis
Might want two demos, one for Unix environments and one for Windows. Most users will want a fast start that they can copy and adapt. So quick targets would be: filesystems - html / text / pdf / office documents for windows. xml - fairly simple example maybe against news items. database - again

Re: Lucene demo ideas?

2003-09-17 Thread Bryan LaPlante
I would like to see the taglib for searching the index in the demo. There is an html form page and result page already built for the taglib that allows you to change search params and demonstrates a fair amount of the search capability of Lucene. - Original Message - From: Erik Hatcher

RE: Lucene demo ideas?

2003-09-17 Thread Pitre, Russell
I know this may be far-fetched, but how about being able to index .jsp's? I know this is a spindle thing, but it seems a lot of people need this functionality. My suggestion. Russ -Original Message- From: Erik Hatcher Sent: Wednesday, September 17, 2003

slow performance with Date Range Searching

2003-09-17 Thread Killeen, Tom
Hello all, I have recently indexed approx 15.8 million XML documents in which I index the contents of certain elements (titles, states, dates to name a few). I have 27 separate indices and use a MultiSearcher to search these indices. When I search on the title and state fields with multiple

RE: Lucene demo ideas?

2003-09-17 Thread Robert Koberg
Hi, Here are a couple of ideas for XML demos: 1. Simply index the content into one 'content' field. Don't worry about attributes. 2. Index a linked Dublin Core metadata file: <link rel="meta" href="index.rdf" /> And add fields for every element after rdf:Description. Best, -Rob -Original

RE: slow performance with Date Range Searching

2003-09-17 Thread Dan Quaroni
I don't know how Lucene handles date ranges, but I was having very poor results using booleans between different fields because of the way Lucene handles them. Lucene evaluates each field in the query separately and retrieves all of the results, then it evaluates the boolean joins

wildcard search and german umlauts

2003-09-17 Thread Hackl, Rene
Hi All, has someone ever written an extension of QueryParser providing the possibility to let wildcard search terms be run through an analyzer (as suggested by Tatu Saloranta a while ago)? I want to reduce German umlauts to their base letters (e.g. 'ä' (&auml;) to 'a') and for non-wildcard
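
The folding step asked about above can be sketched outside of Lucene. This is only an illustration, not an extension of QueryParser: the class and method names are hypothetical, and it relies on java.text.Normalizer from current JDKs rather than anything in the Lucene codebase of the time. It decomposes each character and strips the combining marks, leaving wildcard characters untouched.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds umlauts in a raw wildcard term.
public class UmlautFoldingSketch {

    /** Decomposes accented letters (NFD) and removes the combining marks. */
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    /** Lowercases and folds a term; '*' and '?' pass through unchanged. */
    static String normalizeWildcardTerm(String term) {
        return fold(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // \u00e4 is 'a' with umlaut; escapes avoid source-encoding problems
        System.out.println(normalizeWildcardTerm("M\u00e4dchen*")); // prints madchen*
    }
}
```

Note that this approach leaves 'ß' as it is, because that letter has no canonical decomposition; it would need a separate replacement rule. The same folding must be applied at index time, otherwise the folded query term will not match.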

Re: slow performance with Date Range Searching

2003-09-17 Thread Eric Jain
Does anyone have any suggestions for searching date ranges? Our ranges will generally be between a 3 - 7 year period. Apparently Lucene expands ranges to boolean 'or' queries. So if you have a thousand distinct dates within a range, Lucene will build a query with a thousand terms... One
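
To make the cost described above concrete, here is a minimal sketch of range expansion, assuming the index keeps its distinct terms in sorted order. It does not use the Lucene API; the class name, the field name "date" and the sample terms are made up for illustration. The point is that the number of 'or' clauses equals the number of distinct indexed terms inside the range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code: expands a range into OR clauses.
public class RangeExpansionSketch {

    /** Returns every indexed term within [lower, upper], inclusive. */
    static List<String> expand(SortedSet<String> indexedTerms, String lower, String upper) {
        List<String> matches = new ArrayList<>();
        // tailSet(lower) starts at the first term >= lower
        for (String term : indexedTerms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound, stop scanning
            }
            matches.add(term);
        }
        return matches;
    }

    /** Joins the expanded terms into a boolean OR expression on one field. */
    static String toBooleanOr(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "19961231", "19970101", "19990615", "20020404", "20030917"));
        List<String> expanded = expand(terms, "19970101", "20020404");
        System.out.println(expanded.size() + " clauses: " + toBooleanOr("date", expanded));
    }
}
```

With day-level dates, a range of several years can contain well over a thousand distinct terms, so indexing a coarser value (for example year and month) shrinks the expansion accordingly.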

Re: Lucene demo ideas?

2003-09-17 Thread Eric Jain
Does anyone have any suggestions on what they'd like to see in the demo app? Show how Lucene 1) can do incremental indexing, 2) isn't restricted to indexing file system resources and 3) can store and query arbitrary fields. These are in my opinion the features where most other search engines

Re: Lucene demo ideas?

2003-09-17 Thread hui
I think all the attribute values together with element text values should be indexed in the content part. Also, an XML map file could be used to pick out the nodes that need to be indexed separately, so we do not create too many fields by indexing non-critical nodes separately. Simple XPath could be used

Re: Lucene demo ideas?

2003-09-17 Thread Andrzej Bialecki
Erik Hatcher wrote: On Wednesday, September 17, 2003, at 08:42 AM, Ben Litchfield wrote: What, no PDF files!! Haha! http://www.pdfbox.org And I've used pdfbox before - it's cool. And I'm cool with adding PDF and Word indexing to the demo personally, but I didn't want to increase the weight

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 09:21 AM, Pitre, Russell wrote: I know this may be far-fetched, but how about being able to index .jsp's? I know this is a spindle thing, but it seems a lot of people need this functionality. Like I communicated in a previous thread, indexing JSP's just

Re: slow performance with Date Range Searching

2003-09-17 Thread Doug Cutting
Killeen, Tom wrote: My query would look something like this: LongTitle:killeen AND LongTitle:state AND StateDistrict:id AND FiledDate:[1997-01-01 TO 2002-04-04] and it returned in 5.7 seconds. Does anyone have any suggestions for searching date ranges? Our ranges will generally be between a 3 -

Re: Lucene demo ideas?

2003-09-17 Thread Jeff Linwood
Paging would be great for the results. Jeff - Original Message - From: Erik Hatcher Sent: Wednesday, September 17, 2003 7:00 AM Subject: Lucene demo ideas? I'm about to start some refactorings on the web application demo that ships with Lucene

Re: slow performance with Date Range Searching

2003-09-17 Thread Erik Hatcher
And with the latest Lucene codebase in CVS, you could also use a DateFilter wrapped inside a CachingWrapperFilter instead of a QueryFilter. Just wanted to mention what is now available. But I'll reiterate what Doug says... be sure to save off the filter instance so you don't take the
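
The advice to save the filter instance comes down to computing the set of matching documents once per index reader and reusing it. The sketch below shows only that caching idea in plain Java; it is not the source of CachingWrapperFilter, and the class name, the generic reader type and the counter are assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical illustration of per-reader filter caching, not Lucene code.
public class CachingFilterSketch<R> {
    private final Function<R, BitSet> expensiveFilter;
    // Weak keys let an entry disappear once its reader is no longer referenced
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    int computations = 0; // counts how often the wrapped filter actually ran

    CachingFilterSketch(Function<R, BitSet> expensiveFilter) {
        this.expensiveFilter = expensiveFilter;
    }

    /** Returns the matching-document bits, computing them only on first use per reader. */
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            computations++;
            cached = expensiveFilter.apply(reader);
            cache.put(reader, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        Object reader = new Object(); // stands in for an open index reader
        CachingFilterSketch<Object> filter = new CachingFilterSketch<>(r -> {
            BitSet bits = new BitSet();
            bits.set(3); // pretend document 3 falls inside the date range
            return bits;
        });
        filter.bits(reader);
        filter.bits(reader);
        System.out.println("computed " + filter.computations + " time(s)");
    }
}
```

The cache only helps if the same wrapper object is kept between searches; constructing a new one per query starts with an empty cache and pays the full cost every time.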

Re: Lucene Scoring Behavior

2003-09-17 Thread Erik Hatcher
20030917 (that is, today), I get 157 hits, all of which have a score of .23000652. If I use 20030916 (yesterday), I get 197 hits, each of which has a score of .22295427. So far, all seems logical. However, when I search for all records for the date 20030915, the first two (of 174 hits) have

Re: Lucene Scoring Behavior

2003-09-17 Thread Doug Cutting
Steichen wrote: I've run across some puzzling behavior regarding scoring. I have a set of documents which contain, among others, a date field (whose content is a string in the YYYYMMDD format). When I query on the date 20030917 (that is, today), I get 157 hits, all of which have a score

Re: Lucene Scoring Behavior

2003-09-17 Thread Terry Steichen
* super.lengthNorm(fieldName, Math.max(numTerms, 750)); } else { return super.lengthNorm(fieldName, Math.max(numTerms, 750)); } } }
Query #1: pub_date:20030917
All items: Score: .23000652
0.23000652 = weight(pub_date:20030917 in 91197), product of:
0.9994 = queryWeight(pub_date:20030917

Re: Lucene Scoring Behavior

2003-09-17 Thread Doug Cutting
Terry Steichen wrote: 0.03125 = fieldNorm(field=pub_date, doc=90992) 1.0 = fieldNorm(field=pub_date, doc=90970) It looks like the fieldNorms are what differ, not the IDFs. These are the product of the document and/or field boost, and 1/sqrt(numTerms) where numTerms is the number of terms
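
The formula described above can be worked through with a few lines of arithmetic. This is a sketch of the stated product only (boost times 1/sqrt(numTerms)); the class and method names are hypothetical, and it ignores any rounding that happens when norms are stored in the index.

```java
// Hypothetical illustration of the stated formula, not Lucene's Similarity class.
public class FieldNormSketch {

    /** Length normalization as described: 1 / sqrt(numTerms). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Field norm as described: boost multiplied by the length normalization. */
    static double fieldNorm(double boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(fieldNorm(1.0, 1));    // prints 1.0
        System.out.println(fieldNorm(1.0, 1024)); // prints 0.03125
    }
}
```

Under this formula an unboosted single-term field gives 1.0, while 0.03125 corresponds to roughly a thousand terms, so the two documents in the explain output appear to have been normalized with very different term counts.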

Re: Lucene demo ideas?

2003-09-17 Thread Marco Tedone
I would have the code ready if wanted... - Original Message - From: Pitre, Russell To: Lucene Users List Sent: Wednesday, September 17, 2003 2:21 PM Subject: RE: Lucene demo ideas? I know this may be far-fetched, but how about being able to index

Re: Lucene demo ideas?

2003-09-17 Thread Marco Tedone
Yeah, that would be great! - Original Message - From: Jeff Linwood To: Lucene Users List Sent: Wednesday, September 17, 2003 5:15 PM Subject: Re: Lucene demo ideas? Paging would be great for the results. Jeff - Original Message - From:

Re: Lucene Scoring Behavior

2003-09-17 Thread Terry Steichen
Doug, (1) No, I did *not* boost the pub_date field, either in the indexing process or in the query itself. (2) And, each pub_date field of each document (which is in XML format) contains only one instance of the date string. (3) And only the pub_date field itself is indexed. There are other

Re: Lucene demo ideas?

2003-09-17 Thread Tatu Saloranta
On Wednesday 17 September 2003 07:07, Erik Hatcher wrote: On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote: I would suggest XML as well. Again, I'd like to hear more about how you'd do this generically. Tell me what the field names and values would correspond to when

Re: Lucene Scoring Behavior

2003-09-17 Thread Doug Cutting
Hmm. This makes no sense to me. Can you supply a reproducible standalone test case? Doug Terry Steichen wrote: Doug, (1) No, I did *not* boost the pub_date field, either in the indexing process or in the query itself. (2) And, each pub_date field of each document (which is in XML format)