Re: sort by field length

2010-08-27 Thread Lance Norskog
You might be better off starting with the Lucene CheckIndex program. It walks all of the Lucene index data structures. I have done forensics by fiddling with the CheckIndex code.
On Thu, Aug 26, 2010 at 9:11 AM, Shawn Heisey s...@elyograg.org wrote: On 5/24/2010 6:30 AM, Sascha Szott wrote:

Re: sort by field length

2010-08-26 Thread Shawn Heisey
On 5/24/2010 6:30 AM, Sascha Szott wrote: Hi folks, is it possible to sort by field length without having to (redundantly) save the length information in a separate index field? At first, I thought to accomplish this using a function query, but I couldn't find an appropriate one. I have

Re: sort by field length

2010-05-26 Thread Sascha Szott
Hi Erick, Erick Erickson wrote: Ah, I may have misunderstood, I somehow got it in my mind you were talking about the length of each term (as in string length). But if you're looking at the field length as the count of terms, that's another question, sorry for the confusion... I have to ask,

Re: sort by field length

2010-05-26 Thread Erick Erickson
Take a look at the scoring algorithm on the Wiki; it already takes this into account, albeit modified by how many times the term is mentioned in the field. So a field with 5 terms and one match will score higher than one with 10 terms and one match. Where it lands with 10 terms and 2 matches I
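
For illustration only, here is a rough sketch of how field length and term frequency interact, assuming the classic Lucene TF-IDF similarity of that era, where the term-frequency factor is sqrt(freq) and the length norm is 1/sqrt(number of terms in the field). The function names are made up for this example. The idf, coord and query-norm factors are left out, and so is the lossy single-byte encoding Lucene applies to stored norms, so real scores will differ.

```python
import math


def tf(freq):
    # Term-frequency factor, assuming the classic default: sqrt(freq).
    return math.sqrt(freq)


def length_norm(num_terms):
    # Length normalization, assuming the classic default: 1 / sqrt(numTerms).
    return 1.0 / math.sqrt(num_terms)


def field_weight(freq, num_terms):
    # Hypothetical helper: only the part of the score that depends on
    # how often the term matches and how long the field is.
    return tf(freq) * length_norm(num_terms)


if __name__ == "__main__":
    # (matches, terms in field) for the cases mentioned in the thread.
    for freq, num_terms in [(1, 5), (1, 10), (2, 10)]:
        weight = field_weight(freq, num_terms)
        print(f"{freq} match(es) in {num_terms} terms: {weight:.3f}")
```

Under these simplified assumptions, one match in 5 terms outweighs one match in 10 terms, and two matches in 10 terms come out about level with one match in 5.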

Re: sort by field length

2010-05-25 Thread Sascha Szott
Hi Erick, Erick Erickson wrote: Are you sure you want to recompute the length when sorting? It's the classic time/space tradeoff, but I'd suggest that when your index is big enough to make taking up some more space a problem, it's far too big to spend the cycles calculating each term length for

Re: sort by field length

2010-05-25 Thread Erick Erickson
Ah, I may have misunderstood, I somehow got it in my mind you were talking about the length of each term (as in string length). But if you're looking at the field length as the count of terms, that's another question, sorry for the confusion... I have to ask, though, why you want to sort this

Re: sort by field length

2010-05-24 Thread Erick Erickson
Are you sure you want to recompute the length when sorting? It's the classic time/space tradeoff, but I'd suggest that when your index is big enough to make taking up some more space a problem, it's far too big to spend the cycles calculating each term length for sorting purposes considering you
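
A minimal sketch of the "spend space, save time" side of that tradeoff: compute the field length once when the document is prepared for indexing and keep it in its own field. Everything here is an assumption for illustration: the field names `title` and `title_length` are hypothetical, documents are plain dictionaries, and length is counted as whitespace-separated tokens, which will not match what a real analyzer produces.

```python
def with_length_field(doc, source_field, length_field):
    # Return a copy of doc with the token count of source_field stored
    # in length_field. Whitespace splitting stands in for a real analyzer.
    tokens = doc.get(source_field, "").split()
    enriched = dict(doc)
    enriched[length_field] = len(tokens)
    return enriched


if __name__ == "__main__":
    docs = [
        {"id": "1", "title": "sort by field length"},
        {"id": "2", "title": "sorting"},
        {"id": "3", "title": "time versus space"},
    ]
    enriched = [with_length_field(d, "title", "title_length") for d in docs]

    # Sorting now only reads the precomputed number.
    for d in sorted(enriched, key=lambda d: d["title_length"]):
        print(d["id"], d["title_length"], d["title"])
```

With a field like this in the index, a Solr request could sort on it directly, for example `sort=title_length asc`, instead of recomputing lengths for every query.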