Writing out the term count when merging

2007-03-19 Thread Matt Chaput
Hi all! I'm reimplementing a very Lucene-like search library as a learning experience and I've run into a snag. Before I go deep code diving, I thought I'd ask here in case someone has the time to answer. The term dictionary file includes the term count in a header. But when I'm merging segm

Re: Writing out the term count when merging

2007-03-19 Thread Matt Chaput
robert engels wrote: but a better solution, since you probably need an indexed file into the terms file, you might not even need the term count, since you should read the indexed file into memory anyway (read every 16 entries, etc.) - at which point you will know the number of terms in the file.
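A minimal sketch of the idea Robert describes, under assumptions of my own: terms are stored sorted as length-prefixed UTF-8 records, and the in-memory index keeps every 16th term together with its file offset. The names `write_terms`, `build_sparse_index`, and `find_term` are hypothetical and are not Lucene's API. Here the term count falls out of the scan that builds the index, which is one possible reading of the suggestion.

```python
import bisect
import io
import struct

INTERVAL = 16  # keep every 16th term in memory (assumed sampling interval)


def write_terms(out, sorted_terms):
    """Write each term as a 2-byte big-endian length followed by UTF-8 bytes."""
    for term in sorted_terms:
        data = term.encode("utf-8")
        out.write(struct.pack(">H", len(data)))
        out.write(data)


def read_term(inp):
    """Read one length-prefixed term, or return None at end of file."""
    head = inp.read(2)
    if len(head) < 2:
        return None
    (length,) = struct.unpack(">H", head)
    return inp.read(length).decode("utf-8")


def build_sparse_index(inp, interval=INTERVAL):
    """Scan the terms file once; return (sampled (term, offset) pairs, term count)."""
    index = []
    count = 0
    while True:
        offset = inp.tell()
        term = read_term(inp)
        if term is None:
            break
        if count % interval == 0:
            index.append((term, offset))
        count += 1
    return index, count


def find_term(inp, index, target, interval=INTERVAL):
    """Binary-search the sparse index, then scan at most one interval on disk."""
    keys = [term for term, _ in index]
    i = bisect.bisect_right(keys, target) - 1
    if i < 0:
        return False
    inp.seek(index[i][1])
    for _ in range(interval):
        term = read_term(inp)
        if term is None or term > target:
            return False
        if term == target:
            return True
    return False
```

With this layout no count needs to be stored in a header: it is known as soon as the index has been loaded by scanning.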

How does segment merging work

2007-03-21 Thread Matt Chaput
Aside from the useful exchange I had with Robert, I'd still like to know how Lucene knows what value to write in the "term count" part of the term dictionary header when it's merging segments -- even if I decide to forgo it in my own re-implementation. Of course, I can always just dive into the c

Re: How does segment merging work

2007-03-21 Thread Matt Chaput
robert engels wrote: It seeks back at the end to the location and writes the size. Ah! Sorry I didn't get that. Thanks for your help! Matt
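The seek-back approach Robert describes can be sketched as follows. This is an illustration only, not Lucene's code: the 8-byte header, the length-prefixed term records, and the function names `write_terms_with_count` and `read_term_count` are assumptions made for the example. The point is that the writer reserves space for the count, streams the merged terms without knowing how many there will be, and patches the header once the stream is exhausted.

```python
import io
import struct


def write_terms_with_count(out, terms):
    """Stream terms to a seekable file, then seek back and fill in the count."""
    header_pos = out.tell()
    out.write(struct.pack(">q", 0))  # placeholder for the term count

    count = 0
    for term in terms:  # may be a lazy merge iterator of unknown length
        data = term.encode("utf-8")
        out.write(struct.pack(">H", len(data)))
        out.write(data)
        count += 1

    end_pos = out.tell()
    out.seek(header_pos)
    out.write(struct.pack(">q", count))  # overwrite the placeholder
    out.seek(end_pos)  # leave the file positioned at the end
    return count


def read_term_count(inp):
    """Read the 8-byte big-endian term count from the header."""
    return struct.unpack(">q", inp.read(8))[0]
```

This only works when the output is seekable, which holds for an ordinary file on disk but not for a pipe or a forward-only stream.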

Positions vs. Term Vectors

2007-03-22 Thread Matt Chaput
Hi, another abstract implementation question: Per Term Position (prox) data vs. Per Doc Term Vectors. Belt and Suspenders? Can't Term Vectors effectively (performantly) replace position data for doing phrase matches? Is there another use of position data that term vectors don't satisfy? D
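For context on what per-term position data is used for, here is a minimal sketch of phrase matching over a positional inverted index. The in-memory layout (term to document to list of positions) and the names `build_positional_index` and `phrase_docs` are assumptions for illustration; this does not reflect Lucene's on-disk format or API, and it does not answer whether term vectors could serve the same purpose.

```python
def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for whitespace-tokenized text."""
    postings = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            postings.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return postings


def phrase_docs(postings, phrase):
    """Return sorted doc ids in which the phrase's terms occur consecutively."""
    terms = phrase.split()
    if not terms or any(t not in postings for t in terms):
        return []

    # Candidate documents must contain every term of the phrase.
    docs = set(postings[terms[0]])
    for t in terms[1:]:
        docs &= set(postings[t])

    hits = []
    for doc in sorted(docs):
        # Start positions of the first term that are still viable.
        starts = set(postings[terms[0]][doc])
        for offset, t in enumerate(terms[1:], start=1):
            # The term at phrase offset k must appear at start + k.
            starts &= {p - offset for p in postings[t][doc]}
        if starts:
            hits.append(doc)
    return hits
```

The lookup here is driven by the query terms, touching only the postings of the terms in the phrase.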

[jira] Issue Comment Edited: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

2009-04-27 Thread Matt Chaput (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1270#action_1270 ] Matt Chaput edited comment on LUCENE-1613 at 4/27/09 1:1

[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

2009-04-27 Thread Matt Chaput (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1270#action_1270 ] Matt Chaput commented on LUCENE-1613: Given how fundamental the issue is w.r.t.