[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922894#action_12922894
 ] 

Michael McCandless commented on LUCENE-2655:


bq. You're saying record a list of segments that existed at the time of 
flushing a DWPT's deletes?

Actually, I think it's simpler!  I think the DWPT just records the index of the 
last segment in the index, as of when it is created (or re-inited after it's 
been flushed).

On flush of a given DWPT, its buffered deletes are recorded against that 
segment, and still carry over the lastSegmentIndex.  This way, when we finally 
do resolve these deletes to docIDs, we 1) always apply the delete if segment = 
lastSegmentIndex, or 2) the doc is in that segment and is = the docID upto.  I 
think this'd mean we can keep the docid-upto as local docIDs, which is nice (no 
globalizing/shifting-on-merge/flush needed).

So, segments flushed in the current IW session will carry this private pool of 
pending deletes.  But, pre-existing segments in the index don't need their own 
pool.  Instead, when it's time to resolve the buffered deletes against them 
(because they are about to be merged), they must walk all of the per-segment 
pools, resolving the deletes from that pool if its segment index is = the 
lastSegmentIndex of that pool.  We should take care to efficiently handle dup'd 
terms, ie where the same del term is present in multiple pools.  The most 
recent one wins, and we should do only one delete (per segment) for that term.

These per-segment delete pools must be updated on merge.  EG if the 
lastSegmentIndex of a pool gets merged, that's fine, but then on merge commit 
we must move that lastSegmentIndex backwards to the last segment before the 
merge, because any deletes necessary within the segment will have been resolved 
already.

When segments with del pools are merged, we obviously apply the deletes to the 
segments being merged, but, then, we have to coalesce those pools and move them 
into a single pool on the segment just before the merge.  We could actually use 
a Set at this point since there is no more docid-upto for this pool (ie, it 
applies to all docs on that segment and in segments prior to it).

So I think this is much simpler than I first thought!

bq. Lets get that data structure mapped out to start on LUCENE-2680?

+1

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923189#action_12923189
 ] 

Michael McCandless commented on LUCENE-2655:


bq. Can you explain what you mean by per-segment pools compared with the 
flushed deletes data structure in there today, ie, why are they pools?

Sorry, by pool I mean what we have today.  Ie a map from del term/query - 
global docid-upto.

Except, with this change it will be per-segment, only for those segments 
flushed since IW was opened or since last commit.

bq. I can start on the simplified version of this for the non-DWPT use case, 
eg, LUCENE-2680.

Yes, please!

This approach works fine for the single-DWPT case (ie what we now have).  It'll 
be good to get it first working on this simpler case, then port to the RT 
branch to get it working for multiple DWPTs.

bq. That version'll implement the delete pool per segment only for those 
segments created after the last flush/commit?

Created after IW was opened or after last commit, yes.

I think we'll need an applyAllDeletes (called by commit) vs an 
applyDeletesToSegment (called by merge).

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920665#action_12920665
 ] 

Michael McCandless commented on LUCENE-2655:


Just to confirm: when a delete is done, we go and buffer that delete into each 
DWPT, right?  (Mapped to the then-current docid-upto for that DWPT).

OK I see the problem.  It's because we now have a single pool for 
deletesFlushed, right?  Ie, DWPT #2 will overwrite the term id:1 entry.

But, I think the switch to generations of pending deletes (LUCENE-2680) would 
fix this?  Maybe we should go do that one first...

Ie, DWPT #1's flush would enter a new gen delete pool (maps term - global 
docid-upto).  Then DWPT #2's flush would also enter a new gen delete pool.  
Hmm, but not quite... the generations can't simply stack on top of one another. 
 I think there's a graph structure somehow?  Ie every DWPT that's flushed must 
record the segments that existed (were already flushed) when it was first 
created, because it's only those segments that should get the deleted term.  
Segments in the future obviously shouldn't get it.  And segments from parallel 
DWPTs (ie that existed at the same time) should also not get it since they will 
separately track the deleted term.

BTW, I think this makes LUCENE-2679 all the more important (the ability to 
delete such that delete will only apply to already-committed segments), since 
this'd mean we only store the pending delete in a single map instead of map per 
DWPT.


 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917173#action_12917173
 ] 

Michael McCandless commented on LUCENE-2655:


bq. Ok, I have been stuck/excited about not having to use/understand the 
remap-docids method, because it's hard to debug. However I see what you're 
saying, and why remap-docids exists. I'll push the DWP buffered deletes to the 
flushed deletes.

I think we still must remap, at least on the pushed (deletesFlushed) deletes?

On the buffered deletes for the DWPT (deletesInRAM), I think we can make these 
relative to the DWPT (ie start from 0), but on pushing them into flushed 
deletes we re-base them?

bq. This large cost is from loading the terms index and deleted docs?

Yes.  We don't (hopefully) load norms, field cache, etc.

bq. When those large segments are merged though, the IO cost is so substantial 
that loading tii or del into RAM probably doesn't account for much of the 
aggregate IO, they're probably in the noise?

Well, the applyDeletes method is sync'd, vs merging which is fully concurrent.  
(Also, merging doesn't load the tii).

bq. Or are you referring to the NRT apply deletes flush, however that is on a 
presumably pooled reader?

Right, it would be pooled for the NRT case, so this is only a (sizable) perf 
problem for the non-nrt case.

bq. Or you're just saying that today we're applying deletes across the board to 
all segments prior to a merge, regardless of whether or not they're even 
involved in the merge? It seems like that is changeable?

Right!  That's what we do today (apply deletes to all segs) whereas it's really 
only necessary to apply them to the segments being merged.  I opened 
LUCENE-2680 to track this.

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916851#action_12916851
 ] 

Michael McCandless commented on LUCENE-2655:


Stepping back here...

There are two different goals mixed into the RT branch effort, I
think:

# Make thread states fully independent so flushing is no longer sync'd (plus 
it's a nice simplification, eg no more *PerThread in the indexing chain)
# Enable direct searching on the thread states RAM buffer, for awesome
  NRT performance

It seems to me like the first one is not so far off?  Ie we nearly
have it already (LUCENE-2324)... it's just that we don't have the
deletes working?

Whereas the 2nd one is a much bigger change, and still iterating under
LUCENE-2475.

Is it possible to decouple #1 from #2?  Ie, bring it to a committable
state and land it on trunk and let it bake some?

Eg, on deletes, what if we simply have each thread state buffer its own
delete term - thread's docID, privately?  We know this approach
will work (it does today), right?  It's just wasteful of RAM (though,
cutover to BytesRefHash should help alot here), and, makes
deletes somewhat slower since you must now enroll the del term
into each thread state

It wouldn't actually be that wasteful of RAM, since the BytesRef instance
would be shared across all the maps.  Also, if we wanted (separately)
we could make a more efficient buffer when the app deletes many terms
at once, or, many calls to delete-by-single-term with no adds in between
(much like how Amazon optimizes N one-click purchases in a row...).

I really want to re-run my 6-thread indexing test on beast and see the
indexing throughput double!!

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916933#action_12916933
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

Are you saying we should simply make deletes work (as is, no BytesRefHash 
conversion) then cleanup the RT branch as a merge to trunk of the DWPT changes? 
 I was thinking along those lines.  I can spend time making the rest of the 
unit tests work on the existing RT revision, though should this should probably 
happen in conjunction with a merge from trunk.  

Or simply make the tests pass, and merge RT - trunk afterwards?

Also, from what I've seen, deletes seem to work, I'm not sure what exactly 
Michael is referring to.  I'll run the full 'suite' of unit tests, and just 
make each work?

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916946#action_12916946
 ] 

Michael McCandless commented on LUCENE-2655:


bq.  Are you saying we should simply make deletes work (as is, no BytesRefHash 
conversion) then cleanup the RT branch as a merge to trunk of the DWPT changes?

Yes!  Just keep using the Map we now use, but now it must be per-DWPT.  And 
when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its 
current docID, and enroll Term/Query - docID into that DWPT's pending deletes 
map.

bq. Also, from what I've seen, deletes seem to work, I'm not sure what exactly 
Michael is referring to.

But this is using the long sequenceID right?  (adds 8 bytes per buffered 
docID).  I think we wanted to explore ways to reduce that?  Or, if we can make 
it modal (you only spend these 8 bytes if IW is open in NRT mode), then that's 
OK?

I was hoping to cleanly separate the DWPT cutover (solves the nasty flush 
bottleneck) from the NRT improvements.

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916979#action_12916979
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

{quote}when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab 
its current docID, and enroll Term/Query - docID into that DWPT's pending 
deletes map.{quote}

Ok, that's the change you're referring to. In the current RT revision, the 
deletes are held in one map in DW, guess we need to change that. However if we 
do, why do we need to keep the seq id or docid as the value in the map? When 
the delete arrives into the DWPT, we know that any buffered docs with that 
term/query need to be deleted on flush? (ie, lets *not* worry about the RT 
search use case, yet). ie2, we can simply add the terms/queries to a set, and 
apply them on flush, ala LUCENE-2679?

bq. NRT improvements

We're referring to LUCENE-1516 as NRT and LUCENE-2312 as 'RT'. I'm guessing you 
mean RT? 

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917004#action_12917004
 ] 

Michael McCandless commented on LUCENE-2655:


bq. I'm guessing you mean RT?

Duh sorry yes... I'll try to use the right name :)

bq. In the current RT revision, the deletes are held in one map in DW, guess we 
need to change that.

Right, one map that maps the del term to the long sequence ID right?  I'm 
thinking we revert just this part, and go back to how trunk how buffers deletes 
(maps to docID), but per-DWPT.

bq. However if we do, why do we need to keep the seq id or docid as the value 
in the map? When the delete arrives into the DWPT, we know that any buffered 
docs with that term/query need to be deleted on flush?  (ie, lets not worry 
about the RT search use case, yet). ie2, we can simply add the terms/queries to 
a set, and apply them on flush, ala LUCENE-2679?

For the interleaved case.  Ie, LUCENE-2679 is optional -- we still must handle 
the interleaved case correctly (and, I think, by default).  But if an app uses 
the opto in LUCENE-2679 then we only need a single Set.

Basically, the DWPT change, alone, is hugely valuable and (I think) 
decouple-able from the trickier/riskier/RAM-consuminger RT changes.

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917046#action_12917046
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

We've implied an additional change to the way deletes are flushed in that 
today, they're flushed in applyDeletes when segments are merged, however with 
flush-DWPT we're applying deletes after flushing the DWPT segment.

Also we'll have a globalesque buffered deletes presumably located in IW that 
buffers deletes for the existing segments, and these should [as today] be 
applied only when segments are merged or getreader is called?

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917055#action_12917055
 ] 

Michael McCandless commented on LUCENE-2655:


bq. We've implied an additional change to the way deletes are flushed in that 
today, they're flushed in applyDeletes when segments are merged, however with 
flush-DWPT we're applying deletes after flushing the DWPT segment.

Why is that change needed again?  Deferring until merge kickoff is a win, eg 
for a non N/R/T (heh) app, it means 1/10th the reader open cost (w/ default 
mergeFactor=10)?  Opening/closing readers can be costly for a large index.

Really, some day, we ought to only apply deletes to those segments about to be 
merged (and keep the buffer for the rest of the segments). Eg most merges are 
small... yet we'll pay huge cost opening that massive grandaddy segment every 
time these small merges kick off.  But that's another issue...

Why can't we just do what we do today?  Ie push the DWPT buffered deletes into 
the flushed deletes?

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917076#action_12917076
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

Ok, I have been stuck/excited about not having to use/understand the 
remap-docids method, because it's hard to debug.  However I see what you're 
saying, and why remap-docids exists.  I'll push the DWP buffered deletes to the 
flushed deletes.  

bq. we'll pay huge cost opening that massive grandaddy segment 

This large cost is from loading the terms index and deleted docs?   When those 
large segments are merged though, the IO cost is so substantial that loading 
tii or del into RAM probably doesn't account for much of the aggregate IO, 
they're probably in the noise?  Or are you referring to the NRT apply deletes 
flush, however that is on a presumably pooled reader?  Or you're just saying 
that today we're applying deletes across the board to all segments prior to a 
merge, regardless of whether or not they're even involved in the merge?  It 
seems like that is changeable?

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916366#action_12916366
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

Maybe I was using the new flex API wrongly, when I pass in the deleted docs to 
MultiFields.getTermDocsEnum, the test case passes.

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916367#action_12916367
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

I'm only seeing this test from TestIndexWriterDelete fail:

[junit] junit.framework.AssertionFailedError: expected:3 but was:0
[junit] at 
org.apache.lucene.index.TestIndexWriterDelete.testMaxBufferedDeletes(TestIndexWriterDelete.java:118)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)


 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916370#action_12916370
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

There's this one as well, which I'll focus on, though it's an error in 
IR.isCurrent, it doesn't immediately appear to be related to deletes.
{code}
[junit] Testsuite: org.apache.lucene.index.TestIndexWriterReader
[junit] Testcase: 
testUpdateDocument(org.apache.lucene.index.TestIndexWriterReader):FAILED
[junit] null
[junit] junit.framework.AssertionFailedError: null
[junit] at 
org.apache.lucene.index.TestIndexWriterReader.testUpdateDocument(TestIndexWriterReader.java:104)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)
{code}

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2655.patch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912315#action_12912315
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

Here's a relevant comment from LUCENE-2324:

https://issues.apache.org/jira/browse/LUCENE-2324?focusedCommentId=12891256page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891256

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912319#action_12912319
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

{quote}Maybe we could reuse (factor out) TermsHashPerField's
custom hash here, for the buffered Terms? It efficiently maps a
BytesRef -- int.{quote}

I'm trying to get a feel for what we kind of deletes we want
working in the flush by DWPT merge viz-a-viz the realtime branch
(ie, the release where we have the new realtime search/indexing
functionality). 

Factoring out BytesHash and storing the terms in a byte block
pool will allow replacing the current hash map of terms and
likely conserve more RAM. Will we need to replace doc id upto
and instead use sequence ids? 

We can additionally, for now, implement flushing deletes on
merges, like today, for the flush by DWPT merge to trunk. Then
implement live, aka foreground deletes, for the realtime search
branch merge to trunk. 

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-09-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912339#action_12912339
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

A couple more things... 

The BytesHash or some other aptly named class can implement the
quick sort of terms, again from TermsHashField, which will
replace the sorted terms map used in the RT branch for deletes.
The RT branch isn't yet calculating the RAM usage of deletes. By
using the byte block pool, calculating the RAM usage will be
trivial (as the the BBP automatically records the num bytes used). 

The RT branch has an implementation of delete using the min/max
sequence ids for a given segment. What else is needed? 

 Get deletes working in the realtime branch
 --

 Key: LUCENE-2655
 URL: https://issues.apache.org/jira/browse/LUCENE-2655
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch


 Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org