[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986630#action_12986630
 ] 

Michael McCandless commented on LUCENE-1574:


I've been testing on a 25M doc index (all of en Wikipedia, at least as of March 
2010).

Yes, I think it's likely that allocating a big BitVector, copying it with 
System.arraycopy, and then destroying it is fairly cheap compared to Lucene 
resolving the deleted term, indexing the doc, flushing the tiny segment, etc.
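
For reference, the allocate-and-copy step under discussion can be sketched as follows. This is only an illustration: DeletedDocsBits is a hypothetical class, not Lucene's BitVector, and it assumes one bit per document. Under that assumption a single 25M-doc segment would need about 3.1 MB per copy; a real index of that size is split into several smaller segments.

```java
// Hypothetical sketch; DeletedDocsBits is not a Lucene class.
final class DeletedDocsBits {
    private final byte[] bits;   // assumption: one bit per document
    private final int maxDoc;

    DeletedDocsBits(int maxDoc) {
        this.maxDoc = maxDoc;
        this.bits = new byte[(maxDoc + 7) >>> 3];
    }

    private DeletedDocsBits(byte[] bits, int maxDoc) {
        this.bits = bits;
        this.maxDoc = maxDoc;
    }

    /** Allocates a fresh array and copies every byte: the cost being weighed. */
    DeletedDocsBits copy() {
        byte[] copy = new byte[bits.length];
        System.arraycopy(bits, 0, copy, 0, bits.length);
        return new DeletedDocsBits(copy, maxDoc);
    }

    void delete(int docId) {
        bits[docId >>> 3] |= 1 << (docId & 7);
    }

    boolean isDeleted(int docId) {
        return (bits[docId >>> 3] & (1 << (docId & 7))) != 0;
    }

    int sizeInBytes() {
        return bits.length;
    }
}
```

Each copy is independent of its source, so a delete applied to the copy is not visible to readers still holding the original.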

 PooledSegmentReader, pools SegmentReader underlying byte arrays
 ---

 Key: LUCENE-1574
 URL: https://issues.apache.org/jira/browse/LUCENE-1574
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-1574.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 PooledSegmentReader pools the underlying byte arrays of deleted docs and 
 norms for realtime search.  It is designed for use with IndexReader.clone 
 which can create many copies of byte arrays, which are of the same length for 
 a given segment.  When pooled they can be reused which could save on memory.  
 Do we want to benchmark the memory usage comparison of PooledSegmentReader vs 
 GC?  Many times GC is enough for these smaller objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-25 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986637#action_12986637
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

bq. I think likely alloc of big BitVector, System.arraycopy, destroying it, may 
be a fairly low cost compared to lucene resolving the deleted term, indexing 
the doc, flushing the tiny segment, etc.

Right, and if we pool the byte[]s we'd take the cost of instantiating and 
GC'ing out of the picture in the high NRT throughput case.  It's 
counterintuitive and will require testing.
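
A minimal sketch of what pooling the byte[]s could look like, assuming all arrays for a segment share one length. ByteArrayPool and its methods are hypothetical names chosen for illustration, not the API of the attached patch.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of a fixed-length byte[] pool; not the patch's API.
final class ByteArrayPool {
    private final int length;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private int allocations;

    ByteArrayPool(int length) {
        this.length = length;
    }

    /** Returns a copy of source, reusing a pooled array when one is free. */
    synchronized byte[] copyOf(byte[] source) {
        if (source.length != length) {
            throw new IllegalArgumentException("wrong length: " + source.length);
        }
        byte[] target = free.poll();
        if (target == null) {
            target = new byte[length];   // only allocate when the pool is empty
            allocations++;
        }
        // The copy overwrites every byte, so stale pooled contents are harmless.
        System.arraycopy(source, 0, target, 0, length);
        return target;
    }

    /** Hands an array back once no reader uses it any more. */
    synchronized void release(byte[] array) {
        if (array.length == length) {
            free.push(array);
        }
    }

    synchronized int allocations() {
        return allocations;
    }
}
```

The copy itself still happens on every reopen; only the allocation and the later garbage collection are avoided, which is why the benefit needs to be measured.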




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985877#action_12985877
 ] 

Michael McCandless commented on LUCENE-1574:


bq. I'm curious how we plan on handling this case?

I think we should keep the replay log smallish, or expire it aggressively with 
age.  I suspect this optimization is only going to be worth it for very 
frequent reopens... but I'm not sure yet.
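
One way to picture a small, aggressively expired replay log is sketched below. It rests on assumptions not stated in the thread: that the log holds deleted docIDs tagged with a monotonically increasing reader generation, and that a stale bit vector is brought up to date by replaying the entries newer than itself. DeleteReplayLog and all of its members are hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical sketch; generations are assumed to be recorded in increasing order.
final class DeleteReplayLog {
    private static final class Entry {
        final long generation;
        final int docId;

        Entry(long generation, int docId) {
            this.generation = generation;
            this.docId = docId;
        }
    }

    private final ArrayDeque<Entry> entries = new ArrayDeque<>();
    private final int maxEntries;
    private long droppedThrough = 0;   // highest generation that lost an entry

    DeleteReplayLog(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void record(long generation, int docId) {
        entries.addLast(new Entry(generation, docId));
        while (entries.size() > maxEntries) {
            dropOldest();              // keep the log smallish
        }
    }

    /** Expire by age: forget everything up to and including the given generation. */
    void expireThrough(long generation) {
        while (!entries.isEmpty() && entries.peekFirst().generation <= generation) {
            dropOldest();
        }
    }

    private void dropOldest() {
        droppedThrough = Math.max(droppedThrough, entries.removeFirst().generation);
    }

    /** Applies newer deletes to bits; false means the caller must do a full copy. */
    boolean replay(byte[] bits, long vectorGeneration) {
        if (vectorGeneration < droppedThrough) {
            return false;
        }
        for (Entry e : entries) {
            if (e.generation > vectorGeneration) {
                bits[e.docId >>> 3] |= 1 << (e.docId & 7);
            }
        }
        return true;
    }
}
```

When replay returns false the vector is too stale for the log, and the fallback is the plain allocate-and-copy path.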




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-24 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985975#action_12985975
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

What size segments is the benchmark deleting against?  Maybe we're 
underestimating the speed of System.arraycopy, e.g., it's really a 
hardware-level operation that can be heavily optimized?
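
A rough way to get a feel for the allocate-plus-arraycopy cost at different segment sizes is sketched below. It is not a rigorous benchmark (one short warm-up, no statistics), ArrayCopyTiming is a hypothetical name, and the one-bit-per-doc sizing is an assumption.

```java
import java.util.Arrays;

// Hypothetical timing sketch; treat the numbers it prints as indicative only.
final class ArrayCopyTiming {
    // Written on every round so the JIT cannot discard the copies as dead code.
    static volatile int sink;

    /** Average nanoseconds for one allocate-and-copy of the given length (> 0). */
    static double averageCopyNanos(int lengthInBytes, int rounds) {
        byte[] source = new byte[lengthInBytes];
        Arrays.fill(source, (byte) 0x55);
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            byte[] copy = new byte[lengthInBytes];
            System.arraycopy(source, 0, copy, 0, lengthInBytes);
            sink = copy[lengthInBytes - 1];
        }
        return (System.nanoTime() - start) / (double) rounds;
    }

    public static void main(String[] args) {
        for (int maxDoc : new int[] {100_000, 1_000_000, 25_000_000}) {
            int bytes = (maxDoc + 7) >>> 3;      // assumption: one bit per doc
            averageCopyNanos(bytes, 200);        // crude warm-up pass
            System.out.printf("maxDoc=%d bytes=%d avg=%.0f ns%n",
                    maxDoc, bytes, averageCopyNanos(bytes, 200));
        }
    }
}
```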




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-23 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985408#action_12985408
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

We want to record the deletes between getReader calls; however, there's no way 
to know ahead of time whether a term or query is going to hit many documents, 
meaning we can't always save the deleted docids, because we'd have too many 
ints in RAM.  I'm curious how we plan on handling this case?
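
One possible answer, sketched here purely as an illustration, is to cap the number of recorded docids and flip to an "overflowed" state once the cap is hit, at which point the caller falls back to a full copy. CappedDeleteBuffer is a hypothetical class, not something from the patch.

```java
import java.util.Arrays;

// Hypothetical sketch of a delete buffer that gives up once it gets too big.
final class CappedDeleteBuffer {
    private final int maxDocIds;
    private int[] docIds = new int[16];
    private int count;
    private boolean overflowed;

    CappedDeleteBuffer(int maxDocIds) {
        this.maxDocIds = maxDocIds;
    }

    void add(int docId) {
        if (overflowed) {
            return;                          // already given up; record nothing
        }
        if (count == maxDocIds) {
            overflowed = true;               // too many ints: free them
            docIds = new int[0];
            count = 0;
            return;
        }
        if (count == docIds.length) {
            docIds = Arrays.copyOf(docIds, Math.min(maxDocIds, count * 2));
        }
        docIds[count++] = docId;
    }

    /** True when the recorded ids are incomplete and a full copy is required. */
    boolean overflowed() {
        return overflowed;
    }

    int[] docIds() {
        if (overflowed) {
            throw new IllegalStateException("buffer overflowed");
        }
        return Arrays.copyOf(docIds, count);
    }
}
```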





[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985091#action_12985091
 ] 

Michael McCandless commented on LUCENE-1574:


I'm working on an initial patch for this...

bq. I think the only open question is how we'll shrink the pool, most likely 
there'd be an expiration on the pooled objects.

I think we can simply have a max number of pooled free bit vectors... or we 
may want to expire by time/staleness as well.
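
A cap on the number of free bit vectors could be as simple as the sketch below, where anything released beyond the limit is left for the GC. BoundedFreeList is a hypothetical name and the class is only an illustration of the idea.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: keep at most maxFree spare arrays, drop the rest.
final class BoundedFreeList {
    private final int maxFree;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    BoundedFreeList(int maxFree) {
        this.maxFree = maxFree;
    }

    /** Returns false when the pool is full and the array is left to the GC. */
    synchronized boolean release(byte[] bits) {
        if (free.size() >= maxFree) {
            return false;
        }
        free.push(bits);
        return true;
    }

    /** Returns a pooled array of the given length, or a new one. */
    synchronized byte[] acquire(int length) {
        byte[] bits = free.poll();
        // A pooled array of the wrong length is simply dropped here.
        // A reused array still holds old contents; the caller must overwrite it.
        return (bits != null && bits.length == length) ? bits : new byte[length];
    }
}
```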

bq. With RT, the parallel arrays will grow, so the pool will need to be size 
based, eg, when the arrays are grown, all of the previous arrays may be 
forcefully evicted, or they may simply expire.

True... but, like the other per-doc arrays, the BV can be overallocated 
(ArrayUtil.oversize) to accommodate further added docs.
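
The effect of overallocating can be sketched as follows. The growth policy here (about 1/8 extra plus a small constant) is a hypothetical stand-in and is not the formula used by Lucene's ArrayUtil.oversize; GrowableBits is likewise an illustrative class.

```java
import java.util.Arrays;

// Hypothetical sketch of an overallocated, growable bit array.
final class GrowableBits {
    private byte[] bits;
    private int maxDoc;

    GrowableBits(int maxDoc) {
        this.maxDoc = maxDoc;
        this.bits = new byte[oversize(bytesFor(maxDoc))];
    }

    static int bytesFor(int maxDoc) {
        return (maxDoc + 7) >>> 3;       // assumption: one bit per doc
    }

    /** Stand-in growth policy; NOT Lucene's ArrayUtil.oversize. */
    static int oversize(int minBytes) {
        return minBytes + (minBytes >>> 3) + 8;
    }

    /** Returns true only when a reallocation (and copy) was needed. */
    boolean ensureCapacity(int newMaxDoc) {
        maxDoc = Math.max(maxDoc, newMaxDoc);
        int needed = bytesFor(newMaxDoc);
        if (needed <= bits.length) {
            return false;                // headroom absorbed the added docs
        }
        bits = Arrays.copyOf(bits, oversize(needed));
        return true;
    }

    int capacityInBytes() {
        return bits.length;
    }
}
```

As long as added docs fit in the headroom, the array length stays stable, so pooled arrays of that length remain reusable.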




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985150#action_12985150
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

ThreadPoolExecutor can act as a guide; its main parameters are corePoolSize, 
maximumPoolSize, and keepAliveTime.  
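
Borrowing those three parameters for an array pool might look like the sketch below. ExpiringByteArrayPool is hypothetical; the parameter names mirror ThreadPoolExecutor only by analogy, and the current time is passed in explicitly so that expiry is easy to test.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: coreSize spares are always kept, extras expire when idle.
final class ExpiringByteArrayPool {
    private static final class Idle {
        final byte[] array;
        final long releasedAtNanos;

        Idle(byte[] array, long releasedAtNanos) {
            this.array = array;
            this.releasedAtNanos = releasedAtNanos;
        }
    }

    private final int coreSize;
    private final int maxSize;
    private final long keepAliveNanos;
    private final ArrayDeque<Idle> idle = new ArrayDeque<>();   // oldest first

    ExpiringByteArrayPool(int coreSize, int maxSize, long keepAliveNanos) {
        this.coreSize = coreSize;
        this.maxSize = maxSize;
        this.keepAliveNanos = keepAliveNanos;
    }

    synchronized void release(byte[] array, long nowNanos) {
        if (idle.size() < maxSize) {
            idle.addLast(new Idle(array, nowNanos));
        }
    }

    synchronized byte[] acquire(int length, long nowNanos) {
        expire(nowNanos);
        Idle newest = idle.pollLast();   // reuse the newest so old ones age out
        boolean usable = newest != null && newest.array.length == length;
        return usable ? newest.array : new byte[length];
    }

    /** Drops spares above coreSize that have been idle for keepAliveNanos. */
    synchronized void expire(long nowNanos) {
        while (idle.size() > coreSize
                && nowNanos - idle.peekFirst().releasedAtNanos >= keepAliveNanos) {
            idle.removeFirst();
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```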

In regard to using System.arraycopy (SC) for the RT parallel arrays: if they 
grow too large, then SC could become a predominant cost.  However, if the 
default number of thread states is 8, which would mean that many DWPTs, the 
arrays would hopefully never grow large enough for SC to become noticeably 
expensive.




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984996#action_12984996
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

I'm going to revive this, and if it works fine for trunk, then we can use the 
basic system for RT, e.g., LUCENE-2312.  I think the only open question is how 
we'll shrink the pool; most likely there'd be an expiration on the pooled 
objects.  With RT, the parallel arrays will grow, so the pool will need to be 
size-based; e.g., when the arrays are grown, all of the previous arrays may be 
forcefully evicted, or they may simply expire.
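
The forceful-eviction variant can be sketched as a pool that tracks one current length and discards every spare when that length grows. SizeBasedPool is a hypothetical class used only to illustrate the idea.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: spares of an outdated length are evicted on growth.
final class SizeBasedPool {
    private int currentLength;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    SizeBasedPool(int initialLength) {
        this.currentLength = initialLength;
    }

    /** Called when the arrays are grown; all smaller spares become useless. */
    synchronized void grow(int newLength) {
        if (newLength > currentLength) {
            currentLength = newLength;
            free.clear();
        }
    }

    /** Arrays released with an outdated length are dropped, not pooled. */
    synchronized void release(byte[] array) {
        if (array.length == currentLength) {
            free.push(array);
        }
    }

    synchronized byte[] acquire() {
        byte[] array = free.poll();
        return array != null ? array : new byte[currentLength];
    }

    synchronized int freeCount() {
        return free.size();
    }
}
```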






[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2009-11-11 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776576#action_12776576
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

A likely optimization for this patch is that we'll only pool if the doc count 
is above a threshold; 100,000 seems like a good number.  Also, pooling will be 
optional.  
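
Expressed as code, the threshold and the opt-in switch might look like this. PoolingPolicy is a hypothetical class; for scale, 100,000 docs at one bit per doc is a 12,500-byte array.

```java
// Hypothetical sketch of the pooling decision; not part of the patch.
final class PoolingPolicy {
    static final int DEFAULT_MIN_DOCS = 100_000;

    private final boolean enabled;
    private final int minDocs;

    PoolingPolicy(boolean enabled, int minDocs) {
        this.enabled = enabled;
        this.minDocs = minDocs;
    }

    /** Pool only when pooling is switched on and the segment is large enough. */
    boolean shouldPool(int maxDoc) {
        return enabled && maxDoc > minDocs;
    }
}
```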

 PooledSegmentReader, pools SegmentReader underlying byte arrays
 ---

 Key: LUCENE-1574
 URL: https://issues.apache.org/jira/browse/LUCENE-1574
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 3.1

   Original Estimate: 168h
  Remaining Estimate: 168h


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2009-11-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776299#action_12776299
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

Yonik,

Do you recommend using the method in SimpleStringInterner for lockless pooling?




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2009-11-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776321#action_12776321
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

I suppose as we're on Java 1.5, ConcurrentLinkedQueue can be used.
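
A lock-free free list built on ConcurrentLinkedQueue could be sketched like this. LockFreeByteArrayPool is a hypothetical name; the separate counter is there because ConcurrentLinkedQueue.size() traverses the whole queue.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a lockless pool for arrays of one fixed length.
final class LockFreeByteArrayPool {
    private final int length;
    private final int maxFree;
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final AtomicInteger freeCount = new AtomicInteger();

    LockFreeByteArrayPool(int length, int maxFree) {
        this.length = length;
        this.maxFree = maxFree;
    }

    byte[] acquire() {
        byte[] array = free.poll();
        if (array == null) {
            return new byte[length];
        }
        freeCount.decrementAndGet();
        return array;
    }

    void release(byte[] array) {
        if (array.length != length) {
            return;                          // wrong size: leave it to the GC
        }
        // Reserve a slot first; the counter is an upper bound on the queue size.
        if (freeCount.incrementAndGet() > maxFree) {
            freeCount.decrementAndGet();
            return;
        }
        free.offer(array);
    }
}
```

Under contention the counter can briefly overstate the queue size, so a release may occasionally be dropped early; that only costs an extra allocation later.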




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2009-08-01 Thread John Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737950#action_12737950
 ] 

John Wang commented on LUCENE-1574:
---

Re: Zoie and deleted docs:
That is no longer true; Zoie now uses a Bloom filter over an int hash set from 
fastutil, for exactly the perf reason Jason pointed out.
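
The general shape of a Bloom filter in front of an exact set is sketched below. This is not Zoie's code: BloomFilteredIntSet is hypothetical, java.util.HashSet stands in for the fastutil int hash set, and the two hash mixes are arbitrary choices for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: most lookups for non-deleted docs stop at the filter.
final class BloomFilteredIntSet {
    private final long[] filter;
    private final int mask;
    private final Set<Integer> exact = new HashSet<>();   // stand-in for fastutil

    /** log2Bits must be between 6 and 30; the filter holds 2^log2Bits bits. */
    BloomFilteredIntSet(int log2Bits) {
        this.filter = new long[1 << (log2Bits - 6)];
        this.mask = (1 << log2Bits) - 1;
    }

    private static int mix(int x, int seed) {
        x = (x ^ seed) * 0x9E3779B9;
        return x ^ (x >>> 16);
    }

    void add(int docId) {
        setBit(mix(docId, 0x7F4A7C15) & mask);
        setBit(mix(docId, 0x2545F491) & mask);
        exact.add(docId);
    }

    boolean contains(int docId) {
        // Cheap rejection first; the exact set removes any false positives.
        return getBit(mix(docId, 0x7F4A7C15) & mask)
                && getBit(mix(docId, 0x2545F491) & mask)
                && exact.contains(docId);
    }

    private void setBit(int bit) {
        filter[bit >>> 6] |= 1L << (bit & 63);
    }

    private boolean getBit(int bit) {
        return (filter[bit >>> 6] & (1L << (bit & 63))) != 0;
    }
}
```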




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2009-04-02 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695130#action_12695130
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

True, the pool would hold onto spares, but they would expire.
It's mostly useful for the large on-disk segments, as those byte
arrays (for BitVectors) are large, and because there are more docs
in them they would get hit with deletes more often, and so they'd be
reused fairly often. 

I'm not knowledgeable enough to say whether the transactional
data structure will be fast enough. We had been using
http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/ints/IntRBTreeSet.html
in Zoie for deleted docs and it's way slow. Binary
search of an int array is faster, albeit not fast enough. The
multi-dimensional array approach, as we implemented it in Bobo,
isn't fast enough (for searching) either. It's implemented in Bobo
because we have a multi-value field cache (which is quite large because
for each doc we're storing potentially 64 or more values in an
in-place bitset) and a single massive array kills the GC. In some
cases this is faster than a single large array because of the
way Java (or the OS?) transfers memory around through the CPU
cache. 
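
The two lookup structures mentioned above can be sketched as follows. Both classes are hypothetical illustrations, not Zoie or Bobo code, and the reading of "multi-dimensional array" as one logical array split into fixed-size chunks is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch: deleted docs kept as a sorted int[] with binary search.
final class DeletedDocLookups {
    static boolean isDeleted(int[] sortedDeletedDocIds, int docId) {
        return Arrays.binarySearch(sortedDeletedDocIds, docId) >= 0;
    }
}

// Hypothetical sketch: one logical int array stored as many small chunks,
// so that no single massive array has to be allocated or moved by the GC.
final class ChunkedIntArray {
    private static final int SHIFT = 10;               // 1024 ints per chunk
    private static final int MASK = (1 << SHIFT) - 1;

    private final int[][] chunks;
    private final int length;

    ChunkedIntArray(int length) {
        this.length = length;
        int chunkCount = (length + MASK) >>> SHIFT;
        chunks = new int[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            // The last chunk may be shorter than the others.
            chunks[i] = new int[Math.min(1 << SHIFT, length - (i << SHIFT))];
        }
    }

    int get(int index) {
        return chunks[index >>> SHIFT][index & MASK];
    }

    void set(int index, int value) {
        chunks[index >>> SHIFT][index & MASK] = value;
    }

    int length() {
        return length;
    }
}
```

Every access to the chunked array pays for an extra indirection, which is one plausible reason for the search-speed concern raised above.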

 PooledSegmentReader, pools SegmentReader underlying byte arrays
 ---

 Key: LUCENE-1574
 URL: https://issues.apache.org/jira/browse/LUCENE-1574
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

   Original Estimate: 168h
  Remaining Estimate: 168h
