[jira] Updated: (LUCENE-2662) BytesHash

2010-10-03 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2662:
-

Affects Version/s: (was: Realtime Branch)
Fix Version/s: (was: Realtime Branch)

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
 LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2662) BytesHash

2010-10-02 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Attachment: LUCENE-2662.patch

This patch fixes nulling out the recycled but not reused byte blocks in 
RecyclingByteBlockAllocator.
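
One possible reading of this fix, sketched in Java: every slot handed to recycle is nulled in the caller's array, whether or not the allocator keeps that block for reuse. The class name NullingRecycler, the maxBuffered cap and the method signature are assumptions for illustration and are not taken from the patch.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch; not the RecyclingByteBlockAllocator from the patch. */
final class NullingRecycler {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int maxBuffered; // assumed cap on blocks kept for reuse

    NullingRecycler(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    /** Recycles the non-null blocks in blocks[start..end) and clears every slot in that range. */
    void recycle(byte[][] blocks, int start, int end) {
        for (int i = start; i < end; i++) {
            if (free.size() < maxBuffered) {
                free.addLast(blocks[i]); // kept for later reuse
            }
            // Null the slot even when the block was not kept; otherwise the
            // caller's array would keep a block reachable that nobody reuses.
            blocks[i] = null;
        }
    }

    int buffered() {
        return free.size();
    }
}
```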

I think we are ready to go; I will commit to trunk soon. I don't think we need a 
CHANGES.TXT entry here - at least I cannot find any section this refactoring 
would fit into. 

simon

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
 LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-30 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Attachment: LUCENE-2662.patch

Next iteration - seems to be very close!

I have applied the following changes:

* introduced an AtomicLong to track bytesUsed in DocumentsWriter, 
TermsHashPerField, BytesRefHash and RecyclingByteBlockAllocator
* factored out a BytesStartArray class from BytesRefHash that manages the 
int[] holding the bytesStart offsets. TermsHashPerField subclasses it and manages 
the ParallelPostingsArray through it. 
* removed the remaining nocommits
* made RecyclingByteBlockAllocator synced by default (we use synchronized 
methods for it now)
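
A minimal Java sketch of the shared-counter idea from the list above, assuming one AtomicLong is passed to every component that allocates memory. CountingAllocator and CountingStartArray are hypothetical names and do not reproduce the classes in the patch.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical allocator that reports its allocations to a shared counter. */
final class CountingAllocator {
    private final int blockSize;
    private final AtomicLong bytesUsed;

    CountingAllocator(int blockSize, AtomicLong bytesUsed) {
        this.blockSize = blockSize;
        this.bytesUsed = bytesUsed;
    }

    // synchronized methods, mirroring the "synced by default" bullet
    synchronized byte[] getBlock() {
        bytesUsed.addAndGet(blockSize);
        return new byte[blockSize];
    }

    synchronized void release(byte[] block) {
        bytesUsed.addAndGet(-block.length);
    }
}

/** Hypothetical owner of the int[] of start offsets, tracked by the same counter. */
final class CountingStartArray {
    private final AtomicLong bytesUsed;
    private int[] starts;

    CountingStartArray(int initialSize, AtomicLong bytesUsed) {
        this.bytesUsed = bytesUsed;
        this.starts = new int[initialSize];
        bytesUsed.addAndGet((long) initialSize * Integer.BYTES);
    }

    /** Doubles the array and accounts for the additional ints. */
    int[] grow() {
        int[] bigger = Arrays.copyOf(starts, starts.length * 2);
        bytesUsed.addAndGet((long) (bigger.length - starts.length) * Integer.BYTES);
        starts = bigger;
        return starts;
    }
}
```

With this shape, the top-level writer only has to read the one counter to decide when to flush, instead of asking each component for its usage.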

I ran a quick Wikipedia 100k docs benchmark against trunk vs. LUCENE-2662 and 
the results are promising.
|version|rec/sec|elapsed sec|avgUsedMem|
|LUCENE-2662|717.30|139.41|536,682,592|
|trunk|682.66|146.49|546,065,344|

I will run the 10M benchmark once I get back to this.


 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
 LUCENE-2662.patch, LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Attachment: LUCENE-2662.patch

We are almost there. I factored BytesRefHash out of TermsHashPerField; only two 
nocommit parts are left in the code that I need to find a solution for. 

* there needs to be a way to communicate the byte usage up to DocumentsWriter, 
which I haven't explored yet
* textStarts in ParallelPostingsArray needs to be replaced since it is already 
maintained in BytesRefHash. I will need to look closer into that but suggestions 
are welcome. One way to do it would be to attach a reference to BRH instead of 
the textStart - but that is a naive suggestion since I haven't looked into that 
in more detail.

All tests are passing so far and TermsHashPerField looks somewhat cleaner. I 
will work on fixing those nocommits and run some indexing perf tests against the 
patch. 



 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
 LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Attachment: LUCENE-2662.patch

Attaching my current state for feedback and iteration.

* factored out ByteBlockAllocator from DocumentsWriter
* moved ByteBlockPool to o.a.l.util
* added RecyclingByteBlockAllocator which can be used with or without 
synchronization. IMO the DummyConcurrentLock will be optimized away so that this 
might be super low cost - feedback on that would be more than welcome. 
* addressed all the comments from mike - thanks again
* added more tests
* cut over constants from DocumentsWriter to ByteBlockPool
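
A sketch of the "with or without synchronization" idea, assuming the allocator takes a java.util.concurrent.locks.Lock and is handed a do-nothing lock for single-threaded use. NoOpLock and OptionallyLockedAllocator are hypothetical names; the DummyConcurrentLock in the patch may look different, and whether the JIT really removes the empty calls is not verified here.

```java
import java.util.ArrayDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** A lock that does nothing, for single-threaded use (hypothetical name). */
final class NoOpLock implements Lock {
    static final NoOpLock INSTANCE = new NoOpLock();
    private NoOpLock() {}
    @Override public void lock() {}
    @Override public void lockInterruptibly() {}
    @Override public boolean tryLock() { return true; }
    @Override public boolean tryLock(long time, TimeUnit unit) { return true; }
    @Override public void unlock() {}
    @Override public Condition newCondition() {
        throw new UnsupportedOperationException();
    }
}

/** Hypothetical recycling allocator whose locking is chosen at construction. */
final class OptionallyLockedAllocator {
    private final int blockSize;
    private final Lock lock;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    OptionallyLockedAllocator(int blockSize, boolean threadSafe) {
        this.blockSize = blockSize;
        this.lock = threadSafe ? new ReentrantLock() : NoOpLock.INSTANCE;
    }

    /** Returns a recycled block if one is available, else a fresh one. */
    byte[] getBlock() {
        lock.lock();
        try {
            byte[] block = free.pollLast();
            return block != null ? block : new byte[blockSize];
        } finally {
            lock.unlock();
        }
    }

    /** Hands a block back for later reuse. */
    void recycle(byte[] block) {
        lock.lock();
        try {
            free.addLast(block);
        } finally {
            lock.unlock();
        }
    }

    int freeBlocks() {
        lock.lock();
        try {
            return free.size();
        } finally {
            lock.unlock();
        }
    }
}
```

The code path is identical in both modes; only the Lock instance differs, which is what makes the unsynchronized variant cheap to offer.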

TermsHashPerField is next; feedback welcome.

simon

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-24 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Attachment: LUCENE-2662.patch

This patch contains a slightly different version of BytesHash (renamed to 
BytesRefHash, but that is to be discussed - while writing this I actually think 
BytesHash is the better name). BytesRefHash is now final and does not create 
Entry objects anymore. Internally it maintains two integer arrays, one acting as 
the hash buckets and the other containing the bytes-start offsets into the 
ByteBlockPool. Each added entry is assigned an increasing ordinal, since this 
is what Entry is used for in almost all use-cases (in CSF though). For 
TermsHashPerField this is also native since it uses the same kind of 
referencing system.

These changes keep this class as efficient as possible, keep GC costs low 
and allow the JIT to do better optimizations. IMO this class is super performance 
critical and since we recently refactored indexing towards parallel arrays 
adding another object array might not be the way to go anyway.
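
A minimal Java sketch of the two-array layout described above, assuming a flat byte[] in place of the ByteBlockPool, a 1-byte length prefix per entry, and linear probing. OrdinalBytesHash and all of its members are hypothetical and do not reproduce the BytesRefHash in the patch.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the BytesRefHash from the patch. */
final class OrdinalBytesHash {
    private byte[] pool = new byte[64]; // flat stand-in for a ByteBlockPool
    private int poolUsed = 0;
    private int[] buckets;              // hash slot -> ordinal, -1 when empty
    private int[] bytesStart;           // ordinal -> start offset in pool
    private int count = 0;

    OrdinalBytesHash(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        buckets = new int[capacity];
        Arrays.fill(buckets, -1);
        bytesStart = new int[capacity];
    }

    /** Returns the new ordinal, or -(ord + 1) if the bytes are already present. */
    int add(byte[] bytes) {
        if (bytes.length > 255) {
            throw new IllegalArgumentException("sketch uses a 1-byte length prefix");
        }
        int mask = buckets.length - 1;
        int slot = hash(bytes, 0, bytes.length) & mask;
        while (buckets[slot] != -1) {
            if (equalsAt(buckets[slot], bytes)) {
                return -(buckets[slot] + 1);
            }
            slot = (slot + 1) & mask; // linear probing
        }
        int needed = poolUsed + 1 + bytes.length;
        if (needed > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, needed));
        }
        int ord = count++;
        bytesStart[ord] = poolUsed;
        pool[poolUsed++] = (byte) bytes.length;
        System.arraycopy(bytes, 0, pool, poolUsed, bytes.length);
        poolUsed += bytes.length;
        buckets[slot] = ord;
        if (count * 2 > buckets.length) {
            rehash(); // keep the table at most half full
        }
        return ord;
    }

    /** Copies out the bytes stored for an ordinal. */
    byte[] get(int ord) {
        int start = bytesStart[ord];
        int len = pool[start] & 0xFF;
        return Arrays.copyOfRange(pool, start + 1, start + 1 + len);
    }

    int size() {
        return count;
    }

    private boolean equalsAt(int ord, byte[] bytes) {
        int start = bytesStart[ord];
        if ((pool[start] & 0xFF) != bytes.length) return false;
        for (int i = 0; i < bytes.length; i++) {
            if (pool[start + 1 + i] != bytes[i]) return false;
        }
        return true;
    }

    private void rehash() {
        int[] bigger = new int[buckets.length * 2];
        Arrays.fill(bigger, -1);
        int mask = bigger.length - 1;
        for (int ord = 0; ord < count; ord++) {
            int start = bytesStart[ord];
            int slot = hash(pool, start + 1, pool[start] & 0xFF) & mask;
            while (bigger[slot] != -1) slot = (slot + 1) & mask;
            bigger[slot] = ord;
        }
        buckets = bigger;
        bytesStart = Arrays.copyOf(bytesStart, bigger.length);
    }

    private static int hash(byte[] b, int off, int len) {
        int h = 1;
        for (int i = off; i < off + len; i++) h = 31 * h + b[i];
        return h;
    }
}
```

No per-entry object is allocated here: an entry is just an int ordinal, so callers can keep their own parallel arrays indexed by that ordinal.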

I also incorporated Robert's comments - thanks for the review. I guess 
that is the first step towards factoring it out of TermsHashPerField; the next 
question is whether we do that in a different issue and get this committed 
first.

comments / review welcome!!

One more thing: I did not move ByteBlockPool to o.a.l.util but I think it 
belongs there, thoughts?

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch, LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-22 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2662:
-

Attachment: LUCENE-2662.patch

We need unit tests and a base implementation as BytesHash is abstract...

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2662:


Fix Version/s: 4.0
Affects Version/s: 4.0

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch, 4.0
Reporter: Jason Rutherglen
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch, 4.0

 Attachments: LUCENE-2662.patch


 This issue will have the BytesHash separated out from LUCENE-2186




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2662:
-

Priority: Minor  (was: Major)

 BytesHash
 -

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Realtime Branch


 This issue will have the BytesHash separated out from LUCENE-2186
