[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935716#action_12935716 ]

Simon Willnauer commented on LUCENE-2662:

bq. I will keep it open until this is merged into Realtime Branch

I think we should really close this since RT branch is not very active right now

BytesHash
    Key: LUCENE-2662
    URL: https://issues.apache.org/jira/browse/LUCENE-2662
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Index
    Affects Versions: 4.0
    Reporter: Jason Rutherglen
    Assignee: Simon Willnauer
    Priority: Minor
    Fix For: 4.0
    Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch

This issue will have the BytesHash separated out from LUCENE-2186

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935735#action_12935735 ]

Michael Busch commented on LUCENE-2662:

bq. I think we should really close this since RT branch is not very active right now

Sorry about that. I need to merge trunk into RT, then I'll get this change too. It's a big merge though with tons of conflicts...
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935739#action_12935739 ]

Simon Willnauer commented on LUCENE-2662:

bq. Sorry about that. I need to merge trunk into RT, then I'll get this change too. It's a big merge though with tons of conflicts...

HA! good to see you here! :) have fun with the merge
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935744#action_12935744 ]

Uwe Schindler commented on LUCENE-2662:

bq. HA! good to see you here! have fun with the merge

He is working hard, it's 4:45 am in California :-)
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935746#action_12935746 ]

Simon Willnauer commented on LUCENE-2662:

bq. He is working hard, it's 4:45 am in California

true but he is in Germany :D
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935747#action_12935747 ]

Michael Busch commented on LUCENE-2662:

Yeah sitting in Stuttgart, going to hit the Weihnachtsmarkt soon - let's see how the merge goes after several glasses of Gluehwein :)
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924484#action_12924484 ]

Mathias Walter commented on LUCENE-2662:

Why is this issue still open, if the patch was already committed to trunk?
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924488#action_12924488 ]

Simon Willnauer commented on LUCENE-2662:

bq. Why is this issue still open, if the patch was already committed to trunk?

see my comment above:

bq. I will keep it open until this is merged into Realtime Branch
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917537#action_12917537 ]

Michael McCandless commented on LUCENE-2662:

This was already committed to trunk...
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917354#action_12917354 ]

Jason Rutherglen commented on LUCENE-2662:

Simon, I'm going to get deletes working, tests passing using maps in the RT branch, then we can integrate. This'll probably be best.
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917372#action_12917372 ]

Simon Willnauer commented on LUCENE-2662:

bq. Simon, I'm going to get deletes working, tests passing using maps in the RT branch, then we can integrate. This'll probably be best.

Jason, I suggest you create a separate issue, something like "Integrate BytesRefHash in Realtime Branch", and I will take care of it. I think this issue had a clear target, to factor the hash table out of TermsHashPerField, and we should close it. Let's use a new one to track the integration.

Thoughts?

Simon
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917416#action_12917416 ]

Jason Rutherglen commented on LUCENE-2662:

Let's commit this to trunk. We need to merge all of trunk into the RT branch, or vice versa, at some point anyway. This patch could be a part of that bulk merge-in, or we can simply do it now.
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917188#action_12917188 ]

Simon Willnauer commented on LUCENE-2662:

Committed to trunk in rev. 1003790

@Jason: do you need that merged into Realtime-Branch or is buschmi going to do that? Otherwise I can help too

I will keep it open until this is merged into Realtime Branch
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916872#action_12916872 ]

Michael McCandless commented on LUCENE-2662:

I indexed 10M 1KB wikipedia docs, single threaded, and also see things a bit faster w/ the patch (10,353 docs/sec vs 10,182 docs/sec). Nice to have a refactor improve performance for a change, heh.

The avgUsedMem was quite a bit higher (1.5GB vs 1.0GB), but I'm not sure this stat is trustworthy. I'll re-run w/ infoStream enabled to see if anything looks suspicious (eg, we are somehow not tracking bytes used correctly).

Still, the resulting indices had identical structure (ie we seem to flush at exactly the same points), so I think bytes used is properly tracked.
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916873#action_12916873 ]

Michael McCandless commented on LUCENE-2662:

bq. Still, the resulting indices had identical structure (ie we seem to flush at exactly the same points), so I think bytes used is properly tracked.

Sorry, scratch that -- I was inadvertently flushing by doc count, not by RAM usage. I'm re-running w/ flush-by-RAM to verify we flush at exactly the same points as trunk.
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916875#action_12916875 ]

Michael McCandless commented on LUCENE-2662:

In RecyclingByteBlockAllocator.recycleByteBlocks, if we cannot recycle all of the blocks (ie because it exceeds maxBufferedBlocks), we are failing to null out the entries in the incoming array?

Also maybe rename pos -> freeCount? (pos is a little too generic?)
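The review point above can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name RecyclingAllocatorSketch and its layout are hypothetical, and only the method name recycleByteBlocks, the limit maxBufferedBlocks, and the suggested field name freeCount come from the comment. The idea shown is that every slot in the incoming array is cleared, including the slots whose blocks could not be buffered.

```java
/** Hypothetical sketch; not Lucene's RecyclingByteBlockAllocator. */
final class RecyclingAllocatorSketch {
  private final int blockSize;
  private final int maxBufferedBlocks;
  private final byte[][] freeBlocks;
  private int freeCount; // number of buffered blocks ("pos" in the reviewed patch)

  RecyclingAllocatorSketch(int blockSize, int maxBufferedBlocks) {
    this.blockSize = blockSize;
    this.maxBufferedBlocks = maxBufferedBlocks;
    this.freeBlocks = new byte[maxBufferedBlocks][];
  }

  /** Hands out a buffered block if one is available, otherwise allocates a new one. */
  synchronized byte[] getByteBlock() {
    if (freeCount == 0) {
      return new byte[blockSize];
    }
    byte[] block = freeBlocks[--freeCount];
    freeBlocks[freeCount] = null;
    return block;
  }

  /** Takes back blocks[start..end) and clears every one of those slots in the caller's array. */
  synchronized void recycleByteBlocks(byte[][] blocks, int start, int end) {
    int numToKeep = Math.min(maxBufferedBlocks - freeCount, end - start);
    int stop = start + numToKeep;
    for (int i = start; i < stop; i++) {
      freeBlocks[freeCount++] = blocks[i];
      blocks[i] = null;
    }
    // Blocks beyond the buffer limit are dropped for the GC, but the caller's
    // slots are still nulled so no stale reference survives the call.
    for (int i = stop; i < end; i++) {
      blocks[i] = null;
    }
  }

  synchronized int freeCount() {
    return freeCount;
  }
}
```

With a limit of two buffered blocks, recycling three blocks keeps two and drops one, and all three slots in the caller's array end up null.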
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916882#action_12916882 ]

Robert Muir commented on LUCENE-2662:

Simon, thank you for renaming the 'utf8' variables here.
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916885#action_12916885 ]

Simon Willnauer commented on LUCENE-2662:

bq. Simon, thank you for renaming the 'utf8' variables here.

YW :)

bq. In RecyclingByteBlockAllocator.recycleByteBlocks, if we cannot recycle all of the blocks (ie because it exceeds maxBufferedBlocks), we are failing to null out the entries in the incoming array?

Ahh you are right - I will fix.

bq. Also maybe rename pos -> freeCount? (pos is a little too generic?)

I mean it's internal though, but I see your point. Thanks for reviewing it closely.

{quote}
The avgUsedMem was quite a bit higher (1.5GB vs 1.0GB), but I'm not sure this stat is trustworthy. I'll re-run w/ infoStream enabled to see if anything looks suspicious (eg, we are somehow not tracking bytes used correctly).
{quote}

hmm I will dig once I get back to my workstation.

simon
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916913#action_12916913 ]

Michael McCandless commented on LUCENE-2662:

OK my 2nd indexing test (10M wikipedia docs, flush @ 256 MB ram used) finished and trunk/patch are essentially the same throughput, and, all flushes happened at identical points. So I think we are good to go... Nice work Simon!
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916965#action_12916965 ]

Michael McCandless commented on LUCENE-2662:

I also ran a test w/ 5 threads -- they are close (22,402 docs/sec for patch, 22,868 docs/sec for trunk), and this time avgUsedMem is closer (811 MB for trunk, 965 MB for patch).

I don't think the avgUsedMem is that meaningful -- it takes the max of Runtime.totalMemory() - Runtime.freeMemory() (which includes garbage I think), after each completed task, and then averages across all tasks. In my case I think it's averaging 1 measure per thread, so it's really sort of measuring how much garbage there happened to be at the time.
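To make the caveat about avgUsedMem concrete, here is a rough sketch of that style of measurement. The class UsedMemSampler and its methods are hypothetical and are not the benchmark's actual code; only the expression Runtime.totalMemory() - Runtime.freeMemory() is taken from the comment. Because no garbage collection is forced before sampling, the value includes objects that are already unreachable, which is why a handful of samples mostly reflects how much garbage happened to be on the heap at that moment.

```java
/** Hypothetical helper; mirrors the kind of measurement described above. */
final class UsedMemSampler {
  private long sum;
  private int samples;

  /** Heap currently in use, including garbage that has not been collected yet. */
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /** Call once after each completed task. */
  void sampleAfterTask() {
    record(usedHeapBytes());
  }

  /** Records one sample; separated out so the averaging can be checked with fixed values. */
  void record(long usedBytes) {
    sum += usedBytes;
    samples++;
  }

  /** Average over all recorded samples, or 0 if nothing was recorded. */
  long average() {
    return samples == 0 ? 0 : sum / samples;
  }
}
```

With only one sample per thread, as in the 5-thread run, the average is taken over very few points, so differences such as 811 MB vs 965 MB say little about the memory the indexer actually retains.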
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916988#action_12916988 ]

Michael McCandless commented on LUCENE-2662:

I instrumented trunk and the patch to see how many times we do new byte[bufferSize] while building 5M index, and they both alloc the same number of byte[] from the BBA. So I don't think we have a memory issue...
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916355#action_12916355 ]

Jason Rutherglen commented on LUCENE-2662:

{quote}
we could factor out a super class from ParallelPostingArray which only has the textStart int array, the grow and copy method and let ParallelPostingArray subclass it.
{quote}

This option makes the most sense. ParallelByteStartsArray?
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915700#action_12915700 ]

Michael McCandless commented on LUCENE-2662:

{quote}
bq. Maybe rename ords -> keys? And hash -> values? (The key isn't really an ord (I think?) because it increases by more than 1 each time... it's more like an address since it references an address in the byte-pool space).

yeah that depends how you see it - the array index really is the ord though. but I like those names. I will change.
{quote}

Duh, I agree -- the new names are confusing!! Sorry. I was confused... you are right that what's now called keys are in fact really ords! They are always incr'd by one, on adding a new one.

How about renaming key back to ord? And then maybe rename values to bytesStart? And in their decls add comments saying they are indexed by hash code? And maybe rename addByOffset -> addByBytesStart?

* On the nocommit in ByteBlockPool -- I think that's fine? It's an internal class
* The nocommit in BytesRefHash seems wrong? (Ie, compact is used internally)... though maybe we make it private if it's not used externally?
* On the "nocommit factor this out!" in THPF.java... I agree, the postingsArray.textStarts should go away right? Ie, it's a [wasteful] copy of what the BytesRefHash is already storing?
* Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm maybe AtomicInt -- none of these classes can address 2GB)? Then the pool would add in blockSize every time it binds a new block. That method (DW.bytesUsed) is called *a lot* -- at least once on every addDoc.
* I'm confused again -- when do we use RecyclingByteBlockAllocator from a single thread...? Ie, why did the sync need to be conditional for this class, again? It seems like we always need it sync'd (both the main pool and the per-doc pool need this)? If so we can simplify and make these methods sync'd?
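The bytesUsed suggestion above can be sketched roughly as follows. This is an illustration under assumptions, not the patch: the class CountingBlockAllocatorSketch and its method release are hypothetical names. What it shows is the idea from the comment, namely that the allocator adds blockSize to a shared AtomicLong each time it hands out a new block, so a frequently called bytesUsed check becomes a single atomic read instead of a walk over the pools.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of an allocator that reports its usage through a shared counter. */
final class CountingBlockAllocatorSketch {
  private final int blockSize;
  private final AtomicLong bytesUsed; // shared with whoever needs to read the total

  CountingBlockAllocatorSketch(int blockSize, AtomicLong bytesUsed) {
    this.blockSize = blockSize;
    this.bytesUsed = bytesUsed;
  }

  /** Allocates a block and accounts for it immediately. */
  byte[] getByteBlock() {
    bytesUsed.addAndGet(blockSize);
    return new byte[blockSize];
  }

  /** Gives up blocks[start..end): subtracts their size and clears the caller's slots. */
  void release(byte[][] blocks, int start, int end) {
    long released = (long) blockSize * (end - start);
    bytesUsed.addAndGet(-released);
    for (int i = start; i < end; i++) {
      blocks[i] = null;
    }
  }
}
```

A caller on the hot path (the comment mentions a check on every addDoc) would then only call bytesUsed.get() on the shared counter.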
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915713#action_12915713 ]

Simon Willnauer commented on LUCENE-2662:

{quote}
How about renaming key back to ord? And then maybe rename values to bytesStart? And in their decls add comments saying they are indexed by hash code? And maybe rename addByOffset -> addByBytesStart?
{quote}

I don't like addByBytesStart. I would like to keep offset since it really is an offset into the pool. addByPoolOffset? The names ord and bytesStart are a good compromise :) let's shoot for that.

{quote}
On the nocommit in ByteBlockPool - I think that's fine? It's an internal class
{quote}

you refer to this: // nocommit - public arrays are not nice! ? yeah that's more of a style thing, but if somebody changes them it's their fault for being stupid I guess.

{quote}
The nocommit in BytesRefHash seems wrong? (Ie, compact is used internally)... though maybe we make it private if it's not used externally?
{quote}

Ah yeah that's bogus - it's from a previous iteration which was wrong as well, I will remove.

{quote}
On the "nocommit factor this out!" in THPF.java... I agree, the postingsArray.textStarts should go away right? Ie, it's a [wasteful] copy of what the BytesRefHash is already storing?
{quote}

Yeah, that is the reason for that nocommit. Yet, I thought about this a little and I have two options for this.

* we could factor out a super class from ParallelPostingArray which only has the textStart int array, the grow and copy method and let ParallelPostingArray subclass it. BytesRefHash would accept this class (I don't have a good name for it, but let's call it TextStartArray for now) and use it internally. It would call grow() once needed inside BytesRefHash and all the other code would be unchanged since PPA is a subclass.
* the other way would be to bind the BytesRefHash to the postings array, which seems odd to me though.

More ideas?

{quote}
Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm maybe AtomicInt - none of these classes can address 2GB)? Then the pool would add in blockSize every time it binds a new block. That method (DW.bytesUsed) is called a lot - at least once on every addDoc.
{quote}

I did exactly that in the not yet uploaded patch. But I figured that it would maybe make more sense to use that AtomicInt in the allocator as well as in THPF, or is that what you mean?

{quote}
I'm confused again - when do we use RecyclingByteBlockAllocator from a single thread...? Ie, why did the sync need to be conditional for this class, again? It seems like we always need it sync'd (both the main pool and the per-doc pool need this)? If so we can simplify and make these methods sync'd?
{quote}

man, I am sorry - I thought I would use this in LUCENE-2186 in a single threaded env, but if so I should change it there if needed. I was one step ahead though. I will change it and maybe have a second one if needed. Agree?

simon
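The first option in the comment above (a small superclass that owns only the textStart array plus grow and copy, with the postings array subclassing it) might look roughly like this. All names here are hypothetical: TextStartArraySketch stands in for the placeholder name TextStartArray from the comment, PostingsArraySketch stands in for the postings-array subclass, and the fields intStarts and byteStarts are assumed for illustration. The point of the sketch is that a hash table holding only the base type can call grow() and still get back the subclass with all of its parallel arrays resized and copied.

```java
/** Hypothetical base class: owns only the textStarts array plus grow/copy. */
class TextStartArraySketch {
  final int[] textStarts;

  TextStartArraySketch(int size) {
    textStarts = new int[size];
  }

  int size() {
    return textStarts.length;
  }

  /** Subclasses override this so grow() returns an instance of their own type. */
  TextStartArraySketch newInstance(int size) {
    return new TextStartArraySketch(size);
  }

  /** Returns a larger instance with all existing entries copied over. */
  final TextStartArraySketch grow() {
    TextStartArraySketch bigger = newInstance(Math.max(2, size() * 2));
    copyTo(bigger, size());
    return bigger;
  }

  void copyTo(TextStartArraySketch to, int numToCopy) {
    System.arraycopy(textStarts, 0, to.textStarts, 0, numToCopy);
  }
}

/** Hypothetical subclass: adds further parallel arrays that must grow in lockstep. */
final class PostingsArraySketch extends TextStartArraySketch {
  final int[] intStarts;
  final int[] byteStarts;

  PostingsArraySketch(int size) {
    super(size);
    intStarts = new int[size];
    byteStarts = new int[size];
  }

  @Override
  PostingsArraySketch newInstance(int size) {
    return new PostingsArraySketch(size);
  }

  @Override
  void copyTo(TextStartArraySketch to, int numToCopy) {
    super.copyTo(to, numToCopy);
    PostingsArraySketch p = (PostingsArraySketch) to;
    System.arraycopy(intStarts, 0, p.intStarts, 0, numToCopy);
    System.arraycopy(byteStarts, 0, p.byteStarts, 0, numToCopy);
  }
}
```

In this arrangement the hash would keep a single TextStartArraySketch reference and replace it with the result of grow() when it runs out of room, so the postings code sees no duplicate textStarts copy.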
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915079#action_12915079 ] Jason Rutherglen commented on LUCENE-2662: Simon, the patch looks like it's ready for the next stage, ie, TermsHashPerField deparchment. BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914888#action_12914888 ] Jason Rutherglen commented on LUCENE-2662:

A useful API change to BBP: instead of passing only the size in bytes to newSlice, pass in the level and/or the size. In fact, throughout the codebase, the size of a single level, specifically the first, is all that is ever passed to newSlice. The point of this change is that I'm recording the level of the last slice for the skip list in LUCENE-2312.

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
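A sketch of what a level-based newSlice might look like. SlicePoolSketch, newSliceAtLevel, levelAt, the level sizes, and the end-marker encoding (16 | level) are illustrative assumptions for this example and should not be read as the actual ByteBlockPool API.

```java
/** Hypothetical single-buffer slice pool where callers pass a level, not a byte size. */
class SlicePoolSketch {
    // Illustrative slice sizes per level (assumed values).
    static final int[] LEVEL_SIZES = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

    final byte[] buffer;
    int upto = 0;

    SlicePoolSketch(int capacity) {
        buffer = new byte[capacity];
    }

    /** Allocates a slice for the given level and returns its start offset. */
    int newSliceAtLevel(int level) {
        int size = LEVEL_SIZES[level];
        if (upto + size > buffer.length) {
            throw new IllegalStateException("buffer full");
        }
        int start = upto;
        upto += size;
        // The last byte of the slice records the level, so it can be read back later.
        buffer[upto - 1] = (byte) (16 | level);
        return start;
    }

    /** Reads back the level stored in a slice's end marker. */
    int levelAt(int endOffset) {
        return buffer[endOffset] & 15;
    }
}
```

Since the caller already knows the level it asked for, it can record that level directly (for example alongside a skip-list entry) instead of deriving it from a byte size.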
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914452#action_12914452 ] Robert Muir commented on LUCENE-2662:

bq. I guess that is the first step towards factoring it out of TermsHashPerField, the next question is are we gonna do that in a different issue and get this committed first?

I think it would be better if this class were used in the patch... I wouldn't commit it by itself, unused. It's difficult for people to review its behavior while it's just a standalone, unused thing (for instance, the hashCode thing I brought up).

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914478#action_12914478 ] Jason Rutherglen commented on LUCENE-2662:

bq. BytesRefHash is now final and does not create Entry objects anymore

That's good.

bq. move ByteBlockPool to o.a.l.utils

Sure, why not.

bq. factoring it out of TermsHashPerField, the next question is are we gonna do that in a different issue and get this committed first?

We need to factor it out of THPF, otherwise this patch isn't really useful for committing. Also, it'll get tested through the entirety of the unit tests, ie, it'll get put through the laundry.

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914486#action_12914486 ] Simon Willnauer commented on LUCENE-2662:

bq. We need to factor it out of THPF otherwise this patch isn't really useful for committing. Also, it'll get tested through the entirety of the unit tests, ie, it'll get put through the laundry.

Yeah, let's see this as the first baby step towards it. I will move ByteBlockPool to o.a.l.utils and start cutting THPF over to it. We need to do benchmarking in any case, just to make sure the JIT doesn't play nasty tricks on us again.

simon

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914521#action_12914521 ] Jason Rutherglen commented on LUCENE-2662: bq. make sure JIT doesn't play nasty tricks with us again. What would we do if this happens? BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914621#action_12914621 ] Michael McCandless commented on LUCENE-2662:

Patch looks good Simon -- some ideas:

* In the class jdocs, I think state that this is basically a Map<BytesRef,int>?
* Maybe we also move ByteBlockPool -> oal.util?
* Maybe move out the ByteBlockAllocator to its own class (in util)? RecyclingByteBlockAllocator?
* Can we have DocumentsWriter share the ByteBlockAllocator? (Right now it's dup'd code since DW also implements this.)
* Maybe rename ords -> keys? And hash -> values? (The key isn't really an ord (I think?) because it increases by more than 1 each time... it's more like an address, since it references an address in the byte-pool space.)
* We should advertise the limits in the jdocs -- limited to <= 2GB total byte storage, each key must be <= BLOCK_SIZE-2 in length.
* Can we have sortedEntries() not allocate a new iterator object? Ie, just return the sorted bytesStart int[]? (This is what's done today, and, for term vectors on small docs, this method is pretty hot.) And the javadocs for this should be stronger -- it's not that the behaviour is undefined afterwards, it's that you must .clear() after you're done consuming the sorted entries.

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch, 4.0 Reporter: Jason Rutherglen Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch, 4.0 Attachments: LUCENE-2662.patch, LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
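To illustrate the "basically a map from bytes to int" description in the comment above, here is a minimal sketch, not the patch's implementation. The class BytesHashSketch, its single growable byte[] pool, the one-byte length prefix (which caps keys at 127 bytes in this sketch only), and the linear probing are all assumptions chosen to keep the example short.

```java
import java.util.Arrays;

/** Hypothetical bytes-to-int hash: keys live in one byte pool, no per-entry objects. */
class BytesHashSketch {
    private byte[] pool = new byte[64];      // every key stored as [length][bytes...]
    private int poolUpto = 0;
    private int[] bytesStart = new int[8];   // indexed by ord -> offset into pool
    private int[] ords = new int[16];        // indexed by hash slot -> ord, -1 if empty
    private int count = 0;

    BytesHashSketch() {
        Arrays.fill(ords, -1);
    }

    int size() {
        return count;
    }

    /** Returns the new ord if the key was added, or -(ord+1) if it was already present. */
    int add(byte[] key) {
        if (key.length > 127) {
            throw new IllegalArgumentException("key too long for this sketch");
        }
        int mask = ords.length - 1;
        int slot = Arrays.hashCode(key) & mask;
        while (ords[slot] != -1) {
            if (Arrays.equals(get(ords[slot]), key)) {
                return -(ords[slot] + 1);
            }
            slot = (slot + 1) & mask;        // linear probing
        }
        int needed = poolUpto + 1 + key.length;
        if (needed > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, needed));
        }
        if (count == bytesStart.length) {
            bytesStart = Arrays.copyOf(bytesStart, count * 2);
        }
        int ord = count++;
        bytesStart[ord] = poolUpto;
        pool[poolUpto++] = (byte) key.length;
        System.arraycopy(key, 0, pool, poolUpto, key.length);
        poolUpto += key.length;
        ords[slot] = ord;
        if (count * 2 > ords.length) {
            rehash();                        // keep the table at most half full
        }
        return ord;
    }

    /** Returns a copy of the key bytes stored for the given ord. */
    byte[] get(int ord) {
        int start = bytesStart[ord];
        return Arrays.copyOfRange(pool, start + 1, start + 1 + pool[start]);
    }

    private void rehash() {
        int[] newOrds = new int[ords.length * 2];
        Arrays.fill(newOrds, -1);
        int mask = newOrds.length - 1;
        for (int ord = 0; ord < count; ord++) {
            int slot = Arrays.hashCode(get(ord)) & mask;
            while (newOrds[slot] != -1) {
                slot = (slot + 1) & mask;
            }
            newOrds[slot] = ord;
        }
        ords = newOrds;
    }
}
```

In this sketch the ord is dense (0, 1, 2, ...) while bytesStart[ord] is the address-like value into the pool, which mirrors the distinction drawn in the naming discussion.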
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913589#action_12913589 ] Jason Rutherglen commented on LUCENE-2662: The current hash implementation needs to be separated out of TermsHashPerField. BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913638#action_12913638 ] Jason Rutherglen commented on LUCENE-2662: Simon, when do you think you'll be posting? BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913642#action_12913642 ] Simon Willnauer commented on LUCENE-2662:

bq. Simon, when do you think you'll be posting?

Maybe within the next week; I have a busy schedule. Does this patch keep you from doing any work? You shouldn't just pull stuff out of month-old patches, especially when you don't even give me time to reply on the original discussion. Any rush on this?

BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186
[jira] Commented: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913651#action_12913651 ] Jason Rutherglen commented on LUCENE-2662: It'd be nice to get deletes working, ie, LUCENE-2655 and move forward in a way that's useful long term. What changes have you made? BytesHash - Key: LUCENE-2662 URL: https://issues.apache.org/jira/browse/LUCENE-2662 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2662.patch This issue will have the BytesHash separated out from LUCENE-2186