Re: Do you need help on Lucy?
On Sat, Jun 19, 2010 at 07:00:56AM +0200, VOITPTRPTR wrote:

> How can one contribute (I'm a C developer) to Lucy?

We have a wiki page up that covers the mechanics of contributing:

http://wiki.apache.org/lucy/HowToContribute

Most people choose what they want to work on based on what they need or what interests them. Since you don't mention wanting to work on a particular problem, there are some general C tasks we could use help with that don't require much prior knowledge of the code base; I'll describe one of those.

A lot of Lucy code was originally written for C89. We have since changed our C dialect to the overlap of C99 and C++, which allows us to use a number of idioms that result in cleaner, more readable code. One of these is the declaration of loop variables within a "for" construct:

Index: core/Lucy/Object/VArray.c
===================================================================
--- core/Lucy/Object/VArray.c    (revision 956160)
+++ core/Lucy/Object/VArray.c    (working copy)
@@ -55,8 +55,7 @@
 VA_dump(VArray *self) {
     VArray *dump = VA_new(self->size);
-    uint32_t i, max;
-    for (i = 0, max = self->size; i < max; i++) {
+    for (uint32_t i = 0, max = self->size; i < max; i++) {
         Obj *elem = VA_Fetch(self, i);
         if (elem) { VA_Store(dump, i, Obj_Dump(elem)); }
     }

A good place to start would be that file, VArray.c.

Thanks for inquiring,

Marvin Humphrey
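For readers unfamiliar with the idiom, here is a minimal standalone sketch contrasting the two styles (invented array and function names, not Lucy code):

```c
#include <stdint.h>

/* Illustrative only -- not Lucy code. Sums an array using the C99 idiom
 * of declaring the loop counter inside the "for" statement, rather than
 * C89-style at the top of the enclosing block. */
static uint32_t sum_u32(const uint32_t *vals, uint32_t size) {
    uint32_t sum = 0;
    /* C89 would require:  uint32_t i, max;  for (i = 0, max = size; ... */
    for (uint32_t i = 0, max = size; i < max; i++) {
        sum += vals[i];
    }
    return sum;
}
```

Besides reading more cleanly, scoping the counter to the loop prevents accidental reuse of `i` later in the function.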
[Lucy Wiki] Update of HowToContribute by MarvinHumphr ey
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Lucy Wiki for change notification.

The HowToContribute page has been changed by MarvinHumphrey. The comment on this change is: Remove mention of an obsolete C89 requirement.

http://wiki.apache.org/lucy/HowToContribute?action=diff&rev1=2&rev2=3

--------------------------------------------------

Modify the source code using your favorite text editor or IDE. Please take the following points into account:

- * All code will eventually need to be portable to multiple operating systems and compilers. This is a complex requirement and it should not block your contribution, but the most helpful thing you can do up front is declare C variables at the top of each block, C89-style.
+ * All code will eventually need to be portable to multiple operating systems and compilers. (This is a complex requirement and it should not block your contribution.)
  * All public APIs should be accompanied by informative documentation.
  * Code should be formatted according to the style guidelines at LucyStyleGuide.
  * Contributions should pass existing unit tests.
Re: Do you need help on Lucy?
Hi Marvin,

> We have a wiki page up that covers the mechanics of contributing:
> http://wiki.apache.org/lucy/HowToContribute

Excellent. I'll check out the code and start reading on Monday!

> Most people choose what they want to work on based on what they need or what interests them. Since you don't mention wanting to work on a particular problem, there are some general C tasks we could use help with that don't require much prior knowledge of the code base; I'll describe one of those.

To be honest, I have no knowledge of how the Lucene core is implemented. But I'm pretty confident when coding in C (portability, clarity, refactoring ...). Moreover, I have a strong background in algorithms and algorithm optimization.

> A lot of Lucy code was originally written for C89. We have since changed our C dialect to the overlap of C99 and C++, which allows us to use a number of idioms that result in cleaner, more readable code. One of these is the declaration of loop variables within a "for" construct:
> [...]
> A good place to start would be that file, VArray.c.
>
> Thanks for inquiring,

OK, I see. Is it possible to ask questions about the design choices of Lucy (how indexes are built, the algorithms behind the scenes ...) on this mailing list, as I lack these Information Retrieval skills?

Regards
--
voidptr...@gmail.com
[jira] Updated: (LUCY-114) compile failure on OS X 10.6
[ https://issues.apache.org/jira/browse/LUCY-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Karman updated LUCY-114:
------------------------------
    Attachment: align_signature.patch

The inline patch came out garbled. The same patch is attached.

> compile failure on OS X 10.6
> ----------------------------
>
>                 Key: LUCY-114
>                 URL: https://issues.apache.org/jira/browse/LUCY-114
>             Project: Lucy
>          Issue Type: Bug
>          Components: Core - Store
>         Environment: Mac OS X 10.6
>            Reporter: Peter Karman
>         Attachments: align_signature.patch
>
> I get this error when trying to compile under OS X:
>
> ../core/Lucy/Store/OutStream.c:125: error: conflicting types for 'lucy_OutStream_align'
> autogen/Lucy/Store/OutStream.h:55: error: previous declaration of 'lucy_OutStream_align' was here
>
> patch below:
>
> Index: core/Lucy/Store/OutStream.bp
> ===================================================================
> --- core/Lucy/Store/OutStream.bp (revision 925442)
> +++ core/Lucy/Store/OutStream.bp (revision 925443)
> @@ -42,7 +42,7 @@
>   *
>   * @return the new file position.
>   */
> -final i64_t
> +final int64_t
>  Align(OutStream *self, int64_t modulus);
>
>  /** Flush output buffer to target FileHandle.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCY-115) nullable attribute not propagated
nullable attribute not propagated - Key: LUCY-115 URL: https://issues.apache.org/jira/browse/LUCY-115 Project: Lucy Issue Type: Bug Components: Clownfish Environment: OS X 10.6 Reporter: Peter Karman In Clownfish files (.bp) such as KinoSearch/Search/Compiler.bp, certain methods are defined as nullable but that nullable attribute is not being propagated to the _OVERRIDE generated code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
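For context on what is being lost here, a plain-C sketch of what a "nullable" return contract means for callers (invented names, not actual Clownfish syntax or generated code):

```c
#include <stddef.h>

/* Invented illustration, not Clownfish output: a method marked nullable
 * may legitimately return NULL, so every caller -- including any
 * generated _OVERRIDE wrapper -- must preserve and check for that
 * possibility instead of treating NULL as an error. */
typedef struct Matcher { int doc_id; } Matcher;

/* "Nullable" return: NULL means the query matches no documents. */
static Matcher* make_matcher(Matcher *candidate, int has_match) {
    return has_match ? candidate : NULL;
}
```

If the generated override wrapper drops the nullable attribute, it may assert or crash on a NULL that the contract explicitly allows.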
[jira] Created: (LUCY-116) Build.PL opts not supported
Build.PL opts not supported --- Key: LUCY-116 URL: https://issues.apache.org/jira/browse/LUCY-116 Project: Lucy Issue Type: Bug Components: Perl bindings Environment: OS X 10.6 Reporter: Peter Karman The Build.PL docs claim that --config cc= should work but it does not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCY-116) Build.PL opts not supported
[ https://issues.apache.org/jira/browse/LUCY-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Karman updated LUCY-116: -- Attachment: get_cc.patch pass_cc.patch The attached patches implement the documented --config cc feature. Build.PL opts not supported --- Key: LUCY-116 URL: https://issues.apache.org/jira/browse/LUCY-116 Project: Lucy Issue Type: Bug Components: Perl bindings Environment: OS X 10.6 Reporter: Peter Karman Attachments: get_cc.patch, pass_cc.patch The Build.PL docs claim that --config cc= should work but it does not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
Hi Koji,

>> - FieldCacheImpl.getStringIndex() no longer throws an exception when term count exceeds doc count.
>
> I think it is LUCENE-2142, but after it was fixed, getStringIndex() still throws AIOOBE? Am I missing something?

I have seen that you wrote a comment on 2142 on June 7; we overlooked this. You should have reopened it and stopped the release vote :(

Uwe

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count
[ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2142:
----------------------------------
    Attachment: LUCENE-2142-fix.patch

After a coffee I saw the problem, too - stupid :( Here is the fix for 3.x (also 3.0 and 2.9) - in trunk the fix is not needed, as there are growable arrays. Maybe we should add a simple test to all branches!

> FieldCache.getStringIndex should not throw exception if term count exceeds doc count
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2142
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>         Attachments: LUCENE-2142-fix.patch
>
> Spinoff of LUCENE-2133/LUCENE-831.
> Currently FieldCache cannot handle more than one value per field. We may someday want to fix that... but until that day:
> FieldCache.getStringIndex currently does a simplistic check to try to catch when you've accidentally allowed more than one term per field, by testing if the number of unique terms exceeds the number of documents.
> The problem is, this is not a perfect check, in that it allows false negatives (you could have more than one term per field for some docs and the check won't catch you).
> Further, the exception thrown is the unchecked RuntimeException. So this means... you could happily think all is good, until some day, well into production, once you've updated enough docs, suddenly the check will catch you and throw an unhandled exception, stopping all searches [that need to sort by this string field] in their tracks. It's not gracefully degrading.
> I think we should simply remove the test, ie, if you have more terms than docs then the terms simply overwrite one another.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
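To illustrate why the trunk code doesn't need the fix ("there are growable arrays"): an array pre-sized to the document count can be indexed past its end when term count exceeds doc count, while a growable array simply expands. A sketch in C with invented names (the real fix is in the Java FieldCache code):

```c
#include <stdlib.h>

/* Invented C sketch; the real fix lives in the Java FieldCache code.
 * Capacity doubles on demand, so an unexpected number of appended
 * terms can never index past the end of the backing array. */
typedef struct { int *items; size_t size; size_t cap; } GrowableInts;

static int grow_push(GrowableInts *a, int v) {
    if (a->size == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 4;   /* grow, don't throw */
        int *grown = realloc(a->items, new_cap * sizeof(int));
        if (!grown) { return -1; }                  /* allocation failure */
        a->items = grown;
        a->cap = new_cap;
    }
    a->items[a->size++] = v;
    return 0;
}
```

A fixed-size buffer plus a "terms must not exceed docs" sanity check is exactly the combination that produced the AIOOBE in the 3.x/2.9 code paths.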
Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
(10/06/19 15:36), Uwe Schindler wrote:
> Hi Koji,
>
>>> - FieldCacheImpl.getStringIndex() no longer throws an exception when term count exceeds doc count.
>>
>> I think it is LUCENE-2142, but after it was fixed, getStringIndex() still throws AIOOBE? Am I missing something?
>
> I have seen that you wrote a comment on 2142 on June 7; we overlooked this. You should have reopened it and stopped the release vote :(
>
> Uwe

Yeah. I should have done that, but while the vote was going on, I simply forgot the issue. Then I read your release announcement, and it reminded me of the issue. I'm sorry about that...

Koji
--
http://www.rondhuit.com/en/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count
[ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2142:
----------------------------------
    Attachment:     (was: LUCENE-2142-fix.patch)

> FieldCache.getStringIndex should not throw exception if term count exceeds doc count
>         Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch
> [...]

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count
[ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2142:
----------------------------------
    Attachment: LUCENE-2142-fix-3x.patch
                LUCENE-2142-fix-trunk.patch

Here is a patch with a test for 3.x and earlier. The trunk patch only contains the test, which passes.

> FieldCache.getStringIndex should not throw exception if term count exceeds doc count
> [...]

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
Mike, Koji:

The release is out, but should I maybe simply remove the announcement line (simply strike it out) on the lucene.apache.org pages, so nobody expects this to be fixed really?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Saturday, June 19, 2010 10:19 AM
> To: dev@lucene.apache.org
> Subject: RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
>
> No problem, I fixed it now, see patches. For trunk, this was not an issue, but for 3x, 3.0 and 2.9.
> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
OK, I think just removing the text claiming this is fixed is good?

Mike

On Sat, Jun 19, 2010 at 5:29 AM, Uwe Schindler u...@thetaphi.de wrote:
> Mike, Koji:
>
> The release is out, but should I maybe simply remove the announcement line (simply strike it out) on the lucene.apache.org pages, so nobody expects this to be fixed really?
> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
It will not disappear in CHANGES.txt, but at least it should not be so prominent.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, June 19, 2010 11:38 AM
> To: dev@lucene.apache.org
> Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
>
> OK, I think just removing the text claiming this is fixed is good?
>
> Mike
> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2380:
---------------------------------
    Attachment: LUCENE-2380_direct_arr_access.patch

This patch adds the ability to get at the raw arrays from the Direct* classes, and using those fixes the performance regressions in the fc faceting I was seeing. To do this, it adds the following to DocTermsIndex. Anyone have a better solution?

{code}
/** @lucene.internal */
public abstract PackedInts.Reader getDocToOrd();
{code}

> Add FieldCache.getTermBytes, to load term data as byte[]
> --------------------------------------------------------
>
>                 Key: LUCENE-2380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>         Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
> With flex, a term is now an opaque byte[] (typically a UTF-8 encoded unicode string, but not necessarily), so we need to push this up the search stack. FieldCache now has getStrings and getStringIndex; we need corresponding methods to load terms as native byte[], since in general they may not be representable as String. This should be quite a bit more RAM efficient too, for US-ASCII content, since each character would then use 1 byte not 2.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880483#action_12880483 ]

Yonik Seeley commented on LUCENE-2380:
--------------------------------------

It was really tricky performance testing this. If I started Solr and tested one type of faceting exclusively, the performance impact of going through the new FieldCache interfaces (PackedInts for ord lookup) was relatively minimal. However, I had a simple script that tested the different variants (the 4 in the table above)... and using that resulted in the bigger slowdowns. The script would do the following:

{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were repeatable.

Testing #1 alone resulted in trunk slowing down ~4%.
Testing #1 along with any single other test: same small slowdown of ~4%.
Running the complete script: slowdown of 33-38% for #1 (as well as others).

When running the complete script, the first run of Test #1 was always the best... as if the JVM correctly specialized it, but then discarded it later, never to return. So: you can't always depend on the JVM being able to inline stuff for you, and it seems very hard to determine when it can. This obviously has implications for the Lucene benchmarker too.
> Add FieldCache.getTermBytes, to load term data as byte[]
> [...]

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
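The same effect exists outside the JVM, which may help make the observation concrete: a C compiler can usually inline a direct call, but not a call through a function pointer whose target varies at runtime. A rough, invented analogy (not benchmark code) for the call sites the script turns megamorphic:

```c
/* Invented analogy for the JVM behavior described above: once a call
 * site dispatches through a pointer that takes several different
 * targets over time, the compiler can no longer specialize (inline)
 * it, and every call pays the indirect-dispatch cost. */
static int by_ord(int doc)   { return doc * 2; }
static int by_value(int doc) { return doc + 1; }

typedef int (*LookupFn)(int);

static int lookup_sum(LookupFn fn, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) { sum += fn(i); }  /* indirect call */
    return sum;
}
```

Exercising `lookup_sum` with a single target leaves the compiler (or JIT) free to specialize the loop; feeding it several targets, as the four-variant script effectively does, forces generic dispatch.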
[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
[ https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880490#action_12880490 ]

Simon Rosenthal commented on SOLR-1911:
---------------------------------------

No - it seems to have cleared up with trunk also. I'm OK with closing it, but I'm really curious to know what changed between mid May and today to clear up the problem.

> File descriptor leak while indexing, may cause index corruption
> ---------------------------------------------------------------
>
>                 Key: SOLR-1911
>                 URL: https://issues.apache.org/jira/browse/SOLR-1911
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 1.5
>         Environment: Ubuntu Linux, Java build 1.6.0_16-b01
>                      Solr Specification Version: 3.0.0.2010.05.12.16.17.46
>                      Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk
>                      Lucene Specification Version: 4.0-dev
>                      Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
>                      Current Time: Thu May 13 12:21:12 EDT 2010
>                      Server Start Time: Thu May 13 11:45:41 EDT 2010
>            Reporter: Simon Rosenthal
>            Priority: Critical
>         Attachments: indexlsof.tar.gz, openafteropt.txt
>
> While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the per-process open-file limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted, the index may be corrupt.
> Commits are handled by autocommit every 60 seconds or 500 documents (usually the time limit is reached first). mergeFactor is 10.
> It looks as though each time a commit takes place, the number of open files (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 40. There are several open file descriptors associated with each file in the index.
> Rerunning the same index updates with an older Solr (built from trunk in Feb 2010) doesn't show this problem - the number of open files fluctuates up and down as segments are created and merged, but stays basically constant.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
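When chasing a descriptor leak like this, it can help to count open descriptors from inside the process as well as with lsof. A small Linux-only sketch (assumes /proc is available; the helper name is invented):

```c
#include <dirent.h>
#include <stdio.h>

/* Linux-only sketch: count this process's open file descriptors by
 * listing /proc/self/fd -- the same information "lsof -p <pid>" reads.
 * Useful for asserting in tests that an operation leaks no fds. */
static int count_open_fds(void) {
    DIR *d = opendir("/proc/self/fd");
    if (!d) { return -1; }
    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (ent->d_name[0] != '.') { n++; }  /* skip "." and ".." */
    }
    closedir(d);  /* note: the DIR itself held one fd while counting */
    return n;
}
```

Sampling this before and after each commit would show the ~40-descriptor jumps described above without restarting the server.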
[jira] Updated: (SOLR-1965) Solr 4.0 performance improvements
[ https://issues.apache.org/jira/browse/SOLR-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-1965: --- Attachment: SOLR-1965.patch Here's the patch for facet.method=fc (single valued) that uses the latest patch in LUCENE-2378 to fix the performance regression. Solr 4.0 performance improvements - Key: SOLR-1965 URL: https://issues.apache.org/jira/browse/SOLR-1965 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-1965.patch Catch-all performance improvement issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2504) sorting performance regression
sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 sorting can be much slower on trunk than branch_3x -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880541#action_12880541 ]

Yonik Seeley commented on LUCENE-2504:
--------------------------------------

More numbers: Ubuntu, Java 1.7.0-ea-b98 (64 bit):

f10_s sort only: 126 ms
sort against random field: 175 ms

> sorting performance regression
> ------------------------------
>
>                 Key: LUCENE-2504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2504
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
> sorting can be much slower on trunk than branch_3x

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880542#action_12880542 ]

Yonik Seeley commented on LUCENE-2504:
--------------------------------------

More numbers: Windows 7:

java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

f10_s sort only: 115 ms
sort against random field: 162 ms

> sorting performance regression
> ------------------------------
>
>                 Key: LUCENE-2504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2504
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
> sorting can be much slower on trunk than branch_3x

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Doppleganger threads after ingestion completed
Chewing up CPU, or blocked? The stack trace says it's blocked. The sockets are abandoned by the program, yes, but TCP/IP itself has a complex sequence for shutting down sockets that takes a few minutes. If these sockets stay around for hours, then there's a real problem. (In fact, there is a bug in the TCP/IP specification, 40 years old, that causes zombie sockets that never shut down.) The HTTP Solr server really needs a socket close() method.

On Thu, Jun 17, 2010 at 6:08 AM, karl.wri...@nokia.com wrote:
> Folks,
> I ran 20,000,000 records into Solr via the extractingUpdateRequestHandler under Jetty. The previous problems with resources have apparently been resolved by using HTTP/1.1 with keep-alive, rather than creating and destroying 20,000,000 sockets. ;-)
> However, after the client terminates, I still find the Solr process chewing away CPU - indeed, there were 5 threads doing this. A thread dump yields the following partial trace for all 5 threads:
>
> "btpool0-13" prio=10 tid=0x41391000 nid=0xe7c runnable [0x7f4a8c789000]
>    java.lang.Thread.State: RUNNABLE
>         at org.mortbay.jetty.HttpParser$Input.blockForContent(HttpParser.java:925)
>         at org.mortbay.jetty.HttpParser$Input.read(HttpParser.java:897)
>         at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
>         at org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:924)
>         at org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:904)
>         at org.apache.commons.fileupload.util.Streams.copy(Streams.java:119)
>         at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
>         at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
>         at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
>         at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
>         at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
>         at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         ...
>
> I could be wrong, but it looks to me like either Jetty or fileupload may have a problem here. I have not looked at the Jetty source code, but infinitely spinning threads even after the socket has been abandoned do not seem reasonable to me.
> Thoughts?
> Karl

--
Lance Norskog
goks...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
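On the client side, the explicit close Lance suggests amounts to tearing the connection down deliberately rather than abandoning it. A minimal POSIX sketch (invented helper name; not actual Solr or Jetty code):

```c
#include <sys/socket.h>
#include <unistd.h>

/* POSIX sketch, not actual Solr client code: when a program is done
 * with a connection it should initiate the TCP teardown and release
 * the descriptor itself, instead of leaving the socket for the OS
 * (or a finalizer) to reap after the protocol's own timeouts. */
static int close_client_socket(int fd) {
    shutdown(fd, SHUT_RDWR);  /* send FIN; stop both directions */
    return close(fd);         /* free the file descriptor */
}
```

The TCP state machine (FIN/ACK exchange, TIME_WAIT) still runs for a while after close() returns, which is the "complex sequence... that takes a few minutes" mentioned above; what close() guarantees is that the process itself stops holding the descriptor.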