RE: [Lucene.Net] [VOTE] Apache-Lucene-2.9.4g-incubating-RC1 Release
That is the tag - no rush on review; I'm still out of town myself. Sent from my Windows Phone From: Stefan Bodewig Sent: 1/1/2012 10:42 PM To: lucene-net-...@incubator.apache.org Subject: Re: [Lucene.Net] [VOTE] Apache-Lucene-2.9.4g-incubating-RC1 Release On 2011-12-30, Prescott Nasser wrote: Hey All, The artifacts are ready to roll, they can be found here: http://people.apache.org/~pnasser/Lucene.Net/2.9.4g-incubating-RC1/ Unfortunately I probably won't be able to review them for another 24 hours. Is http://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_4g-RC1/ the corresponding tag? Stefan
Re: [Lucene.Net] [VOTE] Apache-Lucene-2.9.4g-incubating-RC1 Release
On 2012-01-02, Stefan Bodewig wrote: On 2011-12-30, Prescott Nasser wrote: Hey All, The artifacts are ready to roll, they can be found here: http://people.apache.org/~pnasser/Lucene.Net/2.9.4g-incubating-RC1/ Signatures and checksums are good. NOTICE, LICENSE and ACKNOWLEDGEMENTS match my current understanding. RAT is reasonably happy with the binary release. The source release lacks license headers for all Solution files as well as quite a few C# source files in the CJK and Chinese Analyzers. I'll open a JIRA ticket for this. Too many missing licenses for a +1, sorry. Is http://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_4g-RC1/ the corresponding tag? The tag contains additional bin and doc dirs as it used to, but it now also has lib and build/vs2008 dirs that are not in the source distribution. I assume lib is in the same area as bin (stuff we need to build but don't want to distribute). vs2008 is empty; is this the reason it is not part of the distribution? Stefan
[Lucene.Net] [jira] [Updated] (LUCENENET-460) Missing License Headers in 2.9.4g branch
[ https://issues.apache.org/jira/browse/LUCENENET-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Bodewig updated LUCENENET-460: - Attachment: 2.9.4g.headers.patch Missing License Headers in 2.9.4g branch Key: LUCENENET-460 URL: https://issues.apache.org/jira/browse/LUCENENET-460 Project: Lucene.Net Issue Type: Bug Components: ASF Process Reporter: Stefan Bodewig Fix For: Lucene.Net 2.9.4g Attachments: 2.9.4g.headers.patch The patch I'm going to attach doesn't cover bin, lib or doc, as we don't distribute any of them. The patch looks bigger than necessary because I ran RAT on Linux (sorry, I don't have access to a Windows box right now) and many of the affected files don't have the svn:eol-style property set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
RE: [Lucene.Net] [VOTE] Apache-Lucene-2.9.4g-incubating-RC1 Release
I'll apply this patch shortly and re-cut - thanks for looking it over. vs2008 is not there because it's an empty folder, and spot on for lib - we aren't distributing it. ~P From: bode...@apache.org To: lucene-net-...@incubator.apache.org Date: Tue, 3 Jan 2012 07:14:53 +0100 Subject: Re: [Lucene.Net] [VOTE] Apache-Lucene-2.9.4g-incubating-RC1 Release On 2012-01-02, Stefan Bodewig wrote: On 2011-12-30, Prescott Nasser wrote: Hey All, The artifacts are ready to roll, they can be found here: http://people.apache.org/~pnasser/Lucene.Net/2.9.4g-incubating-RC1/ Signatures and checksums are good. NOTICE, LICENSE and ACKNOWLEDGEMENTS match my current understanding. RAT is reasonably happy with the binary release. The source release lacks license headers for all Solution files as well as quite a few C# source files in the CJK and Chinese Analyzers. I'll open a JIRA ticket for this. Too many missing licenses for a +1, sorry. Is http://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_4g-RC1/ the corresponding tag? The tag contains additional bin and doc dirs as it used to, but it now also has lib and build/vs2008 dirs that are not in the source distribution. I assume lib is in the same area as bin (stuff we need to build but don't want to distribute). vs2008 is empty; is this the reason it is not part of the distribution? Stefan
[jira] [Commented] (SOLR-2993) Integrate WordBreakSpellChecker with Solr
[ https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178397#comment-13178397 ] Okke Klein commented on SOLR-2993: -- I'm having some trouble combining this patch with your other patch in https://issues.apache.org/jira/browse/SOLR-2585. Could you make a patch with both features if possible? Integrate WordBreakSpellChecker with Solr - Key: SOLR-2993 URL: https://issues.apache.org/jira/browse/SOLR-2993 Project: Solr Issue Type: Improvement Components: SolrCloud, spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2993.patch A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from LUCENE-3523: - Detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries. - Seamlessly integrate word-break suggestions with single-word spelling corrections from the existing FileBased-, IndexBased- or Direct- spell checkers. - Provide collation support for word-break errors including cases where the user has a mix of single-word spelling errors and word-break errors in the same query. - Provide shard support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS] Lucene-trunk - Build # 1786 - Failure
I agree, it always happens with tests.nightly=true and multiplicator > 1. I retriggered a nightly Jenkins build, same error message. The failure mail should arrive shortly :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ eMail: u...@thetaphi.de From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen Sent: Sunday, January 01, 2012 12:50 PM To: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure I think this is just a broken test case. If this test is run with: -Dtests.seed=-289aae8d40093437:-2c1c9ffc76ccb3bd:71f64018e9abbebb -Dtests.multiplier=3 -Dtests.nightly=true then 1300 documents with the term "aaa" are indexed. During searching, the maximum number of documents to retrieve is hard-coded to 1000. In that case the assertion on line 460 fails. Replacing: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, 1000).scoreDocs; with: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, n*100).scoreDocs; will fix this failure. Martijn On 1 January 2012 05:10, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-trunk/1786/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testDiverseDocs Error Message: expected:<1300> but was:<1000> Stack Trace: junit.framework.AssertionFailedError: expected:<1300> but was:<1000> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.index.TestIndexWriter.testDiverseDocs(TestIndexWriter.java:459) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528) Build Log (for compile errors): [...truncated 12691 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3665) Make WeightedSpanTermExtractor extensible to handle custom query implementations
[ https://issues.apache.org/jira/browse/LUCENE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3665. - Resolution: Fixed Assignee: Simon Willnauer Make WeightedSpanTermExtractor extensible to handle custom query implementations --- Key: LUCENE-3665 URL: https://issues.apache.org/jira/browse/LUCENE-3665 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3665.patch Currently if I have a custom query which subclasses Query directly, I can't use the QueryScorer for highlighting since it does explicit instanceof checks. In some cases it is possible to rewrite the query before passing it to the highlighter to obtain a primitive query. However, I had a use case where this was not possible, i.e. the original index was not available on the machine which highlights the results. To still use the highlighter I had to copy a bunch of code due to visibility issues in those classes. I think we can make this extensible with minor effort to allow this use case without massive code duplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS] Lucene-trunk - Build # 1786 - Failure
Hi again, In my opinion, the test should simply test TopDocs.totalHits instead of the array size. There are more tests doing this. Maybe we should change all those tests that only count hits (and are not interested in the ScoreDocs at all) to use TotalHitCountCollector instead of TopDocs? I would prefer the latter; I could open an issue. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ eMail: u...@thetaphi.de From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen Sent: Sunday, January 01, 2012 12:50 PM To: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure I think this is just a broken test case. If this test is run with: -Dtests.seed=-289aae8d40093437:-2c1c9ffc76ccb3bd:71f64018e9abbebb -Dtests.multiplier=3 -Dtests.nightly=true then 1300 documents with the term "aaa" are indexed. During searching, the maximum number of documents to retrieve is hard-coded to 1000. In that case the assertion on line 460 fails. Replacing: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, 1000).scoreDocs; with: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, n*100).scoreDocs; will fix this failure. Martijn On 1 January 2012 05:10, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-trunk/1786/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testDiverseDocs Error Message: expected:<1300> but was:<1000> Stack Trace: junit.framework.AssertionFailedError: expected:<1300> but was:<1000> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.index.TestIndexWriter.testDiverseDocs(TestIndexWriter.java:459) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528) Build Log (for compile errors): [...truncated 12691 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
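For reference, a minimal sketch of what Uwe's suggestion looks like in practice (the searcher, field name and expected count are assumed from the failing test above; TotalHitCountCollector and IndexSearcher.search(Query, Collector) are existing Lucene APIs):

  // count every matching document without collecting or scoring a top-N window
  TotalHitCountCollector collector = new TotalHitCountCollector();
  searcher.search(new TermQuery(new Term("field", "aaa")), collector);
  assertEquals(expectedDocs, collector.getTotalHits()); // independent of any top-N cutoff

Because no ScoreDoc array is ever sized, the hard-coded 1000 limit simply disappears from the test.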
Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure
+1 to using TotalHitCountCollector in tests that just need the hit count, hmm though that might lower test coverage/exercise of the normal collectors... But in the meantime we should commit the quick fix so the test stops failing? What changed, though? Like why suddenly is this test failing so frequently...? Mike McCandless http://blog.mikemccandless.com On Mon, Jan 2, 2012 at 9:56 AM, Uwe Schindler u...@thetaphi.de wrote: Hi again, In my opinion, the test should simply test TopDocs.totalHits instead of the array size. There are more tests doing this. Maybe we should change all those tests that only count hits (and are not interested in the ScoreDocs at all) to use TotalHitCountCollector instead of TopDocs? I would prefer the latter; I could open an issue. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen Sent: Sunday, January 01, 2012 12:50 PM To: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure I think this is just a broken test case. If this test is run with: -Dtests.seed=-289aae8d40093437:-2c1c9ffc76ccb3bd:71f64018e9abbebb -Dtests.multiplier=3 -Dtests.nightly=true then 1300 documents with the term "aaa" are indexed. During searching, the maximum number of documents to retrieve is hard-coded to 1000. In that case the assertion on line 460 fails. Replacing: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, 1000).scoreDocs; with: ScoreDoc[] hits = searcher.search(new TermQuery(new Term("field", "aaa")), null, n*100).scoreDocs; will fix this failure. Martijn On 1 January 2012 05:10, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-trunk/1786/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testDiverseDocs Error Message: expected:<1300> but was:<1000> Stack Trace: junit.framework.AssertionFailedError: expected:<1300> but was:<1000> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.index.TestIndexWriter.testDiverseDocs(TestIndexWriter.java:459) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528) Build Log (for compile errors): [...truncated 12691 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Solr plugin component resource cleanup?
This works fine for a SearchComponent, but if I try this for a QParserPlugin I get the following: [junit] org.apache.solr.common.SolrException: Invalid 'Aware' object: org.apache.solr.mcf.ManifoldCFQParserPlugin@18941f7 -- org.apache.solr.util.plugin.SolrCoreAware must be an instance of: [org.apache.solr.request.SolrRequestHandler] [org.apache.solr.response.QueryResponseWriter] [org.apache.solr.handler.component.SearchComponent] [org.apache.solr.update.processor.UpdateRequestProcessorFactory] [org.apache.solr.handler.component.ShardHandlerFactory] Any further suggestions? Karl From: ext Chris Hostetter [hossman_luc...@fucit.org] Sent: Tuesday, December 27, 2011 7:19 PM To: dev@lucene.apache.org Subject: Re: Solr plugin component resource cleanup? take a look at the CloseHook API and SolrCore.addCloseHook(...) : Is there a preferred time/manner for a Solr component (e.g. a : SearchComponent) to clean up resources that have been allocated during : the time of its existence, other than via a finalizer? There seems to : be nothing for this in the NamedListInitializedPlugin interface, and yet : if you allocate a resource the test framework warns you about it. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
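For readers following along, a rough sketch of Hoss's CloseHook suggestion for the SearchComponent case (the preClose/postClose split follows the trunk CloseHook API at the time, and ManagedResource is a hypothetical stand-in for whatever needs cleanup):

  public class MyComponent extends SearchComponent implements SolrCoreAware {
    private ManagedResource resource; // hypothetical resource needing cleanup

    @Override
    public void inform(SolrCore core) {
      resource = new ManagedResource();
      core.addCloseHook(new CloseHook() {
        @Override
        public void preClose(SolrCore c) {
          resource.release(); // free the resource before the core shuts down
        }
        @Override
        public void postClose(SolrCore c) { /* nothing to do after close */ }
      });
    }
    // ... plus the usual prepare()/process()/getDescription() methods ...
  }

As the exception above shows, only the listed plugin types may be SolrCoreAware, which is exactly why the same trick fails for a QParserPlugin.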
Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure
On Mon, Jan 2, 2012 at 10:39 AM, Michael McCandless luc...@mikemccandless.com wrote: What changed, though? Like why suddenly is this test failing so frequently...? I broke the test... -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure
I committed fixes. You can see the bug in the patch on LUCENE-3667. Instead of using a hardcoded 300 each time, I changed the test to use atLeast... so it uses 100 or so locally but gets bigger with multipliers and nightly. But in our nightly tests this caused it to exceed 1000 sometimes, which would fail because it used scoreDocs.length instead of totalHits. On Mon, Jan 2, 2012 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Mon, Jan 2, 2012 at 10:39 AM, Michael McCandless luc...@mikemccandless.com wrote: What changed, though? Like why suddenly is this test failing so frequently...? I broke the test... -- lucidimagination.com -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
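Not the literal committed diff, but the shape of the fix Robert describes is roughly:

  int num = atLeast(100); // ~100 locally, scaled up by tests.multiplier / tests.nightly
  // ... index num documents containing the term "aaa" ...
  TopDocs td = searcher.search(new TermQuery(new Term("field", "aaa")), null, 1000);
  assertEquals(num, td.totalHits); // totalHits counts all matches, not just the returned top 1000

Unlike scoreDocs.length, totalHits cannot be capped by the requested window size, so the assertion stays valid however large atLeast() scales the document count.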
[jira] [Resolved] (SOLR-2992) add prepareCommit
[ https://issues.apache.org/jira/browse/SOLR-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2992. Resolution: Fixed Fix Version/s: 4.0 add prepareCommit - Key: SOLR-2992 URL: https://issues.apache.org/jira/browse/SOLR-2992 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Priority: Minor Fix For: 4.0 Attachments: SOLR-2992.patch Expose Lucene's prepareCommit to Solr update handlers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk - Build # 1786 - Failure
Ahh ok good that it's now explained... Mike McCandless http://blog.mikemccandless.com On Mon, Jan 2, 2012 at 11:22 AM, Robert Muir rcm...@gmail.com wrote: I committed fixes. You can see the bug in the patch on LUCENE-3667. Instead of using a hardcoded 300 each time, I changed the test to use atLeast... so it uses 100 or so locally but gets bigger with multipliers and nightly. But in our nightly tests this caused it to exceed 1000 sometimes, which would fail because it used scoreDocs.length instead of totalHits. On Mon, Jan 2, 2012 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Mon, Jan 2, 2012 at 10:39 AM, Michael McCandless luc...@mikemccandless.com wrote: What changed, though? Like why suddenly is this test failing so frequently...? I broke the test... -- lucidimagination.com -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11968 - Still Failing
I committed a fix... Mike McCandless http://blog.mikemccandless.com On Fri, Dec 30, 2011 at 5:05 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11968/ 1 tests failed. REGRESSION: org.apache.lucene.store.TestNRTCachingDirectory.testNRTAndCommit Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:571) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:599) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:543) Build Log (for compile errors): [...truncated 8254 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1788 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1788/ 1 tests failed. FAILED: org.apache.lucene.index.TestIndexWriter.testDiverseDocs Error Message: expected:<1200> but was:<1000> Stack Trace: junit.framework.AssertionFailedError: expected:<1200> but was:<1000> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.index.TestIndexWriter.testDiverseDocs(TestIndexWriter.java:459) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528) Build Log (for compile errors): [...truncated 12734 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178463#comment-13178463 ] Antony Stubbs commented on SOLR-1268: - Koji, with multi-term fields, Highlighter would return the single value that matched. FVH, however, merges values in the fragment returned. Is there a way to get the same behavior as Highlighter in this respect (in my use case, I only want the value that matched to be highlighted)? Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3calpine.deb.1.10.1005251052040.24...@radix.cryptio.net%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3667) Consider changing how we set the number of threads to use to run tests.
[ https://issues.apache.org/jira/browse/LUCENE-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178464#comment-13178464 ] Robert Muir commented on LUCENE-3667: - Thanks for reporting back guys. I still don't like the timings hossman has (I think 19 minutes is crazy, would really love to know what's going on there), but just for comparison here are my machines: Linux (i7-2600k@3.4ghz, 8gb ram): Before: {noformat} BUILD SUCCESSFUL Total time: 7 minutes 2 seconds real 7m3.099s user 27m47.900s sys 0m54.639s {noformat} After: {noformat} BUILD SUCCESSFUL Total time: 4 minutes 51 seconds real 4m52.310s user 17m14.869s sys 0m29.682s {noformat} Windows (Core2Quad-Q9650@3.0ghz, 8gb ram) Before: {noformat} -Solr tests always timeout/fail- {noformat} After: {noformat} BUILD SUCCESSFUL Total time: 8 minutes 37 seconds real 8m39.302s user 0m0.000s sys 0m0.046s {noformat} Mac (Core i5@2.3ghz, 4gb ram) Before: {noformat} -Solr tests always timeout/fail- {noformat} After: {noformat} BUILD SUCCESSFUL Total time: 11 minutes 20 seconds real 11m20.428s user 28m0.921s sys 1m38.629s {noformat} Consider changing how we set the number of threads to use to run tests. --- Key: LUCENE-3667 URL: https://issues.apache.org/jira/browse/LUCENE-3667 Project: Lucene - Java Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Attachments: LUCENE-3667.patch, LUCENE-3667.patch, LUCENE-3667.patch, LUCENE-3667.patch, LUCENE-3667.patch The current way we set the number of threads to use is not expressive enough for some systems. My quad core with hyper threading is recognized as 8 CPUs - since I can only override the number of threads to use per core, 8 is as low as I can go. 8 threads can be problematic for me - just the amount of RAM used sometimes can toss me into heavy paging because I only have 8 GB of RAM - the heavy paging can cause my whole system to come to a crawl. Without hacking the build, I don't think I have a lot of workarounds. I'd like to propose that we switch from using threadsPerProcessor to threadCount. In some ways, it's not as nice, because it does not try to scale automatically per system. But that auto scaling is often not ideal (hyper threading, wanting to be able to do other work at the same time), so perhaps we just default to 1 or 2 threads and devs can override individually? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Closed] (SOLR-2995) Data Import on CentOS 5.7
First, please start new topics with a new thread; it helps people keep track of things. Well, a Fix Version of 1.4 probably just got in there when you opened the bug. But I don't understand the question about different versions. All software releases get version numbers, the current release is 3.5 so I really don't know how you were using 1.4.1 unless you purposely mixed and matched... Best Erick On Sun, Jan 1, 2012 at 2:33 PM, Mudi Ugbowanko (Closed) (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mudi Ugbowanko closed SOLR-2995. Resolution: Not A Problem Fix Version/s: 1.4.1 I was using data import handler v1.4.1 with solr 3.5.0. My next question is why the different versions? Thanks M. Data Import on CentOS 5.7 - Key: SOLR-2995 URL: https://issues.apache.org/jira/browse/SOLR-2995 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.5 Environment: CentOS release 5.7 (Final); Linux 2.6.18-274.12.1.el5.centos.plusxen x86_64 x86_64 x86_64 GNU/Linux Reporter: Mudi Ugbowanko Priority: Minor Labels: DIH, dataimportHandler Fix For: 1.4.1 Original Estimate: 24h Remaining Estimate: 24h I have configured Solr on a CentOS box and configured my solrconfig.xml to use the 'dataimporthandler' plugin. My solrconfig contains the following configuration: <lib dir="/path/to/solr/dist" regex="apache-solr-dataimporthandler-.*\.jar" /> ... <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config.xml</str> </lst> </requestHandler> and the 'data-config.xml' contains the correct db connections. However, when I access that plugin: http://localhost:8080/solr_app/dataimport (with or without a command), I get the following errors: Dec 30, 2011 6:46:03 PM org.apache.solr.common.SolrException log SEVERE: java.lang.AbstractMethodError: org.apache.solr.handler.RequestHandlerBase.handleRequestBody(Lorg/apache/solr/request/SolrQueryRequest;Lorg/apache/solr/response/SolrQueryResponse;)V at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:636) The really annoying part is these errors give no clear indication what is wrong. Mind you, I was able to set this up on my local desktop with no issues.
Running this on an online 'CentOS 5.7' box ... errors! I'm sure it is an easy fix ... but the exception/error thrown gives no clear indication what is going wrong. Thanks in advance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2358: -- Description: The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances. (was: The first steps towards creating distributed indexing functionality in Solr) Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2998) FastVectorHighlighter should be able to return single matched multivalue result, not concatenate surrounding values
FastVectorHighlighter should be able to return single matched multivalue result, not concatenate surrounding values --- Key: SOLR-2998 URL: https://issues.apache.org/jira/browse/SOLR-2998 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: Antony Stubbs The standard Highlighter (specifically LuceneGapFragmenter) would only return a single highlighted value when highlighting a multivalued field. I can't see how to get the same response from FVH; it seems to insist on concatenating all values of a multivalued field together (or at least the values surrounding highlight matches). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178536#comment-13178536 ] Peter Sturge commented on SOLR-1872: Hi, I've not yet tried it directly with 3.4, but as it's a subclass of SearchComponent, it should work fine. Since it is just a plugin, it's easy to add it in via your solrconfig.xml. Peter On Fri, Dec 30, 2011 at 10:03 AM, Arvind Das (Commented) (JIRA) Document-level Access Control in Solr - Key: SOLR-1872 URL: https://issues.apache.org/jira/browse/SOLR-1872 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Labels: access, control Attachments: SolrACLSecurity.java, SolrACLSecurity.java, SolrACLSecurity.rar This issue relates to providing document-level access control for Solr index data. A related JIRA issue is: SOLR-1834. I thought it would be best if I created a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work. There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access Control is one of those subjects that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, it does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance of requirements. The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that, at the moment, this version doesn't deal with /update, /replication etc.; it's currently a /select thing (but it could be used for these). This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml. acl.xml defines a number of users and groups (and one global for 'everyone'), and assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements. When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list. When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed; see the sketch after this issue text), and then added to the query request. Because the allow and deny elements are simply subsearch queries (e.g. {{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any stored data that can be queried, including already existing data. Authentication: One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses HTTP parameters for this. For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client.
If the request is wrong, the SearchComponent fails and will increment the user's failed login count (if a valid user was specified). If this count exceeds the configured lockoutThreshold, no further requests are granted until the lockoutTime has elapsed. This mechanism protects against some types of attacks (e.g. CRLF, dictionary etc.), but it really needs container HTTPS as well (as would most other auth implementations). Incorporating SSL certificates for authentication and making the authentication mechanism pluggable would be a nice improvement (i.e. separating authentication from access control). Another issue is how internal searchers perform autowarming etc. The solution here is to use a local key called 'SolrACLSecurityKey'. This key is local and [should be] unique to that server. firstSearcher, newSearcher et al then include this key in their parameters so they can perform autowarming without constraint. Again, there are likely many ways to achieve this; this approach is but one. The attached rar holds the source and associated configuration. This has been tested on the 1.4 release codebase (search in the attached
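To make the "allows are OR'ed and denies are NAND'ed" step concrete, a hedged sketch of how such a filter might be assembled in the component's prepare() (user.getAllowQueries()/getDenyQueries() are hypothetical accessors for the cached per-user lists described above; BooleanQuery, MatchAllDocsQuery and ResponseBuilder.getFilters()/setFilters() are real APIs):

  // in prepare(ResponseBuilder rb), after resolving the authenticated user:
  BooleanQuery acl = new BooleanQuery();
  if (user.getAllowQueries().isEmpty()) {
    acl.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST); // deny-only ACLs need a base set
  } else {
    for (Query allow : user.getAllowQueries()) {
      acl.add(allow, BooleanClause.Occur.SHOULD); // allows are OR'ed
    }
  }
  for (Query deny : user.getDenyQueries()) {
    acl.add(deny, BooleanClause.Occur.MUST_NOT); // denies are NAND'ed
  }
  List<Query> filters = rb.getFilters();
  if (filters == null) filters = new ArrayList<Query>();
  filters.add(acl);
  rb.setFilters(filters);

Attaching the ACL as a filter rather than rewriting the main query keeps scoring untouched and lets Solr's filter cache absorb the per-user cost.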
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1429 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1429/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325545810945/index/_a.tii Stack Trace: java.io.IOException: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325545810945/index/_a.tii at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) at org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:395) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:220) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:559) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 15167 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12016 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12016/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-132555796/index/_a.tii Stack Trace: java.io.IOException: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-132555796/index/_a.tii at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) at org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:395) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:268) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:559) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 15155 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Some interesting IR ideas
Hi Lucene geeks, We have a whole new year in front of us and we don't want to become bored, do we... so I thought I'd share some interesting ideas that I encountered over the past few months, while reading now and then a bunch of papers on IR. No code yet, sorry! Just wondering what it would be like if Lucene supported this or that functionality. Feel free to say nuts or useless or brilliant or anything in between. Or come up with your own ideas! Mainly the following concepts are about maintaining additional index data for improved performance or functionality. Experimentation in this area has now become practical in trunk with the completion of the Codec API, but there may still be some things missing in the API-s, for example the ability to discover, select and process sub-lists of postings, or customization of query evaluation algorithms, etc. Some of these ideas got implemented as a part of the original research - I'm sorry to say that nearly none of them used Lucene, usually it was either Zettair or Terrier. I'd blame pre-flex API-s for this, so hopefully the situation will improve in the coming years. So, here we go. 1. Block-Max indexes The idea is presented fully here: http://cis.poly.edu/suel/papers/bmw.pdf . Basically, it's about skipping parts of posting lists that are unlikely to contribute to the top-N documents. The parts of the lists are marked with, well, tombstones, that carry a value, which is the maximum score of a term query for a given range of the doc-ids (under some metric). For some types of queries it's possible to predict whether any possible matches in a given portion of the posting list will produce a candidate that fits in the top-N docids, based on the maximum value of a term score (or any other useful metric for that matter). You can read the gory details of query eval. in the paper. This is a part of a broader topic of dynamic pruning of query eval. and I have a dozen or so other references on this. In Lucene, we could handle such tombstones using a specialized codec. However, I think the query evaluation mechanism wouldn't be able to use this information to skip certain ranges of docs... or maybe it could be implemented as filters initialized from tombstone values? (A rough sketch of the skipping idea follows at the end of this message.) 2. Time-machine indexes === This is really a variant of the above, only the tombstones record timestamps (and of course the index is allowed to hold duplicates of documents). We can already do an approximation of this by limiting query evaluation only to the latest segments (if we can guarantee that segment creation / merging follows monotonically increasing timestamps). But using tombstones we could merge segments from different periods of time, as long as we guarantee that we don't split & shuffle blocks of postings that belong to the same timestamp. Query evaluation that concerns a time range would then be able to skip directly to the right tombstones based on timestamps (plus some additional filtering if tombstones are too coarse-grained). No idea how to implement this with the current API - maybe with filters, as above? Note that the current flex API always assumes that postings need to be fully decoded for evaluation, because the evaluation algorithms are codec-independent. Perhaps we could come up with an API that allows us to customize the evaluation algos based on codec impl? 3. Caching results as an in-memory inverted index = I can't find the paper right now ... perhaps it was by Torsten Suel, who did a lot of research on the topic of caching.
In Solr we use caches for caching docsets from past queries, and we can do some limited intersections for simple boolean queries. The idea here is really simple: since we already pull in results and doc fields (and we know, from re-written queries, what terms contribute to these results, so we could provide these too), we could use this information to create a memory-constrained inverted index that will answer not only simple boolean queries using intersections of bitsets, but possibly also other queries that require full query evaluation - and under some metric we could decide that results are either exact, good enough, or need to be evaluated against the full index. We could then periodically prune this index based on LFU, LRU or some such strategy. Hmm, part of this idea is here, I think: http://www2008.org/papers/pdf/p387-zhangA.pdf or here: http://www2005.org/cdrom/docs/p257.pdf BTW, there are dozens of papers on caching in search engines, for example this: http://www.hugo-zaragoza.net/academic/pdf/blanco_SIGIR2010b.pdf - here the author argues against throwing away all cached lists after an index update (which we do in Solr), arguing instead for keeping those lists that are likely to give identical results as before the update. 4. Phrase indexing == Again, I lost the reference to the paper that
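None of the machinery for idea (1) exists in Lucene, but the tombstone-driven skipping boils down to something like the following sketch (BlockTombstone, PostingsCursor and TopNCollector are all hypothetical stand-ins for whatever a codec and the query-evaluation API would expose):

  // per-block metadata a codec could store alongside the posting list
  class BlockTombstone {
    int lastDocID;  // last doc-id covered by this block of postings
    float maxScore; // upper bound on any term score inside the block
  }

  void scoreTerm(List<BlockTombstone> blocks, PostingsCursor postings, TopNCollector top) {
    for (BlockTombstone b : blocks) {
      if (b.maxScore <= top.minCompetitiveScore()) {
        postings.skipTo(b.lastDocID + 1); // the whole block cannot enter the top-N: skip it
      } else {
        // decode and score only the blocks that might matter
        for (int doc = postings.docID(); doc <= b.lastDocID; doc = postings.nextDoc()) {
          top.collect(doc, postings.score());
        }
      }
    }
  }

The interesting open question from the message above is whether this belongs in the codec, in the scorers, or in a filter layer in between.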
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1931: - Attachment: SOLR-1931-3x.patch SOLR-1931-trunk.patch Thanks Robert and Yonik for pointing me at the new 4x capabilities, they make a huge difference. But you knew that. The killer for 3.x was getting the document counts via a range query; I don't think there's a good way to get the counts and not pay the penalty, so there's a new parameter recordDocCounts. Here's my latest and close-to-last cut at this, both for 3x and 4x. The data set is 89M documents, times in seconds.
3.5: 637 (getting doc counts)
3x with this patch: 552 (getting doc counts); 53 (stats without doc counts, but histogram etc. - no option to do this before)
4x, original: 450 or so as I remember (getting doc counts, histograms, etc.)
4x with patch (histograms still work): 158 (getting the doc counts the old way, span queries; I mean, you guys *said* ranges were going to be faster); 39 (getting the doc counts with terms.getDocCount(), including histograms)
Here's my proposal, I'll probably commit this next weekend at the latest unless there are objections: 1) I'll let these stew for a couple of days, and look them over again. Anyone who wants to look too, please feel free. 2) Live with getting the doc counts in 4x including the deleted docs and remove the recordDocCounts parameter (it'll live in 3.6 and other 3x versions). I think the performance is fine without carrying that kind of kludgy option forward. I could be persuaded otherwise, but an optimized index will take care of the counting of deleted documents problem if anyone really cares. Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this makes the UI useless. On an index with 64m documents and 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178602#comment-13178602 ] Robert Muir commented on SOLR-1931: --- why is it still 39 seconds? shouldn't tools like this just use statistics and not enumerate terms or anything else by default, so that they return instantly? It's 4.0, why not just break backwards compatibility and make it fast? Instead of doing enumerations and stuff, you could display all of the Terms-level statistics per segment per field: * uniqueTermCount (# of terms) * sumDocFreq (# of postings/term-doc mappings) * sumTotalTermFreq (# of positions/tokens) * docCount (# of documents with at least one posting for the field) This would all be basically instantaneous and would give a more thorough picture of the performance characteristics of the index (e.g. how many positions). You could also compute derived stats like average field length etc too. Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this makes the UI useless. On an index with 64m documents and 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
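A sketch of what that instantaneous version might look like against the trunk flex API (the getter names follow Robert's list above and are approximate; MultiFields.getTerms gives a merged view, and a per-segment loop works the same way):

{noformat}
IndexReader reader = searcher.getIndexReader();
for (String field : fieldNames) {
  Terms t = MultiFields.getTerms(reader, field);
  if (t == null) continue; // field has no postings
  // each stat may be -1 if the codec cannot report it
  log.info(field
      + " uniqueTermCount=" + t.getUniqueTermCount()
      + " sumDocFreq=" + t.getSumDocFreq()
      + " sumTotalTermFreq=" + t.getSumTotalTermFreq()
      + " docCount=" + t.getDocCount());
}
{noformat}

No term enumeration happens at all; everything comes from precomputed per-field statistics, so the cost stays constant regardless of index size.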
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12017 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12017/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325559069257/index/_v.fdx (No such file or directory) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325559069257/index/_v.fdx (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:57) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:248) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:559) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 15166 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1418 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1418/ 1 tests failed. REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: expected:<1> but was:<0> Stack Trace: junit.framework.AssertionFailedError: expected:<1> but was:<0> at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:252) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528) Build Log (for compile errors): [...truncated 11555 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 12017 - Still Failing
Anyone have an opinion on the following patch? Seems like we should be doing this in crash(); I'm surprised, though, that nothing else hits this?

Index: lucene/src/test-framework/java/org/apache/lucene/store/MockDirectoryWrapper.java
===
--- lucene/src/test-framework/java/org/apache/lucene/store/MockDirectoryWrapper.java (revision 1226613)
+++ lucene/src/test-framework/java/org/apache/lucene/store/MockDirectoryWrapper.java (working copy)
@@ -392,7 +392,15 @@
         openFilesDeleted.remove(name);
       }
     }
-    delegate.deleteFile(name);
+    if (forced) {
+      try {
+        delegate.deleteFile(name);
+      } catch (FileNotFoundException e) {
+        // if it's a forced delete (e.g. from crash()), this is fine; maybe it was already deleted
+      }
+    } else {
+      delegate.deleteFile(name);
+    }
   }

   public synchronized Set<String> getOpenDeletedFiles() {

On Mon, Jan 2, 2012 at 9:56 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12017/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325559069257/index/_v.fdx (No such file or directory) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325559069257/index/_v.fdx (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:57) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:248) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:559) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 15166 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178608#comment-13178608 ]

Erick Erickson commented on SOLR-1931:
--------------------------------------

bq. why is it still 39 seconds?

Histograms and collecting the top N terms by frequency. You still have to spin through all the terms to collect either statistic; take that bit out and the response is under 0.5 seconds. 39 seconds isn't bad at all for an index this size, and one can still specify particular fields of interest if the index is more complex than this one. I could probably be argued out of their importance, although it'll take a little doing. From my perspective this is really for troubleshooting at a high level, and that information is valuable.

Besides, I *told* you I had to look it over after a while. I just saw something horribly trivial that cuts it down to 15 seconds. There's a loop where, after the histogram stuff is collected, we test whether the current term frequency is above the threshold of the already-collected items. Changing it from

    if (freq < tiq.minfreq) continue;

to, essentially,

    if (freq <= tiq.minfreq) continue;

means the pathological case of inserting every last uniqueKey into the tracking priority queue doesn't happen. Siiigggh.

Oh, and the patch I'll attach in a couple of minutes actually compiles. I had half cleaned up the stupid recordDocCount parameter by removing its definition but not the code that reads it from the parameters. Fella has to go to sleep more often.

Also, this index is a little peculiar in that many of the fields have only a very few values, so YMMV.

Schema Browser does not scale with large indexes
------------------------------------------------

                Key: SOLR-1931
                URL: https://issues.apache.org/jira/browse/SOLR-1931
            Project: Solr
         Issue Type: Improvement
         Components: web gui
           Reporter: Lance Norskog
           Assignee: Erick Erickson
           Priority: Minor
   Affects Versions: 3.6, 4.0
        Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch

The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this makes the UI useless. On an index with 64m documents and 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless.
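[Editor's note: to make the off-by-one Erick describes concrete, here is a small self-contained sketch. All names (tiq, minfreq, N) are illustrative stand-ins, not the actual Luke handler code. With millions of uniqueKey terms all at frequency 1, a strict < test stops rejecting ties once the queue's minimum reaches 1, so every remaining term churns through the priority queue; <= rejects the ties up front.]

    import java.util.PriorityQueue;

    // Top-N term collection with a bounded min-heap of frequencies.
    public class TopTermsSketch {
      public static void main(String[] args) {
        final int N = 10;                                          // keep the top N terms by frequency
        PriorityQueue<Integer> tiq = new PriorityQueue<Integer>(); // min-heap; head is the smallest kept freq
        int minfreq = 0;
        int inserts = 0;
        for (int i = 0; i < 1000000; i++) {  // millions of uniqueKey terms, each with freq == 1
          int freq = 1;
          // Old test: if (freq < minfreq) continue;  -- 1 < 1 is false, so
          // every term gets offered to the queue (the pathological case).
          // New test: freq <= minfreq also skips the ties.
          if (tiq.size() >= N && freq <= minfreq) continue;
          tiq.offer(freq);
          inserts++;
          if (tiq.size() > N) tiq.poll();   // evict the lowest-frequency term
          minfreq = tiq.peek();             // threshold for future insertions
        }
        // With <= this prints 10 inserts; with < it would print 1000000.
        System.out.println("inserts: " + inserts + ", minfreq: " + minfreq);
      }
    }

The queue never improves by inserting a term whose frequency merely ties the current minimum, so skipping ties changes nothing about the result while avoiding the churn.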
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-1931:
---------------------------------

    Attachment: SOLR-1931-trunk.patch

Trunk patch that, you know, actually compiles, mea culpa. Also reduces the 4x time down to 15 seconds after fixing a stupid oversight. Really gotta let this stew for a while and look at it with less-tired eyes.

Schema Browser does not scale with large indexes
------------------------------------------------

                Key: SOLR-1931
                URL: https://issues.apache.org/jira/browse/SOLR-1931
            Project: Solr
         Issue Type: Improvement
         Components: web gui
           Reporter: Lance Norskog
           Assignee: Erick Erickson
           Priority: Minor
   Affects Versions: 3.6, 4.0
        Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch

The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this makes the UI useless. On an index with 64m documents and 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless.
[jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer
[ https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178635#comment-13178635 ]

Robert Muir commented on LUCENE-3305:
-------------------------------------

I created a branch here (https://svn.apache.org/repos/asf/lucene/dev/branches/lucene3305) with an initial import of this code; only minor tweaks to get things working in the build so far.

Kuromoji code donation - a new Japanese morphological analyzer
--------------------------------------------------------------

                Key: LUCENE-3305
                URL: https://issues.apache.org/jira/browse/LUCENE-3305
            Project: Lucene - Java
         Issue Type: New Feature
         Components: modules/analysis
           Reporter: Christian Moen
           Assignee: Simon Willnauer
            Fix For: 4.0
        Attachments: Kuromoji short overview .pdf, LUCENE-3305.patch, ip-clearance-Kuromoji.xml, ip-clearance-Kuromoji.xml, kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz

Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere.

The project was started in 2010 because we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design goals for Kuromoji.

Kuromoji also has a segmentation mode that is particularly useful for search, which we hope will interest Lucene and Solr users. Compound nouns, such as 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are segmented as one token by most analyzers. As a result, a search for 空港 (airport) or 新聞 (newspaper) will not give you a hit for these words. Kuromoji can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what you want for search, and you'll get a hit.

We also wanted to make sure the technology has a license that makes it compatible with other Apache Software Foundation software to maximize its usefulness. Kuromoji has an Apache License 2.0 and all code is currently owned by Atilika Inc. The software has been developed by my good friend and ex-colleague Masaru Hasegawa and myself.

Kuromoji uses the so-called IPADIC for its dictionary/statistical model, and its license terms are described in NOTICE.txt.

I'll upload code distributions and their corresponding hashes, and I'd very much like to start the code grant process. I'm also happy to provide patches to integrate Kuromoji into the codebase, if you prefer that. Please advise on how you'd like me to proceed with this. Thank you.
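[Editor's note: for readers unfamiliar with Kuromoji, a minimal usage sketch of the search-mode segmentation described above. This assumes the pre-donation org.atilika.kuromoji API (Tokenizer.builder(), Tokenizer.Mode.SEARCH, Token.getSurfaceForm(), Token.getAllFeatures()); package and class names may well change as the code moves into the Lucene branch, so treat it as an assumption about the donated 0.7.x API rather than the final Lucene module.]

    import java.util.List;
    import org.atilika.kuromoji.Token;
    import org.atilika.kuromoji.Tokenizer;

    public class KuromojiSearchModeDemo {
      public static void main(String[] args) {
        // Search mode splits compound nouns, so a query for 空港 (airport)
        // can match inside 関西国際空港 (Kansai International Airport).
        Tokenizer tokenizer = Tokenizer.builder().mode(Tokenizer.Mode.SEARCH).build();
        List<Token> tokens = tokenizer.tokenize("関西国際空港");
        for (Token token : tokens) {
          // print the surface form and its morphological features
          System.out.println(token.getSurfaceForm() + "\t" + token.getAllFeatures());
        }
        // Per the issue description, expect the compound to come back as
        // the three tokens 関西 / 国際 / 空港 in this mode.
      }
    }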
[JENKINS] Solr-3.x - Build # 560 - Failure
Build: https://builds.apache.org/job/Solr-3.x/560/

2 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325572860616/index/_a.tii (No such file or directory)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1325572860616/index/_a.tii (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
    at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:248)
    at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:559)
    at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82)
    at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290)
    at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72)

FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)
    at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310)
    at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349)
    at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278)

Build Log (for compile errors):
[...truncated 21768 lines...]