[jira] [Created] (SOLR-2480) Text extraction of password protected files
Text extraction of password protected files
-------------------------------------------

Key: SOLR-2480
URL: https://issues.apache.org/jira/browse/SOLR-2480
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.1
Reporter: Shinichiro Abe
Priority: Minor

Proposal:
Some files are password-protected, for example PDFs and Office documents in 2007 or 97 format. These files are posted using SolrCell. If we do not know the read password of a file, we do not have to read it, so text need not be extracted from it. My requirement is that these files be processed normally, without extracting text and without throwing an exception.

Background:
Currently, when you post a password-protected file, Solr returns a 500 server error. Solr catches the error in ExtractingDocumentLoader and throws TikaException. I use ManifoldCF. If the Solr server responds with 500, ManifoldCF decides that the document should be retried, because it has no idea what happened, and it retries the post many times without ever having the password.

In another case, my customer posts files with embedded images. Sometimes Solr seems to throw a TikaException of unknown cause. He wants to post just the metadata without extracting text, but the exception stops him from posting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026137#comment-13026137 ]

Shinichiro Abe commented on SOLR-2480:
--------------------------------------

Improvement ideas:
1. TikaException is always ignored, and only the metadata is indexed.
2. A new parameter, ignoreTikaException, is provided. If it is true, Solr returns a 200 response; if it is false, Solr throws TikaException.
3. If Solr can catch the internal exception that signals an encryption error, it changes the return code per exception. If it can identify poi.EncryptedDocumentException, pdfbox.exceptions.CryptographyException, etc., it returns 200 or another response code; for any other exception it throws TikaException.
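One way ideas 2 and 3 might fit together is sketched below. This is a minimal, self-contained illustration and not Solr or Tika code: `ExtractionException`, `EncryptedFileException`, `Extractor`, and `ExtractionPolicy` are hypothetical stand-ins (for TikaException and for the POI/PDFBox encryption exceptions named in the comment), and the `ignoreExtractionFailure` flag plays the role of the proposed ignoreTikaException parameter. Combining the two ideas with a logical OR is an assumption of this sketch, not something the issue specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TikaException; not the real Tika class.
class ExtractionException extends Exception {
    ExtractionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for an encryption-related cause such as
// poi.EncryptedDocumentException or pdfbox.exceptions.CryptographyException.
class EncryptedFileException extends RuntimeException {
    EncryptedFileException(String message) {
        super(message);
    }
}

// Hypothetical extractor: returns the body text or throws ExtractionException.
interface Extractor {
    String extractText(byte[] content) throws ExtractionException;
}

class ExtractionPolicy {
    // Idea 3: walk the cause chain looking for an encryption-related failure.
    static boolean isEncryptionFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof EncryptedFileException) {
                return true;
            }
        }
        return false;
    }

    // Builds the fields to index. On an extraction failure the document keeps
    // its metadata only, provided the flag is set (idea 2) or the failure is
    // encryption-related (idea 3); any other failure is rethrown.
    static Map<String, String> buildDocument(Extractor extractor,
                                             byte[] content,
                                             Map<String, String> metadata,
                                             boolean ignoreExtractionFailure)
            throws ExtractionException {
        Map<String, String> doc = new LinkedHashMap<>(metadata);
        try {
            doc.put("content", extractor.extractText(content));
        } catch (ExtractionException e) {
            if (!ignoreExtractionFailure && !isEncryptionFailure(e)) {
                throw e; // the caller would translate this into an error response
            }
            // Otherwise fall through: index the metadata without body text.
        }
        return doc;
    }
}
```

With a policy like this, a crawler such as ManifoldCF would see a success response for a password-protected file and would not retry it, while unexpected extraction failures would still surface as errors unless the flag is set.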
Re: modularization discussion
On Apr 26, 2011, at 11:12 PM, Chris Male <gento...@gmail.com> wrote:

The two sides/takes seem to be (with some example reasons):

1. pro: for example, modularization can expose features that were traditionally in solr to lucene users.

Some other Pros: Easier to test individual pieces. Easier to benchmark. More usage == more/better features/functionality for everyone. Easier for people to contribute to without having to know the full stack. I think most people agree that decoupled, reusable modules are a good thing in general as an abstract concept, but, of course, specifics matter.

2. con: for example, modularization slows development of these features and they will evolve slower if they are in lucene.

I think this needs a bit more explanation. AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.

I feel this can be flipped around and seen as a pro though too.

Agreed. Wasn't sure where to put it. Some see it as bad, some as good.

Taking internal code and making it public can be beneficial for that code, because it forces the APIs to be examined, test coverage improved, and a general 'kicking of the tyres'. With private internal APIs, there is always a temptation to make quick changes that meet an immediate need, rather than having to step back and take more time considering changes. That can slow things down yes, but it definitely has its benefits.

Other Cons: The concern was that Solr just becomes an uninteresting, empty shell that glues together modules. (I don't agree, but wanted to present what I have heard)

I think we need to somehow get a better understanding of both sides, specific examples of portions of the code would be helpful I think. Maybe then we can arrive at a compromise so that we aren't so frustrated about this issue.

--
Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026144#comment-13026144 ]

Simon Willnauer commented on LUCENE-3023:
-----------------------------------------

{quote}
I noticed in the branch the test method is changed to _testIndexingThenDeleting (disabled). However, if i re-enable it (rename it back) it never seems to finish...
{quote}

I just reenabled, fixed and committed that testcase on branch.

Land DWPT on trunk
------------------

Key: LUCENE-3023
URL: https://issues.apache.org/jira/browse/LUCENE-3023
Project: Lucene - Java
Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-3023.patch, LUCENE-3023.patch, diffMccand.py, realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, realtime-TestIndexWriterExceptions-assert-6.txt, realtime-TestIndexWriterExceptions-npe-1.txt, realtime-TestIndexWriterExceptions-npe-2.txt, realtime-TestIndexWriterExceptions-npe-4.txt, realtime-TestOmitTf-corrupt-0.txt

With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first.
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026145#comment-13026145 ] Simon Willnauer commented on LUCENE-3023: - bq. Attached is the DWPT branch in patch format against trunk (for easier reviewing). Awesome, Thanks Robert!!!
Re: modularization discussion
On Wed, Apr 27, 2011 at 09:25:14AM -0400, Yonik Seeley wrote:
> ...
> But as I said... it seems only fair to meet half way and use the solr namespace for some modules and the lucene namespace for others.

Please explain this part to me... I really don't understand.

What does fairness have to do with the codebase? Isn't the whole point of the Lucene project to create the best code possible, for the benefit of our worldwide users? How does the concept of fairness fit into that?

Cheers,
-g
[jira] [Commented] (LUCENE-3033) TestAddIndexes#testAddIndexesWithThreads fails on Realtime
[ https://issues.apache.org/jira/browse/LUCENE-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026156#comment-13026156 ]

Simon Willnauer commented on LUCENE-3033:
-----------------------------------------

I think I have found the root cause of this issue. There was a chance of missing a DWPT during a full flush / commit when a full flush was started while we were releasing a new DWPT into the pool. All subsequent documents added to this DWPT would never get flushed, neither on commit nor on closing the writer. This also explains the failures in LUCENE-3035. I committed a fix in Revision: 1097156. I will resolve both issues, since neither has occurred again after running the tests in a while(1) loop the entire night.

TestAddIndexes#testAddIndexesWithThreads fails on Realtime
----------------------------------------------------------

Key: LUCENE-3033
URL: https://issues.apache.org/jira/browse/LUCENE-3033
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Fix For: Realtime Branch

Selckin reported two failures on LUCENE-3023 which I can unfortunately not reproduce at all. Here are the traces:

{noformat}
[junit] Testsuite: org.apache.lucene.index.TestAddIndexes
[junit] Testcase: testAddIndexesWithThreads(org.apache.lucene.index.TestAddIndexes): FAILED
[junit] expected:<3160> but was:<3060>
[junit] junit.framework.AssertionFailedError: expected:<3160> but was:<3060>
[junit]     at org.apache.lucene.index.TestAddIndexes.testAddIndexesWithThreads(TestAddIndexes.java:783)
[junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
[junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)
[junit]
[junit]
[junit] Tests run: 18, Failures: 1, Errors: 0, Time elapsed: 14.272 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestAddIndexes -Dtestmethod=testAddIndexesWithThreads -Dtests.seed=6128854208955988865:2552774338676281184
[junit] NOTE: test params are: codec=PreFlex, locale=no_NO_NY, timezone=America/Edmonton
[junit] NOTE: all tests run in this JVM:
[junit] [TestToken, TestDateTools, Test2BTerms, TestAddIndexes]
[junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_24 (64-bit)/cpus=8,threads=1,free=84731792,total=258080768
[junit] - ---
{noformat}

and

{noformat}
[junit] Testsuite: org.apache.lucene.index.TestAddIndexes
[junit] Testcase: testAddIndexesWithThreads(org.apache.lucene.index.TestAddIndexes): FAILED
[junit] expected:<3160> but was:<3060>
[junit] junit.framework.AssertionFailedError: expected:<3160> but was:<3060>
[junit]     at org.apache.lucene.index.TestAddIndexes.testAddIndexesWithThreads(TestAddIndexes.java:783)
[junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
[junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)
[junit]
[junit]
[junit] Tests run: 18, Failures: 1, Errors: 0, Time elapsed: 14.841 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestAddIndexes -Dtestmethod=testAddIndexesWithThreads -Dtests.seed=4502815121171887759:-6764285049309266272
[junit] NOTE: test params are: codec=PreFlex, locale=tr_TR, timezone=Mexico/BajaNorte
[junit] NOTE: all tests run in this JVM:
[junit] [TestToken, TestDateTools, Test2BTerms, TestAddIndexes]
[junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_24 (64-bit)/cpus=8,threads=1,free=163663416,total=243335168
[junit] - ---
{noformat}
[jira] [Resolved] (LUCENE-3033) TestAddIndexes#testAddIndexesWithThreads fails on Realtime
[ https://issues.apache.org/jira/browse/LUCENE-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3033. - Resolution: Fixed fixed in Revision: 1097156
[jira] [Commented] (LUCENE-3035) TestIndexWriter.testCommitThreadSafety fails on realtime_search branch
[ https://issues.apache.org/jira/browse/LUCENE-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026159#comment-13026159 ]

Simon Willnauer commented on LUCENE-3035:
-----------------------------------------

I think I have found the root cause of this issue. There was a chance of missing a DWPT during a full flush / commit when a full flush was started while we were releasing a new DWPT into the pool. All subsequent documents added to this DWPT would never get flushed, neither on commit nor on closing the writer. This also explains the failures in LUCENE-3033. I committed a fix in Revision: 1097156. I will resolve both issues, since neither has occurred again after running the tests in a while(1) loop the entire night.
[jira] [Resolved] (LUCENE-3035) TestIndexWriter.testCommitThreadSafety fails on realtime_search branch
[ https://issues.apache.org/jira/browse/LUCENE-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3035.
------------------------------------

Resolution: Fixed

fix in Revision: 1097156

TestIndexWriter.testCommitThreadSafety fails on realtime_search branch
----------------------------------------------------------------------

Key: LUCENE-3035
URL: https://issues.apache.org/jira/browse/LUCENE-3035
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Fix For: Realtime Branch

Hudson failed on RT with this error - I wasn't able to reproduce yet

{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=410261592077577885:-4099127561715488589 -Dtests.multiplier=3
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=410261592077577885:-4099127561715488589 -Dtests.multiplier=3
The following exceptions were thrown by threads:
*** Thread: Thread-331 ***
java.lang.RuntimeException: java.lang.AssertionError: term=f:2_0; r=DirectoryReader(segments_6 _8(4.0):Cv7) expected:<1> but was:<0>
    at org.apache.lucene.index.TestIndexWriter$5.run(TestIndexWriter.java:2416)
Caused by: java.lang.AssertionError: term=f:2_0; r=DirectoryReader(segments_6 _8(4.0):Cv7) expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.apache.lucene.index.TestIndexWriter$5.run(TestIndexWriter.java:2410)
NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, f6=MockVariableIntBlock(baseBlockSize=91), f7=MockFixedIntBlock(blockSize=1289), f8=Standard, f9=MockRandom, f1=MockSep, f0=Pulsing(freqCutoff=15), f3=Pulsing(freqCutoff=15), f2=MockFixedIntBlock(blockSize=1289), f5=MockVariableIntBlock(baseBlockSize=91), f4=MockRandom, f=MockSep, c=MockVariableIntBlock(baseBlockSize=91), termVector=SimpleText, d9=SimpleText,
d8=MockSep, d5=MockVariableIntBlock(baseBlockSize=91), d4=MockRandom, d7=Standard, d6=SimpleText, d25=Standard, d0=MockVariableIntBlock(baseBlockSize=91), c29=Standard, d24=SimpleText, d1=MockFixedIntBlock(blockSize=1289), c28=MockFixedIntBlock(blockSize=1289), d23=MockVariableIntBlock(baseBlockSize=91), d2=Standard, c27=MockVariableIntBlock(baseBlockSize=91), d22=MockRandom, d3=MockRandom, d21=MockFixedIntBlock(blockSize=1289), d20=MockVariableIntBlock(baseBlockSize=91), c22=MockVariableIntBlock(baseBlockSize=91), c21=MockRandom, c20=Pulsing(freqCutoff=15), d29=MockVariableIntBlock(baseBlockSize=91), c26=SimpleText, d28=MockRandom, c25=MockSep, d27=Pulsing(freqCutoff=15), c24=MockRandom, d26=MockFixedIntBlock(blockSize=1289), c23=Standard, e9=MockRandom, e8=MockFixedIntBlock(blockSize=1289), e7=MockVariableIntBlock(baseBlockSize=91), e6=MockSep, e5=Pulsing(freqCutoff=15), c17=Standard, e3=MockFixedIntBlock(blockSize=1289), d12=SimpleText, c16=SimpleText, e4=Pulsing(freqCutoff=15), d11=MockSep, c19=MockSep, e1=MockSep, d14=Pulsing(freqCutoff=15), c18=Pulsing(freqCutoff=15), e2=SimpleText, d13=MockFixedIntBlock(blockSize=1289), e0=Standard, d10=Standard, d19=Pulsing(freqCutoff=15), c11=SimpleText, c10=MockSep, d16=MockRandom, c13=MockSep, c12=Pulsing(freqCutoff=15), d15=Standard, d18=SimpleText, c15=MockFixedIntBlock(blockSize=1289), d17=MockSep, c14=MockVariableIntBlock(baseBlockSize=91), b3=MockRandom, b2=Standard, b5=SimpleText, b4=MockSep, b7=MockSep, b6=Pulsing(freqCutoff=15), d50=MockVariableIntBlock(baseBlockSize=91), b9=MockFixedIntBlock(blockSize=1289), b8=MockVariableIntBlock(baseBlockSize=91), d43=Pulsing(freqCutoff=15), d42=MockFixedIntBlock(blockSize=1289), d41=SimpleText, d40=MockSep, d47=MockRandom, d46=Standard, b0=SimpleText, d45=MockFixedIntBlock(blockSize=1289), b1=Standard, d44=MockVariableIntBlock(baseBlockSize=91), d49=MockSep, d48=Pulsing(freqCutoff=15), c6=MockVariableIntBlock(baseBlockSize=91), c5=MockRandom, c4=Pulsing(freqCutoff=15), 
c3=MockFixedIntBlock(blockSize=1289), c9=MockSep, c8=MockRandom, c7=Standard, d30=MockFixedIntBlock(blockSize=1289), d32=MockRandom, d31=Standard, c1=MockVariableIntBlock(baseBlockSize=91), d34=Standard, c2=MockFixedIntBlock(blockSize=1289), d33=SimpleText, d36=MockSep, c0=MockSep, d35=Pulsing(freqCutoff=15), d38=MockVariableIntBlock(baseBlockSize=91), d37=MockRandom, d39=SimpleText, e92=MockFixedIntBlock(blockSize=1289), e93=Pulsing(freqCutoff=15), e90=MockSep, e91=SimpleText, e89=MockVariableIntBlock(baseBlockSize=91), e88=MockSep, e87=Pulsing(freqCutoff=15), e86=SimpleText, e85=MockSep, e84=MockRandom,
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026175#comment-13026175 ]

Chris Male commented on LUCENE-3041:
------------------------------------

bq. This is an excellent opportunity to redefine Queries as immutable

What do you envisage this involving? Although not required by the API, most rewriting implementations create a new Query and apply their changes there, leaving themselves untouched. Do you want to require this somehow?

Support Query Visting / Walking
-------------------------------

Key: LUCENE-3041
URL: https://issues.apache.org/jira/browse/LUCENE-3041
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Chris Male
Priority: Minor
Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch

Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations or anything that requires state to be stored as each Query is visited. We could keep the interface very simple:

{code}
public interface QueryVisitor {
  Query visit(Query query);
}
{code}

and then use a reflection based visitor like Earwin suggested, which would allow implementers to provide visit methods for just the Query types that they are interested in.
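To make the reflection-based idea concrete, here is a minimal sketch of how such a visitor could dispatch. It is not taken from the attached patches: the `Query`, `TermQuery`, and `PrefixQuery` classes below are simplified hypothetical stand-ins rather than Lucene's classes, and `ReflectiveQueryVisitor` and `LowercaseTermVisitor` are names invented for this illustration. Only the `QueryVisitor` interface follows the issue text.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Locale;

// Simplified hypothetical query types; these are NOT Lucene's classes.
abstract class Query {}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

class PrefixQuery extends Query {
    final String prefix;
    PrefixQuery(String prefix) { this.prefix = prefix; }
}

// The interface as proposed in the issue.
interface QueryVisitor {
    Query visit(Query query);
}

// Hypothetical base class: routes visit(Query) to the most specific public
// visit(X) overload declared by a subclass, walking up X's superclasses.
abstract class ReflectiveQueryVisitor implements QueryVisitor {
    @Override
    public final Query visit(Query query) {
        // Stop before Query.class so we never re-enter this dispatcher.
        for (Class<?> c = query.getClass(); c != null && c != Query.class; c = c.getSuperclass()) {
            try {
                Method m = getClass().getMethod("visit", c);
                m.setAccessible(true); // the visitor class itself may be non-public
                return (Query) m.invoke(this, query);
            } catch (NoSuchMethodException e) {
                // No overload for this type; try its superclass next.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("visit method failed for " + c.getName(), e);
            }
        }
        return query; // no specific overload: leave the query as it is
    }
}

// Example visitor that only cares about TermQuery. It returns a new query
// instead of mutating its input, in the spirit of the immutability comment.
class LowercaseTermVisitor extends ReflectiveQueryVisitor {
    public Query visit(TermQuery q) {
        return new TermQuery(q.term.toLowerCase(Locale.ROOT));
    }
}
```

Calling `new LowercaseTermVisitor().visit(new TermQuery("FOO"))` yields a new `TermQuery` with the term `foo`, while a `PrefixQuery` is returned unchanged because no overload matches it. The trade-off of this design is that dispatch errors (for example a non-public overload) only show up at runtime, and a real implementation would likely cache the `Method` lookups.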
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3023:
------------------------------------

Attachment: LUCENE-3023_iw_iwc_jdoc.patch

I went through the IW and IWC javadocs to bring them up to date. Here is a patch against the branch... Review from a native speaker would be very welcome.
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026223#comment-13026223 ]

Simon Willnauer commented on LUCENE-3023:
-----------------------------------------

bq. Can we rename Healthiness -> DocumentsWriterStallControl (or something like that)?

I added a TODO for this - let's do that on trunk.

bq. I think we lost this infoStream output from trunk?

I just committed a fix for this to reenable it on branch. It is kind of tricky: since we now flush concurrently, NumberFormat must be accessed single-threaded, so I added it to DWPT#flush().
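The NumberFormat remark reflects a general Java constraint: `java.text.NumberFormat` instances are not safe for concurrent use. The sketch below shows one common way to respect that when several threads format statistics at once, by giving each thread its own instance through `ThreadLocal` (Java 8 or later). It is only an illustration of the constraint: the committed fix placed the formatting in DWPT#flush() and its details are not shown in this thread, and `FlushStats` and `describe` are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; not part of Lucene.
class FlushStats {
    // One NumberFormat per thread, because a shared instance must not be
    // used by several threads at the same time.
    private static final ThreadLocal<NumberFormat> FORMAT = ThreadLocal.withInitial(() -> {
        NumberFormat nf = NumberFormat.getInstance(Locale.ROOT);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        nf.setGroupingUsed(false);
        return nf;
    });

    // Builds an infoStream-style message for a flushed segment.
    static String describe(int docCount, long sizeInBytes) {
        double sizeMB = sizeInBytes / 1024.0 / 1024.0;
        return "flushed docs=" + docCount + " sizeMB=" + FORMAT.get().format(sizeMB);
    }
}
```

An alternative with the same effect is to create the formatter inside the method that uses it, so that each flushing thread works with an instance no other thread can reach.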
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026229#comment-13026229 ]

Simon Willnauer commented on LUCENE-3023:
-----------------------------------------

FYI - I committed the javadoc changes to branch after Mike's +1 on IRC. I also marked IWC#setIndexerThreadPool as expert API.
[jira] [Created] (SOLR-2481) Add support for commitWithin in DataImportHandler
Add support for commitWithin in DataImportHandler - Key: SOLR-2481 URL: https://issues.apache.org/jira/browse/SOLR-2481 Project: Solr Issue Type: Improvement Reporter: Sami Siren Priority: Trivial It looks like DataImportHandler does not support commitWithin. Would be nice if it did.
[jira] [Updated] (SOLR-2481) Add support for commitWithin in DataImportHandler
[ https://issues.apache.org/jira/browse/SOLR-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-2481: - Attachment: SOLR-2481.patch initial patch
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026256#comment-13026256 ] Upayavira commented on SOLR-2399: - I've just seen the admin console fail on IE9, with and without compatibility mode. Basically, the menu box showed up, but nothing appeared in the right-hand box, and there was no loading message, when running on Solr 3.1. The same system worked nicely on Firefox Windows, Firefox Mac, etc. Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin Quick Tour: [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png], [Query-Form|http://files.mathe.is/solr-admin/02_query.png], [Plugins|http://files.mathe.is/solr-admin/05_plugins.png], [Logging|http://files.mathe.is/solr-admin/07_logging.png], [Analysis|http://files.mathe.is/solr-admin/04_analysis.png], [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: LUCENE-3023_svndiff.patch attached (LUCENE-3023_svndiff.patch) is just the output from 'svn diff' after merging, for reviewing property changes and similar police work :)
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3023: Attachment: LUCENE-3023_simonw_review.patch here is my first review round. I manually ported some missed merges from trunk and fixed some really minor things. I will commit shortly to branch if nobody objects
Re: [jira] [Commented] (SOLR-2480) Text extraction of password protected files
Hmmm, I'm not sure whether this fits into SOLR-445 or not; could you add this comment to that patch discussion so we at least look? Thanks, Erick On Thu, Apr 28, 2011 at 2:03 AM, Shinichiro Abe (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026137#comment-13026137 ] Shinichiro Abe commented on SOLR-2480: -- Improvement ideas: 1. Always ignore TikaException and index only the metadata. 2. Add a new parameter, ignoreTikaException. If it is true, Solr returns a 200 response; if it is false, Solr throws TikaException. 3. If Solr can catch the internal exception caused by encryption, vary the return code by exception. If it can identify poi.EncryptedDocumentException, pdfbox.exceptions.CryptographyException, etc., it returns 200 or another response code; for any other exception it throws TikaException.
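Ideas 2 and 3 from the quoted SOLR-2480 comment can be sketched as a small decision function. This is only an illustrative Python model, not Solr code: `handle_extraction_failure`, `ENCRYPTION_ERRORS`, and the `ignore_tika_exception` argument are hypothetical names chosen to mirror the proposal, exception types are matched as plain strings, and returning 200 for encrypted files is just one of the options the comment leaves open.

```python
# Illustrative model of SOLR-2480 ideas 2 and 3; not actual Solr code.
# Every name below is hypothetical.

# Root causes the proposal treats as "the file is password-protected".
ENCRYPTION_ERRORS = {
    "EncryptedDocumentException",  # POI exception named in the comment
    "CryptographyException",       # PDFBox exception named in the comment
}


def handle_extraction_failure(cause_name, metadata, ignore_tika_exception=False):
    """Return (http_status, indexed_fields) for a failed text extraction."""
    if ignore_tika_exception or cause_name in ENCRYPTION_ERRORS:
        # Index the metadata only and report success, so that a crawler
        # does not keep retrying a document it can never read.
        return 200, dict(metadata)
    # Any other failure still surfaces as a server error.
    return 500, {}
```

With this shape, a client that retries on 500 (as described for ManifoldCF) would stop retrying password-protected files while still seeing genuine extraction failures.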
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026295#comment-13026295 ] Simon Willnauer commented on LUCENE-3023: - bq. I will commit shortly to branch if nobody objects committed... robert is running a new reintegration round now
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026297#comment-13026297 ] Jonathan Rochkind commented on SOLR-2242: - Wonderful, much better, thanks Lance. This is a much clearer and more flexible API, consistent with other parts of Solr. (For a feature I could definitely really use, thanks Bill.) But I wonder... should it be facet.numTerms, to group with the other faceting-related params? Or wait, is it already? Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1 Here is an example on field hgid (without namedistinct):
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code}
With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns number of rows (7), not the number of values (11).
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code}
This actually works really well to get the total number of fields for a group.field=hgid. Enjoy!
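The difference between the two numbers in the SOLR-2242 description (7 rows versus 11 values) can be checked with a few lines of Python. This is only a sketch over the hgid example, with the seven per-term counts copied from it; the `facet_counts` dictionary and the `distinct_terms` helper are assumptions for illustration, not the patch's implementation.

```python
# Per-term counts from the hgid example (term name -> document count).
facet_counts = {
    "HGPY045FD36D4000A": 1,
    "HGPY0FBC6690453A9": 1,
    "HGPY1E44ED6C4FB3B": 1,
    "HGPY1FA631034A1B8": 1,
    "HGPY3317ABAC43B48": 1,
    "HGPY3A17B2294CB5A": 5,
    "HGPY3ADD2B3D48C39": 1,
}


def distinct_terms(counts, mincount=1):
    """Number of facet rows whose count is at least mincount."""
    return sum(1 for c in counts.values() if c >= mincount)


rows = distinct_terms(facet_counts)   # the distinct-name count (rows)
values = sum(facet_counts.values())   # total number of matching values
print(rows, values)  # prints "7 11"
```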
[jira] [Created] (LUCENE-3050) After RT branch lands we should remove DWPT re-use code from indexer
After RT branch lands we should remove DWPT re-use code from indexer Key: LUCENE-3050 URL: https://issues.apache.org/jira/browse/LUCENE-3050 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Priority: Minor We used to re-use data structures inside DWPT, but after RT we do not, so we should remove places where we go and clean stuff up (eg termsHash.reset in FreqProxTermsWriter)...
[jira] [Resolved] (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2324. - Resolution: Fixed we land this on trunk via LUCENE-3023 Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: LUCENE-3023_svndiff.patch LUCENE-3023.patch I resynced up to r1097442 and here are the latest patches (the full patch and the svn diff)
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3023: --- Attachment: diffSources.patch Modified the Python script a bit to do a recursive diff b/w two dirs to make an applyable patch -- added a usage, and a -skipWhitespace option. I put it under a new 'dev-tools/scripts' dir...
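A recursive directory diff of the kind described in this update might look roughly like the following. This is not the attached script: it is a minimal sketch using only the Python standard library, and the function name `diff_dirs` and the `skip_whitespace` parameter (standing in for the -skipWhitespace option) are assumptions. It only compares files present in both trees and ignores added or removed files.

```python
import difflib
import os


def _normalize(lines):
    # Drop all whitespace so purely cosmetic changes compare equal.
    return ["".join(line.split()) for line in lines]


def diff_dirs(old_dir, new_dir, skip_whitespace=False):
    """Return unified-diff text for files that exist in both trees."""
    out = []
    for root, dirs, files in os.walk(old_dir):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            old_path = os.path.join(root, name)
            rel = os.path.relpath(old_path, old_dir)
            new_path = os.path.join(new_dir, rel)
            if not os.path.isfile(new_path):
                continue  # added/removed files are out of scope here
            with open(old_path, encoding="utf-8", errors="replace") as f:
                old_lines = f.readlines()
            with open(new_path, encoding="utf-8", errors="replace") as f:
                new_lines = f.readlines()
            if skip_whitespace and _normalize(old_lines) == _normalize(new_lines):
                continue  # the two files differ only in whitespace
            out.extend(
                difflib.unified_diff(old_lines, new_lines, fromfile=rel, tofile=rel)
            )
    return "".join(out)
```

Identical files produce no output from `difflib.unified_diff`, so only changed files appear in the result.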
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7527 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7527/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.SolrExampleBinaryTest.testCommitWithin Error Message: expected:<1> but was:<0> Stack Trace: junit.framework.AssertionFailedError: expected:<1> but was:<0> at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:380) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) Build Log (for compile errors): [...truncated 9061 lines...]
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader Error Message: expected:<128> but was:<129> Stack Trace: junit.framework.AssertionFailedError: expected:<128> but was:<129> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657) Build Log (for compile errors): [...truncated 3227 lines...]
Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure
This is actually a trunk failure: ant test -Dtestcase=TestDeletionPolicy -Dtestmethod=testKeepLastNDeletionPolicyWithReader -Dtests.seed=4048962790831405116:1429420683736794142 -Dtests.multiplier=3 I'm hunting... Mike http://blog.mikemccandless.com On Thu, Apr 28, 2011 at 4:42 PM, Apache Jenkins Server hud...@hudson.apache.org wrote: Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader Error Message: expected:<128> but was:<129> Stack Trace: junit.framework.AssertionFailedError: expected:<128> but was:<129> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657) Build Log (for compile errors): [...truncated 3227 lines...]
Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure
OK fixed... Mike http://blog.mikemccandless.com On Thu, Apr 28, 2011 at 5:45 PM, Michael McCandless luc...@mikemccandless.com wrote: This is actually a trunk failure: ant test -Dtestcase=TestDeletionPolicy -Dtestmethod=testKeepLastNDeletionPolicyWithReader -Dtests.seed=4048962790831405116:1429420683736794142 -Dtests.multiplier=3 I'm hunting... Mike http://blog.mikemccandless.com On Thu, Apr 28, 2011 at 4:42 PM, Apache Jenkins Server hud...@hudson.apache.org wrote: Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader Error Message: expected:<128> but was:<129> Stack Trace: junit.framework.AssertionFailedError: expected:<128> but was:<129> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657) Build Log (for compile errors): [...truncated 3227 lines...]
[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false
[ https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026714#comment-13026714 ] David Smiley commented on SOLR-2191: Yes, let's do this! I was just about to log a bug when I found it's already been reported. I had some stupid error in my solr config and it never got logged. Because of the error, the core never got registered into the container. And then when I went to do any queries, Solr kept telling me I didn't specify a core name when I never had to before (using the default). I was in the twilight zone for a while. Mark, you're a committer yet you supplied a patch. Why didn't you simply commit it? I heard on the dev list recently that Solr is supposedly CTR (commit then review), yet we clearly act here as RTC. So even if RTC is it, wouldn't there be some threshold to let simple things like this through without a review? Change SolrException cstrs that take Throwable to default to alreadyLogged=false Key: SOLR-2191 URL: https://issues.apache.org/jira/browse/SOLR-2191 Project: Solr Issue Type: Bug Reporter: Mark Miller Fix For: Next Attachments: SOLR-2191.patch Because of misuse, many exceptions are now not logged at all - can be painful when doing dev. I think we should flip this setting and work at removing any double logging - losing logging is worse (and it almost looks like we lose more logging than we would get in double logging) - and bad solrexception/logging patterns are proliferating.
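The failure mode described in SOLR-2191 can be modelled in a few lines. The sketch below is Python rather than Solr's Java, and the class `AppError`, its `already_logged` attribute, and the helper `log_once` are hypothetical names used only to show why the default matters: if an exception starts out marked as already logged, a log-once helper silently drops it.

```python
import logging

log = logging.getLogger("sketch")


class AppError(Exception):
    """Hypothetical stand-in for an exception that carries a logged flag."""

    def __init__(self, message, cause=None, already_logged=False):
        super().__init__(message)
        self.cause = cause
        # Defaulting to False means an unlogged error is never hidden;
        # the worst case is a duplicate log line.
        self.already_logged = already_logged


def log_once(err):
    """Log err unless it is marked as logged; return True if it was logged."""
    if err.already_logged:
        return False
    log.error("%s", err)
    err.already_logged = True
    return True
```

With the flag defaulting to True instead, the first `log_once` call would return False and a configuration error like the one described in the comment would never reach the log.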
[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false
[ https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026794#comment-13026794 ] Mark Miller commented on SOLR-2191: --- bq. Why didn't you simply commit it? I heard on the dev list recently that Solr is supposedly CTR (commit then review), yet we clearly act here as RTC. Depends really - on the change and on the committer. We like to keep trunk extra shiny, and I think our practice is good myself. But it's up to each committer. bq. So even if RTC is it, wouldn't there be some threshold to let simple things like this through without a review? Yes - and many small things are simply committed. Likely when I ran into this, I was doing other things - and I made a quick patch, but not something I was willing to stake my name on as a commit. I like to do a thorough review first. And then this just fell off my radar. Sometimes you are just not sure of all of the ramifications of your change - a lot of times this is a mini side track while I'm doing something else, and so it's nice to just toss up a patch and get feedback from the likes of Hossman and others before just cowboying on trunk. Again though - each situation is handled by each committer based on their level of comfort, and the general culture of the community. Yeah, this bug is annoying - I'm happy to look at this again soon - I happen to be unusually busy at this time, but I'll certainly try to get this in by this weekend.
[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false
[ https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026799#comment-13026799 ] Mark Miller commented on SOLR-2191: --- Also keep in mind that Review-Then-Commit at Apache means you need consensus and 3 votes before committing I believe: See http://www.apache.org/foundation/glossary.html Consensus Approval 'Consensus approval' refers to a vote (sense 1) which has completed with at least three binding +1 votes and no vetos. Compare . Review-Then-Commit (Often referenced as 'RTC' or 'R-T-C'.) Commit policy which requires that all changes receive consensus approval in order to be committed. Compare , and see the description of the voting process.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7537 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7537/ 1 tests failed. REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: expected:<2> but was:<3> Stack Trace: junit.framework.AssertionFailedError: expected:<2> but was:<3> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208) Build Log (for compile errors): [...truncated 9072 lines...]
[HUDSON] Lucene-trunk - Build # 1545 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1545/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) at org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) at org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:222) at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:188) at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:140) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3216) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1753) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1748) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1744) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2463) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180) at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2719) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) Build Log (for compile errors): [...truncated 11899 lines...] 
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026823#comment-13026823 ]

Lance Norskog commented on SOLR-2242:
-------------------------------------

I changed it to 'facet.numTerms'. There is still a big performance problem: numTerms builds the entire list of facets and then reports the length of the list. This could be done more efficiently.

Get distinct count of names for a facet field
---------------------------------------------

Key: SOLR-2242
URL: https://issues.apache.org/jira/browse/SOLR-2242
Project: Solr
Issue Type: New Feature
Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2242.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch

When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1.

The feature is called namedistinct. Here is an example:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1

Here is an example on field hgid (without namedistinct):

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code}

With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns number of rows (7), not the number of values (11).

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code}

This works actually really good to get total number of fields for a group.field=hgid. Enjoy!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026826#comment-13026826 ]

Bill Bell commented on SOLR-2242:
---------------------------------

I am not seeing the performance problem. If you are outputting facets anyway, the loop is going to run and the list is going to be built. So in that case it is probably as efficient as it can be. That is why I had the 0/1/2.

I was reusing the code and just looking at the list size:

countFacetTerms.size()
counts.size()

There is a lot of logic in getListedTermCounts() and getTermCountsLimit(). If we optimize, and just add a counter, we need to make sure the new methods are not forgotten about (test cases?). I have seen that happen numerous times.
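The trade-off discussed in this comment (report the size of a fully built list versus keep a plain counter) can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the map stands in for whatever per-term counts the faceting code has at hand, and none of this is the code in SOLR-2242.patch. The sample data is the hgid example from the issue description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the SOLR-2242 patches.
final class DistinctTermCountSketch {

    // Approach criticized in the thread: materialize every matching term,
    // then report the length of the list.
    static int countViaList(Map<String, Integer> termCounts, int minCount) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            if (e.getValue() >= minCount) {
                terms.add(e.getKey());
            }
        }
        return terms.size();
    }

    // Alternative: walk the same counts but keep only an integer.
    static int countViaCounter(Map<String, Integer> termCounts, int minCount) {
        int distinct = 0;
        for (int count : termCounts.values()) {
            if (count >= minCount) {
                distinct++;
            }
        }
        return distinct;
    }

    // The hgid facet counts quoted in the issue description.
    static Map<String, Integer> sampleHgidCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("HGPY045FD36D4000A", 1);
        counts.put("HGPY0FBC6690453A9", 1);
        counts.put("HGPY1E44ED6C4FB3B", 1);
        counts.put("HGPY1FA631034A1B8", 1);
        counts.put("HGPY3317ABAC43B48", 1);
        counts.put("HGPY3A17B2294CB5A", 5);
        counts.put("HGPY3ADD2B3D48C39", 1);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleHgidCounts();
        System.out.println("via list:    " + countViaList(counts, 1));
        System.out.println("via counter: " + countViaCounter(counts, 1));
    }
}
```

Both methods return the same number; the counter variant just avoids allocating the list. As the comment notes, if the facet list is being output anyway the list already exists, so the saving only matters when the count alone is requested.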
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026827#comment-13026827 ]

Bill Bell commented on SOLR-2242:
---------------------------------

Also I thought you wanted to change the name to numNames? I am okay with numTerms too.
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026828#comment-13026828 ]

Bill Bell commented on SOLR-2242:
---------------------------------

It would be good to be able to cache the value, instead of building a list that is cached too.
[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false
[ https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026830#comment-13026830 ]

David Smiley commented on SOLR-2191:
------------------------------------

Thanks for the clarification on Review Then Commit (RTC); I verified through ASF's documentation that three votes are indeed necessary. Wow, it'd take forever to get things done that way; it seems impractical for anything but perhaps security-related code (e.g. crypto). I didn't see any info on exceptions for minor changes (e.g. adding documentation, code formatting). I'm glad we don't do that officially.

I'm about to take the conversation further on the dev list RE jira issues falling through the cracks...

Change SolrException cstrs that take Throwable to default to alreadyLogged=false
--------------------------------------------------------------------------------

Key: SOLR-2191
URL: https://issues.apache.org/jira/browse/SOLR-2191
Project: Solr
Issue Type: Bug
Reporter: Mark Miller
Fix For: Next
Attachments: SOLR-2191.patch

Because of misuse, many exceptions are now not logged at all - can be painful when doing dev. I think we should flip this setting and work at removing any double logging - losing logging is worse (and it almost looks like we lose more logging than we would get in double logging) - and bad solrexception/logging patterns are proliferating.
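To make the issue description concrete, here is a minimal sketch of why the default of an "already logged" flag matters. Everything in it is an assumption for illustration: FlaggedException, logOnce, and the in-memory LOG list are hypothetical stand-ins, not Solr's SolrException or its logging code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an exception that carries an "already logged" flag.
final class LoggedFlagSketch {

    static final class FlaggedException extends RuntimeException {
        boolean logged;

        // Default proposed in the issue: assume the exception has NOT been logged yet.
        FlaggedException(String message, Throwable cause) {
            this(message, cause, false);
        }

        FlaggedException(String message, Throwable cause, boolean alreadyLogged) {
            super(message, cause);
            this.logged = alreadyLogged;
        }
    }

    // In-memory "log" so the effect is easy to observe.
    static final List<String> LOG = new ArrayList<>();

    // Logs the exception once and marks it, so later handlers do not log it again.
    static void logOnce(FlaggedException e) {
        if (!e.logged) {
            LOG.add(e.getMessage());
            e.logged = true;
        }
    }

    public static void main(String[] args) {
        FlaggedException safeDefault = new FlaggedException("wrapped failure", null);
        logOnce(safeDefault);
        logOnce(safeDefault); // second handler: no double logging

        // If the flag is wrongly true and nobody logged it, the message is lost.
        FlaggedException lost = new FlaggedException("never logged", null, true);
        logOnce(lost);

        System.out.println(LOG);
    }
}
```

With the flag defaulting to false, the worst case is a duplicate log entry that a guard like logOnce can suppress; with it defaulting to true, a wrapped exception that nobody logged disappears silently, which is the failure the issue calls worse.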
jira issues falling off the radar -- Next JIRA version
(Comments on SOLR-2191 between Mark & I were starting to get off-topic with respect to the issue, so I am continuing the conversation here.)

A lot of JIRA issues seem to fall off the radar, IMO. I'm talking about issues that have patches and are basically ready to go. There are multiple ways to address this, but at the moment I am going to bring up just one.

Looking at the versions in JIRA that one can assign an issue to:
https://issues.apache.org/jira/browse/SOLR#selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel
I see the version named "Next", with this description: "Placeholder for committers to track issues that are not ready to commit, but seem close enough to being ready to warrant focus before the next feature release."

This version and what it implies is a common pattern in the use of JIRA, one that I also use for projects I manage for my employer. It appears that for the 3.1 release nobody looked through the issues assigned to Next, and consequently some issues like SOLR-2191 were forgotten despite being ready to go.

Looking through the wiki, I see information on how to do a release (http://wiki.apache.org/solr/HowToRelease) and release suggestions, but no information on what to do in advance of a release. I also don't see any administrative tasks for managing the Next version in JIRA.

So I think either the Next version should be used effectively or, if that isn't going to happen, this version should be deleted. On a related note, I don't know what to make of the 1.5 version, nor what to make of issues marked as Closed for Next. Some house cleaning is in order.

Thoughts?

~ David Smiley

-----
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name
[ https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026842#comment-13026842 ]

Bill Bell commented on SOLR-2471:
---------------------------------

Yonik,

I am not sure why we cannot communicate? I know how to AND two dismax queries... Just throw them both into two fq params. Pretty simple. There are other ways to do it too. But this is not my question.

Is it possible to have two QT parameters in the same call to Solr? I would like each of these fq params to have a pre-defined qt list of parameters for the localparams.

fq={!dismax qt=second}bill
fq={!dismax qt=third}tom
q=jones
qt=first
defType=dismax

In solrconfig:

qt=second would be defined with qf=name and other params like mm.
qt=third would be defined with qf=name2 and other params like mm.

But I guess this is not possible since ALL the params are being loaded from including the 2 fq ones. If you want you can close this, but it would be a nice feature. I would want it to work so that the localparams of each fq are set by second and third, respectively.

Localparams not working with 2 fq parameters using qt=name
----------------------------------------------------------

Key: SOLR-2471
URL: https://issues.apache.org/jira/browse/SOLR-2471
Project: Solr
Issue Type: Bug
Reporter: Bill Bell

We are having a problem with the following query. If we have two localparams (using fq) and use QT=, it does not work.

This does not find any results:

http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname v=$qname}&q=_val_:{!type=dismax qt=namespec v=$qspec} _val_:{!type=dismax qt=dismaxname v=$qname}&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score desc&rows=1000&start=0

This works okay. It returns a few results.

http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname v=$qname}&q=_val_:{!type=dismax qf=$qqf v=$qspec} _val_:{!type=dismax qt=dismaxname v=$qname}&qqf=specialties_ngram^1.0 specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score desc&rows=1000&start=0

We would like to use a QT for both terms, but it seems there is some kind of bug when using two localparams and dismax filters with QT.
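The request that the report says works passes the query fields to the first filter through parameter dereferencing (qf=$qqf) rather than through qt. Because the localparams contain braces, spaces, and dollar signs, it can help to assemble such a URL programmatically. The following is a rough sketch: the class and helper names are hypothetical, only a subset of the parameters from the report is included, and it relies solely on the JDK's URLEncoder (the Charset overload needs Java 10 or later), not on any Solr client API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for assembling the query string from the report.
final class LocalParamsUrlSketch {

    // Encodes one name=value pair for use in a query string.
    static String param(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds a subset of the "works okay" request from the issue description.
    static String buildQuery() {
        List<String> parts = new ArrayList<>();
        parts.add(param("qname", "john"));
        parts.add(param("qspec", "dent"));
        // qqf is dereferenced by the first filter via qf=$qqf.
        parts.add(param("qqf", "specialties_ngram^1.0 specialties_search^2.0"));
        parts.add(param("fq", "{!type=dismax qf=$qqf v=$qspec}"));
        parts.add(param("fq", "{!type=dismax qt=dismaxname v=$qname}"));
        parts.add(param("rows", "1000"));
        parts.add(param("start", "0"));
        return String.join("&", parts);
    }

    public static void main(String[] args) {
        System.out.println("http://localhost:8983/solr/provs/select?" + buildQuery());
    }
}
```

Encoding each value separately keeps the `&` separators unambiguous, which is exactly what is hard to see in the raw URLs quoted in the report.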
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026863#comment-13026863 ]

Lance Norskog commented on SOLR-2242:
-------------------------------------

bq. There is a lot of logic in getListedTermCounts() and getTermCountsLimit(). If we optimize, and just add a counter, we need to make sure the new methods are not forgotten about (test cases?). I have seen that happen numerous times.

Ayup. In fact this breaks SimpleFacetsTest. Everything in facets needs tests.