[jira] [Created] (SOLR-2480) Text extraction of password protected files

2011-04-28 Thread Shinichiro Abe (JIRA)
Text extraction of password protected files
---

 Key: SOLR-2480
 URL: https://issues.apache.org/jira/browse/SOLR-2480
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.1
Reporter: Shinichiro Abe
Priority: Minor


Proposal:
Some files are password-protected, for example PDFs and Office documents in 
2007 or 97 format.
These files are posted using SolrCell.
We do not need to read the contents of these files if we do not know their 
password, so it is acceptable that no text is extracted from them.
My requirement is that these files should be processed normally, without 
extracting text and without throwing an exception.

Background:
Currently, when you post a password-protected file, Solr returns a 500 server 
error.
Solr catches the error in ExtractingDocumentLoader and throws TikaException.

I use ManifoldCF.
If the Solr server responds with 500, ManifoldCF decides that the document 
should be retried, because it has no way of knowing what happened.
It then retries the post many times, without ever getting the password.

In another case, my customer posts files with embedded images.
Sometimes Solr seems to throw a TikaException of unknown cause.
He wants to post just the metadata without extracting text, but the exception 
stops him from posting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2480) Text extraction of password protected files

2011-04-28 Thread Shinichiro Abe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026137#comment-13026137
 ] 

Shinichiro Abe commented on SOLR-2480:
--

Improvement ideas:
1. TikaException is always ignored, and only the metadata is indexed.
2. A new parameter, ignoreTikaException, is provided.
If it is true, Solr returns a 200 response; if it is false, it throws 
TikaException.
3. If Solr can catch the internal exception for an encryption error, it changes 
the return code depending on the exception.
If it detects poi.EncryptedDocumentException, 
pdfbox.exceptions.CryptographyException, etc., it returns 200 or another 
response code; for any other exception it throws TikaException.
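
Ideas 2 and 3 could be combined roughly as in the sketch below. It uses only the 
plain JDK and is not Solr code: the names ExtractionFailurePolicy, 
isEncryptionFailure and statusFor are hypothetical, the nested exception class 
is a local stand-in for the POI exception named above, and matching on the 
simple class name is an assumption made so that the sketch needs no Tika, POI 
or PDFBox dependency.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch, not Solr code: decide how to answer when extraction fails. */
final class ExtractionFailurePolicy {

    // Simple names of the encryption-related exceptions mentioned in the comment.
    private static final Set<String> ENCRYPTION_CAUSES = new HashSet<>(
            Arrays.asList("EncryptedDocumentException", "CryptographyException"));

    /** Walks the cause chain looking for a known encryption-related exception. */
    static boolean isEncryptionFailure(Throwable t) {
        // The depth limit guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (ENCRYPTION_CAUSES.contains(t.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the HTTP status a handler might send: 200 (index metadata only)
     * when the failure is tolerated, 500 otherwise.
     */
    static int statusFor(Throwable failure, boolean ignoreTikaException) {
        if (ignoreTikaException || isEncryptionFailure(failure)) {
            return 200;
        }
        return 500;
    }

    // Local stand-in so the sketch compiles without POI on the classpath.
    static class EncryptedDocumentException extends RuntimeException {}

    public static void main(String[] args) {
        Exception wrapped =
                new Exception("TikaException stand-in", new EncryptedDocumentException());
        System.out.println(statusFor(wrapped, false));                  // encrypted document
        System.out.println(statusFor(new Exception("unknown"), false)); // any other failure
        System.out.println(statusFor(new Exception("unknown"), true));  // parameter set
    }
}
```

With a policy like this, a crawler such as ManifoldCF would only see a 500 for 
failures that may be worth retrying, which is the behaviour the proposal asks for.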




Re: modularization discussion

2011-04-28 Thread Grant Ingersoll


On Apr 26, 2011, at 11:12 PM, Chris Male gento...@gmail.com wrote:

  The two sides/takes seem to be (with some example reasons):
  1. pro: for example, modularization can expose features that were
  traditionally in solr to lucene users.
 
 Some other Pros:
 Easier to test individual pieces.  Easier to benchmark.
 More usage == more/better features/functionality for everyone.
 Easier for people to contribute to without having to know the full stack.
 I think most people agree that decoupled, reusable modules are a good thing 
 in general as an abstract concept, but, of course, specifics matter.
 
  2. con: for example, modularization slows development of these
  features and they will evolve slower if they are in lucene.
 
 
 I think this needs a bit more explanation.  AIUI, the primary cause for 
 concern is that by making something a module, you are taking a private, 
 internal API of Solr's and now making it a public API that must be maintained 
 (and backwards maintained) which could slow down development as one now needs 
 to be concerned with more factors than you would if it were merely an 
 implementation detail in Solr.
 
 I feel this can be flipped around and seen as a pro though too.  

Agreed. Wasn't sure where to put it. Some see it as bad, some as good.

 Taking internal code and making it public can be beneficial for that code, 
 because it forces the APIs to be examined, test coverage improved, and a 
 general 'kicking of the tyres'.  With private internal APIs, there is always 
 a temptation to make quick changes that meet an immediate need, rather than 
 having to step back and take more time considering changes.  That can slow 
 things down yes, but it definitely has its benefits.
  
 
 Other Cons:
 The concern was that Solr just becomes an uninteresting, empty shell that 
 glues together modules. (I don't agree, but wanted to present what I have 
 heard)
 
 
 
  I think we need to somehow get a better understanding of both sides,
  specific examples of portions of the code would be helpful I think.
  Maybe then we can arrive at a compromise so that we aren't so
  frustrated about this issue.
 
 
 
 
 
 -- 
 Chris Male | Software Developer | JTeam BV.| www.jteam.nl


[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026144#comment-13026144
 ] 

Simon Willnauer commented on LUCENE-3023:
-

{quote}
I noticed in the branch the test method is changed to _testIndexingThenDeleting 
(disabled).

However, if i re-enable it (rename it back) it never seems to finish...
{quote}

I just reenabled, fixed and committed that testcase on branch.

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, diffMccand.py, 
 realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026145#comment-13026145
 ] 

Simon Willnauer commented on LUCENE-3023:
-

bq. Attached is the DWPT branch in patch format against trunk (for easier 
reviewing).
Awesome, Thanks Robert!!!




Re: modularization discussion

2011-04-28 Thread Greg Stein
On Wed, Apr 27, 2011 at 09:25:14AM -0400, Yonik Seeley wrote:
...
 But as I said... it seems only fair to meet half way and use the solr namespace
 for some modules and the lucene namespace for others.

Please explain this part to me... I really don't understand.

What does fairness have to do with the codebase? Isn't the whole
point of the Lucene project to create the best code possible, for the
benefit of our worldwide users?

How does the concept of fairness fit into that?

Cheers,
-g




[jira] [Commented] (LUCENE-3033) TestAddIndexes#testAddIndexesWithThreads fails on Realtime

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026156#comment-13026156
 ] 

Simon Willnauer commented on LUCENE-3033:
-

I think I have found the root cause of this issue. There was a chance of 
missing a DWPT during a full flush / commit when the full flush started while 
we were releasing a new DWPT into the pool. Documents subsequently added to 
that DWPT were never flushed, neither on commit nor when the writer was 
closed. This also explains the failures in LUCENE-3035. I committed a fix in 
Revision: 1097156. I will resolve both issues, since neither failure has 
occurred again after running the tests in a while(1) loop for the entire night.

 TestAddIndexes#testAddIndexesWithThreads fails on Realtime
 --

 Key: LUCENE-3033
 URL: https://issues.apache.org/jira/browse/LUCENE-3033
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
 Fix For: Realtime Branch


 Selckin reported two failures on LUCENE-3023 which I unfortunately cannot 
 reproduce at all. Here are the traces:
 {noformat}
   [junit] Testsuite: org.apache.lucene.index.TestAddIndexes
 [junit] Testcase: 
 testAddIndexesWithThreads(org.apache.lucene.index.TestAddIndexes):  FAILED
 [junit] expected:<3160> but was:<3060>
 [junit] junit.framework.AssertionFailedError: expected:<3160> but was:<3060>
 [junit]   at 
 org.apache.lucene.index.TestAddIndexes.testAddIndexesWithThreads(TestAddIndexes.java:783)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)
 [junit] 
 [junit] 
 [junit] Tests run: 18, Failures: 1, Errors: 0, Time elapsed: 14.272 sec
 [junit] 
 [junit] - Standard Error -
 [junit] NOTE: reproduce with: ant test -Dtestcase=TestAddIndexes 
 -Dtestmethod=testAddIndexesWithThreads 
 -Dtests.seed=6128854208955988865:2552774338676281184
 [junit] NOTE: test params are: codec=PreFlex, locale=no_NO_NY, 
 timezone=America/Edmonton
 [junit] NOTE: all tests run in this JVM:
 [junit] [TestToken, TestDateTools, Test2BTerms, TestAddIndexes]
 [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_24 
 (64-bit)/cpus=8,threads=1,free=84731792,total=258080768
 [junit] -  ---
 {noformat}
 and 
 {noformat}
 [junit] Testsuite: org.apache.lucene.index.TestAddIndexes
 [junit] Testcase: 
 testAddIndexesWithThreads(org.apache.lucene.index.TestAddIndexes):  FAILED
 [junit] expected:<3160> but was:<3060>
 [junit] junit.framework.AssertionFailedError: expected:<3160> but was:<3060>
 [junit]   at 
 org.apache.lucene.index.TestAddIndexes.testAddIndexesWithThreads(TestAddIndexes.java:783)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)
 [junit] 
 [junit] 
 [junit] Tests run: 18, Failures: 1, Errors: 0, Time elapsed: 14.841 sec
 [junit] 
 [junit] - Standard Error -
 [junit] NOTE: reproduce with: ant test -Dtestcase=TestAddIndexes 
 -Dtestmethod=testAddIndexesWithThreads 
 -Dtests.seed=4502815121171887759:-6764285049309266272
 [junit] NOTE: test params are: codec=PreFlex, locale=tr_TR, 
 timezone=Mexico/BajaNorte
 [junit] NOTE: all tests run in this JVM:
 [junit] [TestToken, TestDateTools, Test2BTerms, TestAddIndexes]
 [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_24 
 (64-bit)/cpus=8,threads=1,free=163663416,total=243335168
 [junit] -  ---
 {noformat}




[jira] [Resolved] (LUCENE-3033) TestAddIndexes#testAddIndexesWithThreads fails on Realtime

2011-04-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3033.
-

Resolution: Fixed

fixed in Revision: 1097156




[jira] [Commented] (LUCENE-3035) TestIndexWriter.testCommitThreadSafety fails on realtime_search branch

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026159#comment-13026159
 ] 

Simon Willnauer commented on LUCENE-3035:
-

I think I have found the root cause of this issue. There was a chance of 
missing a DWPT during a full flush / commit when the full flush started while 
we were releasing a new DWPT into the pool. Documents subsequently added to 
that DWPT were never flushed, neither on commit nor when the writer was 
closed. This also explains the failures in LUCENE-3033. I committed a fix in 
Revision: 1097156. I will resolve both issues, since neither failure has 
occurred again after running the tests in a while(1) loop for the entire night.


[jira] [Resolved] (LUCENE-3035) TestIndexWriter.testCommitThreadSafety fails on realtime_search branch

2011-04-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3035.
-

Resolution: Fixed

fix in Revision: 1097156

 TestIndexWriter.testCommitThreadSafety fails on realtime_search branch
 --

 Key: LUCENE-3035
 URL: https://issues.apache.org/jira/browse/LUCENE-3035
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
 Fix For: Realtime Branch


 Hudson failed on RT with this error - I wasn't able to reproduce yet
 {noformat}
 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
 -Dtestmethod=testCommitThreadSafety 
 -Dtests.seed=410261592077577885:-4099127561715488589 -Dtests.multiplier=3
 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
 -Dtestmethod=testCommitThreadSafety 
 -Dtests.seed=410261592077577885:-4099127561715488589 -Dtests.multiplier=3
 The following exceptions were thrown by threads:
 *** Thread: Thread-331 ***
 java.lang.RuntimeException: java.lang.AssertionError: term=f:2_0; 
 r=DirectoryReader(segments_6 _8(4.0):Cv7) expected:<1> but was:<0>
   at 
 org.apache.lucene.index.TestIndexWriter$5.run(TestIndexWriter.java:2416)
 Caused by: java.lang.AssertionError: term=f:2_0; r=DirectoryReader(segments_6 
 _8(4.0):Cv7) expected:<1> but was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.lucene.index.TestIndexWriter$5.run(TestIndexWriter.java:2410)
 NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
 f6=MockVariableIntBlock(baseBlockSize=91), 
 f7=MockFixedIntBlock(blockSize=1289), f8=Standard, f9=MockRandom, f1=MockSep, 
 f0=Pulsing(freqCutoff=15), f3=Pulsing(freqCutoff=15), 
 f2=MockFixedIntBlock(blockSize=1289), 
 f5=MockVariableIntBlock(baseBlockSize=91), f4=MockRandom, f=MockSep, 
 c=MockVariableIntBlock(baseBlockSize=91), termVector=SimpleText, 
 d9=SimpleText, d8=MockSep, d5=MockVariableIntBlock(baseBlockSize=91), 
 d4=MockRandom, d7=Standard, d6=SimpleText, d25=Standard, 
 d0=MockVariableIntBlock(baseBlockSize=91), c29=Standard, d24=SimpleText, 
 d1=MockFixedIntBlock(blockSize=1289), c28=MockFixedIntBlock(blockSize=1289), 
 d23=MockVariableIntBlock(baseBlockSize=91), d2=Standard, 
 c27=MockVariableIntBlock(baseBlockSize=91), d22=MockRandom, d3=MockRandom, 
 d21=MockFixedIntBlock(blockSize=1289), 
 d20=MockVariableIntBlock(baseBlockSize=91), 
 c22=MockVariableIntBlock(baseBlockSize=91), c21=MockRandom, 
 c20=Pulsing(freqCutoff=15), d29=MockVariableIntBlock(baseBlockSize=91), 
 c26=SimpleText, d28=MockRandom, c25=MockSep, d27=Pulsing(freqCutoff=15), 
 c24=MockRandom, d26=MockFixedIntBlock(blockSize=1289), c23=Standard, 
 e9=MockRandom, e8=MockFixedIntBlock(blockSize=1289), 
 e7=MockVariableIntBlock(baseBlockSize=91), e6=MockSep, 
 e5=Pulsing(freqCutoff=15), c17=Standard, 
 e3=MockFixedIntBlock(blockSize=1289), d12=SimpleText, c16=SimpleText, 
 e4=Pulsing(freqCutoff=15), d11=MockSep, c19=MockSep, e1=MockSep, 
 d14=Pulsing(freqCutoff=15), c18=Pulsing(freqCutoff=15), e2=SimpleText, 
 d13=MockFixedIntBlock(blockSize=1289), e0=Standard, d10=Standard, 
 d19=Pulsing(freqCutoff=15), c11=SimpleText, c10=MockSep, d16=MockRandom, 
 c13=MockSep, c12=Pulsing(freqCutoff=15), d15=Standard, d18=SimpleText, 
 c15=MockFixedIntBlock(blockSize=1289), d17=MockSep, 
 c14=MockVariableIntBlock(baseBlockSize=91), b3=MockRandom, b2=Standard, 
 b5=SimpleText, b4=MockSep, b7=MockSep, b6=Pulsing(freqCutoff=15), 
 d50=MockVariableIntBlock(baseBlockSize=91), 
 b9=MockFixedIntBlock(blockSize=1289), 
 b8=MockVariableIntBlock(baseBlockSize=91), d43=Pulsing(freqCutoff=15), 
 d42=MockFixedIntBlock(blockSize=1289), d41=SimpleText, d40=MockSep, 
 d47=MockRandom, d46=Standard, b0=SimpleText, 
 d45=MockFixedIntBlock(blockSize=1289), b1=Standard, 
 d44=MockVariableIntBlock(baseBlockSize=91), d49=MockSep, 
 d48=Pulsing(freqCutoff=15), c6=MockVariableIntBlock(baseBlockSize=91), 
 c5=MockRandom, c4=Pulsing(freqCutoff=15), 
 c3=MockFixedIntBlock(blockSize=1289), c9=MockSep, c8=MockRandom, c7=Standard, 
 d30=MockFixedIntBlock(blockSize=1289), d32=MockRandom, d31=Standard, 
 c1=MockVariableIntBlock(baseBlockSize=91), d34=Standard, 
 c2=MockFixedIntBlock(blockSize=1289), d33=SimpleText, d36=MockSep, 
 c0=MockSep, d35=Pulsing(freqCutoff=15), 
 d38=MockVariableIntBlock(baseBlockSize=91), d37=MockRandom, d39=SimpleText, 
 e92=MockFixedIntBlock(blockSize=1289), e93=Pulsing(freqCutoff=15), 
 e90=MockSep, e91=SimpleText, e89=MockVariableIntBlock(baseBlockSize=91), 
 e88=MockSep, e87=Pulsing(freqCutoff=15), e86=SimpleText, e85=MockSep, 
 e84=MockRandom, 

[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2011-04-28 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026175#comment-13026175
 ] 

Chris Male commented on LUCENE-3041:


bq. This is an excellent opportunity to redefine Queries as immutable

What do you envisage this involving? Although not required by the API, most 
rewriting implementations make a new Query and add changes there, leaving 
themselves untouched.  Are you wanting to require this somehow?

 Support Query Visting / Walking
 ---

 Key: LUCENE-3041
 URL: https://issues.apache.org/jira/browse/LUCENE-3041
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Chris Male
Priority: Minor
 Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, 
 LUCENE-3041.patch


 Out of the discussion in LUCENE-2868, it could be useful to add a generic 
 Query Visitor / Walker that could be used for more advanced rewriting, 
 optimizations or anything that requires state to be stored as each Query is 
 visited.
 We could keep the interface very simple:
 {code}
 public interface QueryVisitor {
   Query visit(Query query);
 }
 {code}
 and then use a reflection-based visitor like Earwin suggested, which would 
 allow implementors to provide visit methods for just the Queries that they 
 are interested in.
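
 A reflection-based visitor of this kind might look roughly like the sketch 
 below. It rests on several assumptions: Query, TermQuery and PhraseQuery are 
 empty local stand-ins rather than the Lucene classes, and 
 ReflectiveQueryVisitor, visitDefault and the dispatch rule (use the public 
 visit method declared for the query's runtime class or its nearest 
 superclass) are hypothetical and not taken from the attached patches.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Local stand-ins for the real query classes (assumption for this sketch).
class Query {}
class TermQuery extends Query {}
class PhraseQuery extends Query {}

interface QueryVisitor {
    Query visit(Query query);
}

/** Dispatches to the most specific public visit(X) overload a subclass declares. */
abstract class ReflectiveQueryVisitor implements QueryVisitor {

    @Override
    public final Query visit(Query query) {
        // Walk up from the runtime class until a matching overload is found.
        for (Class<?> c = query.getClass(); c != Query.class; c = c.getSuperclass()) {
            try {
                Method m = getClass().getMethod("visit", c);
                m.setAccessible(true); // the visitor class itself may not be public
                return (Query) m.invoke(this, query);
            } catch (NoSuchMethodException e) {
                // No overload for this type; try its superclass.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new RuntimeException(e);
            }
        }
        return visitDefault(query);
    }

    /** Fallback for query types without a dedicated overload. */
    protected Query visitDefault(Query query) {
        return query;
    }
}

/** Example: counts TermQuery nodes and leaves every other query untouched. */
class TermCountingVisitor extends ReflectiveQueryVisitor {
    int terms;

    public Query visit(TermQuery query) {
        terms++;
        return query;
    }
}
```

 Because visit(Query) is final in the base class, an implementor only adds 
 overloads such as visit(TermQuery); state like the counter above lives in the 
 visitor instance while the queries are walked.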




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3023:


Attachment: LUCENE-3023_iw_iwc_jdoc.patch

I went through the IW and IWC javadocs to bring them up to date. Here is a patch 
against the branch. Review from a native speaker would be very welcome.

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, 
 LUCENE-3023_iw_iwc_jdoc.patch, diffMccand.py, realtime-TestAddIndexes-3.txt, 
 realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026223#comment-13026223
 ] 

Simon Willnauer commented on LUCENE-3023:
-

bq. Can we rename Healthiness -> DocumentsWriterStallControl (or something like 
that)?
I added a TODO for this; let's do that on trunk.

bq. I think we lost this infoStream output from trunk?
I just committed a fix to re-enable it on the branch. It was kind of tricky: 
since we now flush concurrently, NumberFormat must be accessed by a single 
thread. So I added it to DWPT#flush().
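
The NumberFormat point can be illustrated with a small sketch. 
java.text.NumberFormat instances are not safe for concurrent use without 
external synchronization, so one generic workaround, shown here, is to give 
each thread its own instance through a ThreadLocal. This is not the change 
described above, which places the formatter in DWPT#flush(); the class name 
FlushStats, the method describe and the message wording are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper: formats flush statistics safely from several threads. */
final class FlushStats {

    // NumberFormat is not thread-safe, so each thread lazily gets its own copy.
    private static final ThreadLocal<NumberFormat> FORMAT = ThreadLocal.withInitial(() -> {
        NumberFormat nf = NumberFormat.getInstance(Locale.ROOT);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        nf.setGroupingUsed(false);
        return nf;
    });

    /** Builds a log line for a finished flush; the wording is made up. */
    static String describe(int docs, double megabytes) {
        return "flushed " + docs + " docs, " + FORMAT.get().format(megabytes) + " MB";
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> System.out.println(
                Thread.currentThread().getName() + ": " + describe(100, 12.5));
        Thread a = new Thread(r, "flush-a");
        Thread b = new Thread(r, "flush-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Creating the formatter inside the method that uses it, as the comment above 
describes, gives the same guarantee without the ThreadLocal.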




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026229#comment-13026229
 ] 

Simon Willnauer commented on LUCENE-3023:
-

FYI - I committed the javadoc changes to the branch after Mike's +1 on IRC. I also 
marked IWC#setIndexerThreadPool as an expert API.




[jira] [Created] (SOLR-2481) Add support for commitWithin in DataImportHandler

2011-04-28 Thread Sami Siren (JIRA)
Add support for commitWithin in DataImportHandler
-

 Key: SOLR-2481
 URL: https://issues.apache.org/jira/browse/SOLR-2481
 Project: Solr
  Issue Type: Improvement
Reporter: Sami Siren
Priority: Trivial


It looks like DataImportHandler does not support commitWithin. Would be nice if 
it did.




[jira] [Updated] (SOLR-2481) Add support for commitWithin in DataImportHandler

2011-04-28 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren updated SOLR-2481:
-

Attachment: SOLR-2481.patch

initial patch

 Add support for commitWithin in DataImportHandler
 -

 Key: SOLR-2481
 URL: https://issues.apache.org/jira/browse/SOLR-2481
 Project: Solr
  Issue Type: Improvement
Reporter: Sami Siren
Priority: Trivial
 Attachments: SOLR-2481.patch


 It looks like DataImportHandler does not support commitWithin. Would be nice 
 if it did.




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-04-28 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026256#comment-13026256
 ] 

Upayavira commented on SOLR-2399:
-

I've just seen the admin console fail on IE9, with and without compatibility 
mode. Basically, the menu box showed up, but nothing appeared in the right-hand 
box, and there was no loading message, when running on Solr 3.1. The same system 
worked nicely on Firefox Windows, Firefox Mac, etc.

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.0


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 Quick Tour: [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png], 
 [Query-Form|http://files.mathe.is/solr-admin/02_query.png], 
 [Plugins|http://files.mathe.is/solr-admin/05_plugins.png], 
 [Logging|http://files.mathe.is/solr-admin/07_logging.png], 
 [Analysis|http://files.mathe.is/solr-admin/04_analysis.png], 
 [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3023:


Attachment: LUCENE-3023_svndiff.patch

attached (LUCENE-3023_svndiff.patch) is just the output from 'svn diff' after 
merging, for reviewing property changes and similar police work :)


 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, 
 LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
 realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3023:


Attachment: LUCENE-3023_simonw_review.patch

Here is my first review round. I manually ported some missed merges from trunk 
and fixed some really minor things.

I will commit to the branch shortly if nobody objects.

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, 
 LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
 LUCENE-3023_svndiff.patch, diffMccand.py, realtime-TestAddIndexes-3.txt, 
 realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




Re: [jira] [Commented] (SOLR-2480) Text extraction of password protected files

2011-04-28 Thread Erick Erickson
Hmmm, I'm not sure whether this fits into SOLR-445 or not. Could you add this
comment to that patch discussion so we at least take a look?

Thanks,
Erick

On Thu, Apr 28, 2011 at 2:03 AM, Shinichiro Abe (JIRA) j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026137#comment-13026137
  ]

 Shinichiro Abe commented on SOLR-2480:
 --

 Improvement ideas:
 1. TikaException is always ignored, and only the metadata is indexed.
 2. A new parameter, ignoreTikaException, is provided.
 If it is true, Solr returns a 200 response; if it is false, Solr throws 
 TikaException.
 3. If Solr can catch the internal exception for an encryption error, it changes 
 the return code per exception.
 If it can identify poi.EncryptedDocumentException, 
 pdfbox.exceptions.CryptographyException, etc., it returns 200 or another 
 response code; for any other exception it throws TikaException.
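
Idea 2 above could be sketched roughly as follows. Everything in this sketch is a stand-in so that it compiles on its own: ParseFailure plays the role of TikaException, TextExtractor plays the role of the Tika parser call, and buildDocument is a hypothetical method, not the actual ExtractingDocumentLoader code. The ignoreFailure argument corresponds to the proposed ignoreTikaException parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the proposed flag; all types here are illustrative stand-ins. */
final class ExtractionSketch {

    /** Stand-in for org.apache.tika.exception.TikaException. */
    static class ParseFailure extends Exception {
        ParseFailure(String message) {
            super(message);
        }
    }

    /** Stand-in for the parser: returns body text or fails. */
    interface TextExtractor {
        String extract(byte[] content) throws ParseFailure;
    }

    /**
     * Builds the fields to index. If extraction fails and ignoreFailure is
     * true, only the metadata is indexed; otherwise the failure propagates
     * and the caller turns it into an error response.
     */
    static Map<String, String> buildDocument(byte[] content,
                                             Map<String, String> metadata,
                                             TextExtractor extractor,
                                             boolean ignoreFailure) throws ParseFailure {
        Map<String, String> doc = new LinkedHashMap<>(metadata);
        try {
            doc.put("content", extractor.extract(content));
        } catch (ParseFailure e) {
            if (!ignoreFailure) {
                throw e;
            }
            // Swallowed on purpose: the document is indexed with metadata only.
        }
        return doc;
    }
}
```

Idea 3 would replace the single boolean with a check on the cause of the exception, so that only encryption-related failures are ignored.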

 Text extraction of password protected files
 ---

                 Key: SOLR-2480
                 URL: https://issues.apache.org/jira/browse/SOLR-2480
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 3.1
            Reporter: Shinichiro Abe
            Priority: Minor

 Proposal:
 Some files are password-protected: PDFs and Office documents in 2007 or 97 
 format.
 These files are posted using SolrCell.
 We do not need to read these files if we do not know their read password, 
 so no text can be extracted from them.
 My requirement is that these files be processed normally, without 
 extracting text and without throwing an exception.
 Background:
 Currently, when you post a password-protected file, Solr returns a 500 server error.
 Solr catches the error in ExtractingDocumentLoader and throws TikaException.
 I use ManifoldCF.
 If the Solr server responds with 500, ManifoldCF's judgment is that this
 document should be retried because I have absolutely no idea what
 happened.
 It then retries posting many times without ever getting the password.
 In another case, my customer posts files with embedded images.
 Sometimes Solr seems to throw a TikaException of unknown cause.
 He wants to post just the metadata without extracting text, but the exception 
 stops him from posting.




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026295#comment-13026295
 ] 

Simon Willnauer commented on LUCENE-3023:
-

bq. I will commit shortly to branch if nobody objects
committed... robert is running a new reintegration round now




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Jonathan Rochkind (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026297#comment-13026297
 ] 

Jonathan Rochkind commented on SOLR-2242:
-

Wonderful, much better. Thanks Lance, this is a much clearer and more flexible 
API, consistent with other parts of Solr. (And for a feature I could definitely 
use, thanks Bill.)

But I wonder... should it be facet.numTerms, to group it with the other 
faceting-related params? Or wait, is it already?


 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
 Here is an example on field hgid (without namedistinct):
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="HGPY045FD36D4000A">1</int>
     <int name="HGPY0FBC6690453A9">1</int>
     <int name="HGPY1E44ED6C4FB3B">1</int>
     <int name="HGPY1FA631034A1B8">1</int>
     <int name="HGPY3317ABAC43B48">1</int>
     <int name="HGPY3A17B2294CB5A">5</int>
     <int name="HGPY3ADD2B3D48C39">1</int>
   </lst>
 </lst>
 {code}
 With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
 HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
 HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns the number of rows 
 (7), not the number of values (11).
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="_count_">7</int>
   </lst>
 </lst>
 {code}
 This actually works really well to get the total number of fields for a 
 group.field=hgid. Enjoy!
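
To make the rows-versus-values distinction concrete, here is a small standalone sketch using the hgid counts from the description (six values with a count of 1 and one with a count of 5). The FacetCounts class and its method names are made up for illustration; this is plain Java over a map, not the patch itself and not a Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: distinct facet rows versus the sum of their counts. */
final class FacetCounts {

    /** Number of distinct facet values, which is what namedistinct reports. */
    static int distinctRows(Map<String, Integer> facet) {
        return facet.size();
    }

    /** Sum of the per-value counts, i.e. the number of matching values. */
    static int totalValues(Map<String, Integer> facet) {
        int sum = 0;
        for (int count : facet.values()) {
            sum += count;
        }
        return sum;
    }

    /** The hgid facet from the example above. */
    static Map<String, Integer> hgidExample() {
        Map<String, Integer> facet = new LinkedHashMap<>();
        facet.put("HGPY045FD36D4000A", 1);
        facet.put("HGPY0FBC6690453A9", 1);
        facet.put("HGPY1E44ED6C4FB3B", 1);
        facet.put("HGPY1FA631034A1B8", 1);
        facet.put("HGPY3317ABAC43B48", 1);
        facet.put("HGPY3A17B2294CB5A", 5);
        facet.put("HGPY3ADD2B3D48C39", 1);
        return facet;
    }
}
```

Here distinctRows(hgidExample()) gives the 7 rows and totalValues(hgidExample()) gives the 11 values mentioned above.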




[jira] [Created] (LUCENE-3050) After RT branch lands we should remove DWPT re-use code from indexer

2011-04-28 Thread Michael McCandless (JIRA)
After RT branch lands we should remove DWPT re-use code from indexer


 Key: LUCENE-3050
 URL: https://issues.apache.org/jira/browse/LUCENE-3050
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Priority: Minor


We used to re-use data structures inside DWPT, but after RT we do not, so we 
should remove the places where we go and clean stuff up (e.g. termsHash.reset in 
FreqProxTermsWriter)...




[jira] [Resolved] (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-04-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2324.
-

Resolution: Fixed

We are landing this on trunk via LUCENE-3023.

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, 
 lucene-2324.patch, lucene-2324.patch, test.out, test.out, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.
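
As a rough picture of the approach summarized above, here is a toy model with N independent in-RAM segments, each of which buffers documents and flushes on its own. It is only a sketch under simplifying assumptions: the class names, the flush-by-document-count trigger and the routing by thread id are all invented for illustration, and none of this is the actual IndexWriter or DWPT code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of N independent in-RAM segments; not Lucene code. */
final class RamSegmentsSketch {

    /** One private in-RAM segment with its own buffer and flush counter. */
    static final class RamSegment {
        private final List<String> buffered = new ArrayList<>();
        private final int maxBufferedDocs;
        private int flushCount;

        RamSegment(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        synchronized void add(String doc) {
            buffered.add(doc);
            if (buffered.size() >= maxBufferedDocs) {
                flush();
            }
        }

        private void flush() {
            // A real writer would write a new on-disk segment here.
            buffered.clear();
            flushCount++;
        }

        synchronized int flushCount() {
            return flushCount;
        }

        synchronized int bufferedDocs() {
            return buffered.size();
        }
    }

    private final RamSegment[] segments;

    RamSegmentsSketch(int segmentCount, int maxBufferedDocs) {
        segments = new RamSegment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new RamSegment(maxBufferedDocs);
        }
    }

    /** Routes a document to one segment; the thread-id policy is hypothetical. */
    void addDocument(String doc) {
        int slot = (int) (Thread.currentThread().getId() % segments.length);
        segments[slot].add(doc);
    }

    RamSegment segment(int index) {
        return segments[index];
    }
}
```

Because each segment only locks itself, a flush in one segment does not block threads that are adding documents to another one, which is the concurrency benefit the summary describes.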




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3023:


Attachment: LUCENE-3023_svndiff.patch
LUCENE-3023.patch

I resynced up to r1097442 and here are the latest patches (the full patch and 
the svn diff)

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
 LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
 LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
 realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-28 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3023:
---

Attachment: diffSources.patch

I modified the Python script a bit so that it does a recursive diff between two 
directories and produces a patch that can be applied. I also added a usage 
message and a -skipWhitespace option.

I put it under a new 'dev-tools/scripts' dir...

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
 LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
 LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
 diffSources.patch, realtime-TestAddIndexes-3.txt, 
 realtime-TestAddIndexes-5.txt, 
 realtime-TestIndexWriterExceptions-assert-6.txt, 
 realtime-TestIndexWriterExceptions-npe-1.txt, 
 realtime-TestIndexWriterExceptions-npe-2.txt, 
 realtime-TestIndexWriterExceptions-npe-4.txt, 
 realtime-TestOmitTf-corrupt-0.txt


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.




[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7527 - Failure

2011-04-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7527/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.SolrExampleBinaryTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:380)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)




Build Log (for compile errors):
[...truncated 9061 lines...]






[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure

2011-04-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader

Error Message:
expected:<128> but was:<129>

Stack Trace:
junit.framework.AssertionFailedError: expected:<128> but was:<129>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657)




Build Log (for compile errors):
[...truncated 3227 lines...]






Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure

2011-04-28 Thread Michael McCandless
This is actually a trunk failure:

ant test -Dtestcase=TestDeletionPolicy
-Dtestmethod=testKeepLastNDeletionPolicyWithReader
-Dtests.seed=4048962790831405116:1429420683736794142
-Dtests.multiplier=3

I'm hunting...

Mike

http://blog.mikemccandless.com

On Thu, Apr 28, 2011 at 4:42 PM, Apache Jenkins Server
hud...@hudson.apache.org wrote:
 Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader

 Error Message:
 expected:<128> but was:<129>

 Stack Trace:
 junit.framework.AssertionFailedError: expected:<128> but was:<129>
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
        at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657)




 Build Log (for compile errors):
 [...truncated 3227 lines...]






Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 88 - Failure

2011-04-28 Thread Michael McCandless
OK fixed...

Mike

http://blog.mikemccandless.com

On Thu, Apr 28, 2011 at 5:45 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 This is actually a trunk failure:

    ant test -Dtestcase=TestDeletionPolicy
 -Dtestmethod=testKeepLastNDeletionPolicyWithReader
 -Dtests.seed=4048962790831405116:1429420683736794142
 -Dtests.multiplier=3

 I'm hunting...

 Mike

 http://blog.mikemccandless.com

 On Thu, Apr 28, 2011 at 4:42 PM, Apache Jenkins Server
 hud...@hudson.apache.org wrote:
 Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/88/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader

 Error Message:
 expected:<128> but was:<129>

 Stack Trace:
 junit.framework.AssertionFailedError: expected:<128> but was:<129>
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
        at org.apache.lucene.index.TestDeletionPolicy.testKeepLastNDeletionPolicyWithReader(TestDeletionPolicy.java:657)




 Build Log (for compile errors):
 [...truncated 3227 lines...]






[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false

2011-04-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026714#comment-13026714
 ] 

David Smiley commented on SOLR-2191:


Yes, let's do this!  I was just about to log a bug when I found it's already 
been reported.  I had some stupid error in my solr config and it never got 
logged. Because of the error, the core never got registered into the container. 
And then when I went to do any queries, Solr kept telling me I didn't specify a 
core name when I never had to before (using the default). I was in the twilight 
zone for a while.

Mark, you're a committer yet you supplied a patch.  Why didn't you simply 
commit it?  I heard on the dev list recently that Solr is supposedly CTR 
(commit then review), yet we clearly act here as RTC.  So even if RTC is it, 
wouldn't there be some threshold to let simple things like this through without 
a review?

 Change SolrException cstrs that take Throwable to default to 
 alreadyLogged=false
 

 Key: SOLR-2191
 URL: https://issues.apache.org/jira/browse/SOLR-2191
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
 Fix For: Next

 Attachments: SOLR-2191.patch


 Because of misuse, many exceptions are now not logged at all - can be painful 
 when doing dev. I think we should flip this setting and work at removing any 
 double logging - losing logging is worse (and it almost looks like we lose 
 more logging than we would get in double logging) - and bad 
 solrexception/logging patterns are proliferating.




[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false

2011-04-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026794#comment-13026794
 ] 

Mark Miller commented on SOLR-2191:
---

bq. Why didn't you simply commit it? I heard on the dev list recently that Solr 
is supposedly CTR (commit then review), yet we clearly act here as RTC. 

Depends really - on the change and on the committer. We like to keep trunk 
extra shiny, and I think our practice is good myself. But it's up to each 
committer.

bq. So even if RTC is it, wouldn't there be some threshold to let simple things 
like this through without a review?

Yes - and many small things are simply committed. Likely when I ran into this, 
I was doing other things - and I made a quick patch, but not something I was 
willing to stake my name on as a commit. I like to do a thorough review first. 
And then this just fell off my radar. Sometimes you are just not sure of all of 
the ramifications of your change - a lot of times this is a mini side track 
while I'm doing something else, and so it's nice to just toss up a patch and 
get feedback from the likes of Hossman and others before just cowboying on 
trunk. Again though - each situation is handled by each committer based on 
their level of comfort, and the general culture of the community.

Yeah, this bug is annoying - I'm happy to look at this again soon - I happen to 
be unusually busy at this time, but I'll certainly try to get this in by this 
weekend.




[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false

2011-04-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026799#comment-13026799
 ] 

Mark Miller commented on SOLR-2191:
---

Also keep in mind that Review-Then-Commit at Apache means you need consensus 
and 3 votes before committing I believe:

See

http://www.apache.org/foundation/glossary.html


Consensus Approval
'Consensus approval' refers to a vote (sense 1) which has completed with at 
least three binding +1 votes and no vetos. Compare Majority Approval.

Review-Then-Commit
(Often referenced as 'RTC' or 'R-T-C'.) Commit policy which requires that all 
changes receive consensus approval in order to be committed. Compare 
Commit-Then-Review, and see the description of the voting process.





[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7537 - Failure

2011-04-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7537/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
expected:<2> but was:<3>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<3>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208)




Build Log (for compile errors):
[...truncated 9072 lines...]






[HUDSON] Lucene-trunk - Build # 1545 - Failure

2011-04-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1545/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85)
at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58)
at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
at org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171)
at org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155)
at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:222)
at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:188)
at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:140)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3216)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1753)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1748)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1744)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2463)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180)
at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2719)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)




Build Log (for compile errors):
[...truncated 11899 lines...]






[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026823#comment-13026823
 ] 

Lance Norskog commented on SOLR-2242:
-

I changed it to 'facet.numTerms'.

There is still a big performance problem: numTerms builds the entire list of 
facets and then reports the length of the list. This could be done more 
efficiently. 
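
As a rough illustration of the difference, the sketch below contrasts the two approaches on a plain array of per-term counts. It is not the patch's code: NumTermsSketch, viaList, and viaCounter are hypothetical names, and the int[] stands in for whatever structure the faceting code actually iterates over. The sample counts are the seven hgid values from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

class NumTermsSketch {
    /** Approach described above: build the full list of qualifying terms, then take its size. */
    static int viaList(int[] termCounts, int minCount) {
        List<Integer> kept = new ArrayList<>();
        for (int c : termCounts) {
            if (c >= minCount) {
                kept.add(c); // allocates and boxes one entry per qualifying term
            }
        }
        return kept.size();
    }

    /** Alternative: keep a running counter and never materialise the list. */
    static int viaCounter(int[] termCounts, int minCount) {
        int n = 0;
        for (int c : termCounts) {
            if (c >= minCount) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[] hgidCounts = {1, 1, 1, 1, 1, 5, 1}; // counts from the example in the issue
        System.out.println(viaList(hgidCounts, 1));
        System.out.println(viaCounter(hgidCounts, 1));
    }
}
```

Both methods return the same number; the counter version simply skips the intermediate allocation, which only matters when the facet list itself is not being returned.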

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When you request facet.field=<name of field> you get a list of matches for 
 the distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
 Here is an example on field hgid (without namedistinct):
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="HGPY045FD36D4000A">1</int>
     <int name="HGPY0FBC6690453A9">1</int>
     <int name="HGPY1E44ED6C4FB3B">1</int>
     <int name="HGPY1FA631034A1B8">1</int>
     <int name="HGPY3317ABAC43B48">1</int>
     <int name="HGPY3A17B2294CB5A">5</int>
     <int name="HGPY3ADD2B3D48C39">1</int>
   </lst>
 </lst>
 {code}
 With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
 HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
 HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows 
 (7), not the number of values (11):
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="_count_">7</int>
   </lst>
 </lst>
 {code}
 This works really well for getting the total number of distinct values for a 
 group.field=hgid. Enjoy!




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026826#comment-13026826
 ] 

Bill Bell commented on SOLR-2242:
-

I am not seeing the performance problem.

If you are outputting facets anyway, the loop will run and the list will be 
built regardless, so in that case it is probably as efficient as it can be.
That is why I had the 0/1/2 values. I was reusing the code and just looking at 
the list size:

countFacetTerms.size()
counts.size()

There is a lot of logic in getListedTermCounts() and getTermCountsLimit(). If 
we optimize, and just add a counter, we need to make sure 
the new methods are not forgotten about (test cases?). I have seen that happen 
numerous times.







[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026827#comment-13026827
 ] 

Bill Bell commented on SOLR-2242:
-

Also I thought you wanted to change the name to numNames? I am okay with 
numTerms too.




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026828#comment-13026828
 ] 

Bill Bell commented on SOLR-2242:
-

It would be good to be able to cache the value, instead of building a list that 
is cached too.
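
One way to picture caching just the value is the sketch below. It is an assumption-laden illustration, not anything from the patch: NumTermsCacheSketch and its counter function are hypothetical, and it ignores invalidation, which a real cache would need whenever the index or the query changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

class NumTermsCacheSketch {
    /** Field name -> cached distinct-term count; only the number is retained. */
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    /** Hypothetical hook that does the expensive counting for a field. */
    private final ToIntFunction<String> counter;

    /** Exposed only so the sketch can show how often the expensive path runs. */
    int computations = 0;

    NumTermsCacheSketch(ToIntFunction<String> counter) {
        this.counter = counter;
    }

    /** Returns the cached count, computing it on the first request for a field. */
    int numTerms(String field) {
        return cache.computeIfAbsent(field, f -> {
            computations++;
            return counter.applyAsInt(f);
        });
    }

    public static void main(String[] args) {
        NumTermsCacheSketch c = new NumTermsCacheSketch(f -> 7);
        c.numTerms("hgid");
        c.numTerms("hgid"); // served from the cache
        System.out.println(c.computations);
    }
}
```

The cache entry here is a single boxed integer per field, as opposed to holding on to the whole list of terms.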




[jira] [Commented] (SOLR-2191) Change SolrException cstrs that take Throwable to default to alreadyLogged=false

2011-04-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026830#comment-13026830
 ] 

David Smiley commented on SOLR-2191:


Thanks for the clarification on Review Then Commit (RTC); I verified through 
ASF's documentation that three votes are indeed necessary.  Wow, it'd take 
forever to get things done that way; it seems impractical for anything but 
perhaps security-related code (e.g. crypto). I didn't see any exceptions 
listed for minor changes (e.g. adding documentation, code formatting). I'm 
glad we don't do that officially.

I'm about to take the conversation further on the dev list RE jira issues 
falling through the cracks...




jira issues falling off the radar -- Next JIRA version

2011-04-28 Thread David Smiley (@MITRE.org)
(Comments on SOLR-2191 between Mark & me were starting to get off-topic with
respect to the issue, so I am continuing the conversation here.)

A lot of JIRA issues seem to fall off the radar, IMO. I'm talking about
issues that have patches and are basically ready to go.  There are multiple
ways to address this but at the moment I am going to just bring up one.
Looking at the versions in JIRA that one can assign an issue to
https://issues.apache.org/jira/browse/SOLR#selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel
I see the version named "Next", with this description: "Placeholder for
commiters to track issues that are not ready to commit, but seem close
enough to being ready to warrant focus before the next feature release."
This version and what it implies is a common pattern in the use of JIRA, one
that I too use for projects I manage for my employer. It appears that for the
3.1 release, nobody looked through the issues assigned to "Next", and
consequently, some issues like SOLR-2191 were forgotten despite being ready
to go.  Looking through the wiki I see information on how to do a release
http://wiki.apache.org/solr/HowToRelease and release suggestions, but no
information on what to do in advance of a release.  I also don't see any
administrative tasks on managing the "Next" version in JIRA.  So I think
either the "Next" version should be used effectively or, if that isn't going
to happen, this version should be deleted.

On a related note, I don't know what to make of the "1.5" version, nor what
to make of issues marked as Closed for "Next".  Some house cleaning is in
order.

Thoughts?

~ David Smiley
- Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book



[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name

2011-04-28 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026842#comment-13026842
 ] 

Bill Bell commented on SOLR-2471:
-

Yonik,

I am not sure why we cannot communicate? I know how to AND two dismax 
queries... Just throw them both into two fq params. Pretty simple. There are 
other ways to do it too.

But this is not my question. Is it possible to have two QT parameters in the 
same call to Solr? I would like each of these fq params to have a pre-defined 
qt list of parameters for the localparams. 

fq={!dismax qt=second}bill
fq={!dismax qt=third}tom
q=jones
qt=first
defType=dismax

In solrconfig:

qt=second would be defined with qf=name and other params like mm.
qt=third would be defined with qf=name2 and other params like mm.

But I guess this is not possible since ALL the params are being loaded from 
including the 2 fq ones.

If you want you can close this, but it would be a nice feature.

I would want each fq's localParams to be set by second and third, respectively. 
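
The behaviour being asked for can be sketched as a simple merge. This is only an illustration of the request, not how Solr resolves qt: PerFilterDefaultsSketch, HANDLER_DEFAULTS, and resolve are hypothetical names, and the registry below is a stand-in for the solrconfig entries mentioned above (only the qf values come from this comment; mm is left out because no value is given).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PerFilterDefaultsSketch {
    /** Hypothetical registry: handler name -> its default params. */
    static final Map<String, Map<String, String>> HANDLER_DEFAULTS = new HashMap<>();
    static {
        HANDLER_DEFAULTS.put("second", Collections.singletonMap("qf", "name"));
        HANDLER_DEFAULTS.put("third", Collections.singletonMap("qf", "name2"));
    }

    /**
     * Starts from the defaults named by the filter's own qt, then lets
     * any params written explicitly in the localParams override them.
     */
    static Map<String, String> resolve(Map<String, String> localParams) {
        Map<String, String> out = new LinkedHashMap<>();
        String qt = localParams.get("qt");
        if (qt != null && HANDLER_DEFAULTS.containsKey(qt)) {
            out.putAll(HANDLER_DEFAULTS.get(qt));
        }
        out.putAll(localParams);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fq1 = new LinkedHashMap<>();
        fq1.put("type", "dismax");
        fq1.put("qt", "second");
        System.out.println(resolve(fq1)); // effective params for fq={!dismax qt=second}bill
    }
}
```

Under this scheme each fq would pick up its own qf independently of the request-level qt=first.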


 Localparams not working with 2 fq parameters using qt=name
 --

 Key: SOLR-2471
 URL: https://issues.apache.org/jira/browse/SOLR-2471
 Project: Solr
  Issue Type: Bug
Reporter: Bill Bell

 We are having a problem with the following query. If we have two localparams 
 (using fq) and use QT= it does not work.
 This does not find any results:
 http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname v=$qname}&q=_val_:{!type=dismax qt=namespec v=$qspec} _val_:{!type=dismax qt=dismaxname v=$qname}&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score desc&rows=1000&start=0
 This works okay. It returns a few results.
 http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname v=$qname}&q=_val_:{!type=dismax qf=$qqf v=$qspec} _val_:{!type=dismax qt=dismaxname v=$qname}&qqf=specialties_ngram^1.0 specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score desc&rows=1000&start=0
 We would like to use a QT for both terms but it seems there is some kind of 
 bug when using two localparams and dismax filters with QT.




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-04-28 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026863#comment-13026863
 ] 

Lance Norskog commented on SOLR-2242:
-

bq. There is a lot of logic in getListedTermCounts() and getTermCountsLimit(). 
If we optimize, and just add a counter, we need to make sure the new methods 
are not forgotten about (test cases?). I have seen that happen numerous times.

Ayup. In fact this breaks SimpleFacetsTest. Everything in facets needs tests.
