[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail

2012-05-31 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286433#comment-13286433
 ] 

Luc Vanlerberghe commented on LUCENENET-484:


The failures in the TestSanity test cases are due to a bug in Cleanup, which is 
called whenever a GC is detected in CleanIfNeeded (itself called from several 
places): Cleanup actually drops all cache entries that have live keys instead 
of the other way around!
I also corrected a race condition in WeakKey<T>.Equals (which will probably 
only happen under heavy load, when you least expect it).

I'll post patches with the corrections and updated test cases in a minute...
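
For reference, the intended cleanup is the usual weak-cache reap step: drop the 
entries whose keys have already been collected and keep the ones whose keys are 
still live. The actual fix is in the C# WeakDictionary/WeakKey<T> code attached 
to this issue; the Java sketch below only illustrates the direction of the 
check and uses made-up names.

{code}
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative only: a minimal weak-keyed cache showing the intended reap step.
class WeakKeyCache<K, V> {
  // A real implementation wraps keys in a WeakReference subclass (the WeakKey<T>
  // mentioned above) that delegates equals/hashCode to the referent so lookups work.
  private final Map<WeakReference<K>, V> entries = new HashMap<WeakReference<K>, V>();

  // Correct cleanup: remove entries whose key has been garbage collected
  // (get() == null) and keep entries whose key is still live. The reported bug
  // did the opposite and evicted the live entries.
  void cleanIfNeeded() {
    Iterator<Map.Entry<WeakReference<K>, V>> it = entries.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getKey().get() == null) {
        it.remove();
      }
    }
  }
}
{code}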


 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # Lucene.Net.Util:
 #- TestFieldCacheSanityChecker.TestInsanity1
 #- TestFieldCacheSanityChecker.TestInsanity2
 #- (It's possible all of the insanity tests fail at one point or another)
 # Lucene.Net.Support
 #- TestWeakHashTableMultiThreadAccess.Test
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENENET-484) Some possibly major tests intermittently fail

2012-05-31 Thread Luc Vanlerberghe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luc Vanlerberghe updated LUCENENET-484:
---

Attachment: Lucenenet-484-WeakDictionaryTests.patch

This patch removes WeakHashtable and uses its tests for WeakDictionary instead 
(I actually renamed the test files and updated the tests so subversion would 
keep the history, but the .patch format apparently doesn't keep that info...)

 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3

 Attachments: Lucenenet-484-WeakDictionary.patch, 
 Lucenenet-484-WeakDictionaryTests.patch


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # Lucene.Net.Util:
 #- TestFieldCacheSanityChecker.TestInsanity1
 #- TestFieldCacheSanityChecker.TestInsanity2
 #- (It's possible all of the insanity tests fail at one point or another)
 # Lucene.Net.Support
 #- TestWeakHashTableMultiThreadAccess.Test
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (LUCENENET-493) Make lucene.net culture insensitive (like the java version)

2012-05-31 Thread Luc Vanlerberghe (JIRA)
Luc Vanlerberghe created LUCENENET-493:
--

 Summary: Make lucene.net culture insensitive (like the java 
version)
 Key: LUCENENET-493
 URL: https://issues.apache.org/jira/browse/LUCENENET-493
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Luc Vanlerberghe
 Fix For: Lucene.Net 3.0.3


In Java, conversion of the basic types to and from strings is locale (culture) 
independent. For localized input/output one needs to use the classes in the 
java.text package.
In .Net, conversion of the basic types to and from strings depends on the 
default Culture, unless you specify CultureInfo.InvariantCulture explicitly.

Some of the testcases in lucene.net fail if they are not run on a machine with 
the culture set to US.
In the current version of lucene.net there are patches here and there that try 
to correct some specific cases by using string replacement (like 
System.Double.Parse(s.Replace(".", 
CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator))), but that 
seems really ugly.

I submit a patch here that removes the old workarounds and replaces them by 
calls to classes in the Lucene.Net.Support namespace that try to handle the 
conversions in a compatible way.
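
To make the Java side of that comparison concrete, here is a small standalone 
example (not part of the patch) of the behaviour described above: the basic 
conversions ignore the default locale, while localized output goes through 
java.text.

{code}
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleDemo {
  public static void main(String[] args) {
    // Basic conversions are locale independent: '.' is always the separator.
    String plain = Double.toString(3.14);        // "3.14" on any machine
    double parsed = Double.parseDouble("3.14");  // parses on any machine

    // Localized formatting goes through java.text:
    NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
    String localized = german.format(3.14);      // "3,14"

    System.out.println(plain + " / " + parsed + " / " + localized);
  }
}
{code}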


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENENET-493) Make lucene.net culture insensitive (like the java version)

2012-05-31 Thread Luc Vanlerberghe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luc Vanlerberghe updated LUCENENET-493:
---

Attachment: Lucenenet-493.patch

Makes lucene.net locale/culture independent (like the java version).
Solves a few testcases that fail when run on a machine with a non-US culture.

 Make lucene.net culture insensitive (like the java version)
 ---

 Key: LUCENENET-493
 URL: https://issues.apache.org/jira/browse/LUCENENET-493
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Luc Vanlerberghe
  Labels: patch
 Fix For: Lucene.Net 3.0.3

 Attachments: Lucenenet-493.patch


 In Java, conversion of the basic types to and from strings is locale 
 (culture) independent. For localized input/output one needs to use the 
 classes in the java.text package.
 In .Net, conversion of the basic types to and from strings depends on the 
 default Culture, unless you specify CultureInfo.InvariantCulture explicitly.
 Some of the testcases in lucene.net fail if they are not run on a machine 
 with the culture set to US.
 In the current version of lucene.net there are patches here and there that 
 try to correct some specific cases by using string replacement (like 
 System.Double.Parse(s.Replace(".", 
 CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator))), but that 
 seems really ugly.
 I submit a patch here that removes the old workarounds and replaces them by 
 calls to classes in the Lucene.Net.Support namespace that try to handle the 
 conversions in a compatible way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail

2012-05-31 Thread Christopher Currens (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286699#comment-13286699
 ] 

Christopher Currens commented on LUCENENET-484:
---

Thanks Luc.  This is great stuff.  I'll run the patch on my local box and 
double check everything.  Your help with this is appreciated by all of us!

 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3

 Attachments: Lucenenet-484-WeakDictionary.patch, 
 Lucenenet-484-WeakDictionaryTests.patch


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # Lucene.Net.Util:
 #- TestFieldCacheSanityChecker.TestInsanity1
 #- TestFieldCacheSanityChecker.TestInsanity2
 #- (It's possible all of the insanity tests fail at one point or another)
 # Lucene.Net.Support
 #- TestWeakHashTableMultiThreadAccess.Test
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENENET-484) Some possibly major tests intermittently fail

2012-05-31 Thread Christopher Currens (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Currens updated LUCENENET-484:
--

Description: 
These tests will fail intermittently in Debug or Release mode, in the core test 
suite:

# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# -Lucene.Net.Util:-
#- -TestFieldCacheSanityChecker.TestInsanity1-
#- -TestFieldCacheSanityChecker.TestInsanity2-
#- -(It's possible all of the insanity tests fail at one point or another)-
# -Lucene.Net.Support-
#- -TestWeakHashTableMultiThreadAccess.Test-

TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
WeakHashTable in the Support namespace, since it's been replaced with 
WeakDictionary.

  was:
These tests will fail intermittently in Debug or Release mode, in the core test 
suite:

# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1
#- TestFieldCacheSanityChecker.TestInsanity2
#- (It's possible all of the insanity tests fail at one point or another)
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test

TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
WeakHashTable in the Support namespace, since it's been replaced with 
WeakDictionary.

Environment: All

Applied the patches.  Getting closer to resolving this issue.

 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
 Environment: All
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3

 Attachments: Lucenenet-484-WeakDictionary.patch, 
 Lucenenet-484-WeakDictionaryTests.patch


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # -Lucene.Net.Util:-
 #- -TestFieldCacheSanityChecker.TestInsanity1-
 #- -TestFieldCacheSanityChecker.TestInsanity2-
 #- -(It's possible all of the insanity tests fail at one point or another)-
 # -Lucene.Net.Support-
 #- -TestWeakHashTableMultiThreadAccess.Test-
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #191

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/191/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-31 Thread Christoph Kaser (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286389#comment-13286389
 ] 

Christoph Kaser commented on LUCENE-4077:
-

Thank you, now it works perfectly!

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch, 
 LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows to turn on the tracking 
 of parent scores and the maximum parent score, however there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #337

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/337/

--
[...truncated 10767 lines...]
   [junit4]   2 1081 T3671 oashc.HttpShardHandlerFactory.getParameter Setting 
maximumPoolSize to: 2147483647
   [junit4]   2 1081 T3671 oashc.HttpShardHandlerFactory.getParameter Setting 
maxThreadIdleTime to: 5
   [junit4]   2 1081 T3671 oashc.HttpShardHandlerFactory.getParameter Setting 
sizeOfQueue to: -1
   [junit4]   2 1082 T3671 oashc.HttpShardHandlerFactory.getParameter Setting 
fairnessPolicy to: false
   [junit4]   2 1082 T3671 oascsi.HttpClientUtil.createClient Creating new 
http client, 
config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false
   [junit4]   2 1086 T3673 oasc.SolrCore.registerSearcher [collection1] 
Registered new searcher Searcher@77668827 
main{StandardDirectoryReader(segments_2:3 _0(5.0):C3)}
   [junit4]   2 1086 T3671 oasc.CoreContainer.register registering core: 
collection1
   [junit4]   2 1087 T3671 oas.SolrTestCaseJ4.initCore initCore end
   [junit4]   2 ASYNC  NEW_CORE C220 name=collection1 
org.apache.solr.core.SolrCore@107ede1
   [junit4]   2 1087 T3671 C220 REQ [collection1] webapp=null path=null 
params={q=acspellcheck.count=2qt=/suggest_tstspellcheck.onlyMorePopular=true}
 status=0 QTime=0 
   [junit4]   2 1091 T3671 oas.SolrTestCaseJ4.assertQ SEVERE REQUEST FAILED: 
xpath=//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='ac']/int[@name='numFound'][.='2']
   [junit4]   2xml response was: ?xml version=1.0 
encoding=UTF-8?
   [junit4]   2response
   [junit4]   2lst name=responseHeaderint 
name=status0/intint name=QTime0/int/lstlst name=spellchecklst 
name=suggestions//lst
   [junit4]   2/response
   [junit4]   2
   [junit4]   2request 
was:q=acspellcheck.count=2qt=/suggest_tstspellcheck.onlyMorePopular=true
   [junit4]   2 1091 T3671 oasc.SolrException.log SEVERE REQUEST FAILED: 
q=acspellcheck.count=2qt=/suggest_tstspellcheck.onlyMorePopular=true:java.lang.RuntimeException:
 REQUEST FAILED: 
xpath=//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='ac']/int[@name='numFound'][.='2']
   [junit4]   2xml response was: ?xml version=1.0 
encoding=UTF-8?
   [junit4]   2response
   [junit4]   2lst name=responseHeaderint 
name=status0/intint name=QTime0/int/lstlst name=spellchecklst 
name=suggestions//lst
   [junit4]   2/response
   [junit4]   2
   [junit4]   2request 
was:q=acspellcheck.count=2qt=/suggest_tstspellcheck.onlyMorePopular=true
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:452)
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:426)
   [junit4]   2at 
org.apache.solr.spelling.suggest.SuggesterTest.testReload(SuggesterTest.java:91)
   [junit4]   2at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]   2at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   [junit4]   2at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   [junit4]   2at 
java.lang.reflect.Method.invoke(Method.java:597)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
   [junit4]   2at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
   [junit4]   2at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
   [junit4]   2at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
   [junit4]   2at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
   [junit4]   2at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
   [junit4]   2at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
   [junit4]   2   

Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #338

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/338/

--
[...truncated 10266 lines...]
   [junit4] Completed in 0.93s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.TermVectorComponentTest
   [junit4] Completed in 1.19s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.RAMDirectoryFactoryTest
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestItalianLightStemFilterFactory
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.core.RequestHandlersTest
   [junit4] Completed in 1.23s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestSolrQueryParser
   [junit4] Completed in 0.92s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestSort
   [junit4] Completed in 3.97s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.function.TestFunctionQuery
   [junit4] Completed in 3.11s, 14 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestRealTimeGet
   [junit4] IGNOR/A 0.00s | TestRealTimeGet.testStressRecovery
   [junit4] Assumption #1: FIXME: This test is horribly slow sometimes on 
Windows!
   [junit4]   2 28508 T2206 oas.SolrTestCaseJ4.setUp ###Starting 
testStressRecovery
   [junit4]   2 28508 T2206 oas.SolrTestCaseJ4.tearDown ###Ending 
testStressRecovery
   [junit4]   2
   [junit4] Completed in 28.65s, 8 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.OverseerTest
   [junit4] Completed in 48.54s, 7 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.LeaderElectionTest
   [junit4] Completed in 20.93s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.RecoveryZkTest
   [junit4] Completed in 35.65s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.LeaderElectionIntegrationTest
   [junit4] Completed in 29.88s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.request.TestFaceting
   [junit4] Completed in 12.36s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.update.DirectUpdateHandlerTest
   [junit4] Completed in 2.77s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.update.PeerSyncTest
   [junit4] Completed in 4.51s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.ConvertedLegacyTest
   [junit4] Completed in 3.20s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.StandardRequestHandlerTest
   [junit4] Completed in 0.94s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.SolrCmdDistributorTest
   [junit4] Completed in 1.87s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.IndexBasedSpellCheckerTest
   [junit4] Completed in 1.33s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.request.TestWriterPerf
   [junit4] Completed in 1.12s, 1 test
   [junit4]  
   [junit4] Suite: 
org.apache.solr.search.similarities.TestLMDirichletSimilarityFactory
   [junit4] Completed in 0.17s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.TermsComponentTest
   [junit4] Completed in 1.19s, 13 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.function.SortByFunctionTest
   [junit4] Completed in 2.18s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.SpellCheckCollatorTest
   [junit4] Completed in 2.23s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.SpatialFilterTest
   [junit4] Completed in 1.80s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.PolyFieldTest
   [junit4] Completed in 1.39s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.CopyFieldTest
   [junit4] Completed in 0.67s, 6 tests
   [junit4]  
   [junit4] Suite: 
org.apache.solr.update.processor.FieldMutatingUpdateProcessorTest
   [junit4] Completed in 0.90s, 20 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestDocSet
   [junit4] Completed in 0.70s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.XmlUpdateRequestHandlerTest
   [junit4] Completed in 0.93s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.TestCSVLoader
   [junit4] Completed in 1.24s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.DebugComponentTest
   [junit4] Completed in 1.05s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.JsonLoaderTest
   [junit4] Completed in 0.94s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.response.TestCSVResponseWriter
   [junit4] Completed in 0.86s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.QueryParsingTest
   [junit4] Completed in 0.91s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.SearchHandlerTest
   [junit4] Completed in 0.91s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.UpdateParamsTest
   [junit4] Completed in 0.92s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.ReturnFieldsTest
   

[jira] [Commented] (LUCENE-4090) PerFieldPostingsFormat cannot use name as suffix

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286411#comment-13286411
 ] 

Mark Harwood commented on LUCENE-4090:
--

Thanks for the quick fix, Rob :)
Working fine for me here now.

 PerFieldPostingsFormat cannot use name as suffix
 

 Key: LUCENE-4090
 URL: https://issues.apache.org/jira/browse/LUCENE-4090
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4090.patch, LUCENE-4090.patch


 Currently PFPF just records the name in the metadata, which matches up to the 
 segment suffix. But this isn't enough, e.g. someone can use Pulsing(1) on one 
 field and Pulsing(2) on another field.
 See Mark Harwood's examples struggling with this on LUCENE-4069.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-31 Thread Renaud Delbru

Thanks Robert for the answers,
I'll investigate this approach.
--
Renaud Delbru

On 28/05/12 21:59, Robert Muir (JIRA) wrote:


 [ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284553#comment-13284553
 ]

Robert Muir commented on LUCENE-4055:
-

Well, you can do postingsFormat instanceof PerFieldPostingsFormat + 
postingsFormat.getPostingsFormatForField if you really want.

But keep in mind that PerFieldPostingsFormat is not really special, just one we 
provide for convenience; obviously one could write their own PostingsFormat 
that implements the same thing in a different way.
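
A minimal sketch of that check (the helper below is illustrative only, not part 
of Lucene; package names are assumed to match the 4.0 trunk of the time):

{code}
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;

// Illustrative helper: resolve the format actually used for a field when the
// top-level format happens to be a PerFieldPostingsFormat.
class PostingsFormatResolver {
  static PostingsFormat resolve(PostingsFormat postingsFormat, String field) {
    if (postingsFormat instanceof PerFieldPostingsFormat) {
      return ((PerFieldPostingsFormat) postingsFormat).getPostingsFormatForField(field);
    }
    return postingsFormat; // any other PostingsFormat handles all fields itself
  }
}
{code}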



Refactor SegmentInfo / FieldInfo to make them extensible


 Key: LUCENE-4055
 URL: https://issues.apache.org/jira/browse/LUCENE-4055
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-4055.patch


After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should 
be made abstract so that they can be extended by Codec-s.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java7-64 #192

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/192/

--
[...truncated 14362 lines...]
   [junit4]   2 26209 T3142 oasc.RequestHandlers.initHandlersFromConfig adding 
lazy requestHandler: solr.ReplicationHandler
   [junit4]   2 26209 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /replication: solr.ReplicationHandler
   [junit4]   2 26209 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created standard: solr.StandardRequestHandler
   [junit4]   2 26209 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /get: solr.RealTimeGetHandler
   [junit4]   2 26210 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created dismax: solr.SearchHandler
   [junit4]   2 26210 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created dismaxNoDefaults: solr.SearchHandler
   [junit4]   2 26210 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created mock: org.apache.solr.core.MockQuerySenderListenerReqHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /admin/: org.apache.solr.handler.admin.AdminHandlers
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created defaults: solr.StandardRequestHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig adding 
lazy requestHandler: solr.StandardRequestHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created lazy: solr.StandardRequestHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /update: solr.UpdateRequestHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /terms: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 26211 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 26212 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created tvrh: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 26213 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /mlt: solr.MoreLikeThisHandler
   [junit4]   2 26213 T3142 oasc.RequestHandlers.initHandlersFromConfig 
created /debug/dump: solr.DumpRequestHandler
   [junit4]   2 26214 T3142 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60
   [junit4]   2 26216 T3142 oasc.SolrCore.initDeprecatedSupport WARNING 
solrconfig.xml uses deprecated admin/gettableFiles, Please update your config 
to use the ShowFileRequestHandler.
   [junit4]   2 26217 T3142 oasc.SolrCore.initDeprecatedSupport WARNING adding 
ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, 
SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, 
BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, 
SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, 
SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, 
SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, 
SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, 
SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, 
SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, 
HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, SCHEMA-PHRASESUGGEST.XML, 
STEMDICT.TXT, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, 
SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, 
STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, 
SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, 
STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, 
SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, 
BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, 
SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, 
SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, 
SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, 
STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, 
STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, 
SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, 
BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, 
SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, 
SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, 
PROTWORDS.TXT, 

Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #339

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/339/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3498) ContentStreamUpdateRequest doesn't seem to respect setCommitWithin()

2012-05-31 Thread Christian Moen (JIRA)
Christian Moen created SOLR-3498:


 Summary: ContentStreamUpdateRequest doesn't seem to respect 
setCommitWithin()
 Key: SOLR-3498
 URL: https://issues.apache.org/jira/browse/SOLR-3498
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 3.6
Reporter: Christian Moen


I'm using the below code to post some office format files to Solr using SolrJ. 
It seems like {{setCommitWithin()}} is ignored in my 
{{ContentStreamUpdateRequest}} request, and that I need to use 
{{setParam(UpdateParams.COMMIT_WITHIN, ...)}} instead to get the desired 
effect.

{code}
SolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr");
ContentStreamUpdateRequest updateRequest =
    new ContentStreamUpdateRequest("/update/extract");
updateRequest.addFile(file);
updateRequest.setParam("literal.id", file.getName());
updateRequest.setCommitWithin(1); // Does not work
//updateRequest.setParam(UpdateParams.COMMIT_WITHIN, "1"); // Works
updateRequest.process(solrServer);
{code}

If I use the below

{code}
...
//updateRequest.setCommitWithin(1); // Does not work
updateRequest.setParam(UpdateParams.COMMIT_WITHIN, "1"); // Works
...
{code}

I get the desired result and a commit is being done.

I'm doing this on 3.x, but from quickly glancing over the code (with tired 
eyes) I believe this issue could apply to 4.x as well. I haven't verified this 
yet.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENENET-484) Some possibly major tests intermittently fail

2012-05-31 Thread Luc Vanlerberghe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luc Vanlerberghe updated LUCENENET-484:
---

Attachment: Lucenenet-484-WeakDictionary.patch

Corrects both Clean() and WeakKey<T>.Equals

 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3

 Attachments: Lucenenet-484-WeakDictionary.patch


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # Lucene.Net.Util:
 #- TestFieldCacheSanityChecker.TestInsanity1
 #- TestFieldCacheSanityChecker.TestInsanity2
 #- (It's possible all of the insanity tests fail at one point or another)
 # Lucene.Net.Support
 #- TestWeakHashTableMultiThreadAccess.Test
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition

2012-05-31 Thread Luca Cavanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Cavanna updated LUCENE-4019:
-

Attachment: LUCENE-4019.patch

Hi Chris, 
thanks for your feedback. Here is a new patch containing a new option to 
enable/disable strict affix parsing; it is enabled by default. I also updated 
HunspellStemFilterFactory to expose the new option to Solr.

 Parsing Hunspell affix rules without regexp condition
 -

 Key: LUCENE-4019
 URL: https://issues.apache.org/jira/browse/LUCENE-4019
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6
Reporter: Luca Cavanna
Assignee: Chris Male
 Attachments: LUCENE-4019.patch, LUCENE-4019.patch


 We found out that some recent Dutch hunspell dictionaries contain suffix or 
 prefix rules like the following:
 {code} 
 SFX Na N 1
 SFX Na 0 ste
 {code}
 The rule on the second line doesn't contain the 5th parameter, which should 
 be the condition (a regexp usually). You can usually see a '.' as condition, 
 meaning always (for every character). As explained in LUCENE-3976 the 
 readAffix method throws error. I wonder if we should treat the missing value 
 as a kind of default value, like '.'.  On the other hand I haven't found any 
 information about this within the spec. Any thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition

2012-05-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286474#comment-13286474
 ] 

Chris Male commented on LUCENE-4019:


Hi Luca,

Thanks for taking a shot at this.

I wonder whether we can improve the ParseException message? At the very least 
it should include the line that is causing the problem so people can find it. 
What would be even better is if we also included the line number. The latter is 
probably not so urgent, but it would be handy to have for other parsing errors 
too.

Also I think the changes to the Factory are wrong:

{code}
+  if(strictAffixParsing.equalsIgnoreCase("true")) ignoreCase = true;
+  else if(strictAffixParsing.equalsIgnoreCase("false")) ignoreCase = false;
{code}
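
Presumably the intent is something along these lines (a rough sketch with 
assumed argument and variable names, not taken from the patch): read the option 
into its own strict-parsing flag instead of overwriting ignoreCase.

{code}
// Rough sketch, names assumed: parse the factory argument into its own flag
// (strict parsing enabled by default) instead of touching ignoreCase.
String strictArg = args.get("strictAffixParsing");
boolean strictAffixParsing = (strictArg == null) || Boolean.parseBoolean(strictArg);
{code}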



 Parsing Hunspell affix rules without regexp condition
 -

 Key: LUCENE-4019
 URL: https://issues.apache.org/jira/browse/LUCENE-4019
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6
Reporter: Luca Cavanna
Assignee: Chris Male
 Attachments: LUCENE-4019.patch, LUCENE-4019.patch


 We found out that some recent Dutch hunspell dictionaries contain suffix or 
 prefix rules like the following:
 {code} 
 SFX Na N 1
 SFX Na 0 ste
 {code}
 The rule on the second line doesn't contain the 5th parameter, which should 
 be the condition (a regexp usually). You can usually see a '.' as condition, 
 meaning always (for every character). As explained in LUCENE-3976 the 
 readAffix method throws error. I wonder if we should treat the missing value 
 as a kind of default value, like '.'.  On the other hand I haven't found any 
 information about this within the spec. Any thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #193

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/193/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4097) index was locked because of InterruptedException

2012-05-31 Thread wang (JIRA)
wang created LUCENE-4097:


 Summary: index was locked because of InterruptedException
 Key: LUCENE-4097
 URL: https://issues.apache.org/jira/browse/LUCENE-4097
 Project: Lucene - Java
  Issue Type: Bug
Reporter: wang


The index was locked because of an InterruptedException, and I could do nothing 
but restart Tomcat.
How can I avoid this happening again?
Thanks.

This is the stack trace:
org.apache.lucene.util.ThreadInterruptedException: 
java.lang.InterruptedException
at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4118)
at 
org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2836)
at 
org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2821)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1847)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1800)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1764)
at 
org.opencms.search.CmsSearchManager.updateIndexIncremental(CmsSearchManager.java:2262)
at 
org.opencms.search.CmsSearchManager.updateIndexOffline(CmsSearchManager.java:2306)
at 
org.opencms.search.CmsSearchManager$CmsSearchOfflineIndexThread.run(CmsSearchManager.java:327)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4116)
... 8 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2719 - Failure

2012-05-31 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2719/

1 tests failed.
REGRESSION:  org.apache.lucene.util.packed.TestPackedInts.testIntOverflow

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([26B73FF6A7A21CED:83602BDCBFF8E4B8]:0)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.init(Packed64SingleBlock.java:115)
at 
org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock5.init(Packed64SingleBlock.java:279)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:68)
at 
org.apache.lucene.util.packed.TestPackedInts.testIntOverflow(TestPackedInts.java:303)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)




Build Log (for compile errors):
[...truncated 1559 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition

2012-05-31 Thread Luca Cavanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Cavanna updated LUCENE-4019:
-

Attachment: LUCENE-4019.patch

Yeah, sorry for my mistakes, I corrected them.
And I added the line number to the ParseException.
Let me know if there's something more I can do!

 Parsing Hunspell affix rules without regexp condition
 -

 Key: LUCENE-4019
 URL: https://issues.apache.org/jira/browse/LUCENE-4019
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6
Reporter: Luca Cavanna
Assignee: Chris Male
 Attachments: LUCENE-4019.patch, LUCENE-4019.patch, LUCENE-4019.patch


 We found out that some recent Dutch hunspell dictionaries contain suffix or 
 prefix rules like the following:
 {code} 
 SFX Na N 1
 SFX Na 0 ste
 {code}
 The rule on the second line doesn't contain the 5th parameter, which should 
 be the condition (a regexp usually). You can usually see a '.' as condition, 
 meaning always (for every character). As explained in LUCENE-3976 the 
 readAffix method throws error. I wonder if we should treat the missing value 
 as a kind of default value, like '.'.  On the other hand I haven't found any 
 information about this within the spec. Any thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3350) TextField's parseFieldQuery method not using analyzer's enablePosIncr parameter

2012-05-31 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286531#comment-13286531
 ] 

Tommaso Teofili commented on SOLR-3350:
---

For now I think we can at least remove the useless switches inside the code, as 
the broader discussion about enablePositionIncrements overall isn't trivial.

 TextField's parseFieldQuery method not using analyzer's enablePosIncr 
 parameter
 ---

 Key: SOLR-3350
 URL: https://issues.apache.org/jira/browse/SOLR-3350
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Reporter: Tommaso Teofili
Priority: Minor

 parseFieldQuery method of TextField class just sets 
 {code}
   ...
   boolean enablePositionIncrements = true;
   ...
 {code}
 while that should be taken from Analyzer's configuration.
 The above condition is evaluated afterwards in two points:
 {code}
   ...
   if (enablePositionIncrements) {
 mpq.add((Term[]) multiTerms.toArray(new Term[0]), position);
   } else {
 mpq.add((Term[]) multiTerms.toArray(new Term[0]));
   }
   return mpq;
   ...
   ...
   if (enablePositionIncrements) {
 position += positionIncrement;
 pq.add(new Term(field, term), position);
   } else {
  pq.add(new Term(field, term));
   }
   ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood updated LUCENE-4069:
-

Attachment: BloomFilterPostings40.patch

This is looking more promising.

Running ant test-core 
-Dtests.postingsformat=TestBloomFilteredLucene40Postings now passes all tests 
but causes an OOM exception in 3 tests:
* TestConsistentFieldNumbers.testManyFields
* TestIndexableField.testArbitraryFields
* TestIndexWriter.testManyFields

Any pointers on how to annotate or otherwise avoid the BloomFilter class for 
many-field tests would be welcome. These are not realistic tests for this 
class (we don't expect indexes with 100s of primary-key like fields).

In this patch I've
* added an SPI lookup mechanism for pluggable hash algos.
* documented the file format
* fixed issues with TermVector tests
* changed the API


To use:
BloomFilteringPostingFormat now takes a delegate PostingsFormat and a set of 
field names that are to have bloom filters created.
Fields that are not listed in the filter set can safely be indexed as normal; 
doing so is beneficial because it allows filtered and non-filtered field data 
to co-exist in the same physical files created by the delegate PostingsFormat.
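
A rough usage sketch based only on the description above (the class name and 
constructor signature come from the attached patch and may differ; "id" is just 
an example of a primary-key-like field):

{code}
// Hypothetical wiring based on the description above; the exact constructor
// is defined by the attached patch and may change.
Set<String> bloomFields = new HashSet<String>(Arrays.asList("id"));
PostingsFormat format =
    new BloomFilteringPostingFormat(new Lucene40PostingsFormat(), bloomFields);
{code}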


 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterCodec40.patch, BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood updated LUCENE-4069:
-

Attachment: (was: BloomFilterCodec40.patch)

 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood updated LUCENE-4069:
-

Attachment: (was: BloomFilterPostings40.patch)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood updated LUCENE-4069:
-

Attachment: BloomFilterPostings40.patch

Added missing class


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286555#comment-13286555
 ] 

Robert Muir commented on LUCENE-4069:
-

I don't think the abstract class should be registered in the SPI.

Instead I think the concrete Bloom+Lucene40 that you have in tests should be 
moved into src/java and registered there; just call it Bloom40 or something. 
The abstract API is still available for someone who wants to do something more 
specialized.

This is just like how pulsing (another wrapper) is implemented.
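
For illustration only, here is a minimal sketch of the kind of concrete, SPI-registered class being suggested (this is not code from the patch; it assumes the BloomFilteringPostingsFormat(delegate, fields) constructor quoted later in this thread and that the wrapper can be subclassed with just a constructor; the class name, default field and naming/registration details are assumptions):

{code}
import java.util.Collections;

import org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat;

// Hypothetical concrete pairing of the Bloom wrapper (from the patch) with Lucene40.
// It would be registered by listing this class in
// META-INF/services/org.apache.lucene.codecs.PostingsFormat
public class Bloom40PostingsFormat extends BloomFilteringPostingsFormat {
  public Bloom40PostingsFormat() {
    // "id" as the default bloom-filtered field is purely an assumption for the example
    super(new Lucene40PostingsFormat(), Collections.singleton("id"));
  }
}
{code}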

As far as disabling this for certain tests, import 
o.a.l.util.LuceneTestCase.SuppressCodecs and put something like this at class 
level:

{code}
@SuppressCodecs("Bloom40")
public class TestFoo...

@SuppressCodecs({"Bloom40", "Memory"})
public class TestBar...
{code}

The strings in here can be codec or postings format names.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286562#comment-13286562
 ] 

Robert Muir commented on LUCENE-4069:
-

Seeing the tests in question, though, I don't think you want to disable this for 
these entire test classes.

We don't have a way to disable this on a per-method basis, and I think it's 
generally not possible because many classes create indexes in @BeforeClass etc.

An alternative would be to just pick this less often in RandomCodec: see the 
SimpleText hack :)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286598#comment-13286598
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq. Instead i think the concrete Bloom+Lucene40 that you have in tests should 
be moved into src/java and registered there

What problem would that be trying to solve? Registration (or creation) of any 
BloomFilteringPostingsFormat subclasses is not necessary to decode index 
contents. Offering a Bloom40 would only buy users a pairing of 
Lucene40Postings and Bloom filtering but they would still have to declare which 
fields they want Bloom filtering on at write time. This isn't too hard using 
the code in the existing patch:

{code:title=ThisWorks.java}
final Set<String> bloomFilteredFields = new HashSet<String>();
bloomFilteredFields.add(PRIMARY_KEY_FIELD_NAME);

iwc.setCodec(new Lucene40Codec() {
  BloomFilteringPostingsFormat postingOptions = new BloomFilteringPostingsFormat(
      new Lucene40PostingsFormat(), bloomFilteredFields);

  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return postingOptions;
  }
});
{code}
No extra subclasses/registration required here to read the index built with the 
above setup.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286599#comment-13286599
 ] 

Robert Muir commented on LUCENE-4069:
-

I don't understand why this handles fields. Someone should just pick that with 
PerFieldPostingsFormat.

So you have the abstract wrapper (it takes the wrapped postings format and a 
String name), which is not registered. And you have a concrete impl registered 
that is just abstractWrapper(Lucene40, "Bloom40"): done.
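
Purely as an illustration of that layering (a sketch, not the patch: it assumes a wrapper has been registered under the name "Bloom40", "id" is a placeholder field name, and iwc is the IndexWriterConfig as in the earlier snippet), per-field selection would then live entirely in the codec:

{code}
// Pick the bloom-wrapped format for the primary key only; everything else
// stays on plain Lucene40. PerFieldPostingsFormat does the routing underneath.
iwc.setCodec(new Lucene40Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return "id".equals(field)                  // placeholder primary-key field
        ? PostingsFormat.forName("Bloom40")    // assumes such a format is registered
        : PostingsFormat.forName("Lucene40");
  }
});
{code}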


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286600#comment-13286600
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq. An alternative would be to just pick this less often in RandomCodec: see 
the SimpleText hack 

Another option might be to make the TestBloomFilteredLucene40Postings pick a 
ludicrously small Bitset sizing option for each field so that we can 
accommodate tests that create silly numbers of fields. The bitsets being so 
small will just quickly reach saturation and force all reads to hit the 
underlying FieldsProducer.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #344

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/344/

--
[...truncated 10471 lines...]
   [junit4]   2 18390 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created mock: org.apache.solr.core.MockQuerySenderListenerReqHandler
   [junit4]   2 18391 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created /admin/: org.apache.solr.handler.admin.AdminHandlers
   [junit4]   2 18391 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created defaults: solr.StandardRequestHandler
   [junit4]   2 18391 T2929 oasc.RequestHandlers.initHandlersFromConfig adding 
lazy requestHandler: solr.StandardRequestHandler
   [junit4]   2 18391 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created lazy: solr.StandardRequestHandler
   [junit4]   2 18392 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created /update: solr.UpdateRequestHandler
   [junit4]   2 18392 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created /terms: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 18392 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 18393 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 18393 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 18393 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created tvrh: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 18393 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created /mlt: solr.MoreLikeThisHandler
   [junit4]   2 18394 T2929 oasc.RequestHandlers.initHandlersFromConfig 
created /debug/dump: solr.DumpRequestHandler
   [junit4]   2 18395 T2929 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60
   [junit4]   2 18397 T2929 oasc.SolrCore.initDeprecatedSupport WARNING 
solrconfig.xml uses deprecated admin/gettableFiles, Please update your config 
to use the ShowFileRequestHandler.
   [junit4]   2 18398 T2929 oasc.SolrCore.initDeprecatedSupport WARNING adding 
ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, 
SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, 
BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, 
SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, 
SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, 
SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, 
SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, 
SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, 
SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, 
HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, SCHEMA-PHRASESUGGEST.XML, 
STEMDICT.TXT, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, 
SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, 
STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, 
SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, 
STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, 
SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, 
BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, 
SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, 
SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, 
SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, 
STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, 
STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, 
SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, 
BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, 
SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, 
SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, 
PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, 
SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, OLD_SYNONYMS.TXT, 
SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, 
BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, 
SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, 
COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, 
SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, 
MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, 
BAD-SCHEMA-EXTERNAL-FILEFIELD.XML]
   [junit4]   2 18401 T2929 oass.SolrIndexSearcher.init Opening 
Searcher@48370187 main
   [junit4]   2 18401 T2929 oass.SolrIndexSearcher.init WARNING WARNING: 
Directory impl does not support 

Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2719 - Failure

2012-05-31 Thread Robert Muir
Is this related to the recent packed ints changes? This test
historically required quite a lot of RAM; maybe that sent it over the
edge?

On Thu, May 31, 2012 at 7:17 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2719/

 1 tests failed.
 REGRESSION:  org.apache.lucene.util.packed.TestPackedInts.testIntOverflow

 Error Message:
 Java heap space

 Stack Trace:
 java.lang.OutOfMemoryError: Java heap space
        at 
 __randomizedtesting.SeedInfo.seed([26B73FF6A7A21CED:83602BDCBFF8E4B8]:0)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock.init(Packed64SingleBlock.java:115)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock5.init(Packed64SingleBlock.java:279)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:68)
        at 
 org.apache.lucene.util.packed.TestPackedInts.testIntOverflow(TestPackedInts.java:303)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
        at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at 
 org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
        at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
 org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)




 Build Log (for compile errors):
 [...truncated 1559 lines...]




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286616#comment-13286616
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq. I dont understand why this handles fields. Someone should just pick that 
with perfieldpostingsformat.

That would be inefficient, because your PFPF will see 
BloomFilteringPostingsFormat(field1 + Lucene40) and 
BloomFilteringPostingsFormat(field2 + Lucene40) as fundamentally different 
PostingsFormat instances, and consequently create multiple, differently named 
files, because it assumes these instances may be capable of using radically 
different file structures.
In reality, the choice of BloomFilter with field 1, BloomFilter with field 2, 
or indeed no BloomFilter does not fundamentally alter the underlying delegate 
PostingsFormat's file format - it only adds a supplementary blm file on the 
side with the field summaries. For this reason it is a mistake to configure 
separate BloomFilteringPostingsFormat instances on a per-field basis if they 
can share a common delegate.




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286619#comment-13286619
 ] 

Robert Muir commented on LUCENE-4069:
-

{quote}
That would be inefficient because your PFPF will see 
BloomFilteringPostingsFormat(field1 + Lucene40) and 
BloomFilteringPostingsFormat(field2 + Lucene40) as fundamentally different 
PostingsFormat instances and consequently create multiple files named 
differently because it assumes these instances may be capable of using 
radically different file structures.
{quote}

But adding per-field handling here is not the way to solve this: it's messy.

Per-field handling should all be done at a level above, in 
PerFieldPostingsFormat.

To solve what you speak of, we just need to resolve LUCENE-4093. Then multiple 
postings format instances that are 'the same' will be deduplicated correctly.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #345

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/345/

--
[...truncated 13489 lines...]
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.core.ResourceLoaderTest
   [junit4] Completed in 0.02s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.internal.csv.ExtendedBufferedReaderTest
   [junit4] Completed in 0.02s, 8 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.ReturnFieldsTest
   [junit4] Completed in 0.93s, 10 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.update.DirectUpdateHandlerOptimizeTest
   [junit4] Completed in 0.86s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestLFUCache
   [junit4] Completed in 0.84s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.admin.MBeansHandlerTest
   [junit4] Completed in 0.91s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.DirectUpdateHandlerTest
   [junit4] Completed in 3.21s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTest
   [junit4] Completed in 1.37s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest
   [junit4] Completed in 39.24s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.OverseerTest
   [junit4] Completed in 46.76s, 7 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestPhoneticFilterFactory
   [junit4] Completed in 9.59s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.TestDistributedGrouping
   [junit4] Completed in 20.37s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.TestHashPartitioner
   [junit4] Completed in 4.11s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.TestMultiCoreConfBootstrap
   [junit4] Completed in 3.85s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.PeerSyncTest
   [junit4] Completed in 3.87s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.ConvertedLegacyTest
   [junit4] Completed in 3.03s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestFiltering
   [junit4] Completed in 3.25s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.SolrCoreTest
   [junit4] Completed in 5.75s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.StatsComponentTest
   [junit4] Completed in 5.73s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.SolrInfoMBeanTest
   [junit4] Completed in 1.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.SolrCmdDistributorTest
   [junit4] Completed in 2.18s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.request.TestWriterPerf
   [junit4] Completed in 1.23s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestPseudoReturnFields
   [junit4] Completed in 1.59s, 13 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.admin.ShowFileRequestHandlerTest
   [junit4] Completed in 1.35s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestSurroundQueryParser
   [junit4] Completed in 1.03s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.function.SortByFunctionTest
   [junit4] Completed in 2.31s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.admin.CoreAdminHandlerTest
   [junit4] Completed in 2.35s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.update.DocumentBuilderTest
   [junit4] Completed in 1.28s, 11 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.function.distance.DistanceFunctionTest
   [junit4] Completed in 1.36s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.SpatialFilterTest
   [junit4] Completed in 2.09s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.DocumentAnalysisRequestHandlerTest
   [junit4] Completed in 1.17s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestFoldingMultitermQuery
   [junit4] Completed in 1.53s, 18 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.CurrencyFieldTest
   [junit4] IGNORED 0.00s | CurrencyFieldTest.testPerformance
   [junit4] Cause: Annotated @Ignore()
   [junit4] Completed in 1.49s, 8 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.core.RequestHandlersTest
   [junit4] Completed in 1.14s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.DebugComponentTest
   [junit4] Completed in 1.22s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.FileBasedSpellCheckerTest
   [junit4] Completed in 1.29s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.PrimitiveFieldTypeTest
   [junit4] Completed in 1.54s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestValueSourceCache
   [junit4] Completed in 1.09s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.DisMaxRequestHandlerTest
   [junit4] Completed in 1.24s, 3 tests
   [junit4]  
   [junit4] Suite: 

[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286707#comment-13286707
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq.  To solve what you speak of we just need to resolve LUCENE-4093. 

Presumably the main objective here is that in order to cut down on the number 
of files we store, content consumers of various types should aim to consolidate 
multiple fields' contents into a single file (if they share common config 
choices). 

bq. Then multiple postings format instances that are 'the same' will be 
deduplicated correctly.

The complication in this case is that we essentially have 2 consumers (Bloom 
and Lucene40), one wrapped in the other, with different but overlapping choices 
of fields: e.g. we want a single Lucene40 to process all fields but we want 
Bloom to handle only a subset of them. This will be a tough one for PFPF to 
untangle while we are stuck with a delegating model for composing consumers.

This may be made easier if, instead of delegating a single stream, we have a 
*stream-splitting* capability via a multicast subscription, e.g. the Bloom 
filtering consumer registers interest in content streams for fields A and B 
while Lucene40 consolidates content from fields A, B, C and D. A broadcast 
mechanism feeds each consumer a copy of the relevant stream, and each consumer 
is responsible for inventing its own file-naming convention that avoids 
muddling files.
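
(A purely illustrative sketch of that subscription idea - not an existing or proposed Lucene API; all names here are made up:)

{code}
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.util.BytesRef;

// A consumer declares which fields it wants to observe; a broadcaster feeds
// each subscriber a copy of the relevant postings stream.
public interface FieldStreamSubscriber {
  /** e.g. {"A", "B"} for the Bloom consumer, {"A", "B", "C", "D"} for Lucene40. */
  Set<String> subscribedFields();

  /** Called for each term's postings in a subscribed field. */
  void consume(String field, BytesRef term, DocsEnum postings) throws IOException;
}
{code}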

While that may help for writing streams it doesn't solve the re-assembly of 
producer streams at read-time where BloomFilter absolutely has to position 
itself in front of the standard Lucene40 producer in order to offer fast-fail 
lookups. 

In the absence of a fancy optimised routing mechanism (this all may be 
overkill), my current solution was to put BloomFilter in the delegate chain, 
armed with a subset of field names to observe as a larger array of fields flows 
past to a common delegate. I added some Javadocs to describe the need to do it 
this way for an efficient configuration.
You are right that this is messy (i.e. open to bad configuration), but 
operating this deep down in Lucene that's always a possibility regardless of 
what we put in place.






--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286712#comment-13286712
 ] 

Robert Muir commented on LUCENE-4069:
-

{quote}
but overlapping choices of fields e.g we want a single Lucene40 to process all 
fields but we want Bloom to handle only a subset of these fields.
{quote}

That's not true: I disagree. It's an implementation detail that Bloom as a 
postings format wraps another one (that's just the abstract implementation), and 
the file formats should not expose this in general for any format.

This is true for a number of reasons: e.g. in the pulsing case the wrapped 
writer only gets a subset of the postings, therefore the wrapped writer's files 
are incomplete and an implementation detail.

It's enough here that if you have 5 fields, 2 bloom and 3 not, we detect 
there are only two postings formats in use, regardless of whether you have 2 or 
5 actual object instances.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286717#comment-13286717
 ] 

Robert Muir commented on LUCENE-4069:
-

And separately, you can always contain the number of files even today by:
* using only unique instances yourself when writing (rather than waiting on 
LUCENE-4093)
* using the compound file format.

The purpose of LUCENE-4093 is just to make this simpler, but I opened it as a 
separate issue because it's really solely an optimization, and only for a 
pretty rare case where people are customizing the index format for different 
fields.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #346

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/346/changes


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4096) impossible to CheckIndex if you use norms other than byte[]

2012-05-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286739#comment-13286739
 ] 

Michael McCandless commented on LUCENE-4096:


+1

Not sure why I originally used TermQuery in CheckIndex... I think switching to 
DocsEnum is fine...
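
A rough sketch of the DocsEnum-based approach (not the attached patch; the reader and field name are placeholders, and the exact 4.0-era signatures may differ slightly from this):

{code}
// Walk the terms of a field and count matching docs with the postings APIs,
// with no IndexSearcher, TermQuery or Similarity involved.
Fields fields = MultiFields.getFields(reader);      // reader: an open IndexReader (placeholder)
Bits liveDocs = MultiFields.getLiveDocs(reader);
Terms terms = fields.terms("field");                // "field" is a placeholder name
if (terms != null) {
  TermsEnum termsEnum = terms.iterator(null);
  BytesRef term;
  while ((term = termsEnum.next()) != null) {
    int count = 0;
    DocsEnum docsEnum = termsEnum.docs(liveDocs, null);
    while (docsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      count++;
    }
    // CheckIndex can then compare count against termsEnum.docFreq(), etc.
  }
}
{code}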

 impossible to CheckIndex if you use norms other than byte[]
 ---

 Key: LUCENE-4096
 URL: https://issues.apache.org/jira/browse/LUCENE-4096
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-4096.patch


 I noticed TestCustomNorms had the checkIndexOnClose disabled, but
 I think this is a real problem.
 If someone wants to use e.g. float[] norms, they should be able to run
 CheckIndex.
 CheckIndex is fine with validating any norm type; the problem is that it 
 sometimes creates an IndexSearcher and fires off TermQueries for some 
 calculations. This causes it to (wrongly) fail, because DefaultSimilarity 
 expects single-byte norms.
 I don't think CheckIndex needs to use TermQuery here; we can do this 
 differently so it doesn't use IndexSearcher or TermQuery but just the postings 
 APIs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286744#comment-13286744
 ] 

Steven Rowe commented on LUCENE-4092:
-

I plan on adding the following (as suggested by Robert) as alternations to the 
BUILD_LOG_REGEX for all non-Maven Jenkins jobs (some of these things don't run 
under the Maven jobs, and Maven's output is different enough that it'll require 
separate treatment):

bq. the javadocs warnings task

{noformat}
(?:[^\r\n]*\[javadoc\].*\r?\n)*[^\r\n]*\[javadoc\]\s*[1-9]\d*\s+warnings.*\r?\n
{noformat}

bq. two javadocs checkers in javadocs-lint

Output from javadocs-lint seems to show up only when there's a problem, so any 
output from it will always be extracted by the following regex:

{noformat}
[^\r\n]*javadocs-lint:.*\r?\n(?:[^\r\n]*\[echo\].*\r?\n)*
{noformat}

bq. and the rat-checker

{noformat}
[^\r\n]*rat-sources:\s+\[echo\].*(?:\r?\n[^\r\n]*\[echo\].*)*\s*[1-9]\d*\s+Unknown\s+Licenses.*\r?\n(?:[^\r\n]*\[echo\].*\r?\n)*
{noformat}

Along with two others:

# Compilation failures:
{noformat}
(?:[^\r\n]*\[javac\].*\r?\n)*[^\r\n]*\[javac\]\s*[1-9]\d*\s*error.*\r?\n
{noformat}
# Jenkins FATAL errors:
{noformat}
[^\r\n]*FATAL:(?s:.*)
{noformat}
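
(For anyone wanting to sanity-check these locally, a throwaway snippet like the following works; the sample input is made up to resemble ant's [javadoc] output and is not taken from a real build log:)

{code}
import java.util.regex.Pattern;

public class JavadocWarningsRegexCheck {
  public static void main(String[] args) {
    // First pattern above: a run of [javadoc] lines ending in "N warnings".
    Pattern p = Pattern.compile(
        "(?:[^\\r\\n]*\\[javadoc\\].*\\r?\\n)*[^\\r\\n]*\\[javadoc\\]\\s*[1-9]\\d*\\s+warnings.*\\r?\\n");
    String sample = "    [javadoc] Building index for all packages\n"
                  + "    [javadoc] 3 warnings\n";
    System.out.println(p.matcher(sample).find()); // expected: true
  }
}
{code}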


 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286754#comment-13286754
 ] 

Mark Harwood commented on LUCENE-4069:
--

It's true to say that Bloom is a different case to Pulsing - Bloom does not 
interfere in any way with the normal recording of content in the wrapped 
delegate, whereas Pulsing does.
It may prove useful for us to mark a formal distinction between these 
mutating/non-mutating types so we can treat them differently and provide 
optimisations?


bq. And separately, you can always contain the number of files even today by 
using only unique instances yourself when writing

Contained but not optimal - roughly double the number of required files if I 
want the common case of a primary key indexed with Bloom. I can't see a way of 
indexing with Bloom-plus-Lucene40 on field A, indexing with just Lucene40 
on fields B, C and D, and winding up with only one Lucene40 set of files with a 
common segment suffix. The way I found to achieve this was to add a 
bloomFilteredFields set into my single Bloom+Lucene40 instance used for all 
fields. Is there any other option here currently? 

Looking to the future, LUCENE-4093 may have more scope for optimisation if it 
understands the distinction between mutating wrappers and non-mutating ones and 
how they are composed?




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286756#comment-13286756
 ] 

Robert Muir commented on LUCENE-4069:
-

{quote}
Contained but not optimal - roughly double the number of required files if I 
want the common case of a primary key indexed with Bloom.
{quote}

Then use CFS, it's always optimal (1). 

I really don't think we should make this complex to save 2 or 3 files total 
(even in a complex config with many fields). It's not worth the complexity.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Documenting document limits for Lucene and Solr

2012-05-31 Thread Walter Underwood
Deleted documents use IDs, so you may run out of doc IDs with fewer than 2^31 
searchable documents.

I recommend designing with a lot of slack, maybe using only 75% of IDs. Solr 
might alert when 90% of the space is used.

If you delete everything and then re-add everything without a commit, you will 
use 2X the doc IDs. That isn't even the worst case.

If you reduce or black-out merging, you can end up with serious doc ID 
consumption.

With no merges, if you find lots of near-dupes and routinely replace documents 
with a better version, you can have many deleted documents for each searchable 
one. This can happen with web spidering. If you find five mirrors of a 
million-document site, and find the best one last, you can use five million doc 
IDs for those million docs.

wunder

On May 30, 2012, at 8:52 AM, Jack Krupansky wrote:

 AFAICT, there is no clear documentation of the maximum number of documents 
 that can be stored in a Lucene or Solr index (single core/shard). It appears 
 to be 2^31 since a Lucene document number and the value returned from 
 IW.maxDoc are a Java “int”. Lucene users have that “hint” to guide them, but 
 that hint is never surfaced for Solr users, AFAICT. A few years ago nobody in 
 their right mind would imagine indexing 2 billion documents on a single 
 machine/core, but now people are at least tempted to try. So, it is now more 
 important for people to know about it, up front, not hidden down in the fine 
 print of Lucene file formats.
  
 I wanted to file a Jira on this, but I wanted to check first if anybody knows 
 of an existing Jira for it that maybe was worded in a way that it escaped my 
 semi-diligent searches.
  
 I was also thinking of filing it as two Jiras, one for Lucene and one for 
 Solr since the doc would be in different places. Or, should there be one 
 combined “Lucene/Solr Capacity Limits/Planning” wiki? Unless somebody 
 objects, I’ll file as two separate (but linked) issues.
  
 And, I was also thinking of filing two Jiras for Lucene and Solr to each have 
 a robust check for exceeding the underlying Lucene limit and reporting this 
 exception in a well-defined manner rather than “numFound” or “maxDoc” going 
 negative. But this is separate from the documentation issue, I think. Unless 
 somebody objects, I’ll file these as two separate issues.
  
 Any objection to me filing these four issues?
 
 -- Jack Krupansky
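
To make the headroom idea above concrete, here is a rough sketch of the kind of check being discussed. It is illustrative only: the class name and the 75%/90% thresholds are assumptions taken from Walter's suggestion, not anything Lucene or Solr currently provides; the only real API used is IndexReader.maxDoc().

import org.apache.lucene.index.IndexReader;

/** Sketch only: warn as doc ID consumption approaches the hard limit. */
public class DocIdHeadroomCheck {

  // Doc IDs are Java ints, so the ceiling is Integer.MAX_VALUE (2^31 - 1).
  private static final long MAX_DOC_IDS = Integer.MAX_VALUE;

  /** maxDoc() counts deleted-but-not-yet-merged-away docs, so it tracks ID consumption. */
  public static void check(IndexReader reader) {
    long used = reader.maxDoc();
    double ratio = (double) used / MAX_DOC_IDS;
    if (ratio >= 0.90) {
      System.err.println("ALERT: " + used + " doc IDs used ("
          + (int) (ratio * 100) + "% of the limit)");
    } else if (ratio >= 0.75) {
      System.err.println("WARNING: past the 75% design headroom ("
          + used + " doc IDs used)");
    }
  }
}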






[jira] [Created] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-05-31 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4098:


 Summary: Efficient bulk operations for packed integer arrays
 Key: LUCENE-4098
 URL: https://issues.apache.org/jira/browse/LUCENE-4098
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Reporter: Adrien Grand
Priority: Minor
 Fix For: 4.1


There are some places in Lucene code that {iterate over,set} ranges of values 
of a packed integer array. Because bit-packing implementations (Packed*) tend to 
be slower than direct implementations, this can take a lot of time.

For example, under some scenarii, GrowableWriter can take most of its 
(averaged) {{set}} time in resizing operations.

However, some bit-packing schemes, such as the one that is used by 
{{Packed64SingleBlock*}}, allow to implement efficient bulk operations such as 
get/set/fill. Implementing these bulk operations in 
{{PackedInts.{Reader,Mutable}}} and using them across other components instead 
of their single-value counterpart could help improve performance.
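
As a rough illustration of why bulk operations help, here is a toy model only - the class below is not the PackedInts API, just a Packed64SingleBlock-style layout where no value spans two longs. A per-value {{set}} loop pays the block/shift/mask arithmetic for every value, while a bulk {{fill}} can precompute one packed 64-bit word and write whole blocks at a time.

{code}
import java.util.Arrays;

// Toy packed array: 64/bitsPerValue values per long, values never span two longs.
public class PackedFillSketch {
  final int bitsPerValue;    // e.g. 4
  final int valuesPerBlock;  // 64 / bitsPerValue
  final long mask;
  final long[] blocks;

  PackedFillSketch(int valueCount, int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.valuesPerBlock = 64 / bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
    this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
  }

  long get(int index) {
    int shift = (index % valuesPerBlock) * bitsPerValue;
    return (blocks[index / valuesPerBlock] >>> shift) & mask;
  }

  void set(int index, long value) {  // single-value path: read-modify-write one long
    int block = index / valuesPerBlock;
    int shift = (index % valuesPerBlock) * bitsPerValue;
    blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
  }

  void naiveFill(long value) {       // one set() call per value
    for (int i = 0; i < blocks.length * valuesPerBlock; i++) set(i, value);
  }

  void bulkFill(long value) {        // bulk path: build one packed word, write 64 bits at a time
    long word = 0;
    for (int i = 0; i < valuesPerBlock; i++) word |= (value & mask) << (i * bitsPerValue);
    Arrays.fill(blocks, word);
  }
}
{code}

For simplicity the toy fills the whole array; a real implementation would also handle arbitrary ranges, falling back to per-value writes only for the partial blocks at the edges.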

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4096) impossible to CheckIndex if you use norms other than byte[]

2012-05-31 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4096.
-

   Resolution: Fixed
Fix Version/s: 5.0

 impossible to CheckIndex if you use norms other than byte[]
 ---

 Key: LUCENE-4096
 URL: https://issues.apache.org/jira/browse/LUCENE-4096
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4096.patch


 I noticed TestCustomNorms had the checkIndexOnClose disabled, but
 I think this is a real problem.
 If someone wants to use e.g. float[] norms, they should be able to run
 CheckIndex.
 CheckIndex is fine with validating any norm type; the problem is that it 
 sometimes creates an IndexSearcher and fires off TermQueries for some 
 calculations. This causes it to (wrongly) fail, because DefaultSimilarity 
 expects single-byte norms.
 I don't think CheckIndex needs to use TermQuery here; we can do this 
 differently so it doesn't use IndexSearcher or TermQuery but just the postings 
 APIs.
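
 To illustrate the postings-only approach, here is a toy model - not the actual CheckIndex code, and the Map below merely stands in for the real TermsEnum/DocsEnum APIs. Walking the postings directly lets you compute docFreq and sanity-check doc IDs without any Similarity, so the norm encoding never comes into play.

{code}
import java.util.HashMap;
import java.util.Map;

// Toy model: term -> sorted doc IDs, standing in for a segment's postings.
public class PostingsWalkSketch {
  final Map<String, int[]> postings = new HashMap<String, int[]>();

  /** Enumerate the postings for a term: no IndexSearcher, no scoring, no norms decoded. */
  int checkTermPostings(String term, int maxDoc) {
    int[] docs = postings.get(term);
    if (docs == null) {
      return 0;
    }
    int last = -1;
    int count = 0;
    for (int doc : docs) {
      if (doc <= last || doc >= maxDoc) {
        throw new RuntimeException("broken postings for term=" + term + ": doc=" + doc);
      }
      last = doc;
      count++;
    }
    return count; // docFreq, computed purely from the postings
  }
}
{code}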

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286771#comment-13286771
 ] 

Steven Rowe commented on LUCENE-4092:
-

I'm going to add one more to the regex:

{noformat}
# Third-party dependency license/notice problems
|[^\\r\\n]*validate:.*\\r?\\n[^\\r\\n]*\\[echo\\].*\\r?\\n(?:[^\\r\\n]*\\[licenses\\].*\\r?\\n)*[^\\r\\n]*\\[licenses\\].*[1-9]\\d*\\s+error.*\\r?\\n
{noformat}

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286770#comment-13286770
 ] 

Michael McCandless commented on LUCENE-4077:


Super, thanks Christoph, I'll commit shortly...

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch, 
 LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows to turn on the tracking 
 of parent scores and the maximum parent score, however there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286774#comment-13286774
 ] 

Steven Rowe commented on LUCENE-4092:
-

bq. I'm going to add one more to the regex

Done - added to the configuration on all non-Maven Jenkins jobs

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3312) Break out StorableField from IndexableField

2012-05-31 Thread Nikola Tankovic (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Tankovic updated LUCENE-3312:


Attachment: lucene-3312-patch-04.patch

Patch 04 status: core compiles.

This is an attempt to separate IndexableFields and StorableFields in indexing. 

I introduced oal.index.Document, which holds both types of fields.

I also introduced a StorableFieldType interface and a StoredFieldType class.

Let me know what you think!

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
 use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is possible perf hit for fields that are both indexed &
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-05-31 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4098:
-

Attachment: LUCENE-4098.patch

Here is the patch for the proposed modifications. All {{Mutable}} 
implementations have a new efficient {{fill}} method and Packed64SingleBlock* 
classes also have efficient bulk get and set.

For example, on my computer the following (unrealistic) microbenchmark runs 
more than twice as fast with the patch applied, thanks to the use of 
{{PackedInts.copy}} instead of a naive per-value copy (see 
{{GrowableWriter#ensureCapacity}}).

{code}
for (int k = 0; k < 50; ++k) {
  long start = System.nanoTime();
  GrowableWriter wrt = new GrowableWriter(1, 1 << 22, PackedInts.DEFAULT);
  for (int i = 0; i < 1 << 22; ++i) {
    wrt.set(i, i);
  }
  long end = System.nanoTime();
  System.out.println((end - start) / 100);
  long sum = 0;
  for (int i = 0; i < wrt.size(); ++i) {
    sum += wrt.get(i);
  }
  System.out.println(sum);
}
{code}


 Efficient bulk operations for packed integer arrays
 ---

 Key: LUCENE-4098
 URL: https://issues.apache.org/jira/browse/LUCENE-4098
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Reporter: Adrien Grand
Priority: Minor
 Fix For: 4.1

 Attachments: LUCENE-4098.patch


 There are some places in Lucene code that {iterate over,set} ranges of values 
 of a packed integer array. Because bit-packing implementations (Packed*) tend to 
 be slower than direct implementations, this can take a lot of time.
 For example, under some scenarii, GrowableWriter can take most of its 
 (averaged) {{set}} time in resizing operations.
 However, some bit-packing schemes, such as the one that is used by 
 {{Packed64SingleBlock*}}, allow to implement efficient bulk operations such 
 as get/set/fill. Implementing these bulk operations in 
 {{PackedInts.{Reader,Mutable}}} and using them across other components 
 instead of their single-value counterpart could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Documenting document limits for Lucene and Solr

2012-05-31 Thread Jack Krupansky
Thanks. That’s all good info to be documented for users to be aware of when 
they start pushing the limits.

-- Jack Krupansky

From: Walter Underwood 
Sent: Thursday, May 31, 2012 1:30 PM
To: dev@lucene.apache.org 
Subject: Re: Documenting document limits for Lucene and Solr

Deleted documents use IDs, so you may run out of doc IDs with fewer than 2^31 
searchable documents. 

I recommend designing with a lot of slack, maybe using only 75% of IDs. Solr 
might alert when 90% of the space is used.

If you want to delete everything, then re-add everything without a commit, you 
will use 2X the doc IDs. That isn't even worst case.

If you reduce or black-out merging, you can end up with serious doc ID 
consumption.

With no merges, if you find lots of near-dupes and routinely replace documents 
with a better version, you can have many deleted documents for each searchable 
one. This can happen with web spidering. If you find five mirrors of a 
million-document site, and find the best one last, you can use five million doc 
IDs for those million docs.

wunder

On May 30, 2012, at 8:52 AM, Jack Krupansky wrote:


  AFAICT, there is no clear documentation of the maximum number of documents 
that can be stored in a Lucene or Solr Index (single core/shard). It appears to 
be 2^31 since a Lucene document number and the value returned from IW.maxDoc is 
a Java “int”. Lucene users have that “hint” to guide them, but that hint is 
never surfaced for Solr users, AFAICT. A few years ago nobody in their right 
mind would imagine indexing 2 billion documents in a single machine/core, but 
now people are at least tempted to try. So, it is now more important for people 
to know about it, up front, not hidden down in the fine print of Lucene file 
formats.

  I wanted to file a Jira on this, but I wanted to check first if anybody knows 
of an existing Jira for it that maybe was worded in a way that it escaped my 
semi-diligent searches.

  I was also thinking of filing it as two Jiras, one for Lucene and one for 
Solr since the doc would be in different places. Or, should there be one 
combined “Lucene/Solr Capacity Limits/Planning” wiki? Unless somebody objects, 
I’ll file as two separate (but linked) issues.

  And, I was also thinking of filing two Jiras for Lucene and Solr to each have 
a robust check for exceeding the underlying Lucene limit and reporting this 
exception in a well-defined manner rather than “numFound” or “maxDoc” going 
negative. But this is separate from the documentation issue, I think. Unless 
somebody objects, I’ll file these as two separate issues.

  Any objection to me filing these four issues?

  -- Jack Krupansky






[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286794#comment-13286794
 ] 

Robert Muir commented on LUCENE-4092:
-

awesome! thank you!

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #348

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/348/changes

Changes:

[rmuir] LUCENE-4096: impossible to checkindex if you use norms other than byte[]

--
[...truncated 10535 lines...]
   [junit4]   2 79515 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created /update: solr.UpdateRequestHandler
   [junit4]   2 79515 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created /terms: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 79515 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 79515 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 79516 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 79516 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created tvrh: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 79516 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created /mlt: solr.MoreLikeThisHandler
   [junit4]   2 79517 T3115 oasc.RequestHandlers.initHandlersFromConfig 
created /debug/dump: solr.DumpRequestHandler
   [junit4]   2 79517 T3115 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60
   [junit4]   2 79520 T3115 oasc.SolrCore.initDeprecatedSupport WARNING 
solrconfig.xml uses deprecated admin/gettableFiles, Please update your config 
to use the ShowFileRequestHandler.
   [junit4]   2 79522 T3115 oasc.SolrCore.initDeprecatedSupport WARNING adding 
ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, 
SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, 
BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, 
SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, 
SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, 
SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, 
SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, 
SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, 
SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, 
HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, SCHEMA-PHRASESUGGEST.XML, 
STEMDICT.TXT, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, 
SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, 
STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, 
SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, 
STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, 
SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, 
BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, 
SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, 
SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, 
SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, 
STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, 
STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, 
SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, 
BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, 
SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, 
SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, 
PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, 
SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, OLD_SYNONYMS.TXT, 
SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, 
BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, 
SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, 
COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, 
SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, 
MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, 
BAD-SCHEMA-EXTERNAL-FILEFIELD.XML]
   [junit4]   2 79525 T3115 oass.SolrIndexSearcher.init Opening 
Searcher@3de7e517 main
   [junit4]   2 79525 T3115 oass.SolrIndexSearcher.init WARNING WARNING: 
Directory impl does not support setting indexDir: 
org.apache.lucene.store.MockDirectoryWrapper
   [junit4]   2 79525 T3115 oasu.CommitTracker.init Hard AutoCommit: disabled
   [junit4]   2 79526 T3115 oasu.CommitTracker.init Soft AutoCommit: disabled
   [junit4]   2 79526 T3115 oashc.SpellCheckComponent.inform Initializing 
spell checkers
   [junit4]   2 79534 T3115 oass.DirectSolrSpellChecker.init init: 
{name=direct,classname=DirectSolrSpellChecker,field=lowerfilt,minQueryLength=3}
   [junit4]   2 79574 T3115 oashc.HttpShardHandlerFactory.getParameter Setting 

[jira] [Resolved] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-31 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4077.


   Resolution: Fixed
Fix Version/s: 5.0
   4.0

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch, 
 LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows to turn on the tracking 
 of parent scores and the maximum parent score, however there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286815#comment-13286815
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq. Its not worth the complexity

There's no real added complexity in BloomFilterPostingsFormat - it has to be 
capable of storing blooms for >1 field anyway, and using the fieldname set is 
roughly 2 extra lines of code to see if a TermsConsumer needs wrapping or not.


From the client side you don't have to use this feature - the fieldname set can 
be null, in which case it will wrap all fields sent its way. If you do choose to 
supply a set, the wrapped PostingsFormat will have the advantage of being 
shared for bloomed and non-bloomed fields. We could add a constructor that 
removes the set and mark the others expert.

For me this falls into one of the many faster-if-you-know-about-it 
optimisations, like FieldSelectors or recycling certain objects. Basically a 
useful hint to Lucene to save some extra effort, but one which you don't *need* 
to use.

Lucene-4093 may in future resolve the multi-file issue but I'm not sure it will 
do so without significant complication.

 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/

1 tests failed.
REGRESSION:  org.apache.lucene.util.packed.TestPackedInts.testIntOverflow

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([44E7903FBFDCF43:A1996D29E3A73716]:0)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.init(Packed64SingleBlock.java:115)
at 
org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock3.init(Packed64SingleBlock.java:315)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:64)
at 
org.apache.lucene.util.packed.TestPackedInts.testIntOverflow(TestPackedInts.java:303)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)




Build Log:
${BUILD_LOG_REGEX,regex=(?x:
# Compilation failures
(?:[^\\r\\n]*\\[javac\\].*\\r?\\n)*[^\\r\\n]*\\[javac\\]\\s*[1-9]\\d*\\s*error.*\\r?\\n
# Test failures
|[^\\r\\n]*\\[junit4\\]\\s*Suite:.*[\\r\\n]+[^\\r\\n]*\\[junit4\\]\\s*(?!Completed)(?!IGNOR)\\S(?s:.*?)\\s*FAILURES!
# License problems
|[^\\r\\n]*rat-sources:\\s+\\[echo\\].*(?:\\r?\\n[^\\r\\n]*\\[echo\\].*)*\\s*[1-9]\\d*\\s+Unknown\\s+Licenses.*\\r?\\n(?:[^\\r\\n]*\\[echo\\].*\\r?\\n)*
# Javadocs warnings
|(?:[^\\r\\n]*\\[javadoc\\].*\\r?\\n)*[^\\r\\n]*\\[javadoc\\]\\s*[1-9]\\d*\\s+warnings.*\\r?\\n
# Other javadocs problems (broken links and missing javadocs)
|[^\\r\\n]*javadocs-lint:.*\\r?\\n(?:[^\\r\\n]*\\[echo\\].*\\r?\\n)*
# Third-party dependency license/notice problems
|[^\\r\\n]*validate:.*\\r?\\n[^\\r\\n]*\\[echo\\].*\\r?\\n(?:[^\\r\\n]*\\[licenses\\].*\\r?\\n)*[^\\r\\n]*\\[licenses\\].*[1-9]\\d*\\s+error.*\\r?\\n
# Jenkins problems
|[^\\r\\n]*FATAL:(?s:.*)
)}


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Steven A Rowe
Hmm, looks like spreading the BUILD_LOG_REGEX across multiple lines caused it 
not to be recognized.

Jenkins's email templating functionality is provided by the Jenkins Email 
Extension Plugin (email-ext) 
https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin.

The token parsing is done by 
hudson.plugins.emailext.plugins.ContentBuilder.Tokenizer: 
https://github.com/jenkinsci/email-ext-plugin/blob/master/src/main/java/hudson/plugins/emailext/plugins/ContentBuilder.java#L134

Here's the relevant argument-value regex (used to parse the value of the 
regex argument to the BUILD_LOG_REGEX token): 

private static final String stringRegex = 
\([^\\\r\\n]|(.))*\;

So I *think* if I put a backslash (escaped with another backslash) at the end 
of each line, I can keep the multiple lines (and comments).  I'll give it a try.

Steve

-Original Message-
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] 
Sent: Thursday, May 31, 2012 2:55 PM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/

1 tests failed.
REGRESSION:  org.apache.lucene.util.packed.TestPackedInts.testIntOverflow

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([44E7903FBFDCF43:A1996D29E3A73716]:0)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.init(Packed64SingleBlock.java:115)
at 
org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock3.init(Packed64SingleBlock.java:315)
at 
org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:64)
at 
org.apache.lucene.util.packed.TestPackedInts.testIntOverflow(TestPackedInts.java:303)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)




Build Log:
${BUILD_LOG_REGEX,regex=(?x:
# Compilation failures
(?:[^\\r\\n]*\\[javac\\].*\\r?\\n)*[^\\r\\n]*\\[javac\\]\\s*[1-9]\\d*\\s*error.*\\r?\\n
# Test failures

[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286868#comment-13286868
 ] 

Simon Willnauer commented on LUCENE-4069:
-

bq. I really dont think we should make this complex to save 2 or 3 files total 
(even in a complex config with many fields). Its not worth the complexity.

I agree. I think those postings formats should only deal with encoding and not 
with handling certain fields differently. A user / app should handle this in the 
codec. Ideally you don't have any conditions in the relevant methods like 
termsConsumer etc. 

bq. For me this falls into one of the many faster-if-you-know-about-it 
optimisations like FieldSelectors or recycling certain objects. Basically a 
useful hint to Lucene to save some extra effort but one which you dont need to 
use.

why is this a speed improvement? reading from one file vs. multiple is not 
really faster though.

Anyway, I think we should make this patch as simple as possible and not 
handle fields in the PF. We can still open another issue, or wait until 
LUCENE-4093 is in, to discuss this?

 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286882#comment-13286882
 ] 

Robert Muir commented on LUCENE-4069:
-

{quote}
For me this falls into one of the many faster-if-you-know-about-it 
optimisations like FieldSelectors or recycling certain objects. Basically a 
useful hint to Lucene to save some extra effort but one which you dont need to 
use.
{quote}

I agree with Simon, it's not going to be faster.

Worse, it creates a situation from the per-field perspective where multiple 
postings formats are sharing the same files for a segment.

This would make it harder to do things like refactorings of codec apis in the 
future.

So where is the benefit?

 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #349

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/349/changes


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286887#comment-13286887
 ] 

Dawid Weiss commented on LUCENE-4092:
-

Thanks for working on this, Steve. It'll really be useful.

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-05-31 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286909#comment-13286909
 ] 

Andrzej Bialecki  commented on LUCENE-3312:
---

Comments to patch 04:

* index.Document is an interface; I think for better extensibility in the 
future it could be an abstract class - who knows what we will want to put there 
in addition to the iterators...
* as noted on IRC, this strong decoupling of stored and indexed content poses 
some interesting questions:
** since you can add multiple fields with the same name, you can now add an 
arbitrary sequence of Stored and Indexed fields (all with the same name). This 
means that you can now store parts of a field that are not indexed, and parts 
of a field that are indexed but not stored.
** previously, if a field was flagged as indexed but didn't have a tokenStream, 
its String or Reader value would be used to create a token stream. Now if you 
want a value to be stored and indexed you have to add two fields with the same 
name - one StoredField and the other an IndexedField for which you create a 
token stream from the value. My assumption is that StoredField-s will never be 
used anymore as potential sources of token streams?
* maybe this is a good moment to change all getters that return arrays of 
fields or values to return List-s, since all the code is doing underneath is 
collecting them into lists and then converting to arrays?
* previously we allowed one to remove fields from document by name, are we 
going to allow this now separately for indexed and stored fields?

* minor nit: there's a grammar mistake in Field.setTokenStream(..): 
TokenStream fields tokenized.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
 use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is possible perf hit for fields that are both indexed &
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286916#comment-13286916
 ] 

Mark Harwood commented on LUCENE-4069:
--

bq. why is this a speed improvement?

Sorry - misleading. Replace the word "faster" in my comment with "better" and 
that makes more sense - I mean "better" in terms of resource usage and reduced 
open file handles. This seemed relevant given the earlier comments about Solr's 
use of non-compound files:

bq. [Solr] create massive amounts of files if we did so (add to the fact it 
disables compound files by default and its a disaster...)

I can see there is a useful simplification being sought for here if PerFieldPF 
can consider each of the unique top-level PFs presented to it as looking after 
an exclusive set of files. As the centralised allocator of file names it can 
then simply call each unique PF with a choice of segment suffix to name its 
various files without conflicting with other PFs. Lucene 4093 is all about 
better determining which PF is unique using .equals(). Unfortunately I don't 
think this approach is sufficiently complex. In order to avoid allocating all 
unnecessary file names PerFieldPF would have to further understand the nuances 
of which PFs were being wrapped by other PFs and which wrapped PFs would be 
reusable outside of their wrapped PF (as is the case with BloomPF's wrapped 
PF). That seems a more complex task than implementing equals(). 

So it seems we have 3 options:
1) Ignore the problems of creating too many files in the case of BloomPF and 
any other examples of wrapping PFs
2) Create a PerFieldPF implementation that reuses wrapped PFs using some 
generic means of discovering recyclable wrapped PFs (i.e go further than what 
4093 currently proposes in adding .equals support)
3) Retain my BloomPF-specific solution to the problem for those prepared to use 
lower-level APIs.

Am I missing any other options and which one do you want to go for?



 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286926#comment-13286926
 ] 

Simon Willnauer commented on LUCENE-4069:
-

bq. Create a PerFieldPF implementation that reuses wrapped PFs using some 
generic means of discovering recyclable wrapped PFs (i.e go further than what 
4093 currently proposes in adding .equals support)

I think we should investigate this further. Lets keep this issue simple and 
remove the field handling and fix this on a higher level!


 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286926#comment-13286926
 ] 

Simon Willnauer edited comment on LUCENE-4069 at 5/31/12 8:57 PM:
--

bq. This seemed relevant given the earlier comments about Solr's use of 
non-compound files:
We can't make wrong decisions just because higher-level apps make wrong 
decisions. The dependency goes Solr -> Lucene, not the other way around. We 
provide fine-grained control over when to use CFS, i.e. for smallish segments etc. If 
you have hundreds of fields all using different PFs etc. you have to deal with 
tons of files, but that is, to be honest, not very likely to be the common case.

bq. Create a PerFieldPF implementation that reuses wrapped PFs using some 
generic means of discovering recyclable wrapped PFs (i.e go further than what 
4093 currently proposes in adding .equals support)

I think we should investigate this further. Lets keep this issue simple and 
remove the field handling and fix this on a higher level!

  was (Author: simonw):
bq. Create a PerFieldPF implementation that reuses wrapped PFs using some 
generic means of discovering recyclable wrapped PFs (i.e go further than what 
4093 currently proposes in adding .equals support)

I think we should investigate this further. Lets keep this issue simple and 
remove the field handling and fix this on a higher level!

  
 Segment-level Bloom filters for a 2 x speed up on rare term searches
 

 Key: LUCENE-4069
 URL: https://issues.apache.org/jira/browse/LUCENE-4069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.6, 4.0
Reporter: Mark Harwood
Priority: Minor
 Fix For: 4.0, 3.6.1

 Attachments: BloomFilterPostings40.patch, 
 MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip


 An addition to each segment which stores a Bloom filter for selected fields 
 in order to give fast-fail to term searches, helping avoid wasted disk access.
 Best suited for low-frequency fields e.g. primary keys on big indexes with 
 many segments but also speeds up general searching in my tests.
 Overview slideshow here: 
 http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
 Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
 Patch based on 3.6 codebase attached.
 There are no 3.6 API changes currently - to play just add a field with _blm 
 on the end of the name to invoke special indexing/querying capability. 
 Clearly a new Field or schema declaration(!) would need adding to APIs to 
 configure the service properly.
 Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Michael McCandless
This test intentionally allocates ~256 MB packed ints ... the seed
doesn't fail in isolation, but I think the test fails if it's run with
other tests that leave too much uncollectible stuff allocated in the
heap ...

Can we somehow mark that a test should be run in isolation (its own
new JVM)...?

Another option ... would be to ignore the OOME ... but the risk there
is we suppress a real OOME from a sudden bug in the packed ints.
Though it's unlikely such a breakage would escape our usages of packed
ints... so maybe it's fine.

Mike McCandless

http://blog.mikemccandless.com


On Thu, May 31, 2012 at 2:54 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/

 1 tests failed.
 REGRESSION:  org.apache.lucene.util.packed.TestPackedInts.testIntOverflow

 Error Message:
 Java heap space

 Stack Trace:
 java.lang.OutOfMemoryError: Java heap space
        at 
 __randomizedtesting.SeedInfo.seed([44E7903FBFDCF43:A1996D29E3A73716]:0)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock.init(Packed64SingleBlock.java:115)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock$Packed64SingleBlock3.init(Packed64SingleBlock.java:315)
        at 
 org.apache.lucene.util.packed.Packed64SingleBlock.create(Packed64SingleBlock.java:64)
        at 
 org.apache.lucene.util.packed.TestPackedInts.testIntOverflow(TestPackedInts.java:303)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
        at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at 
 org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
        at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
 org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)




 Build Log:
 ${BUILD_LOG_REGEX,regex=(?x:
 # Compilation failures
 (?:[^\\r\\n]*\\[javac\\].*\\r?\\n)*[^\\r\\n]*\\[javac\\]\\s*[1-9]\\d*\\s*error.*\\r?\\n
 # Test failures
 |[^\\r\\n]*\\[junit4\\]\\s*Suite:.*[\\r\\n]+[^\\r\\n]*\\[junit4\\]\\s*(?!Completed)(?!IGNOR)\\S(?s:.*?)\\s*FAILURES!
 # License problems
 |[^\\r\\n]*rat-sources:\\s+\\[echo\\].*(?:\\r?\\n[^\\r\\n]*\\[echo\\].*)*\\s*[1-9]\\d*\\s+Unknown\\s+Licenses.*\\r?\\n(?:[^\\r\\n]*\\[echo\\].*\\r?\\n)*
 # Javadocs warnings
 |(?:[^\\r\\n]*\\[javadoc\\].*\\r?\\n)*[^\\r\\n]*\\[javadoc\\]\\s*[1-9]\\d*\\s+warnings.*\\r?\\n
 # Other javadocs problems (broken links and missing javadocs)
 

Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Dawid Weiss
 This test intentionally allocates ~256 MB packed ints ... the seed
 doesn't fail in isolation, but I think the test fails if it's run with
 other tests that leave too much uncollectible stuff allocated in the
 heap ...

It doesn't need to be hard refs. With parallel garbage collectors
(with various staged memory pools) and fast allocation rate a thread
may fail with an OOM even if there is theoretically enough space for a
new allocated block. Running with SerialGC typically fixes the problem
but then -- this isn't realistic :)

 Can we somehow mark that a test should be run in isolation (its own
 new JVM)...?

Technically this is possible I think (can't tell how large a refactoring
it would be). But something in me objects to this idea. On the one
hand this is ideal test isolation; on the other hand I bet with time
all tests would just require a forked VM because it's simpler this
way. Good tests should clean up after themselves. I'm idealistic but
I believe tests should be fixed if they don't follow this rule.

 Another option ... would be to ignore the OOME ... but the risk there
 is we suppress a real OOME from a sudden bug in the packed ints.
 Though it's unlikely such a breakage would escape our usages of packed
 ints... so maybe it's fine.

How close are we to the memory limit if run in isolation (as a
stand-alone test case)? We can probably measure this by allocating a
byte[] before the test and doing a binary search on its size depending
on whether it OOMs or not? Maybe it's just really close to the memory
limit?
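
Something along these lines (a throwaway probe with made-up names, not test-framework
code) would do that measurement -- binary-search the largest ballast byte[] that can
be pre-allocated while the test's big packed-ints allocation (~256 MB in this case)
still succeeds:

public class HeadroomProbe {

    public static void main(String[] args) {
        long lo = 0;                                   // known to survive
        long hi = Runtime.getRuntime().maxMemory();    // assumed to fail
        while (hi - lo > (1L << 20)) {                 // stop at 1 MB resolution
            long mid = lo + (hi - lo) / 2;
            if (survives(mid)) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        System.out.println("Largest ballast that still passes: " + (lo >> 20) + " MB");
    }

    private static boolean survives(long ballastBytes) {
        try {
            byte[] ballast = new byte[(int) Math.min(ballastBytes, Integer.MAX_VALUE - 8)];
            long[] packed = new long[256 * 1024 * 1024 / 8];   // stand-in for the test's allocation
            return ballast.length >= 0 && packed.length >= 0;  // keep both arrays live until here
        } catch (OutOfMemoryError e) {
            return false;
        }
    }
}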

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Michael McCandless
On Thu, May 31, 2012 at 5:16 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 This test intentionally allocates ~256 MB packed ints ... the seed
 doesn't fail in isolation, but I think the test fails if it's run with
 other tests that leave too much uncollectible stuff allocated in the
 heap ...

 It doesn't need to be hard refs. With parallel garbage collectors
 (with various staged memory pools) and fast allocation rate a thread
 may fail with an OOM even if there is theoretically enough space for a
 new allocated block. Running with SerialGC typically fixes the problem
 but then -- this isn't realistic :)

Got it.

 Can we somehow mark that a test should be run in isolation (its own
 new JVM)...?

 Technically this is possible I think (can't tell how large a refactoring
 it would be). But something in me objects to this idea. On the one
 hand this is ideal test isolation; on the other hand I bet with time
 all tests would just require a forked VM because it's simpler this
 way. Good tests should clean up after themselves. I'm idealistic but
 I believe tests should be fixed if they don't follow this rule.

Yeah I hear you... hmm do we forcefully clear the FieldCache after
tests...?  Though, in theory once the AtomicReader is collectible the
FC's entries should be too...

 Another option ... would be to ignore the OOME ... but the risk there
 is we suppress a real OOME from a sudden bug in the packed ints.
 Though it's unlikely such a breakage would escape our usages of packed
 ints... so maybe it's fine.

 How close are we to the memory limit if run in isolation (as a
 stand-alone test case)? We can probably measure this by allocating a
 byte[] before the test and doing binary search on its size depending
 on if it OOMs or not? Maybe it's just really close to the memory
 limit?

OK I did that: if I alloc a 68 MB byte[] up front we hit OOME, but with a 67 MB
byte[] the test passes (run in isolation).

That's closer than I expected: the max long[] we alloc in the test is
273 MB.  So 512 - 273 - 68 = 171 MB unexplained... hmm, I think this
is because large arrays are alloc'd directly from the old generation:


http://stackoverflow.com/questions/9738911/javas-serial-garbage-collector-performing-far-better-than-other-garbage-collect

When I run with -XX:NewRatio=10 then I can pre-alloc 191 MB byte[] and
the test still passes ...

I think the best option is to ignore the OOME from this test case...?

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #351

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/351/

--
[...truncated 10516 lines...]
   [junit4]   2at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
   [junit4]   2at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1146)
   [junit4]   2 
   [junit4]   2 168401 T3160 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE 
ERROR: SolrIndexSearcher opens=430 closes=429
   [junit4]   2 NOTE: test params are: codec=Lucene40: {}, 
sim=RandomSimilarityProvider(queryNorm=false,coord=true): {}, locale=is, 
timezone=Europe/Vienna
   [junit4]   2 NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_32 
(64-bit)/cpus=2,threads=2,free=161686656,total=259457024
   [junit4]   2 NOTE: All tests run in this JVM: [SpellCheckCollatorTest, 
TestIndexingPerformance, CloudStateUpdateTest, 
TestRussianLightStemFilterFactory, TestValueSourceCache, 
IndexBasedSpellCheckerTest, OutputWriterTest, TestQueryUtils, TestRecovery, 
TestHindiFilters, TestPropInjectDefaults, TestPseudoReturnFields, 
TermVectorComponentTest, ConvertedLegacyTest, PrimUtilsTest, 
TestIndonesianStemFilterFactory, TestDelimitedPayloadTokenFilterFactory, 
NoCacheHeaderTest, TestCJKWidthFilterFactory, 
TestNorwegianLightStemFilterFactory, TestHTMLStripCharFilterFactory, 
TestThaiWordFilterFactory, TestGroupingSearch, 
TestHungarianLightStemFilterFactory, BasicZkTest, TestPatternTokenizerFactory, 
BadComponentTest, TestFunctionQuery, TestJmxIntegration, QueryParsingTest, 
TestRemoveDuplicatesTokenFilterFactory, TestPhraseSuggestions, TestDocSet, 
DirectSolrConnectionTest, TestSwedishLightStemFilterFactory, TestConfig, 
TestNGramFilters, TestJoin, TestSolrCoreProperties, LukeRequestHandlerTest, 
TestSolrDeletionPolicy1, PolyFieldTest, NotRequiredUniqueKeyTest, 
TestTypeTokenFilterFactory, TestQueryTypes, XsltUpdateRequestHandlerTest, 
TestMappingCharFilterFactory, TestElisionFilterFactory, 
TestFoldingMultitermQuery, CommonGramsFilterFactoryTest, 
QueryElevationComponentTest, LengthFilterTest, TestMergePolicyConfig, 
SolrCoreTest, ShowFileRequestHandlerTest, TestKStemFilterFactory, 
SuggesterWFSTTest, TestItalianLightStemFilterFactory, TestLuceneMatchVersion, 
LeaderElectionTest, LegacyHTMLStripCharFilterTest, 
TestSuggestSpellingConverter, TestIndexSearcher, TestSolrXMLSerializer, 
DateFieldTest, TestMultiCoreConfBootstrap, TestBinaryResponseWriter, 
TestSolrQueryParser, SystemInfoHandlerTest, TestFastLRUCache, 
TestGreekStemFilterFactory, TestPersianNormalizationFilterFactory, 
TestCapitalizationFilterFactory, TestCollationField, TestDistributedGrouping, 
TestShingleFilterFactory, TestDefaultSimilarityFactory, TestFiltering, 
TestLRUCache, TestSolrDeletionPolicy2, DoubleMetaphoneFilterFactoryTest, 
MultiTermTest, TestStopFilterFactory, ZkControllerTest, 
TestLMJelinekMercerSimilarityFactory, TestCodecSupport, 
FieldMutatingUpdateProcessorTest, StatsComponentTest, FullSolrCloudTest, 
TestGalicianStemFilterFactory, SimpleFacetsTest, 
TestJapaneseBaseFormFilterFactory, TestPatternReplaceCharFilterFactory, 
TestPorterStemFilterFactory, TestPropInject, TestBadConfig, 
PrimitiveFieldTypeTest, TestPortugueseStemFilterFactory, EchoParamsTest, 
UpdateRequestProcessorFactoryTest, SignatureUpdateProcessorFactoryTest, 
DocumentAnalysisRequestHandlerTest, TestPortugueseMinimalStemFilterFactory, 
TestBeiderMorseFilterFactory, TestWriterPerf, UpdateParamsTest, 
TestQuerySenderNoQuery, SolrCoreCheckLockOnStartupTest, TestXIncludeConfig, 
PluginInfoTest, TestBM25SimilarityFactory, DocumentBuilderTest, 
ZkSolrClientTest, ZkNodePropsTest, SnowballPorterFilterFactoryTest, 
TestJapanesePartOfSpeechStopFilterFactory, SuggesterFSTTest, 
TestFrenchLightStemFilterFactory, FileBasedSpellCheckerTest, CloudStateTest, 
TestSearchPerf, TestPortugueseLightStemFilterFactory, 
TestHunspellStemFilterFactory, TestHashPartitioner, 
LeaderElectionIntegrationTest, MBeansHandlerTest, TestSurroundQueryParser, 
TermsComponentTest, TestEnglishMinimalStemFilterFactory, 
TestDFRSimilarityFactory, AlternateDirectoryTest, FastVectorHighlighterTest, 
OverseerTest, TestJmxMonitoredMap, JSONWriterTest, PingRequestHandlerTest, 
TestSynonymFilterFactory, TestHyphenationCompoundWordTokenFilterFactory, 
TestOmitPositions, CoreAdminHandlerTest, TestPhoneticFilterFactory, 
DirectUpdateHandlerTest, TestWordDelimiterFilterFactory, CacheHeaderTest, 
SoftAutoCommitTest, DistributedSpellCheckComponentTest, 
TestGermanStemFilterFactory, TestReplicationHandler, TestPerFieldSimilarity, 
MoreLikeThisHandlerTest, TestQuerySenderListener, TestCoreContainer, 
TestKeywordMarkerFilterFactory, SpellPossibilityIteratorTest, 
TestGreekLowerCaseFilterFactory, NumericFieldsTest, TestLFUCache, 
JsonLoaderTest, TestStandardFactories, BadIndexSchemaTest, 
TestReverseStringFilterFactory, SolrRequestParserTest, 
UniqFieldsUpdateProcessorFactoryTest, SolrPluginUtilsTest, 
TestWikipediaTokenizerFactory, SOLR749Test, 

[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286994#comment-13286994
 ] 

Steven Rowe commented on LUCENE-4092:
-

Two problems:

# Spreading the BUILD_LOG_REGEX regex value over multiple lines is not 
supported by Jenkins's email templating functionality, which is provided by the 
Jenkins Email Extension Plugin (email-ext) 
[https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin].  See [the 
configuration token parsing regexes in 
ContentBuilder.Tokenizer|https://github.com/jenkinsci/email-ext-plugin/blob/master/src/main/java/hudson/plugins/emailext/plugins/ContentBuilder.java#L134],
 in particular the comment over the {{stringRegex}} field:{code:java}
// Sequence of (1) not \ " CR LF and (2) \ followed by non line terminator
private static final String stringRegex = "\"([^\\\\\"\\r\\n]|(\\\\.))*\"";{code}
This could be fixed by allowing line terminators to be escaped:{code:java}
// Sequence of (1) not \ " CR LF and (2) \ followed by any non-CR/LF character or (CR)LF
private static final String stringRegex = "\"([^\\\\\"\\r\\n]|(\\\\(?:.|\\r?\\n)))*\"";{code}
I submitted a Jenkins JIRA issue for this: 
[https://issues.jenkins-ci.org/browse/JENKINS-13976].
# [BuildLogRegexContent, the content parser for BUILD_LOG_REGEX, matches 
line-by-line|https://github.com/jenkinsci/email-ext-plugin/blob/master/src/main/java/hudson/plugins/emailext/plugins/content/BuildLogRegexContent.java#L213],
 so regexes targeting multiple lines will fail.  I can see two possible routes 
to address this:
## The BUILD_LOG_EXCERPT token allows specification of begin/end line regexes, 
and includes everything in between matches.  I'm doubtful this will enable 
capture of the stuff we want, though.
## I'll try to add an argument to BUILD_LOG_REGEX to enable multi-line content 
matching, and make a pull request/JIRA issue to get it included in the next 
release of the plugin.

In the mean time, I'll switch the configuration in our Jenkins jobs to the 
following:

{noformat}
Build: ${BUILD_URL}

${FAILED_TESTS}

Build Log:
${BUILD_LOG_REGEX,regex=[ 
\\t]*(?:\\[javac\\]\\s+[1-9]\\d*\\s+error|\\[junit4\\].*\\s+FAILURES!|\\[javadoc\\]\\s+[1-9]\\d*\\s+warning).*,linesBefore=100}
${BUILD_LOG_REGEX,regex=[ 
\\t]*\\[echo\\].*)*\\s*[1-9]\\d*\\s+Unknown\\s+Licenses.*,linesBefore=17,linesAfter=20}
${BUILD_LOG_REGEX,regex=[ \\t]*javadocs-lint:.*,linesBefore=0,linesAfter=75}
${BUILD_LOG_REGEX,regex=.*FATAL:.*,linesBefore=0,linesAfter=100}
{noformat}

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #352

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/352/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287026#comment-13287026
 ] 

Michael McCandless commented on LUCENE-4092:


Steve you are a regexp God.

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Robert Muir
On Thu, May 31, 2012 at 5:51 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 I think the best option is to ignore the OOME from this test case...?

 Mike McCandless


I think that's fine for now, but I'm not convinced there is no problem
at all. However, it's not obvious the problem is us, either.

It's easy to see this OOM is related to the G1 garbage collector.

This test has failed 3 times in the past couple of days (before, it never
failed: I suspect the packed ints changes sent it over the edge).

https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2707/
https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2719/
https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/

All 3 cases are Java 7, and all 3 cases use -XX:+UseG1GC. (Uwe turned
on GC randomization at Lucene Revolution.)


-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3495) UUID and Timestamp Update Processors

2012-05-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3495.


Resolution: Fixed

Committed revision 1344946. - trunk
Committed revision 1344947. - 4x


 UUID and Timestamp Update Processors
 

 Key: SOLR-3495
 URL: https://issues.apache.org/jira/browse/SOLR-3495
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.0

 Attachments: SOLR-3495.patch


 new Update Processors to automatically add fields with new UUIDs and 
 Timestamps to SolrInputDocuments, leveraging SOLR-2802.  Both processors 
 should default to selecting the uniqueKey field if it is the appropriate type.
 This is necessary for 4.0 because of SOLR-2796.
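
Conceptually the behaviour is just "fill in the field if the document doesn't already 
have one"; a plain-Java sketch using a Map as a stand-in for the input document (this 
is not the Solr UpdateRequestProcessor API, just the idea):

{code:java}
import java.util.Date;
import java.util.Map;
import java.util.UUID;

class DefaultValueSketch {
    // Adds a generated UUID and a timestamp only when the document lacks those
    // fields, mirroring the "default value" behaviour described above.
    static void addDefaults(Map<String, Object> doc, String uuidField, String timestampField) {
        if (!doc.containsKey(uuidField)) {
            doc.put(uuidField, UUID.randomUUID().toString());
        }
        if (!doc.containsKey(timestampField)) {
            doc.put(timestampField, new Date());
        }
    }
}
{code}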

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Apache Icon in JIRA ?

2012-05-31 Thread Koji Sekiguchi
Hello everyone,

When I tried to work on LUCENE-4091.patch, I realized that the Apache icon doesn't
appear next to the patch file.

If I remember correctly, the Apache icon is displayed as long as the contributor
checks "Grant License to ASF" when attaching patch files. But I couldn't see
any Apache icons in the past issues in JIRA.

Does anyone know how I can see whether the attached files are granted or not?

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3499) Attachment Test Issue - IGNORE

2012-05-31 Thread Hoss Man (JIRA)
Hoss Man created SOLR-3499:
--

 Summary: Attachment Test Issue - IGNORE
 Key: SOLR-3499
 URL: https://issues.apache.org/jira/browse/SOLR-3499
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


sanity checking attachment licensing indicator in Jira

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Apache Icon in JIRA ?

2012-05-31 Thread Robert Muir
Click the down arrow (options, to the far right side of the
attachments section), then choose "Manage Attachments" and you can see
the Apache icon beside all attachments on the issue.

On Thu, May 31, 2012 at 8:33 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
 Hello everyone,

 When I tried to work on LUCENE-4091.patch, I realized that the Apache icon doesn't
 appear next to the patch file.

 If I remember correctly, the Apache icon is displayed as long as the 
 contributor
 checks "Grant License to ASF" when attaching patch files. But I couldn't see
 any Apache icons in the past issues in JIRA.

 Does anyone know how I can see whether the attached files are granted or not?

 koji
 --
 Query Log Visualizer for Apache Solr
 http://soleami.com/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3499) Attachment Test Issue - IGNORE

2012-05-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3499:
---

Attachment: empty_file_not_intended_for_inclusion.txt

Attaching an empty file and selecting "Attachment not intended for inclusion".

 Attachment Test Issue - IGNORE
 --

 Key: SOLR-3499
 URL: https://issues.apache.org/jira/browse/SOLR-3499
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: empty_file_grant_license.txt, 
 empty_file_not_intended_for_inclusion.txt


 sanity checking attachment licensing indicator in Jira

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3499) Attachment Test Issue - IGNORE

2012-05-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3499:
---

Attachment: empty_file_grant_license.txt

Attaching an empty file and selecting "Grant license to ASF for inclusion in ASF 
works (as per the Apache License §5)".

 Attachment Test Issue - IGNORE
 --

 Key: SOLR-3499
 URL: https://issues.apache.org/jira/browse/SOLR-3499
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: empty_file_grant_license.txt, 
 empty_file_not_intended_for_inclusion.txt


 sanity checking attachment licensing indicator in Jira

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Apache Icon in JIRA ?

2012-05-31 Thread Chris Hostetter
: 
: Click the down arrow (options, to the far right side of the
: attachments section), then choose "Manage Attachments" and you can see
: the Apache icon beside all attachments on the issue.

For quick comparison...

https://issues.apache.org/jira/browse/SOLR-3499
https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12558890

...i'll file an INFRA Jira to see if we can get this back on the main 
issue screen.


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Apache Icon in JIRA ?

2012-05-31 Thread Chris Hostetter

: ...i'll file an INFRA Jira to see if we can get this back on the main 
: issue screen.

Scratch that ... It was already reported and Infra evidently 
considers the matter resolved...

https://issues.apache.org/jira/browse/INFRA-4842

-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Apache Icon in JIRA ?

2012-05-31 Thread Koji Sekiguchi

Robert, Hoss - Thanks! :)


(12/06/01 9:42), Chris Hostetter wrote:

:
: Click the down arrow (options, to the far right side of the
: attachments section), then choose "Manage Attachments" and you can see
: the Apache icon beside all attachments on the issue.

For quick comparison...

https://issues.apache.org/jira/browse/SOLR-3499
https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12558890

...i'll file an INFRA Jira to see if we can get this back on the main
issue screen.


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org





--
Query Log Visualizer for Apache Solr
http://soleami.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4091) FastVectorHighlighter: Getter for FieldFragList.WeightedFragInfo and FieldPhraseList.WeightedPhraseInfo

2012-05-31 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-4091.


   Resolution: Fixed
Fix Version/s: 5.0
   4.0

committed in trunk and 4x.

 FastVectorHighlighter: Getter for FieldFragList.WeightedFragInfo and 
 FieldPhraseList.WeightedPhraseInfo
 ---

 Key: LUCENE-4091
 URL: https://issues.apache.org/jira/browse/LUCENE-4091
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 4.0
Reporter: sebastian L.
Assignee: Koji Sekiguchi
Priority: Minor
  Labels: patch
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4091.patch


 This patch introduces getter-methods for 
 * FieldFragList.WeightedFragInfo and 
 * FieldPhraseList.WeightedPhraseInfo
 in order to make FieldFragList pluggable (see LUCENE-3440).  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2012-05-31 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287088#comment-13287088
 ] 

Koji Sekiguchi commented on LUCENE-3440:


Hi sebastian,

I committed LUCENE-4091 to trunk and branch_4x. As for the credit, I will give it 
in CHANGES.txt when committing the main (LUCENE-3440) patch.

 FastVectorHighlighter: IDF-weighted terms for ordered fragments 
 

 Key: LUCENE-3440
 URL: https://issues.apache.org/jira/browse/LUCENE-3440
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: sebastian L.
Priority: Minor
  Labels: FastVectorHighlighter
 Fix For: 4.0

 Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, 
 LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, 
 weight-vs-boost_table01.html, weight-vs-boost_table02.html


 The FastVectorHighlighter uses an equal weight for every term found in a 
 fragment, which causes fragments with a high number of words or, in the worst 
 case, a high number of very common words to rank higher than fragments that 
 contain *all* of the terms used in the original query. 
 This patch provides ordered fragments with IDF-weighted terms (see the sketch 
 below): 
 total weight = total weight + (IDF for each unique term per fragment * boost of 
 the query); 
 The ranking formula should be the same as, or at least similar to, the one used 
 in org.apache.lucene.search.highlight.QueryTermScorer.
 The patch is simple, but it works for us. 
 Some ideas:
 - A better approach would be moving the whole fragment scoring into a 
 separate class.
 - Switch scoring via a parameter 
 - Exact phrases should be given an even better score, regardless of whether a 
 phrase query was executed or not
 - The edismax/dismax parameters pf, ps and pf^boost should be observed and 
 corresponding fragments should be ranked higher 
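
A compact sketch of the proposed weighting (method and parameter names are 
illustrative, not the patch's API): each *unique* term in a fragment contributes 
its IDF multiplied by the query boost, instead of a constant per-occurrence weight.

{code:java}
import java.util.Map;
import java.util.Set;

class FragmentScoringSketch {
    // total weight = sum over the fragment's unique terms of (IDF * query boost)
    static float scoreFragment(Set<String> uniqueTermsInFragment,
                               Map<String, Float> idfByTerm,
                               float queryBoost) {
        float totalWeight = 0f;
        for (String term : uniqueTermsInFragment) {
            Float idf = idfByTerm.get(term);
            if (idf != null) {
                totalWeight += idf * queryBoost;
            }
        }
        return totalWeight;
    }
}
{code}

Because only unique terms are counted, a fragment stuffed with repeats of one common 
term no longer outranks a fragment that covers all of the rarer query terms.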

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3498) ContentStreamUpdateRequest doesn't seem to respect setCommitWithin()

2012-05-31 Thread Christian Moen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen updated SOLR-3498:
-

Affects Version/s: 4.0

 ContentStreamUpdateRequest doesn't seem to respect setCommitWithin()
 

 Key: SOLR-3498
 URL: https://issues.apache.org/jira/browse/SOLR-3498
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 3.6, 4.0
Reporter: Christian Moen

 I'm using the below code to post some office format files to Solr using 
 SolrJ. It seems like {{setCommitWithin()}} is ignored in my 
 {{ContentStreamUpdateRequest}} request, and that I need to use 
 {{setParam(UpdateParams.COMMIT_WITHIN, ...)}} instead to get the desired 
 effect.
 {code}
 SolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr");
 ContentStreamUpdateRequest updateRequest = new 
 ContentStreamUpdateRequest("/update/extract");
 updateRequest.addFile(file);
 updateRequest.setParam("literal.id", file.getName());
 updateRequest.setCommitWithin(1); // Does not work
 //updateRequest.setParam(UpdateParams.COMMIT_WITHIN, "1"); // Works
 updateRequest.process(solrServer);
 {code}
 If I use the below
 {code}
 ...
 //updateRequest.setCommitWithin(1); // Does not work
 updateRequest.setParam(UpdateParams.COMMIT_WITHIN, "1"); // Works
 ...
 {code}
 I get the desired result and a commit is being done.
 I'm doing this on 3.x, but I believe this issue could apply to 4.x as well 
 (by quickly glancing over the code with tired eyes), but I haven't verified 
 this, yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #355

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/355/changes

Changes:

[koji] LUCENE-4091: add getter methods to FVH, part of LUCENE-3440

[hossman] SOLR-3495: new UpdateProcessors to add default values (constant, 
UUID, or Date) to documents w/o field values

--
[...truncated 10447 lines...]
   [junit4] Completed in 0.17s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.SpellPossibilityIteratorTest
   [junit4] Completed in 0.06s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.XsltUpdateRequestHandlerTest
   [junit4] Completed in 1.16s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestWikipediaTokenizerFactory
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestElisionFilterFactory
   [junit4] Completed in 0.03s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.TestMultiCoreConfBootstrap
   [junit4] Completed in 4.51s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.RecoveryZkTest
   [junit4] Completed in 35.07s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.TestReplicationHandler
   [junit4] Completed in 28.80s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.ZkSolrClientTest
   [junit4] Completed in 15.81s, 4 tests
   [junit4]  
   [junit4] Suite: 
org.apache.solr.handler.component.DistributedSpellCheckComponentTest
   [junit4] Completed in 18.97s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.QueryElevationComponentTest
   [junit4] Completed in 6.06s, 7 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.ConvertedLegacyTest
   [junit4] Completed in 3.26s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.TestTrie
   [junit4] Completed in 1.55s, 8 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.BadIndexSchemaTest
   [junit4] Completed in 1.21s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestJmxIntegration
   [junit4] IGNORED 0.00s | TestJmxIntegration.testJmxOnCoreReload
   [junit4] Cause: Annotated @Ignore(timing problem? 
https://issues.apache.org/jira/browse/SOLR-2715)
   [junit4] Completed in 1.60s, 3 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.servlet.SolrRequestParserTest
   [junit4] Completed in 1.32s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.IndexBasedSpellCheckerTest
   [junit4] Completed in 1.05s, 5 tests
   [junit4]  
   [junit4] Suite: 
org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest
   [junit4] Completed in 1.19s, 6 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy2
   [junit4] Completed in 0.74s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.TermsComponentTest
   [junit4] Completed in 0.95s, 13 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.admin.ShowFileRequestHandlerTest
   [junit4] Completed in 1.04s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestSurroundQueryParser
   [junit4] Completed in 0.92s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.highlight.HighlighterTest
   [junit4] Completed in 2.06s, 27 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.update.DocumentBuilderTest
   [junit4] Completed in 0.96s, 11 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.search.function.distance.DistanceFunctionTest
   [junit4] Completed in 1.06s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTSTTest
   [junit4] Completed in 1.28s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.PolyFieldTest
   [junit4] Completed in 1.29s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterWFSTTest
   [junit4] Completed in 1.28s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.TestOmitPositions
   [junit4] Completed in 0.98s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy1
   [junit4] IGNOR/A 0.01s | TestSolrDeletionPolicy1.testCommitAge
   [junit4] Assumption #1: This test is not working on Windows (or maybe 
machines with only 2 CPUs)
   [junit4]   2 1167 T3512 oas.SolrTestCaseJ4.setUp ###Starting testCommitAge
   [junit4]   2 1172 T3512 C217 oasu.DirectUpdateHandler2.deleteAll 
[collection1] REMOVING ALL DOCUMENTS FROM INDEX
   [junit4]   2 1172 T3512 C217 UPDATE [collection1] webapp=null path=null 
params={} {deleteByQuery=*:*} 0 0
   [junit4]   2 1174 T3512 oas.SolrTestCaseJ4.tearDown ###Ending testCommitAge
   [junit4]   2
   [junit4] Completed in 1.19s, 3 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestReversedWildcardFilterFactory
   [junit4] Completed in 0.75s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.RequiredFieldsTest
   [junit4] Completed in 0.83s, 3 tests
   [junit4]  
   [junit4] Suite: 

[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-05-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287097#comment-13287097
 ] 

Steven Rowe commented on LUCENE-4092:
-

bq. I'll switch the configuration in our Jenkins jobs to the following ... 

Done.

 Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
 failures).
 

 Key: LUCENE-4092
 URL: https://issues.apache.org/jira/browse/LUCENE-4092
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Priority: Trivial



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4097) index was locked because of InterruptedException

2012-05-31 Thread wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wang updated LUCENE-4097:
-

  Component/s: core/index
Affects Version/s: 3.1

 index was locked because of InterruptedException
 

 Key: LUCENE-4097
 URL: https://issues.apache.org/jira/browse/LUCENE-4097
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: wang

 The index was locked because of an InterruptedException, and I could do nothing 
 but restart Tomcat.
 How can I avoid this happening again?
 Thanks.
 This is the stack trace:
 org.apache.lucene.util.ThreadInterruptedException: 
 java.lang.InterruptedException
 at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4118)
 at 
 org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2836)
 at 
 org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2821)
 at 
 org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1847)
 at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1800)
 at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1764)
 at 
 org.opencms.search.CmsSearchManager.updateIndexIncremental(CmsSearchManager.java:2262)
 at 
 org.opencms.search.CmsSearchManager.updateIndexOffline(CmsSearchManager.java:2306)
 at 
 org.opencms.search.CmsSearchManager$CmsSearchOfflineIndexThread.run(CmsSearchManager.java:327)
 Caused by: java.lang.InterruptedException
 at java.lang.Object.wait(Native Method)
 at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4116)
 ... 8 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Typo in UIMAUpdateRequestProcessor: Analazying text

2012-05-31 Thread Koji Sekiguchi

(12/06/01 5:25), Jack Krupansky wrote:

A typo at line 146 in UIMAUpdateRequestProcessor.java:

log.info(new StringBuffer("Analazying text").toString());

“Analazying” s.b. “Analyzing”


Thanks Jack! Committed the fix.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #356

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/356/

--
[...truncated 16518 lines...]
   [junit4]   2 58777 T3191 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 58777 T3191 oasc.RequestHandlers.initHandlersFromConfig 
created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 58777 T3191 oasc.RequestHandlers.initHandlersFromConfig 
created tvrh: org.apache.solr.handler.component.SearchHandler
   [junit4]   2 58778 T3191 oasc.RequestHandlers.initHandlersFromConfig 
created /mlt: solr.MoreLikeThisHandler
   [junit4]   2 58778 T3191 oasc.RequestHandlers.initHandlersFromConfig 
created /debug/dump: solr.DumpRequestHandler
   [junit4]   2 58779 T3191 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60
   [junit4]   2 58780 T3191 oasc.SolrCore.initDeprecatedSupport WARNING 
solrconfig.xml uses deprecated admin/gettableFiles, Please update your config 
to use the ShowFileRequestHandler.
   [junit4]   2 58781 T3191 oasc.SolrCore.initDeprecatedSupport WARNING adding 
ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, 
SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, 
BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, 
SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, 
SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, 
SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, 
SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, 
SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, 
SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, 
HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, SCHEMA-PHRASESUGGEST.XML, 
STEMDICT.TXT, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, 
SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, 
STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, 
SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, 
STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, 
SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, 
BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, 
SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, 
SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, 
SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, 
STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, 
BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, 
STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, 
SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, 
BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, 
SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, 
SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, 
PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, 
SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, OLD_SYNONYMS.TXT, 
SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, 
BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, 
SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, 
COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, 
SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, 
MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, 
BAD-SCHEMA-EXTERNAL-FILEFIELD.XML]
   [junit4]   2 58784 T3191 oass.SolrIndexSearcher.init Opening 
Searcher@132bfc00 main
   [junit4]   2 58785 T3191 oass.SolrIndexSearcher.init WARNING WARNING: 
Directory impl does not support setting indexDir: 
org.apache.lucene.store.MockDirectoryWrapper
   [junit4]   2 58785 T3191 oasu.CommitTracker.init Hard AutoCommit: disabled
   [junit4]   2 58785 T3191 oasu.CommitTracker.init Soft AutoCommit: disabled
   [junit4]   2 58786 T3191 oashc.SpellCheckComponent.inform Initializing 
spell checkers
   [junit4]   2 58793 T3191 oass.DirectSolrSpellChecker.init init: 
{name=direct,classname=DirectSolrSpellChecker,field=lowerfilt,minQueryLength=3}
   [junit4]   2 58823 T208 oaz.ClientCnxn$SendThread.startConnect Opening 
socket connection to server 127.0.0.1/127.0.0.1:60602
   [junit4]   2 58825 T3191 oashc.HttpShardHandlerFactory.getParameter Setting 
socketTimeout to: 0
   [junit4]   2 58825 T3191 oashc.HttpShardHandlerFactory.getParameter Setting 
urlScheme to: http://
   [junit4]   2 58826 T3191 oashc.HttpShardHandlerFactory.getParameter Setting 
connTimeout to: 0
   [junit4]   2 58826 T3191 oashc.HttpShardHandlerFactory.getParameter Setting 
maxConnectionsPerHost to: 20
   [junit4]   2 58826 T3191 

Re: CHANGES.txt for highlighter?

2012-05-31 Thread Robert Muir
trunk/branch4x only have a single consolidated lucene/CHANGES.txt. So
a highlighter change would just go there!

On Thu, May 31, 2012 at 10:15 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
 Hi sorry again,

 I cannot find CHANGES.txt files anymore for (ancient?) contrib packages,
 e.g. highlighter under lucene directory:

 $ find . -name CHANGES.txt
 ./lucene/CHANGES.txt
 ./solr/CHANGES.txt
 ./solr/contrib/analysis-extras/CHANGES.txt
 ./solr/contrib/clustering/CHANGES.txt
 ./solr/contrib/dataimporthandler/CHANGES.txt
 ./solr/contrib/extraction/CHANGES.txt
 ./solr/contrib/langid/CHANGES.txt
 ./solr/contrib/uima/CHANGES.txt

 Where should I give credit to a contributor for FVH?

 koji
 --
 Query Log Visualizer for Apache Solr
 http://soleami.com/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CHANGES.txt for highlighter?

2012-05-31 Thread Koji Sekiguchi

(12/06/01 11:28), Robert Muir wrote:

trunk/branch4x only have a single consolidated lucene/CHANGES.txt. So
a highlighter change would just go there!



Got it. Thank you again!

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java7-64 #201

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/201/changes

Changes:

[koji] fix typo in uima contrib

[koji] LUCENE-4091: add getter methods to FVH, part of LUCENE-3440

--
[...truncated 11372 lines...]
   [junit4] Suite: org.apache.solr.util.TimeZoneUtilsTest
   [junit4] Completed in 0.16s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.NumericFieldsTest
   [junit4] Completed in 0.88s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestQuerySenderNoQuery
   [junit4] Completed in 0.84s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.analysis.TestGermanMinimalStemFilterFactory
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.OverseerTest
   [junit4] Completed in 46.50s, 7 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.TestDistributedSearch
   [junit4] Completed in 27.63s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.CloudStateUpdateTest
   [junit4] Completed in 12.74s, 1 test
   [junit4]  
   [junit4] Suite: 
org.apache.solr.handler.component.DistributedSpellCheckComponentTest
   [junit4] Completed in 15.45s, 1 test
   [junit4]  
   [junit4] Suite: 
org.apache.solr.handler.component.DistributedTermsComponentTest
   [junit4] Completed in 13.24s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.BasicZkTest
   [junit4] Completed in 9.06s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.TestJoin
   [junit4] Completed in 10.19s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.SpellCheckComponentTest
   [junit4] Completed in 7.31s, 9 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.component.QueryElevationComponentTest
   [junit4] Completed in 5.33s, 7 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.cloud.TestMultiCoreConfBootstrap
   [junit4] Completed in 4.47s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestRangeQuery
   [junit4] Completed in 8.50s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.update.PeerSyncTest
   [junit4] Completed in 4.24s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterFSTTest
   [junit4] Completed in 1.25s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.MoreLikeThisHandlerTest
   [junit4] Completed in 0.92s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.core.SolrCoreTest
   [junit4] Completed in 5.18s, 5 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestJmxIntegration
   [junit4] IGNORED 0.00s | TestJmxIntegration.testJmxOnCoreReload
   [junit4] Cause: Annotated @Ignore(timing problem? 
https://issues.apache.org/jira/browse/SOLR-2715)
   [junit4] Completed in 1.74s, 3 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.search.TestPseudoReturnFields
   [junit4] Completed in 1.40s, 13 tests
   [junit4]  
   [junit4] Suite: 
org.apache.solr.search.similarities.TestLMDirichletSimilarityFactory
   [junit4] Completed in 0.15s, 2 tests
   [junit4]  
   [junit4] Suite: 
org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactoryTest
   [junit4] Completed in 0.81s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.handler.admin.CoreAdminHandlerTest
   [junit4] Completed in 1.78s, 1 test
   [junit4]  
   [junit4] Suite: org.apache.solr.search.SpatialFilterTest
   [junit4] Completed in 1.53s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.SolrCoreCheckLockOnStartupTest
   [junit4] Completed in 1.53s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterWFSTTest
   [junit4] Completed in 1.26s, 4 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.CurrencyFieldTest
   [junit4] IGNORED 0.00s | CurrencyFieldTest.testPerformance
   [junit4] Cause: Annotated @Ignore()
   [junit4] Completed in 1.10s, 8 tests, 1 skipped
   [junit4]  
   [junit4] Suite: org.apache.solr.schema.TestOmitPositions
   [junit4] Completed in 0.88s, 2 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy1
   [junit4] IGNOR/A 0.02s | TestSolrDeletionPolicy1.testCommitAge
   [junit4] Assumption #1: This test is not working on Windows (or maybe 
machines with only 2 CPUs)
   [junit4]   2 749 T3389 oas.SolrTestCaseJ4.setUp ###Starting testCommitAge
   [junit4]   2 ASYNC  NEW_CORE C224 name=collection1 
org.apache.solr.core.SolrCore@26d904a1
   [junit4]   2 753 T3389 C224 oasu.DirectUpdateHandler2.deleteAll 
[collection1] REMOVING ALL DOCUMENTS FROM INDEX
   [junit4]   2 754 T3390 oasc.SolrCore.registerSearcher [collection1] 
Registered new searcher Searcher@ab34164 
main{StandardDirectoryReader(segments_1:1)}
   [junit4]   2 754 T3389 C224 oasc.SolrDeletionPolicy.onInit 
SolrDeletionPolicy.onInit: commits:num=1
   [junit4]   2
commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@7a766fb4 

[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-05-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287136#comment-13287136
 ] 

Chris Male commented on LUCENE-3312:


bq. index.Document is an interface, I think for better extensibility in the 
future it could be an abstract class - who knows what we will want to put there 
in addition to the iterators...

I'm not sure that is such a big deal.  But I do think we should think about the 
name here.  We already have Document and it's going to become confusing with 
two different Document classes kind of doing the same thing and with 
document.Document implementing index.Document as well.

bq. previously we allowed one to remove fields from document by name, are we 
going to allow this now separately for indexed and stored fields?

I think we need to simplify the document.Document API.  I don't think it should 
hold Indexable/StorableField instances but instead should just hold Field 
instances.  It is a userland kind of class and so is Field.  We should make it 
easy for people to add the Fields that they want.  If they want to have a Field 
which is both indexed and stored, then they can create it once and add it to 
Document.  If they want to do it separately, then they can do that too.  Since 
Field implements both IndexableField and StorableField, it can serve the dual 
purpose.

That way the API in document.Document is pretty simple and you can add and 
remove things as done in the past.

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: Field Type branch

 Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
 lucene-3312-patch-03.patch, lucene-3312-patch-04.patch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up
 to use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is a possible perf hit for fields that are both indexed and
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in the API.
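
A hypothetical sketch of what the split could look like at the indexing entry point 
(all names below are illustrative, not the branch's actual API): indexed and stored 
fields arrive as two independent Iterables, and a user-space field class implementing 
both interfaces can appear in either one.

{code:java}
interface IndexableFieldSketch { String name(); }
interface StorableFieldSketch  { String name(); }

// A user-space field may implement both, so the same instance can be handed
// to the writer twice when it is indexed *and* stored.
class FieldSketch implements IndexableFieldSketch, StorableFieldSketch {
    private final String name;
    FieldSketch(String name) { this.name = name; }
    public String name() { return name; }
}

class IndexWriterSketch {
    // Either Iterable may be null, mirroring the "Either can be null" proposal.
    void addDocument(Iterable<? extends IndexableFieldSketch> indexed,
                     Iterable<? extends StorableFieldSketch> stored) {
        if (indexed != null) {
            for (IndexableFieldSketch f : indexed) {
                invert(f);   // feed the inverted index
            }
        }
        if (stored != null) {
            for (StorableFieldSketch f : stored) {
                store(f);    // write to the stored-fields file
            }
        }
    }

    private void invert(IndexableFieldSketch f) { /* indexing path */ }
    private void store(StorableFieldSketch f)   { /* stored-fields path */ }
}
{code}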

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4099) Remove generics from SpatialStrategy and remove SpatialFieldInfo

2012-05-31 Thread Chris Male (JIRA)
Chris Male created LUCENE-4099:
--

 Summary: Remove generics from SpatialStrategy and remove 
SpatialFieldInfo
 Key: LUCENE-4099
 URL: https://issues.apache.org/jira/browse/LUCENE-4099
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: Chris Male
Priority: Minor


Some time ago I added SpatialFieldInfo as a way for SpatialStrategy implementations to declare 
what information they needed per request.  This meant that a Strategy could be 
used across multiple requests.  However, it doesn't really need to be that way 
any more; Strategies are light to instantiate and the generics are just clumsy 
and annoying.

Instead, Strategies should just define what they need in their constructors (see the sketch below). 
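
Roughly, the change replaces the per-request info object with plain constructor 
state; an illustrative before/after (class names are made up, not the module's 
actual classes):

{code:java}
// Before (illustrative): the strategy is parameterized on a per-request info object.
abstract class OldStrategySketch<T /* some SpatialFieldInfo-like carrier */> {
    abstract Object makeQuery(Object queryArgs, T fieldInfo);
}

// After (illustrative): everything the strategy needs is fixed at construction time,
// so a new, cheap instance is simply created wherever it is needed.
class NewStrategySketch {
    private final String fieldName;
    private final int maxLevels;

    NewStrategySketch(String fieldName, int maxLevels) {
        this.fieldName = fieldName;
        this.maxLevels = maxLevels;
    }

    Object makeQuery(Object queryArgs) {
        // fieldName and maxLevels are already known; no per-request lookup object
        return fieldName + "/" + maxLevels + "/" + queryArgs;
    }
}
{code}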

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-05-31 Thread Dawid Weiss
Aaahhh... I thought G1 would start causing issues at some point. Good
catch, Robert.

Dawid

On Fri, Jun 1, 2012 at 2:05 AM, Robert Muir rcm...@gmail.com wrote:
 On Thu, May 31, 2012 at 5:51 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 I think the best option is to ignore the OOME from this test case...?

 Mike McCandless


 I think that's fine for now, but I'm not convinced there is no problem
 at all. However, it's not obvious the problem is us, either.

 It's easy to see this OOM is related to the G1 garbage collector.

 This test has failed 3 times in the past couple of days (before, it never
 failed: I suspect the packed ints changes sent it over the edge).

 https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2707/
 https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2719/
 https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/

 All 3 cases are Java 7, and all 3 cases use -XX:+UseG1GC. (Uwe turned
 on GC randomization at Lucene Revolution.)


 --
 lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #202

2012-05-31 Thread jenkins
See 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/202/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


