Re: Welcome Itamar Syn-Hershko as a new committer
Thanks guys

On Wed, May 23, 2012 at 1:14 AM, zoolette <gaufre...@gmail.com> wrote:
> Welcome, Itamar!

2012/5/22 Prescott Nasser <geobmx...@hotmail.com>:
> Hey all, I'd like to officially welcome Itamar as a new committer. I know the community appreciates the work you've been doing with the Spatial contrib project and the past help you've provided on the mailing lists. Please join me in welcoming Itamar,
> ~Prescott
PyLucene 3.6 Windows binaries
PyLucene 3.6.0 for Python 2.6/2.7 is now available as a pre-compiled binary for Windows (32-bit) from the pylucene-extra site at http://code.google.com/a/apache-extras.org/p/pylucene-extra

Note: pylucene-extra is not an official Apache project, but rather an attempt to lower the entry barrier to PyLucene by providing some prebuilt eggs. Further contributions (for other platforms, or combinations of 32/64-bit and Python 2.x, etc.) are highly welcome!

best regards,
Thomas
[jira] [Resolved] (SOLR-3464) softCommit option for HttpSolrServer commit method
[ https://issues.apache.org/jira/browse/SOLR-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved SOLR-3464. --- Resolution: Fixed Fix Version/s: 4.0 Assignee: Tommaso Teofili softCommit option for HttpSolrServer commit method -- Key: SOLR-3464 URL: https://issues.apache.org/jira/browse/SOLR-3464 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 4.0 Reporter: Marco Crivellaro Assignee: Tommaso Teofili Priority: Minor Fix For: 4.0 HttpSolrServer.commit method doesn't have softCommit option which appears to be an option available for the commit command: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
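For context, SOLR-3464 adds the missing softCommit flag to the SolrJ commit call. The equivalent raw XML update message (per the UpdateXmlMessages wiki page referenced in the issue) would look roughly like this; the exact attribute combination shown is illustrative:

{code:xml}
<!-- Soft commit: make new documents visible to searchers without an expensive flush to stable storage -->
<commit softCommit="true" waitSearcher="true"/>
{code}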
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #162
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/162/ -- [...truncated 16267 lines...] [junit4] 2at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [junit4] 2at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [junit4] 2at java.lang.Thread.run(Thread.java:662) [junit4] 2 [junit4] 2 61062 T2869 oas.SolrTestCaseJ4.tearDown ###Ending test [junit4] 1 replicate slave to master [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestReplicationHandler -Dtests.method=test -Dtests.seed=A486C222F6A861EE -Dtests.locale=hr -Dtests.timezone=Asia/Jakarta -Dargs=-Dfile.encoding=Cp1252 [junit4] 1 [junit4] 2 [junit4] (@AfterClass output) [junit4] 2 61089 T2869 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=831846536 [junit4] 2 61089 T2869 oasc.SolrCore.close [collection1] CLOSING SolrCore org.apache.solr.core.SolrCore@129b3cec [junit4] 2 61090 T2869 oasc.SolrCore.closeSearcher [collection1] Closing main searcher on request. [junit4] 2 61090 T2869 oasu.DirectUpdateHandler2.close closing DirectUpdateHandler2{commits=2,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=494,cumulative_deletesById=0,cumulative_deletesByQuery=1,cumulative_errors=0} [junit4] 2 61099 T2869 oejsh.ContextHandler.doStop stopped o.e.j.s.ServletContextHandler{/solr,null} [junit4] 2 61154 T2869 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=1517759769 [junit4] 2 61155 T2869 oasc.SolrCore.close [collection1] CLOSING SolrCore org.apache.solr.core.SolrCore@27549904 [junit4] 2 61155 T2869 oasc.SolrCore.closeSearcher [collection1] Closing main searcher on request. 
[junit4] 2 61156 T2869 oasu.DirectUpdateHandler2.close closing DirectUpdateHandler2{commits=1,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0} [junit4] 2 61158 T2869 oejsh.ContextHandler.doStop stopped o.e.j.s.ServletContextHandler{/solr,null} [junit4] 2 61235 T2869 oas.SolrTestCaseJ4.deleteCore ###deleteCore [junit4] 2 NOTE: test params are: codec=Lucene40, sim=RandomSimilarityProvider(queryNorm=false,coord=false): {}, locale=hr, timezone=Asia/Jakarta [junit4] 2 NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_32 (64-bit)/cpus=2,threads=1,free=147193000,total=271253504 [junit4] 2 NOTE: All tests run in this JVM: [TestChineseTokenizerFactory, TestExtendedDismaxParser, TestQueryUtils, SampleTest, TestPseudoReturnFields, TestNumberUtils, TestBeiderMorseFilterFactory, TestMultiCoreConfBootstrap, TestItalianLightStemFilterFactory, TestSolrDeletionPolicy1, RAMDirectoryFactoryTest, TestLFUCache, DocumentAnalysisRequestHandlerTest, TestSpanishLightStemFilterFactory, TestSolrCoreProperties, IndexBasedSpellCheckerTest, JsonLoaderTest, TestValueSourceCache, UniqFieldsUpdateProcessorFactoryTest, ZkNodePropsTest, TestUAX29URLEmailTokenizerFactory, JSONWriterTest, SortByFunctionTest, FieldMutatingUpdateProcessorTest, TestPropInject, TestGermanStemFilterFactory, TestTrie, ZkSolrClientTest, DateMathParserTest, SpellCheckComponentTest, TestTypeTokenFilterFactory, HighlighterConfigTest, TestQuerySenderListener, PrimUtilsTest, IndexReaderFactoryTest, TestNorwegianLightStemFilterFactory, SystemInfoHandlerTest, TestLRUCache, FullSolrCloudDistribCmdsTest, TestGermanNormalizationFilterFactory, TestFunctionQuery, CommonGramsQueryFilterFactoryTest, OpenExchangeRatesOrgProviderTest, SolrCoreCheckLockOnStartupTest, TestPortugueseStemFilterFactory, OverseerTest, TestIndonesianStemFilterFactory, 
TestPerFieldSimilarity, TestHashPartitioner, TestOmitPositions, SoftAutoCommitTest, StandardRequestHandlerTest, TestRecovery, TestBM25SimilarityFactory, TestRangeQuery, StatsComponentTest, DistributedTermsComponentTest, TestDocSet, TestBinaryField, TestPhraseSuggestions, TestCollationKeyFilterFactory, DebugComponentTest, TestShingleFilterFactory, TestJoin, TestUtils, ReturnFieldsTest, SimpleFacetsTest, TestIndexingPerformance, MBeansHandlerTest, TestPersianNormalizationFilterFactory, TestRemoveDuplicatesTokenFilterFactory, TestRussianLightStemFilterFactory, PrimitiveFieldTypeTest, LeaderElectionIntegrationTest, TestWordDelimiterFilterFactory, TestCJKTokenizerFactory, IndexSchemaTest, TimeZoneUtilsTest, TestSynonymMap, AutoCommitTest, SOLR749Test, BadIndexSchemaTest, TestChineseFilterFactory, TermsComponentTest, BasicFunctionalityTest, TestSynonymFilterFactory, UUIDFieldTest, DateFieldTest, TestArbitraryIndexDir, TestLMDirichletSimilarityFactory, SolrIndexConfigTest, TestJmxMonitoredMap,
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java7-64 #95
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/95/ -- [...truncated 11919 lines...] [junit4] 2 20820 T3143 oasc.RequestHandlers.initHandlersFromConfig created /terms: org.apache.solr.handler.component.SearchHandler [junit4] 2 20821 T3143 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH: org.apache.solr.handler.component.SearchHandler [junit4] 2 20821 T3143 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler [junit4] 2 20821 T3143 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler [junit4] 2 20822 T3143 oasc.RequestHandlers.initHandlersFromConfig created tvrh: org.apache.solr.handler.component.SearchHandler [junit4] 2 20822 T3143 oasc.RequestHandlers.initHandlersFromConfig created /mlt: solr.MoreLikeThisHandler [junit4] 2 20823 T3143 oasc.RequestHandlers.initHandlersFromConfig created /debug/dump: solr.DumpRequestHandler [junit4] 2 20825 T3143 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60 [junit4] 2 20827 T3143 oasc.SolrCore.initDeprecatedSupport WARNING solrconfig.xml uses deprecated admin/gettableFiles, Please update your config to use the ShowFileRequestHandler. 
[junit4] 2 20830 T3143 oasc.SolrCore.initDeprecatedSupport WARNING adding ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, STEMDICT.TXT, SCHEMA-PHRASESUGGEST.XML, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SCHEMA-COLLATEFILTER.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, SOLRCONFIG-BASIC-LUCENEVERSION31.XML, OLD_SYNONYMS.TXT, SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, BAD-SCHEMA-EXTERNAL-FILEFIELD.XML] [junit4] 2 20834 T3143 oass.SolrIndexSearcher.init Opening Searcher@728679fd main [junit4] 2 20834 T3143 oass.SolrIndexSearcher.init WARNING WARNING: Directory impl does not support setting indexDir: org.apache.lucene.store.MockDirectoryWrapper [junit4] 2 20834 T3143 oasu.CommitTracker.init Hard AutoCommit: disabled [junit4] 2 20835 T3143 oasu.CommitTracker.init Soft AutoCommit: disabled [junit4] 2 20835 T3143 oashc.SpellCheckComponent.inform Initializing spell checkers [junit4] 2 20845 T3143 oass.DirectSolrSpellChecker.init init: {name=direct,classname=DirectSolrSpellChecker,field=lowerfilt,minQueryLength=3} [junit4] 2 20895 T3143 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 0 [junit4] 2 20895 T3143 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: http:// [junit4] 2 20895 T3143
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281442#comment-13281442 ] Dawid Weiss commented on LUCENE-4062: - You didn't attach the updated benchmark -- I didn't say it explicitly, but you should do something with the resulting value (the JIT optimizer is quite smart ;). A field store (write the result to a field) should do the trick. So does System.out.println, of course... All this may sound paranoid but really isn't. This is a source of many problems with microbenchmarks -- the compiler just throws away code (or optimizes away loops/branches) in a way that doesn't happen later on in real code. My recent favorite example of such a problem in real-life code (it's a bug in the JDK) is this one: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/332bebb463d1 More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Michael McCandless Priority: Minor Labels: performance Fix For: 4.1 Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch In order to save space, Lucene has two main PackedInts.Mutable implementations: one that is very fast and is based on a byte/short/integer/long array (Direct*), and another one which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. On the other hand, if you store 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%. If you accept to trade some space for speed, you could store 3 of these 21-bit integers in a long, resulting in an overhead of 1/3 bit per value.
One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64, which always need to read/write two blocks in order to avoid costly branches. I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads. The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has fewer than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected:
* 1: Packed64SingleBlock1
* 2: Packed64SingleBlock2
* 3: Packed64SingleBlock3
* 4: Packed64SingleBlock4
* 5: Packed64SingleBlock5
* 6: Packed64SingleBlock6
* 7: Direct8
* 8: Direct8
* 9: Packed64SingleBlock9
* 10: Packed64SingleBlock10
* 11: Packed64SingleBlock12
* 12: Packed64SingleBlock12
* 13: Packed64
* 14: Direct16
* 15: Direct16
* 16: Direct16
* 17: Packed64
* 18: Packed64SingleBlock21
* 19: Packed64SingleBlock21
* 20: Packed64SingleBlock21
* 21: Packed64SingleBlock21
* 22-26: Packed64
* 27-32: Direct32
* 33-53: Packed64
* 54-62: Direct64
Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32. Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected. Next steps would be to:
* make Lucene components use this {{getMutable}} method and let users decide what trade-off better suits them,
* write a Packed32SingleBlock implementation if necessary
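The single-block layout described above (e.g. three 21-bit values per 64-bit block, so every get/set touches exactly one long) can be sketched as follows. This is a minimal illustration of the idea only, not the actual Packed64SingleBlock code; the class and method names are invented for the example:

{code}
// Pack 21-bit values 3 per long: 63 bits used, 1 bit wasted per block,
// i.e. the 1/3 bit per value overhead mentioned in the issue.
public class Pack21 {
    static final int BITS = 21;
    static final int PER_BLOCK = 3;              // floor(64 / 21)
    static final long MASK = (1L << BITS) - 1;   // low 21 bits

    final long[] blocks;

    Pack21(int valueCount) {
        blocks = new long[(valueCount + PER_BLOCK - 1) / PER_BLOCK];
    }

    void set(int index, long value) {
        int b = index / PER_BLOCK;
        int shift = (index % PER_BLOCK) * BITS;
        blocks[b] = (blocks[b] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    long get(int index) {
        int b = index / PER_BLOCK;
        int shift = (index % PER_BLOCK) * BITS;
        return (blocks[b] >>> shift) & MASK;     // exactly one block read
    }

    public static void main(String[] args) {
        Pack21 p = new Pack21(1000);
        p.set(0, 123456);
        p.set(1, (1 << BITS) - 1);
        p.set(2, 7);
        if (p.get(0) != 123456 || p.get(1) != (1 << BITS) - 1 || p.get(2) != 7) {
            throw new AssertionError("round-trip failed");
        }
        System.out.println("ok");
    }
}
{code}

Unlike Packed64, a value here never straddles two longs, which is why reads and writes avoid the second block access.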
[HEADS-UP]: Index File Format Change on Trunk
Hey folks, I just committed LUCENE-4051 [1] (Revision 1341768), which changes the file format of DocValues, Norms (DocValues), StoredFields and TermVectors incompatibly with previous revisions. If you are using trunk indices you must re-index before updating to the latest trunk sources. If you are using Lucene 3.x or below you can safely ignore this message. happy indexing, simon [1] https://issues.apache.org/jira/browse/LUCENE-4051
[jira] [Resolved] (LUCENE-4051) Fix File Headers for Lucene40 StoredFields & TermVectors
[ https://issues.apache.org/jira/browse/LUCENE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4051. - Resolution: Fixed Committed to trunk in rev. 1341768. I sent out a heads-up mail to the dev list since this breaks the index file format. Thanks for reviewing; let's get 4.0-alpha out! Fix File Headers for Lucene40 StoredFields & TermVectors Key: LUCENE-4051 URL: https://issues.apache.org/jira/browse/LUCENE-4051 Project: Lucene - Java Issue Type: Task Components: core/codecs Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch Currently we still write the old file header format in Lucene40StoredFieldFormat and Lucene40TermVectorsFormat. We should cut over to use CodecUtil and reset the versioning before we release Lucene 4.0.
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #163
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/163/
Build failed in Jenkins: Lucene-Solr-trunk-Linux-Java6-64 #470
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/470/ -- [...truncated 4 lines...] [junit4] [junit4] Completed on J0 in 155.73s, 1 test, 1 failure FAILURES! [junit4] [junit4] Suite: org.apache.solr.EchoParamsTest [junit4] Completed on J0 in 0.11s, 1 test [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterWFSTTest [junit4] Completed on J0 in 0.80s, 4 tests [junit4] [junit4] Suite: org.apache.solr.util.TestUtils [junit4] Completed on J0 in 0.01s, 3 tests [junit4] [junit4] Suite: org.apache.solr.handler.admin.LukeRequestHandlerTest [junit4] Completed on J0 in 1.54s, 3 tests [junit4] [junit4] Suite: org.apache.solr.search.similarities.TestBM25SimilarityFactory [junit4] Completed on J0 in 0.08s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.TestCSVLoader [junit4] Completed on J0 in 0.75s, 5 tests [junit4] [junit4] Suite: org.apache.solr.update.UpdateParamsTest [junit4] Completed on J0 in 0.55s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestFinnishLightStemFilterFactory [junit4] Completed on J0 in 0.01s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.component.DistributedTermsComponentTest [junit4] Completed on J0 in 5.54s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestCJKBigramFilterFactory [junit4] Completed on J0 in 0.01s, 2 tests [junit4] [junit4] Suite: org.apache.solr.analysis.TestTrimFilterFactory [junit4] Completed on J0 in 0.00s, 1 test [junit4] [junit4] Suite: org.apache.solr.response.TestCSVResponseWriter [junit4] Completed on J0 in 0.52s, 1 test [junit4] [junit4] Suite: org.apache.solr.TestJoin [junit4] Completed on J1 in 40.93s, 2 tests [junit4] [junit4] Suite: org.apache.solr.request.TestFaceting [junit4] Completed on J0 in 8.58s, 3 tests [junit4] [junit4] Suite: org.apache.solr.cloud.TestHashPartitioner [junit4] Completed on J1 in 5.42s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.component.QueryElevationComponentTest [junit4] Completed on J0 in 3.41s, 
7 tests [junit4] [junit4] Suite: org.apache.solr.search.function.TestFunctionQuery [junit4] Completed on J0 in 1.89s, 14 tests [junit4] [junit4] Suite: org.apache.solr.request.SimpleFacetsTest [junit4] Completed on J1 in 3.37s, 29 tests [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterFSTTest [junit4] Completed on J0 in 0.78s, 4 tests [junit4] [junit4] Suite: org.apache.solr.handler.StandardRequestHandlerTest [junit4] Completed on J0 in 0.56s, 1 test [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTest [junit4] Completed on J0 in 0.77s, 4 tests [junit4] [junit4] Suite: org.apache.solr.core.SolrCoreTest [junit4] Completed on J1 in 3.05s, 5 tests [junit4] [junit4] Suite: org.apache.solr.BasicFunctionalityTest [junit4] IGNORED 0.00s J0 | BasicFunctionalityTest.testDeepPaging [junit4] Cause: Annotated @Ignore(See SOLR-1726) [junit4] Completed on J0 in 1.59s, 23 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.core.TestCoreContainer [junit4] Completed on J1 in 1.48s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.function.SortByFunctionTest [junit4] Completed on J0 in 1.18s, 2 tests [junit4] [junit4] Suite: org.apache.solr.schema.CopyFieldTest [junit4] Completed on J0 in 0.41s, 6 tests [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTSTTest [junit4] Completed on J1 in 0.66s, 4 tests [junit4] [junit4] Suite: org.apache.solr.core.RequestHandlersTest [junit4] Completed on J0 in 0.54s, 3 tests [junit4] [junit4] Suite: org.apache.solr.highlight.FastVectorHighlighterTest [junit4] Completed on J1 in 0.57s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.XmlUpdateRequestHandlerTest [junit4] Completed on J0 in 0.50s, 3 tests [junit4] [junit4] Suite: org.apache.solr.search.TestQueryTypes [junit4] Completed on J1 in 0.47s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestReversedWildcardFilterFactory [junit4] Completed on J0 in 0.40s, 4 tests [junit4] [junit4] Suite: 
org.apache.solr.schema.PrimitiveFieldTypeTest [junit4] Completed on J1 in 0.72s, 1 test [junit4] [junit4] Suite: org.apache.solr.response.TestPHPSerializedResponseWriter [junit4] Completed on J0 in 0.50s, 2 tests [junit4] [junit4] Suite: org.apache.solr.DisMaxRequestHandlerTest [junit4] Completed on J1 in 0.60s, 3 tests [junit4] [junit4] Suite: org.apache.solr.schema.RequiredFieldsTest [junit4] Completed on J0 in 0.49s, 3 tests [junit4] [junit4] Suite: org.apache.solr.core.IndexReaderFactoryTest [junit4] Completed on J1 in 0.47s, 1 test
Jenkins build is back to normal : Lucene-Solr-trunk-Linux-Java6-64 #471
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/471/changes
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281460#comment-13281460 ] Adrien Grand commented on LUCENE-4062: -- Hi Dawid. Thanks for the link, it's very interesting! I added a print statement to make sure that the sum is actually computed. Here is the code (for values of n > valueCount, just modify the k loop):
{code}
int valueCount = 1000;
int bitsPerValue = 21;
int[] offsets = new int[valueCount];
Random random = new Random();
for (int i = 0; i < valueCount; ++i) {
  offsets[i] = random.nextInt(valueCount);
}
byte[] bytes = new byte[valueCount * 4];
DataOutput out = new ByteArrayDataOutput(bytes);
PackedInts.Writer writer = PackedInts.getWriter(out, valueCount, bitsPerValue);
for (int i = 0; i < valueCount; ++i) {
  writer.add(random.nextInt(1 << bitsPerValue));
}
writer.finish();
long sum = 0L;
for (int i = 0; i < 50; ++i) {
  long start = System.nanoTime();
  DataInput in = new ByteArrayDataInput(bytes);
  // PackedInts.Reader reader = PackedInts.getReader(in, 0f);   // Packed64
  PackedInts.Reader reader = PackedInts.getReader(in, 0.1f);    // Packed64SingleBlock
  for (int k = 0; k < 1; ++k) {
    for (int j = 0, n = valueCount / 2; j < n; ++j) {
      sum += reader.get(offsets[j]);
    }
  }
  long end = System.nanoTime();
  System.out.println("sum is " + sum);
  System.out.println(end - start);
}
{code}
I'm on a different computer today, and n = valueCount/3 is enough to make the benchmark faster with Packed64SingleBlock.
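Dawid's earlier point about the JIT discarding unconsumed results can be shown with a toy example (hypothetical names, not part of the benchmark above): a volatile field store or a println makes the result observable, so the measured loop cannot be eliminated as dead code.

{code}
// Minimal sketch of the dead-code-elimination hazard in microbenchmarks.
public class SinkDemo {
    // Writing the result to a (volatile) field keeps the computation alive.
    static volatile long sink;

    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; ++i) {
            sum += i;   // with no consumer, a JIT may legally drop this loop
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        sink = work(1000000);   // consume the result; System.out.println works too
        long elapsed = System.nanoTime() - start;
        System.out.println("sum=" + sink + " elapsed=" + elapsed + "ns");
    }
}
{code}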
More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Michael McCandless Priority: Minor Labels: performance Fix For: 4.1 Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
N-Gram Threshold
Hi, I made an n-gram analyzer, but I am not able to set a threshold during searching that corresponds to the index. Please help me. - REACH YOUR GOAL BEFORE GOAL KICKS YOU. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/N-Gram-Threshould-tp3985614.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #96
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/96/changes
[jira] [Commented] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281478#comment-13281478 ] Marek Woroniecki commented on SOLR-2614: I would also like to see this feature. In our app we do a lot of faceting to allow users to drill down into data by selecting particular values for particular fields. We also calculate stats for these selections using the stats component. However, quite often we need to group documents into chunks by their common attributes and then calculate stats as well. In the classic approach using a database, we would probably do that with a GROUP BY clause and some aggregating functions. Unfortunately, for various reasons this is not an easy option in our case, and we are left with either reading all the documents and grouping them in memory, or our users have to extract all the data to CSV and do pivots/stats in Excel. I would be more than happy to implement this patch, if only I knew more about how Lucene/Solr works internally :( stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.1 Is it possible to get stats (like the Stats Component: min, max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 for all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts, but no stats.
Looping in the client application over all combinations of facet values would be too slow because there are a lot of combinations. Thanks a lot! This is very important, because count values alone are sometimes not enough. Please add stats with pivot in Solr 4.0. Thanks a lot
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14280 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14280/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: ERROR: SolrIndexSearcher opens=80 closes=78 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=80 closes=78 at __randomizedtesting.SeedInfo.seed([7A9E536CED82AC23]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log (for compile errors): [...truncated 11378 lines...]
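The assertion above (opens=80 closes=78) means two SolrIndexSearcher instances were opened during the suite but never closed. As an illustration only (this is not Solr's actual code, just the idea behind SolrTestCaseJ4.endTrackingSearchers), leak tracking of this kind can be sketched as:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch (not Solr's actual implementation) of the idea behind
// SolrTestCaseJ4.endTrackingSearchers: count searcher opens and closes and
// fail the suite if they differ, which signals a leaked SolrIndexSearcher
// still holding index files open.
public class SearcherTracker {
    private final AtomicLong opens = new AtomicLong();
    private final AtomicLong closes = new AtomicLong();

    void onOpen()  { opens.incrementAndGet(); }   // call when a searcher is opened
    void onClose() { closes.incrementAndGet(); }  // call when a searcher is closed

    // Called from the suite's afterClass hook: every open must be matched by a close.
    void endTracking() {
        long o = opens.get(), c = closes.get();
        if (o != c) {
            throw new AssertionError("ERROR: SolrIndexSearcher opens=" + o + " closes=" + c);
        }
    }

    public static void main(String[] args) {
        SearcherTracker balanced = new SearcherTracker();
        balanced.onOpen(); balanced.onClose();
        balanced.endTracking();               // passes: every open was closed
        System.out.println("balanced: ok");

        SearcherTracker leaky = new SearcherTracker();
        leaky.onOpen(); leaky.onOpen(); leaky.onClose();
        try {
            leaky.endTracking();              // fails: one searcher never closed
        } catch (AssertionError expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Because the counters only ever grow, the check is cheap and thread-safe; the hard part in a real suite is plumbing onOpen/onClose into every code path that creates or releases a searcher.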
[jira] [Commented] (SOLR-3476) Create a Solr Core with a given commit point
[ https://issues.apache.org/jira/browse/SOLR-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281484#comment-13281484 ] ludovic Boutros commented on SOLR-3476: --- Some examples of usage:
- Create a new core with a given commit point generation:
bq. http://localhost:8084/solr/admin/cores?action=CREATE&name=core4&commitPointGeneration=4&instanceDir=test
- Get the status of this core:
bq. http://localhost:8084/solr/admin/cores?action=STATUS&core=core4
{code:xml}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2692</int>
  </lst>
  <lst name="status">
    <lst name="core4">
      <str name="name">core4</str>
      <str name="instanceDir">D:\temp\bases\testCores\test\</str>
      <str name="dataDir">D:\temp\bases\testCores\test\data\</str>
      <date name="startTime">2012-05-23T09:31:50.483Z</date>
      <long name="uptime">149054</long>
      <long name="indexCommitGeneration">4</long>
      <lst name="indexCommitList">
        <long name="generation">1</long>
        <long name="generation">2</long>
        <long name="generation">3</long>
        <long name="generation">4</long>
        <long name="generation">5</long>
        <long name="generation">6</long>
        <long name="generation">7</long>
      </lst>
      <lst name="index">
        <int name="numDocs">3</int>
        <int name="maxDoc">3</int>
        <long name="version">1337759534761</long>
        <int name="segmentCount">3</int>
        <bool name="current">false</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@D:\temp\bases\testCores\test\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1c24b45</str>
        <date name="lastModified">2012-05-23T09:22:10.713Z</date>
      </lst>
    </lst>
  </lst>
</response>
{code}
We can see the current commit point generation and the available commit point list.
- Now the solr.xml file:
{code:xml}
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core4" instanceDir="test\" commitPointGeneration="4"/>
  </cores>
</solr>
{code}
Create a Solr Core with a given commit point Key: SOLR-3476 URL: https://issues.apache.org/jira/browse/SOLR-3476 Project: Solr Issue Type: New Feature Components: multicore Affects Versions: 3.6 Reporter: ludovic Boutros Attachments: commitPoint.patch In some configurations, we need to open new cores with a given commit point. For instance, when the publication of new documents must be controlled (legal obligations) in a master-slave configuration there are two cores on the same instanceDir and dataDir which are using two versions of the index. The switch of the two cores is done manually. The problem is that when the replication is done one day before the switch, if any problem occurs, and we need to restart tomcat, the new documents are published. With this functionality, we could ensure that the index generation used by the core used for querying is always the good one.
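For readers following along, the CoreAdmin CREATE call in ludovic's example can be sketched as a small URL builder. Note that commitPointGeneration is the parameter proposed in the attached commitPoint.patch, not a parameter of released Solr:

```java
// Sketch of building the CoreAdmin CREATE request from the example above.
// "commitPointGeneration" comes from the SOLR-3476 patch and is NOT part
// of released Solr; host/core/instanceDir values are from the example.
public class CreateCoreAtCommitPoint {
    static String createUrl(String host, String core, long generation, String instanceDir) {
        return host + "/solr/admin/cores?action=CREATE"
                + "&name=" + core
                + "&commitPointGeneration=" + generation
                + "&instanceDir=" + instanceDir;
    }

    public static void main(String[] args) {
        // Matches the first bq. example in the comment:
        System.out.println(createUrl("http://localhost:8084", "core4", 4, "test"));
    }
}
```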
[jira] [Created] (SOLR-3480) Refactor httpclient impl details into a utility class
Sami Siren created SOLR-3480: Summary: Refactor httpclient impl details into a utility class Key: SOLR-3480 URL: https://issues.apache.org/jira/browse/SOLR-3480 Project: Solr Issue Type: Improvement Components: clients - java, replication (java), SolrCloud Reporter: Sami Siren Assignee: Sami Siren Priority: Minor Currently there are multiple classes that deal with the impl details of httpclient when setting timeouts, basic auth details, retry handling, compression etc. I am proposing that we instead move this functionality into a reusable utility class. The ultimate goal is to be able to easily use for example https or basic auth (that can already be used in some parts of solr) throughout solr but that will require some more work. I will submit a patch shortly.
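The shape of such a refactoring might look like the following. This is a hypothetical sketch of the idea (one shared configuration type instead of per-class HttpClient setup); all names are illustrative, and the actual design is in the patch attached to SOLR-3480:

```java
// Hypothetical sketch of the proposed refactoring: one utility type that
// owns the HttpClient configuration details (timeouts, basic auth, retry
// handling, compression) which are currently duplicated across SolrJ,
// replication, and SolrCloud client classes. Names are illustrative only.
public class HttpClientConfig {
    private int connectTimeoutMs = 5000;
    private int soTimeoutMs = 30000;
    private int maxRetries = 0;
    private boolean allowCompression = false;
    private String basicAuthUser;      // null means: no basic auth
    private String basicAuthPassword;

    public HttpClientConfig connectTimeout(int ms) { connectTimeoutMs = ms; return this; }
    public HttpClientConfig soTimeout(int ms)      { soTimeoutMs = ms; return this; }
    public HttpClientConfig retries(int n)         { maxRetries = n; return this; }
    public HttpClientConfig compression(boolean b) { allowCompression = b; return this; }
    public HttpClientConfig basicAuth(String user, String password) {
        basicAuthUser = user; basicAuthPassword = password; return this;
    }

    public int getConnectTimeoutMs()     { return connectTimeoutMs; }
    public int getSoTimeoutMs()          { return soTimeoutMs; }
    public int getMaxRetries()           { return maxRetries; }
    public boolean isCompressionAllowed() { return allowCompression; }
    public boolean hasBasicAuth()        { return basicAuthUser != null; }

    public static void main(String[] args) {
        // Each client would build its HttpClient from one shared config
        // instead of repeating the setup code:
        HttpClientConfig cfg = new HttpClientConfig()
                .connectTimeout(1000)
                .soTimeout(10000)
                .basicAuth("solr", "secret");
        System.out.println("basicAuth=" + cfg.hasBasicAuth()
                + " connectTimeoutMs=" + cfg.getConnectTimeoutMs());
    }
}
```

Centralizing the settings this way is also what makes the stated goal (enabling https or basic auth everywhere) a one-place change rather than a per-client one.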
[jira] [Updated] (SOLR-3480) Refactor httpclient impl details into a utility class
[ https://issues.apache.org/jira/browse/SOLR-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-3480: - Attachment: SOLR-3480.patch Refactor httpclient impl details into a utility class - Key: SOLR-3480 URL: https://issues.apache.org/jira/browse/SOLR-3480 Project: Solr Issue Type: Improvement Components: clients - java, replication (java), SolrCloud Reporter: Sami Siren Assignee: Sami Siren Priority: Minor Attachments: SOLR-3480.patch Currently there are multiple classes that deal with the impl details of httpclient when setting timeouts, basic auth details, retry handling, compression etc. I am proposing that we instead move this functionality into a reusable utility class. The ultimate goal is to be able to easily use for example https or basic auth (that can already be used in some parts of solr) throughout solr but that will require some more work. I will submit a patch shortly.
[jira] [Updated] (SOLR-3478) DataImportHandler's Entity must have a name
[ https://issues.apache.org/jira/browse/SOLR-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3478: Assignee: (was: Stefan Matheis (steffkes)) Ah okay! I opened this issue; credits to [Emma, she reported this on the ML|http://lucene.472066.n3.nabble.com/Solr-mail-dataimporter-cannot-be-found-tc3985223.html]. James, will you take care of this one? I'll remove my patch, since it should not be required, right? DataImportHandler's Entity must have a name --- Key: SOLR-3478 URL: https://issues.apache.org/jira/browse/SOLR-3478 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: r1341454, {code}java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar{code} Reporter: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3478.patch Using trunk and trying to start the {{example-DIH}} version, throws the following Exception: {code}May 22, 2012 8:17:45 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:614) [...] Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Entity must have a name. at org.apache.solr.handler.dataimport.config.Entity.init(Entity.java:54) at org.apache.solr.handler.dataimport.config.DIHConfiguration.init(DIHConfiguration.java:61) at org.apache.solr.handler.dataimport.DataImporter.readFromXml(DataImporter.java:249) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:187) ... 49 more{code}
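For context, the exception fires when a data-config entity lacks a name attribute. A minimal illustrative data-config fragment (the dataSource details and entity name here are hypothetical, not taken from the example-DIH configs):

{code:xml}
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa"/>
  <document>
    <!-- On trunk, "name" is required on each entity; a config that
         omits it fails with "Entity must have a name." as quoted above. -->
    <entity name="item" query="select * from item"/>
  </document>
</dataConfig>
{code}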
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281519#comment-13281519 ] Simon Willnauer commented on LUCENE-2878: - hey folks, due to heavy modifications on trunk I had almost no choice but to create a new branch and manually move the changes over via selective diffs. the branch is now here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the current state of the branch is: it compiles :) lots of nocommits / todos and several tests failing due to unimplemented parts of the new specialized boolean scorers. Happy coding everybody! Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, mentor Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries, the one which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they are duplicating a lot of code all over lucene. Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature: they cannot score based on term proximity, since scorers don't expose any positional information.
All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer is working on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and others simply return null instead. To show that the API really works and our BulkPostings work fine too with positions, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now all works with positions :) while Payloads for bulkreading are kind of experimental in the patch and those only work with Standard codec. So all spans now work on top of TermScorer ( I truly hate spans since today ) including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk first, but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't look into the MemoryIndex BulkPostings API yet)
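The core API shape described in the comment above (consumers pull a positions enum from a scorer on demand) can be mocked in a few lines. This is a self-contained illustration of the idea only, not the LUCENE-2878 branch's actual code; all types here are stand-ins:

```java
// Self-contained mock of the API idea from the comment above: a scorer
// from which callers can lazily pull a positions enum (cf. the described
// Scorer positions API on the LUCENE-2878 branch). Not the branch's
// actual code; all classes and the postings data are illustrative.
class MockPositionsEnum {
    private final int[] positions;
    private int idx = -1;
    MockPositionsEnum(int... positions) { this.positions = positions; }
    boolean next() { return ++idx < positions.length; }
    int position() { return positions[idx]; }
}

class MockTermScorer {
    private MockPositionsEnum positionsEnum; // created lazily, only if asked for

    // Analogous in spirit to pulling positions from a scorer: the enum is
    // only materialized when a consumer (e.g. a proximity scorer or a
    // highlighter) actually needs positional information.
    MockPositionsEnum positions() {
        if (positionsEnum == null) {
            positionsEnum = new MockPositionsEnum(3, 17, 42); // stand-in postings data
        }
        return positionsEnum;
    }

    public static void main(String[] args) {
        MockPositionsEnum pos = new MockTermScorer().positions();
        StringBuilder sb = new StringBuilder();
        while (pos.next()) sb.append(pos.position()).append(' ');
        System.out.println(sb.toString().trim());
    }
}
```

Queries that never ask for positions pay nothing; that is the point of creating the enum on demand instead of eagerly for every scorer.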
Re: Using term offsets for hit highlighting
alan, I merged the branch manually and created a new branch from it. its here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the branch compiles but lots of nocommits / todos if you have questions please ask I will help as much as I can simon On Tue, May 22, 2012 at 8:38 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive… On 17 May 2012, at 13:15, Simon Willnauer wrote: ok man. I will try to merge up the branch. I tell you this is going to be messy and it might not compile but I will make it reasonable so you can start. simon On Thu, May 17, 2012 at 8:03 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sorry for vanishing for so long, life unexpectedly caught up with me... I'm going to have some time to look at this again next week though, if you're interested in picking it up again. On 21 Mar 2012, at 09:02, Alan Woodward wrote: That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy. On 21 Mar 2012, at 08:49, Simon Willnauer wrote: Alan, if you want I can just merge the branch up next week and we iterate from there? simon On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? 
I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. -Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindleru...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and its in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! simon On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Cool, thanks Robert. I'll take a look at the JIRA ticket. On 19 Mar 2012, at 14:44, Robert Muir wrote: On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hello, The project I'm currently working on requires the reporting of exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. 
I'm working round this by translating everything into SpanQueries, and using the getSpans() method to locate hits (I've extended the Spans interface to make term offsets available - see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our use-case, but isn't terribly efficient, and obviously isn't applicable to non-Span queries. I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting in Lucene. I'm going to have a couple of weeks free in April, and I thought I might have a go at implementing this. Mainly I'm wondering if there's already been thoughts about how to do it. My current thoughts are to somehow extend the Weight and Scorer interface to make term offsets available; to get highlights for a given set of documents, you'd essentially run the query again, with a filter on just the documents you want highlighted, and have a
Re: Using term offsets for hit highlighting
Sweet, thanks Simon. I'll have a go at getting some failing tests passing to begin with. On 23 May 2012, at 11:59, Simon Willnauer wrote: alan, I merged the branch manually and created a new branch from it. its here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the branch compiles but lots of nocommits / todos if you have questions please ask I will help as much as I can simon On Tue, May 22, 2012 at 8:38 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive… On 17 May 2012, at 13:15, Simon Willnauer wrote: ok man. I will try to merge up the branch. I tell you this is going to be messy and it might not compile but I will make it reasonable so you can start. simon On Thu, May 17, 2012 at 8:03 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sorry for vanishing for so long, life unexpectedly caught up with me... I'm going to have some time to look at this again next week though, if you're interested in picking it up again. On 21 Mar 2012, at 09:02, Alan Woodward wrote: That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy. On 21 Mar 2012, at 08:49, Simon Willnauer wrote: Alan, if you want I can just merge the branch up next week and we iterate from there? simon On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. 
:-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. -Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindleru...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and its in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! simon On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Cool, thanks Robert. I'll take a look at the JIRA ticket. 
On 19 Mar 2012, at 14:44, Robert Muir wrote: On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hello, The project I'm currently working on requires the reporting of exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. I'm working round this by translating everything into SpanQueries, and using the getSpans() method to locate hits (I've extended the Spans interface to make term offsets available - see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our use-case, but isn't terribly efficient, and obviously isn't applicable to non-Span queries. I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting in Lucene. I'm going to have a couple of weeks free in April, and I thought I might have a go at implementing this. Mainly I'm wondering if there's already been thoughts about how to do it. My current thoughts are to somehow extend the Weight and Scorer interface to
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java7-64 #97
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/97/ -- [...truncated 10994 lines...] [junit4] [junit4] Suite: org.apache.solr.internal.csv.CSVPrinterTest [junit4] Completed in 0.71s, 6 tests [junit4] [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy2 [junit4] Completed in 1.03s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest [junit4] Completed in 61.64s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.BasicZkTest [junit4] Completed in 12.07s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.ZkControllerTest [junit4] Completed in 22.53s, 3 tests [junit4] [junit4] Suite: org.apache.solr.cloud.TestHashPartitioner [junit4] Completed in 8.15s, 1 test [junit4] [junit4] Suite: org.apache.solr.request.SimpleFacetsTest [junit4] Completed in 8.55s, 29 tests [junit4] [junit4] Suite: org.apache.solr.handler.MoreLikeThisHandlerTest [junit4] Completed in 1.26s, 1 test [junit4] [junit4] Suite: org.apache.solr.ConvertedLegacyTest [junit4] Completed in 3.86s, 1 test [junit4] [junit4] Suite: org.apache.solr.core.TestJmxIntegration [junit4] IGNORED 0.00s | TestJmxIntegration.testJmxOnCoreReload [junit4] Cause: Annotated @Ignore(timing problem? 
https://issues.apache.org/jira/browse/SOLR-2715) [junit4] Completed in 1.99s, 3 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.servlet.SolrRequestParserTest [junit4] Completed in 1.70s, 4 tests [junit4] [junit4] Suite: org.apache.solr.handler.StandardRequestHandlerTest [junit4] Completed in 1.04s, 1 test [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTest [junit4] Completed in 1.51s, 4 tests [junit4] [junit4] Suite: org.apache.solr.BasicFunctionalityTest [junit4] IGNORED 0.00s | BasicFunctionalityTest.testDeepPaging [junit4] Cause: Annotated @Ignore(See SOLR-1726) [junit4] Completed in 2.95s, 23 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.update.SolrCmdDistributorTest [junit4] Completed in 2.55s, 1 test [junit4] [junit4] Suite: org.apache.solr.spelling.IndexBasedSpellCheckerTest [junit4] Completed in 1.62s, 5 tests [junit4] [junit4] Suite: org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest [junit4] Completed in 1.45s, 6 tests [junit4] [junit4] Suite: org.apache.solr.core.TestCoreContainer [junit4] Completed in 2.67s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.CSVRequestHandlerTest [junit4] Completed in 0.88s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.function.SortByFunctionTest [junit4] Completed in 1.78s, 2 tests [junit4] [junit4] Suite: org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactoryTest [junit4] Completed in 0.74s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.admin.CoreAdminHandlerTest [junit4] Completed in 1.71s, 1 test [junit4] [junit4] Suite: org.apache.solr.spelling.SpellCheckCollatorTest [junit4] Completed in 1.37s, 5 tests [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterTSTTest [junit4] Completed in 1.03s, 4 tests [junit4] [junit4] Suite: org.apache.solr.request.TestBinaryResponseWriter [junit4] Completed in 1.32s, 2 tests [junit4] [junit4] Suite: org.apache.solr.servlet.NoCacheHeaderTest [junit4] Completed in 0.81s, 3 
tests [junit4] [junit4] Suite: org.apache.solr.servlet.CacheHeaderTest [junit4] Completed in 0.82s, 5 tests [junit4] [junit4] Suite: org.apache.solr.core.TestPropInject [junit4] Completed in 1.45s, 2 tests [junit4] [junit4] Suite: org.apache.solr.schema.CopyFieldTest [junit4] Completed in 0.57s, 6 tests [junit4] [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy1 [junit4] IGNOR/A 0.02s | TestSolrDeletionPolicy1.testCommitAge [junit4] Assumption #1: This test is not working on Windows (or maybe machines with only 2 CPUs) [junit4] 2 780 T3476 oas.SolrTestCaseJ4.setUp ###Starting testCommitAge [junit4] 2 ASYNC NEW_CORE C45 name=collection1 org.apache.solr.core.SolrCore@3e9f3371 [junit4] 2 784 T3476 C45 oasu.DirectUpdateHandler2.deleteAll [collection1] REMOVING ALL DOCUMENTS FROM INDEX [junit4] 2 787 T3476 C45 oasc.SolrDeletionPolicy.onInit SolrDeletionPolicy.onInit: commits:num=1 [junit4] 2 commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@38f9f930 lockFactory=org.apache.lucene.store.NativeFSLockFactory@1b67117f),segFN=segments_1,generation=1,filenames=[segments_1] [junit4] 2 787 T3476 C45 oasc.SolrDeletionPolicy.updateCommits newest commit = 1 [junit4] 2 787 T3476 C45 UPDATE [collection1] webapp=null path=null params={} {deleteByQuery=*:*} 0 3 [junit4] 2 792 T3476
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14283 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14283/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler Error Message: ERROR: SolrIndexSearcher opens=74 closes=73 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73 at __randomizedtesting.SeedInfo.seed([2D7514737EA2DD02]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log (for compile errors): [...truncated 10346 lines...]
[jira] [Updated] (SOLR-3476) Create a Solr Core with a given commit point
[ https://issues.apache.org/jira/browse/SOLR-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated SOLR-3476: -- Issue Type: Improvement (was: New Feature) Create a Solr Core with a given commit point Key: SOLR-3476 URL: https://issues.apache.org/jira/browse/SOLR-3476 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 3.6 Reporter: ludovic Boutros Attachments: commitPoint.patch In some configurations, we need to open new cores with a given commit point. For instance, when the publication of new documents must be controlled (legal obligations) in a master-slave configuration there are two cores on the same instanceDir and dataDir which are using two versions of the index. The switch of the two cores is done manually. The problem is that when the replication is done one day before the switch, if any problem occurs, and we need to restart tomcat, the new documents are published. With this functionality, we could ensure that the index generation used by the core used for querying is always the good one.
Re: Using term offsets for hit highlighting
hey alan, I added position iterator support to ConjunctionTermScorer and committed it to the branch. All tests that don't rely on payloads are passing in core. Previously we had to decide if we need positions up front; the current code can pull them lazily, which causes fewer changes to the Scorer API. I think we should keep it that way; the only problem is that we currently have no way to pass information to the iterators about whether we need payloads or not. Same is true for offsets since they are now in the index. I think it would be good if you could tackle the payloads first and pass some info to the Scorer#positions() method so we can pull the right thing. happy coding. simon On Wed, May 23, 2012 at 1:23 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sweet, thanks Simon. I'll have a go at getting some failing tests passing to begin with. On 23 May 2012, at 11:59, Simon Willnauer wrote: alan, I merged the branch manually and created a new branch from it. its here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the branch compiles but lots of nocommits / todos if you have questions please ask I will help as much as I can simon On Tue, May 22, 2012 at 8:38 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive… On 17 May 2012, at 13:15, Simon Willnauer wrote: ok man. I will try to merge up the branch. I tell you this is going to be messy and it might not compile but I will make it reasonable so you can start. simon On Thu, May 17, 2012 at 8:03 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sorry for vanishing for so long, life unexpectedly caught up with me... I'm going to have some time to look at this again next week though, if you're interested in picking it up again. 
On 21 Mar 2012, at 09:02, Alan Woodward wrote: That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy. On 21 Mar 2012, at 08:49, Simon Willnauer wrote: Alan, if you want I can just merge the branch up next week and we iterate from there? simon On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. -Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindleru...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! 
yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and it's in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! simon On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Cool, thanks Robert. I'll take a look at the JIRA ticket. On 19 Mar 2012, at 14:44, Robert Muir wrote: On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hello, The project I'm currently working on requires the reporting of exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. I'm
[jira] [Created] (SOLR-3481) Date field value differs between two installations
David Rekowski created SOLR-3481: Summary: Date field value differs between two installations Key: SOLR-3481 URL: https://issues.apache.org/jira/browse/SOLR-3481 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 3.6 Environment: A. Mac 10.7.4 with integrated Jetty B. Ubuntu 12.04 with Tomcat Reporter: David Rekowski When I query the Solr Server, I get a formatted timestamp in environment A (2012-05-11T12:59:01.691Z), whereas in environment B I get a unix-timestamp-like number (1336728376797), which looks like the date in epoch milliseconds. The corresponding schema definition: <field name="index_time_s" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> Background: We migrated an index generated on a mac/jetty to a linux/tomcat installation of Solr. Regardless of that, this happens with newly indexed documents as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3238) Sequel of Admin UI
[ https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281607#comment-13281607 ] Markus Jelsma commented on SOLR-3238: - I think it would also be useful to display the shard information in the core overview page such as its ID and whether it is a leader. Sequel of Admin UI -- Key: SOLR-3238 URL: https://issues.apache.org/jira/browse/SOLR-3238 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0 Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3238.patch, SOLR-3238.patch, SOLR-3238.patch, solradminbug.png Catch-All Issue for all upcoming Bugs/Reports/Suggestions on the Admin UI -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #98
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/98/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #168
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/168/ -- [...truncated 16649 lines...] [junit4] 2 62730 T2861 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60 [junit4] 2 62732 T2861 oasc.SolrCore.initDeprecatedSupport WARNING solrconfig.xml uses deprecated admin/gettableFiles, Please update your config to use the ShowFileRequestHandler. [junit4] 2 62733 T2861 oasc.SolrCore.initDeprecatedSupport WARNING adding ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, STEMDICT.TXT, SCHEMA-PHRASESUGGEST.XML, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, STOPWORDS.TXT, SCHEMA-FOLDING.XML, 
SCHEMA-STOP-KEEP.XML, BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SCHEMA-COLLATEFILTER.XML, SOLRCONFIG-MASTER3.XML, HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, SOLRCONFIG-BASIC-LUCENEVERSION31.XML, OLD_SYNONYMS.TXT, SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, BAD-SCHEMA-EXTERNAL-FILEFIELD.XML] [junit4] 2 62737 T2861 oass.SolrIndexSearcher.init Opening Searcher@73f3e55 main [junit4] 2 62737 T2861 oass.SolrIndexSearcher.init WARNING WARNING: Directory impl does not support setting indexDir: org.apache.lucene.store.MockDirectoryWrapper [junit4] 2 62739 T2861 oasu.CommitTracker.init Hard AutoCommit: disabled [junit4] 2 62739 T2861 oasu.CommitTracker.init Soft AutoCommit: disabled [junit4] 2 62739 T2861 oashc.SpellCheckComponent.inform Initializing spell checkers [junit4] 2 62752 T2861 oass.DirectSolrSpellChecker.init init: {name=direct,classname=DirectSolrSpellChecker,field=lowerfilt,minQueryLength=3} [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 0 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: http:// [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting connTimeout to: 0 [junit4] 2 62825 T2861 
oashc.HttpShardHandlerFactory.getParameter Setting maxConnectionsPerHost to: 20 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting corePoolSize to: 0 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting maximumPoolSize to: 2147483647 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting maxThreadIdleTime to: 5 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting sizeOfQueue to: -1 [junit4] 2 62825 T2861 oashc.HttpShardHandlerFactory.getParameter Setting fairnessPolicy to: false [junit4] 2 62841 T2861 oasc.CoreContainer.register registering core: collection1 [junit4] 2 62842 T2861 oasu.AbstractSolrTestCase.setUp SETUP_END testSoftAndHardCommitMaxTimeMixedAdds [junit4] 2 62842 T2861
[jira] [Comment Edited] (SOLR-3478) DataImportHandler's Entity must have a name
[ https://issues.apache.org/jira/browse/SOLR-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281135#comment-13281135 ] James Dyer edited comment on SOLR-3478 at 5/23/12 2:45 PM: --- Thanks for finding this one. Looking at this issue, I'm pretty sure I introduced this bug with SOLR-3422. was (Author: jdyer): Thanks for finding this one. Looking at this issue, I'm pretty sure I introduced this bug with SOLR-3430. DataImportHandler's Entity must have a name --- Key: SOLR-3478 URL: https://issues.apache.org/jira/browse/SOLR-3478 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: r1341454, {code}java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar{code} Reporter: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3478.patch Using trunk and trying to start the {{example-DIH}} version, throws the following Exception: {code}May 22, 2012 8:17:45 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:614) [...] Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Entity must have a name. at org.apache.solr.handler.dataimport.config.Entity.init(Entity.java:54) at org.apache.solr.handler.dataimport.config.DIHConfiguration.init(DIHConfiguration.java:61) at org.apache.solr.handler.dataimport.DataImporter.readFromXml(DataImporter.java:249) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:187) ... 49 more{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3478) DataImportHandler's Entity must have a name
[ https://issues.apache.org/jira/browse/SOLR-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281642#comment-13281642 ] James Dyer commented on SOLR-3478: -- Actually I think for 4.0 we should break backwards-compatibility with this one and require all DIH entities to have a name. (In 3.6 and prior, it logs a warning and assigns a name based on the system clock.) In SOLR-3422 I fixed any unit tests that didn't use a name but missed the examples. DataImportHandler's Entity must have a name --- Key: SOLR-3478 URL: https://issues.apache.org/jira/browse/SOLR-3478 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: r1341454, {code}java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar{code} Reporter: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3478.patch Using trunk and trying to start the {{example-DIH}} version throws the following Exception: {code}May 22, 2012 8:17:45 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:614) [...] Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Entity must have a name. at org.apache.solr.handler.dataimport.config.Entity.init(Entity.java:54) at org.apache.solr.handler.dataimport.config.DIHConfiguration.init(DIHConfiguration.java:61) at org.apache.solr.handler.dataimport.DataImporter.readFromXml(DataImporter.java:249) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:187) ... 49 more{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
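To make the requirement concrete, a data-config.xml entity that satisfies the 4.0 rule looks like the sketch below. This is a hypothetical minimal config, not the shipped example-DIH one; the driver, URL, and query are placeholders:

```xml
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/tmp/example" user="sa"/>
  <document>
    <!-- 4.0 requires an explicit name on every entity; 3.6 and prior
         only logged a warning and generated one from the system clock. -->
    <entity name="item" query="select * from item">
      <field column="ID" name="id"/>
    </entity>
  </document>
</dataConfig>
```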
[jira] [Resolved] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
[ https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-2585. -- Resolution: Fixed Fix Version/s: 4.0 Assignee: James Dyer Committed to Trunk r1341894. I will also add this to the wiki. Context-Sensitive Spelling Suggestions & Collations --- Key: SOLR-2585 URL: https://issues.apache.org/jira/browse/SOLR-2585 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Assignee: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch Solr currently cannot offer what I'm calling here a context-sensitive spelling suggestion. That is, if a user enters one or more words that have docFrequency > 0, but nevertheless are misspelled, then no suggestions are offered. Currently, Solr will always consider a word correctly spelled if it is in the index and/or dictionary, regardless of context. This patch adds support for context-sensitive spelling suggestions. See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical use case for this functionality. This tests both IndexBasedSpellChecker and DirectSolrSpellChecker. Two new spellcheck parameters were added: - spellcheck.alternativeTermCount - The count of suggestions to return for each query term existing in the index and/or dictionary. Presumably, users will want fewer suggestions for words with docFrequency > 0. Also, setting this value turns on context-sensitive spell suggestions. - spellcheck.maxResultsForSuggest - The maximum number of hits the request can return in order to both generate spelling suggestions and set the correctlySpelled element to false.
For example, if this is set to 5 and the user's query returns 5 or fewer results, the spellchecker will report correctlySpelled=false and also offer suggestions (and collations if requested). Setting this greater than zero is useful for creating did-you-mean suggestions for queries that return a low number of hits. I have also included a test using shards. See additions to DistributedSpellCheckComponentTest. In Lucene, SpellChecker.java can already support this functionality (by passing a null IndexReader and field-name). The DirectSpellChecker, however, needs a minor enhancement. This gives the option to allow DirectSpellChecker to return suggestions for all query terms regardless of frequency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
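As a sketch of how the two new parameters combine on a request (hypothetical handler path and values; the misspelled term is made up):

```
/select?q=samsnug&spellcheck=true&spellcheck.extendedResults=true
    &spellcheck.alternativeTermCount=5
    &spellcheck.maxResultsForSuggest=5
```

If I read the parameter descriptions above correctly: with alternativeTermCount set, "samsnug" would get up to 5 suggestions even if it happens to exist in the index, and with maxResultsForSuggest=5 the response only reports correctlySpelled=true when the query returns more than 5 hits.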
[jira] [Resolved] (SOLR-3309) Slow WAR startups due to annotation scanning (affects Jetty 8)
[ https://issues.apache.org/jira/browse/SOLR-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3309. -- Resolution: Fixed Committed the javaee change, Trunk r1341897. Slow WAR startups due to annotation scanning (affects Jetty 8) - Key: SOLR-3309 URL: https://issues.apache.org/jira/browse/SOLR-3309 Project: Solr Issue Type: Bug Reporter: Bill Bell Assignee: James Dyer Fix For: 4.0 Attachments: SOLR-3309.patch, SOLR-3309.patch Need to modify web.xml to increase the speed of container startup time. The header also appears to need to be modified... http://mostlywheat.wordpress.com/2012/03/10/speeding-up-slow-jetty-8-startups/ http://www.javabeat.net/articles/print.php?article_id=100 Adding 'metadata-complete=true' to our web.xml's web-app restored our startup time to 8 seconds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
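For reference, the change the linked posts describe amounts to one attribute on the web-app element. The sketch below is hypothetical (the exact namespace/version header in Solr's web.xml may differ):

```xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                             http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0"
         metadata-complete="true">
  <!-- metadata-complete="true" tells Servlet 3.0 containers such as
       Jetty 8 that all configuration lives in this file, so they can
       skip scanning every class in the WAR for annotations at startup. -->
</web-app>
```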
[jira] [Resolved] (SOLR-3457) Spellchecker always incorrectly spelled
[ https://issues.apache.org/jira/browse/SOLR-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3457. -- Resolution: Fixed Fix Version/s: 4.0 Assignee: James Dyer Fixed with SOLR-2585 commit (Trunk r1341894). Spellchecker always incorrectly spelled --- Key: SOLR-3457 URL: https://issues.apache.org/jira/browse/SOLR-3457 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Environment: solr-spec 4.0.0.2012.05.15.11.42.06 solr-impl 4.0-SNAPSHOT 1338601 - markus - 2012-05-15 11:42:06 lucene-spec 4.0-SNAPSHOT lucene-impl 4.0-SNAPSHOT 1338601 - markus - 2012-05-15 10:51:02 Reporter: Markus Jelsma Assignee: James Dyer Fix For: 4.0 Attachments: SOLR-3457-4.0-1.patch correctlySpelled is always false with default configuration, example config and example documents: http://localhost:8983/solr/collection1/browse?wt=xml&spellcheck.extendedResults=true&q=samsung {code}
<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
Simon Willnauer created LUCENE-4074: --- Summary: FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 3.6.1, 4.1 the BufferSize constructor accepts size in MB as an integer and uses multiplication to convert to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
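The overflow described here is easy to reproduce in isolation. The sketch below is not the actual Lucene code (the class and method names are hypothetical); it just mirrors the MB-to-bytes arithmetic from the report and the widen-to-long fix:

```java
// Sketch of the LUCENE-4074 overflow: MB -> bytes conversion in 32-bit
// int arithmetic wraps at 2048 MB, because 2048 * 1024 * 1024 == 2^31,
// which is Integer.MIN_VALUE when truncated to an int.
public class BufferSizeOverflow {

    // Buggy variant: overflows for mb >= 2048, so any "result < 2 GB"
    // check performed after this conversion is too late.
    static int toBytesInt(int mb) {
        return mb * 1024 * 1024;
    }

    // The fix the report implies: do the arithmetic in 64-bit longs and
    // reject bad arguments up front for immediate feedback.
    static long toBytesLong(long mb) {
        if (mb < 0) {
            throw new IllegalArgumentException("negative buffer size: " + mb);
        }
        return mb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(toBytesInt(2047));   // 2146435072 -- still fits in an int
        System.out.println(toBytesInt(2048));   // -2147483648 -- wrapped around
        System.out.println(toBytesLong(2048));  // 2147483648 -- correct in a long
    }
}
```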
[jira] [Updated] (SOLR-3478) DataImportHandler's Entity must have a name
[ https://issues.apache.org/jira/browse/SOLR-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3478: Assignee: Stefan Matheis (steffkes) okay, now it's clear for me. will commit the changed example soon DataImportHandler's Entity must have a name --- Key: SOLR-3478 URL: https://issues.apache.org/jira/browse/SOLR-3478 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: r1341454, {code}java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar{code} Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3478.patch Using trunk and trying to start the {{example-DIH}} version, throws the following Exception: {code}May 22, 2012 8:17:45 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:614) [...] Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Entity must have a name. at org.apache.solr.handler.dataimport.config.Entity.init(Entity.java:54) at org.apache.solr.handler.dataimport.config.DIHConfiguration.init(DIHConfiguration.java:61) at org.apache.solr.handler.dataimport.DataImporter.readFromXml(DataImporter.java:249) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:187) ... 49 more{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
[ https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-4074: Attachment: LUCENE-4074.patch here is a patch that adds a testcase, changes all arguments and constants to 64-bit signed ints and checks for negative values in the BufferSize ctor for immediate feedback. FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 3.6.1, 4.1 Attachments: LUCENE-4074.patch the BufferSize constructor accepts size in MB as an integer and uses multiplication to convert to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
[ https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4074: --- Assignee: Simon Willnauer FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.6.1, 4.1 Attachments: LUCENE-4074.patch the BufferSize constructor accepts size in MB as an integer and uses multiplication to convert to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3478) DataImportHandler's Entity must have a name
[ https://issues.apache.org/jira/browse/SOLR-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-3478. - Resolution: Fixed Committed in r1341920 DataImportHandler's Entity must have a name --- Key: SOLR-3478 URL: https://issues.apache.org/jira/browse/SOLR-3478 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: r1341454, {code}java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar{code} Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.0 Attachments: SOLR-3478.patch Using trunk and trying to start the {{example-DIH}} version, throws the following Exception: {code}May 22, 2012 8:17:45 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:614) [...] Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Entity must have a name. at org.apache.solr.handler.dataimport.config.Entity.init(Entity.java:54) at org.apache.solr.handler.dataimport.config.DIHConfiguration.init(DIHConfiguration.java:61) at org.apache.solr.handler.dataimport.DataImporter.readFromXml(DataImporter.java:249) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:187) ... 49 more{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #169
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/169/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Resolved] (LUCENE-4051) Fix File Headers for Lucene40 StoredFields & TermVectors
For 4.0-alpha, are there other known file format changes in the works? committed to trunk in rev. 1341768. I sent out a heads-up mail to the dev list since this breaks the index file format. thanks for reviewing. let's get 4.0-alpha out! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Resolved] (LUCENE-4051) Fix File Headers for Lucene40 StoredFields & TermVectors
yes On Wed, May 23, 2012 at 12:14 PM, Ryan McKinley ryan...@gmail.com wrote: For 4.0-alpha, are there other known file format changes in the works? committed to trunk in rev. 1341768. I sent out a heads-up mail to the dev list since this breaks the index file format. thanks for reviewing. let's get 4.0-alpha out! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4051) Fix File Headers for Lucene40 StoredFields & TermVectors
[ https://issues.apache.org/jira/browse/LUCENE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281700#comment-13281700 ] Uwe Schindler commented on LUCENE-4051: --- Thank you very much! Now file formats are finally consistent. Maybe our index files' consistent magic numbers now also get added to the unix file command :-) Fix File Headers for Lucene40 StoredFields & TermVectors Key: LUCENE-4051 URL: https://issues.apache.org/jira/browse/LUCENE-4051 Project: Lucene - Java Issue Type: Task Components: core/codecs Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch, LUCENE-4051.patch Currently we still write the old file header format in Lucene40StoredFieldFormat and Lucene40TermVectorsFormat. We should cut over to use CodecUtil and reset the versioning before we release Lucene 4.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-2161) BasicDistributedZkTest.testDistribSearch test failure
[ https://issues.apache.org/jira/browse/SOLR-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley closed SOLR-2161. -- Resolution: Fixed Fix Version/s: (was: 4.1) 4.0 I checked in some changes that should fix the most common failures we were seeing due to the recovery threads not being stopped. BasicDistributedZkTest.testDistribSearch test failure - Key: SOLR-2161 URL: https://issues.apache.org/jira/browse/SOLR-2161 Project: Solr Issue Type: Bug Components: Build Affects Versions: 4.0 Environment: Hudson Reporter: Robert Muir Fix For: 4.0 BasicDistributedZkTest.testDistribSearch failed in Hudson. Here is the stacktrace: {noformat} [junit] Testsuite: org.apache.solr.cloud.BasicDistributedZkTest [junit] Testcase: testDistribSearch(org.apache.solr.cloud.BasicDistributedZkTest): Caused an ERROR [junit] Error executing query [junit] org.apache.solr.client.solrj.SolrServerException: Error executing query [junit] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) [junit] at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119) [junit] at org.apache.solr.BaseDistributedSearchTestCase.queryServer(BaseDistributedSearchTestCase.java:290) [junit] at org.apache.solr.cloud.BasicDistributedZkTest.queryServer(BasicDistributedZkTest.java:256) [junit] at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:305) [junit] at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:227) [junit] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:562) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768) [junit] Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: 
org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reque [junit] [junit] 
org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respondat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318) at
[jira] [Created] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
Emma Bo Liu created SOLR-3482: - Summary: Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika.
[jira] [Updated] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emma Bo Liu updated SOLR-3482: -- Priority: Minor (was: Major) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml -- Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika.
[jira] [Updated] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emma Bo Liu updated SOLR-3482: -- Description: The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files. (was: The mail core cannot be brought up. There are mistakes of data-config.xml solrconfig.xml. It cannot find the tika.) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml -- Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files.
Re: Using term offsets for hit highlighting
OK, so the most straightforward way to do that would be to change the signature to positions(boolean needsPayloads, boolean needsOffsets), I guess. This is a new API so it's not breaking anything. It'll be tomorrow morning before I have a proper go at this now (Cambridge Beer Festival tonight…). Is the mailing list the best place to discuss this, or is JIRA/IRC better? On 23 May 2012, at 13:43, Simon Willnauer wrote: hey alan, I added position iterator support to ConjunctionTermScorer and committed it to the branch. All tests that don't rely on payloads are passing in core. Previously we had to decide if we need positions up front; the current code can pull them lazily, which causes fewer changes to the Scorer API. I think we should keep it that way, the only problem is that we currently have no way to pass information to the iterators about whether we need payloads or not. Same is true for offsets since they are now in the index. I think it would be good if you could tackle the payloads first and pass some info to the Scorer#positions() method so we can pull the right thing. happy coding. simon On Wed, May 23, 2012 at 1:23 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sweet, thanks Simon. I'll have a go at getting some failing tests passing to begin with. On 23 May 2012, at 11:59, Simon Willnauer wrote: alan, I merged the branch manually and created a new branch from it. it's here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the branch compiles but lots of nocommits / todos if you have questions please ask I will help as much as I can simon On Tue, May 22, 2012 at 8:38 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive… On 17 May 2012, at 13:15, Simon Willnauer wrote: ok man. I will try to merge up the branch.
I tell you this is going to be messy and it might not compile but I will make it reasonable so you can start. simon On Thu, May 17, 2012 at 8:03 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sorry for vanishing for so long, life unexpectedly caught up with me... I'm going to have some time to look at this again next week though, if you're interested in picking it up again. On 21 Mar 2012, at 09:02, Alan Woodward wrote: That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy. On 21 Mar 2012, at 08:49, Simon Willnauer wrote: Alan, if you want I can just merge the branch up next week and we iterate from there? simon On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. 
-Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindleru...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and its in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! simon On Mon, Mar 19,
[jira] [Commented] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281719#comment-13281719 ] Stefan Matheis (steffkes) commented on SOLR-3482: - Emma, did you find something else in data-config.xml than the missing entity name, which is already reported fixed in SOLR-3478? Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml -- Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files.
[jira] [Updated] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emma Bo Liu updated SOLR-3482: -- Description: The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files. There is also a mistake in the Solr MailEntityProcessor tutorial. (was: The mail core cannot be brought up. There are mistakes of data-config.xml solrconfig.xml. It cannot find the tika. The example of mail core is not complete, miss files.) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml -- Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files. There is also a mistake in the Solr MailEntityProcessor tutorial.
[jira] [Commented] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281724#comment-13281724 ] Emma Bo Liu commented on SOLR-3482: --- In the mail core data-config.xml of example-DIH, the entity doesn't have a name, and neither does it in the Solr MailEntityProcessor tutorial. I am glad the entity-name issue is solved, but there are still other mistakes in the mail core and Tika setup. I will update the patch with a correct mail-core configuration quickly. Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml -- Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml, and it cannot find Tika. The example mail core is incomplete and is missing files. There is also a mistake in the Solr MailEntityProcessor tutorial.
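For readers following along, the missing piece Stefan and Emma are discussing is the name attribute on the DIH entity. A minimal data-config.xml for a mail core might look like the sketch below; the attribute values are placeholders, not the actual contents of the example-DIH file:

```xml
<dataConfig>
  <document>
    <!-- The shipped example lacked a name attribute on this entity,
         which DIH requires; host/user/password values are placeholders. -->
    <entity name="mail"
            processor="MailEntityProcessor"
            user="someone@example.com"
            password="secret"
            host="imap.example.com"
            protocol="imaps"
            fetchMailsSince="2012-01-01 00:00:00"/>
  </document>
</dataConfig>
```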
[jira] [Updated] (SOLR-3482) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml, Cannot find tika
[ https://issues.apache.org/jira/browse/SOLR-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emma Bo Liu updated SOLR-3482: -- Description: The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml. The example mail core is incomplete and is missing files. There is also a mistake in the Solr MailEntityProcessor tutorial. It cannot find Tika even though the dataimporter-extra jar file is included. was:The mail core cannot be brought up. There are mistakes of data-config.xml solrconfig.xml. It cannot find the tika. The example of mail core is not complete, miss files.There is mistake of the sor mailEnitityPorcessor tutorial. Summary: Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml, Cannot find tika (was: Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml) Cannot index emails, mistakes of configuration file data-config.xml solrconfig.xml, Cannot find tika - Key: SOLR-3482 URL: https://issues.apache.org/jira/browse/SOLR-3482 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: windows Reporter: Emma Bo Liu Priority: Minor Labels: core, email, index, solr, tika The mail core cannot be brought up. There are mistakes in the configuration files data-config.xml and solrconfig.xml. The example mail core is incomplete and is missing files. There is also a mistake in the Solr MailEntityProcessor tutorial. It cannot find Tika even though the dataimporter-extra jar file is included.
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281751#comment-13281751 ] Michael McCandless commented on LUCENE-4062: bq. A third option could be to write padding bits (Packed64SingleBlock subclasses may have such padding bits) as well, but I really dislike the fact that the on-disk format is implementation-dependent. Actually, I think we should stop specializing based on 32 bit vs 64 bit JRE, and always use the impls backed by long[] (Packed64*). Then, I think it's fine if we write the long[] image (with padding bits) directly to disk? More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Michael McCandless Priority: Minor Labels: performance Fix For: 4.1 Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch In order to save space, Lucene has two main PackedInts.Mutable implementations: one that is very fast and is based on a byte/short/integer/long array (Direct*), and another which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. On the other hand, if you store 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%. If you are willing to trade some space for speed, you could store 3 of these 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64, which always need to read/write two blocks in order to avoid costly branches.
I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads. The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has fewer than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected:
* 1: Packed64SingleBlock1
* 2: Packed64SingleBlock2
* 3: Packed64SingleBlock3
* 4: Packed64SingleBlock4
* 5: Packed64SingleBlock5
* 6: Packed64SingleBlock6
* 7: Direct8
* 8: Direct8
* 9: Packed64SingleBlock9
* 10: Packed64SingleBlock10
* 11: Packed64SingleBlock12
* 12: Packed64SingleBlock12
* 13: Packed64
* 14: Direct16
* 15: Direct16
* 16: Direct16
* 17: Packed64
* 18: Packed64SingleBlock21
* 19: Packed64SingleBlock21
* 20: Packed64SingleBlock21
* 21: Packed64SingleBlock21
* 22: Packed64
* 23: Packed64
* 24: Packed64
* 25: Packed64
* 26: Packed64
* 27: Direct32
* 28: Direct32
* 29: Direct32
* 30: Direct32
* 31: Direct32
* 32: Direct32
* 33: Packed64
* 34: Packed64
* 35: Packed64
* 36: Packed64
* 37: Packed64
* 38: Packed64
* 39: Packed64
* 40: Packed64
* 41: Packed64
* 42: Packed64
* 43: Packed64
* 44: Packed64
* 45: Packed64
* 46: Packed64
* 47: Packed64
* 48: Packed64
* 49: Packed64
* 50: Packed64
* 51: Packed64
* 52: Packed64
* 53: Packed64
* 54: Direct64
* 55: Direct64
* 56: Direct64
* 57: Direct64
* 58: Direct64
* 59: Direct64
* 60: Direct64
* 61: Direct64
* 62: Direct64
Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation.
Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32. Allowing an overhead of 32 bits per value would ensure that a Direct* implementation is always selected. Next steps would be to: * make Lucene components use this {{getMutable}} method and let users decide what trade-off better suits them, * write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bit computer to test the performance improvements). I think this would allow more fine-grained control over the speed/space trade-off, what do you
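The 21-bit scheme described above can be sketched in a few lines. This is an illustrative toy, not Lucene's actual Packed64SingleBlock code (the class name is made up): three 21-bit values share one 64-bit block, so every get/set touches exactly one long, at the cost of 1 wasted bit per block.

```java
// Toy sketch of the "3 values of 21 bits per long" layout discussed above.
public class Packed21 {
    private static final int BITS = 21;
    private static final long MASK = (1L << BITS) - 1; // 0x1FFFFF

    private final long[] blocks;

    public Packed21(int valueCount) {
        // 3 values per 64-bit block; 1 bit per block is wasted.
        blocks = new long[(valueCount + 2) / 3];
    }

    public void set(int index, long value) {
        int block = index / 3;
        int shift = (index % 3) * BITS;
        // Clear the slot, then OR in the new value; neighbors are untouched.
        blocks[block] = (blocks[block] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    public long get(int index) {
        // A single block read per value -- the advantage over Packed64,
        // which may straddle two longs and needs branchy two-block reads.
        return (blocks[index / 3] >>> ((index % 3) * BITS)) & MASK;
    }

    public static void main(String[] args) {
        Packed21 p = new Packed21(10);
        p.set(0, 123456L);
        p.set(1, MASK); // largest 21-bit value
        p.set(2, 42L);
        p.set(1, 7L);   // overwriting must not disturb neighbors
        if (p.get(0) != 123456L || p.get(1) != 7L || p.get(2) != 42L)
            throw new AssertionError("round-trip failed");
        System.out.println("ok");
    }
}
```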
Build failed in Jenkins: Lucene-Solr-trunk-Linux-Java6-64 #489
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/489/ -- [...truncated 12159 lines...] [junit4] Suite: org.apache.solr.core.RAMDirectoryFactoryTest [junit4] Completed on J1 in 0.01s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestHyphenationCompoundWordTokenFilterFactory [junit4] Completed on J1 in 0.08s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.StandardRequestHandlerTest [junit4] Completed on J0 in 0.56s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestPhoneticFilterFactory [junit4] Completed on J0 in 9.44s, 5 tests [junit4] [junit4] Suite: org.apache.solr.handler.FieldAnalysisRequestHandlerTest [junit4] Completed on J0 in 0.57s, 4 tests [junit4] [junit4] Suite: org.apache.solr.handler.component.DistributedSpellCheckComponentTest [junit4] Completed on J0 in 6.83s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.LeaderElectionTest [junit4] Completed on J1 in 16.51s, 4 tests [junit4] [junit4] Suite: org.apache.solr.request.TestFaceting [junit4] Completed on J1 in 7.84s, 3 tests [junit4] [junit4] Suite: org.apache.solr.TestDistributedSearch [junit4] Completed on J0 in 15.25s, 1 test [junit4] [junit4] Suite: org.apache.solr.request.SimpleFacetsTest [junit4] Completed on J0 in 3.50s, 29 tests [junit4] [junit4] Suite: org.apache.solr.cloud.ZkControllerTest [junit4] Completed on J1 in 6.89s, 3 tests [junit4] [junit4] Suite: org.apache.solr.core.SolrCoreTest [junit4] Completed on J1 in 3.47s, 5 tests [junit4] [junit4] Suite: org.apache.solr.search.TestSort [junit4] Completed on J0 in 3.93s, 2 tests [junit4] [junit4] Suite: org.apache.solr.SolrInfoMBeanTest [junit4] Completed on J0 in 0.54s, 1 test [junit4] [junit4] Suite: org.apache.solr.BasicFunctionalityTest [junit4] IGNORED 0.00s J1 | BasicFunctionalityTest.testDeepPaging [junit4] Cause: Annotated @Ignore(See SOLR-1726) [junit4] Completed on J1 in 1.49s, 23 tests, 1 skipped [junit4] [junit4] Suite: 
org.apache.solr.spelling.IndexBasedSpellCheckerTest [junit4] Completed on J1 in 0.77s, 5 tests [junit4] [junit4] Suite: org.apache.solr.update.SolrCmdDistributorTest [junit4] Completed on J0 in 1.13s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.admin.LukeRequestHandlerTest [junit4] Completed on J1 in 1.53s, 3 tests [junit4] [junit4] Suite: org.apache.solr.core.TestCoreContainer [junit4] Completed on J0 in 1.59s, 1 test [junit4] [junit4] Suite: org.apache.solr.request.TestWriterPerf [junit4] Completed on J1 in 0.74s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestWordDelimiterFilterFactory [junit4] Completed on J0 in 0.86s, 7 tests [junit4] [junit4] Suite: org.apache.solr.search.function.distance.DistanceFunctionTest [junit4] Completed on J1 in 0.60s, 3 tests [junit4] [junit4] Suite: org.apache.solr.handler.XsltUpdateRequestHandlerTest [junit4] Completed on J0 in 0.67s, 1 test [junit4] [junit4] Suite: org.apache.solr.core.SolrCoreCheckLockOnStartupTest [junit4] Completed on J1 in 0.93s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.DocumentAnalysisRequestHandlerTest [junit4] Completed on J0 in 0.63s, 4 tests [junit4] [junit4] Suite: org.apache.solr.update.processor.FieldMutatingUpdateProcessorTest [junit4] Completed on J1 in 0.49s, 20 tests [junit4] [junit4] Suite: org.apache.solr.core.RequestHandlersTest [junit4] Completed on J0 in 0.61s, 3 tests [junit4] [junit4] Suite: org.apache.solr.spelling.FileBasedSpellCheckerTest [junit4] Completed on J1 in 0.60s, 3 tests [junit4] [junit4] Suite: org.apache.solr.schema.PrimitiveFieldTypeTest [junit4] Completed on J0 in 0.74s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.TestQueryUtils [junit4] Completed on J1 in 0.66s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.TestValueSourceCache [junit4] Completed on J0 in 0.65s, 2 tests [junit4] [junit4] Suite: org.apache.solr.response.TestPHPSerializedResponseWriter [junit4] Completed on J1 in 0.49s, 2 tests [junit4] [junit4] 
Suite: org.apache.solr.DisMaxRequestHandlerTest [junit4] Completed on J0 in 0.62s, 3 tests [junit4] [junit4] Suite: org.apache.solr.util.SolrPluginUtilsTest [junit4] Completed on J1 in 0.58s, 7 tests [junit4] [junit4] Suite: org.apache.solr.core.IndexReaderFactoryTest [junit4] Completed on J1 in 0.44s, 1 test [junit4] [junit4] Suite: org.apache.solr.schema.RequiredFieldsTest [junit4] Completed on J0 in 0.46s, 3 tests [junit4] [junit4] Suite: org.apache.solr.request.JSONWriterTest [junit4] Completed on J1 in 0.48s, 3 tests [junit4] [junit4] Suite:
[jira] [Resolved] (SOLR-3481) Date field value differs between two installations
[ https://issues.apache.org/jira/browse/SOLR-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-3481. Resolution: Incomplete David: there isn't enough information here to understand what problem you might be having, or whether there is in fact a bug in Solr (as opposed to a configuration discrepancy in your setup). Please start a thread on the solr-user@lucene mailing list with more details (ie: your schema.xml, including field types, example documents you index, example queries you run, what output you get from those queries, etc...) https://wiki.apache.org/solr/UsingMailingLists Date field value differs between two installations -- Key: SOLR-3481 URL: https://issues.apache.org/jira/browse/SOLR-3481 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 3.6 Environment: A. Mac 10.7.4 with integrated Jetty B. Ubuntu 12.04 with Tomcat Reporter: David Rekowski Labels: datefield, format, mac When I query the Solr server, I get a formatted timestamp in environment A (2012-05-11T12:59:01.691Z), whereas in environment B I get a unix-timestamp-like number (1336728376797), which looks like the date extended by milliseconds. The corresponding schema definition: <field name="index_time_s" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> Background: We migrated an index generated on a Mac/Jetty installation to a Linux/Tomcat installation of Solr. Regardless of that, this happens with newly indexed documents as well.
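A side note on the symptom itself: Solr's canonical date representation is ISO-8601 in UTC with millisecond precision and a trailing Z, so a raw number such as 1336728376797 is almost certainly epoch milliseconds that some layer failed to format. A quick sketch of the canonical formatting (the helper name is ours, not a Solr API):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateSketch {
    // Format epoch milliseconds the way Solr renders date fields:
    // ISO-8601, UTC, millisecond precision, trailing 'Z'.
    public static String toSolrDate(long epochMillis) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The raw value from the report renders as a date in May 2012.
        System.out.println(toSolrDate(1336728376797L));
        System.out.println(toSolrDate(0L)); // prints 1970-01-01T00:00:00.000Z
    }
}
```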
Re: Using term offsets for hit highlighting
Hey Alan, On Wed, May 23, 2012 at 6:46 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: OK, so the most straightforward way to do that would be to change the signature to positions(boolean needsPayloads, boolean needsOffsets), I guess. This is a new API so it's not breaking anything. yeah I'd think so. this is also consistent with how we pull scorers; it's safe in terms of changes, i.e. you won't miss an API change vs. using a struct-like object. I am not sure how we expose the offsets yet but for now let's make the tests pass. That should provide you a good and straightforward start though. Don't worry about the API for now; we are in a dev phase that doesn't need to produce a fixed API, we will straighten that out iteratively as we go. It'll be tomorrow morning before I have a proper go at this now (Cambridge Beer Festival tonight…). Is the mailing list the best place to discuss this, or is JIRA/IRC better? patches should go on the issue and code discussions related to the patches too. It might make sense to have discussion of a broader scope on the dev list; decisions made on the list should be referenced on the issue. IRC might make sense too if you have some questions that are better answered interactively. Yet, any decisions should also be discussed here or on the issue. If something we discussed on IRC leads to some design decisions, it's wise to repeat them on the issue so folks can reproduce the decision-making process. In any case, if it's IRC, make sure it's #lucene-dev looking forward to the patches... simon On 23 May 2012, at 13:43, Simon Willnauer wrote: hey alan, I added position iterator support to ConjunctionTermScorer and committed it to the branch. All tests that don't rely on payloads are passing in core. Previously we had to decide if we need positions up front; the current code can pull them lazily, which causes fewer changes to the Scorer API.
I think we should keep it that way, the only problem is that we currently have no way to pass information to the iterators about whether we need payloads or not. Same is true for offsets since they are now in the index. I think it would be good if you could tackle the payloads first and pass some info to the Scorer#positions() method so we can pull the right thing. happy coding. simon On Wed, May 23, 2012 at 1:23 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sweet, thanks Simon. I'll have a go at getting some failing tests passing to begin with. On 23 May 2012, at 11:59, Simon Willnauer wrote: alan, I merged the branch manually and created a new branch from it. it's here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878 the branch compiles but lots of nocommits / todos if you have questions please ask I will help as much as I can simon On Tue, May 22, 2012 at 8:38 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive… On 17 May 2012, at 13:15, Simon Willnauer wrote: ok man. I will try to merge up the branch. I tell you this is going to be messy and it might not compile but I will make it reasonable so you can start. simon On Thu, May 17, 2012 at 8:03 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Sorry for vanishing for so long, life unexpectedly caught up with me... I'm going to have some time to look at this again next week though, if you're interested in picking it up again. On 21 Mar 2012, at 09:02, Alan Woodward wrote: That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy.
On 21 Mar 2012, at 08:49, Simon Willnauer wrote: Alan, if you want I can just merge the branch up next week and we iterate from there? simon On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work
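The signature change discussed in the thread above can be sketched roughly as follows. This is a hypothetical illustration of the two-boolean approach only — names like SketchScorer and PositionIterator are invented for this example and are not the actual LUCENE-2878 branch API:

```java
// Minimal sketch, assuming a positions(needsPayloads, needsOffsets) entry point.
interface PositionIterator {
    int nextPosition();            // returns -1 once exhausted
    boolean payloadsRequested();
    boolean offsetsRequested();
}

class SketchScorer {
    private final int[] positions = {3, 17, 42};

    // Two explicit booleans rather than a struct-like flags object: if a new
    // capability is added later, every caller breaks at compile time instead
    // of silently ignoring the new flag.
    PositionIterator positions(final boolean needsPayloads, final boolean needsOffsets) {
        return new PositionIterator() {
            private int upto = 0;
            public int nextPosition() {
                return upto < positions.length ? positions[upto++] : -1;
            }
            public boolean payloadsRequested() { return needsPayloads; }
            public boolean offsetsRequested() { return needsOffsets; }
        };
    }
}
```

Whether offsets end up exposed through the same iterator or separately is exactly the open question in the thread; the sketch only shows how the two flags reach the iterator.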
Jenkins build is back to normal : Lucene-Solr-trunk-Linux-Java6-64 #490
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/490/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2443) Don't assume IntsRef offset is 0 after postings bulk read
[ https://issues.apache.org/jira/browse/LUCENE-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281783#comment-13281783 ] Simon Willnauer commented on LUCENE-2443: - seems like this is invalid now, no? Don't assume IntsRef offset is 0 after postings bulk read - Key: LUCENE-2443 URL: https://issues.apache.org/jira/browse/LUCENE-2443 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Yonik found 2 places where we assume the ints start at offset=0 after a bulk read -- we can't do this because in general a codec can give us a slice into private int[] arrays, eg an int block codec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Change Similarity in Solr MoreLikeThis
LUCENE-896 added support for changing the Similarity class of MoreLikeThis, but this functionality has not been exposed in Solr. I would like to create a JIRA issue and submit a patch for this. Do you agree? Thanks Emmanuel
[jira] [Resolved] (LUCENE-3715) TestStressIndexing2 fails with AssertionFailedError
[ https://issues.apache.org/jira/browse/LUCENE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3715. - Resolution: Cannot Reproduce closing this for now... it never reproduced TestStressIndexing2 failes with AssertionFailedError Key: LUCENE-3715 URL: https://issues.apache.org/jira/browse/LUCENE-3715 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 JENKINS reported this lately, I suspect a test issue due to the RandomDWPThreadPool but I need to dig deeper. here is the failure to reproduce: {noformat} [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] r1 is not empty but r2 is [junit] junit.framework.AssertionFailedError: r1 is not empty but r2 is [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:339) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:277) [junit] at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:126) [junit] at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529) [junit] [junit] [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 2.598 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=5df78431615a5fbf:45b35512c8b8741a:235b5758de97148e -Dtests.multiplier=3 -Dtests.nightly=true -Dargs=-Dfile.encoding=ISO8859-1 [junit] NOTE: test params are: codec=Lucene3x, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {f34=DFR GZ(0.3), f33=IB SPL-D2, f32=DFR I(n)B2, f31=DFR I(ne)B1, f30=IB LL-L2, f79=DFR I(n)3(800.0), f78=DFR I(F)L2, f75=DFR 
I(n)BZ(0.3), f76=DFR GLZ(0.3), f39=DFR I(n)BZ(0.3), f38=DFR I(F)3(800.0), f73=DFR I(ne)L1, f74=DFR I(F)3(800.0), f37=DFR I(ne)L1, f36=DFR I(ne)3(800.0), f71=DFR I(F)B3(800.0), f35=DFR I(F)B3(800.0), f72=DFR I(ne)3(800.0), f81=DFR GZ(0.3), f80=IB SPL-D2, f43=DFR I(ne)BZ(0.3), f42=DFR I(F)Z(0.3), f45=IB SPL-L2, f41=DFR I(F)BZ(0.3), f40=DFR I(n)B1, f86=DFR I(ne)B3(800.0), f87=DFR GB1, f88=IB SPL-D3(800.0), f89=DFR I(F)L3(800.0), f82=DFR GL2, f47=DFR I(ne)LZ(0.3), f46=DFR GL2, f83=DFR I(ne)LZ(0.3), f49=DFR I(ne)Z(0.3), f84=DFR I(F)B2, f48=DFR I(F)B2, f85=DFR I(ne)Z(0.3), f90=DFR I(ne)BZ(0.3), f92=IB SPL-L2, f91=DFR I(n)Z(0.3), f59=DFR G2, f6=IB SPL-DZ(0.3), f7=IB LL-L1, f57=IB LL-L3(800.0), f8=DFR I(n)L3(800.0), f58=DFR I(n)LZ(0.3), f12=DFR I(F)1, f11=DFR I(n)L2, f10=DFR I(F)LZ(0.3), f51=DFR I(n)L1, f15=DFR I(n)L1, f52=DFR I(F)L2, f14=DFR GLZ(0.3), f13=DFR I(n)BZ(0.3), f55=DFR GL3(800.0), f19=DFR GL3(800.0), f56=IB LL-L2, f53=DFR I(F)L1, f18=BM25(k1=1.2,b=0.75), f17=DFR I(F)L1, f54=BM25(k1=1.2,b=0.75), id=DFR I(F)L2, f1=DFR I(n)B3(800.0), f0=DFR G2, f3=DFR I(ne)3(800.0), f2=DFR I(F)B3(800.0), f5=DFR I(F)3(800.0), f4=DFR I(ne)L1, f68=DFR I(n)2, f69=DFR I(ne)2, f21=IB LL-LZ(0.3), f20=DFR I(n)1, f23=DFR GB2, f22=DFR I(ne)B2, f60=DFR I(ne)B3(800.0), f25=DFR GB1, f61=DFR GB1, f24=DFR I(ne)B3(800.0), f62=IB SPL-D3(800.0), f27=DFR I(F)L3(800.0), f26=IB SPL-D3(800.0), f63=DFR I(F)L3(800.0), f64=DFR GL1, f29=DFR I(ne)1, f65=DFR I(ne)1, f28=DFR GL1, f66=DFR I(n)B1, f67=DFR I(F)BZ(0.3), f98=DFR I(n)LZ(0.3), f97=IB LL-L3(800.0), f99=DFR G2, f94=DefaultSimilarity, f93=DFR I(n)3(800.0), f70=DFR GB2, f96=LM Jelinek-Mercer(0.70), f95=DFR GBZ(0.3)}, locale=ms, timezone=Africa/Bangui [junit] NOTE: all tests run in this JVM: [junit] [TestDemo, TestSearch, TestCachingTokenFilter, TestSurrogates, TestPulsingReuse, TestAddIndexes, TestBinaryTerms, TestCodecs, TestCrashCausesCorruptIndex, TestDocsAndPositions, TestFieldInfos, TestFilterIndexReader, TestFlex, TestIndexReader, 
TestIndexWriterMergePolicy, TestIndexWriterNRTIsCurrent, TestIndexWriterOnJRECrash, TestIndexWriterWithThreads, TestNeverDelete, TestNoDeletionPolicy, TestOmitNorms, TestParallelReader, TestPayloads, TestRandomStoredFields, TestRollback, TestRollingUpdates, TestSegmentInfo, TestStressIndexing2] [junit] NOTE: FreeBSD 8.2-RELEASE amd64/Sun Microsystems Inc. 1.6.0 (64-bit)/cpus=16,threads=1,free=349545000,total=477233152 {noformat} this failed on revision: http://svn.apache.org/repos/asf/lucene/dev/trunk : 1233708 -- This message is automatically generated
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281793#comment-13281793 ] Adrien Grand commented on LUCENE-4062: -- Mike, I am not sure how we should do it. For 21-bit values, how would the reader know whether it should use a Packed64SingleBlock21 or a Packed64? Should we add a flag to the data stream in order to know what implementation serialized the integers? More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Michael McCandless Priority: Minor Labels: performance Fix For: 4.1 Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch In order to save space, Lucene has two main PackedInts.Mutable implementations, one that is very fast and is based on a byte/short/integer/long array (Direct*) and another that packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. On the other hand, if you store 21-bit integers in a Direct32, this is a space loss of (32-21)/32 ≈ 34%. If you are willing to trade some space for speed, you could store three of these 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64, which always need to read/write two blocks in order to avoid costly branches. I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads.
The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has less than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected (bits per value: implementation): * 1: Packed64SingleBlock1 * 2: Packed64SingleBlock2 * 3: Packed64SingleBlock3 * 4: Packed64SingleBlock4 * 5: Packed64SingleBlock5 * 6: Packed64SingleBlock6 * 7-8: Direct8 * 9: Packed64SingleBlock9 * 10: Packed64SingleBlock10 * 11-12: Packed64SingleBlock12 * 13: Packed64 * 14-16: Direct16 * 17: Packed64 * 18-21: Packed64SingleBlock21 * 22-26: Packed64 * 27-32: Direct32 * 33-53: Packed64 * 54-62: Direct64 Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32.
Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected. Next steps would be to: * make Lucene components use this {{getMutable}} method and let users decide what trade-off better suits them, * write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bit computer to test the performance improvements). I think this would allow more fine-grained control over the speed/space trade-off, what do you think? -- This message is automatically generated by JIRA.
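The single-block layout Adrien describes (three 21-bit values per 64-bit long, so no read or write ever spans two blocks) can be sketched like this. The class below is an illustrative toy, not Lucene's actual Packed64SingleBlock21; it wastes 64 - 3*21 = 1 bit per block, i.e. the 1/3 bit per value quoted above:

```java
// Sketch of a single-block packed-int store: 3 x 21-bit values per long.
class SingleBlock21Sketch {
    private static final int BITS = 21;
    private static final int VALUES_PER_BLOCK = 3;      // floor(64 / 21)
    private static final long MASK = (1L << BITS) - 1;
    private final long[] blocks;

    SingleBlock21Sketch(int valueCount) {
        // one wasted bit per block of 3 values => 1/3 bit overhead per value
        blocks = new long[(valueCount + VALUES_PER_BLOCK - 1) / VALUES_PER_BLOCK];
    }

    long get(int index) {
        // a value always lives inside one long, so exactly one array read
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        return (blocks[index / VALUES_PER_BLOCK] >>> shift) & MASK;
    }

    void set(int index, long value) {
        int block = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        // clear the 21-bit slot, then OR in the new value
        blocks[block] = (blocks[block] & ~(MASK << shift)) | ((value & MASK) << shift);
    }
}
```

Compare with Packed64, where a 21-bit value can straddle two longs, so each access must either branch on the boundary case or always touch two blocks.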
[jira] [Commented] (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281794#comment-13281794 ] Simon Willnauer commented on LUCENE-2504: - yonik, I see a bunch of commits on this issue, can we resolve this? sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch sorting can be much slower on trunk than branch_3x -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281797#comment-13281797 ] Michael McCandless commented on LUCENE-4062: bq. Should we add a flag to the data stream in order to know what implementation serialized the integers? I think so? -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4018) Make accessible subenums in MappingMultiDocsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281798#comment-13281798 ] Simon Willnauer commented on LUCENE-4018: - this makes sense to me though any objections? Make accessible subenums in MappingMultiDocsEnum Key: LUCENE-4018 URL: https://issues.apache.org/jira/browse/LUCENE-4018 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Affects Versions: 4.0 Reporter: Renaud Delbru Labels: codec, flex, merge Fix For: 4.0 Attachments: LUCENE-4018.patch The #merge method of the PostingsConsumer receives MappingMultiDocsEnum and MappingMultiDocsAndPositionsEnum as postings enums. In certain cases (with specific postings formats), the #merge method needs to be overridden, and the underlying DocsEnums wrapped by the MappingMultiDocsEnum need to be accessed. The MappingMultiDocsEnum class should provide a #getSubs method, similar to the MultiDocsEnum class. -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4018) Make accessible subenums in MappingMultiDocsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281804#comment-13281804 ] Michael McCandless commented on LUCENE-4018: +1 -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281805#comment-13281805 ] Adrien Grand commented on LUCENE-4062: -- Isn't it a problem to break compatibility? Or should we use special (> 64) values of bitsPerValue so that current trunk indexes will still work after the patch is applied? -- This message is automatically generated by JIRA.
[jira] [Assigned] (LUCENE-4018) Make accessible subenums in MappingMultiDocsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4018: --- Assignee: Simon Willnauer -- This message is automatically generated by JIRA.
[jira] [Resolved] (LUCENE-4018) Make accessible subenums in MappingMultiDocsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4018. - Resolution: Fixed committed to trunk! thanks Renaud -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281814#comment-13281814 ] Michael McCandless commented on LUCENE-4062: bq. Isn't it a problem to break compatibility? It isn't. 3x indices never store packed ints ... so we are only breaking doc values in 4.0, and we are allowed (for only a bit more time!) to break 4.0's index format. So we should just break it and not pollute 4.0's sources with false back compat code... Separately, if somehow we did need to preserve back compat for packed ints file format... we should use the version in the codec header to accomplish that (ie, we don't have to stuff version information inside the bitsPerValue). More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Michael McCandless Priority: Minor Labels: performance Fix For: 4.1 Attachments: LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch In order to save space, Lucene has two main PackedInts.Mutable implentations, one that is very fast and is based on a byte/short/integer/long array (Direct*) and another one which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components to use it. On the other hand, if you store 21 bits integers in a Direct32, this is a space loss of (32-21)/32=35%. If you accept to trade some space for speed, you could store 3 of these 21 bits integers in a long, resulting in an overhead of 1/3 bit per value. 
One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64, which always need to read/write two blocks in order to avoid costly branches. I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads. The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has less than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected:
* 1: Packed64SingleBlock1
* 2: Packed64SingleBlock2
* 3: Packed64SingleBlock3
* 4: Packed64SingleBlock4
* 5: Packed64SingleBlock5
* 6: Packed64SingleBlock6
* 7: Direct8
* 8: Direct8
* 9: Packed64SingleBlock9
* 10: Packed64SingleBlock10
* 11: Packed64SingleBlock12
* 12: Packed64SingleBlock12
* 13: Packed64
* 14: Direct16
* 15: Direct16
* 16: Direct16
* 17: Packed64
* 18: Packed64SingleBlock21
* 19: Packed64SingleBlock21
* 20: Packed64SingleBlock21
* 21: Packed64SingleBlock21
* 22: Packed64
* 23: Packed64
* 24: Packed64
* 25: Packed64
* 26: Packed64
* 27: Direct32
* 28: Direct32
* 29: Direct32
* 30: Direct32
* 31: Direct32
* 32: Direct32
* 33: Packed64
* 34: Packed64
* 35: Packed64
* 36: Packed64
* 37: Packed64
* 38: Packed64
* 39: Packed64
* 40: Packed64
* 41: Packed64
* 42: Packed64
* 43: Packed64
* 44: Packed64
* 45: Packed64
* 46: Packed64
* 47: Packed64
* 48: Packed64
* 49: Packed64
* 50: Packed64
* 51: Packed64
* 52: Packed64
* 53: Packed64
* 54: Direct64
* 55: Direct64
* 56: Direct64
* 57: Direct64
* 58: Direct64
* 59: Direct64
* 60: Direct64
* 61: Direct64
* 62: Direct64
Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32. Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected. Next steps would be to:
* make Lucene components use this {{getMutable}} method and let users decide what trade-off better suits them,
* write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bit computer to test the performance
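The selection above comes down to simple arithmetic on wasted bits per value. A minimal sketch of that computation (the class and method names here are illustrative helpers, not Lucene's actual API):

```java
public class PackedOverheadDemo {
    // Bits wasted per value when bitsPerValue-bit integers are packed into
    // fixed slotSize-bit blocks, with no value straddling a block boundary.
    static float overhead(int bitsPerValue, int slotSize) {
        int valuesPerSlot = slotSize / bitsPerValue;               // whole values per block
        int wastedBits = slotSize - valuesPerSlot * bitsPerValue;  // leftover bits per block
        return (float) wastedBits / valuesPerSlot;                 // amortized per value
    }

    public static void main(String[] args) {
        // Direct32 storing 21-bit values: 11 wasted bits per value (~34% overhead)
        System.out.println(overhead(21, 32));
        // Packed64SingleBlock21: 3 values per long, 1 spare bit => 1/3 bit per value
        System.out.println(overhead(21, 64));
    }
}
```

getMutable would then pick the fastest implementation whose overhead stays under the caller's acceptable threshold.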
[jira] [Commented] (SOLR-2161) BasicDistributedZkTest.testDistribSearch test failure
[ https://issues.apache.org/jira/browse/SOLR-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281815#comment-13281815 ] Dawid Weiss commented on SOLR-2161: --- Thanks Yonik! BasicDistributedZkTest.testDistribSearch test failure - Key: SOLR-2161 URL: https://issues.apache.org/jira/browse/SOLR-2161 Project: Solr Issue Type: Bug Components: Build Affects Versions: 4.0 Environment: Hudson Reporter: Robert Muir Fix For: 4.0 BasicDistributedZkTest.testDistribSearch failed in Hudson. Here is the stacktrace: {noformat} [junit] Testsuite: org.apache.solr.cloud.BasicDistributedZkTest [junit] Testcase: testDistribSearch(org.apache.solr.cloud.BasicDistributedZkTest): Caused an ERROR [junit] Error executing query [junit] org.apache.solr.client.solrj.SolrServerException: Error executing query [junit] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) [junit] at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119) [junit] at org.apache.solr.BaseDistributedSearchTestCase.queryServer(BaseDistributedSearchTestCase.java:290) [junit] at org.apache.solr.cloud.BasicDistributedZkTest.queryServer(BasicDistributedZkTest.java:256) [junit] at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:305) [junit] at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:227) [junit] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:562) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768) [junit] Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respond at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reque [junit] [junit] org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to 
respond org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server 127.0.0.1 failed to respondat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)at
[jira] [Commented] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
[ https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281833#comment-13281833 ] Simon Willnauer commented on LUCENE-4074: - I will commit this soon if nobody objects. FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0, 3.6.1 Attachments: LUCENE-4074.patch the BufferSize constructor accepts a size in MB as an integer and uses multiplication to convert it to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows, since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
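The overflow described in the issue is easy to reproduce in isolation. A sketch (hypothetical method names, not the actual BufferSize code) showing the buggy int-arithmetic MB-to-bytes conversion next to a 64-bit fix:

```java
public class BufferSizeOverflow {
    // Buggy: the multiplication happens in 32-bit int arithmetic, so any
    // mb >= 2048 overflows before a "less than 2048 MB" check could catch it.
    static int mbToBytesBuggy(int mb) {
        return mb * 1024 * 1024; // 2048 * 2^20 == 2^31 wraps to Integer.MIN_VALUE
    }

    // Fixed: widen to long before multiplying, then validate the byte count.
    static long mbToBytesFixed(int mb) {
        return (long) mb * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(mbToBytesBuggy(2048)); // negative: silent overflow
        System.out.println(mbToBytesFixed(2048)); // correct byte count
    }
}
```

Validating the MB value (or the widened byte count) before any int math avoids the silent wrap-around.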
[jira] [Commented] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
[ https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281835#comment-13281835 ] Robert Muir commented on LUCENE-4074: - +1, nice catch FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0, 3.6.1 Attachments: LUCENE-4074.patch the BufferSize constructor accepts a size in MB as an integer and uses multiplication to convert it to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows, since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Change Similarity in Solr MoreLikeThis
Hi Emmanuel, Sure, go for it! Cheers, Tommaso 2012/5/23 Emmanuel Espina espinaemman...@gmail.com LUCENE-896 added support for changing the Similarity class of More Like This but this functionality has not been exposed to Solr. I would like to create a jira and submit a patch for this. Do you agree? Thanks Emmanuel - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
[ https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4074. - Resolution: Fixed committed to trunk and 3.6 branch FST Sorter BufferSize causes int overflow if BufferSize > 2048MB Key: LUCENE-4074 URL: https://issues.apache.org/jira/browse/LUCENE-4074 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0, 3.6.1 Attachments: LUCENE-4074.patch the BufferSize constructor accepts a size in MB as an integer and uses multiplication to convert it to bytes. While it checks that the size in bytes is less than 2048 MB, it does so after the byte conversion. If you pass a value > 2047 to the ctor, the value overflows, since all constants and methods based on MB expect 32-bit signed ints. This does not even result in an exception until the BufferSize is actually passed to the sorter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2025) Ability to turn off the store for an index
[ https://issues.apache.org/jira/browse/LUCENE-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2025: Fix Version/s: (was: 4.0) 4.1 Moving this over to 4.1; this won't happen in 4.0 anymore Ability to turn off the store for an index -- Key: LUCENE-2025 URL: https://issues.apache.org/jira/browse/LUCENE-2025 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, mentor Fix For: 4.1 It would be really good in combination with parallel indexing if the Lucene store could be turned off entirely for an index. The reason is that part of the store is the FieldIndex (.fdx file), which contains an 8-byte pointer for each document in a segment, even if a document does not contain any stored fields. With parallel indexing we will want to rewrite certain parallel indexes to update them, and if such an update affects only a small number of documents it will be a waste if you have to write the .fdx file every time. So in the case where you only want to update a data structure in the inverted index, it makes sense to separate your index into multiple parallel indexes, where the ones you want to update don't contain any stored fields. It'd also be great to not only allow turning off the store but to make it customizable, similarly to what flexible indexing wants to achieve regarding the inverted index. As a start I'd be happy with the ability to simply turn off the store and to add more flexibility later. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1823) QueryParser with new features for Lucene 3
[ https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1823: Fix Version/s: (was: 4.0) 4.1 Moving this over to 4.1; it seems dead to me, though QueryParser with new features for Lucene 3 -- Key: LUCENE-1823 URL: https://issues.apache.org/jira/browse/LUCENE-1823 Project: Lucene - Java Issue Type: New Feature Components: core/queryparser Reporter: Michael Busch Assignee: Luis Alves Priority: Minor Fix For: 4.1 Attachments: lucene_1823_any_opaque_precedence_fuzzybug_v2.patch, lucene_1823_foo_bug_08_26_2009.patch I'd like to have a new QueryParser implementation in Lucene 3.1, ideally based on the new QP framework in contrib. It should share as much code as possible with the current StandardQueryParser implementation for easy maintainability. Wish list (feel free to extend): 1. *Operator precedence*: Support operator precedence for boolean operators 2. *Opaque terms*: Ability to plug in an external parser for certain syntax extensions, e.g. XML query terms 3. *Improved RangeQuery syntax*: Use more intuitive <=, >=, < instead of [] and {} 4. *Support for trierange queries*: See LUCENE-1768 5. *Complex phrases*: See LUCENE-1486 6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms occur in the same document 7. *New syntax for Span queries*: I think the surround parser supports this? 8. *Escaped wildcards*: See LUCENE-588 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2443) Don't assume IntsRef offset is 0 after postings bulk read
[ https://issues.apache.org/jira/browse/LUCENE-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281852#comment-13281852 ] Michael McCandless commented on LUCENE-2443: Yeah, definitely invalid: no more bulk postings API! Don't assume IntsRef offset is 0 after postings bulk read - Key: LUCENE-2443 URL: https://issues.apache.org/jira/browse/LUCENE-2443 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Yonik found 2 places where we assume the ints start at offset=0 after bulk read -- we can't do this because in general a codec can give us a slice into private int[] arrays, eg int block codec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
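The pitfall the issue describes can be sketched without Lucene: a consumer that ignores a slice's offset reads the codec's private scratch data instead of the values it was handed. Illustrative code only, not the actual bulk-postings API:

```java
public class IntsRefOffsetDemo {
    // Buggy consumer: assumes the returned slice starts at index 0.
    static int sumWrong(int[] ints, int length) {
        int sum = 0;
        for (int i = 0; i < length; i++) sum += ints[i]; // ignores the offset
        return sum;
    }

    // Correct consumer: honors the (offset, length) window of the slice.
    static int sumRight(int[] ints, int offset, int length) {
        int sum = 0;
        for (int i = offset; i < offset + length; i++) sum += ints[i];
        return sum;
    }

    public static void main(String[] args) {
        // A codec may hand back a slice into its private array: offset=2, length=3.
        int[] backing = {9, 9, 1, 2, 3};
        System.out.println(sumWrong(backing, 3));    // reads the 9s: wrong
        System.out.println(sumRight(backing, 2, 3)); // reads 1+2+3: correct
    }
}
```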
[jira] [Resolved] (LUCENE-2443) Don't assume IntsRef offset is 0 after postings bulk read
[ https://issues.apache.org/jira/browse/LUCENE-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2443. - Resolution: Invalid we don't have a bulk api anymore... invalid Don't assume IntsRef offset is 0 after postings bulk read - Key: LUCENE-2443 URL: https://issues.apache.org/jira/browse/LUCENE-2443 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Yonik found 2 places where we assume the ints start at offset=0 after bulk read -- we can't do this because in general a codec can give us a slice into private int[] arrays, eg int block codec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3197) Allow firstSearcher and newSearcher listeners to run in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281866#comment-13281866 ] Tommaso Teofili commented on SOLR-3197: --- I think, as Mark was saying, we could expose an option to define whether warmups run on one or more threads, defaulting to 1. This would not change things as they are now, and people could explicitly set the no. of threads for warmup. Allow firstSearcher and newSearcher listeners to run in multiple threads Key: SOLR-3197 URL: https://issues.apache.org/jira/browse/SOLR-3197 Project: Solr Issue Type: Improvement Reporter: Lance Norskog SolrCore submits all listeners (firstSearcher and newSearcher) to a java ExecutorService, but uses a single-threaded one. line 965 in the trunk: {code} final ExecutorService searcherExecutor = Executors.newSingleThreadExecutor(); {code} SolrCore.java around line 1280 runs the first and new searchers, all with the searcherExecutor object created at line 965. Would it work if we changed this ExecutorService to a thread pool version? This seems like it should work: {code} java.util.concurrent.Executors.newFixedThreadPool(int nThreads, ThreadFactory threadFactory); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
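A sketch of the proposal in the comment: a factory whose thread count defaults to 1 (preserving today's single-threaded behavior) but can be raised via configuration. The `warmupExecutor` helper is hypothetical; only the `Executors` calls are real JDK API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class WarmupExecutorDemo {
    // Hypothetical factory: nThreads = 1 reproduces the current behavior,
    // larger values let firstSearcher/newSearcher listeners warm in parallel.
    static ExecutorService warmupExecutor(int nThreads) {
        ThreadFactory tf = r -> {
            Thread t = new Thread(r, "searcherExecutor");
            t.setDaemon(true);
            return t;
        };
        return nThreads == 1
            ? Executors.newSingleThreadExecutor(tf)
            : Executors.newFixedThreadPool(nThreads, tf);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = warmupExecutor(4);
        CountDownLatch done = new CountDownLatch(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(done::countDown); // stand-in for a warmup listener task
        }
        System.out.println(done.await(5, TimeUnit.SECONDS)); // all tasks completed
        pool.shutdown();
    }
}
```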
[jira] [Resolved] (LUCENE-2436) FilterIndexReader doesn't delegate everything necessary
[ https://issues.apache.org/jira/browse/LUCENE-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2436. --- Resolution: Fixed This was fixed by the IndexReader to AtomicReader refactoring a while ago. FilterIndexReader doesn't delegate everything necessary --- Key: LUCENE-2436 URL: https://issues.apache.org/jira/browse/LUCENE-2436 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Yonik Seeley Fix For: 4.0 Some new methods like fields() aren't delegated by FilterIndexReader, incorrectly resulting in the IndexReader base class method being used. We should audit all current IndexReader methods to determine which should be overridden and delegated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3483) Ability to change Similarity in MoreLikeThisComponent
Emmanuel Espina created SOLR-3483: - Summary: Ability to change Similarity in MoreLikeThisComponent Key: SOLR-3483 URL: https://issues.apache.org/jira/browse/SOLR-3483 Project: Solr Issue Type: New Feature Components: MoreLikeThis Reporter: Emmanuel Espina Priority: Minor Fix For: 4.0 LUCENE-896 added support for changing the Similarity class of More Like This in Lucene but this functionality has not been exposed to Solr. This issue aims to extend the MoreLikeThisComponent to support this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281892#comment-13281892 ] Simon Willnauer commented on LUCENE-2308: - can we close this issue? It seems like, except for Yonik's last comment, everything else has been resolved? Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-FT-interface.patch, LUCENE-2308-FT-interface.patch, LUCENE-2308-FT-interface.patch, LUCENE-2308-FT-interface.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. 
We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
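A toy sketch of the refactoring idea: the index-time flags move out of Field into a reusable FieldType, and each Field keeps only its name and value. All names here are illustrative, not the final Lucene classes:

```java
// Hypothetical FieldType: holds per-type flags that many fields can share.
final class FieldType {
    boolean indexed, stored, tokenized, omitNorms, storeTermVectors;
}

// Hypothetical Field: just a name, a value, and a reference to a shared type.
final class Field {
    final String name;
    final String value;
    final FieldType type;

    Field(String name, String value, FieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

public class FieldTypeDemo {
    public static void main(String[] args) {
        FieldType titleType = new FieldType();
        titleType.indexed = true;
        titleType.stored = true;
        titleType.tokenized = true;

        // The same type instance is reused for every document's title field.
        Field f1 = new Field("title", "Lucene in Action", titleType);
        Field f2 = new Field("title", "Managing Gigabytes", titleType);
        System.out.println(f1.type == f2.type); // shared, not copied
    }
}
```

Per-field analyzers or codecs would then hang off the type object rather than off a separate wrapper keyed by field name.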
[jira] [Created] (LUCENE-4075) Crazy checkout paths break TestXPathEntityProcessor
Greg Bowyer created LUCENE-4075: --- Summary: Crazy checkout paths break TestXPathEntityProcessor Key: LUCENE-4075 URL: https://issues.apache.org/jira/browse/LUCENE-4075 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Greg Bowyer Same as a bug I raised for javadoc generation, my build.xml is the same as upstream; the problem is my checkout path looks like this /home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk} This means that the prepare-webpages target gets its paths in the buildpaths variable as a pipe-separated list like so /home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk}/lucene/analysis/common/build.xml|/home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk}/lucene/analysis/icu/build.xml|...(and so on) Attached is a patch that makes TestXPathEntityProcessor use a URL rather than the filesystem path, which makes XPath/XML happier with crazy path names -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
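The fix direction (hand XML tooling a URL instead of a raw filesystem path) can be illustrated with plain JDK calls; the path below is a stand-in for the reporter's checkout path:

```java
import java.io.File;

public class PathToUrlDemo {
    // Converts a path containing shell-ish characters into a file: URI whose
    // illegal characters are percent-escaped, so XML/XPath tooling won't choke.
    static String toFileUri(String path) {
        return new File(path).toURI().toASCIIString();
    }

    public static void main(String[] args) {
        String uri = toFileUri("/tmp/{search-engineering}solr-lucene{trunk}/data-config.xml");
        System.out.println(uri); // braces come out as %7B / %7D
    }
}
```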
[jira] [Updated] (LUCENE-4075) Crazy checkout paths break TestXPathEntityProcessor
[ https://issues.apache.org/jira/browse/LUCENE-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4075: Attachment: LUCENE-4075-TestXPathEntityProcessor-WierdPath-Fix.patch Crazy checkout paths break TestXPathEntityProcessor --- Key: LUCENE-4075 URL: https://issues.apache.org/jira/browse/LUCENE-4075 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Greg Bowyer Attachments: LUCENE-4075-TestXPathEntityProcessor-WierdPath-Fix.patch Same as a bug I raised for javadoc generation, my build.xml is the same as upstream, the problem is my checkout path looks like this /home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk} This means that the prepare-webpages target gets its paths in the buildpaths variable as a pipe separated list like so /home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk}/lucene/analysis/common/build.xml|/home/buildserver/workspace/builds/{search-engineering}solr-lucene{trunk}/lucene/analysis/icu/build.xml|...(and so on) Attached is a patch that makes TestXPathEntityProcessor use a url rather than the filesystem path that makes XPath / xml happier with crazy path names -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: wiki software
On May 19, 2012, at 5:55 PM, Ryan McKinley wrote: A *long* time ago we discussed converting to confluence to replace the forest site. The key issue was that only committers could have access if we want to include the generated PDF in the distribution. This is all moot now that we have ditched forest. Since then the discussion has come up and I think everyone is in favor of the idea, but no one has taken the steps to make it happen. I suggest we: 1. Create infra JIRA issue to: * delete the old https://cwiki.apache.org/SOLRxSITE/ test * create https://cwiki.apache.org/SOLR * create https://cwiki.apache.org/LUCENE +1 2. Convert existing sites using https://studio.plugins.atlassian.com/wiki/display/UWC/UWC+MoinMoin+Notes I don't know if this is something we can do, or we can make an infra JIRA issue for I'd actually argue we skip this, kind of. I'd like to see us have a left hand nav that represents the versions and then we copy the docs into each version and then go through and make sure everything jives per version. While this is more work up front, I think in the long run, it will result in a much better experience for our users. 3. replace existing MoinMoin sites with links to cwiki https://wiki.apache.org/jakarta-lucene/ https://wiki.apache.org/solr/ ryan On Sat, May 19, 2012 at 12:48 PM, Mark Miller markrmil...@gmail.com wrote: I know there was a long debate about wiki software and docs and what not. It got long enough that I petered out on it. In some ways, I guess this is a lazy plea for someone that did follow along to summarize. Did we get anywhere? Is there an action item to start on? I'm in the same spot I was when I started that thread - the first bite I'm after is switching from the dated MoinMoin to the modern Confluence. It seems as easy as opening a JIRA issue to get a Confluence space up. Should we just do that and start migrating, and take further leaps from there? 
Or is there some fallout from the previous debate that should be incorporated into the next move? - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org Grant Ingersoll http://www.lucidimagination.com
[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281897#comment-13281897 ] Simon Willnauer commented on LUCENE-3440: - Koji, do you wanna get this in any time? Now is likely a good time since 4.0 is getting close. We won't apply this to 3.6.1 since that is a bugfix-only release, if it is going to happen at all. FastVectorHighlighter: IDF-weighted terms for ordered fragments Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: sebastian L. Priority: Minor Labels: FastVectorHighlighter Fix For: 4.0 Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html The FastVectorHighlighter assigns an equal weight to every term found in a fragment, which causes a higher ranking for fragments with a high number of words or, in the worst case, a high number of very common words, than for fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query; The ranking formula should be the same, or at least similar, to the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas: - A better approach would be moving the whole fragment scoring into a separate class. - Switch scoring via parameter - Exact phrases should be given an even better score, regardless of whether a phrase-query was executed or not - edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
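The proposed formula (total weight += IDF of each unique query term in the fragment * query boost) can be sketched as follows; the class and the IDF numbers are made up for illustration, not the patch's actual code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FragmentScorer {
    // IDF values would normally come from the index; callers pass them in here.
    static double score(List<String> fragmentTerms, Map<String, Double> idf, double queryBoost) {
        double total = 0.0;
        Set<String> seen = new HashSet<>();
        for (String term : fragmentTerms) {
            // Each unique query term contributes its IDF once per fragment,
            // so repeating a common word no longer inflates the fragment score.
            if (idf.containsKey(term) && seen.add(term)) {
                total += idf.get(term) * queryBoost;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> idf = Map.of("lucene", 4.2, "the", 0.1);
        double common = score(List.of("the", "the", "the"), idf, 1.0); // 0.1
        double rare = score(List.of("lucene", "the"), idf, 1.0);       // 4.3
        System.out.println(rare > common); // the rarer-term fragment wins
    }
}
```

This is the behavior the patch is after: a fragment containing all query terms outranks one stuffed with repeats of a single common word.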
[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3440: Fix Version/s: (was: 3.6.1) remove 3.6.1 from fix version - bugfix-only release FastVectorHighlighter: IDF-weighted terms for ordered fragments Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: sebastian L. Priority: Minor Labels: FastVectorHighlighter Fix For: 4.0 Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html The FastVectorHighlighter assigns an equal weight to every term found in a fragment, which causes a higher ranking for fragments with a high number of words or, in the worst case, a high number of very common words, than for fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query; The ranking formula should be the same, or at least similar, to the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas: - A better approach would be moving the whole fragment scoring into a separate class. - Switch scoring via parameter - Exact phrases should be given an even better score, regardless of whether a phrase-query was executed or not - edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher -- This message is automatically generated by JIRA. 
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java6-64 #173
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/173/changes Changes: [simonw] LUCENE-4074: FST Sorter BufferSize causes int overflow if BufferSize 2048MB -- [...truncated 10387 lines...] [junit4] 2 27640 T3358 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH: org.apache.solr.handler.component.SearchHandler [junit4] 2 27640 T3358 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH_Direct: org.apache.solr.handler.component.SearchHandler [junit4] 2 27640 T3358 oasc.RequestHandlers.initHandlersFromConfig created spellCheckCompRH1: org.apache.solr.handler.component.SearchHandler [junit4] 2 27641 T3358 oasc.RequestHandlers.initHandlersFromConfig created tvrh: org.apache.solr.handler.component.SearchHandler [junit4] 2 27641 T3358 oasc.RequestHandlers.initHandlersFromConfig created /mlt: solr.MoreLikeThisHandler [junit4] 2 27641 T3358 oasc.RequestHandlers.initHandlersFromConfig created /debug/dump: solr.DumpRequestHandler [junit4] 2 27642 T3358 oashl.XMLLoader.init xsltCacheLifetimeSeconds=60 [junit4] 2 27645 T3358 oasc.SolrCore.initDeprecatedSupport WARNING solrconfig.xml uses deprecated admin/gettableFiles, Please update your config to use the ShowFileRequestHandler. 
[junit4] 2 27646 T3358 oasc.SolrCore.initDeprecatedSupport WARNING adding ShowFileRequestHandler with hidden files: [SOLRCONFIG-HIGHLIGHT.XML, SCHEMA-REQUIRED-FIELDS.XML, SCHEMA-REPLICATION2.XML, SCHEMA-MINIMAL.XML, BAD-SCHEMA-DUP-DYNAMICFIELD.XML, SOLRCONFIG-CACHING.XML, SOLRCONFIG-REPEATER.XML, CURRENCY.XML, BAD-SCHEMA-NONTEXT-ANALYZER.XML, SOLRCONFIG-MERGEPOLICY.XML, SOLRCONFIG-TLOG.XML, SOLRCONFIG-MASTER.XML, SCHEMA11.XML, SOLRCONFIG-BASIC.XML, DA_COMPOUNDDICTIONARY.TXT, SCHEMA-COPYFIELD-TEST.XML, SOLRCONFIG-SLAVE.XML, ELEVATE.XML, SOLRCONFIG-PROPINJECT-INDEXDEFAULT.XML, SCHEMA-IB.XML, SOLRCONFIG-QUERYSENDER.XML, SCHEMA-REPLICATION1.XML, DA_UTF8.XML, HYPHENATION.DTD, SOLRCONFIG-ENABLEPLUGIN.XML, STEMDICT.TXT, SCHEMA-PHRASESUGGEST.XML, HUNSPELL-TEST.AFF, STOPTYPES-1.TXT, STOPWORDSWRONGENCODING.TXT, SCHEMA-NUMERIC.XML, SOLRCONFIG-TRANSFORMERS.XML, SOLRCONFIG-PROPINJECT.XML, BAD-SCHEMA-NOT-INDEXED-BUT-TF.XML, SOLRCONFIG-SIMPLELOCK.XML, WDFTYPES.TXT, STOPTYPES-2.TXT, SCHEMA-REVERSED.XML, SOLRCONFIG-SPELLCHECKCOMPONENT.XML, SCHEMA-DFR.XML, SOLRCONFIG-PHRASESUGGEST.XML, BAD-SCHEMA-NOT-INDEXED-BUT-POS.XML, KEEP-1.TXT, OPEN-EXCHANGE-RATES.JSON, STOPWITHBOM.TXT, SCHEMA-BINARYFIELD.XML, SOLRCONFIG-SPELLCHECKER.XML, SOLRCONFIG-UPDATE-PROCESSOR-CHAINS.XML, BAD-SCHEMA-OMIT-TF-BUT-NOT-POS.XML, BAD-SCHEMA-DUP-FIELDTYPE.XML, SOLRCONFIG-MASTER1.XML, SYNONYMS.TXT, SCHEMA.XML, SCHEMA_CODEC.XML, SOLRCONFIG-SOLR-749.XML, SOLRCONFIG-MASTER1-KEEPONEBACKUP.XML, STOP-2.TXT, SOLRCONFIG-FUNCTIONQUERY.XML, SCHEMA-LMDIRICHLET.XML, SOLRCONFIG-TERMINDEX.XML, SOLRCONFIG-ELEVATE.XML, STOPWORDS.TXT, SCHEMA-FOLDING.XML, SCHEMA-STOP-KEEP.XML, BAD-SCHEMA-NOT-INDEXED-BUT-NORMS.XML, SOLRCONFIG-SOLCOREPROPERTIES.XML, STOP-1.TXT, SOLRCONFIG-MASTER2.XML, SCHEMA-SPELLCHECKER.XML, SOLRCONFIG-LAZYWRITER.XML, SCHEMA-LUCENEMATCHVERSION.XML, BAD-MP-SOLRCONFIG.XML, FRENCHARTICLES.TXT, SCHEMA15.XML, SOLRCONFIG-REQHANDLER.INCL, SCHEMASURROUND.XML, SCHEMA-COLLATEFILTER.XML, SOLRCONFIG-MASTER3.XML, 
HUNSPELL-TEST.DIC, SOLRCONFIG-XINCLUDE.XML, SOLRCONFIG-DELPOLICY1.XML, SOLRCONFIG-SLAVE1.XML, SCHEMA-SIM.XML, SCHEMA-COLLATE.XML, STOP-SNOWBALL.TXT, PROTWORDS.TXT, SCHEMA-TRIE.XML, SOLRCONFIG_CODEC.XML, SCHEMA-TFIDF.XML, SCHEMA-LMJELINEKMERCER.XML, PHRASESUGGEST.TXT, SOLRCONFIG-BASIC-LUCENEVERSION31.XML, OLD_SYNONYMS.TXT, SOLRCONFIG-DELPOLICY2.XML, XSLT, SOLRCONFIG-NATIVELOCK.XML, BAD-SCHEMA-DUP-FIELD.XML, SOLRCONFIG-NOCACHE.XML, SCHEMA-BM25.XML, SOLRCONFIG-ALTDIRECTORY.XML, SOLRCONFIG-QUERYSENDER-NOQUERY.XML, COMPOUNDDICTIONARY.TXT, SOLRCONFIG_PERF.XML, SCHEMA-NOT-REQUIRED-UNIQUE-KEY.XML, KEEP-2.TXT, SCHEMA12.XML, MAPPING-ISOLATIN1ACCENT.TXT, BAD_SOLRCONFIG.XML, BAD-SCHEMA-EXTERNAL-FILEFIELD.XML] [junit4] 2 27651 T3358 oass.SolrIndexSearcher.init Opening Searcher@b8c765f main [junit4] 2 27651 T3358 oass.SolrIndexSearcher.init WARNING WARNING: Directory impl does not support setting indexDir: org.apache.lucene.store.MockDirectoryWrapper [junit4] 2 27651 T3358 oasu.CommitTracker.init Hard AutoCommit: disabled [junit4] 2 27654 T3358 oasu.CommitTracker.init Soft AutoCommit: disabled [junit4] 2 27654 T3358 oashc.SpellCheckComponent.inform Initializing spell checkers [junit4] 2 27675 T3358 oass.DirectSolrSpellChecker.init init: {name=direct,classname=DirectSolrSpellChecker,field=lowerfilt,minQueryLength=3} [junit4] 2 27750 T3358 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 0 [junit4] 2 27750 T3358 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: http:// [junit4] 2 27750 T3358
[jira] [Commented] (LUCENE-4006) system requirements is duplicated across versioned/unversioned
[ https://issues.apache.org/jira/browse/LUCENE-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281904#comment-13281904 ] Uwe Schindler commented on LUCENE-4006: --- Robert: The forrested one is now gone as we used a chainsaw, right? So I think we can close this issue :-) system requirements is duplicated across versioned/unversioned -- Key: LUCENE-4006 URL: https://issues.apache.org/jira/browse/LUCENE-4006 Project: Lucene - Java Issue Type: Task Components: general/javadocs Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0 Our system requirements page is located here on the unversioned site: http://lucene.apache.org/core/systemreqs.html But it's also in forrest under each release. Can we just nuke the forrested one?
[jira] [Commented] (LUCENE-4006) system requirements is duplicated across versioned/unversioned
[ https://issues.apache.org/jira/browse/LUCENE-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281906#comment-13281906 ] Uwe Schindler commented on LUCENE-4006: --- Or maybe better: move the system requirements per-release...? That's why this issue is still open. system requirements is duplicated across versioned/unversioned -- Key: LUCENE-4006 URL: https://issues.apache.org/jira/browse/LUCENE-4006 Project: Lucene - Java Issue Type: Task Components: general/javadocs Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0 Our system requirements page is located here on the unversioned site: http://lucene.apache.org/core/systemreqs.html But it's also in forrest under each release. Can we just nuke the forrested one?
Build failed in Jenkins: Lucene-Solr-trunk-Windows-Java7-64 #102
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/102/changes Changes: [simonw] LUCENE-4074: FST Sorter BufferSize causes int overflow if BufferSize 2048MB [simonw] LUCENE-4018: Make MappingMultiDocsEnum subenums accessible -- [...truncated 11726 lines...] [junit4] Completed in 1.32s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.LeaderElectionTest [junit4] Completed in 26.17s, 4 tests [junit4] [junit4] Suite: org.apache.solr.search.similarities.TestDefaultSimilarityFactory [junit4] Completed in 0.16s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestNorwegianMinimalStemFilterFactory [junit4] Completed in 0.01s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.OverseerTest [junit4] Completed in 63.07s, 7 tests [junit4] [junit4] Suite: org.apache.solr.cloud.NodeStateWatcherTest [junit4] Completed in 23.86s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest [junit4] Completed in 59.09s, 1 test [junit4] [junit4] Suite: org.apache.solr.analysis.TestPhoneticFilterFactory [junit4] Completed in 8.99s, 5 tests [junit4] [junit4] Suite: org.apache.solr.TestDistributedGrouping [junit4] Completed in 26.93s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.component.DistributedSpellCheckComponentTest [junit4] Completed in 21.67s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.BasicZkTest [junit4] Completed in 11.86s, 1 test [junit4] [junit4] Suite: org.apache.solr.TestJoin [junit4] Completed in 13.22s, 2 tests [junit4] [junit4] Suite: org.apache.solr.cloud.ZkControllerTest [junit4] Completed in 17.67s, 3 tests [junit4] [junit4] Suite: org.apache.solr.TestGroupingSearch [junit4] Completed in 7.16s, 12 tests [junit4] [junit4] Suite: org.apache.solr.search.function.TestFunctionQuery [junit4] Completed in 4.15s, 14 tests [junit4] [junit4] Suite: org.apache.solr.handler.component.DistributedQueryElevationComponentTest [junit4] Completed in 5.42s, 1 test [junit4] [junit4] Suite: 
org.apache.solr.spelling.suggest.SuggesterFSTTest [junit4] Completed in 1.49s, 4 tests [junit4] [junit4] Suite: org.apache.solr.core.SolrCoreTest [junit4] Completed in 5.20s, 5 tests [junit4] [junit4] Suite: org.apache.solr.schema.BadIndexSchemaTest [junit4] Completed in 1.23s, 6 tests [junit4] [junit4] Suite: org.apache.solr.handler.StandardRequestHandlerTest [junit4] Completed in 0.98s, 1 test [junit4] [junit4] Suite: org.apache.solr.request.TestWriterPerf [junit4] Completed in 1.26s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.TestPseudoReturnFields [junit4] Completed in 1.41s, 13 tests [junit4] [junit4] Suite: org.apache.solr.search.TestSurroundQueryParser [junit4] Completed in 0.96s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.function.SortByFunctionTest [junit4] Completed in 1.88s, 2 tests [junit4] [junit4] Suite: org.apache.solr.search.function.distance.DistanceFunctionTest [junit4] Completed in 1.04s, 3 tests [junit4] [junit4] Suite: org.apache.solr.handler.XsltUpdateRequestHandlerTest [junit4] Completed in 0.96s, 1 test [junit4] [junit4] Suite: org.apache.solr.core.SolrCoreCheckLockOnStartupTest [junit4] Completed in 1.50s, 2 tests [junit4] [junit4] Suite: org.apache.solr.search.TestFoldingMultitermQuery [junit4] Completed in 1.32s, 18 tests [junit4] [junit4] Suite: org.apache.solr.schema.CurrencyFieldTest [junit4] IGNORED 0.00s | CurrencyFieldTest.testPerformance [junit4] Cause: Annotated @Ignore() [junit4] Completed in 1.15s, 8 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy1 [junit4] IGNOR/A 0.04s | TestSolrDeletionPolicy1.testCommitAge [junit4] Assumption #1: This test is not working on Windows (or maybe machines with only 2 CPUs) [junit4] 2 1005 T3145 oas.SolrTestCaseJ4.setUp ###Starting testCommitAge [junit4] 2 1011 T3145 C60 oasu.DirectUpdateHandler2.deleteAll [collection1] REMOVING ALL DOCUMENTS FROM INDEX [junit4] 2 1012 T3145 C60 UPDATE [collection1] webapp=null path=null params={} 
{deleteByQuery=*:*} 0 1 [junit4] 2 1014 T3145 oas.SolrTestCaseJ4.tearDown ###Ending testCommitAge [junit4] 2 [junit4] Completed in 1.25s, 3 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.update.SolrIndexConfigTest [junit4] Completed in 1.66s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.XmlUpdateRequestHandlerTest [junit4] Completed in 0.96s, 3 tests [junit4] [junit4] Suite: org.apache.solr.handler.component.DebugComponentTest [junit4] Completed in 1.30s, 2 tests [junit4] [junit4] Suite:
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java6-64 #174
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/174/changes
Jenkins build is back to normal : Lucene-Solr-trunk-Windows-Java7-64 #103
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/103/changes
[jira] [Commented] (LUCENE-1489) highlighter problem with n-gram tokens
[ https://issues.apache.org/jira/browse/LUCENE-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282040#comment-13282040 ] Lance Norskog commented on LUCENE-1489: --- Is this still a problem? highlighter problem with n-gram tokens -- Key: LUCENE-1489 URL: https://issues.apache.org/jira/browse/LUCENE-1489 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Reporter: Koji Sekiguchi Priority: Minor Attachments: LUCENE-1489.patch, lucene1489.patch I have a problem when using n-gram and highlighter. I thought it had been solved in LUCENE-627... Actually, I found this problem when I was using CJKTokenizer on Solr, though; here is a Lucene program to reproduce it using NGramTokenizer(min=2,max=2) instead of CJKTokenizer:
{code:java}
public class TestNGramHighlighter {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "Lucene can make index. Then Lucene can search.";
    final String QUERY = "can";
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}
{code}
expected output is: Lucene <b>can</b> make index. Then Lucene <b>can</b> search. but the actual output is: Lucene <b>can make index. Then Lucene can</b> search.
[jira] [Commented] (SOLR-2694) LogUpdateProcessor not thread safe
[ https://issues.apache.org/jira/browse/SOLR-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282041#comment-13282041 ] Ethan Tao commented on SOLR-2694: - We've decided to manually apply the patch from SOLR-2804 dated 1/1/12 to the current snapshot. If this patch won't become official, at least there should be a new class ConcurrentLogUpdateProcessorFactory to handle the thread-safety issue. We'll file a new bug for it. Thanks. LogUpdateProcessor not thread safe -- Key: SOLR-2694 URL: https://issues.apache.org/jira/browse/SOLR-2694 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.4.1, 3.1, 3.2, 3.3, 4.0 Reporter: Jan Høydahl Using the LogUpdateProcessor while feeding in multiple parallel threads does not work, as LogUpdateProcessor is not thread-safe.
Build failed in Jenkins: Lucene-Solr-trunk-Linux-Java6-64 #500
See http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/500/ -- [...truncated 11820 lines...] [junit4] Suite: org.apache.solr.update.AutoCommitTest [junit4] Completed on J1 in 8.51s, 3 tests [junit4] [junit4] Suite: org.apache.solr.util.SolrPluginUtilsTest [junit4] Completed on J1 in 0.52s, 7 tests [junit4] [junit4] Suite: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest [junit4] Completed on J0 in 21.00s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.ZkSolrClientTest [junit4] Completed on J0 in 6.40s, 4 tests [junit4] [junit4] Suite: org.apache.solr.internal.csv.writer.CSVWriterTest [junit4] Completed on J1 in 0.00s, 2 tests [junit4] [junit4] Suite: org.apache.solr.handler.TestReplicationHandler [junit4] Completed on J1 in 22.93s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.component.DistributedSpellCheckComponentTest [junit4] Completed on J0 in 6.93s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.TestHashPartitioner [junit4] Completed on J0 in 5.12s, 1 test [junit4] [junit4] Suite: org.apache.solr.cloud.ZkControllerTest [junit4] Completed on J1 in 7.10s, 3 tests [junit4] [junit4] Suite: org.apache.solr.search.function.TestFunctionQuery [junit4] Completed on J0 in 1.98s, 14 tests [junit4] [junit4] Suite: org.apache.solr.spelling.suggest.SuggesterFSTTest [junit4] Completed on J1 in 0.73s, 4 tests [junit4] [junit4] Suite: org.apache.solr.handler.MoreLikeThisHandlerTest [junit4] Completed on J1 in 0.67s, 1 test [junit4] [junit4] Suite: org.apache.solr.TestTrie [junit4] Completed on J1 in 0.82s, 8 tests [junit4] [junit4] Suite: org.apache.solr.schema.BadIndexSchemaTest [junit4] Completed on J1 in 0.83s, 6 tests [junit4] [junit4] Suite: org.apache.solr.search.TestSort [junit4] Completed on J0 in 3.92s, 2 tests [junit4] [junit4] Suite: org.apache.solr.core.TestJmxIntegration [junit4] IGNORED 0.00s J0 | TestJmxIntegration.testJmxOnCoreReload [junit4] Cause: Annotated @Ignore(timing problem? 
https://issues.apache.org/jira/browse/SOLR-2715) [junit4] Completed on J0 in 1.09s, 3 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.handler.component.StatsComponentTest [junit4] Completed on J1 in 3.12s, 6 tests [junit4] [junit4] Suite: org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest [junit4] Completed on J1 in 0.78s, 6 tests [junit4] [junit4] Suite: org.apache.solr.BasicFunctionalityTest [junit4] IGNORED 0.00s J0 | BasicFunctionalityTest.testDeepPaging [junit4] Cause: Annotated @Ignore(See SOLR-1726) [junit4] Completed on J0 in 1.74s, 23 tests, 1 skipped [junit4] [junit4] Suite: org.apache.solr.request.TestWriterPerf [junit4] Completed on J0 in 0.87s, 1 test [junit4] [junit4] Suite: org.apache.solr.core.TestCoreContainer [junit4] Completed on J1 in 1.68s, 1 test [junit4] [junit4] Suite: org.apache.solr.handler.CSVRequestHandlerTest [junit4] Completed on J1 in 0.68s, 1 test [junit4] [junit4] Suite: org.apache.solr.highlight.HighlighterTest [junit4] Completed on J0 in 1.64s, 27 tests [junit4] [junit4] Suite: org.apache.solr.search.function.distance.DistanceFunctionTest [junit4] Completed on J0 in 0.70s, 3 tests [junit4] [junit4] Suite: org.apache.solr.spelling.SpellCheckCollatorTest [junit4] Completed on J1 in 1.32s, 6 tests [junit4] [junit4] Suite: org.apache.solr.schema.PolyFieldTest [junit4] Completed on J1 in 0.77s, 4 tests [junit4] [junit4] Suite: org.apache.solr.search.SpatialFilterTest [junit4] Completed on J0 in 1.06s, 3 tests [junit4] [junit4] Suite: org.apache.solr.servlet.CacheHeaderTest [junit4] Completed on J1 in 0.62s, 5 tests [junit4] [junit4] Suite: org.apache.solr.schema.TestOmitPositions [junit4] Completed on J0 in 0.55s, 2 tests [junit4] [junit4] Suite: org.apache.solr.core.TestSolrDeletionPolicy1 [junit4] Completed on J1 in 0.73s, 3 tests [junit4] [junit4] Suite: org.apache.solr.update.SolrIndexConfigTest [junit4] Completed on J0 in 0.87s, 2 tests [junit4] [junit4] Suite: 
org.apache.solr.analysis.TestReversedWildcardFilterFactory [junit4] Completed on J1 in 0.52s, 4 tests [junit4] [junit4] Suite: org.apache.solr.search.TestQueryUtils [junit4] Completed on J1 in 0.59s, 1 test [junit4] [junit4] Suite: org.apache.solr.search.TestIndexSearcher [junit4] Completed on J0 in 1.29s, 2 tests [junit4] [junit4] Suite: org.apache.solr.DisMaxRequestHandlerTest [junit4] Completed on J1 in 0.62s, 3 tests [junit4] [junit4] Suite: org.apache.solr.response.TestCSVResponseWriter [junit4] Completed on J0 in 0.49s, 1 test [junit4] [junit4] Suite: org.apache.solr.schema.IndexSchemaTest
[jira] [Created] (SOLR-3484) LogUpdateProcessor throws ConcurrentModificationException under multi-threading calls
Ethan Tao created SOLR-3484: --- Summary: LogUpdateProcessor throws ConcurrentModificationException under multi-threading calls Key: SOLR-3484 URL: https://issues.apache.org/jira/browse/SOLR-3484 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: linux Reporter: Ethan Tao Using the LogUpdateProcessor in a singleton chain for concurrent processing throws an exception. The issue was reported in SOLR-2694 (closed), and an unofficial patch can be found in the related issue SOLR-2804 (patch dated 1/1/12). If the patch won't become official for LogUpdateProcessor, we suggest adding a new class ConcurrentLogUpdateProcessorFactory to address the thread-safety issue.
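For illustration, here is a minimal sketch of the failure mode and the kind of guard a ConcurrentLogUpdateProcessorFactory-style class would need. This is not Solr code; the class and method names below are hypothetical, standing in for a processor that appends document ids to a shared list during updates.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of a processor that records ids of added documents. With a bare
// ArrayList, concurrent processAdd() calls can throw
// ConcurrentModificationException or silently lose updates; wrapping the
// list in a synchronized view is the minimal fix.
public class LogBufferSketch {
    private final List<String> addedIds =
            Collections.synchronizedList(new ArrayList<>()); // the guard

    void processAdd(String id) {
        addedIds.add(id); // safe under concurrent callers thanks to the wrapper
    }

    int count() {
        return addedIds.size();
    }

    public static void main(String[] args) throws Exception {
        LogBufferSketch proc = new LogBufferSketch();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10000; i++) {
            final int n = i;
            pool.submit(() -> proc.processAdd("doc" + n));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(proc.count()); // 10000 with the synchronized list;
                                          // typically fewer with a plain ArrayList
    }
}
```

An alternative with the same effect would be a per-request (non-shared) buffer, which avoids locking entirely; the synchronized wrapper is just the smallest change to the shared-singleton design described in the issue.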
Re: wiki software
Agree with Grant that we need a better, more structured documentation set, per version. But let's first do what Ryan suggests - switch to Confluence - put all converted MoinMoin pages in a separate legacy section of the Wiki. Then in phase II start an effort to do fresh documentation for version 4.0 only, first outlining the structure and placeholder pages and then filling in the meat. A good thing about Confluence is that we could probably use macros to link to SVN and Javadoc and in various ways auto-generate parts of the docs. This way we can migrate Wiki software without being held up by the need to rewrite everything, and we do not need to keep updating two systems. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 23. mai 2012, at 22:49, Grant Ingersoll wrote: On May 19, 2012, at 5:55 PM, Ryan McKinley wrote: A *long* time ago we discussed converting to Confluence to replace the forrest site. The key issue was that only committers could have access if we want to include the generated PDF in the distribution. This is all moot now that we have ditched forrest. Since then the discussion has come up and I think everyone is in favor of the idea, but no one has taken the steps to make it happen. I suggest we: 1. Create infra JIRA issue to: * delete the old https://cwiki.apache.org/SOLRxSITE/ test * create https://cwiki.apache.org/SOLR * create https://cwiki.apache.org/LUCENE +1 2. Convert existing sites using https://studio.plugins.atlassian.com/wiki/display/UWC/UWC+MoinMoin+Notes I don't know if this is something we can do, or we can make an infra JIRA issue for it. I'd actually argue we skip this, kind of. I'd like to see us have a left-hand nav that represents the versions; then we copy the docs into each version and go through and make sure everything jives per version.
While this is more work up front, I think in the long run, it will result in a much better experience for our users. 3. replace existing MoinMoin sites with links to cwiki https://wiki.apache.org/jakarta-lucene/ https://wiki.apache.org/solr/ ryan On Sat, May 19, 2012 at 12:48 PM, Mark Miller markrmil...@gmail.com wrote: I know there was a long debate about wiki software and docs and what not. It got long enough that I petered out on it. In some ways, I guess this is a lazy plea for someone that did follow along to summarize. Did we get anywhere? Is there an action item to start on? I'm in the same spot I was when I started that thread - the first bite I'm after is switching from the dated MoinMoin to the modern Confluence. It seems as easy as opening a JIRA issue to get a Confluence space up. Should we just do that and start migrating, and take further leaps from there? Or is there some fallout from the previous debate that should be incorporated into the next move? - Mark Miller lucidimagination.com Grant Ingersoll http://www.lucidimagination.com