[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1412 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1412/

2 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
        at __randomizedtesting.SeedInfo.seed([9EFF148BF67261FD]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
[jira] [Updated] (LUCENE-5098) Broadword bit selection
[ https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5098:
    Fix Version/s: 4.5

Broadword bit selection
    Key: LUCENE-5098
    URL: https://issues.apache.org/jira/browse/LUCENE-5098
    Project: Lucene - Core
    Issue Type: Improvement
    Components: core/other
    Reporter: Paul Elschot
    Assignee: Adrien Grand
    Priority: Minor
    Fix For: 4.5
    Attachments: LUCENE-5098.patch, LUCENE-5098.patch
[jira] [Resolved] (LUCENE-5098) Broadword bit selection
[ https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-5098.
    Resolution: Fixed
[jira] [Created] (LUCENE-5111) Fix WordDelimiterFilter
Adrien Grand created LUCENE-5111:

    Summary: Fix WordDelimiterFilter
    Key: LUCENE-5111
    URL: https://issues.apache.org/jira/browse/LUCENE-5111
    Project: Lucene - Core
    Issue Type: Bug
    Reporter: Adrien Grand
    Assignee: Adrien Grand

WordDelimiterFilter is documented as broken in TestRandomChains (LUCENE-4641). Given how widely used it is, we should try to fix it.
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708301#comment-13708301 ]

ASF subversion and git services commented on SOLR-4997:

Commit 1503130 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503130 ]

SOLR-4997: The splitshard api doesn't call commit on new sub shards before switching shard states. Multiple bugs related to sub shard recovery and replication are also fixed.

The splitshard api doesn't call commit on new sub shards
    Key: SOLR-4997
    URL: https://issues.apache.org/jira/browse/SOLR-4997
    Project: Solr
    Issue Type: Bug
    Components: SolrCloud
    Affects Versions: 4.3, 4.3.1
    Reporter: Shalin Shekhar Mangar
    Assignee: Shalin Shekhar Mangar
    Fix For: 4.4
    Attachments: SOLR-4997.patch, SOLR-4997.patch

The splitshard api doesn't call commit on new sub shards, but it happily sets them to active state, which means that after a successful split the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that.
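For anyone on 4.3.x before this fix lands, the issue text above implies a workaround: issue an explicit commit after the split. A hedged SolrJ sketch, not tested code; the host, collection, and shard names are illustrative:

{noformat}
// Workaround sketch for Solr 4.3.x (pre-fix): after SPLITSHARD, commit
// explicitly so documents in the new sub-shards become visible to searchers.
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SplitShardWorkaround {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "SPLITSHARD");
    params.set("collection", "collection1"); // illustrative names
    params.set("shard", "shard1");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    new HttpSolrServer("http://localhost:8983/solr").request(req);

    // The explicit commit that, per the issue, the split does not perform:
    new HttpSolrServer("http://localhost:8983/solr/collection1").commit();
  }
}
{noformat}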
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708302#comment-13708302 ]

ASF subversion and git services commented on SOLR-4997:

Commit 1503131 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503131 ]

SOLR-4997: Call commit before checking shard consistency
[jira] [Created] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
George Rhoten created LUCENE-5112:

    Summary: FilteringTokenFilter is double incrementing the position increment in incrementToken
    Key: LUCENE-5112
    URL: https://issues.apache.org/jira/browse/LUCENE-5112
    Project: Lucene - Core
    Issue Type: Bug
    Components: modules/analysis
    Affects Versions: 4.0
    Reporter: George Rhoten

The following code from FilteringTokenFilter#incrementToken() seems wrong.

{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}

The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows.
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708324#comment-13708324 ]

George Rhoten commented on LUCENE-5112:

The workaround seems to be to always use setEnablePositionIncrements(false) on any stop filter being used.
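As code, the workaround George describes might look like this (Lucene 4.x API; the tokenizer and stop set are arbitrary choices for the example):

{noformat}
import java.io.Reader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Illustration only: build a stop-filtered stream with position increments off,
// so removed stop words no longer leave position gaps behind.
TokenStream stopFiltered(Reader reader) {
  TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_40, reader);
  StopFilter stop = new StopFilter(Version.LUCENE_40, ts, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
  stop.setEnablePositionIncrements(false);
  return stop;
}
{noformat}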
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708334#comment-13708334 ]

George Rhoten commented on LUCENE-5112:

For reference, this issue causes this exception:

{noformat}
java.lang.IllegalArgumentException: position overflow for field 'labels'
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:135)
    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:307)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:244)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:373)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1124)
{noformat}
FunctionQuery result field in SearchComponent code ?
Hi,

I have written a custom Solr 4.3.0 SearchComponent whose purpose is to sum the result of a FunctionQuery (termfreq) for some term over each doc and then embed the result in the final output.

This is my query:

http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

This is a sample result doc in the browser:

<doc>
  <str name="id">11</str>
  <str name="type">Video Games</str>
  <str name="format">xbox 360</str>
  <str name="product">The Amazing Spider-Man</str>
  <int name="popularity">11</int>
  <long name="_version_">1439994081345273856</long>
  <int name="freq">1</int>
</doc>

Here is my code from the SearchComponent:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;
for (int i = 0; i < docs.size(); i++) {
  try {
    int docId = iterator.nextDoc();
    // Document doc = searcher.doc(docId, fieldSet);
    Document doc = searcher.doc(docId);

In the 'doc' object I can see the schema fields like 'id', 'type', 'format' etc., but I cannot find the field 'freq' which I need. Is there any way to get the FunctionQuery fields in the doc object?

Thanks,
Tony
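A possible direction, sketched with heavy assumptions and untested: pseudo-fields requested via fl (like freq:termfreq(product,'spider')) are computed while the response is written and never stored on the Document, so a component has to evaluate the function itself, e.g. via a ValueSource (Lucene/Solr 4.3 APIs):

{noformat}
// Hedged sketch: evaluate termfreq(product,'spider') per collected doc
// inside a SearchComponent, instead of looking for a "freq" stored field.
import java.util.List;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.TermFreqValueSource;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

// inside process(ResponseBuilder rb):
DocList docs = rb.getResults().docList;
SolrIndexSearcher searcher = rb.req.getSearcher();
ValueSource vs = new TermFreqValueSource("product", "spider", "product", new BytesRef("spider"));
Map context = ValueSource.newContext(searcher);
vs.createWeight(context, searcher);
List<AtomicReaderContext> leaves = searcher.getTopReaderContext().leaves();
int sumFreq = 0;
for (DocIterator it = docs.iterator(); it.hasNext();) {
  int docId = it.nextDoc();                       // global (top-level) doc id
  AtomicReaderContext leaf = leaves.get(ReaderUtil.subIndex(docId, leaves));
  FunctionValues values = vs.getValues(context, leaf);
  sumFreq += values.intVal(docId - leaf.docBase); // per-segment doc id
}
rb.rsp.add("sumFreq", sumFreq);
{noformat}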
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708388#comment-13708388 ]

Michael McCandless commented on LUCENE-5112:

I think the code is correct: we accumulate the posInc of all tokens that were not accepted, plus the final posInc of the token that was accepted. I don't see how this leads to integer overflows when a StopFilter is used ... can you make a contained test showing that?
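For what it's worth, a self-contained sketch along the lines Mike asks for (Lucene 4.x API; the version constant and analysis chain are arbitrary choices). For "the the the cat" with "the" as a stop word, "cat" should come out with posInc == 4 (its own 1 plus 3 skipped positions), i.e. accumulation rather than doubling:

{noformat}
import java.io.StringReader;
import java.util.Arrays;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class PosIncDemo {
  public static void main(String[] args) throws Exception {
    CharArraySet stops = new CharArraySet(Version.LUCENE_44, Arrays.asList("the"), true);
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_44, new StringReader("the the the cat"));
    ts = new StopFilter(Version.LUCENE_44, ts, stops);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // expected output: "cat posInc=4"
      System.out.println(term + " posInc=" + posInc.getPositionIncrement());
    }
    ts.end();
    ts.close();
  }
}
{noformat}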
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4144 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4144/

2 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
        at __randomizedtesting.SeedInfo.seed([6C3491EDDF8067DC]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
Re: Lookback and/or time-aware Merge Policy?
Lookback is a good idea: you could at least gather statistics and assess, later, whether good merges had been selected, and maybe play what-if games to explore whether different merge selections would have resulted in less copying.

A time-based MergeScheduler would make sense: e.g., it would allow small merges to run any time, but big ones must wait until after hours. Also, RateLimitedDirectoryWrapper can be used to limit the IO impact of ongoing merges. It's like a naive ionice, for merging.

Mike McCandless
http://blog.mikemccandless.com

On Mon, Jul 8, 2013 at 10:41 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

I was (re-re-re-re)-reading Mike's post about Lucene segment merges: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? :) What if some sort of stats were kept about which segments were picked for merges? With some sort of stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?

Also, what about time of day and query rates? Very often search traffic follows the wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.

Thoughts?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
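As a sketch of the time-based idea: a scheduler that stalls big merges during peak hours could look roughly like this, assuming Lucene 4.x's ConcurrentMergeScheduler. This is not an existing Lucene class, and the size threshold and peak window are invented for illustration:

{noformat}
import java.io.IOException;
import java.util.Calendar;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.MergePolicy;

public class AfterHoursMergeScheduler extends ConcurrentMergeScheduler {
  private static final long BIG_MERGE_BYTES = 1L << 30; // 1 GB; invented threshold

  @Override
  protected void doMerge(MergePolicy.OneMerge merge) throws IOException {
    try {
      // Stall merges above the threshold while inside peak hours;
      // small merges proceed immediately on their merge thread.
      while (merge.totalBytesSize() > BIG_MERGE_BYTES && isPeakHours()) {
        Thread.sleep(60 * 1000L);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    }
    super.doMerge(merge);
  }

  private static boolean isPeakHours() {
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    return hour >= 8 && hour < 20; // invented peak window: 08:00-20:00
  }
}
{noformat}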
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708428#comment-13708428 ]

Michael McCandless commented on LUCENE-3069:

{quote}
Another thing that surprised me is, with the same code/conf, luceneutil creates different sizes of index? I tested that df==0 trick several times on wikimedium1m, the index size varies from 514M~522M... Will multi-threading affects much here?
{quote}

Using threads means the docs are assigned to different segments each time you run ... it's interesting this can cause such variance in the index size though. It is known that e.g. sorting docs by web site (if you are indexing content from different sites) can give good compression; maybe that's the effect we're seeing here?

Lucene should have an entirely memory resident term dictionary
    Key: LUCENE-3069
    URL: https://issues.apache.org/jira/browse/LUCENE-3069
    Project: Lucene - Core
    Issue Type: Improvement
    Components: core/index, core/search
    Affects Versions: 4.0-ALPHA
    Reporter: Simon Willnauer
    Assignee: Han Jiang
    Labels: gsoc2013
    Fix For: 4.4
    Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch

FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta.
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Hello guys,

Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate luke into the solr admin as well as have the standalone webapp for non-solr users. There is (was?) a luke stats handler in the solr ui that printed some stats on the index. That could be substituted with the GWT app.

The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing jira with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github fork) the UI is the original one, developed by Andrzej. Beside the UI rework there are plenty of things to port / verify (like e.g. the Hadoop plugin) against the latest lucene versions. See the readme.md: https://github.com/dmitrykey/luke

Whichever way's taken, hopefully we end up having stable releases of luke :)

Dmitry Kan

On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote:

On 7/14/13 5:04 AM, Ajay Bhat wrote:

Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in future. I was actually thinking of doing this JIRA as a project by myself with some assistance from the community after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now.

It's far from done - he just started the process.

What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay

On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org wrote:

On 7/13/13 8:56 PM, Shawn Heisey wrote:

On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question: What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk.

The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/

There is a forked project that has produced Luke for newer Lucene versions. https://java.net/projects/opengrok/downloads

I can't seem to locate any information about how they have licensed the newer versions, and I'm not really sure where the source code is living. Regarding a question you asked earlier, Luke is a standalone program. It does include Lucene classes in the lukeall version of the executable jar. Luke may have some uses as a library, but I think that most people run it separately. There is partial Luke functionality embedded in the Solr admin UI, but I don't know whether that is something cooked up by Solr devs or if it shares actual code with Luke.

Ajay, Luke is a standalone GUI application, not a library. It uses a custom version of the Thinlet GUI toolkit, which is no longer maintained, and it's LGPL licensed, so Luke can't be contributed to the Lucene project as is. Recently several people expressed interest in porting Luke to some other GUI toolkit that is Apache-friendly. See the discussion here: http://groups.google.com/d/msg/luke-discuss/S_Whwg2jwmA/9JgqKIe5aiwJ

In particular, there's a fork by Dmitry Kan - he plans to integrate other patches and forks, and to port Luke from Thinlet to GWT and sync it with the latest version of Lucene. I think you should coordinate your efforts with him and other contributors that work on that code base. This fork is Apache-licensed and the long-term plan is to contribute it back to Lucene once the porting is done. The Pivot-based port of Luke that is in the Lucene sandbox is in an early stage. I'm not sure Mark Miller has time to work on it due to his involvement in SolrCloud development. The Luke handler in Solr is a completely different code base, and it shares only the name with the Luke application.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
My personal thoughts/preferences/suggestions for Luke:

1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible.
2. A simple command line interface – always useful.
3. A Solr plugin handler – based on #1. Good for apps as well as Admin UI. Nice to be able to curl a request to look at a specific doc, for example.
4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks.
5. Any additional, un-integrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above.

-- Jack Krupansky
[jira] [Assigned] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-5040:
    Assignee: Noble Paul

SnapShooter doesn't create a lock as it runs
    Key: SOLR-5040
    URL: https://issues.apache.org/jira/browse/SOLR-5040
    Project: Solr
    Issue Type: Bug
    Components: replication (java)
    Reporter: Mark Triggs
    Assignee: Noble Paul
    Priority: Trivial
    Attachments: snapshooter-locking.diff

Hi there,

While messing around with the replication handler recently, I noticed that the snapshooter didn't seem to be writing a lock file. I had a look at the SnapShooter.java code, and to my untrained eye it seemed like it was creating a Lock object but never actually taking a lock. I modified my local installation to use lock.obtain() instead of lock.isLocked() and verified that I'm now seeing lock files. I've attached a very small patch just in case this is a genuine bug.

Cheers,
Mark
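As a minimal sketch of the distinction the patch appears to rely on, against Lucene's Lock API: obtain() actually acquires the lock, while isLocked() merely observes it, so two concurrent snapshots could both pass an isLocked() check. Class, method, and file names below are illustrative, not SnapShooter's actual code:

{noformat}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.SimpleFSLockFactory;

public class SnapshotLockDemo {
  // Hypothetical method; mirrors the shape of the fix described above.
  static void takeSnapshot(File snapDir) throws IOException {
    Lock lock = new SimpleFSLockFactory(snapDir).makeLock("snapshoot.lock");
    if (!lock.obtain()) {      // acquires the lock; isLocked() would only check it
      throw new IOException("snapshot already in progress in " + snapDir);
    }
    try {
      // ... copy index files into snapDir ...
    } finally {
      lock.release();          // let the next snapshot run
    }
  }
}
{noformat}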
[jira] [Commented] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708450#comment-13708450 ]

Noble Paul commented on SOLR-5040:

Multiple snapshots running in parallel should be just fine. They are just going to be created with different file names. But I don't think the snapshooter is smart enough to check if there is a copy of the index with the same indexversion.

The snapshoot process itself is async. There should be a way to poll and get the status of an ongoing snapshoot (if any).
[jira] [Commented] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708451#comment-13708451 ]

Mark Miller commented on SOLR-5040:

bq. There should be a way to poll and get the status of an ongoing snapshoot

I think that's a fine feature, but less useful than offering the option to have the call wait to return until it's done.
[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708459#comment-13708459 ]

Koji Sekiguchi commented on SOLR-3359:

When I opened the ticket, I thought SynonymFilterFactory should accept (Solr's) fieldType attribute, as I said in the title. But today, as SynonymFilterFactory is in Lucene land, I think an analyzer attribute is more natural than (Solr's) fieldType attribute. I'd like to commit the patch in a few days if no one objects.

SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
    Key: SOLR-3359
    URL: https://issues.apache.org/jira/browse/SOLR-3359
    Project: Solr
    Issue Type: Improvement
    Components: Schema and Analysis
    Reporter: Koji Sekiguchi
    Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch

I hadn't realized that CJKTokenizer and its factory classes were marked deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone talked to me.

{code}
* @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
{code}

I agree with the idea of using the chain of the Tokenizer and TokenFilters instead of CJKTokenizer, but it could be a problem for the existing users of SynonymFilterFactory with CJKTokenizerFactory. So this ticket comes to my mind again.
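For illustration, the proposed attribute might be used from Java roughly like this, assuming the patch is applied and the 4.4-style Map-based factory construction; the "analyzer" key is the proposal under discussion here, not committed API, and the other values are made up:

{noformat}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;

public class SynonymAnalyzerAttrDemo {
  public static void main(String[] args) {
    Map<String, String> params = new HashMap<String, String>();
    params.put("luceneMatchVersion", "4.4");
    params.put("synonyms", "synonyms.txt");
    // Proposed attribute (would replace tokenizerFactory for parsing synonym rules):
    params.put("analyzer", "org.apache.lucene.analysis.cjk.CJKAnalyzer");
    SynonymFilterFactory factory = new SynonymFilterFactory(params);
    // factory.inform(resourceLoader) would still be needed to load synonyms.txt
  }
}
{noformat}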
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708465#comment-13708465 ]

Michael McCandless commented on LUCENE-3069:

The new code on the branch looks great! I can't wait to see perf results after we implement .intersect().

Some small stuff in TempFSTTermsReader.java:

* In next(), when we handle seekPending=true, I think we should assert that the seekCeil returned SeekStatus.FOUND? Ie, it's not possible to seekExact(TermState) to a term that doesn't exist.
* useCache is an ancient option from back when we had a terms dict cache; we long ago removed it ... I think we should remove the useCache parameter too?
* It's silly that fstEnum.seekCeil doesn't return a status, ie that we must re-compare the term we got to differentiate FOUND vs NOT_FOUND ... so we lose some perf here. But this is just a future TODO ...
* nocommit: this method doesn't act as 'seekExact' right? -- not sure why this is here; seekExact is working as it should I think.
* Maybe instead of term and meta members, we could just hold the current pair?

In TempTermOutputs.java:

* longsSize, hasPos can be final? (Same with TempMetaData's fields)
* TempMetaData.hashCode() doesn't mix in docFreq/tTF?
* It doesn't impl equals (must it really impl hashCode?)
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708479#comment-13708479 ]

Michael McCandless commented on LUCENE-4845:

bq. I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?

I agree ... but this can be done separately.

Add AnalyzingInfixSuggester
    Key: LUCENE-4845
    URL: https://issues.apache.org/jira/browse/LUCENE-4845
    Project: Lucene - Core
    Issue Type: Improvement
    Components: modules/spellchecker
    Reporter: Michael McCandless
    Assignee: Michael McCandless
    Fix For: 5.0, 4.4
    Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch

Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g., Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), e.g. Netflix knows the popularity of movies.
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708484#comment-13708484 ]

Shai Erera commented on LUCENE-4845:

Mike, will you still commit it to 4.4? I think that the branch was created prematurely, as there's still no resolution on whether to release or not. And this feature is isolated enough that it shouldn't cause any instability ... it'd be a pity to have to wait another 3-4 months to release it just because of technicalities...
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ]

Han Jiang commented on LUCENE-3069:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ]

Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:20 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two nodes can be 'merged'.

was (Author: billy):

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:20 PM: bq. I think we should assert that the seekCeil returned SeekStatus.FOUND? Ok! I'll commit that. bq. useCache is an ancient option from back when we had a terms dict cache Yes, I suppose is is not 'clear' to have this parameter. bq. seekExact is working as it should I think. Currently, I think those 'seek' methods are supposed to change the enum pointer based on input term string, and fetch related metadata from term dict. However, seekExact(BytesRef, TermsState) simply 'copy' the value of termState to enum, which doesn't actually operate 'seek' on dictionary. bq. Maybe instead of term and meta members, we could just hold the current pair? Oh, yes, I once thought about this, but not sure: like, can the callee always makes sure that, when 'term()' is called, it will always return a valid term? The codes in MemoryPF just return 'pair.output' regardless whether pair==null, is it safe? bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF? Oops! thanks, nice catch! bq. It doesn't impl equals (must it really impl hashCode?) Hmm, do we need equals? Also, NodeHash relys on hashCode to judge whether two fst nodes can be 'merged'. was (Author: billy): bq. I think we should assert that the seekCeil returned SeekStatus.FOUND? Ok! I'll commit that. bq. useCache is an ancient option from back when we had a terms dict cache Yes, I suppose is is not 'clear' to have this parameter. bq. seekExact is working as it should I think. Currently, I think those 'seek' methods are supposed to change the enum pointer based on input term string, and fetch related metadata from term dict. However, seekExact(BytesRef, TermsState) simply 'copy' the value of termState to enum, which doesn't actually operate 'seek' on dictionary. bq. Maybe instead of term and meta members, we could just hold the current pair? Oh, yes, I once thought about this, but not sure: like, can the callee always makes sure that, when 'term()' is called, it will always return a valid term? The codes in MemoryPF just return 'pair.output' regardless whether pair==null, is it safe? bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF? Oops! thanks, nice catch! bq. It doesn't impl equals (must it really impl hashCode?) Hmm, do we need equals? Also, NodeHash relys on hashCode to judge whether to nodes can be 'merged'. Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. 
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:35 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch the related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually perform a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always make sure that, when 'term()' is called, it will return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is that safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

-Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two FST nodes can be 'merged'.- Oops, I forgot it still relies on equals to make sure two instances really match; ok, I'll add that.
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708515#comment-13708515 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503275 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1503275 ] SOLR-4894: fix error message

Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
--
Key: SOLR-4894
URL: https://issues.apache.org/jira/browse/SOLR-4894
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
Fix For: 5.0, 4.4
Attachments: SOLR-4894.patch

Previous {{ParseFooUpdateProcessorFactory}}-s (see SOLR-4892) in the same chain will detect, parse and convert unknown fields’ {{String}}-typed values to the appropriate Java object type. This factory will take as configuration a set of mappings from Java object type to schema field type. {{ManagedIndexSchema.addFields()}} adds new fields to the schema. If schema addition fails for any field, addition is re-attempted only for those that don’t match any schema field. This process is repeated, either until all new fields are successfully added, or until there are no new fields (because the fields that were new when this update chain started its work were subsequently added by a different update request, possibly on a different node).
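The retry behavior in that last paragraph is the subtle part: on a failed addition, only the fields that still don't match any schema field are retried. A minimal sketch of that loop follows; detectUnknownFields(), toSchemaFields(), and fieldExistsInSchema() are hypothetical helpers invented for illustration, and this is not the committed implementation.

{noformat}
import java.util.ArrayList;
import java.util.List;

// Sketch of the SOLR-4894 retry loop (illustrative, not the committed code).
// On failure, re-attempt only the fields that are still unknown; stop once
// every field is in the schema, whether added here or by a concurrent
// update request, possibly on a different node.
List<String> newFields = detectUnknownFields(doc);        // hypothetical helper
while (!newFields.isEmpty()) {
  try {
    schema = schema.addFields(toSchemaFields(newFields)); // may fail on conflict
    break;
  } catch (Exception e) {
    List<String> stillMissing = new ArrayList<String>();
    for (String name : newFields) {
      if (!fieldExistsInSchema(schema, name)) {           // hypothetical helper
        stillMissing.add(name);
      }
    }
    newFields = stillMissing;                             // shrinks every round
  }
}
{noformat}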
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708516#comment-13708516 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503277 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503277 ] SOLR-4894: fix error message (merged trunk r1503275)
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708518#comment-13708518 ] Steve Rowe commented on SOLR-4894: --

bq. Found a copy/paste exception error

Thanks Jack, you're right, committed fix to trunk, branch_4x and lucene_solr_4_4 branches.
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708517#comment-13708517 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503278 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503278 ] SOLR-4894: fix error message (merged trunk r1503275)
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 321 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/321/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) at __randomizedtesting.SeedInfo.seed([B5C6D59CDB4012CB]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
I disagree with this completely. Solr is last priority.

On Jul 15, 2013 6:14 AM, Jack Krupansky j...@basetechnology.com wrote:

My personal thoughts/preferences/suggestions for Luke:
1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible.
2. A simple command line interface – always useful.
3. A Solr plugin handler – based on #1. Good for apps as well as the Admin UI. Nice to be able to curl a request to look at a specific doc, for example.
4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks.
5. Any additional, un-integrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above.
-- Jack Krupansky

*From:* Dmitry Kan dmitry.luc...@gmail.com
*Sent:* Monday, July 15, 2013 8:54 AM
*To:* dev@lucene.apache.org
*Subject:* Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module

Hello guys,

Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate Luke into the Solr admin as well as have a standalone webapp for non-Solr users. There is (was?) a Luke stats handler in the Solr UI that printed some stats on the index. That could be substituted with the GWT app. The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing JIRA with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github fork) the UI is the original one, developed by Andrzej. Besides the UI rework there are plenty of things to port / verify (e.g. the Hadoop plugin) against the latest Lucene versions. See the readme.md: https://github.com/dmitrykey/luke

Whichever way is taken, hopefully we end up having stable releases of Luke :)

Dmitry Kan

On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote: On 7/14/13 5:04 AM, Ajay Bhat wrote: Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in the future. I was actually thinking of doing this JIRA as a project by myself, with some assistance from the community, after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now.

It's far from done - he just started the process.

What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay

On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org wrote: On 7/13/13 8:56 PM, Shawn Heisey wrote: On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question: What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk.

The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/ There is a forked project that has produced Luke for newer Lucene versions: https://java.net/projects/opengrok/downloads I can't seem to locate any information about how they have licensed the newer versions, and I'm not really sure where the source code is living.
Regarding a question you asked earlier, Luke is a standalone program. It does include Lucene classes in the lukeall version of the executable jar. Luke may have some uses as a library, but I think that most people run it separately. There is partial Luke functionality embedded in the Solr admin UI, but I don't know whether that is something cooked up by Solr devs or if it shares actual code with Luke.

Ajay, Luke is a standalone GUI application, not a library. It uses a custom version of the Thinlet GUI toolkit, which is no longer maintained, and it's LGPL licensed, so Luke can't be contributed to the Lucene project as is. Recently several people expressed interest in porting Luke to some other GUI toolkit that is Apache-friendly. See the discussion here: http://groups.google.com/d/msg/luke-discuss/S_Whwg2jwmA/9JgqKIe5aiwJ In
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
On 7/15/2013 9:15 AM, Robert Muir wrote:
I disagree with this completely. Solr is last priority

I'm on the Solr side of things, with only the tiniest knowledge of or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have Luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works.

Thanks,
Shawn
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708553#comment-13708553 ] David Smiley commented on SOLR-5039:

Erick, I am looking at CHANGES.txt on trunk and see you added this as a bug fix under 4.3.1. This issue shows it's fixed in 4.4. Which is it?

Admin UI displays -1 for term count in multiValued fields
-
Key: SOLR-5039
URL: https://issues.apache.org/jira/browse/SOLR-5039
Project: Solr
Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 5.0, 4.4
Attachments: SOLR-5039.patch

I thought this had been a JIRA before, but I couldn't find it. The problem is that LukeRequestHandler.getDetailedFieldInfo gets the count via this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1, at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or look at the two-line change and see if it makes sense?
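For readers wondering what "just counts things up" amounts to: Terms.size() is allowed to return -1 when the count is unknown, and a multi-segment view can't know the distinct-term count without merging, so the fix has to walk the TermsEnum. A minimal sketch against the Lucene 4.x API (the method name is invented for illustration):

{noformat}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

// Terms.size() may legitimately return -1 (unknown), e.g. for a multi-segment
// Terms instance, so count distinct terms by iterating the TermsEnum instead.
static long countDistinctTerms(IndexReader reader, String field) throws IOException {
  Terms terms = MultiFields.getTerms(reader, field);
  if (terms == null) {
    return 0;
  }
  long count = 0;
  TermsEnum termsEnum = terms.iterator(null); // 4.x signature: iterator(TermsEnum reuse)
  while (termsEnum.next() != null) {
    count++;
  }
  return count;
}
{noformat}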
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708557#comment-13708557 ] Mikhail Khludnev commented on SOLR-4894:

Good shoot (into we know what), [~steve_rowe]! Is there a plan to support specifying the fieldType alongside the field name?
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708567#comment-13708567 ] Steve Rowe commented on SOLR-4894: --

bq. Is there a plan to support specifying the fieldType alongside the field name?

That's (indirectly/partially) possible now, in two ways:
# Using dynamic fields, which encode the fieldType via a field name prefix or suffix.
# Using AddSchemaFieldsUpdateProcessor and sending doc updates via JSON - its typed values are mapped to fieldTypes in the ASFUPF config in solrconfig.xml.

That said, it might be useful to include the capability you describe in the future. Though I haven't made plans to do so myself, patches are welcome!
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708568#comment-13708568 ] Robert Muir commented on LUCENE-5112: -

This can happen if a consumer is not calling reset(): either the code pulling the tokens, or a filter that overrides reset() but doesn't invoke the superclass reset() to pass it down the chain.

FilteringTokenFilter is double incrementing the position increment in incrementToken
Key: LUCENE-5112
URL: https://issues.apache.org/jira/browse/LUCENE-5112
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.0
Reporter: George Rhoten

The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of by posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows.
[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1797 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/1797/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([4215CE0A1C54C0AE]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708577#comment-13708577 ] Jack Krupansky commented on SOLR-4894: --

bq. support specifying the fieldType alongside the field name

Could you elaborate and provide an example? The new parse update processors can be used to give values a desired Java type, and then this Add Schema Fields update processor can map specific Java value types (optionally constrained by field names or field name regex patterns) to specific Solr field type names. So, what exactly is still missing?
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708580#comment-13708580 ] Uwe Schindler commented on LUCENE-5112: ---

The code is correct. As [~rcmuir] says - if you don't call reset() before consuming, the overflow might happen.
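Since both comments point at the consumer workflow rather than the filter, the standard TokenStream consumption pattern is worth spelling out; a minimal sketch, assuming an Analyzer-produced stream:

{noformat}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// The TokenStream contract: reset() before the first incrementToken(),
// end() after the last one, then close(). A filter that overrides reset()
// must call super.reset(), or internal state such as position increments
// can be carried over and appear to "double increment".
static void consume(Analyzer analyzer, String field, String text) throws IOException {
  TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  PositionIncrementAttribute posIncAtt = ts.addAttribute(PositionIncrementAttribute.class);
  try {
    ts.reset();                        // mandatory before consuming
    while (ts.incrementToken()) {
      System.out.println(termAtt.toString() + " +" + posIncAtt.getPositionIncrement());
    }
    ts.end();                          // records final offset/position state
  } finally {
    ts.close();
  }
}
{noformat}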
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 4:09 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch the related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually perform a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always make sure that, when 'term()' is called, it will return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is that safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

-Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two FST nodes can be 'merged'.- Oops, I forgot it still relies on equals to make sure two instances really match; ok, I'll add that.

By the way, for real data, when two outputs are not 'NO_OUTPUT', even if they contain the same metadata + stats, it seems to be very seldom that their arcs can be identical in the FST (the size increases by less than 1MB for wikimedium1m if equals always returns false for a non-singleton argument). Therefore... yes, hashCode() isn't necessary here.
[jira] [Created] (LUCENE-5113) Allow for packing the pending values of our AppendingLongBuffers
Adrien Grand created LUCENE-5113:

Summary: Allow for packing the pending values of our AppendingLongBuffers
Key: LUCENE-5113
URL: https://issues.apache.org/jira/browse/LUCENE-5113
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor

When working with small arrays, the pending values might require substantial space. So we could allow for packing the pending values in order to save space, the drawback being that this operation will make the buffer read-only.
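As a toy illustration of the trade-off the description mentions (the class and method names below are invented, not the AppendingLongBuffer API): values stay mutable while buffered, and a pack step re-encodes them compactly, after which the structure is read-only.

{noformat}
import java.util.Arrays;

// Toy grow-then-pack buffer illustrating the LUCENE-5113 idea; invented API.
// While mutable, the pending tail is over-allocated; pack() trims it (a real
// implementation would also bit-pack values), making the buffer read-only.
final class GrowThenPackBuffer {
  private long[] pending = new long[16];
  private int size = 0;
  private boolean packed = false;

  void add(long value) {
    if (packed) {
      throw new IllegalStateException("buffer was packed and is read-only");
    }
    if (size == pending.length) {
      pending = Arrays.copyOf(pending, size * 2);
    }
    pending[size++] = value;
  }

  void pack() {
    pending = Arrays.copyOf(pending, size); // drop the unused tail
    packed = true;
  }

  long get(int index) {
    return pending[index];
  }
}
{noformat}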
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708597#comment-13708597 ] Yonik Seeley commented on SOLR-3076:

It seems like the implementations of AddUpdateCommand and AddBlockCommand have almost everything in common (or should... such as handling reordered delete-by-queries, etc). For the most part, the only difference will be which IndexWriter method is finally called. I'm considering just modifying AddUpdateCommand instead of having a separate AddBlockCommand, but I was wondering about the reasoning behind a separate command.

Solr(Cloud) should support block joins
--
Key: SOLR-3076
URL: https://issues.apache.org/jira/browse/SOLR-3076
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Fix For: 5.0, 4.4
Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch

Lucene has the ability to do block joins; we should add it to Solr.
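For context, the IndexWriter method in question is addDocuments(), which writes a document block atomically with contiguous docids; by Lucene's block-join convention the child documents come first and the parent last. A minimal sketch:

{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;

// A block is indexed with a single addDocuments() call so children and the
// parent receive contiguous docids in one segment -- the invariant that
// block-join queries depend on. Children first, parent last.
static void addBlock(IndexWriter writer) throws IOException {
  List<Document> block = new ArrayList<Document>();
  Document child = new Document();
  child.add(new StringField("type", "child", Field.Store.NO));
  block.add(child);
  Document parent = new Document();
  parent.add(new StringField("type", "parent", Field.Store.NO));
  block.add(parent);
  writer.addDocuments(block); // vs. addDocument(doc) for a single document
}
{noformat}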
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708618#comment-13708618 ] Michael McCandless commented on LUCENE-4845:

bq. Mike, will you still commit it to 4.4?

OK, I'll commit shortly & backport to the 4.4 branch...

Add AnalyzingInfixSuggester
---
Key: LUCENE-4845
URL: https://issues.apache.org/jira/browse/LUCENE-4845
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch

Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g., Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), e.g. Netflix knows the popularity of movies.
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708619#comment-13708619 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503327 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503327 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState

SSDVA should detect a mismatch in the SSDVReaderState
-
Key: LUCENE-5090
URL: https://issues.apache.org/jira/browse/LUCENE-5090
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: LUCENE-5090.patch, LUCENE-5090.patch

This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in a confusing AIOOBE, or silently invalid results.
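The invariant this commit checks is per-reader state: the ordinal mapping belongs to exactly one reader, so it must be rebuilt on every reopen. A sketch of the intended usage, where buildReaderState() is a hypothetical stand-in for constructing the SSDVReaderState (the exact constructor isn't quoted in this thread):

{noformat}
import org.apache.lucene.index.DirectoryReader;

// Rebuild the SSDVReaderState whenever the reader changes; reusing the old
// state with a new reader is exactly the mismatch LUCENE-5090 now catches.
// buildReaderState() is a hypothetical placeholder, not the real API;
// dir is an already-opened Directory.
DirectoryReader reader = DirectoryReader.open(dir);
Object state = buildReaderState(reader);      // valid for this reader only

DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
if (newReader != null) {
  reader.close();
  reader = newReader;
  state = buildReaderState(reader);           // must not reuse the old state
}
{noformat}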
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708621#comment-13708621 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503328 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1503328 ] SOLR-4997: Skip log recovery for sub shard leaders only

The splitshard api doesn't call commit on new sub shards
Key: SOLR-4997
URL: https://issues.apache.org/jira/browse/SOLR-4997
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Fix For: 4.4
Attachments: SOLR-4997.patch, SOLR-4997.patch

The splitshard api doesn't call commit on new sub shards, but it happily sets them to the active state, which means that on a successful split the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that.
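Before this fix, the workaround was the explicit commit the description mentions; a sketch with the 4.x SolrJ client, where the base URL is a placeholder for one node of the cluster:

{noformat}
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Workaround prior to this fix: after a successful SPLITSHARD call, issue
// an explicit commit so documents in the new sub shards become searchable.
// The update request is distributed across the collection.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
try {
  server.commit();   // opens new searchers on the sub shards
} finally {
  server.shutdown();
}
{noformat}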
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708623#comment-13708623 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503329 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503329 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708624#comment-13708624 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503331 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503331 ] SOLR-4997: Skip log recovery for sub shard leaders only
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708625#comment-13708625 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503332 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503332 ] SOLR-4997: Skip log recovery for sub shard leaders only
[jira] [Resolved] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5090. Resolution: Fixed
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708626#comment-13708626 ] ASF subversion and git services commented on LUCENE-5090: - Commit 150 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r150 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708627#comment-13708627 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503335 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1503335 ] Moved SOLR-5039 to proper section
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708629#comment-13708629 ] Shalin Shekhar Mangar commented on SOLR-4997: - I fixed a bug that I had introduced which skipped log recovery on startup for all leaders instead of only sub shard leaders. I caught this only because I was doing another line-by-line review of all my changes. We should have a test which catches such a condition. The splitshard api doesn't call commit on new sub shards Key: SOLR-4997 URL: https://issues.apache.org/jira/browse/SOLR-4997 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.4 Attachments: SOLR-4997.patch, SOLR-4997.patch The splitshard api doesn't call commit on new sub shards but it happily sets them to active state which means on a successful split, the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
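Until the fix lands, the workaround implied by the description is an explicit commit after a successful split so the sub shards' documents become searchable. A hedged SolrJ 4.x sketch (the URL is a placeholder):

{code}
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
server.commit(); // hard commit; in SolrCloud this is distributed to all shards
{code}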
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708631#comment-13708631 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503336 from [~erickoerickson] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503336 ] Moved SOLR-5039 to proper section Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708634#comment-13708634 ] Erick Erickson commented on SOLR-5039: -- Crap! Whaddya want anyway? It's right above the 4.3 section... and immediately below the 4.3.1. It's a 4.4 fix. Fixing it up, thanks for catching! Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708635#comment-13708635 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503338 from [~erickoerickson] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503338 ] Moved SOLR-5039 to proper section Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash:128). Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708640#comment-13708640 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503340 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503340 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708638#comment-13708638 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 5:08 PM: Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash.java:128). Although later, for nodes with outputs, it'll hardly ever find an identical node in the hash table. was (Author: billy): Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash:128). Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk6) - Build # 6501 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6501/ Java: 64bit/ibm-j9-jdk6 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace: java.lang.AssertionError: No SolrDynamicMBeans found at __randomizedtesting.SeedInfo.seed([2387D7242E862648:AD56B31E43C77E2D]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:738) Build Log: [...truncated 8978 lines...] [junit4] Suite:
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708674#comment-13708674 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503356 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503356 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5114) remove boolean useCache param from TermsEnum.seekCeil/Exact
Michael McCandless created LUCENE-5114: -- Summary: remove boolean useCache param from TermsEnum.seekCeil/Exact Key: LUCENE-5114 URL: https://issues.apache.org/jira/browse/LUCENE-5114 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.5 Long ago terms dict had a cache, but it was problematic and we removed it, but the API still has a relic boolean useCache ... I think we should drop it from the API as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
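A before/after sketch of the cleanup (the two-argument 4.x signatures named in the summary; `terms` is an assumed Terms instance):

{code}
TermsEnum te = terms.iterator(null);
te.seekExact(new BytesRef("lucene"), false); // today: relic useCache flag
// after LUCENE-5114 the flag simply goes away:
// te.seekExact(new BytesRef("lucene"));
{code}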
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708682#comment-13708682 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503359 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503359 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4845. Resolution: Fixed Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1413 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1413/ 4 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxUpdate Error Message: No mbean found for SolrIndexSearcher Stack Trace: java.lang.AssertionError: No mbean found for SolrIndexSearcher at __randomizedtesting.SeedInfo.seed([81ED668B798E415A:978A54E1E958EAF1]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.apache.solr.core.TestJmxIntegration.testJmxUpdate(TestJmxIntegration.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:722) REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace:
[jira] [Commented] (SOLR-2345) Extend geodist() to support MultiValued lat long field
[ https://issues.apache.org/jira/browse/SOLR-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708751#comment-13708751 ] David Smiley commented on SOLR-2345: By the way, geodist() handles a variety of invocation approaches, not all of which involve sfield. From the comments:
{code}
// m is a multi-value source, x is a single-value source
// allow (m,m) (m,x,x) (x,x,m) (x,x,x,x)
// if not enough points are present, pt will be checked first, followed by sfield.
{code}
Adapting geodist() to support RPT will only work with explicit use of sfield and pt. Extend geodist() to support MultiValued lat long field -- Key: SOLR-2345 URL: https://issues.apache.org/jira/browse/SOLR-2345 Project: Solr Issue Type: New Feature Components: spatial Reporter: Bill Bell Assignee: David Smiley Fix For: 4.4 Attachments: SOLR-2345_geodist_refactor.patch Extend geodist() and {!geofilt} to support a multiValued lat,long field without using geohash. sort=geodist() asc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
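For illustration, the explicit sfield/pt form that would still work with RPT looks like this (standard Solr spatial request parameters; the field name is made up):

{code}
&sfield=store_rpt&pt=45.15,-93.85&sort=geodist() asc
&fq={!geofilt sfield=store_rpt pt=45.15,-93.85 d=5}
{code}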
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #387: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/387/ All tests passed Build Log: [...truncated 20509 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 311 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708757#comment-13708757 ] Andrew Muldowney commented on SOLR-2894: I'm working on this patch again, looking into the limit issue and the fact that exclusion tags aren't being respected. They both boil down to improperly formatted refinement requests, so I'm going through and cleaning those up to look more and more like the distributed field facet code. Should also have time to get to the datetime problem, where you cannot refine on datetimes because the datetime format returned by the shards is not queryable when refining. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.4 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708776#comment-13708776 ] Mark Miller commented on SOLR-4997: --- bq. We should have a test which catches such a condition. yeah, scary. The splitshard api doesn't call commit on new sub shards Key: SOLR-4997 URL: https://issues.apache.org/jira/browse/SOLR-4997 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.4 Attachments: SOLR-4997.patch, SOLR-4997.patch The splitshard api doesn't call commit on new sub shards but it happily sets them to active state which means on a successful split, the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5113) Allow for packing the pending values of our AppendingLongBuffers
[ https://issues.apache.org/jira/browse/LUCENE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708777#comment-13708777 ] Robert Muir commented on LUCENE-5113: - +1, the little 8KB pending buffers can really add up, e.g. if you have an OrdinalMap over 25 segments (with zero terms!), that's 200KB just for pending[]s. We could try to solve it in another way if it makes appending* complicated or would hurt performance, e.g. maybe this map could use some other packed ints API. There are a few other places using this buffer though: I think fieldcache term addresses, indexwriter consumers, not sure what else. Allow for packing the pending values of our AppendingLongBuffers Key: LUCENE-5113 URL: https://issues.apache.org/jira/browse/LUCENE-5113 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor When working with small arrays, the pending values might require substantial space. So we could allow for packing the pending values in order to save space, the drawback being that this operation will make the buffer read-only. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
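A sketch of what the issue proposes (AppendingLongBuffer and add() exist in the 4.x org.apache.lucene.util.packed package; the pack/freeze call is hypothetical and is exactly what this issue would add):

{code}
AppendingLongBuffer ords = new AppendingLongBuffer();
ords.add(3);
ords.add(17);
// Once all values are in, packing the pending page would reclaim the ~8KB
// pending[] at the cost of making the buffer read-only:
// ords.freeze(); // hypothetical API per this issue
{code}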
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
My feeling is that what we need most is what I've been working on (surprise, surprise :) ) We need a simple Java app, very similar to the std Luke app. We need it to be Apache licensed all the way through. We need it to be fully integrated as a module. We need it to be straightforward enough that any of the Lucene/Solr committers can easily work on it and update it as APIs change. GWT is probably a stretch for that goal - Apache Pivot is pretty straightforward though - for any reasonable Java developer. I picked it up in absolutely no time to build the thing from scratch - modifying it is 10 times easier. The backend code is all java, the layout and widgets all XML.

I've been pushing towards that goal (over the years now) with Luke ALE (Apache Lucene Edition). It's not a straight port of Luke with thinlet to Luke with Apache Pivot - Luke has 90% of its code in one huge class - I've already been working on modularizing that code as I've moved it over - not too heavily because that would have made it difficult to keep porting code, but a good start. Now that the majority of features have been moved over, it's probably easier to keep refactoring - which is needed, because another very important missing piece is unit tests - and good unit tests will require even more refactoring of the code.

I also think a GWT version - something that could probably run nicely with Solr - would be awesome. But way down the line in priority for me. We need something very close to Lucene that the committers will push up the hill as they push Lucene. - Mark

On Jul 15, 2013, at 11:15 AM, Robert Muir rcm...@gmail.com wrote: I disagree with this completely. Solr is last priority

On Jul 15, 2013 6:14 AM, Jack Krupansky j...@basetechnology.com wrote: My personal thoughts/preferences/suggestions for Luke: 1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible. 2. A simple command line interface – always useful. 3. A Solr plugin handler – based on #1. Good for apps as well as Admin UI. Nice to be able to curl a request to look at a specific doc, for example. 4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks. 5. Any additional, unintegrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above. -- Jack Krupansky

From: Dmitry Kan Sent: Monday, July 15, 2013 8:54 AM To: dev@lucene.apache.org Subject: Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module Hello guys, Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate luke into the solr admin as well as have the standalone webapp for non-solr users. There is (was?) a luke stats handler in the solr ui, that printed some stats on the index. That could be substituted with the GWT app. The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing jira with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github's fork) the UI is the original one, developed by Andrzej. Besides the UI rework there are plenty of things to port / verify (like e.g. Hadoop plugin) against the latest lucene versions.
See the readme.md: https://github.com/dmitrykey/luke Whichever way's taken, hopefully we end up having stable releases of luke :) Dmitry Kan On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote: On 7/14/13 5:04 AM, Ajay Bhat wrote: Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in future. I was actually thinking of doing this JIRA as a project by myself with some assistance from the community after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now. It's far from done - he just started the process. What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org mailto:a...@getopt.org wrote: On 7/13/13 8:56 PM, Shawn Heisey wrote: On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question : What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk. The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/ There is a forked project that has
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4145 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4145/ 1 tests failed. REGRESSION: org.apache.lucene.facet.search.TestDrillSideways.testRandom Error Message: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader Stack Trace: java.lang.IllegalStateException: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader at __randomizedtesting.SeedInfo.seed([A42710A7EC312939:D66B35A85D519F4A]:0) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator$1.aggregate(SortedSetDocValuesAccumulator.java:102) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator.accumulate(SortedSetDocValuesAccumulator.java:210) at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:296) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:417) at org.apache.lucene.facet.search.TestDrillSideways.testRandom(TestDrillSideways.java:810) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Hi all, The most pressing issue is that I need a mentor for this project by Wednesday, 17th July 2013 if I'm to do it for the ASF-ICFOSS program [1]. Currently I've not found any mentors. Would anyone here please consent to be a mentor for this project so I can include you in my proposal? For the project I've decided to use Apache Pivot and familiarize myself with it, going through the tutorials ASAP. There are some more questions I have:

1. The original version by Andrzej [2] I have checked out in Eclipse, but I can't run it. It's mainly all in one huge Luke.java file. I just want to check that the UI is the same as that in the sandboxed version in Lucene.

2. There are various plugins that require Luke.java to be imported. But there's also a Shell.java plugin [3] that doesn't need any such import. Does this mean it can be ported directly, or is it kept for future improvement? If it's the latter, I guess the CMD interface suggested by Jack Krupansky could be implemented using this class.

[1] http://community.apache.org/mentoringprogramme-icfoss-pilot.html [2] https://code.google.com/p/luke/ [3] org.getopt.luke.plugins.Shell

On Mon, Jul 15, 2013 at 9:03 PM, Shawn Heisey s...@elyograg.org wrote: On 7/15/2013 9:15 AM, Robert Muir wrote: I disagree with this completely. Solr is last priority I'm on the Solr side of things, with only the tiniest knowledge or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Two more questions: 1. How much of the original Luke.java has yet to be modularised? 2. What are the new APIs in Lucene 4.1 and 4.2 that need immediate attention to be updated?

On Tue, Jul 16, 2013 at 12:15 AM, Ajay Bhat a.ajay.b...@gmail.com wrote: Hi all, The most pressing issue is that I need a mentor for this project by Wednesday, 17th July 2013 if I'm to do it for the ASF-ICFOSS program [1]. Currently I've not found any mentors. Would anyone here please consent to be a mentor for this project so I can include you in my proposal? For the project I've decided to use Apache Pivot and familiarize myself with it, going through the tutorials ASAP. There are some more questions I have: 1. The original version by Andrzej [2] I have checked out in Eclipse, but I can't run it. It's mainly all in one huge Luke.java file. I just want to check that the UI is the same as that in the sandboxed version in Lucene. 2. There are various plugins that require Luke.java to be imported. But there's also a Shell.java plugin [3] that doesn't need any such import. Does this mean it can be ported directly, or is it kept for future improvement? If it's the latter, I guess the CMD interface suggested by Jack Krupansky could be implemented using this class. [1] http://community.apache.org/mentoringprogramme-icfoss-pilot.html [2] https://code.google.com/p/luke/ [3] org.getopt.luke.plugins.Shell On Mon, Jul 15, 2013 at 9:03 PM, Shawn Heisey s...@elyograg.org wrote: On 7/15/2013 9:15 AM, Robert Muir wrote: I disagree with this completely. Solr is last priority I'm on the Solr side of things, with only the tiniest knowledge or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4145 - Still Failing
I'll dig. Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 2:40 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4145/ 1 tests failed. REGRESSION: org.apache.lucene.facet.search.TestDrillSideways.testRandom Error Message: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader Stack Trace: java.lang.IllegalStateException: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader at __randomizedtesting.SeedInfo.seed([A42710A7EC312939:D66B35A85D519F4A]:0) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator$1.aggregate(SortedSetDocValuesAccumulator.java:102) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator.accumulate(SortedSetDocValuesAccumulator.java:210) at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:296) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:417) at org.apache.lucene.facet.search.TestDrillSideways.testRandom(TestDrillSideways.java:810) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708810#comment-13708810 ] George Rhoten commented on LUCENE-5112: --- Calling clearAttributes() at the start of incrementToken() in our custom Tokenizer seems to resolve this issue too. It would be helpful if the purpose of clearAttributes() in incrementToken() for a typical tokenizer were documented more clearly; this part of the API contract is easy to miss. FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
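The contract George alludes to: an attribute producer must call clearAttributes() at the top of incrementToken() before populating the attributes of the next token, otherwise stale values (such as a previous position increment) leak into the new token. A minimal hedged sketch of a custom 4.x Tokenizer (the class and its tokenization logic are made up):

{code}
public final class MyTokenizer extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public MyTokenizer(Reader input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes(); // reset all attributes to their defaults first
    // ... read the next token from 'input' and fill termAtt here ...
    return false; // placeholder: return true while tokens remain
  }
}
{code}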
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708823#comment-13708823 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503423 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503423 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708826#comment-13708826 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503424 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503424 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708827#comment-13708827 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503425 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503425 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708829#comment-13708829 ] Uwe Schindler commented on LUCENE-5112: --- bq. Calling clearAttributes() at the start of incrementToken() in our custom Tokenizer seems to resolve this issue too. This is mandatory, yes. If you don't do this, ugly things can happen. I would suggest that you use BaseTokenStreamTestCase as the base class for your tokenizer/tokenfilter tests. This class is part of the Lucene test framework and will detect such errors. FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
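[Editor's note] A minimal sketch of the testing setup Uwe recommends. BaseTokenStreamTestCase ships in Lucene's test-framework artifact; WhitespaceTokenizer here is only a stand-in for the custom tokenizer under test:
{code}
import java.io.StringReader;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class TestMyTokenizer extends BaseTokenStreamTestCase {
  public void testBasics() throws Exception {
    // Replace WhitespaceTokenizer with the custom tokenizer under test.
    Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_44, new StringReader("foo bar"));
    // Verifies the produced tokens and, via an injected check attribute,
    // that incrementToken() calls clearAttributes() before setting values.
    assertTokenStreamContents(tok, new String[] {"foo", "bar"});
  }
}
{code}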
[jira] [Commented] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter
[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708830#comment-13708830 ] Paul Elschot commented on LUCENE-5101: -- I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() + 1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I would expect FixedBitSet to have almost equal run times for advance(docID() + 1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder), and the code in EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docID() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? make it easier to plugin different bitset implementations to CachingWrapperFilter - Key: LUCENE-5101 URL: https://issues.apache.org/jira/browse/LUCENE-5101 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5101.patch Currently this is possible, but it's not so friendly:
{code}
protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
  if (docIdSet == null) {
    // this is better than returning null, as the nonnull result can be cached
    return EMPTY_DOCIDSET;
  } else if (docIdSet.isCacheable()) {
    return docIdSet;
  } else {
    final DocIdSetIterator it = docIdSet.iterator();
    // null is allowed to be returned by iterator(),
    // in this case we wrap with the sentinel set,
    // which is cacheable.
    if (it == null) {
      return EMPTY_DOCIDSET;
    } else {
      /* INTERESTING PART */
      final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
      bits.or(it);
      return bits;
      /* END INTERESTING PART */
    }
  }
}
{code}
Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and INTERESTING PART calls a simpler method, something like:
{code}
protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
  final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
  bits.or(iterator);
  return bits;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter
[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708830#comment-13708830 ] Paul Elschot edited comment on LUCENE-5101 at 7/15/13 7:04 PM: --- I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() + 1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I would expect that FixedBitSet would have almost equal run times for advance(docID() + 1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder), and the code in EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docID() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? was (Author: paul.elsc...@xs4all.nl): I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() +1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I'd the FixedBitSet should have an almost equal run times for advance(docId()+1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder) and the code at EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docId() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? make it easier to plugin different bitset implementations to CachingWrapperFilter - Key: LUCENE-5101 URL: https://issues.apache.org/jira/browse/LUCENE-5101 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5101.patch Currently this is possible, but it's not so friendly:
{code}
protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
  if (docIdSet == null) {
    // this is better than returning null, as the nonnull result can be cached
    return EMPTY_DOCIDSET;
  } else if (docIdSet.isCacheable()) {
    return docIdSet;
  } else {
    final DocIdSetIterator it = docIdSet.iterator();
    // null is allowed to be returned by iterator(),
    // in this case we wrap with the sentinel set,
    // which is cacheable.
    if (it == null) {
      return EMPTY_DOCIDSET;
    } else {
      /* INTERESTING PART */
      final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
      bits.or(it);
      return bits;
      /* END INTERESTING PART */
    }
  }
}
{code}
Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and INTERESTING PART calls a simpler method, something like:
{code}
protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
  final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
  bits.or(iterator);
  return bits;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
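[Editor's note] To make the comparison concrete, here is a rough sketch of the two traversal patterns being measured; this is not the project's actual benchmark harness, and 'disi' stands for any DocIdSetIterator (e.g. one obtained from an EliasFanoDocIdSet or a FixedBitSet):
{code}
// Walk every doc with nextDoc().
static long walkNextDoc(DocIdSetIterator disi) throws IOException {
  long sum = 0;
  for (int doc = disi.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = disi.nextDoc()) {
    sum += doc;  // consume the doc id so the JIT cannot drop the loop
  }
  return sum;
}

// Walk the same docs with advance(docID() + 1); this visits the identical
// sequence, so any timing difference is in the advance implementation.
static long walkAdvance(DocIdSetIterator disi) throws IOException {
  long sum = 0;
  int doc = disi.advance(0);  // initial docID() is -1, so advance(0) is legal
  while (doc != DocIdSetIterator.NO_MORE_DOCS) {
    sum += doc;
    doc = disi.advance(doc + 1);
  }
  return sum;
}
{code}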
[jira] [Closed] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-5112. - FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten Assignee: Uwe Schindler The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708856#comment-13708856 ] Mikhail Khludnev commented on SOLR-3076: [~ysee...@gmail.com] it's a ginger cake Solr(Cloud) should support block joins -- Key: SOLR-3076 URL: https://issues.apache.org/jira/browse/SOLR-3076 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Yonik Seeley Fix For: 5.0, 4.4 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch Lucene has the ability to do block joins; we should add it to Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
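[Editor's note] As background for the feature request, a hedged sketch of the Lucene-level block join that this issue wants surfaced in Solr (4.x-era join module; 'writer', 'searcher', and the child/parent Documents are assumed to exist, and the parent-filter wiring shifted between releases, so treat it as approximate):
{code}
// Index parent + children as one contiguous block; the parent comes last.
List<Document> block = new ArrayList<Document>();
block.add(childDoc1);   // e.g. a SKU
block.add(childDoc2);
block.add(parentDoc);   // e.g. the product, marked with type=parent
writer.addDocuments(block);

// Search children, then join the matches up to their parents.
Filter parentsFilter = new CachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
Query childQuery = new TermQuery(new Term("color", "red"));
Query joined = new ToParentBlockJoinQuery(childQuery, parentsFilter, ScoreMode.None);
TopDocs parents = searcher.search(joined, 10);
{code}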
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.7.0_25) - Build # 3037 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3037/ Java: 32bit/jdk1.7.0_25 -client -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:545) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:131) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.7.0_25) - Build # 3037 - Failure!
I'll fix. Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 3:39 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3037/ Java: 32bit/jdk1.7.0_25 -client -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:545) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:131) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_25) - Build # 2988 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/2988/ Java: 64bit/jdk1.7.0_25 -XX:-UseCompressedOops -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=55, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[B1D21B594838883B], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=55, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[B1D21B594838883B], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:546) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:125) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708899#comment-13708899 ] Mikhail Khludnev commented on SOLR-4894: [~jkrupan] I'm aiming at something different from modeling Java types. What we have now are dynamic fields like cloth_COLOR, shoe_COLOR, wristlet_COLOR. I'd prefer not to bother with dynamic field wildcards, but just send: {wristlet:red, type:COLOR}, {shoe:brown, type:COLOR}, etc. Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields -- Key: SOLR-4894 URL: https://issues.apache.org/jira/browse/SOLR-4894 Project: Solr Issue Type: New Feature Components: update Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4894.patch Previous {{ParseFooUpdateProcessorFactory}}-s (see SOLR-4892) in the same chain will detect, parse and convert unknown fields’ {{String}}-typed values to the appropriate Java object type. This factory will take as configuration a set of mappings from Java object type to schema field type. {{ManagedIndexSchema.addFields()}} adds new fields to the schema. If schema addition fails for any field, addition is re-attempted only for those that don’t match any schema field. This process is repeated, either until all new fields are successfully added, or until there are no new fields (because the fields that were new when this update chain started its work were subsequently added by a different update request, possibly on a different node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
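[Editor's note] The retry behavior described at the end of the issue summary can be pictured with a small illustrative loop. This is a sketch only, not Solr's actual code: it assumes addFields() returns the updated schema and that getFieldOrNull() reports whether a field now exists.
{code}
// newFields: fields detected as unknown in the incoming document.
static IndexSchema addUnknownFields(IndexSchema schema, List<SchemaField> newFields) {
  while (!newFields.isEmpty()) {
    try {
      schema = schema.addFields(newFields);  // assumed to return the updated schema
      break;                                 // all fields added
    } catch (Exception e) {
      // A concurrent update request (possibly on another node) may have added
      // some of these fields already; drop those and retry with the rest.
      Iterator<SchemaField> it = newFields.iterator();
      while (it.hasNext()) {
        if (schema.getFieldOrNull(it.next().getName()) != null) {
          it.remove();
        }
      }
    }
  }
  return schema;
}
{code}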
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 315 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/315/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([66AAD5F04D9BE4A2]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708937#comment-13708937 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503459 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503459 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708939#comment-13708939 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503460 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503460 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708934#comment-13708934 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503458 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503458 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
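[Editor's note] For context, a rough usage sketch of the new suggester. This follows the 4.x-era API loosely; the constructor and lookup signatures changed across releases, and 'indexDir' (a File) and 'inputs' (an iterator of weighted suggestions, e.g. movie titles by popularity) are assumed to exist:
{code}
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
AnalyzingInfixSuggester suggester =
    new AnalyzingInfixSuggester(Version.LUCENE_44, indexDir, analyzer);
suggester.build(inputs);  // builds the underlying Lucene index of suggestions

// Infix lookup: "apes" can match inside "Planet of the Apes", with the
// matched tokens highlighted in the returned keys.
List<Lookup.LookupResult> results = suggester.lookup("apes", 10, true, true);
suggester.close();
{code}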
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3633) web UI reports an error if CoreAdminHandler says there are no SolrCores
[ https://issues.apache.org/jira/browse/SOLR-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3633: Attachment: SOLR-3633.patch web UI reports an error if CoreAdminHandler says there are no SolrCores --- Key: SOLR-3633 URL: https://issues.apache.org/jira/browse/SOLR-3633 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0-ALPHA Reporter: Hoss Man Assignee: Stefan Matheis (steffkes) Fix For: 4.4 Attachments: SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch Spun off from SOLR-3591... * having no SolrCores is a valid situation * independent of what may happen in SOLR-3591, the web UI should cleanly deal with there being no SolrCores, and just hide/grey out any tabs that can't be supported w/o at least one core * even if there are no SolrCores the core admin features (ie: creating a new core) should be accessible in the UI -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
I *think* something like this is needed to fix Maven's POMs? Can someone with more Maven experience test this? AnalyzingInfixSuggester added deps to lucene/suggest on misc and analyzers-common, but analyzers-common already seems to be in the POM:

Index: dev-tools/maven/lucene/suggest/pom.xml.template
===================================================================
--- dev-tools/maven/lucene/suggest/pom.xml.template (revision 1503469)
+++ dev-tools/maven/lucene/suggest/pom.xml.template (working copy)
@@ -59,6 +59,11 @@
       <artifactId>lucene-analyzers-common</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>${project.groupId}</groupId>
+      <artifactId>lucene-misc</artifactId>
+      <version>${project.version}</version>
+    </dependency>
   </dependencies>
   <build>
     <sourceDirectory>${module-path}/src/java</sourceDirectory>

Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 4:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
Mike, That's the right thing for Maven config - not sure why analyzers-common was in there already. I'm running the Maven build now with this fix to make sure. (By the way, the IntelliJ config needs these two deps added as well - I'll take care of it.) Steve On Jul 15, 2013, at 4:59 PM, Michael McCandless luc...@mikemccandless.com wrote: I *think* something like this is needed to fix Maven's POMs? Can someone with more Maven experience test this? AnalyzingInfixSuggester added deps to lucene/suggest on misc and analyzers-common, but analyzers-common already seems to be in the POM:

Index: dev-tools/maven/lucene/suggest/pom.xml.template
===================================================================
--- dev-tools/maven/lucene/suggest/pom.xml.template (revision 1503469)
+++ dev-tools/maven/lucene/suggest/pom.xml.template (working copy)
@@ -59,6 +59,11 @@
       <artifactId>lucene-analyzers-common</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>${project.groupId}</groupId>
+      <artifactId>lucene-misc</artifactId>
+      <version>${project.version}</version>
+    </dependency>
   </dependencies>
   <build>
     <sourceDirectory>${module-path}/src/java</sourceDirectory>

Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 4:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708989#comment-13708989 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503477 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503477 ] LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476) Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708987#comment-13708987 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503476 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1503476 ] LUCENE-4845: Maven and IntelliJ config Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708992#comment-13708992 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503478 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503478 ] LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476) Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org