[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1412 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1412/

2 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
        at __randomizedtesting.SeedInfo.seed([9EFF148BF67261FD]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=2766, name=recoveryCmdExecutor-1248-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
[jira] [Updated] (LUCENE-5098) Broadword bit selection
[ https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5098:
    Fix Version/s: 4.5

Broadword bit selection
    Key: LUCENE-5098
    URL: https://issues.apache.org/jira/browse/LUCENE-5098
    Project: Lucene - Core
    Issue Type: Improvement
    Components: core/other
    Reporter: Paul Elschot
    Assignee: Adrien Grand
    Priority: Minor
    Fix For: 4.5
    Attachments: LUCENE-5098.patch, LUCENE-5098.patch
[jira] [Resolved] (LUCENE-5098) Broadword bit selection
[ https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-5098.
    Resolution: Fixed
[jira] [Created] (LUCENE-5111) Fix WordDelimiterFilter
Adrien Grand created LUCENE-5111:

    Summary: Fix WordDelimiterFilter
    Key: LUCENE-5111
    URL: https://issues.apache.org/jira/browse/LUCENE-5111
    Project: Lucene - Core
    Issue Type: Bug
    Reporter: Adrien Grand
    Assignee: Adrien Grand

WordDelimiterFilter is documented as broken in TestRandomChains (LUCENE-4641). Given how widely used it is, we should try to fix it.
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708301#comment-13708301 ]

ASF subversion and git services commented on SOLR-4997:

Commit 1503130 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503130 ]

SOLR-4997: The splitshard api doesn't call commit on new sub shards before switching shard states. Multiple bugs related to sub shard recovery and replication are also fixed.

The splitshard api doesn't call commit on new sub shards
    Key: SOLR-4997
    URL: https://issues.apache.org/jira/browse/SOLR-4997
    Project: Solr
    Issue Type: Bug
    Components: SolrCloud
    Affects Versions: 4.3, 4.3.1
    Reporter: Shalin Shekhar Mangar
    Assignee: Shalin Shekhar Mangar
    Fix For: 4.4
    Attachments: SOLR-4997.patch, SOLR-4997.patch

The splitshard api doesn't call commit on new sub shards, but it happily sets them to active state, which means that after a successful split the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that.
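For anyone on 4.3.x before this fix lands, the issue text above implies a workaround: issue an explicit commit after the split. A hedged SolrJ sketch, not tested code; the host, collection, and shard names are illustrative:

{noformat}
// Workaround sketch for Solr 4.3.x (pre-fix): after SPLITSHARD, commit
// explicitly so documents in the new sub-shards become visible to searchers.
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SplitShardWorkaround {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "SPLITSHARD");
    params.set("collection", "collection1"); // illustrative names
    params.set("shard", "shard1");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    new HttpSolrServer("http://localhost:8983/solr").request(req);

    // The explicit commit that, per the issue, the split does not perform:
    new HttpSolrServer("http://localhost:8983/solr/collection1").commit();
  }
}
{noformat}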
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708302#comment-13708302 ]

ASF subversion and git services commented on SOLR-4997:

Commit 1503131 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503131 ]

SOLR-4997: Call commit before checking shard consistency
[jira] [Created] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
George Rhoten created LUCENE-5112:

    Summary: FilteringTokenFilter is double incrementing the position increment in incrementToken
    Key: LUCENE-5112
    URL: https://issues.apache.org/jira/browse/LUCENE-5112
    Project: Lucene - Core
    Issue Type: Bug
    Components: modules/analysis
    Affects Versions: 4.0
    Reporter: George Rhoten

The following code from FilteringTokenFilter#incrementToken() seems wrong.

{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}

The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows.
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708324#comment-13708324 ]

George Rhoten commented on LUCENE-5112:

The workaround seems to be to always use setEnablePositionIncrements(false) on any stop filter being used.
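As code, the workaround George describes might look like this (Lucene 4.x API; the tokenizer and stop set are arbitrary choices for the example):

{noformat}
import java.io.Reader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Illustration only: build a stop-filtered stream with position increments off,
// so removed stop words no longer leave position gaps behind.
TokenStream stopFiltered(Reader reader) {
  TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_40, reader);
  StopFilter stop = new StopFilter(Version.LUCENE_40, ts, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
  stop.setEnablePositionIncrements(false);
  return stop;
}
{noformat}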
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708334#comment-13708334 ]

George Rhoten commented on LUCENE-5112:

For reference, this issue causes this exception:

{noformat}
java.lang.IllegalArgumentException: position overflow for field 'labels'
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:135)
    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:307)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:244)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:373)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1124)
{noformat}
FunctionQuery result field in SearchComponent code ?
Hi,

I have written a custom Solr 4.3.0 SearchComponent whose purpose is to sum the result of a FunctionQuery (termfreq) for some term over each doc and then embed the result in the final output.

This is my query:

http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

This is a sample result doc in the browser:

<doc>
  <str name="id">11</str>
  <str name="type">Video Games</str>
  <str name="format">xbox 360</str>
  <str name="product">The Amazing Spider-Man</str>
  <int name="popularity">11</int>
  <long name="_version_">1439994081345273856</long>
  <int name="freq">1</int>
</doc>

Here is my code from the SearchComponent:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;
for (int i = 0; i < docs.size(); i++) {
  try {
    int docId = iterator.nextDoc();
    // Document doc = searcher.doc(docId, fieldSet);
    Document doc = searcher.doc(docId);

In the 'doc' object I can see the schema fields like 'id', 'type', 'format' etc., but I cannot find the field 'freq' which I need. Is there any way to get the FunctionQuery fields in the doc object?

Thanks,
Tony
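A possible direction, sketched with heavy assumptions and untested: pseudo-fields requested via fl (like freq:termfreq(product,'spider')) are computed while the response is written and never stored on the Document, so a component has to evaluate the function itself, e.g. via a ValueSource (Lucene/Solr 4.3 APIs):

{noformat}
// Hedged sketch: evaluate termfreq(product,'spider') per collected doc
// inside a SearchComponent, instead of looking for a "freq" stored field.
import java.util.List;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.TermFreqValueSource;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

// inside process(ResponseBuilder rb):
DocList docs = rb.getResults().docList;
SolrIndexSearcher searcher = rb.req.getSearcher();
ValueSource vs = new TermFreqValueSource("product", "spider", "product", new BytesRef("spider"));
Map context = ValueSource.newContext(searcher);
vs.createWeight(context, searcher);
List<AtomicReaderContext> leaves = searcher.getTopReaderContext().leaves();
int sumFreq = 0;
for (DocIterator it = docs.iterator(); it.hasNext();) {
  int docId = it.nextDoc();                       // global (top-level) doc id
  AtomicReaderContext leaf = leaves.get(ReaderUtil.subIndex(docId, leaves));
  FunctionValues values = vs.getValues(context, leaf);
  sumFreq += values.intVal(docId - leaf.docBase); // per-segment doc id
}
rb.rsp.add("sumFreq", sumFreq);
{noformat}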
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708388#comment-13708388 ]

Michael McCandless commented on LUCENE-5112:

I think the code is correct: we accumulate the posInc of all tokens that were not accepted, plus the final posInc of the token that was accepted. I don't see how this leads to integer overflows when a StopFilter is used ... can you make a contained test showing that?
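For what it's worth, a self-contained sketch along the lines Mike asks for (Lucene 4.x API; the version constant and analysis chain are arbitrary choices). For "the the the cat" with "the" as a stop word, "cat" should come out with posInc == 4 (its own 1 plus 3 skipped positions), i.e. accumulation rather than doubling:

{noformat}
import java.io.StringReader;
import java.util.Arrays;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class PosIncDemo {
  public static void main(String[] args) throws Exception {
    CharArraySet stops = new CharArraySet(Version.LUCENE_44, Arrays.asList("the"), true);
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_44, new StringReader("the the the cat"));
    ts = new StopFilter(Version.LUCENE_44, ts, stops);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // expected output: "cat posInc=4"
      System.out.println(term + " posInc=" + posInc.getPositionIncrement());
    }
    ts.end();
    ts.close();
  }
}
{noformat}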
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4144 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4144/

2 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
        at __randomizedtesting.SeedInfo.seed([6C3491EDDF8067DC]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=2530, name=recoveryCmdExecutor-829-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
Re: Lookback and/or time-aware Merge Policy?
Lookback is a good idea: you could at least gather statistics and assess, later, whether good merges had been selected, and maybe play what-if games to explore whether different merge selections would have resulted in less copying.

A time-based MergeScheduler would make sense: e.g., it would allow small merges to run any time, but big ones must wait until after hours. Also, RateLimitedDirectoryWrapper can be used to limit the IO impact of ongoing merges. It's like a naive ionice, for merging.

Mike McCandless
http://blog.mikemccandless.com

On Mon, Jul 8, 2013 at 10:41 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

I was (re-re-re-re)-reading Mike's post about Lucene segment merges: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? :) What if some sort of stats were kept about which segments were picked for merges? With some sort of stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?

Also, what about time of day and query rates? Very often search traffic follows the wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.

Thoughts?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
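As a sketch of the time-based idea: a scheduler that stalls big merges during peak hours could look roughly like this, assuming Lucene 4.x's ConcurrentMergeScheduler. This is not an existing Lucene class, and the size threshold and peak window are invented for illustration:

{noformat}
import java.io.IOException;
import java.util.Calendar;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.MergePolicy;

public class AfterHoursMergeScheduler extends ConcurrentMergeScheduler {
  private static final long BIG_MERGE_BYTES = 1L << 30; // 1 GB; invented threshold

  @Override
  protected void doMerge(MergePolicy.OneMerge merge) throws IOException {
    try {
      // Stall merges above the threshold while inside peak hours;
      // small merges proceed immediately on their merge thread.
      while (merge.totalBytesSize() > BIG_MERGE_BYTES && isPeakHours()) {
        Thread.sleep(60 * 1000L);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    }
    super.doMerge(merge);
  }

  private static boolean isPeakHours() {
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    return hour >= 8 && hour < 20; // invented peak window: 08:00-20:00
  }
}
{noformat}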
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708428#comment-13708428 ]

Michael McCandless commented on LUCENE-3069:

{quote}
Another thing that surprised me is, with the same code/conf, luceneutil creates different sizes of index? I tested that df==0 trick several times on wikimedium1m, the index size varies from 514M~522M... Will multi-threading affects much here?
{quote}

Using threads means the docs are assigned to different segments each time you run ... it's interesting this can cause such variance in the index size though. It is known that e.g. sorting docs by web site (if you are indexing content from different sites) can give good compression; maybe that's the effect we're seeing here?

Lucene should have an entirely memory resident term dictionary
    Key: LUCENE-3069
    URL: https://issues.apache.org/jira/browse/LUCENE-3069
    Project: Lucene - Core
    Issue Type: Improvement
    Components: core/index, core/search
    Affects Versions: 4.0-ALPHA
    Reporter: Simon Willnauer
    Assignee: Han Jiang
    Labels: gsoc2013
    Fix For: 4.4
    Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch

FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta.
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Hello guys,

Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate luke into the solr admin as well as have the standalone webapp for non-solr users. There is (was?) a luke stats handler in the solr ui that printed some stats on the index. That could be substituted with the GWT app.

The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing jira with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github fork) the UI is the original one, developed by Andrzej. Beside the UI rework there are plenty of things to port / verify (like e.g. the Hadoop plugin) against the latest lucene versions. See the readme.md: https://github.com/dmitrykey/luke

Whichever way's taken, hopefully we end up having stable releases of luke :)

Dmitry Kan

On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote:

On 7/14/13 5:04 AM, Ajay Bhat wrote:

Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in future. I was actually thinking of doing this JIRA as a project by myself with some assistance from the community after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now.

It's far from done - he just started the process.

What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay

On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org wrote:

On 7/13/13 8:56 PM, Shawn Heisey wrote:

On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question: What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk.

The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/

There is a forked project that has produced Luke for newer Lucene versions. https://java.net/projects/opengrok/downloads

I can't seem to locate any information about how they have licensed the newer versions, and I'm not really sure where the source code is living. Regarding a question you asked earlier, Luke is a standalone program. It does include Lucene classes in the lukeall version of the executable jar. Luke may have some uses as a library, but I think that most people run it separately. There is partial Luke functionality embedded in the Solr admin UI, but I don't know whether that is something cooked up by Solr devs or if it shares actual code with Luke.

Ajay, Luke is a standalone GUI application, not a library. It uses a custom version of the Thinlet GUI toolkit, which is no longer maintained, and it's LGPL licensed, so Luke can't be contributed to the Lucene project as is. Recently several people expressed interest in porting Luke to some other GUI toolkit that is Apache-friendly. See the discussion here: http://groups.google.com/d/msg/luke-discuss/S_Whwg2jwmA/9JgqKIe5aiwJ

In particular, there's a fork by Dmitry Kan - he plans to integrate other patches and forks, and to port Luke from Thinlet to GWT and sync it with the latest version of Lucene. I think you should coordinate your efforts with him and other contributors that work on that code base. This fork is Apache-licensed and the long-term plan is to contribute it back to Lucene once the porting is done. The Pivot-based port of Luke that is in the Lucene sandbox is in an early stage. I'm not sure Mark Miller has time to work on it due to his involvement in SolrCloud development. The Luke handler in Solr is a completely different code base, and it shares only the name with the Luke application.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
My personal thoughts/preferences/suggestions for Luke:

1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible.
2. A simple command line interface – always useful.
3. A Solr plugin handler – based on #1. Good for apps as well as Admin UI. Nice to be able to curl a request to look at a specific doc, for example.
4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks.
5. Any additional, un-integrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above.

-- Jack Krupansky
[jira] [Assigned] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-5040:
    Assignee: Noble Paul

SnapShooter doesn't create a lock as it runs
    Key: SOLR-5040
    URL: https://issues.apache.org/jira/browse/SOLR-5040
    Project: Solr
    Issue Type: Bug
    Components: replication (java)
    Reporter: Mark Triggs
    Assignee: Noble Paul
    Priority: Trivial
    Attachments: snapshooter-locking.diff

Hi there,

While messing around with the replication handler recently, I noticed that the snapshooter didn't seem to be writing a lock file. I had a look at the SnapShooter.java code, and to my untrained eye it seemed like it was creating a Lock object but never actually taking a lock. I modified my local installation to use lock.obtain() instead of lock.isLocked() and verified that I'm now seeing lock files. I've attached a very small patch just in case this is a genuine bug.

Cheers,
Mark
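As a minimal sketch of the distinction the patch appears to rely on, against Lucene's Lock API: obtain() actually acquires the lock, while isLocked() merely observes it, so two concurrent snapshots could both pass an isLocked() check. Class, method, and file names below are illustrative, not SnapShooter's actual code:

{noformat}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.SimpleFSLockFactory;

public class SnapshotLockDemo {
  // Hypothetical method; mirrors the shape of the fix described above.
  static void takeSnapshot(File snapDir) throws IOException {
    Lock lock = new SimpleFSLockFactory(snapDir).makeLock("snapshoot.lock");
    if (!lock.obtain()) {      // acquires the lock; isLocked() would only check it
      throw new IOException("snapshot already in progress in " + snapDir);
    }
    try {
      // ... copy index files into snapDir ...
    } finally {
      lock.release();          // let the next snapshot run
    }
  }
}
{noformat}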
[jira] [Commented] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708450#comment-13708450 ]

Noble Paul commented on SOLR-5040:

Multiple snapshots running in parallel should be just fine. They are just going to be created with different file names. But I don't think the snapshooter is smart enough to check if there is a copy of the index with the same indexversion.

The snapshoot process itself is async. There should be a way to poll and get the status of an ongoing snapshoot (if any).
[jira] [Commented] (SOLR-5040) SnapShooter doesn't create a lock as it runs
[ https://issues.apache.org/jira/browse/SOLR-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708451#comment-13708451 ]

Mark Miller commented on SOLR-5040:

bq. There should be a way to poll and get the status of an ongoing snapshoot

I think that's a fine feature, but less useful than offering the option to have the call wait to return until it's done.
[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708459#comment-13708459 ]

Koji Sekiguchi commented on SOLR-3359:

When I opened the ticket, I thought SynonymFilterFactory should accept (Solr's) fieldType attribute, as I said in the title. But today, as SynonymFilterFactory is in Lucene land, I think an analyzer attribute is more natural than (Solr's) fieldType attribute. I'd like to commit the patch in a few days if no one objects.

SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
    Key: SOLR-3359
    URL: https://issues.apache.org/jira/browse/SOLR-3359
    Project: Solr
    Issue Type: Improvement
    Components: Schema and Analysis
    Reporter: Koji Sekiguchi
    Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch

I hadn't realized that CJKTokenizer and its factory classes were marked deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone talked to me.

{code}
* @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
{code}

I agree with the idea of using the chain of the Tokenizer and TokenFilters instead of CJKTokenizer, but it could be a problem for the existing users of SynonymFilterFactory with CJKTokenizerFactory. So this ticket comes to my mind again.
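For illustration, the proposed attribute might be used from Java roughly like this, assuming the patch is applied and the 4.4-style Map-based factory construction; the "analyzer" key is the proposal under discussion here, not committed API, and the other values are made up:

{noformat}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;

public class SynonymAnalyzerAttrDemo {
  public static void main(String[] args) {
    Map<String, String> params = new HashMap<String, String>();
    params.put("luceneMatchVersion", "4.4");
    params.put("synonyms", "synonyms.txt");
    // Proposed attribute (would replace tokenizerFactory for parsing synonym rules):
    params.put("analyzer", "org.apache.lucene.analysis.cjk.CJKAnalyzer");
    SynonymFilterFactory factory = new SynonymFilterFactory(params);
    // factory.inform(resourceLoader) would still be needed to load synonyms.txt
  }
}
{noformat}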
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708465#comment-13708465 ]

Michael McCandless commented on LUCENE-3069:

The new code on the branch looks great! I can't wait to see perf results after we implement .intersect().

Some small stuff in TempFSTTermsReader.java:

* In next(), when we handle seekPending=true, I think we should assert that the seekCeil returned SeekStatus.FOUND? Ie, it's not possible to seekExact(TermState) to a term that doesn't exist.
* useCache is an ancient option from back when we had a terms dict cache; we long ago removed it ... I think we should remove the useCache parameter too?
* It's silly that fstEnum.seekCeil doesn't return a status, ie that we must re-compare the term we got to differentiate FOUND vs NOT_FOUND ... so we lose some perf here. But this is just a future TODO ...
* nocommit: this method doesn't act as 'seekExact' right? -- not sure why this is here; seekExact is working as it should I think.
* Maybe instead of term and meta members, we could just hold the current pair?

In TempTermOutputs.java:

* longsSize, hasPos can be final? (Same with TempMetaData's fields)
* TempMetaData.hashCode() doesn't mix in docFreq/tTF?
* It doesn't impl equals (must it really impl hashCode?)
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708479#comment-13708479 ]

Michael McCandless commented on LUCENE-4845:

bq. I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?

I agree ... but this can be done separately.

Add AnalyzingInfixSuggester
    Key: LUCENE-4845
    URL: https://issues.apache.org/jira/browse/LUCENE-4845
    Project: Lucene - Core
    Issue Type: Improvement
    Components: modules/spellchecker
    Reporter: Michael McCandless
    Assignee: Michael McCandless
    Fix For: 5.0, 4.4
    Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch

Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g., Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), e.g. Netflix knows the popularity of movies.
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708484#comment-13708484 ]

Shai Erera commented on LUCENE-4845:

Mike, will you still commit it to 4.4? I think that the branch was created prematurely, as there's still no resolution on whether to release or not. And this feature is isolated enough that it shouldn't cause any instability ... it'd be a pity to have to wait another 3-4 months to release it just because of technicalities...
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ]

Han Jiang commented on LUCENE-3069:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ]

Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:20 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two nodes can be 'merged'.

was (Author: billy):

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually operate a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but wasn't sure: can the callee always make sure that, when 'term()' is called, it will always return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is it safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:20 PM: bq. I think we should assert that the seekCeil returned SeekStatus.FOUND? Ok! I'll commit that. bq. useCache is an ancient option from back when we had a terms dict cache Yes, I suppose is is not 'clear' to have this parameter. bq. seekExact is working as it should I think. Currently, I think those 'seek' methods are supposed to change the enum pointer based on input term string, and fetch related metadata from term dict. However, seekExact(BytesRef, TermsState) simply 'copy' the value of termState to enum, which doesn't actually operate 'seek' on dictionary. bq. Maybe instead of term and meta members, we could just hold the current pair? Oh, yes, I once thought about this, but not sure: like, can the callee always makes sure that, when 'term()' is called, it will always return a valid term? The codes in MemoryPF just return 'pair.output' regardless whether pair==null, is it safe? bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF? Oops! thanks, nice catch! bq. It doesn't impl equals (must it really impl hashCode?) Hmm, do we need equals? Also, NodeHash relys on hashCode to judge whether two fst nodes can be 'merged'. was (Author: billy): bq. I think we should assert that the seekCeil returned SeekStatus.FOUND? Ok! I'll commit that. bq. useCache is an ancient option from back when we had a terms dict cache Yes, I suppose is is not 'clear' to have this parameter. bq. seekExact is working as it should I think. Currently, I think those 'seek' methods are supposed to change the enum pointer based on input term string, and fetch related metadata from term dict. However, seekExact(BytesRef, TermsState) simply 'copy' the value of termState to enum, which doesn't actually operate 'seek' on dictionary. bq. Maybe instead of term and meta members, we could just hold the current pair? Oh, yes, I once thought about this, but not sure: like, can the callee always makes sure that, when 'term()' is called, it will always return a valid term? The codes in MemoryPF just return 'pair.output' regardless whether pair==null, is it safe? bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF? Oops! thanks, nice catch! bq. It doesn't impl equals (must it really impl hashCode?) Hmm, do we need equals? Also, NodeHash relys on hashCode to judge whether to nodes can be 'merged'. Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. 
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:35 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch the related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually perform a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always make sure that, when 'term()' is called, it will return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is that safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

-Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two FST nodes can be 'merged'.- Oops, I forgot it still relies on equals to make sure two instances really match; ok, I'll add that.
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708515#comment-13708515 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503275 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1503275 ] SOLR-4894: fix error message

Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
--
Key: SOLR-4894
URL: https://issues.apache.org/jira/browse/SOLR-4894
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
Fix For: 5.0, 4.4
Attachments: SOLR-4894.patch

Previous {{ParseFooUpdateProcessorFactory}}-s (see SOLR-4892) in the same chain will detect, parse and convert unknown fields’ {{String}}-typed values to the appropriate Java object type. This factory will take as configuration a set of mappings from Java object type to schema field type. {{ManagedIndexSchema.addFields()}} adds new fields to the schema. If schema addition fails for any field, addition is re-attempted only for those that don’t match any schema field. This process is repeated, either until all new fields are successfully added, or until there are no new fields (because the fields that were new when this update chain started its work were subsequently added by a different update request, possibly on a different node).
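The retry behavior in that last paragraph is the subtle part: on a failed addition, only the fields that still don't match any schema field are retried. A minimal sketch of that loop follows; detectUnknownFields(), toSchemaFields(), and fieldExistsInSchema() are hypothetical helpers invented for illustration, and this is not the committed implementation.

{noformat}
import java.util.ArrayList;
import java.util.List;

// Sketch of the SOLR-4894 retry loop (illustrative, not the committed code).
// On failure, re-attempt only the fields that are still unknown; stop once
// every field is in the schema, whether added here or by a concurrent
// update request, possibly on a different node.
List<String> newFields = detectUnknownFields(doc);        // hypothetical helper
while (!newFields.isEmpty()) {
  try {
    schema = schema.addFields(toSchemaFields(newFields)); // may fail on conflict
    break;
  } catch (Exception e) {
    List<String> stillMissing = new ArrayList<String>();
    for (String name : newFields) {
      if (!fieldExistsInSchema(schema, name)) {           // hypothetical helper
        stillMissing.add(name);
      }
    }
    newFields = stillMissing;                             // shrinks every round
  }
}
{noformat}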
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708516#comment-13708516 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503277 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503277 ] SOLR-4894: fix error message (merged trunk r1503275)
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708518#comment-13708518 ] Steve Rowe commented on SOLR-4894: --

bq. Found a copy/paste exception error

Thanks Jack, you're right, committed fix to trunk, branch_4x and lucene_solr_4_4 branches.
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708517#comment-13708517 ] ASF subversion and git services commented on SOLR-4894: --- Commit 1503278 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503278 ] SOLR-4894: fix error message (merged trunk r1503275)
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 321 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/321/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) at __randomizedtesting.SeedInfo.seed([B5C6D59CDB4012CB]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=554, name=recoveryCmdExecutor-105-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
I disagree with this completely. Solr is last priority.

On Jul 15, 2013 6:14 AM, Jack Krupansky j...@basetechnology.com wrote:

My personal thoughts/preferences/suggestions for Luke:
1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible.
2. A simple command line interface – always useful.
3. A Solr plugin handler – based on #1. Good for apps as well as the Admin UI. Nice to be able to curl a request to look at a specific doc, for example.
4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks.
5. Any additional, un-integrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above.
-- Jack Krupansky

*From:* Dmitry Kan dmitry.luc...@gmail.com
*Sent:* Monday, July 15, 2013 8:54 AM
*To:* dev@lucene.apache.org
*Subject:* Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module

Hello guys,

Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate Luke into the Solr admin as well as have a standalone webapp for non-Solr users. There is (was?) a Luke stats handler in the Solr UI that printed some stats on the index. That could be substituted with the GWT app. The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing JIRA with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github fork) the UI is the original one, developed by Andrzej. Besides the UI rework there are plenty of things to port / verify (e.g. the Hadoop plugin) against the latest Lucene versions. See the readme.md: https://github.com/dmitrykey/luke

Whichever way is taken, hopefully we end up having stable releases of Luke :)

Dmitry Kan

On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote: On 7/14/13 5:04 AM, Ajay Bhat wrote: Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in the future. I was actually thinking of doing this JIRA as a project by myself, with some assistance from the community, after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now.

It's far from done - he just started the process.

What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay

On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org wrote: On 7/13/13 8:56 PM, Shawn Heisey wrote: On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question: What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk.

The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/ There is a forked project that has produced Luke for newer Lucene versions: https://java.net/projects/opengrok/downloads I can't seem to locate any information about how they have licensed the newer versions, and I'm not really sure where the source code is living.
Regarding a question you asked earlier, Luke is a standalone program. It does include Lucene classes in the lukeall version of the executable jar. Luke may have some uses as a library, but I think that most people run it separately. There is partial Luke functionality embedded in the Solr admin UI, but I don't know whether that is something cooked up by Solr devs or if it shares actual code with Luke.

Ajay, Luke is a standalone GUI application, not a library. It uses a custom version of the Thinlet GUI toolkit, which is no longer maintained, and it's LGPL licensed, so Luke can't be contributed to the Lucene project as is. Recently several people expressed interest in porting Luke to some other GUI toolkit that is Apache-friendly. See the discussion here: http://groups.google.com/d/msg/luke-discuss/S_Whwg2jwmA/9JgqKIe5aiwJ In
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
On 7/15/2013 9:15 AM, Robert Muir wrote:
I disagree with this completely. Solr is last priority

I'm on the Solr side of things, with only the tiniest knowledge of or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have Luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works.

Thanks,
Shawn
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708553#comment-13708553 ] David Smiley commented on SOLR-5039:

Erick, I am looking at CHANGES.txt on trunk and see you added this as a bug fix under 4.3.1. This issue shows it's fixed in 4.4. Which is it?

Admin UI displays -1 for term count in multiValued fields
-
Key: SOLR-5039
URL: https://issues.apache.org/jira/browse/SOLR-5039
Project: Solr
Issue Type: Bug
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 5.0, 4.4
Attachments: SOLR-5039.patch

I thought this had been a JIRA before, but I couldn't find it. The problem is that LukeRequestHandler.getDetailedFieldInfo gets the count via this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1, at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or look at the two-line change and see if it makes sense?
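For readers wondering what "just counts things up" amounts to: Terms.size() is allowed to return -1 when the count is unknown, and a multi-segment view can't know the distinct-term count without merging, so the fix has to walk the TermsEnum. A minimal sketch against the Lucene 4.x API (the method name is invented for illustration):

{noformat}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

// Terms.size() may legitimately return -1 (unknown), e.g. for a multi-segment
// Terms instance, so count distinct terms by iterating the TermsEnum instead.
static long countDistinctTerms(IndexReader reader, String field) throws IOException {
  Terms terms = MultiFields.getTerms(reader, field);
  if (terms == null) {
    return 0;
  }
  long count = 0;
  TermsEnum termsEnum = terms.iterator(null); // 4.x signature: iterator(TermsEnum reuse)
  while (termsEnum.next() != null) {
    count++;
  }
  return count;
}
{noformat}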
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708557#comment-13708557 ] Mikhail Khludnev commented on SOLR-4894:

Good shoot (into we know what), [~steve_rowe]! Is there a plan to support specifying the fieldType alongside the field name?
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708567#comment-13708567 ] Steve Rowe commented on SOLR-4894: --

bq. Is there a plan to support specifying the fieldType alongside the field name?

That's (indirectly/partially) possible now, in two ways:
# Using dynamic fields, which encode the fieldType via a field name prefix or suffix.
# Using AddSchemaFieldsUpdateProcessor and sending doc updates via JSON - its typed values are mapped to fieldTypes in the ASFUPF config in solrconfig.xml.

That said, it might be useful to include the capability you describe in the future. Though I haven't made plans to do so myself, patches are welcome!
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708568#comment-13708568 ] Robert Muir commented on LUCENE-5112: -

This can happen if a consumer is not calling reset(): either the code pulling the tokens, or a filter that overrides reset() but doesn't invoke the superclass reset() to pass it down the chain.

FilteringTokenFilter is double incrementing the position increment in incrementToken
Key: LUCENE-5112
URL: https://issues.apache.org/jira/browse/LUCENE-5112
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.0
Reporter: George Rhoten

The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of by posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows.
[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1797 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/1797/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([4215CE0A1C54C0AE]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=2562, name=recoveryCmdExecutor-954-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708577#comment-13708577 ] Jack Krupansky commented on SOLR-4894: --

bq. support specifying the fieldType alongside the field name

Could you elaborate and provide an example? The new parse update processors can be used to give values a desired Java type, and then this Add Schema Fields update processor can map specific Java value types (optionally constrained by field names or field name regex patterns) to specific Solr field type names. So, what exactly is still missing?
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708580#comment-13708580 ] Uwe Schindler commented on LUCENE-5112: ---

The code is correct. As [~rcmuir] says - if you don't call reset() before consuming, the overflow might happen.
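Since both comments point at the consumer workflow rather than the filter, the standard TokenStream consumption pattern is worth spelling out; a minimal sketch, assuming an Analyzer-produced stream:

{noformat}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// The TokenStream contract: reset() before the first incrementToken(),
// end() after the last one, then close(). A filter that overrides reset()
// must call super.reset(), or internal state such as position increments
// can be carried over and appear to "double increment".
static void consume(Analyzer analyzer, String field, String text) throws IOException {
  TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  PositionIncrementAttribute posIncAtt = ts.addAttribute(PositionIncrementAttribute.class);
  try {
    ts.reset();                        // mandatory before consuming
    while (ts.incrementToken()) {
      System.out.println(termAtt.toString() + " +" + posIncAtt.getPositionIncrement());
    }
    ts.end();                          // records final offset/position state
  } finally {
    ts.close();
  }
}
{noformat}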
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 4:09 PM:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to change the enum pointer based on the input term string, and fetch the related metadata from the term dict. However, seekExact(BytesRef, TermState) simply 'copies' the value of termState to the enum, which doesn't actually perform a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always make sure that, when 'term()' is called, it will return a valid term? The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is that safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

-Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two FST nodes can be 'merged'.- Oops, I forgot it still relies on equals to make sure two instances really match; ok, I'll add that.

By the way, for real data, when two outputs are not 'NO_OUTPUT', even if they contain the same metadata + stats, it seems to be very seldom that their arcs can be identical in the FST (the size increases by less than 1MB for wikimedium1m if equals always returns false for a non-singleton argument). Therefore... yes, hashCode() isn't necessary here.
[jira] [Created] (LUCENE-5113) Allow for packing the pending values of our AppendingLongBuffers
Adrien Grand created LUCENE-5113:

Summary: Allow for packing the pending values of our AppendingLongBuffers
Key: LUCENE-5113
URL: https://issues.apache.org/jira/browse/LUCENE-5113
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor

When working with small arrays, the pending values might require substantial space. So we could allow for packing the pending values in order to save space, the drawback being that this operation will make the buffer read-only.
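As a toy illustration of the trade-off the description mentions (the class and method names below are invented, not the AppendingLongBuffer API): values stay mutable while buffered, and a pack step re-encodes them compactly, after which the structure is read-only.

{noformat}
import java.util.Arrays;

// Toy grow-then-pack buffer illustrating the LUCENE-5113 idea; invented API.
// While mutable, the pending tail is over-allocated; pack() trims it (a real
// implementation would also bit-pack values), making the buffer read-only.
final class GrowThenPackBuffer {
  private long[] pending = new long[16];
  private int size = 0;
  private boolean packed = false;

  void add(long value) {
    if (packed) {
      throw new IllegalStateException("buffer was packed and is read-only");
    }
    if (size == pending.length) {
      pending = Arrays.copyOf(pending, size * 2);
    }
    pending[size++] = value;
  }

  void pack() {
    pending = Arrays.copyOf(pending, size); // drop the unused tail
    packed = true;
  }

  long get(int index) {
    return pending[index];
  }
}
{noformat}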
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708597#comment-13708597 ] Yonik Seeley commented on SOLR-3076:

It seems like the implementations of AddUpdateCommand and AddBlockCommand have almost everything in common (or should... such as handling reordered delete-by-queries, etc). For the most part, the only difference will be which IndexWriter method is finally called. I'm considering just modifying AddUpdateCommand instead of having a separate AddBlockCommand, but I was wondering about the reasoning behind a separate command.

Solr(Cloud) should support block joins
--
Key: SOLR-3076
URL: https://issues.apache.org/jira/browse/SOLR-3076
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Fix For: 5.0, 4.4
Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch

Lucene has the ability to do block joins; we should add it to Solr.
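For context, the IndexWriter method in question is addDocuments(), which writes a document block atomically with contiguous docids; by Lucene's block-join convention the child documents come first and the parent last. A minimal sketch:

{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;

// A block is indexed with a single addDocuments() call so children and the
// parent receive contiguous docids in one segment -- the invariant that
// block-join queries depend on. Children first, parent last.
static void addBlock(IndexWriter writer) throws IOException {
  List<Document> block = new ArrayList<Document>();
  Document child = new Document();
  child.add(new StringField("type", "child", Field.Store.NO));
  block.add(child);
  Document parent = new Document();
  parent.add(new StringField("type", "parent", Field.Store.NO));
  block.add(parent);
  writer.addDocuments(block); // vs. addDocument(doc) for a single document
}
{noformat}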
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708618#comment-13708618 ] Michael McCandless commented on LUCENE-4845:

bq. Mike, will you still commit it to 4.4?

OK, I'll commit shortly & backport to the 4.4 branch...

Add AnalyzingInfixSuggester
---
Key: LUCENE-4845
URL: https://issues.apache.org/jira/browse/LUCENE-4845
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch

Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g., Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), e.g. Netflix knows the popularity of movies.
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708619#comment-13708619 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503327 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503327 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState

SSDVA should detect a mismatch in the SSDVReaderState
-
Key: LUCENE-5090
URL: https://issues.apache.org/jira/browse/LUCENE-5090
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: LUCENE-5090.patch, LUCENE-5090.patch

This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in a confusing AIOOBE, or silently invalid results.
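The invariant this commit checks is per-reader state: the ordinal mapping belongs to exactly one reader, so it must be rebuilt on every reopen. A sketch of the intended usage, where buildReaderState() is a hypothetical stand-in for constructing the SSDVReaderState (the exact constructor isn't quoted in this thread):

{noformat}
import org.apache.lucene.index.DirectoryReader;

// Rebuild the SSDVReaderState whenever the reader changes; reusing the old
// state with a new reader is exactly the mismatch LUCENE-5090 now catches.
// buildReaderState() is a hypothetical placeholder, not the real API;
// dir is an already-opened Directory.
DirectoryReader reader = DirectoryReader.open(dir);
Object state = buildReaderState(reader);      // valid for this reader only

DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
if (newReader != null) {
  reader.close();
  reader = newReader;
  state = buildReaderState(reader);           // must not reuse the old state
}
{noformat}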
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708621#comment-13708621 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503328 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1503328 ] SOLR-4997: Skip log recovery for sub shard leaders only

The splitshard api doesn't call commit on new sub shards
Key: SOLR-4997
URL: https://issues.apache.org/jira/browse/SOLR-4997
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Fix For: 4.4
Attachments: SOLR-4997.patch, SOLR-4997.patch

The splitshard api doesn't call commit on new sub shards, but it happily sets them to the active state, which means that on a successful split the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that.
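Before this fix, the workaround was the explicit commit the description mentions; a sketch with the 4.x SolrJ client, where the base URL is a placeholder for one node of the cluster:

{noformat}
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Workaround prior to this fix: after a successful SPLITSHARD call, issue
// an explicit commit so documents in the new sub shards become searchable.
// The update request is distributed across the collection.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
try {
  server.commit();   // opens new searchers on the sub shards
} finally {
  server.shutdown();
}
{noformat}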
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708623#comment-13708623 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503329 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503329 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708624#comment-13708624 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503331 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503331 ] SOLR-4997: Skip log recovery for sub shard leaders only
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708625#comment-13708625 ] ASF subversion and git services commented on SOLR-4997: --- Commit 1503332 from sha...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503332 ] SOLR-4997: Skip log recovery for sub shard leaders only
[jira] [Resolved] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5090. Resolution: Fixed
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708626#comment-13708626 ] ASF subversion and git services commented on LUCENE-5090: - Commit 150 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r150 ] LUCENE-5090: catch mismatched readers in SortedSetDocValuesAccumulator/ReaderState
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708627#comment-13708627 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503335 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1503335 ] Moved SOLR-5039 to proper section
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708629#comment-13708629 ] Shalin Shekhar Mangar commented on SOLR-4997: - I fixed a bug that I had introduced which skipped log recovery on startup for all leaders instead of only sub shard leaders. I caught this only because I was doing another line-by-line review of all my changes. We should have a test which catches such a condition. The splitshard api doesn't call commit on new sub shards Key: SOLR-4997 URL: https://issues.apache.org/jira/browse/SOLR-4997 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.4 Attachments: SOLR-4997.patch, SOLR-4997.patch The splitshard api doesn't call commit on new sub shards but it happily sets them to active state which means on a successful split, the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
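Until the fix lands, the workaround implied by the description is an explicit commit after a successful split so the sub shards' documents become searchable. A hedged SolrJ 4.x sketch (the URL is a placeholder):

{code}
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
server.commit(); // hard commit; in SolrCloud this is distributed to all shards
{code}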
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708631#comment-13708631 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503336 from [~erickoerickson] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503336 ] Moved SOLR-5039 to proper section Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708634#comment-13708634 ] Erick Erickson commented on SOLR-5039: -- Crap! Whaddya want anyway? It's right above the 4.3 section... and immediately below the 4.3.1. It's a 4.4 fix. Fixing it up, thanks for catching! Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5039) Admin UI displays -1 for term count in multiValued fields
[ https://issues.apache.org/jira/browse/SOLR-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708635#comment-13708635 ] ASF subversion and git services commented on SOLR-5039: --- Commit 1503338 from [~erickoerickson] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503338 ] Moved SOLR-5039 to proper section Admin UI displays -1 for term count in multiValued fields - Key: SOLR-5039 URL: https://issues.apache.org/jira/browse/SOLR-5039 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-5039.patch I thought this had been a JIRA before, but I couldn't find it. Problem is that LukeRequestHandler.getDetailedFieldInfo gets the count by this line: tiq.distinctTerms = new Long(terms.size()).intValue(); which is -1 at least for multiValued fields. I'll attach a patch in a second that just counts things up. It worked last night, but it was late. I obviously don't understand why MultiTerms.size() is hard-coded to return -1. Can anyone shed light on this? Or see the two-line change and see if it makes sense? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash:128). Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708640#comment-13708640 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503340 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503340 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708638#comment-13708638 ] Han Jiang edited comment on LUCENE-3069 at 7/15/13 5:08 PM: Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash.java:128). Although later, for nodes with outputs, it'll hardly ever find an identical node in the hash table. was (Author: billy): Patch according to previous comments. We still somewhat need the existence of hashCode(), because in NodeHash, it will check whether the frozen node has the same hash code as the uncompiled node (NodeHash:128). Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk6) - Build # 6501 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6501/ Java: 64bit/ibm-j9-jdk6 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace: java.lang.AssertionError: No SolrDynamicMBeans found at __randomizedtesting.SeedInfo.seed([2387D7242E862648:AD56B31E43C77E2D]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:738) Build Log: [...truncated 8978 lines...] [junit4] Suite:
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708674#comment-13708674 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503356 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503356 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5114) remove boolean useCache param from TermsEnum.seekCeil/Exact
Michael McCandless created LUCENE-5114: -- Summary: remove boolean useCache param from TermsEnum.seekCeil/Exact Key: LUCENE-5114 URL: https://issues.apache.org/jira/browse/LUCENE-5114 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.5 Long ago terms dict had a cache, but it was problematic and we removed it, but the API still has a relic boolean useCache ... I think we should drop it from the API as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
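A before/after sketch of the cleanup (the two-argument 4.x signatures named in the summary; `terms` is an assumed Terms instance):

{code}
TermsEnum te = terms.iterator(null);
te.seekExact(new BytesRef("lucene"), false); // today: relic useCache flag
// after LUCENE-5114 the flag simply goes away:
// te.seekExact(new BytesRef("lucene"));
{code}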
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708682#comment-13708682 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503359 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503359 ] LUCENE-4845: add AnalyzingInfixSuggester Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4845. Resolution: Fixed Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1413 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1413/ 4 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxUpdate Error Message: No mbean found for SolrIndexSearcher Stack Trace: java.lang.AssertionError: No mbean found for SolrIndexSearcher at __randomizedtesting.SeedInfo.seed([81ED668B798E415A:978A54E1E958EAF1]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.apache.solr.core.TestJmxIntegration.testJmxUpdate(TestJmxIntegration.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:722) REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace:
[jira] [Commented] (SOLR-2345) Extend geodist() to support MultiValued lat long field
[ https://issues.apache.org/jira/browse/SOLR-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708751#comment-13708751 ] David Smiley commented on SOLR-2345: By the way, geodist() handles a variety of invocation approaches, not all of which involve sfield. From the comments:
{code}
// m is a multi-value source, x is a single-value source
// allow (m,m) (m,x,x) (x,x,m) (x,x,x,x)
// if not enough points are present, pt will be checked first, followed by sfield.
{code}
Adapting geodist() to support RPT will only work with explicit use of sfield and pt. Extend geodist() to support MultiValued lat long field -- Key: SOLR-2345 URL: https://issues.apache.org/jira/browse/SOLR-2345 Project: Solr Issue Type: New Feature Components: spatial Reporter: Bill Bell Assignee: David Smiley Fix For: 4.4 Attachments: SOLR-2345_geodist_refactor.patch Extend geodist() and {!geofilt} to support a multiValued lat,long field without using geohash. sort=geodist() asc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
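For illustration, the explicit sfield/pt form that would still work with RPT looks like this (standard Solr spatial request parameters; the field name is made up):

{code}
&sfield=store_rpt&pt=45.15,-93.85&sort=geodist() asc
&fq={!geofilt sfield=store_rpt pt=45.15,-93.85 d=5}
{code}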
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #387: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/387/ All tests passed Build Log: [...truncated 20509 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 311 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708757#comment-13708757 ] Andrew Muldowney commented on SOLR-2894: I'm working on this patch again, looking into the limit issue and the fact that exclusion tags aren't being respected. They both boil down to improperly formatted refinement requests, so I'm going through and cleaning those up to look more and more like the distributed field facet code. Should also have time to get to the datetime problem, where you cannot refine on datetimes because the datetime format returned by the shards is not queryable when refining. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.4 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4997) The splitshard api doesn't call commit on new sub shards
[ https://issues.apache.org/jira/browse/SOLR-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708776#comment-13708776 ] Mark Miller commented on SOLR-4997: --- bq. We should have a test which catches such a condition. yeah, scary. The splitshard api doesn't call commit on new sub shards Key: SOLR-4997 URL: https://issues.apache.org/jira/browse/SOLR-4997 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.4 Attachments: SOLR-4997.patch, SOLR-4997.patch The splitshard api doesn't call commit on new sub shards but it happily sets them to active state which means on a successful split, the documents are not visible to searchers unless an explicit commit is called on the cluster. The coreadmin split api will still not call commit on targetCores. That is by design and we're not going to change that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5113) Allow for packing the pending values of our AppendingLongBuffers
[ https://issues.apache.org/jira/browse/LUCENE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708777#comment-13708777 ] Robert Muir commented on LUCENE-5113: - +1, the little 8KB pending buffers can really add up, e.g. if you have an OrdinalMap over 25 segments (with zero terms!), that's 200KB just for pending[]s. We could try to solve it in another way if it makes appending* complicated or would hurt performance, e.g. maybe this map could use some other packed ints API. There are a few other places using this buffer though: I think fieldcache term addresses, indexwriter consumers, not sure what else. Allow for packing the pending values of our AppendingLongBuffers Key: LUCENE-5113 URL: https://issues.apache.org/jira/browse/LUCENE-5113 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor When working with small arrays, the pending values might require substantial space. So we could allow for packing the pending values in order to save space, the drawback being that this operation will make the buffer read-only. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
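A sketch of what the issue proposes (AppendingLongBuffer and add() exist in the 4.x org.apache.lucene.util.packed package; the pack/freeze call is hypothetical and is exactly what this issue would add):

{code}
AppendingLongBuffer ords = new AppendingLongBuffer();
ords.add(3);
ords.add(17);
// Once all values are in, packing the pending page would reclaim the ~8KB
// pending[] at the cost of making the buffer read-only:
// ords.freeze(); // hypothetical API per this issue
{code}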
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
My feeling is that what we need most is what I've been working on (surprise, surprise :) ) We need a simple Java app, very similar to the std Luke app. We need it to be Apache licensed all the way through. We need it to be fully integrated as a module. We need it to be straightforward enough that any of the Lucene/Solr committers can easily work on it and update it as APIs change. GWT is probably a stretch for that goal - Apache Pivot is pretty straightforward though - for any reasonable Java developer. I picked it up in absolutely no time to build the thing from scratch - modifying it is 10 times easier. The backend code is all java, the layout and widgets all XML.

I've been pushing towards that goal (over the years now) with Luke ALE (Apache Lucene Edition). It's not a straight port of Luke with thinlet to Luke with Apache Pivot - Luke has 90% of its code in one huge class - I've already been working on modularizing that code as I've moved it over - not too heavily because that would have made it difficult to keep porting code, but a good start. Now that the majority of features have been moved over, it's probably easier to keep refactoring - which is needed, because another very important missing piece is unit tests - and good unit tests will require even more refactoring of the code.

I also think a GWT version - something that could probably run nicely with Solr - would be awesome. But way down the line in priority for me. We need something very close to Lucene that the committers will push up the hill as they push Lucene. - Mark

On Jul 15, 2013, at 11:15 AM, Robert Muir rcm...@gmail.com wrote: I disagree with this completely. Solr is last priority

On Jul 15, 2013 6:14 AM, Jack Krupansky j...@basetechnology.com wrote: My personal thoughts/preferences/suggestions for Luke: 1. Need a clean Luke Java library – heavily unit-tested. As integrated with Lucene as possible. 2. A simple command line interface – always useful. 3. A Solr plugin handler – based on #1. Good for apps as well as Admin UI. Nice to be able to curl a request to look at a specific doc, for example. 4. GUI fully integrated with the new Solr Web Admin UI. A separate UI... sucks. 5. Any additional, unintegrated GUI is icing on the cake and not really desirable for Solr. May be great for Elasticsearch and other Lucene-based apps, but Solr should be the #1 priority – after #1 and #2 above. -- Jack Krupansky

From: Dmitry Kan Sent: Monday, July 15, 2013 8:54 AM To: dev@lucene.apache.org Subject: Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module Hello guys, Indeed, the GWT port is work in progress and far from done. The driving factor here was to be able to later integrate luke into the solr admin as well as have the standalone webapp for non-solr users. There is (was?) a luke stats handler in the solr ui, that printed some stats on the index. That could be substituted with the GWT app. The code isn't yet ready to see the light. So if it makes more sense for Ajay to work on the existing jira with the Apache Pivot implementation, I would say go ahead. In the current port effort (the aforementioned github's fork) the UI is the original one, developed by Andrzej. Besides the UI rework there are plenty of things to port / verify (like e.g. Hadoop plugin) against the latest lucene versions.
See the readme.md: https://github.com/dmitrykey/luke Whichever way's taken, hopefully we end up having stable releases of luke :) Dmitry Kan On 14 July 2013 22:38, Andrzej Bialecki a...@getopt.org wrote: On 7/14/13 5:04 AM, Ajay Bhat wrote: Shawn and Andrzej, Thanks for answering my questions. I've looked over the code done by Dmitry and I'll look into what I can do to help with the UI porting in future. I was actually thinking of doing this JIRA as a project by myself with some assistance from the community after getting a mentor for the ASF ICFOSS program, which I haven't found yet. It would be great if I could get one of you guys as a mentor. As the UI work has been mostly done by others like Dmitry Kan, I don't think I need to work on that majorly for now. It's far from done - he just started the process. What other work is there to be done that I can do as a project? Any new features or improvements? Regards, Ajay On Jul 14, 2013 1:54 AM, Andrzej Bialecki a...@getopt.org mailto:a...@getopt.org wrote: On 7/13/13 8:56 PM, Shawn Heisey wrote: On 7/13/2013 3:15 AM, Ajay Bhat wrote: One more question : What version of Lucene does Luke currently support right now? I saw a comment on the issue page that it doesn't support the Lucene 4.1 and 4.2 trunk. The official Luke project only has versions up through 4.0.0-ALPHA. http://code.google.com/p/luke/ There is a forked project that has
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4145 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4145/ 1 tests failed. REGRESSION: org.apache.lucene.facet.search.TestDrillSideways.testRandom Error Message: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader Stack Trace: java.lang.IllegalStateException: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader at __randomizedtesting.SeedInfo.seed([A42710A7EC312939:D66B35A85D519F4A]:0) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator$1.aggregate(SortedSetDocValuesAccumulator.java:102) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator.accumulate(SortedSetDocValuesAccumulator.java:210) at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:296) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:417) at org.apache.lucene.facet.search.TestDrillSideways.testRandom(TestDrillSideways.java:810) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Hi all, The most pressing issue is that I need a mentor for this project by Wednesday, 17th July 2013 if I'm to do it for the ASF-ICFOSS program [1]. Currently I've not found any mentors. Would anyone here please consent to be a mentor for this project so I can include you in my proposal? For the project I've decided to use Apache Pivot and familiarize myself with it, going through the tutorials ASAP. There are some more questions I have:

1. The original version by Andrzej [2] I have checked out in Eclipse, but I can't run it. It's mainly all in one huge Luke.java file. I just want to check that the UI is the same as that in the sandboxed version in Lucene.

2. There are various plugins that require Luke.java to be imported. But there's also a Shell.java plugin [3] that doesn't need any such import. Does this mean it can be ported directly, or is it kept for future improvement? If it's the latter, I guess the CMD interface suggested by Jack Krupansky could be implemented using this class.

[1] http://community.apache.org/mentoringprogramme-icfoss-pilot.html [2] https://code.google.com/p/luke/ [3] org.getopt.luke.plugins.Shell

On Mon, Jul 15, 2013 at 9:03 PM, Shawn Heisey s...@elyograg.org wrote: On 7/15/2013 9:15 AM, Robert Muir wrote: I disagree with this completely. Solr is last priority I'm on the Solr side of things, with only the tiniest knowledge or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Request for Mentor for LUCENE-2562 : Make Luke a Lucene/Solr Module
Two more questions: 1. How much of the original Luke.java has yet to be modularised? 2. What are the new APIs in Lucene 4.1 and 4.2 that need immediate attention to be updated?

On Tue, Jul 16, 2013 at 12:15 AM, Ajay Bhat a.ajay.b...@gmail.com wrote: Hi all, The most pressing issue is that I need a mentor for this project by Wednesday, 17th July 2013 if I'm to do it for the ASF-ICFOSS program [1]. Currently I've not found any mentors. Would anyone here please consent to be a mentor for this project so I can include you in my proposal? For the project I've decided to use Apache Pivot and familiarize myself with it, going through the tutorials ASAP. There are some more questions I have: 1. The original version by Andrzej [2] I have checked out in Eclipse, but I can't run it. It's mainly all in one huge Luke.java file. I just want to check that the UI is the same as that in the sandboxed version in Lucene. 2. There are various plugins that require Luke.java to be imported. But there's also a Shell.java plugin [3] that doesn't need any such import. Does this mean it can be ported directly, or is it kept for future improvement? If it's the latter, I guess the CMD interface suggested by Jack Krupansky could be implemented using this class. [1] http://community.apache.org/mentoringprogramme-icfoss-pilot.html [2] https://code.google.com/p/luke/ [3] org.getopt.luke.plugins.Shell On Mon, Jul 15, 2013 at 9:03 PM, Shawn Heisey s...@elyograg.org wrote: On 7/15/2013 9:15 AM, Robert Muir wrote: I disagree with this completely. Solr is last priority I'm on the Solr side of things, with only the tiniest knowledge or interest in hacking on Lucene. Despite that, I have to agree with Robert here. Let's make sure the Luke module is very solid and prove that we can keep it operational through 2-3 full minor release cycles before we try to integrate it into Solr. We already have luke functionality in the Solr UI. Compared to the real thing it might be a band-aid, but it works. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4145 - Still Failing
I'll dig. Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 2:40 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4145/ 1 tests failed. REGRESSION: org.apache.lucene.facet.search.TestDrillSideways.testRandom Error Message: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader Stack Trace: java.lang.IllegalStateException: the SortedSetDocValuesReaderState provided to this class does not match the reader being searched; you must create a new SortedSetDocValuesReaderState every time you open a new IndexReader at __randomizedtesting.SeedInfo.seed([A42710A7EC312939:D66B35A85D519F4A]:0) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator$1.aggregate(SortedSetDocValuesAccumulator.java:102) at org.apache.lucene.facet.sortedset.SortedSetDocValuesAccumulator.accumulate(SortedSetDocValuesAccumulator.java:210) at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:296) at org.apache.lucene.facet.search.DrillSideways.search(DrillSideways.java:417) at org.apache.lucene.facet.search.TestDrillSideways.testRandom(TestDrillSideways.java:810) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708810#comment-13708810 ] George Rhoten commented on LUCENE-5112: --- Calling clearAttributes() at the start of incrementToken() in our custom Tokenizer seems to resolve this issue too. It would be helpful if the purpose of clearAttributes() in incrementToken() for a typical tokenizer were documented more clearly; this part of the API contract is easy to miss. FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
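The contract George alludes to: an attribute producer must call clearAttributes() at the top of incrementToken() before populating the attributes of the next token, otherwise stale values (such as a previous position increment) leak into the new token. A minimal hedged sketch of a custom 4.x Tokenizer (the class and its tokenization logic are made up):

{code}
public final class MyTokenizer extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public MyTokenizer(Reader input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes(); // reset all attributes to their defaults first
    // ... read the next token from 'input' and fill termAtt here ...
    return false; // placeholder: return true while tokens remain
  }
}
{code}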
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708823#comment-13708823 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503423 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503423 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708826#comment-13708826 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503424 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503424 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5090) SSDVA should detect a mismatch in the SSDVReaderState
[ https://issues.apache.org/jira/browse/LUCENE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708827#comment-13708827 ] ASF subversion and git services commented on LUCENE-5090: - Commit 1503425 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503425 ] LUCENE-5090: fix test bug that was using mismatched readers when faceting with SortedSetDVs SSDVA should detect a mismatch in the SSDVReaderState - Key: LUCENE-5090 URL: https://issues.apache.org/jira/browse/LUCENE-5090 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-5090.patch, LUCENE-5090.patch This is trappy today: every time you open a new reader, you must create a new SSDVReaderState (this computes the seg -> global ord mapping), and pass that to SSDVA. But if this gets messed up (e.g. you pass an old SSDVReaderState) it will result in confusing AIOOBE, or silently invalid results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708829#comment-13708829 ] Uwe Schindler commented on LUCENE-5112: --- bq. Calling clearAttributes() at the start of incrementToken() in our custom Tokenizer seems to resolve this issue too. This is mandatory, yes. If you don't do this, ugly things can happen. I would suggest that you use BaseTokenStreamTestCase as the base class for your tokenizer/tokenfilter tests. This class is part of the Lucene test framework and will detect such errors. FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
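[Editor's note] A minimal sketch of the testing setup Uwe recommends. BaseTokenStreamTestCase ships in Lucene's test-framework artifact; WhitespaceTokenizer here is only a stand-in for the custom tokenizer under test:
{code}
import java.io.StringReader;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class TestMyTokenizer extends BaseTokenStreamTestCase {
  public void testBasics() throws Exception {
    // Replace WhitespaceTokenizer with the custom tokenizer under test.
    Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_44, new StringReader("foo bar"));
    // Verifies the produced tokens and, via an injected check attribute,
    // that incrementToken() calls clearAttributes() before setting values.
    assertTokenStreamContents(tok, new String[] {"foo", "bar"});
  }
}
{code}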
[jira] [Commented] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter
[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708830#comment-13708830 ] Paul Elschot commented on LUCENE-5101: -- I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() + 1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I would expect FixedBitSet to have almost equal run times for advance(docID() + 1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder), and the code in EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docID() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? make it easier to plugin different bitset implementations to CachingWrapperFilter - Key: LUCENE-5101 URL: https://issues.apache.org/jira/browse/LUCENE-5101 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5101.patch Currently this is possible, but it's not so friendly:
{code}
protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
  if (docIdSet == null) {
    // this is better than returning null, as the nonnull result can be cached
    return EMPTY_DOCIDSET;
  } else if (docIdSet.isCacheable()) {
    return docIdSet;
  } else {
    final DocIdSetIterator it = docIdSet.iterator();
    // null is allowed to be returned by iterator(),
    // in this case we wrap with the sentinel set,
    // which is cacheable.
    if (it == null) {
      return EMPTY_DOCIDSET;
    } else {
      /* INTERESTING PART */
      final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
      bits.or(it);
      return bits;
      /* END INTERESTING PART */
    }
  }
}
{code}
Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and INTERESTING PART calls a simpler method, something like:
{code}
protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
  final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
  bits.or(iterator);
  return bits;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter
[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708830#comment-13708830 ] Paul Elschot edited comment on LUCENE-5101 at 7/15/13 7:04 PM: --- I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() + 1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I would expect that FixedBitSet would have almost equal run times for advance(docID() + 1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder), and the code in EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docID() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? was (Author: paul.elsc...@xs4all.nl): I had another look at the recent benchmark results and something does not seem in order there. At density -2 (1%), Elias-Fano is faster at advance(docID() +1) (2.45 times fixed) than at nextDoc() (1.81 times fixed), and I'd the FixedBitSet should have an almost equal run times for advance(docId()+1) and nextDoc(). The code for advance (advanceToValue in EliasFanoDecoder) is really more complex than the code for nextDoc (nextValue in EliasFanoDecoder) and the code at EliasFanoDocIdSet is so simple that it should not really influence things here. So for EliasFanoDocIdSet advance(docId() + 1) should normally be slower than nextDoc(), but the benchmark contradicts this. Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT) tolerances? make it easier to plugin different bitset implementations to CachingWrapperFilter - Key: LUCENE-5101 URL: https://issues.apache.org/jira/browse/LUCENE-5101 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5101.patch Currently this is possible, but it's not so friendly:
{code}
protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
  if (docIdSet == null) {
    // this is better than returning null, as the nonnull result can be cached
    return EMPTY_DOCIDSET;
  } else if (docIdSet.isCacheable()) {
    return docIdSet;
  } else {
    final DocIdSetIterator it = docIdSet.iterator();
    // null is allowed to be returned by iterator(),
    // in this case we wrap with the sentinel set,
    // which is cacheable.
    if (it == null) {
      return EMPTY_DOCIDSET;
    } else {
      /* INTERESTING PART */
      final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
      bits.or(it);
      return bits;
      /* END INTERESTING PART */
    }
  }
}
{code}
Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and INTERESTING PART calls a simpler method, something like:
{code}
protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
  final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
  bits.or(iterator);
  return bits;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
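[Editor's note] To make the comparison concrete, here is a rough sketch of the two traversal patterns being measured; this is not the project's actual benchmark harness, and 'disi' stands for any DocIdSetIterator (e.g. one obtained from an EliasFanoDocIdSet or a FixedBitSet):
{code}
// Walk every doc with nextDoc().
static long walkNextDoc(DocIdSetIterator disi) throws IOException {
  long sum = 0;
  for (int doc = disi.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = disi.nextDoc()) {
    sum += doc;  // consume the doc id so the JIT cannot drop the loop
  }
  return sum;
}

// Walk the same docs with advance(docID() + 1); this visits the identical
// sequence, so any timing difference is in the advance implementation.
static long walkAdvance(DocIdSetIterator disi) throws IOException {
  long sum = 0;
  int doc = disi.advance(0);  // initial docID() is -1, so advance(0) is legal
  while (doc != DocIdSetIterator.NO_MORE_DOCS) {
    sum += doc;
    doc = disi.advance(doc + 1);
  }
  return sum;
}
{code}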
[jira] [Closed] (LUCENE-5112) FilteringTokenFilter is double incrementing the position increment in incrementToken
[ https://issues.apache.org/jira/browse/LUCENE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-5112. - FilteringTokenFilter is double incrementing the position increment in incrementToken Key: LUCENE-5112 URL: https://issues.apache.org/jira/browse/LUCENE-5112 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: George Rhoten Assignee: Uwe Schindler The following code from FilteringTokenFilter#incrementToken() seems wrong.
{noformat}
if (enablePositionIncrements) {
  int skippedPositions = 0;
  while (input.incrementToken()) {
    if (accept()) {
      if (skippedPositions != 0) {
        posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
      }
      return true;
    }
    skippedPositions += posIncrAtt.getPositionIncrement();
  }
} else {
{noformat}
The skippedPositions variable should probably be incremented by 1 instead of posIncrAtt.getPositionIncrement(). As it is, it seems to be double incrementing, which is a problem if your data is full of stop words and your position increment integer overflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708856#comment-13708856 ] Mikhail Khludnev commented on SOLR-3076: [~ysee...@gmail.com] it's a ginger cake Solr(Cloud) should support block joins -- Key: SOLR-3076 URL: https://issues.apache.org/jira/browse/SOLR-3076 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Yonik Seeley Fix For: 5.0, 4.4 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch Lucene has the ability to do block joins; we should add it to Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
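[Editor's note] As background for the feature request, a hedged sketch of the Lucene-level block join that this issue wants surfaced in Solr (4.x-era join module; 'writer', 'searcher', and the child/parent Documents are assumed to exist, and the parent-filter wiring shifted between releases, so treat it as approximate):
{code}
// Index parent + children as one contiguous block; the parent comes last.
List<Document> block = new ArrayList<Document>();
block.add(childDoc1);   // e.g. a SKU
block.add(childDoc2);
block.add(parentDoc);   // e.g. the product, marked with type=parent
writer.addDocuments(block);

// Search children, then join the matches up to their parents.
Filter parentsFilter = new CachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
Query childQuery = new TermQuery(new Term("color", "red"));
Query joined = new ToParentBlockJoinQuery(childQuery, parentsFilter, ScoreMode.None);
TopDocs parents = searcher.search(joined, 10);
{code}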
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.7.0_25) - Build # 3037 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3037/ Java: 32bit/jdk1.7.0_25 -client -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:545) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:131) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.7.0_25) - Build # 3037 - Failure!
I'll fix. Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 3:39 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3037/ Java: 32bit/jdk1.7.0_25 -client -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=16, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[ECF9CF89952D6F7F], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:545) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:131) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_25) - Build # 2988 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/2988/ Java: 64bit/jdk1.7.0_25 -XX:-UseCompressedOops -XX:+UseSerialGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=55, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[B1D21B594838883B], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=55, name=TEST-AnalyzingInfixSuggesterTest.testRandomMinPrefixLength-seed#[B1D21B594838883B], state=RUNNABLE, group=TGRP-AnalyzingInfixSuggesterTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1568) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:546) at org.apache.lucene.util._TestUtil.getTempDir(_TestUtil.java:125) at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggesterTest.testRandomMinPrefixLength(AnalyzingInfixSuggesterTest.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
[jira] [Commented] (SOLR-4894) Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields
[ https://issues.apache.org/jira/browse/SOLR-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708899#comment-13708899 ] Mikhail Khludnev commented on SOLR-4894: [~jkrupan] I'm aiming at something different from modeling Java types. What we have now are dynamic fields like cloth_COLOR, shoe_COLOR, wristlet_COLOR. I'd prefer not to bother with dynamic field wildcards, but just send: {wristlet:red, type:COLOR}, {shoe:brown, type:COLOR}, etc. Add a new update processor factory that will dynamically add fields to the schema if an input document contains unknown fields -- Key: SOLR-4894 URL: https://issues.apache.org/jira/browse/SOLR-4894 Project: Solr Issue Type: New Feature Components: update Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4894.patch Previous {{ParseFooUpdateProcessorFactory}}-s (see SOLR-4892) in the same chain will detect, parse and convert unknown fields’ {{String}}-typed values to the appropriate Java object type. This factory will take as configuration a set of mappings from Java object type to schema field type. {{ManagedIndexSchema.addFields()}} adds new fields to the schema. If schema addition fails for any field, addition is re-attempted only for those that don’t match any schema field. This process is repeated, either until all new fields are successfully added, or until there are no new fields (because the fields that were new when this update chain started its work were subsequently added by a different update request, possibly on a different node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
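[Editor's note] The retry behavior described at the end of the issue summary can be pictured with a small illustrative loop. This is a sketch only, not Solr's actual code: it assumes addFields() returns the updated schema and that getFieldOrNull() reports whether a field now exists.
{code}
// newFields: fields detected as unknown in the incoming document.
static IndexSchema addUnknownFields(IndexSchema schema, List<SchemaField> newFields) {
  while (!newFields.isEmpty()) {
    try {
      schema = schema.addFields(newFields);  // assumed to return the updated schema
      break;                                 // all fields added
    } catch (Exception e) {
      // A concurrent update request (possibly on another node) may have added
      // some of these fields already; drop those and retry with the rest.
      Iterator<SchemaField> it = newFields.iterator();
      while (it.hasNext()) {
        if (schema.getFieldOrNull(it.next().getName()) != null) {
          it.remove();
        }
      }
    }
  }
  return schema;
}
{code}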
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 315 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/315/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([66AAD5F04D9BE4A2]:0) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=3869, name=recoveryCmdExecutor-1535-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708937#comment-13708937 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503459 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503459 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708939#comment-13708939 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503460 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503460 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708934#comment-13708934 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503458 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1503458 ] LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close() Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
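[Editor's note] For context, a rough usage sketch of the new suggester. This follows the 4.x-era API loosely; the constructor and lookup signatures changed across releases, and 'indexDir' (a File) and 'inputs' (an iterator of weighted suggestions, e.g. movie titles by popularity) are assumed to exist:
{code}
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
AnalyzingInfixSuggester suggester =
    new AnalyzingInfixSuggester(Version.LUCENE_44, indexDir, analyzer);
suggester.build(inputs);  // builds the underlying Lucene index of suggestions

// Infix lookup: "apes" can match inside "Planet of the Apes", with the
// matched tokens highlighted in the returned keys.
List<Lookup.LookupResult> results = suggester.lookup("apes", 10, true, true);
suggester.close();
{code}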
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3633) web UI reports an error if CoreAdminHandler says there are no SolrCores
[ https://issues.apache.org/jira/browse/SOLR-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3633: Attachment: SOLR-3633.patch web UI reports an error if CoreAdminHandler says there are no SolrCores --- Key: SOLR-3633 URL: https://issues.apache.org/jira/browse/SOLR-3633 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0-ALPHA Reporter: Hoss Man Assignee: Stefan Matheis (steffkes) Fix For: 4.4 Attachments: SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch, SOLR-3633.patch Spun off from SOLR-3591... * having no SolrCores is a valid situation * independent of what may happen in SOLR-3591, the web UI should cleanly deal with there being no SolrCores, and just hide/grey out any tabs that can't be supported w/o at least one core * even if there are no SolrCores the core admin features (ie: creating a new core) should be accessible in the UI -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
I *think* something like this is needed to fix Maven's POMs? Can someone with more Maven experience test this? AnalyzingInfixSuggester added deps to lucene/suggest on misc and analyzers-common, but analyzers-common already seems to be in the POM:

Index: dev-tools/maven/lucene/suggest/pom.xml.template
===================================================================
--- dev-tools/maven/lucene/suggest/pom.xml.template (revision 1503469)
+++ dev-tools/maven/lucene/suggest/pom.xml.template (working copy)
@@ -59,6 +59,11 @@
       <artifactId>lucene-analyzers-common</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>${project.groupId}</groupId>
+      <artifactId>lucene-misc</artifactId>
+      <version>${project.version}</version>
+    </dependency>
   </dependencies>
   <build>
     <sourceDirectory>${module-path}/src/java</sourceDirectory>

Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 4:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #909: POMs out of sync
Mike, That's the right thing for Maven config - not sure why analyzers-common was in there already. I'm running the Maven build now with this fix to make sure. (By the way, the IntelliJ config needs these two deps added as well - I'll take care of it.) Steve On Jul 15, 2013, at 4:59 PM, Michael McCandless luc...@mikemccandless.com wrote: I *think* something like this is needed to fix Maven's POMs? Can someone with more Maven experience test this? AnalyzingInfixSuggester added deps to lucene/suggest on misc and analyzers-common, but analyzers-common already seems to be in the POM:

Index: dev-tools/maven/lucene/suggest/pom.xml.template
===================================================================
--- dev-tools/maven/lucene/suggest/pom.xml.template (revision 1503469)
+++ dev-tools/maven/lucene/suggest/pom.xml.template (working copy)
@@ -59,6 +59,11 @@
       <artifactId>lucene-analyzers-common</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>${project.groupId}</groupId>
+      <artifactId>lucene-misc</artifactId>
+      <version>${project.version}</version>
+    </dependency>
   </dependencies>
   <build>
     <sourceDirectory>${module-path}/src/java</sourceDirectory>

Mike McCandless http://blog.mikemccandless.com On Mon, Jul 15, 2013 at 4:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/909/ All tests passed Build Log: [...truncated 20111 lines...] [mvn] [INFO] - [mvn] [INFO] - [mvn] [ERROR] COMPILATION ERROR : [mvn] [INFO] - [...truncated 305 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708989#comment-13708989 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503477 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1503477 ] LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476) Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708987#comment-13708987 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503476 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1503476 ] LUCENE-4845: Maven and IntelliJ config Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708992#comment-13708992 ] ASF subversion and git services commented on LUCENE-4845: - Commit 1503478 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1503478 ] LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476) Add AnalyzingInfixSuggester --- Key: LUCENE-4845 URL: https://issues.apache.org/jira/browse/LUCENE-4845 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g, Netflix does infix suggestions in their search box. I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions. I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), eg Netflix knows the popularity of movies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org