Re: Getting term vectors/computing cosine similarity

2014-05-28 Thread Andi Vajda

 On May 27, 2014, at 19:17, Michael O'Leary mich...@moz.com wrote:
 
 *tl;dr*: a next() method is defined for the Java class TVTermsEnum in
 Lucene 4.8.1, but there appears to be no next() method available on an
 object that looks like an instance of the Python class TVTermsEnum in
 PyLucene 4.8.1.

If there is a next() method, there is a good chance the object is even iterable 
(in the Python sense). You may need to cast it first, though, as the API that 
returned it to you may not be declared to return TVTermsEnum:
  TVTermsEnum.cast_(obj)

A good place for PyLucene code examples is its suite of unit tests. It also has 
a few samples - far fewer than in the 3.x releases, because the APIs changed too much.
I'm pretty sure there is a test involving TermsEnum in the tests directory.
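In code, the casting advice above might look like the sketch below. This is an illustration against the PyLucene 4.x API as documented (getTermVector, Terms.iterator, TermsEnum.totalTermFreq, BytesRefIterator.cast_); the function name and lazy imports are my own, and the imports are done inside the function so the module loads without a running Lucene JVM:

```python
def term_freqs(reader, doc_id, field):
    """Collect {term: within-doc frequency} from a stored term vector.

    Sketch for PyLucene 4.x; imports are local so this file parses
    and loads without a Lucene JVM attached.
    """
    from org.apache.lucene.util import BytesRefIterator

    terms = reader.getTermVector(doc_id, field)  # a Terms instance, or None
    if terms is None:
        return {}
    freqs = {}
    terms_enum = terms.iterator(None)             # wrapped as a private impl class
    # Cast to the public BytesRefIterator interface so the Python
    # wrapper exposes next(); keep terms_enum around for the stats calls.
    br_iter = BytesRefIterator.cast_(terms_enum)
    term = br_iter.next()
    while term is not None:
        freqs[term.utf8ToString()] = int(terms_enum.totalTermFreq())
        term = br_iter.next()
    return freqs
```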

Andi..

 I have a set of documents that I would like to cluster. These documents
 share a vocabulary of only about 3,000 unique terms, but there are about
 15,000,000 documents. One way I thought of doing this would be to index the
 documents using PyLucene (Python is the preferred programming language at
 work), obtain term vectors for the documents using PyLucene API functions,
 and calculate cosine similarities between pairs of term vectors in order to
 determine which documents are close to each other.
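The cosine step itself is independent of Lucene; a minimal plain-Python sketch over sparse {term: count} maps (a hypothetical helper, not a PyLucene API) could look like:

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors ({term: count})."""
    # Dot product over the shared vocabulary only.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

print(cosine({"a": 1, "b": 2}, {"a": 1, "b": 2}))  # close to 1.0
print(cosine({"a": 1}, {"b": 1}))                  # 0.0
```

Note that with ~15,000,000 documents an all-pairs comparison is on the order of 10^14 similarities, so in practice one would compute this only for candidate pairs produced by the index or by a clustering pass.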
 
 I found some sample Java code on the web that various people have posted
 showing ways to do this with older versions of Lucene. I downloaded
 PyLucene 4.8.1 and compared its API functions with the ones used in the
 code samples, and saw that this is an area of Lucene that has changed quite
 a bit. I can send an email to the lucene-user mailing group to ask what
 would be a good way of doing this using version 4.8.1, but the question I
 have for this mailing group has to do with some Java API functions that it
 looks like are not exposed in Python, unless I have to go about accessing
 them in a different way.
 
 If I obtain the term vector for the field "cat_ids" in a document with id
 doc_id_1
 
 doc_1_tfv = reader.getTermVector(doc_id_1, "cat_ids")
 
 then doc_1_tfv is displayed as this object:
 
 Terms:
 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396
 
 In some of the sample code I looked at, the terms in doc_1_tfv could be
 obtained with doc_1_tfv.getTerms(), but it looks like getTerms is not a
 member function of Terms or its subclasses any more. In another code
 sample, an iterator for the term vector is obtained via tfv_iter =
 doc_1_tfv.iterator(None) and then the terms are obtained one by one with
 calls to tfv_iter.next(). This is where I get stuck. tfv_iter has this
 value:
 
 TermsEnum:
 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369
 
 and there is a next() function defined for the TVTermsEnum class, but this
 object doesn't list next() as one of its member functions and an exception
 is raised if it is called. It looks like the object only supports the
 member functions defined for the TermsEnum class, and next() is not one of
 them. Is this the case, or is there a way to have it support all of the
 TVTermsEnum member functions, including next()? TVTermsEnum is a private
 class in CompressingTermVectorsReader.java.
 
 So I am wondering if there is a way to obtain term vectors in this way and
 that I am just not treating doc_1_tfv and tfv_iter in the right way, or if
 there is a different, better way to get term vectors for documents in a
 PyLucene index, or if this isn't something that Lucene should be used for.
 Thank you very much for any help you can provide.
 Mike


Re: Getting term vectors/computing cosine similarity

2014-05-28 Thread Andi Vajda

 On May 27, 2014, at 21:03, Michael O'Leary mich...@moz.com wrote:
 
 Hi Andi,
 Thanks for the help. I just tried to import TVTermsEnum so I could try
 casting my iter, and I don't see how to do it since TVTermsEnum is a
 private class with fully qualified
 name 
 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.

If it's a private class then you have no access to it and no Python wrapper 
class was generated for it. Try going up the superclass chain until you hit a 
public class or interface, probably TermsEnum.

Andi..

 I tried
 
 from org.apache.lucene.codecs.compressing import
 CompressingTermVectorsReader$TVTermsEnum
 from org.apache.lucene.codecs.compressing import TVTermsEnum
 and
 import org.apache.lucene.codecs.compressing
 
 but none of them provided access to TVTermsEnum (the first two raised
 exceptions). After running import org.apache.lucene.codecs.compressing, I
 could do dir(org.apache.lucene.codecs.compressing) and see the contents of
 that module. CompressingTermVectorsReader was listed, but TVTermsEnum
 wasn't. TVTermsEnum also wasn't listed in the output of
 dir(org.apache.lucene.codecs.compressing.CompressingTermVectorsReader). So
 it looks like my first problem is how to get access to TVTermsEnum.
 Mike
 
 


Re: Getting term vectors/computing cosine similarity

2014-05-28 Thread Aric Coady
On May 28, 2014, at 12:03 AM, Michael O'Leary mich...@moz.com wrote:
 Hi Andi,
 Thanks for the help. I just tried to import TVTermsEnum so I could try
 casting my iter, and I don't see how to do it since TVTermsEnum is a
 private class with fully qualified
 name 
 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.
 I tried

Cast the TermsEnum object with BytesRefIterator.cast_. Then it will have a 
next() method, and be Python-iterable.

Here’s an example that outputs the term vectors as a generator.  Look at the 
vector method just above:
https://pythonhosted.org/lupyne/_modules/lupyne/engine/indexers.html#IndexReader.termvector
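A hedged sketch of such a generator, modeled loosely on the linked lupyne method (the function name is mine; the Lucene calls are from the 4.8 API; the import is lazy so the module loads without a JVM):

```python
def termvector(reader, doc_id, field):
    """Yield (term, count) pairs from one document's term vector (PyLucene 4.x sketch)."""
    from org.apache.lucene.util import BytesRefIterator

    terms = reader.getTermVector(doc_id, field)
    if terms is None:
        return
    terms_enum = terms.iterator(None)
    # Casting to the public BytesRefIterator interface makes the enum
    # directly iterable from Python, as Andi and Aric describe above.
    for bytesref in BytesRefIterator.cast_(terms_enum):
        yield bytesref.utf8ToString(), int(terms_enum.totalTermFreq())
```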


[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 166 - Still Failing

2014-05-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/166/

No tests ran.

Build Log:
[...truncated 53313 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
 [copy] Copying 431 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
 [copy] Copying 239 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
 [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
 [exec] NOTE: output encoding is US-ASCII
 [exec] 
 [exec] Load release URL 
file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/...
 [exec] 
 [exec] Test Lucene...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.01 sec (12.9 MB/sec)
 [exec]   check changes HTML...
 [exec]   download lucene-4.9.0-src.tgz...
 [exec] 27.5 MB in 0.04 sec (670.3 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.9.0.tgz...
 [exec] 61.4 MB in 0.09 sec (665.1 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.9.0.zip...
 [exec] 71.1 MB in 0.14 sec (503.1 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   unpack lucene-4.9.0.tgz...
 [exec] verify JAR metadata/identity/no javax.* or java.* classes...
 [exec] test demo with 1.7...
 [exec]   got 5711 hits for query lucene
 [exec] check Lucene's javadoc JAR
 [exec]   unpack lucene-4.9.0.zip...
 [exec] verify JAR metadata/identity/no javax.* or java.* classes...
 [exec] test demo with 1.7...
 [exec]   got 5711 hits for query lucene
 [exec] check Lucene's javadoc JAR
 [exec]   unpack lucene-4.9.0-src.tgz...
 [exec] make sure no JARs/WARs in src dist...
 [exec] run ant validate
 [exec] run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket 
 -Dtests.disableHdfs=true'...
 [exec] test demo with 1.7...
 [exec]   got 249 hits for query lucene
 [exec] generate javadocs w/ Java 7...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [exec] 
 [exec] Test Solr...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.02 sec (6.3 MB/sec)
 [exec] Traceback (most recent call last):
 [exec]   check changes HTML...
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1347, in <module>
 [exec] main()
 [exec]   File 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py,
 line 1291, in main
 [exec] smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, 
testArgs)
 [exec]   File 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py,
 line 1333, in smokeTest
 [exec] checkSigs('solr', solrPath, version, tmpDir, isSigned)
 [exec]   File 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py,
 line 410, in checkSigs
 [exec] testChanges(project, version, changesURL)
 [exec]   File 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py,
 line 458, in testChanges
 [exec] checkChangesContent(s, version, changesURL, project, True)
 [exec]   File 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py,
 line 485, in checkChangesContent
 [exec] raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' 
% (name, m.group(1)))
 [exec] RuntimeError: incorrect issue (_ instead of -) in 
file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr/changes/Changes.html:
 SOLR_3671

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:387:
 exec returned: 1

Total time: 54 minutes 7 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010881#comment-14010881
 ] 

Alexander S. commented on SOLR-5463:


Inability to use this without sorting by a unique key (e.g. id) makes this 
feature useless. The same could be achieved previously by sorting by id and 
searching for docs where id is greater/less than the last one received. See how 
cursors work in MongoDB; that's the right direction.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
 at the lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial ground work here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
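The Basic Usage loop above can be sketched with a pluggable fetch callable (a hypothetical helper, not part of Solr; a real client would issue one HTTP GET to /select per iteration and decode the JSON response):

```python
def cursor_pages(fetch, params):
    """Iterate all pages of a Solr cursorMark query.

    `fetch` is any callable taking a params dict and returning the decoded
    JSON response; it stands in for a real HTTP request to /select.
    """
    params = dict(params, start=0, cursorMark="*")  # start must be 0
    while True:
        response = fetch(params)
        yield response["response"]["docs"]
        next_mark = response["nextCursorMark"]
        if next_mark == params["cursorMark"]:  # cursor stopped changing: done
            return
        params = dict(params, cursorMark=next_mark)

# Fake fetch simulating two pages of docs, then an unchanged cursor.
pages = {"*": ([1, 2], "AAA"), "AAA": ([3], "BBB"), "BBB": ([], "BBB")}
fake = lambda p: {"response": {"docs": pages[p["cursorMark"]][0]},
                  "nextCursorMark": pages[p["cursorMark"]][1]}
print([d for page in cursor_pages(fake, {"q": "*:*", "sort": "id asc", "rows": 2})
       for d in page])  # [1, 2, 3]
```

The sort passed in is assumed to already include the uniqueKey tie-breaker, as required above.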



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010883#comment-14010883
 ] 

Alexander S. commented on SOLR-5463:


http://docs.mongodb.org/manual/core/cursors/




[jira] [Resolved] (LUCENE-5701) Move core closed listeners to AtomicReader

2014-05-28 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-5701.
--

Resolution: Fixed

 Move core closed listeners to AtomicReader
 --

 Key: LUCENE-5701
 URL: https://issues.apache.org/jira/browse/LUCENE-5701
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5701.patch, LUCENE-5701.patch


 Core listeners are very helpful when managing per-segment caches (filters, 
 uninverted doc values, etc.) yet this API is only exposed on 
 {{SegmentReader}}. If you want to use it today, you need to do instanceof 
 checks, try to unwrap in case of a FilterAtomicReader and finally fall back 
 to a reader closed listener if every other attempt to get the underlying 
 SegmentReader failed.






[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010888#comment-14010888
 ] 

Alexander S. commented on SOLR-5463:


Sorry for spamming, but I can't edit my previous message. I just found that 
Mongo cursors also aren't isolated and can return duplicates; I thought they 
were. But sorting docs by id is not acceptable in 99% of use cases, especially 
in Solr, where results are more often expected to be sorted by relevance.




[jira] [Commented] (SOLR-3671) DIH doesn't use its own interface + writerImpl has no information about the request

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010900#comment-14010900
 ] 

ASF subversion and git services commented on SOLR-3671:
---

Commit 1597936 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1597936 ]

SOLR-3671: fix ongoing smoke test build failure

 DIH doesn't use its own interface + writerImpl has no information about the 
 request
 ---

 Key: SOLR-3671
 URL: https://issues.apache.org/jira/browse/SOLR-3671
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.0-ALPHA, 4.0-BETA
Reporter: Roman Chyla
Assignee: James Dyer
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-3671.patch, SOLR-3671.patch


 The use case: I would like to extend DIH by providing a new writer. I have 
 tried everything but can't accomplish it without either a) duplicating the 
 whole DIHandler or b) Java reflection tricks. Almost everything inside DIH is 
 private, and the mechanism to instantiate a new writer based on 'writerImpl' 
 seems to lack important functionality:
 it doesn't give the new class a chance to get information about the request 
 or the update processor. Also, the writer is instantiated twice (when 
 'writerImpl' is there), which is really unnecessary.
 As a solution, the existing DIHandler.getSolrWriter() should instantiate the 
 appropriate writer and send it to DocBuilder (it already does that for 
 SolrWriter), and DocBuilder wouldn't need to create a second (duplicate) writer.






[jira] [Commented] (SOLR-3671) DIH doesn't use its own interface + writerImpl has no information about the request

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010901#comment-14010901
 ] 

ASF subversion and git services commented on SOLR-3671:
---

Commit 1597937 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1597937 ]

SOLR-3671: fix ongoing smoke test build failure







[jira] [Closed] (LUCENE-4121) Standardize ramBytesUsed/sizeInBytes/memSize

2014-05-28 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand closed LUCENE-4121.


   Resolution: Duplicate
Fix Version/s: (was: 4.9)
   (was: 5.0)

 Standardize ramBytesUsed/sizeInBytes/memSize
 

 Key: LUCENE-4121
 URL: https://issues.apache.org/jira/browse/LUCENE-4121
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4121.patch


 We should standardize the names of the methods we use to estimate the sizes 
 of objects in memory and on disk. (cf. discussion on dev@lucene 
 http://search-lucene.com/m/VbXSx1BP60G).






[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-28 Thread Eyal Zaidman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010943#comment-14010943
 ] 

Eyal Zaidman commented on SOLR-6113:


I like the idea of removing just the field name, because I agree it gives a 
better result when I look at it from the permission use case scenario - you get 
to search where you're allowed, and there's a chance your data is in the 
default search field.
A literal search with the field name in that scenario would likely suffer from 
the original issue - there is very little chance the field name exists, as the 
user did not intend to look for it.

I also agree that it's important to provide feedback that search results have 
been changed, but it seems to me that if the search client is adding a uf 
restriction, it should be the client's responsibility to inform the end user 
about that restriction? At least, I can't come up with a good way for Solr 
to do that.

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Liram Vardi
Priority: Minor
  Labels: edismax

 It seems that the Edismax User Fields feature does not behave as expected.
 For instance, assuming the following query:
 _q=id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_
 The parsed query (taken from the query debug info) is:
 _+((id:b* (text:user) (text:anna collins))~1)_
 I expect that because "user" was filtered out in uf (User Fields), the 
 parsed query should not contain the user search part.
 In other words, the parsed query should simply look like this: _+id:b*_
 This issue is affected by the patch on issue SOLR-2649: when changing the 
 default OP of Edismax to AND, the query results change.






[jira] [Updated] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed

2014-05-28 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5700:
-

Attachment: LUCENE-5700.patch

Here is a patch:
 - new oal.util.Accountable interface with a ramBytesUsed() method
 - Classes that had a ramBytesUsed/sizeInBytes method to estimate memory usage 
now implement this interface
 - Classes that had a sizeInBytes method to compute disk usage remained as-is.

I think the tough question is what to do in case memory usage cannot be 
computed. Returning -1 would work but we would need to make sure all consumers 
of this API handle that case properly... Since we don't have this issue now 
(all classes that implement the interface know how to do it), the documentation 
specifies that negative values are unsupported. Maybe we'll need to revisit it 
in the future in case the problem arises but for now I think that is the 
simplest option?
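As a rough illustration of what the patch describes, the interface could look like the following sketch (names follow the discussion; the actual LUCENE-5700 patch may differ in naming and javadocs):

```java
// Hypothetical sketch of the proposed interface, not the actual patch.
interface Accountable {
    /**
     * Estimated memory usage of this object in bytes. Per the discussion,
     * negative values are unsupported: a class that cannot compute its
     * usage should not implement this interface.
     */
    long ramBytesUsed();
}

// Example implementer: a trivial wrapper that knows its own footprint.
class ByteBlock implements Accountable {
    private final byte[] block;

    ByteBlock(int size) {
        this.block = new byte[size];
    }

    @Override
    public long ramBytesUsed() {
        return block.length; // ignores object header overhead for simplicity
    }
}
```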

 Add 'accountable' interface for various ramBytesUsed
 

 Key: LUCENE-5700
 URL: https://issues.apache.org/jira/browse/LUCENE-5700
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5700.patch


 Currently this is a disaster: there is ramBytesUsed(), sizeInBytes(), etc. 
 everywhere, with zero consistency, little javadoc, and no structure. For 
 example, look at LUCENE-5695, where we go back and forth on how to handle 
 "don't know". 
 I don't think we should add any more of these methods to any classes in 
 Lucene until this has been cleaned up.






[jira] [Commented] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed

2014-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010984#comment-14010984
 ] 

Robert Muir commented on LUCENE-5700:
-

{quote}
Since we don't have this issue now (all classes that implement the interface 
know how to do it)
{quote}

Yeah, I don't think a class should implement the interface if it can't actually 
return a valid result.







[jira] [Commented] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed

2014-05-28 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010994#comment-14010994
 ] 

Dawid Weiss commented on LUCENE-5700:
-

I'm for throwing an exception. Either a class knows how to handle it or 
shouldn't implement it (throw UnsupportedOperationException).







[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2014-05-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4556:


Assignee: Michael McCandless  (was: Simon Willnauer)

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Critical
 Fix For: 4.9, 5.0

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch


 I ran into this problem in production using the DirectSpellchecker. The 
 number of objects created by the spellchecker shoots through the roof very 
 quickly. We ran about 130 queries and ended up with > 2M transitions / 
 states. We spend 50% of the time in GC just because of transitions. Other 
 parts of the system behave just fine here.
 I talked quickly to Robert and gave a POC a shot, providing a 
 LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case 
 and build an array-based structure converted into UTF-8 directly instead of 
 going through the object-based APIs. This involved quite a bit of changes, but 
 they are all package-private at this point. I have a patch that still has a 
 fair set of nocommits, but it shows that this is possible and IMO worth the 
 trouble to make this really usable in production. All tests pass with the 
 patch - it's a start.






[jira] [Created] (LUCENE-5709) NorwegianPhoneticFilter

2014-05-28 Thread JIRA
Jan Høydahl created LUCENE-5709:
---

 Summary: NorwegianPhoneticFilter
 Key: LUCENE-5709
 URL: https://issues.apache.org/jira/browse/LUCENE-5709
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Jan Høydahl
Priority: Minor
 Fix For: 4.9


There has been a steady demand for a Norwegian phonetic normalization filter.






[jira] [Commented] (LUCENE-5709) NorwegianPhoneticFilter

2014-05-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011002#comment-14011002
 ] 

Jan Høydahl commented on LUCENE-5709:
-

One candidate could be https://github.com/kvalle/norphoname







Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_05) - Build # 10405 - Still Failing!

2014-05-28 Thread Dawid Weiss
 java.lang.NullPointerException
 at 
 __randomizedtesting.SeedInfo.seed([6330FD6A7351D588:22BBDD0F54EF26C7]:0)
 at 
 org.apache.solr.SolrTestCaseJ4.recurseDelete(SolrTestCaseJ4.java:1018)

This looks like a JVM problem too?

1017:if (f.isDirectory()) {
1018:  for (File sub : f.listFiles()) {
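The NPE at line 1018 is consistent with File.listFiles() returning null, which it does on an I/O error or when the path is not a directory. A null-safe sketch of the cleanup (hypothetical; the actual commit may differ) could look like:

```java
import java.io.File;

class DeleteUtil {
    // Hypothetical null-safe version of the failing code above:
    // File.listFiles() returns null on an I/O error (or if the path
    // is not a directory), which matches the NPE in the stack trace.
    static void recurseDelete(File f) {
        if (f.isDirectory()) {
            File[] subs = f.listFiles();
            if (subs != null) { // guard against the null return
                for (File sub : subs) {
                    recurseDelete(sub);
                }
            }
        }
        f.delete();
    }
}
```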

Anyway, cleaned it up a bit.

Dawid




[jira] [Commented] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2014-05-28 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011014#comment-14011014
 ] 

Nik Everett commented on LUCENE-4556:
-

I'm having GC trouble and I'm using the DirectCandidateGenerator. It's 
obviously kind of hard to tell how much the automata are contributing in 
production, but when I try it locally, just generating the automata for two or 
three terms takes about 200KB of memory. Napkin math (200KB * 
250 queries/second) says this makes about 50MB of garbage per second per index. 
Obviously it gets worse if you run this in a sharded context where each shard 
does the generating. Well, not really worse, but the large up-front cost and 
memory consumption of this process is relatively static based on shard size, so 
this becomes a reason to use larger shards.

I would propose that, in addition to Simon's patches, another option is to 
try to implement something like the stack-based automaton simulation that the 
Schulz & Mihov paper (the one that proposed the Levenshtein automaton) describes in 
section 6. It's not useful for stuff like intersecting the enums, but if you are 
willing to forgo that you could probably get away with much less memory 
consumption.
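For reference, the napkin math works out as stated (treating 200KB as 200,000 bytes):

```java
// Checking the arithmetic above: ~200KB of automaton garbage per query
// at 250 queries/second comes to ~50MB of garbage per second per index.
class NapkinMath {
    static long garbagePerSecondBytes(long bytesPerQuery, long queriesPerSecond) {
        return bytesPerQuery * queriesPerSecond;
    }
}
// 200_000 bytes * 250 qps = 50_000_000 bytes/s, i.e. ~50MB/s.
```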







[jira] [Commented] (LUCENE-5709) NorwegianPhoneticFilter

2014-05-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011018#comment-14011018
 ] 

Jan Høydahl commented on LUCENE-5709:
-

Question: Do any of you have experience with Swedish / Danish phonetic 
normalization? Right now I'm a bit sceptical about trying to mash the three 
languages into one filter, but if they end up sharing most of the rules, it 
could be an option to make a Scandinavian filter parameterized with 
mode=no|se|dk|scandinavian. There are many use cases spanning content from the 
whole region which could benefit from a unified solution.







[jira] [Updated] (LUCENE-5680) Allow updating multiple DocValues fields atomically

2014-05-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5680:
---

Attachment: LUCENE-5680.patch

Patch ports all tests to use the atomic updates. This removes the complexity in 
TestIWExceptions that we added recently, since now each set of updates is 
either atomically applied or not.

 Allow updating multiple DocValues fields atomically
 ---

 Key: LUCENE-5680
 URL: https://issues.apache.org/jira/browse/LUCENE-5680
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch, 
 LUCENE-5680.patch


 This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) 
 -- it would be good if we can allow updating several doc-values fields, 
 atomically. It will also improve/simplify our tests, where today we index two 
 fields, e.g. the field itself and a control field. In some multi-threaded 
 tests, since we cannot be sure which updates came through first, we limit the 
 test such that each thread updates a different set of fields, otherwise they 
 will collide and it will be hard to verify the index in the end.
 I was working on a patch and it looks pretty simple to do, will post a patch 
 shortly.






[jira] [Commented] (SOLR-5285) Solr response format should support child Docs

2014-05-28 Thread Arcadius Ahouansou (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011030#comment-14011030
 ] 

Arcadius Ahouansou commented on SOLR-5285:
--

Hi [~varunthacker],
I tried {code}facet=true&facet.field=content_type{code} 
The facet count for children was always 0.
Is this a feature?
Thanks.

 Solr response format should support child Docs
 --

 Key: SOLR-5285
 URL: https://issues.apache.org/jira/browse/SOLR-5285
 Project: Solr
  Issue Type: New Feature
Reporter: Varun Thacker
 Fix For: 4.9, 5.0

 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 javabin_backcompat_child_docs.bin


 Solr has added support for taking childDocs as input (only XML till now). 
 It's currently used for BlockJoinQuery. 
 I feel that if a user indexes a document with child docs, even if he isn't 
 using the BJQ features and is just searching, which results in a hit on the 
 parentDoc, its childDocs should be returned in the response format.
 [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would 
 be the place to add childDocs to the response.
 Now given a docId one needs to find out all the childDoc ids. A couple of 
 approaches which I could think of are:
 1. Maintain the relation between a parentDoc and its childDocs during 
 indexing time, maybe in a separate index?
 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - given a 
 parentDoc it finds out all the childDocs, but this requires a childScorer.
 Am I missing something obvious on how to find the relation between a 
 parentDoc and its childDocs? None of the above solutions for this 
 look right.






[jira] [Updated] (SOLR-6088) Add query re-ranking with the ReRankingQParserPlugin

2014-05-28 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6088:
-

Attachment: SOLR-6088.patch

New patch supports the sort parameter for the main query and preserves query 
elevation (QueryElevationComponent).

 Add query re-ranking with the ReRankingQParserPlugin
 

 Key: SOLR-6088
 URL: https://issues.apache.org/jira/browse/SOLR-6088
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
 Attachments: SOLR-6088.patch, SOLR-6088.patch, SOLR-6088.patch, 
 SOLR-6088.patch


 This ticket introduces the ReRankingQParserPlugin, which adds query 
 reranking/rescoring for Solr. It leverages the new RankQuery framework to 
 plug in the new Lucene QueryRescorer.
 See ticket LUCENE-5489 for details on the use case.
 Sample syntax:
 {code}
 q={!rerank mainQuery=$qq reRankQuery=$rqq reRankDocs=200}
 {code}
 In the example above the mainQuery is executed and 200 docs are collected and 
 re-ranked based on the results of the reRankQuery. 






[jira] [Created] (SOLR-6115) Cleanup enum/string action types in Overseer, OverseerCollectionProcessor and CollectionHandler

2014-05-28 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-6115:
---

 Summary: Cleanup enum/string action types in Overseer, 
OverseerCollectionProcessor and CollectionHandler
 Key: SOLR-6115
 URL: https://issues.apache.org/jira/browse/SOLR-6115
 Project: Solr
  Issue Type: Task
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.9, 5.0


The enum/string handling for actions in Overseer and OCP is a mess. We should 
fix it.

From: 
https://issues.apache.org/jira/browse/SOLR-5466?focusedCommentId=13918059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13918059
{quote}
I started to untangle the fact that we have all the strings in 
OverseerCollectionProcessor, but also have a nice CollectionAction enum. And 
the commands are intermingled with parameters, it all seems rather confusing. 
Does it make sense to use the enum rather than the strings? Or somehow 
associate the two? Probably something for another JIRA though...
{quote}






[jira] [Created] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException

2014-05-28 Thread Lee Hinman (JIRA)
Lee Hinman created LUCENE-5710:
--

 Summary: DefaultIndexingChain swallows useful information from 
MaxBytesLengthExceededException
 Key: LUCENE-5710
 URL: https://issues.apache.org/jira/browse/LUCENE-5710
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.8.1
Reporter: Lee Hinman
Priority: Minor


In DefaultIndexingChain, when a MaxBytesLengthExceededException is caught, the 
original message is discarded; however, the message contains useful information, 
like the size that exceeded the limit.

Lucene should include this information in the newly thrown 
IllegalArgumentException.






[jira] [Updated] (SOLR-6116) DocRouter.getDocRouter accepts routerName as an Object

2014-05-28 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6116:


Description: 
Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {
{code}

There's really no reason not to accept a routerName as a string.

  was:
Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {
{code}

There's really no reason to accept a routerName as a string.


 DocRouter.getDocRouter accepts routerName as an Object
 --

 Key: SOLR-6116
 URL: https://issues.apache.org/jira/browse/SOLR-6116
 Project: Solr
  Issue Type: Task
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 4.9, 5.0


 Refactor DocRouter.java from:
 {code}
 public static DocRouter getDocRouter(Object routerName) {...}
 {code}
 to:
 {code}
 public static DocRouter getDocRouter(String routerName) {
 {code}
 There's really no reason not to accept a routerName as a string.






[jira] [Created] (SOLR-6116) DocRouter.getDocRouter accepts routerName as an Object

2014-05-28 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-6116:
---

 Summary: DocRouter.getDocRouter accepts routerName as an Object
 Key: SOLR-6116
 URL: https://issues.apache.org/jira/browse/SOLR-6116
 Project: Solr
  Issue Type: Task
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 4.9, 5.0


Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {
{code}

There's really no reason to accept a routerName as a string.






[jira] [Updated] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException

2014-05-28 Thread Lee Hinman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Hinman updated LUCENE-5710:
---

Attachment: LUCENE-5710.patch

Attaching patch that includes the original exception's message in the 
IllegalArgumentException message.
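The shape of the fix can be illustrated with a self-contained sketch (MaxBytesLengthExceededException and addTerm here are stand-ins, not the real Lucene types): rethrow as IllegalArgumentException while keeping the original message, which carries the offending size, and chaining the original exception as the cause.

```java
// Stand-in for the real Lucene exception type, for illustration only.
class MaxBytesLengthExceededException extends RuntimeException {
    MaxBytesLengthExceededException(String msg) { super(msg); }
}

class IndexingSketch {
    static void addTerm(int length, int limit) {
        try {
            if (length > limit) {
                throw new MaxBytesLengthExceededException(
                    "bytes can be at most " + limit + " in length; got " + length);
            }
        } catch (MaxBytesLengthExceededException e) {
            // Instead of discarding e, include its message (which contains
            // the sizes) and set the original exception as the cause.
            throw new IllegalArgumentException(
                "Document contains at least one immense term: " + e.getMessage(), e);
        }
    }
}
```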







[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011085#comment-14011085
 ] 

Simon Willnauer commented on LUCENE-5708:
-

Part of the problem is that we're holding IW state on MergePolicy and 
MergeScheduler. Both classes should get the IW passed to the relevant methods 
so we can share them across as many instances as we want...

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and 
 I think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Created] (LUCENE-5711) Pass IW to MergePolicy

2014-05-28 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-5711:
---

 Summary: Pass IW to MergePolicy
 Key: LUCENE-5711
 URL: https://issues.apache.org/jira/browse/LUCENE-5711
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Simon Willnauer
 Fix For: 4.9, 5.0


Related to LUCENE-5708: we keep state in the MP holding on to the IW, which 
prevents sharing the MP across index writers. Aside from this, we should really 
not keep state in the MP; it should only select merges, without being 
bound to the index writer.






[jira] [Updated] (LUCENE-5711) Pass IW to MergePolicy

2014-05-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-5711:


Attachment: LUCENE-5711.patch

here is a patch







[jira] [Commented] (LUCENE-5711) Pass IW to MergePolicy

2014-05-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011093#comment-14011093
 ] 

Michael McCandless commented on LUCENE-5711:


+1




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException

2014-05-28 Thread Lee Hinman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Hinman updated LUCENE-5710:
---

Attachment: LUCENE-5710.patch

Patch that includes the exception as the cause parameter for 
IllegalArgumentException

 DefaultIndexingChain swallows useful information from 
 MaxBytesLengthExceededException
 -

 Key: LUCENE-5710
 URL: https://issues.apache.org/jira/browse/LUCENE-5710
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.8.1
Reporter: Lee Hinman
Priority: Minor
 Attachments: LUCENE-5710.patch, LUCENE-5710.patch


 In DefaultIndexingChain, when a MaxBytesLengthExceededException is caught, 
 the original message is discarded; however, the message contains useful 
 information, such as the size that exceeded the limit.
 Lucene should include this information in the newly thrown 
 IllegalArgumentException.
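The fix being discussed is just exception-cause chaining: Java's two-argument constructor, new IllegalArgumentException(message, cause). A minimal sketch of the same pattern in Python follows; the function name and the byte limit are illustrative assumptions, not Lucene's actual code:

```python
def check_term_length(term: bytes, max_len: int = 32766) -> None:
    """Reject overlong terms, preserving the low-level error as the cause."""
    try:
        if len(term) > max_len:
            raise OverflowError(
                f"term length {len(term)} exceeds the {max_len} byte limit")
    except OverflowError as exc:
        # Chain the original error so its message (the actual size) survives,
        # analogous to passing it as the IllegalArgumentException cause in Java.
        raise ValueError("document contains at least one immense term") from exc


err = None
try:
    check_term_length(b"x" * 40000)
except ValueError as e:
    err = e

assert err is not None
assert isinstance(err.__cause__, OverflowError)  # cause (and its size info) kept
```

Without the `from exc` (or the `cause` argument in Java), the caller sees only the generic message and the offending size is lost.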



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5711) Pass IW to MergePolicy

2014-05-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011098#comment-14011098
 ] 

Shai Erera commented on LUCENE-5711:


Note that this was changed in LUCENE-1763. At the time there were issues with 
some MPs that didn't handle it well, but I don't remember which. It also 
helped clean up the API, since I think most apps don't share an MP between 
writers. But then again, most people also don't write their own MPs or 
interact with them directly, so the API is less of an issue. If it works and 
allows sharing an MP between writers more easily, let's go for it.

 Pass IW to MergePolicy
 --

 Key: LUCENE-5711
 URL: https://issues.apache.org/jira/browse/LUCENE-5711
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Simon Willnauer
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5711.patch


 Related to LUCENE-5708: we keep state in the MP by holding on to the IW, which 
 prevents sharing the MP across index writers. Aside from this, we should really 
 not keep state in the MP; it should only select merges without being 
 bound to the index writer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011217#comment-14011217
 ] 

Yonik Seeley commented on SOLR-5463:


 bq. But sorting docs by id is not acceptable in 99% of use cases, especially 
in Solr, where it is more expected to get results sorted by relevance.

Only a tiebreak by id is needed, so sort=score desc, id asc is 
fine.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
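The usage steps above boil down to a loop that stops when nextCursorMark stops changing. The sketch below shows only that client-side protocol; fetch_page is a hypothetical stand-in for an HTTP request to Solr (the cursor-mark strings are made up), not a real client call:

```python
def fetch_page(cursor_mark):
    """Hypothetical stand-in for GET /select?...&cursorMark=<mark>.
    Returns (docs, nextCursorMark). The marks below are fabricated."""
    pages = {
        "*": (["doc1", "doc2"], "mark1"),  # "*" is the begin mark
        "mark1": (["doc3"], "mark2"),
        "mark2": ([], "mark2"),            # mark repeats: no more results
    }
    return pages[cursor_mark]


def walk_all(start_mark="*"):
    cursor, collected = start_mark, []
    while True:
        docs, next_mark = fetch_page(cursor)
        collected.extend(docs)
        if next_mark == cursor:  # nextCursorMark stopped changing: done
            return collected
        cursor = next_mark       # feed the returned mark into the next request


assert walk_all() == ["doc1", "doc2", "doc3"]
```

The termination test (returned mark equals the mark you sent) is what makes the cursor safe for concurrent index updates: the client never tracks absolute offsets.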



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-SmokeRelease-4.8 - Build # 4 - Failure

2014-05-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.8/4/

No tests ran.

Build Log:
[...truncated 52992 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease
 [copy] Copying 431 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/lucene
 [copy] Copying 239 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/solr
 [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
 [exec] NOTE: output encoding is US-ASCII
 [exec] 
 [exec] Load release URL 
file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/...
 [exec] 
 [exec] Test Lucene...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.01 sec (12.4 MB/sec)
 [exec]   check changes HTML...
 [exec]   download lucene-4.8.0-src.tgz...
 [exec] 27.5 MB in 0.04 sec (668.5 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.8.0.tgz...
 [exec] 61.3 MB in 0.09 sec (692.9 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.8.0.zip...
 [exec] 70.9 MB in 0.08 sec (934.7 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   unpack lucene-4.8.0.tgz...
 [exec] verify JAR metadata/identity/no javax.* or java.* classes...
 [exec] test demo with 1.7...
 [exec]   got 5687 hits for query lucene
 [exec] check Lucene's javadoc JAR
 [exec]   unpack lucene-4.8.0.zip...
 [exec] verify JAR metadata/identity/no javax.* or java.* classes...
 [exec] test demo with 1.7...
 [exec]   got 5687 hits for query lucene
 [exec] check Lucene's javadoc JAR
 [exec]   unpack lucene-4.8.0-src.tgz...
 [exec] make sure no JARs/WARs in src dist...
 [exec] run ant validate
 [exec] run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket 
 -Dtests.disableHdfs=true'...
 [exec] test demo with 1.7...
 [exec]   got 249 hits for query lucene
 [exec] generate javadocs w/ Java 7...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [exec] 
 [exec] Test Solr...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.00 sec (77.6 MB/sec)
 [exec]   check changes HTML...
 [exec] Traceback (most recent call last):
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1347, in <module>
 [exec]     main()
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1291, in main
 [exec]     smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs)
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1333, in smokeTest
 [exec]     checkSigs('solr', solrPath, version, tmpDir, isSigned)
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 410, in checkSigs
 [exec]     testChanges(project, version, changesURL)
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 458, in testChanges
 [exec]     checkChangesContent(s, version, changesURL, project, True)
 [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 485, in checkChangesContent
 [exec]     raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' % (name, m.group(1)))
 [exec] RuntimeError: incorrect issue (_ instead of -) in 
file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/solr/changes/Changes.html:
 SOLR_6029

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/build.xml:387:
 exec returned: 1

Total time: 53 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure
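The RuntimeError in the build log above is a changelog-format guard: issue keys in Changes.html must use a hyphen (SOLR-6029), not an underscore (SOLR_6029). A minimal sketch of that kind of check follows; the regex and function name are assumptions for illustration, not the actual smokeTestRelease.py code:

```python
import re


def find_bad_issue_keys(html: str):
    """Return issue keys written with '_' instead of '-', e.g. SOLR_6029."""
    return re.findall(r"\b(?:SOLR|LUCENE)_\d+\b", html)


# Correctly hyphenated keys pass; the underscore variant is flagged.
assert find_bad_issue_keys("Fixed in SOLR-6028 and SOLR_6029") == ["SOLR_6029"]
assert find_bad_issue_keys("LUCENE-5711 is fine") == []
```

Catching this at smoke-test time matters because the underscore variant breaks the generated JIRA links in the published changelog.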



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011226#comment-14011226
 ] 

Alexander S. commented on SOLR-5463:


Oh, that's awesome, thanks for the tip.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011244#comment-14011244
 ] 

David Smiley commented on SOLR-5463:


I think Solr could be more user-friendly here by auto-adding the , id asc if 
it's not there.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly

2014-05-28 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6021:


Attachment: SOLR-6021.patch

Changes
# Added a getter for coreConfigService in CoreContainer
# Overseer reads config/schema to get the unique key field name if router is 
not implicit and router.field is not specified.
# Added a test in TestCollection API

I initially wanted to do this in OverseerCollectionProcessor but then you can 
skip that completely if you're creating collections through the core admin API.

SolrJ doesn't need any changes because if a router.field is configured, the 
idField and its value are not used at all.

 Always persist router.field in cluster state so CloudSolrServer can route 
 documents correctly
 -

 Key: SOLR-6021
 URL: https://issues.apache.org/jira/browse/SOLR-6021
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
 Attachments: SOLR-6021.patch


 CloudSolrServer has idField set to id, which is used for hashing and 
 distributing documents. There is a setter to change it as well.
 IMO, we should use the correct uniqueKey automatically. I propose that we 
 start storing router.field always in cluster state and set it to the 
 uniqueKey field name by default. Then CloudSolrServer would not need to 
 assume an id field by default.
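The proposal above is that clients route by whatever field the cluster state names, falling back to the collection's uniqueKey. A hedged sketch of that client-side selection logic follows; the dict shapes and the CRC32 hash are illustrative assumptions, not SolrJ's actual compositeId router:

```python
import zlib


def route_doc(doc, cluster_state, num_shards):
    """Pick a shard for a document using router.field from the cluster state,
    falling back to the collection's uniqueKey field (per the proposal above)."""
    field = cluster_state.get("router.field") or cluster_state["uniqueKey"]
    key = str(doc[field]).encode()
    return zlib.crc32(key) % num_shards  # illustrative hash, not Solr's


doc = {"id": "doc-42", "user": "alice"}

# No router.field persisted: fall back to the uniqueKey ("id").
shard_default = route_doc(doc, {"uniqueKey": "id"}, 4)

# router.field persisted in cluster state: route by that field instead.
shard_by_user = route_doc(doc, {"uniqueKey": "id", "router.field": "user"}, 4)

assert 0 <= shard_default < 4 and 0 <= shard_by_user < 4
```

The point of persisting router.field is exactly this fallback chain: the client never has to guess that the key field is literally named id.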



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.

2014-05-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011291#comment-14011291
 ] 

David Smiley commented on SOLR-4502:


This is annoying; it's bitten me twice now. I think if this is tackled, we 
should step back a bit and think about how to make it easier to create an 
EmbeddedSolrServer considering that an ESS is limited to a single SolrCore.  In 
light of that, the user shouldn't have to be bothered with a CoreContainer, and 
I argue not with a SolrConfig object either.  They should be able to point to a 
SolrCore's instance directory, _and that's it_.  Anything else is superfluous 
ceremony.

FYI I'm creating my CoreContainer like this, before I then do all the other 
stuff:
{code:java}
public static SolrServer createEmbeddedSolr(final String instanceDir) throws Exception {
    final String coreName = new File(instanceDir).getName();
    final String dataDir = instanceDir + "/../../cores_data/" + coreName; // or use null for default
    // note: this is more complex than it should be. See SOLR-4502
    SolrResourceLoader resourceLoader = new SolrResourceLoader(instanceDir);
    CoreContainer container = new CoreContainer(resourceLoader,
        ConfigSolr.fromString(resourceLoader, "<solr />"));
    container.load();
    CoreDescriptor descriptor = new CoreDescriptor(container, coreName, instanceDir);
    SolrConfig config = new SolrConfig(instanceDir, descriptor.getConfigName(), null);
    SolrCore core = new SolrCore(coreName, dataDir, config, null, descriptor);
    container.register(core, false);
    return new EmbeddedSolrServer(container, core.getName());
}
{code}

 ShardHandlerFactory not initialized in CoreContainer when creating a Core 
 manually.
 ---

 Key: SOLR-4502
 URL: https://issues.apache.org/jira/browse/SOLR-4502
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
Reporter: Michael Aspetsberger
Assignee: Mark Miller
  Labels: NPE
 Fix For: 4.9, 5.0


 We are using an embedded solr server for our unit testing purposes. In our 
 scenario, we create a {{CoreContainer}} using only the solr-home path, and 
 then create the cores manually using a {{CoreDescriptor}}.
 While the creation appears to work fine, it hits an NPE when it handles the 
 search:
 {quote}
 Caused by: java.lang.NullPointerException
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
   at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
 {quote}
 According to 
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E
  , this is due to a missing {{CoreContainer#load}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations

2014-05-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011302#comment-14011302
 ] 

Noble Paul commented on SOLR-6091:
--

I have fixed this and added logging whenever the race condition occurs, so the 
logs confirm that it does happen. But after a few (10+) restarts I could still 
end up without an Overseer. SOLR-6095 is another issue I have identified and 
fixed for local testing.

 Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
 ---

 Key: SOLR-6091
 URL: https://issues.apache.org/jira/browse/SOLR-6091
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7, 4.8
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
 Fix For: 4.9, 5.0

 Attachments: SOLR-6091.patch


 When using the overseer roles feature, there is a possibility of more than 
 one thread executing the prioritizeOverseerNodes method and extra QUIT 
 commands being inserted into the overseer queue.
 At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a 
 race condition.
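The minimal fix suggested above, making prioritizeOverseerNodes mutually exclusive, can be sketched as a lock plus an in-progress flag so that concurrent callers cannot each enqueue a QUIT. This is an illustrative Python sketch of the synchronization pattern only, not Solr's actual Overseer code:

```python
import threading


class OverseerQueue:
    """Toy model: many threads race to prioritize; only one QUIT may be queued."""

    def __init__(self):
        self._lock = threading.Lock()
        self._prioritizing = False
        self.quit_commands = 0

    def prioritize_overseer_nodes(self):
        with self._lock:              # the "synchronized" part of the fix
            if self._prioritizing:    # another thread already did the work
                return
            self._prioritizing = True
            self.quit_commands += 1   # enqueue exactly one QUIT


q = OverseerQueue()
threads = [threading.Thread(target=q.prioritize_overseer_nodes) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert q.quit_commands == 1  # without the lock+flag, this could exceed 1
```

Checking the flag inside the lock is what closes the window; checking it before acquiring the lock would reintroduce the race.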



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations

2014-05-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011302#comment-14011302
 ] 

Noble Paul edited comment on SOLR-6091 at 5/28/14 4:50 PM:
---

I have fixed this and added logging whenever the race condition occurs, so the 
logs confirm that it does happen. But after a few (10+) restarts I could still 
end up without an Overseer. SOLR-6095 is another issue I have identified and 
fixed for local testing. These two issues together resolve the problem.


was (Author: noble.paul):
I have fixed this and added logging whenever the race condition occurs, so the 
logs confirm that it does happen. But after a few (10+) restarts I could still 
end up without an Overseer. SOLR-6095 is another issue I have identified and 
fixed for local testing.

 Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
 ---

 Key: SOLR-6091
 URL: https://issues.apache.org/jira/browse/SOLR-6091
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7, 4.8
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
 Fix For: 4.9, 5.0

 Attachments: SOLR-6091.patch


 When using the overseer roles feature, there is a possibility of more than 
 one thread executing the prioritizeOverseerNodes method and extra QUIT 
 commands being inserted into the overseer queue.
 At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a 
 race condition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations

2014-05-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011302#comment-14011302
 ] 

Noble Paul edited comment on SOLR-6091 at 5/28/14 4:53 PM:
---

I have fixed this and added logging whenever the race condition occurs, so the 
logs confirm that it does happen. But after a few (10+) restarts I could still 
end up without an Overseer. SOLR-6095 is another issue I have identified and 
fixed for local testing. These two issues together resolve the problem. I have 
patches for both and will post them once some tests are added.


was (Author: noble.paul):
I have fixed this and added logging whenever the race condition occurs, so the 
logs confirm that it does happen. But after a few (10+) restarts I could still 
end up without an Overseer. SOLR-6095 is another issue I have identified and 
fixed for local testing. These two issues together resolve the problem.

 Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
 ---

 Key: SOLR-6091
 URL: https://issues.apache.org/jira/browse/SOLR-6091
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7, 4.8
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
 Fix For: 4.9, 5.0

 Attachments: SOLR-6091.patch


 When using the overseer roles feature, there is a possibility of more than 
 one thread executing the prioritizeOverseerNodes method and extra QUIT 
 commands being inserted into the overseer queue.
 At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a 
 race condition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6117) Replication command=fetchindex always returns success.

2014-05-28 Thread Raintung Li (JIRA)
Raintung Li created SOLR-6117:
-

 Summary: Replication command=fetchindex always returns success.
 Key: SOLR-6117
 URL: https://issues.apache.org/jira/browse/SOLR-6117
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 4.6
Reporter: Raintung Li


The Replication API command=fetchindex does fetch the index, but when an error 
occurs it still gives a success response. 
The API should return the correct status, especially when the WAIT parameter is 
true (synchronous).
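The requested behavior, propagating the real outcome instead of a blanket success, is a small wrapper pattern. A hedged sketch follows; do_fetch and the response dict shape are illustrative assumptions, not Solr's actual ReplicationHandler API:

```python
def fetch_index_response(do_fetch, wait=True):
    """Run the fetch and report its real outcome instead of always 'OK'."""
    if not wait:
        # Asynchronous mode: the fetch runs in the background, so its outcome
        # is genuinely unknown at response time.
        return {"status": "OK"}
    try:
        do_fetch()
        return {"status": "OK"}
    except Exception as exc:
        # Synchronous mode (WAIT=true): surface the failure to the caller.
        return {"status": "ERROR", "message": str(exc)}


def failing_fetch():
    raise IOError("master unreachable")


assert fetch_index_response(failing_fetch)["status"] == "ERROR"
assert fetch_index_response(lambda: None)["status"] == "OK"
```

The wait distinction mirrors the issue's point: only the synchronous path can honestly report a failure status.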





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.

2014-05-28 Thread Raintung Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raintung Li updated SOLR-6117:
--

Attachment: SOLR-6117.txt

 Replication command=fetchindex always returns success.
 -

 Key: SOLR-6117
 URL: https://issues.apache.org/jira/browse/SOLR-6117
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 4.6
Reporter: Raintung Li
 Attachments: SOLR-6117.txt


 The Replication API command=fetchindex does fetch the index, but when an error 
 occurs it still gives a success response. 
 The API should return the correct status, especially when the WAIT parameter is 
 true (synchronous).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.

2014-05-28 Thread Raintung Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raintung Li updated SOLR-6117:
--

Attachment: SOLR-6117.txt

 Replication command=fetchindex always returns success.
 -

 Key: SOLR-6117
 URL: https://issues.apache.org/jira/browse/SOLR-6117
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 4.6
Reporter: Raintung Li
 Attachments: SOLR-6117.txt


 The Replication API command=fetchindex does fetch the index, but when an error 
 occurs it still gives a success response. 
 The API should return the correct status, especially when the WAIT parameter is 
 true (synchronous).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.

2014-05-28 Thread Raintung Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raintung Li updated SOLR-6117:
--

Attachment: (was: SOLR-6117.txt)

 Replication command=fetchindex always returns success.
 -

 Key: SOLR-6117
 URL: https://issues.apache.org/jira/browse/SOLR-6117
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 4.6
Reporter: Raintung Li
 Attachments: SOLR-6117.txt


 The Replication API command=fetchindex does fetch the index, but when an error 
 occurs it still gives a success response. 
 The API should return the correct status, especially when the WAIT parameter is 
 true (synchronous).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011384#comment-14011384
 ] 

Hoss Man commented on SOLR-5463:


bq. I think Solr could be more user-friendly here by auto-adding the , id asc 
if it's not there.

The reason the code currently throws an error is that I figured it was better 
to force the user to choose which tie breaker they wanted (asc vs desc) than 
to just magically pick one arbitrarily.

If folks think a magic default is better, I've got no serious objections -- 
just open a new issue.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
 at the lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the twitter API and 
 its since_id param) except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial ground work here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
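The fetch loop described in the Basic Usage panel can be sketched in a few lines of Python. This is a toy in-memory stand-in for the /select endpoint: make_fake_solr and fetch_all are illustrative names, not Solr APIs, and a real nextCursorMark is an opaque base64 token rather than the last id seen, but the client-side contract (resend the mark, stop when it stops changing) is the same.

```python
def make_fake_solr(doc_ids, page_size):
    """Stand-in for a Solr core sorted by uniqueKey asc. The cursor
    mark here is simply the last id on the page; real Solr returns
    an opaque token, but honors the same contract."""
    ordered = sorted(doc_ids)

    def query(cursor_mark):
        start = 0 if cursor_mark == "*" else ordered.index(cursor_mark) + 1
        page = ordered[start:start + page_size]
        # an exhausted cursor returns the same mark it was given
        next_mark = page[-1] if page else cursor_mark
        return page, next_mark

    return query

def fetch_all(query):
    """Repeat requests until nextCursorMark stops changing."""
    collected, mark = [], "*"       # '*' = start-of-results mark
    while True:
        page, next_mark = query(mark)
        collected.extend(page)
        if next_mark == mark:       # no progress: we've seen everything
            return collected
        mark = next_mark

print(fetch_all(make_fake_solr(["d%02d" % i for i in range(7)], 3)))
```

Note the loop never uses start or rows offsets to advance; that is the whole point of the cursor over classic start-based deep paging.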






[jira] [Commented] (SOLR-5831) Scale score PostFilter

2014-05-28 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011402#comment-14011402
 ] 

Peter Keegan commented on SOLR-5831:


Hi Joel,

I'm not sure why I didn't see this problem until now, but this PostFilter 
doesn't work after being cached. When the ScoreScaleFilter is retrieved from 
the cache, the docSet is null and a new PostFilter collector is created, but 
the Collector's 'setScorer' method isn't called. As a result, the 'collect' 
method throws NPE (scorer.score()). What do I need to do to keep the query from 
rerunning?  Can the scorer be saved with the ScoreScaleFilter instead of the 
ScoreCollector?

Thanks,
Peter

 Scale score PostFilter
 --

 Key: SOLR-5831
 URL: https://issues.apache.org/jira/browse/SOLR-5831
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.7
Reporter: Peter Keegan
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, 
 SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch


 The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
 This is an alternative to using a function query wrapping a scale() wrapping 
 a query(). For example:
 select?qq={!edismax v='news' qf='title^2 
 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
  v=$qq}
 The problem with this query is that it has to scale every hit. Usually, only 
 the returned hits need to be scaled,
 but there may be use cases where the number of hits to be scaled is greater 
 than the returned hit count,
 but less than or equal to the total hit count.
 Sample syntax:
 fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 
 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
 l=0.0 u=1.0   //Scale scores to values between 0-1, inclusive 
 maxscalehits=1//The maximum number of result scores to scale (-1 = 
 all hits, 0 = results 'page' size)
 func=...  //Apply the composite function to each hit. The 
 scaled score value is accessed by the 'score()' value source
 All parameters are optional. The defaults are:
 l=0.0 u=1.0
 maxscalehits=0 (result window size)
 func=(null)
  
 Note: this patch is not complete, as it contains no test cases and may not 
 conform 
 to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
  
 I would appreciate any feedback on the usability and implementation.
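The l/u scaling the parameters above describe is plain min-max normalization. A Python sketch of what the filter computes per batch of hits (scale_scores is an illustrative name, not part of the patch):

```python
def scale_scores(scores, l=0.0, u=1.0):
    """Min-max scale raw scores into [l, u], as the l/u params describe."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [l] * len(scores)    # degenerate case: all scores identical
    return [l + (u - l) * (s - lo) / (hi - lo) for s in scores]

print(scale_scores([2.0, 4.0, 6.0]))
```

The func parameter then combines each scaled score with other value sources, e.g. 0.75 times the scaled score plus 0.25 times a field value.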






[jira] [Commented] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly

2014-05-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011403#comment-14011403
 ] 

Alan Woodward commented on SOLR-6021:
-

{code:java}
  CoreContainer coreContainer = zkController.getCoreContainer();
  ConfigSetService configService = coreContainer.getCoreConfigService();
  CoreDescriptor dummy = new CoreDescriptor(coreContainer, 
collectionName, collectionName);
  ConfigSet configSet = configService.getConfig(dummy);
{code}

This seems like a lot of ceremony just to get the ConfigSet for a collection - 
maybe it should be a method on zkController itself?

 Always persist router.field in cluster state so CloudSolrServer can route 
 documents correctly
 -

 Key: SOLR-6021
 URL: https://issues.apache.org/jira/browse/SOLR-6021
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
 Attachments: SOLR-6021.patch


 CloudSolrServer has idField as id which is used for hashing and 
 distributing documents. There is a setter to change it as well.
 IMO, we should use the correct uniqueKey automatically. I propose that we 
 start storing router.field always in cluster state and set it to the 
 uniqueKey field name by default. Then CloudSolrServer would not need to 
 assume an id field by default.
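The routing being discussed boils down to: the client hashes the value of router.field to pick a target shard, so it must know which field that is. A toy sketch of the idea (route is an illustrative name, not a SolrJ method; real CloudSolrServer uses MurmurHash3 mapped onto the collection's hash ranges, crc32 here is just a deterministic stand-in):

```python
import zlib

def route(doc, router_field, shards):
    """Pick a shard deterministically from the doc's router.field value."""
    key = str(doc[router_field]).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]
```

If the client assumes the wrong field name (e.g. "id" when the schema's uniqueKey differs), the hash is computed over the wrong value and documents land on the wrong shard, which is why persisting router.field in cluster state matters.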






[jira] [Created] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread David Smiley (JIRA)
David Smiley created SOLR-6118:
--

 Summary: expand.sort doesn't work with function queries
 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley


The new ExpandComponent doesn't support function queries in the {{expand.sort}} 
parameter, such as geodist() for example.  Here's the stack trace if you try:

{noformat}
527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
null:java.lang.IllegalStateException: SortField needs to be rewritten through 
Sort.rewrite(..) and SortField.rewrite(..)
at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
at 
org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:66)
at 
org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
at 
org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
at 
org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
at 
org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.<init>(ExpandComponent.java:310)
at 
org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)

{noformat}






[jira] [Commented] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly

2014-05-28 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011411#comment-14011411
 ] 

Shalin Shekhar Mangar commented on SOLR-6021:
-

Yeah, I don't like that either. I'll refactor it into a method in 
ZkController. Thanks Alan.

 Always persist router.field in cluster state so CloudSolrServer can route 
 documents correctly
 -

 Key: SOLR-6021
 URL: https://issues.apache.org/jira/browse/SOLR-6021
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
 Attachments: SOLR-6021.patch


 CloudSolrServer has idField as id which is used for hashing and 
 distributing documents. There is a setter to change it as well.
 IMO, we should use the correct uniqueKey automatically. I propose that we 
 start storing router.field always in cluster state and set it to the 
 uniqueKey field name by default. Then CloudSolrServer would not need to 
 assume an id field by default.






[jira] [Commented] (SOLR-5285) Solr response format should support child Docs

2014-05-28 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011414#comment-14011414
 ] 

Mikhail Khludnev commented on SOLR-5285:


[~arcadius] it's SOLR-5743. Not much progress so far. I'm expecting some 
movement during this year.

 Solr response format should support child Docs
 --

 Key: SOLR-5285
 URL: https://issues.apache.org/jira/browse/SOLR-5285
 Project: Solr
  Issue Type: New Feature
Reporter: Varun Thacker
 Fix For: 4.9, 5.0

 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, 
 javabin_backcompat_child_docs.bin


 Solr has added support for taking childDocs as input ( only XML till now ). 
 It's currently used for BlockJoinQuery. 
 I feel that if a user indexes a document with child docs, even if he isn't 
 using the BJQ features and is just searching, which results in a hit on the 
 parentDoc, its childDocs should be returned in the response format.
 [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would 
 be the place to add childDocs to the response.
 Now given a docId one needs to find out all the childDoc ids. A couple of 
 approaches which I could think of are 
 1. Maintain the relation between a parentDoc and its childDocs during 
 indexing time in maybe a separate index?
 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a 
 parentDoc it finds out all the childDocs but this requires a childScorer.
 Am I missing something obvious on how to find the relation between a 
 parentDoc and its childDocs because none of the above solutions for this 
 look right.






[jira] [Assigned] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-6118:
--

Assignee: David Smiley

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley

 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Updated] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-6118:
---

Attachment: SOLR-6118_expand_sort_rewrite.patch

Simple fix w/ test. I think I've been bitten by this type of bug before -- 
forgetting to call:
{code:java}
  if (sort != null)
sort = sort.rewrite(searcher);
{code}

It'd be nice if Lucene's Sort.java maintained a rewritten boolean 
flag so that a bug like this would be caught earlier during development. 
 Maybe that's the solution, maybe not.

I noticed some indentation/spacing problems in ExpandComponent.java. I'll fix 
them in a separate commit.
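The "rewritten flag" idea can be modeled in a few lines of Python. This is a toy, not Lucene's actual Sort API: the class and method names below are illustrative, and it only shows the fail-fast guard that would have surfaced this bug at the call site instead of deep in FieldValueHitQueue.

```python
class Sort:
    """Toy model of the suggested safeguard: remember whether rewrite()
    ran, and fail fast if the sort is used unrewritten."""

    def __init__(self, fields, rewritten=False):
        self.fields = fields
        self.rewritten = rewritten

    def rewrite(self, searcher):
        # the real method would resolve value sources (e.g. geodist())
        # against the searcher; here we just flip the flag
        return Sort(self.fields, rewritten=True)

    def comparator(self):
        if not self.rewritten:
            raise RuntimeError(
                "SortField needs to be rewritten through Sort.rewrite(..)")
        return lambda doc: tuple(doc[f] for f in self.fields)
```

With such a guard the IllegalStateException would still occur, but a dedicated flag plus an assert at construction points could catch the missing rewrite in tests rather than at query time.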

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011560#comment-14011560
 ] 

ASF subversion and git services commented on SOLR-6118:
---

Commit 1598138 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1598138 ]

SOLR-6118: expand.sort bug for function queries; needed to 
sort.rewrite(searcher)

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Commented] (SOLR-5831) Scale score PostFilter

2014-05-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011568#comment-14011568
 ] 

Joel Bernstein commented on SOLR-5831:
--

If your Query class implements PostFilter and ScoreFilter then the 
SolrIndexSearcher will make sure the scorer is present during docset retrieval. 
ScoreFilter is just a marker interface, with no methods.

It's little things like this that make me believe this would be better as a 
pluggable collector. SOLR-5973 is now committed. You can also see a pluggable 
collector example with SOLR-6088.

 Scale score PostFilter
 --

 Key: SOLR-5831
 URL: https://issues.apache.org/jira/browse/SOLR-5831
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.7
Reporter: Peter Keegan
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, 
 SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch


 The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
 This is an alternative to using a function query wrapping a scale() wrapping 
 a query(). For example:
 select?qq={!edismax v='news' qf='title^2 
 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
  v=$qq}
 The problem with this query is that it has to scale every hit. Usually, only 
 the returned hits need to be scaled,
 but there may be use cases where the number of hits to be scaled is greater 
 than the returned hit count,
 but less than or equal to the total hit count.
 Sample syntax:
 fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 
 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
 l=0.0 u=1.0   //Scale scores to values between 0-1, inclusive 
 maxscalehits=1//The maximum number of result scores to scale (-1 = 
 all hits, 0 = results 'page' size)
 func=...  //Apply the composite function to each hit. The 
 scaled score value is accessed by the 'score()' value source
 All parameters are optional. The defaults are:
 l=0.0 u=1.0
 maxscalehits=0 (result window size)
 func=(null)
  
 Note: this patch is not complete, as it contains no test cases and may not 
 conform 
 to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
  
 I would appreciate any feedback on the usability and implementation.






[jira] [Commented] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.

2014-05-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011600#comment-14011600
 ] 

Erick Erickson commented on SOLR-4502:
--

bq: and I argue not with a SolrConfig object either

Hmmm. What about running EmbeddedSolrServer as part of the 
MapReduceIndexerTool, especially when the configs are kept in ZooKeeper and the 
index is backed by HDFS?

...and the thigh bone is connected to the arm bone

 ShardHandlerFactory not initialized in CoreContainer when creating a Core 
 manually.
 ---

 Key: SOLR-4502
 URL: https://issues.apache.org/jira/browse/SOLR-4502
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
Reporter: Michael Aspetsberger
Assignee: Mark Miller
  Labels: NPE
 Fix For: 4.9, 5.0


 We are using an embedded solr server for our unit testing purposes. In our 
 scenario, we create a {{CoreContainer}} using only the solr-home path, and 
 then create the cores manually using a {{CoreDescriptor}}.
 While the creation appears to work fine, it hits an NPE when it handles the 
 search:
 {quote}
 Caused by: java.lang.NullPointerException
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
   at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
 {quote}
 According to 
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E
  , this is due to a missing call to {{CoreContainer#load}}.






[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011615#comment-14011615
 ] 

ASF subversion and git services commented on SOLR-6118:
---

Commit 1598147 from [~dsmiley] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1598147 ]

SOLR-6118: expand.sort bug for function queries; needed to 
sort.rewrite(searcher)

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Resolved] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-6118.


   Resolution: Fixed
Fix Version/s: 5.0
   4.9

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.9, 5.0

 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Commented] (SOLR-5831) Scale score PostFilter

2014-05-28 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011635#comment-14011635
 ] 

Peter Keegan commented on SOLR-5831:


Thanks, that was a simple fix. But if the results are coming from the cache, 
why does the PostFilter collection have to be rerun?

I totally agree that there are a lot of little details that make it tricky to 
implement a PostFilter. For the short term, we'll likely go to production with 
it, though, since we're running on 4.6.1.  Can the pluggable collector 
framework be patched into 4.6.1? (when I looked at it a while ago, it didn't 
seem so)

Peter


 Scale score PostFilter
 --

 Key: SOLR-5831
 URL: https://issues.apache.org/jira/browse/SOLR-5831
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.7
Reporter: Peter Keegan
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, 
 SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch


 The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
 This is an alternative to using a function query wrapping a scale() wrapping 
 a query(). For example:
 select?qq={!edismax v='news' qf='title^2 
 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
  v=$qq}
 The problem with this query is that it has to scale every hit. Usually, only 
 the returned hits need to be scaled,
 but there may be use cases where the number of hits to be scaled is greater 
 than the returned hit count,
 but less than or equal to the total hit count.
 Sample syntax:
 fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 
 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
 l=0.0 u=1.0   //Scale scores to values between 0-1, inclusive 
 maxscalehits=1//The maximum number of result scores to scale (-1 = 
 all hits, 0 = results 'page' size)
 func=...  //Apply the composite function to each hit. The 
 scaled score value is accessed by the 'score()' value source
 All parameters are optional. The defaults are:
 l=0.0 u=1.0
 maxscalehits=0 (result window size)
 func=(null)
  
 Note: this patch is not complete, as it contains no test cases and may not 
 conform 
 to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
  
 I would appreciate any feedback on the usability and implementation.






[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011670#comment-14011670
 ] 

David Smiley commented on SOLR-6118:


FYI I also committed minor improvements pertaining to Java 5 generics.

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.9, 5.0

 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Created] (LUCENE-5712) Remove Similarity.queryNorm

2014-05-28 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5712:
--

 Summary: Remove Similarity.queryNorm
 Key: LUCENE-5712
 URL: https://issues.apache.org/jira/browse/LUCENE-5712
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Michael McCandless
 Fix For: 4.9, 5.0


This method is a no-op for ranking within one query, causes confusion for users 
making their own Similarity impls, and isn't necessary for / makes it harder to 
switch the default to more modern scoring models like BM25.






[jira] [Commented] (SOLR-5831) Scale score PostFilter

2014-05-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011723#comment-14011723
 ] 

Joel Bernstein commented on SOLR-5831:
--

The main DocSet isn't cached. So, if you pull a DocList from the 
QueryResultCache, Solr needs to regenerate the DocSet for faceting etc...

 Scale score PostFilter
 --

 Key: SOLR-5831
 URL: https://issues.apache.org/jira/browse/SOLR-5831
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.7
Reporter: Peter Keegan
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, 
 SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch


 The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
 This is an alternative to using a function query wrapping a scale() wrapping 
 a query(). For example:
 select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}
 The problem with this query is that it has to scale every hit. Usually, only 
 the returned hits need to be scaled,
 but there may be use cases where the number of hits to be scaled is greater 
 than the returned hit count,
 but less than or equal to the total hit count.
 Sample syntax:
 fq={!scalescore+l=0.0 u=1.0 maxscalehits=1 
 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
 l=0.0 u=1.0   //Scale scores to values between 0-1, inclusive 
 maxscalehits=1//The maximum number of result scores to scale (-1 = 
 all hits, 0 = results 'page' size)
 func=...  //Apply the composite function to each hit. The 
 scaled score value is accessed by the 'score()' value source
 All parameters are optional. The defaults are:
 l=0.0 u=1.0
 maxscalehits=0 (result window size)
 func=(null)
  
 Note: this patch is not complete, as it contains no test cases and may not 
 conform 
 to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
  
 I would appreciate any feedback on the usability and implementation.
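For intuition, the scale-then-blend computation described above can be sketched outside Solr. This is a hypothetical illustration (plain min-max scaling plus the 0.75/0.25 weights from the sample syntax), not the plugin's actual code:

```python
def scale(scores, l=0.0, u=1.0):
    """Min-max scale raw scores into [l, u], like Solr's scale() function."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [l for _ in scores]  # degenerate case: all scores equal
    return [l + (s - lo) * (u - l) / (hi - lo) for s in scores]

def composite(scaled_score, field_value, w_score=0.75, w_field=0.25):
    """sum(product(0.75, scaledQ), product(0.25, field(myfield)))."""
    return w_score * scaled_score + w_field * field_value

raw = [2.0, 8.0, 5.0]          # raw query scores for three hits
field_vals = [0.4, 0.9, 0.1]   # stored values of 'myfield' for the same hits
scaled = scale(raw)            # [0.0, 1.0, 0.5]
final = [composite(s, f) for s, f in zip(scaled, field_vals)]
```

The point of the `maxscalehits` parameter is then just how many of the `raw` entries participate in the min/max pass.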






[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011969#comment-14011969
 ] 

ASF subversion and git services commented on SOLR-6118:
---

Commit 1598193 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1598193 ]

SOLR-6118: CHANGES.txt

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.9, 5.0

 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011972#comment-14011972
 ] 

ASF subversion and git services commented on SOLR-6118:
---

Commit 1598194 from [~dsmiley] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1598194 ]

Merged from trunk
SOLR-6118: CHANGES.txt [from revision 1598193]

 expand.sort doesn't work with function queries
 --

 Key: SOLR-6118
 URL: https://issues.apache.org/jira/browse/SOLR-6118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.9, 5.0

 Attachments: SOLR-6118_expand_sort_rewrite.patch


 The new ExpandComponent doesn't support function queries in the 
 {{expand.sort}} parameter, such as geodist() for example.  Here's the stack 
 trace if you try:
 {noformat}
 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter  – 
 null:java.lang.IllegalStateException: SortField needs to be rewritten through 
 Sort.rewrite(..) and SortField.rewrite(..)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
   at 
 org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
   at 
 org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
   at 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
   at 
 org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
   at 
 org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 {noformat}






[jira] [Commented] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup

2014-05-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011974#comment-14011974
 ] 

David Smiley commented on SOLR-5868:


There's a lingering entry for this in CHANGES.txt on the 4x branch, below the 
Getting Started section; probably the result of a merge problem.  If this issue 
isn't resolved then the entry should be removed; if it is resolved then it 
should be moved to the right spot.

 HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier 
 to simplify SSL setup
 -

 Key: SOLR-5868
 URL: https://issues.apache.org/jira/browse/SOLR-5868
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.7
Reporter: Steve Davids
Assignee: Mark Miller
 Fix For: 4.9, 5.0

 Attachments: SOLR-5868.patch, SOLR-5868.patch


 The default HttpClient hostname verifier is the 
 BROWSER_COMPATIBLE_HOSTNAME_VERIFIER which verifies the hostname that is 
 being connected to matches the hostname presented within the certificate. 
 This is meant to protect clients that are making external requests out across 
 the internet, but requests within the SOLR cluster should be trusted, so 
 verification can be relaxed to simplify the SSL/certificate setup process.






[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012035#comment-14012035
 ] 

ASF subversion and git services commented on SOLR-6029:
---

Commit 1598196 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1598196 ]

SOLR-6029: fix smoke test failure

 CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc 
 has been deleted from a segment
 -

 Key: SOLR-6029
 URL: https://issues.apache.org/jira/browse/SOLR-6029
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.7.1
Reporter: Greg Harris
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.8.1, 4.9

 Attachments: SOLR-6029.patch


 CollapsingQParserPlugin misidentifies if a document is not found in a segment 
 if the docid previously existed in a segment ie was deleted. 
 Relevant code bit from CollapsingQParserPlugin needs to be changed from:
 -if(doc != -1) {
 +if((doc != -1) && (doc != DocsEnum.NO_MORE_DOCS)) {
 What happens is if the doc is not found the returned value is 
 DocsEnum.NO_MORE_DOCS. This would then get set in the fq bitSet array as the 
 doc location causing an ArrayIndexOutOfBoundsException as the array is only 
 as big as maxDocs. 
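The failure mode can be reproduced with a toy iterator; this is a hypothetical sketch, not the Solr code, and only the NO_MORE_DOCS sentinel value is taken from Lucene's DocIdSetIterator:

```python
NO_MORE_DOCS = 2**31 - 1  # Lucene's DocIdSetIterator.NO_MORE_DOCS sentinel

def advance(live_docs, target):
    """Toy advance(): first docID >= target, or NO_MORE_DOCS when exhausted."""
    for d in live_docs:
        if d >= target:
            return d
    return NO_MORE_DOCS

def mark_doc(bits, doc):
    # The fix: guard against NO_MORE_DOCS as well as -1. Without it, the
    # sentinel would index far past the maxDoc-sized bit array.
    if doc != -1 and doc != NO_MORE_DOCS:
        bits[doc] = True

max_doc = 10
bits = [False] * max_doc
doc = advance([3, 7], 8)  # elevated doc was deleted: nothing at or after 8
mark_doc(bits, doc)       # safely ignored instead of raising IndexError
```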






[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012034#comment-14012034
 ] 

ASF subversion and git services commented on SOLR-6029:
---

Commit 1598195 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1598195 ]

SOLR-6029: fix smoke test failure

 CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc 
 has been deleted from a segment
 -

 Key: SOLR-6029
 URL: https://issues.apache.org/jira/browse/SOLR-6029
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.7.1
Reporter: Greg Harris
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.8.1, 4.9

 Attachments: SOLR-6029.patch


 CollapsingQParserPlugin misidentifies if a document is not found in a segment 
 if the docid previously existed in a segment ie was deleted. 
 Relevant code bit from CollapsingQParserPlugin needs to be changed from:
 -if(doc != -1) {
 +if((doc != -1) && (doc != DocsEnum.NO_MORE_DOCS)) {
 What happens is if the doc is not found the returned value is 
 DocsEnum.NO_MORE_DOCS. This would then get set in the fq bitSet array as the 
 doc location causing an ArrayIndexOutOfBoundsException as the array is only 
 as big as maxDocs. 






Request for discussion: Solr plugins

2014-05-28 Thread Alexandre Rafalovitch
Hi,

I would like to (re-)initiate a discussion about Solr support for
plugin life-cycle (publish, discover, download, dependency
management). Triggered by a discussion on the Solr mailing list:
http://search-lucene.com/m/QTPaIv50e1&subj=Re+Contribute+QParserPlugin

My main points:
1) Plugins/Modules/packages seem to be a core part of most modern
projects
2) Solr is extremely modular on the implementation level.
3) The community has been slowly building various plugins for Solr,
but without any way to announce/share them.
4) ElasticSearch has plugins and it's been consistently pointed out as
a positive point (good on them)
5) Solr, frankly, is getting rather pudgy. Or possibly beyond mere
pudgy. This is becoming especially noticeable by comparison with
ElasticSearch but also with the increasing frequency of releases. I
mentioned this issue a couple of times in the past under different
angles (bundling Javadoc, compressing files, easy onboarding, etc).
6) Solr has so many features now that nobody will explore them all;
yet people still download them and - sometimes - get confused by all
the directories, jars and locations. Polish support alone has a
significant number of jars (not sure about file sizes). I am not even
talking about map-reduce+morphline in the recent release.
7) Solr is already published as a set of Maven jars with dependencies
expressed between components.
8) Apart from making initial downloads smaller, having a proper module
system (publish, discover, download, install) provides incentives for
people to push packages out and creates a stronger community

Now, I know that some of the weight might be addressed in Solr 5 by
not bundling the war file as well as the libs. And some of the
ElasticSearch comparison is due to the different philosophical
approach (kitchen sink vs not even bundling Admin UI). And that any
individual person does not download Solr packages that often (I might
be an exception). But I still think we need the discussion.

I would especially love to see a discussion of the lowest hanging
fruit. Even if we cannot decompose Solr itself right now, maybe we can
introduce additional package handling mechanism and then retrofit Solr
into that.

In terms of skin-in-the-game, I would be happy to build and operate
package publish/discovery system/website if Solr was actually able to
support one. :-)

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency




[jira] [Commented] (SOLR-6062) Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather them being combined in a single dismax query)

2014-05-28 Thread Ron Mayer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012039#comment-14012039
 ] 

Ron Mayer commented on SOLR-6062:
-

Regarding the original/linked issue: it allowed the same field to be passed 
through a pf parameter with differing slop values, the intent being that those 
scores would be combined rather than the max being taken across those fields. 
The observation that led to using the same field with different slop values was 
that a document was likely to be interesting if either many of the words in the 
searched clauses appeared in the same paragraph (a pretty large slop value), or 
many pairs of words from the search clauses appeared in the same adjective/noun 
clauses of the text (quite a small slop value, to make a search for 'old hairy 
cat' rank well against 'hairy old cat').

If I understand right, it sounds to me like what Michael described continues to 
be good for those cases. I'm traveling this week, but I have some test cases 
comparing the ranking of SOLR-2058 against human-sorted documents that I can 
run when I'm back Thursday of next week.



 Phrase queries are created for each field supplied through edismax's pf, pf2 
 and pf3 parameters (rather them being combined in a single dismax query)
 -

 Key: SOLR-6062
 URL: https://issues.apache.org/jira/browse/SOLR-6062
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Attachments: combined-phrased-dismax.patch


 https://issues.apache.org/jira/browse/SOLR-2058 subtly changed how phrase 
 queries, created through the pf, pf2 and pf3 parameters, are merged into the 
 main user query.
 For the query: 'term1 term2' with pf2:[field1, field2, field3] we now get 
 (omitting the non phrase query section for clarity):
 {code:java}
 main query
 DisjunctionMaxQuery((field1:"term1 term2"^1.0)~0.1)
 DisjunctionMaxQuery((field2:"term1 term2"^1.0)~0.1)
 DisjunctionMaxQuery((field3:"term1 term2"^1.0)~0.1)
 {code}
 Prior to this change, we had:
 {code:java}
 main query 
 DisjunctionMaxQuery((field1:"term1 term2"^1.0 | field2:"term1 term2"^1.0 | 
 field3:"term1 term2"^1.0)~0.1)
 {code}
 The upshot is that if the phrase query "term1 term2" appears in multiple 
 fields, it will get a significant boost over the previous implementation.
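The scoring difference can be sketched with toy numbers. The dismax formula below (best clause plus tie times the rest) uses the ~0.1 tie from the queries above; the per-field phrase scores are made up:

```python
def dismax(clause_scores, tie=0.1):
    """DisjunctionMaxQuery score: the best clause plus tie * the others."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)

per_field = [1.0, 1.0, 1.0]  # phrase match score in field1, field2, field3

# Before SOLR-2058: one dismax over all three fields -> the max dominates.
combined = dismax(per_field)                    # 1.0 + 0.1 * 2.0 = 1.2
# After: one single-clause dismax per field, then summed -> scores add up.
separate = sum(dismax([s]) for s in per_field)  # 3.0
```

So a phrase matching in all three fields scores 3.0 instead of 1.2, which is the boost the description refers to.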






[jira] [Created] (LUCENE-5713) DocValues related CheckIndex test failure

2014-05-28 Thread David Smiley (JIRA)
David Smiley created LUCENE-5713:


 Summary: DocValues related CheckIndex test failure
 Key: LUCENE-5713
 URL: https://issues.apache.org/jira/browse/LUCENE-5713
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.0
Reporter: David Smiley


The following reproduces for me and [~varunshenoy] on trunk:
lucene/spatial %ant test  -Dtestcase=SpatialOpRecursivePrefixTreeTest 
-Dtests.method=testContains -Dtests.seed=3AD27D1EB168088A

{noformat}
[junit4]   1> Strategy: RecursivePrefixTreeStrategy(SPG:(GeohashPrefixTree(maxLevels:2,ctx:SpatialContext.GEO)))
   [junit4]   1> CheckReader failed
   [junit4]   1> test: field norms.OK [0 fields]
   [junit4]   1> test: terms, freq, prox...OK [207 terms; 208 terms/docs pairs; 0 tokens]
   [junit4]   1> test: stored fields...OK [8 total field count; avg 2 fields per doc]
   [junit4]   1> test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]
   [junit4]   1> test: docvalues...ERROR [dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0]
   [junit4]   1> java.lang.RuntimeException: dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0
   [junit4]   1>    at org.apache.lucene.index.CheckIndex.checkSortedDocValues(CheckIndex.java:1414)
   [junit4]   1>    at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1536)
   [junit4]   1>    at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1367)
   [junit4]   1>    at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:229)
   [junit4]   1>    at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:216)
   [junit4]   1>    at org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1597)
{noformat}

A 1-in-500 random condition hit to check the index on newSearcher, hitting 
this.  DocValues used to not be enabled for this spatial test but [~rcmuir] 
added it recently as part of the move to the DocValues API in lieu of the 
FieldCache API, and because the DisjointSpatialFilter uses getDocsWithField 
(though nothing else).  That probably doesn't have anything to do with whatever 
the problem here is, though.






[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically

2014-05-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012065#comment-14012065
 ] 

Shai Erera commented on LUCENE-5680:


I think it's ready so if there are no objections I'll commit later today.

 Allow updating multiple DocValues fields atomically
 ---

 Key: LUCENE-5680
 URL: https://issues.apache.org/jira/browse/LUCENE-5680
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch, 
 LUCENE-5680.patch


 This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) 
 -- it would be good if we can allow updating several doc-values fields, 
 atomically. It will also improve/simplify our tests, where today we index two 
 fields, e.g. the field itself and a control field. In some multi-threaded 
 tests, since we cannot be sure which updates came through first, we limit the 
 test such that each thread updates a different set of fields, otherwise they 
 will collide and it will be hard to verify the index in the end.
 I was working on a patch and it looks pretty simple to do, will post a patch 
 shortly.






[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012084#comment-14012084
 ] 

Alexander S. commented on SOLR-5463:


If, as David mentioned, Solr will add it only if it is not there, this should 
keep the ability for users to manually specify another key and order when that 
is required (a rare case it seems).

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only 
 supported searchAfter for simple scores, not arbitrary field sorts; and the 
 params added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
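The Basic Usage steps above can be sketched as a client loop. This is a hypothetical sketch: `search` stands in for the HTTP request (which would carry sort, rows, and cursorMark params), and the fake backend only imitates how a real cursor mark behaves:

```python
def fetch_all(search, page_size=10):
    """Follow nextCursorMark until it stops changing (the steps above)."""
    docs, cursor = [], "*"  # '*' is the start-of-results cursor mark
    while True:
        resp = search(cursor=cursor, rows=page_size)
        docs.extend(resp["docs"])
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: no more results
            return docs
        cursor = next_cursor

def make_fake_solr(ids):
    """Stand-in for Solr: pages over pre-sorted unique ids; the returned mark
    encodes the last id served, roughly what a real cursor mark does."""
    def search(cursor, rows):
        start = 0 if cursor == "*" else ids.index(cursor) + 1
        page = ids[start:start + rows]
        mark = page[-1] if page else cursor  # mark stays put when exhausted
        return {"docs": page, "nextCursorMark": mark}
    return search

all_docs = fetch_all(make_fake_solr([1, 2, 3, 4, 5]), page_size=2)
```

Because the cursor encodes sort values rather than an offset, each page costs the same regardless of depth, which is the whole point over large `start` values.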






[jira] [Created] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-05-28 Thread Dawid Weiss (JIRA)
Dawid Weiss created SOLR-6119:
-

 Summary: TestReplicationHandler attempts to remove open folders
 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor


TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
attempts to remove snapshot folders, even though they're not closed yet. My 
recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
test itself seems to be very fragile (for example I don't understand the 
'namedBackup' variable which is always set to true, yet there are conditionals 
around it).








[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012096#comment-14012096
 ] 

ASF subversion and git services commented on SOLR-6119:
---

Commit 1598206 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1598206 ]

SOLR-6119: a quick workaround for the problem of removing files that are still 
open during the test.

 TestReplicationHandler attempts to remove open folders
 --

 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor

 TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
 attempts to remove snapshot folders, even though they're not closed yet. My 
 recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
 test itself seems to be very fragile (for example I don't understand the 
 'namedBackup' variable which is always set to true, yet there are 
 conditionals around it).






[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-05-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012094#comment-14012094
 ] 

ASF subversion and git services commented on SOLR-6119:
---

Commit 1598205 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1598205 ]

SOLR-6119: a quick workaround for the problem of removing files that are still 
open during the test.

 TestReplicationHandler attempts to remove open folders
 --

 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor

 TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
 attempts to remove snapshot folders, even though they're not closed yet. My 
 recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
 test itself seems to be very fragile (for example I don't understand the 
 'namedBackup' variable which is always set to true, yet there are 
 conditionals around it).






[jira] [Commented] (LUCENE-5713) DocValues related CheckIndex test failure

2014-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012101#comment-14012101
 ] 

Robert Muir commented on LUCENE-5713:
-

I did not add docvalues to the lucene/spatial module. It still uses fieldcache. 
I hope this is clear that this is just a fieldcache bug :)

 DocValues related CheckIndex test failure
 -

 Key: LUCENE-5713
 URL: https://issues.apache.org/jira/browse/LUCENE-5713
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.0
Reporter: David Smiley

 The following reproduces for me and [~varunshenoy] on trunk:
 lucene/spatial %ant test  -Dtestcase=SpatialOpRecursivePrefixTreeTest 
 -Dtests.method=testContains -Dtests.seed=3AD27D1EB168088A
 {noformat}
 [junit4]   1> Strategy: RecursivePrefixTreeStrategy(SPG:(GeohashPrefixTree(maxLevels:2,ctx:SpatialContext.GEO)))
 [junit4]   1> CheckReader failed
 [junit4]   1> test: field norms.OK [0 fields]
 [junit4]   1> test: terms, freq, prox...OK [207 terms; 208 terms/docs pairs; 0 tokens]
 [junit4]   1> test: stored fields...OK [8 total field count; avg 2 fields per doc]
 [junit4]   1> test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]
 [junit4]   1> test: docvalues...ERROR [dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0]
 [junit4]   1> java.lang.RuntimeException: dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0
 [junit4]   1>    at org.apache.lucene.index.CheckIndex.checkSortedDocValues(CheckIndex.java:1414)
 [junit4]   1>    at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1536)
 [junit4]   1>    at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1367)
 [junit4]   1>    at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:229)
 [junit4]   1>    at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:216)
 [junit4]   1>    at org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1597)
 {noformat}
  The 1-in-500 random condition that checks the index on newSearcher was hit, 
  triggering this.  DocValues used to not be enabled for this spatial test, but 
  [~rcmuir] added it recently as part of the move to the DocValues API in lieu 
  of the FieldCache API, and because DisjointSpatialFilter uses getDocsWithField 
  (though nothing else).  That probably doesn't have anything to do with 
  whatever the problem here is, though.
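
 For context, the invariant CheckIndex is enforcing at the ERROR above is that
 a SORTED doc-values ord of -1 ("no value") must agree with the docs-with-field
 bits. A minimal sketch of that check using plain arrays -- not the actual
 CheckIndex code, whose reader and field plumbing is omitted here:

```java
public class SortedDvCheck {
    // ords[doc] is the sorted-DV ord for doc, or -1 for "no value";
    // docsWithField[doc] says whether the doc has a value at all.
    static void check(int[] ords, boolean[] docsWithField) {
        for (int doc = 0; doc < ords.length; doc++) {
            if (ords[doc] == -1 && docsWithField[doc]) {
                throw new RuntimeException(
                    "dv has -1 ord but is not marked missing for doc: " + doc);
            }
        }
    }

    public static void main(String[] args) {
        // Consistent index: doc 1 has no value and is marked missing -- passes.
        check(new int[] {0, -1, 2}, new boolean[] {true, false, true});
        try {
            // The failure mode in this issue: -1 ord, yet not marked missing.
            check(new int[] {-1}, new boolean[] {true});
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
            // prints: dv has -1 ord but is not marked missing for doc: 0
        }
    }
}
```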





[jira] [Updated] (LUCENE-5713) FieldCache related test failure

2014-05-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5713:


Summary: FieldCache related test failure  (was: DocValues related 
CheckIndex test failure)

 FieldCache related test failure
 -------------------------------

 Key: LUCENE-5713
 URL: https://issues.apache.org/jira/browse/LUCENE-5713
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.0
Reporter: David Smiley



