Re: Getting term vectors/computing cosine similarity
On May 27, 2014, at 19:17, Michael O'Leary mich...@moz.com wrote:

*tl;dr*: a next() method is defined for the Java class TVTermsEnum in Lucene 4.8.1, but it looks like there is no next() method available for an object that looks like it is an instance of the Python class TVTermsEnum in PyLucene 4.8.1.

If there is a next() method, there is a good chance the object is even iterable (in the Python sense). You may need to cast it first, though, as the API that returned it to you may not be declared to return TVTermsEnum: TVTermsEnum.cast_(obj)

A good place for PyLucene code examples is its suite of unit tests. It also has a few samples - far fewer than in the 3.x releases, because the APIs changed too much. I'm pretty sure there is a test involving TermsEnum in the tests directory.

Andi..

I have a set of documents that I would like to cluster. These documents share a vocabulary of only about 3,000 unique terms, but there are about 15,000,000 documents. One way I thought of doing this would be to index the documents using PyLucene (Python is the preferred programming language at work), obtain term vectors for the documents using PyLucene API functions, and calculate cosine similarities between pairs of term vectors in order to determine which documents are close to each other.

I found some sample Java code on the web that various people have posted showing ways to do this with older versions of Lucene. I downloaded PyLucene 4.8.1 and compared its API functions with the ones used in the code samples, and saw that this is an area of Lucene that has changed quite a bit. I can send an email to the lucene-user mailing list to ask what would be a good way of doing this with version 4.8.1, but the question I have for this mailing list has to do with some Java API functions that do not appear to be exposed in Python, unless I have to go about accessing them in a different way.

If I obtain the term vector for the field cat_ids in a document with id doc_id_1 via doc_1_tfv = reader.getTermVector(doc_id_1, cat_ids), then doc_1_tfv is displayed as this object: Terms: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396

In some of the sample code I looked at, the terms in doc_1_tfv could be obtained with doc_1_tfv.getTerms(), but it looks like getTerms is no longer a member function of Terms or its subclasses. In another code sample, an iterator for the term vector is obtained via tfv_iter = doc_1_tfv.iterator(None), and the terms are then obtained one by one with calls to tfv_iter.next(). This is where I get stuck. tfv_iter has this value: TermsEnum: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369 and there is a next() function defined for the TVTermsEnum class, but this object doesn't list next() as one of its member functions, and an exception is raised if it is called. It looks like the object only supports the member functions defined for the TermsEnum class, and next() is not one of them. Is this the case, or is there a way to have it support all of the TVTermsEnum member functions, including next()? TVTermsEnum is a private class in CompressingTermVectorsReader.java.

So I am wondering whether there is a way to obtain term vectors in this fashion and I am just not treating doc_1_tfv and tfv_iter in the right way, or whether there is a different, better way to get term vectors for documents in a PyLucene index, or whether this isn't something that Lucene should be used for. Thank you very much for any help you can provide.

Mike
Re: Getting term vectors/computing cosine similarity
On May 27, 2014, at 21:03, Michael O'Leary mich...@moz.com wrote:

Hi Andi, Thanks for the help. I just tried to import TVTermsEnum so I could try casting my iter, and I don't see how to do it, since TVTermsEnum is a private class with the fully qualified name org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.

If it's a private class then you have no access to it and no Python wrapper class was generated for it. Try going up the superclass chain until you hit a public class or interface, probably TermsEnum.

Andi..

I tried from org.apache.lucene.codecs.compressing import CompressingTermVectorsReader$TVTermsEnum, from org.apache.lucene.codecs.compressing import TVTermsEnum, and import org.apache.lucene.codecs.compressing, but none of them provided access to TVTermsEnum (the first two raised exceptions). After running import org.apache.lucene.codecs.compressing, I could do dir(org.apache.lucene.codecs.compressing) and see the contents of that module. CompressingTermVectorsReader was listed, but TVTermsEnum wasn't. TVTermsEnum also wasn't listed in the output of dir(org.apache.lucene.codecs.compressing.CompressingTermVectorsReader). So it looks like my first problem is how to get access to TVTermsEnum.

Mike
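Andi's superclass-chain suggestion can be checked from Python with the instance_/cast_ helpers that JCC generates for every public wrapper class. A small sketch, assuming tfv_iter is the TermsEnum wrapper obtained earlier via doc_1_tfv.iterator(None):

from org.apache.lucene.index import TermsEnum
from org.apache.lucene.util import BytesRefIterator

# tfv_iter is the object returned by doc_1_tfv.iterator(None)
print(TermsEnum.instance_(tfv_iter))         # True: it wraps a TermsEnum
print(BytesRefIterator.instance_(tfv_iter))  # True: next() is declared here

te = TermsEnum.cast_(tfv_iter)  # cast to the nearest public superclass

As the next message in the thread shows, the cast that actually exposes next() is BytesRefIterator, the public interface where that method is declared.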
Re: Getting term vectors/computing cosine similarity
On May 28, 2014, at 12:03 AM, Michael O'Leary mich...@moz.com wrote:

Hi Andi, Thanks for the help. I just tried to import TVTermsEnum so I could try casting my iter, and I don't see how to do it, since TVTermsEnum is a private class with the fully qualified name org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.

Cast the TermsEnum object with BytesRefIterator.cast_. Then it will have a next method, and be Python-iterable. Here's an example that outputs the term vectors as a generator; look at the vector method just above: https://pythonhosted.org/lupyne/_modules/lupyne/engine/indexers.html#IndexReader.termvector
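Putting the advice in this thread together, here is a sketch of the full approach under PyLucene 4.8: cast the TermsEnum to BytesRefIterator to iterate, keep the original reference for totalTermFreq(), and compute cosine similarity in plain Python. The index path, field name, and document ids are placeholders, and the field is assumed to have been indexed with term vectors:

import math
import lucene
from java.io import File
from org.apache.lucene.index import DirectoryReader
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import BytesRefIterator

lucene.initVM()

def term_freqs(reader, doc_id, field):
    """Return {term: frequency} for one document's term vector."""
    terms = reader.getTermVector(doc_id, field)
    freqs = {}
    if terms is None:  # field was not indexed with term vectors
        return freqs
    terms_enum = terms.iterator(None)
    # TermsEnum extends BytesRefIterator; casting exposes next()
    # and makes the wrapper iterable from Python.
    for bytes_ref in BytesRefIterator.cast_(terms_enum):
        freqs[bytes_ref.utf8ToString()] = int(terms_enum.totalTermFreq())
    return freqs

def cosine(v1, v2):
    dot = sum(v1[t] * v2[t] for t in v1 if t in v2)
    norm1 = math.sqrt(sum(f * f for f in v1.values()))
    norm2 = math.sqrt(sum(f * f for f in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

reader = DirectoryReader.open(SimpleFSDirectory(File("/path/to/index")))
v1 = term_freqs(reader, 0, "cat_ids")
v2 = term_freqs(reader, 1, "cat_ids")
print(cosine(v1, v2))

Note that this only shows the vector extraction; an all-pairs loop over 15,000,000 documents would be infeasible, so the clustering step would need blocking or approximate nearest-neighbor techniques on top of this.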
[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 166 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/166/

No tests ran.

Build Log:
[...truncated 53313 lines...]
prepare-release-no-sign:
[mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
[copy] Copying 431 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
[copy] Copying 239 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
[exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
[exec] NOTE: output encoding is US-ASCII
[exec]
[exec] Load release URL file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/...
[exec]
[exec] Test Lucene...
[exec]   test basics...
[exec]   get KEYS
[exec]     0.1 MB in 0.01 sec (12.9 MB/sec)
[exec]   check changes HTML...
[exec]   download lucene-4.9.0-src.tgz...
[exec]     27.5 MB in 0.04 sec (670.3 MB/sec)
[exec]     verify md5/sha1 digests
[exec]   download lucene-4.9.0.tgz...
[exec]     61.4 MB in 0.09 sec (665.1 MB/sec)
[exec]     verify md5/sha1 digests
[exec]   download lucene-4.9.0.zip...
[exec]     71.1 MB in 0.14 sec (503.1 MB/sec)
[exec]     verify md5/sha1 digests
[exec]   unpack lucene-4.9.0.tgz...
[exec]     verify JAR metadata/identity/no javax.* or java.* classes...
[exec]     test demo with 1.7...
[exec]       got 5711 hits for query lucene
[exec]     check Lucene's javadoc JAR
[exec]   unpack lucene-4.9.0.zip...
[exec]     verify JAR metadata/identity/no javax.* or java.* classes...
[exec]     test demo with 1.7...
[exec]       got 5711 hits for query lucene
[exec]     check Lucene's javadoc JAR
[exec]   unpack lucene-4.9.0-src.tgz...
[exec]     make sure no JARs/WARs in src dist...
[exec]     run ant validate
[exec]     run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket -Dtests.disableHdfs=true'...
[exec]     test demo with 1.7...
[exec]       got 249 hits for query lucene
[exec]     generate javadocs w/ Java 7...
[exec]
[exec] Crawl/parse...
[exec]
[exec] Verify...
[exec]
[exec] Test Solr...
[exec]   test basics...
[exec]   get KEYS
[exec]     0.1 MB in 0.02 sec (6.3 MB/sec)
[exec]   check changes HTML...
[exec] Traceback (most recent call last):
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 1347, in <module>
[exec]     main()
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 1291, in main
[exec]     smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs)
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 1333, in smokeTest
[exec]     checkSigs('solr', solrPath, version, tmpDir, isSigned)
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 410, in checkSigs
[exec]     testChanges(project, version, changesURL)
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 458, in testChanges
[exec]     checkChangesContent(s, version, changesURL, project, True)
[exec]   File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py, line 485, in checkChangesContent
[exec]     raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' % (name, m.group(1)))
[exec] RuntimeError: incorrect issue (_ instead of -) in file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr/changes/Changes.html: SOLR_3671

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:387: exec returned: 1

Total time: 54 minutes 7 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure
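The RuntimeError above comes from the smoke tester's CHANGES check, which rejects issue ids written with an underscore (SOLR_3671) instead of a hyphen (SOLR-3671). A rough sketch of that kind of check - the function shape and regex here are illustrative, not the script's actual code:

import re

def check_changes_content(html, name):
    # reject issue ids written as SOLR_1234 / LUCENE_1234 instead of
    # SOLR-1234 / LUCENE-1234
    m = re.search(r"\b(SOLR_\d+|LUCENE_\d+)\b", html)
    if m:
        raise RuntimeError('incorrect issue (_ instead of -) in %s: %s'
                           % (name, m.group(1)))

check_changes_content('fixed by SOLR_3671', 'Changes.html')  # raises RuntimeError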
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010881#comment-14010881 ]

Alexander S. commented on SOLR-5463:
------------------------------------

Inability to use this without sorting by a unique key (e.g. id) makes this feature useless. The same could be achieved previously by sorting by id and searching for docs where id is greater/less than the last one received. See how cursors work in MongoDB; that's the right direction.

Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
--------------------------------------------------------------------------------------------------
Key: SOLR-5463
URL: https://issues.apache.org/jira/browse/SOLR-5463
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 4.7, 5.0
Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch

I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the Twitter API and its since_id param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie breaker)
** N can be any number you want per page
** start must be 0
** \* denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need
{panel}
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010883#comment-14010883 ]

Alexander S. commented on SOLR-5463:
------------------------------------

http://docs.mongodb.org/manual/core/cursors/
[jira] [Resolved] (LUCENE-5701) Move core closed listeners to AtomicReader
[ https://issues.apache.org/jira/browse/LUCENE-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-5701.
----------------------------------
Resolution: Fixed

Move core closed listeners to AtomicReader
------------------------------------------
Key: LUCENE-5701
URL: https://issues.apache.org/jira/browse/LUCENE-5701
Project: Lucene - Core
Issue Type: New Feature
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5701.patch, LUCENE-5701.patch

Core listeners are very helpful when managing per-segment caches (filters, uninverted doc values, etc.), yet this API is only exposed on {{SegmentReader}}. If you want to use it today, you need to do instanceof checks, try to unwrap in case of a FilterAtomicReader, and finally fall back to a reader closed listener if every other attempt to get the underlying SegmentReader failed.
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010888#comment-14010888 ]

Alexander S. commented on SOLR-5463:
------------------------------------

Sorry for spamming, but I can't edit my previous message. I just found that in Mongo they also aren't isolated and can return duplicates; I thought they were. But sorting docs by id is not acceptable in 99% of use cases, especially in Solr, where it is more expected to get results sorted by relevance.
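For reference, the Basic Usage panel quoted in the issue description above translates directly into a fetch loop. A sketch in Python against a hypothetical local core (requires the requests package):

{code}
import requests

url = "http://localhost:8983/solr/collection1/select"  # hypothetical core
params = {
    "q": "*:*",
    "sort": "score desc, id asc",  # must include the uniqueKey tie-breaker
    "start": 0,                    # start must be 0 with a cursor
    "rows": 100,
    "wt": "json",
    "cursorMark": "*",             # '*' starts at the beginning mark
}
docs = []
while True:
    body = requests.get(url, params=params).json()
    docs.extend(body["response"]["docs"])
    next_mark = body["nextCursorMark"]
    if next_mark == params["cursorMark"]:  # mark stopped changing: done
        break
    params["cursorMark"] = next_mark
print(len(docs))
{code}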
[jira] [Commented] (SOLR-3671) DIH doesn't use its own interface + writerImpl has no information about the request
[ https://issues.apache.org/jira/browse/SOLR-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010900#comment-14010900 ]

ASF subversion and git services commented on SOLR-3671:
--------------------------------------------------------

Commit 1597936 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1597936 ]

SOLR-3671: fix ongoing smoke test build failure

DIH doesn't use its own interface + writerImpl has no information about the request
------------------------------------------------------------------------------------
Key: SOLR-3671
URL: https://issues.apache.org/jira/browse/SOLR-3671
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 4.0-ALPHA, 4.0-BETA
Reporter: Roman Chyla
Assignee: James Dyer
Priority: Minor
Fix For: 4.9
Attachments: SOLR-3671.patch, SOLR-3671.patch

The use case: I would like to extend DIH by providing a new writer. I have tried everything, but I can't accomplish it without either a) duplicating the whole DIHandler or b) Java reflection tricks. Almost everything inside DIH is private, and the mechanism to instantiate a new writer based on 'writerImpl' seems to lack important functionality: it doesn't give the new class a chance to get information about the request or the update processor. Also, the writer is instantiated twice (when 'writerImpl' is there), which is really unnecessary. As a solution, the existing DIHandler.getSolrWriter() should instantiate the appropriate writer and send it to DocBuilder (it is already doing that for SolrWriter). And DocBuilder doesn't need to create a second (duplicate) writer.
[jira] [Commented] (SOLR-3671) DIH doesn't use its own interface + writerImpl has no information about the request
[ https://issues.apache.org/jira/browse/SOLR-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010901#comment-14010901 ]

ASF subversion and git services commented on SOLR-3671:
--------------------------------------------------------

Commit 1597937 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1597937 ]

SOLR-3671: fix ongoing smoke test build failure
[jira] [Closed] (LUCENE-4121) Standardize ramBytesUsed/sizeInBytes/memSize
[ https://issues.apache.org/jira/browse/LUCENE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand closed LUCENE-4121.
--------------------------------
Resolution: Duplicate
Fix Version/s: (was: 4.9)
               (was: 5.0)

Standardize ramBytesUsed/sizeInBytes/memSize
--------------------------------------------
Key: LUCENE-4121
URL: https://issues.apache.org/jira/browse/LUCENE-4121
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-4121.patch

We should standardize the names of the methods we use to estimate the sizes of objects in memory and on disk (cf. discussion on dev@lucene: http://search-lucene.com/m/VbXSx1BP60G).
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010943#comment-14010943 ]

Eyal Zaidman commented on SOLR-6113:
------------------------------------

I like the idea of removing just the field name, because I agree it gives a better result when I look at it from the permission use case scenario - you get to search where you're allowed, and there's a chance your data is in the default search field. A literal search with the field name in that scenario would likely suffer from the original issue - there is very little chance the field name exists, as the user did not intend to look for it. I also agree that it's important to provide feedback that search results have been changed, but it seems to me that if the search client is adding a uf restriction, it should be the client's responsibility to inform the end user about that restriction? At least I can't come up with a good way for Solr to do that.

Edismax doesn't parse well the query uf (User Fields)
-----------------------------------------------------
Key: SOLR-6113
URL: https://issues.apache.org/jira/browse/SOLR-6113
Project: Solr
Issue Type: Improvement
Components: query parsers
Reporter: Liram Vardi
Priority: Minor
Labels: edismax

It seems that the Edismax User Fields feature does not behave as expected. For instance, assuming the following query:

_q=id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_

The parsed query (taken from the query debug info) is:

_+((id:b* (text:user) (text:anna collins))~1)_

I expect that because user was filtered out in uf (User Fields), the parsed query should not contain the user search part. In other words, the parsed query should look simply like this:

_+id:b*_

This issue is affected by the patch on issue SOLR-2649: when changing the default OP of Edismax to AND, the query results change.
[jira] [Updated] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed
[ https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5700:
---------------------------------
Attachment: LUCENE-5700.patch

Here is a patch:
- new oal.util.Accountable interface with a ramBytesUsed() method
- classes that had a ramBytesUsed/sizeInBytes method to estimate memory usage now implement this interface
- classes that had a sizeInBytes method to compute disk usage remained as-is

I think the tough question is what to do in case memory usage cannot be computed. Returning -1 would work, but we would need to make sure all consumers of this API handle that case properly... Since we don't have this issue now (all classes that implement the interface know how to do it), the documentation specifies that negative values are unsupported. Maybe we'll need to revisit it in the future in case the problem arises, but for now I think that is the simplest option?

Add 'accountable' interface for various ramBytesUsed
----------------------------------------------------
Key: LUCENE-5700
URL: https://issues.apache.org/jira/browse/LUCENE-5700
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5700.patch

Currently this is a disaster. There is ramBytesUsed(), sizeInBytes(), etc. everywhere, with zero consistency, little javadocs, and no structure. For example, look at LUCENE-5695, where we go back and forth on how to handle "don't know". I don't think we should add any more of these methods to any classes in Lucene until this has been cleaned up.
[jira] [Commented] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed
[ https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010984#comment-14010984 ]

Robert Muir commented on LUCENE-5700:
-------------------------------------

{quote}
Since we don't have this issue now (all classes that implement the interface know how to do it)
{quote}

Yeah, I don't think a class should implement the interface if it can't actually return a valid result.
[jira] [Commented] (LUCENE-5700) Add 'accountable' interface for various ramBytesUsed
[ https://issues.apache.org/jira/browse/LUCENE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010994#comment-14010994 ]

Dawid Weiss commented on LUCENE-5700:
-------------------------------------

I'm for throwing an exception. Either a class knows how to handle it or it shouldn't implement it (throw UnsupportedOperationException).
[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects
[ https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4556:
------------------------------------
Assignee: Michael McCandless (was: Simon Willnauer)

FuzzyTermsEnum creates tons of objects
--------------------------------------
Key: LUCENE-4556
URL: https://issues.apache.org/jira/browse/LUCENE-4556
Project: Lucene - Core
Issue Type: Improvement
Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Critical
Fix For: 4.9, 5.0
Attachments: LUCENE-4556.patch, LUCENE-4556.patch

I ran into this problem in production using the DirectSpellchecker. The number of objects created by the spellchecker shoots through the roof very quickly. We ran about 130 queries and ended up with 2M transitions/states. We spend 50% of the time in GC just because of transitions; other parts of the system behave just fine here. I talked quickly to Robert and gave a POC a shot, providing a LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case and build an array-based structure converted into UTF-8 directly, instead of going through the object-based APIs. This involved quite a bit of changes, but they are all package-private at this point. I have a patch that still has a fair set of nocommits, but it shows that it's possible and IMO worth the trouble to make this really usable in production. All tests pass with the patch - it's a start.
[jira] [Created] (LUCENE-5709) NorwegianPhoneticFilter
Jan Høydahl created LUCENE-5709:
-----------------------------------
Summary: NorwegianPhoneticFilter
Key: LUCENE-5709
URL: https://issues.apache.org/jira/browse/LUCENE-5709
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: Jan Høydahl
Priority: Minor
Fix For: 4.9

There has been a steady demand for a Norwegian phonetic normalization filter.
[jira] [Commented] (LUCENE-5709) NorwegianPhoneticFilter
[ https://issues.apache.org/jira/browse/LUCENE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011002#comment-14011002 ]

Jan Høydahl commented on LUCENE-5709:
-------------------------------------

One candidate could be https://github.com/kvalle/norphoname
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_05) - Build # 10405 - Still Failing!
java.lang.NullPointerException
    at __randomizedtesting.SeedInfo.seed([6330FD6A7351D588:22BBDD0F54EF26C7]:0)
    at org.apache.solr.SolrTestCaseJ4.recurseDelete(SolrTestCaseJ4.java:1018)

This looks like a JVM problem too?

1017: if (f.isDirectory()) {
1018:   for (File sub : f.listFiles()) {

Anyway, cleaned it up a bit.

Dawid
[jira] [Commented] (LUCENE-4556) FuzzyTermsEnum creates tons of objects
[ https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011014#comment-14011014 ]

Nik Everett commented on LUCENE-4556:
-------------------------------------

I'm having GC trouble and I'm using the DirectCandidateGenerator. It's obviously kind of hard to tell how much the automata are contributing in production, but when I try it locally, just generating the automata for two or three terms takes about 200KB of memory. Napkin math (200KB * 250 queries/second) says this makes about 50MB of garbage per second per index. Obviously it gets worse if you run this in a sharded context where each shard does the generating. Well, not really worse, but the large up-front cost and memory consumption of this process is relatively static regardless of shard size, so this becomes a reason to use larger shards.

I should propose that, in addition to Simon's patches, another option is to try to implement something like the stack-based automaton simulation that the Schulz & Mihov paper (the one that proposed the Levenshtein automaton) describes in section 6. It's not useful for stuff like intersecting the enums, but if you are willing to forgo that, you could probably get away with much less memory consumption.
[jira] [Commented] (LUCENE-5709) NorwegianPhoneticFilter
[ https://issues.apache.org/jira/browse/LUCENE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011018#comment-14011018 ]

Jan Høydahl commented on LUCENE-5709:
-------------------------------------

Question: do any of you have experience with Swedish / Danish phonetic normalization? Right now I'm a bit skeptical about trying to mash the three languages into one filter, but if they end up sharing most of the rules, it could be an option to make a Scandinavian filter parameterized with mode=no|se|dk|scandinavian. There are many use cases spanning content from the whole region which could benefit from a unified solution.
[jira] [Updated] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5680:
-------------------------------
Attachment: LUCENE-5680.patch

Patch ports all tests to use the atomic updates. This removed the complexity in TestIWExceptions that we've added recently, since now each set of updates is either atomically applied or not.

Allow updating multiple DocValues fields atomically
---------------------------------------------------
Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) - it would be good if we can allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
[jira] [Commented] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011030#comment-14011030 ]

Arcadius Ahouansou commented on SOLR-5285:
------------------------------------------

Hi [~varunthacker], I tried {code}facet=true&facet.field=content_type{code} The facet count for children was always 0. Is this a feature? Thanks.

Solr response format should support child Docs
----------------------------------------------
Key: SOLR-5285
URL: https://issues.apache.org/jira/browse/SOLR-5285
Project: Solr
Issue Type: New Feature
Reporter: Varun Thacker
Fix For: 4.9, 5.0
Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin

Solr has added support for taking childDocs as input (only XML till now). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching, which results in a hit on the parentDoc, its childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response.

Now given a docId one needs to find out all the childDoc ids. A couple of approaches I could think of are:
1. Maintain the relation between a parentDoc and its childDocs during indexing time, in maybe a separate index?
2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - given a parentDoc it finds out all the childDocs, but this requires a childScorer.

Am I missing something obvious on how to find the relation between a parentDoc and its childDocs, because none of the above solutions for this look right.
[jira] [Updated] (SOLR-6088) Add query re-ranking with the ReRankingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-6088:
---------------------------------
Attachment: SOLR-6088.patch

New patch supports the sort parameter for the main query and preserves query elevation (QueryElevationComponent).

Add query re-ranking with the ReRankingQParserPlugin
----------------------------------------------------
Key: SOLR-6088
URL: https://issues.apache.org/jira/browse/SOLR-6088
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Joel Bernstein
Attachments: SOLR-6088.patch, SOLR-6088.patch, SOLR-6088.patch, SOLR-6088.patch

This ticket introduces the ReRankingQParserPlugin, which adds query re-ranking/rescoring for Solr. It leverages the new RankQuery framework to plug in the new Lucene QueryRescorer. See ticket LUCENE-5489 for details on the use case. Sample syntax:

{code}
q={!rerank mainQuery=$qq reRankQuery=$rqq reRankDocs=200}
{code}

In the example above, the mainQuery is executed, 200 docs are collected, and they are re-ranked based on the results of the reRankQuery.
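The sample syntax above can be exercised from Python as follows - a sketch against a hypothetical local core, with made-up qq/rqq queries, following the syntax shown in this ticket:

{code}
import requests

params = {
    "q": "{!rerank mainQuery=$qq reRankQuery=$rqq reRankDocs=200}",
    "qq": "title:lucene",     # main query: collects the top 200 docs
    "rqq": "body:rescoring",  # re-rank query applied to those 200 docs
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/collection1/select",
                    params=params).json()
for doc in resp["response"]["docs"]:
    print(doc)
{code}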
[jira] [Created] (SOLR-6115) Cleanup enum/string action types in Overseer, OverseerCollectionProcessor and CollectionHandler
Shalin Shekhar Mangar created SOLR-6115:
-------------------------------------------
Summary: Cleanup enum/string action types in Overseer, OverseerCollectionProcessor and CollectionHandler
Key: SOLR-6115
URL: https://issues.apache.org/jira/browse/SOLR-6115
Project: Solr
Issue Type: Task
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Minor
Fix For: 4.9, 5.0

The enum/string handling for actions in Overseer and OCP is a mess. We should fix it.

From: https://issues.apache.org/jira/browse/SOLR-5466?focusedCommentId=13918059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13918059

{quote}
I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters; it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though...
{quote}
[jira] [Created] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException
Lee Hinman created LUCENE-5710:
----------------------------------
Summary: DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException
Key: LUCENE-5710
URL: https://issues.apache.org/jira/browse/LUCENE-5710
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 4.8.1
Reporter: Lee Hinman
Priority: Minor

In DefaultIndexingChain, when a MaxBytesLengthExceededException is caught, the original message is discarded; however, the message contains useful information, like the size that exceeded the limit. Lucene should include this information in the newly thrown IllegalArgumentException.
[jira] [Updated] (SOLR-6116) DocRouter.getDocRouter accepts routerName as an Object
[ https://issues.apache.org/jira/browse/SOLR-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6116:
----------------------------------------
Description:

Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {...}
{code}
There's really no reason not to accept a routerName as a string.

was:

Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {...}
{code}
There's really no reason to accept a routerName as a string.

DocRouter.getDocRouter accepts routerName as an Object
------------------------------------------------------
Key: SOLR-6116
URL: https://issues.apache.org/jira/browse/SOLR-6116
Project: Solr
Issue Type: Task
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Trivial
Fix For: 4.9, 5.0
[jira] [Created] (SOLR-6116) DocRouter.getDocRouter accepts routerName as an Object
Shalin Shekhar Mangar created SOLR-6116:
-------------------------------------------
Summary: DocRouter.getDocRouter accepts routerName as an Object
Key: SOLR-6116
URL: https://issues.apache.org/jira/browse/SOLR-6116
Project: Solr
Issue Type: Task
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Trivial
Fix For: 4.9, 5.0

Refactor DocRouter.java from:
{code}
public static DocRouter getDocRouter(Object routerName) {...}
{code}
to:
{code}
public static DocRouter getDocRouter(String routerName) {...}
{code}
There's really no reason to accept a routerName as a string.
[jira] [Updated] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException
[ https://issues.apache.org/jira/browse/LUCENE-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lee Hinman updated LUCENE-5710:
-------------------------------
Attachment: LUCENE-5710.patch

Attaching a patch that includes the original exception's message in the IllegalArgumentException message.
[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011085#comment-14011085 ] Simon Willnauer commented on LUCENE-5708: - Part of the problem is that we're holding IW state on MergePolicy and MergeScheduler. Both classes should get the IW passed to the relevant methods so we can share them across as many instances as we want... Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5708.patch, LUCENE-5708.patch We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's buggy today. I think we should just disallow sharing: you must make a new IWC for a new IndexWriter. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5711) Pass IW to MergePolicy
Simon Willnauer created LUCENE-5711: --- Summary: Pass IW to MergePolicy Key: LUCENE-5711 URL: https://issues.apache.org/jira/browse/LUCENE-5711 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Simon Willnauer Fix For: 4.9, 5.0 Related to LUCENE-5708, we keep state in the MP holding on to the IW, which prevents sharing the MP across index writers. Aside from this, we should really not keep state in the MP; it should only select merges without being bound to the index writer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
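To make the proposal concrete, a stateless MergePolicy might look roughly like the sketch below, with the writer passed per call instead of held as a field; the method shapes are illustrative guesses, not the committed API:
{code:java}
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MergePolicy.MergeSpecification;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

// Illustrative shape only: each method receives the IndexWriter as an
// argument, so the policy holds no per-writer state and a single instance
// could be shared across as many writers as desired.
abstract class StatelessMergePolicy {
  abstract MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
      IndexWriter writer) throws IOException;

  abstract MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount,
      Map<SegmentCommitInfo, Boolean> segmentsToMerge, IndexWriter writer) throws IOException;
}
{code}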
[jira] [Updated] (LUCENE-5711) Pass IW to MergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-5711: Attachment: LUCENE-5711.patch Here is a patch. Pass IW to MergePolicy -- Key: LUCENE-5711 URL: https://issues.apache.org/jira/browse/LUCENE-5711 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Simon Willnauer Fix For: 4.9, 5.0 Attachments: LUCENE-5711.patch Related to LUCENE-5708, we keep state in the MP holding on to the IW, which prevents sharing the MP across index writers. Aside from this, we should really not keep state in the MP; it should only select merges without being bound to the index writer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5711) Pass IW to MergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011093#comment-14011093 ] Michael McCandless commented on LUCENE-5711: +1 Pass IW to MergePolicy -- Key: LUCENE-5711 URL: https://issues.apache.org/jira/browse/LUCENE-5711 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Simon Willnauer Fix For: 4.9, 5.0 Attachments: LUCENE-5711.patch Related to LUCENE-5708, we keep state in the MP holding on to the IW, which prevents sharing the MP across index writers. Aside from this, we should really not keep state in the MP; it should only select merges without being bound to the index writer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5710) DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException
[ https://issues.apache.org/jira/browse/LUCENE-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lee Hinman updated LUCENE-5710: --- Attachment: LUCENE-5710.patch Patch that includes the exception as the cause parameter for IllegalArgumentException. DefaultIndexingChain swallows useful information from MaxBytesLengthExceededException - Key: LUCENE-5710 URL: https://issues.apache.org/jira/browse/LUCENE-5710 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.8.1 Reporter: Lee Hinman Priority: Minor Attachments: LUCENE-5710.patch, LUCENE-5710.patch In DefaultIndexingChain, when a MaxBytesLengthExceededException is caught, the original message is discarded; however, that message contains useful information such as the size that exceeded the limit. Lucene should include this information in the newly thrown IllegalArgumentException. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5711) Pass IW to MergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011098#comment-14011098 ] Shai Erera commented on LUCENE-5711: Note that this was changed in LUCENE-1763. At the time there were issues with some MPs that didn't handle it well, but I don't remember what issues. It also helped clean up the API, since I think most apps don't share an MP between writers. But then again, most people also don't write their own MPs, or interact with one directly, so the API is less of an issue. If it works and allows sharing an MP between writers more easily, let's go for it. Pass IW to MergePolicy -- Key: LUCENE-5711 URL: https://issues.apache.org/jira/browse/LUCENE-5711 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Simon Willnauer Fix For: 4.9, 5.0 Attachments: LUCENE-5711.patch Related to LUCENE-5708, we keep state in the MP holding on to the IW, which prevents sharing the MP across index writers. Aside from this, we should really not keep state in the MP; it should only select merges without being bound to the index writer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011217#comment-14011217 ] Yonik Seeley commented on SOLR-5463: bq. But sorting docs by id is not acceptable in 99% of use cases, especially in Solr, where it is more expected to get results sorted by relevance. It's only a tiebreak by id that is needed. So sort=score desc, id asc is fine. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.7, 5.0 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
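For anyone who wants to drive the cursor from SolrJ rather than raw HTTP, a minimal client loop looks roughly like this; the URL, collection, and query are placeholders, and it assumes the 4.7+ SolrJ API (CursorMarkParams, QueryResponse.getNextCursorMark()):
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);
    q.setSort(SolrQuery.SortClause.desc("score"));
    q.addSort(SolrQuery.SortClause.asc("id")); // uniqueKey tiebreaker is required
    String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = server.query(q);
      // ... process rsp.getResults() for this page ...
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) break; // mark stopped changing: done
      cursorMark = next;
    }
  }
}
{code}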
[JENKINS] Lucene-Solr-SmokeRelease-4.8 - Build # 4 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.8/4/ No tests ran. Build Log: [...truncated 52992 lines...] prepare-release-no-sign: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease [copy] Copying 431 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/lucene [copy] Copying 239 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/solr [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7 [exec] NOTE: output encoding is US-ASCII [exec] [exec] Load release URL file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/... [exec] [exec] Test Lucene... [exec] test basics... [exec] get KEYS [exec] 0.1 MB in 0.01 sec (12.4 MB/sec) [exec] check changes HTML... [exec] download lucene-4.8.0-src.tgz... [exec] 27.5 MB in 0.04 sec (668.5 MB/sec) [exec] verify md5/sha1 digests [exec] download lucene-4.8.0.tgz... [exec] 61.3 MB in 0.09 sec (692.9 MB/sec) [exec] verify md5/sha1 digests [exec] download lucene-4.8.0.zip... [exec] 70.9 MB in 0.08 sec (934.7 MB/sec) [exec] verify md5/sha1 digests [exec] unpack lucene-4.8.0.tgz... [exec] verify JAR metadata/identity/no javax.* or java.* classes... [exec] test demo with 1.7... [exec] got 5687 hits for query lucene [exec] check Lucene's javadoc JAR [exec] unpack lucene-4.8.0.zip... [exec] verify JAR metadata/identity/no javax.* or java.* classes... [exec] test demo with 1.7... [exec] got 5687 hits for query lucene [exec] check Lucene's javadoc JAR [exec] unpack lucene-4.8.0-src.tgz... [exec] make sure no JARs/WARs in src dist... [exec] run ant validate [exec] run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket -Dtests.disableHdfs=true'... [exec] test demo with 1.7... [exec] got 249 hits for query lucene [exec] generate javadocs w/ Java 7... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [exec] [exec] Test Solr... [exec] test basics... [exec] get KEYS [exec] 0.1 MB in 0.00 sec (77.6 MB/sec) [exec] check changes HTML... 
[exec] Traceback (most recent call last): [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1347, in <module> [exec] main() [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1291, in main [exec] smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 1333, in smokeTest [exec] checkSigs('solr', solrPath, version, tmpDir, isSigned) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 410, in checkSigs [exec] testChanges(project, version, changesURL) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 458, in testChanges [exec] checkChangesContent(s, version, changesURL, project, True) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/dev-tools/scripts/smokeTestRelease.py", line 485, in checkChangesContent [exec] raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' % (name, m.group(1))) [exec] RuntimeError: incorrect issue (_ instead of -) in file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/lucene/build/fakeRelease/solr/changes/Changes.html: SOLR_6029 BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.8/build.xml:387: exec returned: 1 Total time: 53 minutes 57 seconds Build step 'Invoke Ant' marked build as failure Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011226#comment-14011226 ] Alexander S. commented on SOLR-5463: Oh, that's awesome, thanks for the tip. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.7, 5.0 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011244#comment-14011244 ] David Smiley commented on SOLR-5463: I think Solr could be more user-friendly here by auto-adding the , id asc if it's not there. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.7, 5.0 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly
[ https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6021: Attachment: SOLR-6021.patch Changes: # Added a getter for coreConfigService in CoreContainer # Overseer reads config/schema to get the unique key field name if the router is not implicit and router.field is not specified # Added a test in TestCollectionAPI I initially wanted to do this in OverseerCollectionProcessor, but then you can skip that completely if you're creating collections through the core admin API. SolrJ doesn't need any changes because if a router.field is configured, the idField and its value are not used at all. Always persist router.field in cluster state so CloudSolrServer can route documents correctly - Key: SOLR-6021 URL: https://issues.apache.org/jira/browse/SOLR-6021 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Attachments: SOLR-6021.patch CloudSolrServer has idField as id which is used for hashing and distributing documents. There is a setter to change it as well. IMO, we should use the correct uniqueKey automatically. I propose that we start storing router.field always in cluster state and set it to the uniqueKey field name by default. Then CloudSolrServer would not need to assume an id field by default. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.
[ https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011291#comment-14011291 ] David Smiley commented on SOLR-4502: This is annoying; it's bitten me twice now. I think if this is tackled, we should step back a bit and think about how to make it easier to create an EmbeddedSolrServer considering that an ESS is limited to a single SolrCore. In light of that, the user shouldn't have to be bothered with a CoreContainer, and I argue not with a SolrConfig object either. They should be able to point to a SolrCore's instance directory, _and that's it_. Anything else is superfluous ceremony. FYI I'm creating my CoreContainer like this, before I then do all the other stuff:
{code:java}
public static SolrServer createEmbeddedSolr(final String instanceDir) throws Exception {
  final String coreName = new File(instanceDir).getName();
  final String dataDir = instanceDir + "/../../cores_data/" + coreName; // or use null for default
  // note: this is more complex than it should be. See SOLR-4502
  SolrResourceLoader resourceLoader = new SolrResourceLoader(instanceDir);
  CoreContainer container = new CoreContainer(resourceLoader,
      ConfigSolr.fromString(resourceLoader, "<solr/>"));
  container.load();
  CoreDescriptor descriptor = new CoreDescriptor(container, coreName, instanceDir);
  SolrConfig config = new SolrConfig(instanceDir, descriptor.getConfigName(), null);
  SolrCore core = new SolrCore(coreName, dataDir, config, null, descriptor);
  container.register(core, false);
  return new EmbeddedSolrServer(container, core.getName());
}
{code}
ShardHandlerFactory not initialized in CoreContainer when creating a Core manually. --- Key: SOLR-4502 URL: https://issues.apache.org/jira/browse/SOLR-4502 Project: Solr Issue Type: Bug Affects Versions: 4.1 Reporter: Michael Aspetsberger Assignee: Mark Miller Labels: NPE Fix For: 4.9, 5.0 We are using an embedded solr server for our unit testing purposes. In our scenario, we create a {{CoreContainer}} using only the solr-home path, and then create the cores manually using a {{CoreDescriptor}}. While the creation appears to work fine, it hits an NPE when it handles the search: {quote} Caused by: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) {quote} According to http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E , this is due to a missing {{CoreContainer#load}}. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
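Assuming a helper like the one above, using it from a test might look like this short fragment (the instance directory path is a placeholder):
{code:java}
// Hypothetical usage of the helper above inside a test setup.
SolrServer solr = createEmbeddedSolr("/path/to/solr-home/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
solr.add(doc);
solr.commit();
{code}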
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011302#comment-14011302 ] Noble Paul commented on SOLR-6091: -- I have fixed this, and I have added logging whenever a race condition occurs. So the logs say that the race condition occurs. But after a few (10+) restarts I could still end up without an Overseer. SOLR-6095 is another issue I have identified and fixed for local testing. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011302#comment-14011302 ] Noble Paul edited comment on SOLR-6091 at 5/28/14 4:50 PM: --- I have fixed this, and I have added logging whenever a race condition occurs. So the logs say that the race condition occurs. But after a few (10+) restarts I could still end up without an Overseer. SOLR-6095 is another issue I have identified and fixed for local testing. So these 2 issues together have resolved the problem. was (Author: noble.paul): I have fixed this, and I have added logging whenever a race condition occurs. So the logs say that the race condition occurs. But after a few (10+) restarts I could still end up without an Overseer. SOLR-6095 is another issue I have identified and fixed for local testing. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011302#comment-14011302 ] Noble Paul edited comment on SOLR-6091 at 5/28/14 4:53 PM: --- I have fixed this, and I have added logging whenever a race condition occurs. So the logs say that the race condition occurs. But after a few (10+) restarts I could still end up without an Overseer. SOLR-6095 is another issue I have identified and fixed for local testing. So these 2 issues together have resolved the problem. I have patches for both and will post them once some tests are added. was (Author: noble.paul): I have fixed this, and I have added logging whenever a race condition occurs. So the logs say that the race condition occurs. But after a few (10+) restarts I could still end up without an Overseer. SOLR-6095 is another issue I have identified and fixed for local testing. So these 2 issues together have resolved the problem. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6117) Replication command=fetchindex always returns success.
Raintung Li created SOLR-6117: - Summary: Replication command=fetchindex always returns success. Key: SOLR-6117 URL: https://issues.apache.org/jira/browse/SOLR-6117 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.6 Reporter: Raintung Li The Replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response. The API should return the right status, especially when the WAIT parameter is true (synchronous). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
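In code terms, the reporter seems to be asking for something along these lines inside the handler; this is a guess at the shape of the fix with hypothetical names, not the attached patch:
{code:java}
// Illustrative sketch only: report the real outcome of a synchronous fetch
// instead of unconditionally answering OK.
boolean fetched = doFetch(solrParams); // hypothetical: true only if the fetch succeeded
if (fetched) {
  rsp.add("status", "OK");
} else {
  rsp.add("status", "ERROR");
  rsp.add("message", "index fetch failed; see logs for the underlying cause");
}
{code}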
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-6117: -- Attachment: SOLR-6117.txt Replication command=fetchindex always returns success. - Key: SOLR-6117 URL: https://issues.apache.org/jira/browse/SOLR-6117 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.6 Reporter: Raintung Li Attachments: SOLR-6117.txt The Replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response. The API should return the right status, especially when the WAIT parameter is true (synchronous). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-6117: -- Attachment: SOLR-6117.txt Replication command=fetchindex always returns success. - Key: SOLR-6117 URL: https://issues.apache.org/jira/browse/SOLR-6117 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.6 Reporter: Raintung Li Attachments: SOLR-6117.txt The Replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response. The API should return the right status, especially when the WAIT parameter is true (synchronous). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always returns success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-6117: -- Attachment: (was: SOLR-6117.txt) Replication command=fetchindex always returns success. - Key: SOLR-6117 URL: https://issues.apache.org/jira/browse/SOLR-6117 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.6 Reporter: Raintung Li Attachments: SOLR-6117.txt The Replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response. The API should return the right status, especially when the WAIT parameter is true (synchronous). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011384#comment-14011384 ] Hoss Man commented on SOLR-5463: bq. I think Solr could be more user-friendly here by auto-adding the , id asc if it's not there. The reason the code currently throws an error was because I figured it was better to force the user to choose which tie breaker they wanted (asc vs desc) than to just magically pick one arbitrarily. If folks think a magic default is better, I've got no serious objections -- just open a new issue. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.7, 5.0 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011402#comment-14011402 ] Peter Keegan commented on SOLR-5831: Hi Joel, I'm not sure why I didn't see this problem until now, but this PostFilter doesn't work after being cached. When the ScoreScaleFilter is retrieved from the cache, the docSet is null and a new PostFilter collector is created, but the Collector's 'setScorer' method isn't called. As a result, the 'collect' method throws an NPE (on scorer.score()). What do I need to do to keep the query from rerunning? Can the scorer be saved with the ScoreScaleFilter instead of the ScoreCollector? Thanks, Peter Scale score PostFilter -- Key: SOLR-5831 URL: https://issues.apache.org/jira/browse/SOLR-5831 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.7 Reporter: Peter Keegan Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch The ScaleScoreQParserPlugin is a PostFilter that performs score scaling. This is an alternative to using a function query wrapping a scale() wrapping a query(). For example: select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq} The problem with this query is that it has to scale every hit. Usually, only the returned hits need to be scaled, but there may be use cases where the number of hits to be scaled is greater than the returned hit count, but less than or equal to the total hit count. Sample syntax: fq={!scalescore l=0.0 u=1.0 maxscalehits=1 func=sum(product(sscore(),0.75),product(field(myfield),0.25))} l=0.0 u=1.0 // Scale scores to values between 0-1, inclusive maxscalehits=1 // The maximum number of result scores to scale (-1 = all hits, 0 = results 'page' size) func=... // Apply the composite function to each hit. The scaled score value is accessed by the 'score()' value source All parameters are optional. The defaults are: l=0.0 u=1.0 maxscalehits=0 (result window size) func=(null) Note: this patch is not complete, as it contains no test cases and may not conform to all the guidelines in http://wiki.apache.org/solr/HowToContribute. I would appreciate any feedback on the usability and implementation. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly
[ https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011403#comment-14011403 ] Alan Woodward commented on SOLR-6021: -
{code:java}
CoreContainer coreContainer = zkController.getCoreContainer();
ConfigSetService configService = coreContainer.getCoreConfigService();
CoreDescriptor dummy = new CoreDescriptor(coreContainer, collectionName, collectionName);
ConfigSet configSet = configService.getConfig(dummy);
{code}
This seems like a lot of ceremony just to get the ConfigSet for a collection - maybe it should be a method on zkController itself? Always persist router.field in cluster state so CloudSolrServer can route documents correctly - Key: SOLR-6021 URL: https://issues.apache.org/jira/browse/SOLR-6021 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Attachments: SOLR-6021.patch CloudSolrServer has idField as id which is used for hashing and distributing documents. There is a setter to change it as well. IMO, we should use the correct uniqueKey automatically. I propose that we start storing router.field always in cluster state and set it to the uniqueKey field name by default. Then CloudSolrServer would not need to assume an id field by default. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
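A rough sketch of the convenience method Alan is suggesting, with a hypothetical name, simply wrapping the ceremony quoted above inside ZkController:
{code:java}
// Hypothetical convenience method on ZkController; the body just mirrors
// the four-line ceremony quoted in the comment above.
public ConfigSet getConfigSetForCollection(String collectionName) {
  CoreContainer coreContainer = getCoreContainer();
  ConfigSetService configService = coreContainer.getCoreConfigService();
  // dummy descriptor whose name/instanceDir are only used to locate the config
  CoreDescriptor dummy = new CoreDescriptor(coreContainer, collectionName, collectionName);
  return configService.getConfig(dummy);
}
{code}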
[jira] [Created] (SOLR-6118) expand.sort doesn't work with function queries
David Smiley created SOLR-6118: -- Summary: expand.sort doesn't work with function queries Key: SOLR-6118 URL: https://issues.apache.org/jira/browse/SOLR-6118 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: David Smiley The new ExpandComponent doesn't support function queries in the {{expand.sort}} parameter, such as geodist(). Here's the stack trace if you try: {noformat} 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..) at org.apache.lucene.search.SortField.getComparator(SortField.java:433) at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:66) at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079) at org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.<init>(ExpandComponent.java:310) at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6021) Always persist router.field in cluster state so CloudSolrServer can route documents correctly
[ https://issues.apache.org/jira/browse/SOLR-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011411#comment-14011411 ] Shalin Shekhar Mangar commented on SOLR-6021: - Yeah, I don't like that either. I'll refactor it into a method in ZkController. Thanks, Alan. Always persist router.field in cluster state so CloudSolrServer can route documents correctly - Key: SOLR-6021 URL: https://issues.apache.org/jira/browse/SOLR-6021 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Attachments: SOLR-6021.patch CloudSolrServer has idField as id which is used for hashing and distributing documents. There is a setter to change it as well. IMO, we should use the correct uniqueKey automatically. I propose that we start storing router.field always in cluster state and set it to the uniqueKey field name by default. Then CloudSolrServer would not need to assume an id field by default. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011414#comment-14011414 ] Mikhail Khludnev commented on SOLR-5285: [~arcadius] it's SOLR-5743. Not much progress so far. I'm expecting some movement during this year. Solr response format should support child Docs -- Key: SOLR-5285 URL: https://issues.apache.org/jira/browse/SOLR-5285 Project: Solr Issue Type: New Feature Reporter: Varun Thacker Fix For: 4.9, 5.0 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin Solr has added support for taking childDocs as input (only XML till now). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching, which results in a hit on the parentDoc, its childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response. Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are: 1. Maintain the relation between a parentDoc and its childDocs during indexing time in maybe a separate index? 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs, but this requires a childScorer. Am I missing something obvious on how to find the relation between a parentDoc and its childDocs, because none of the above solutions look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned SOLR-6118: -- Assignee: David Smiley expand.sort doesn't work with function queries -- Key: SOLR-6118 URL: https://issues.apache.org/jira/browse/SOLR-6118 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: David Smiley Assignee: David Smiley The new ExpandComponent doesn't support function queries in the {{expand.sort}} parameter, such as geodist(). Here's the stack trace if you try: {noformat} 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..) at org.apache.lucene.search.SortField.getComparator(SortField.java:433) at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:66) at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079) at org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.<init>(ExpandComponent.java:310) at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-6118: --- Attachment: SOLR-6118_expand_sort_rewrite.patch Simple fix w/ test. I think I've been bitten by this type of bug before -- forgetting to call: {code:java} if (sort != null) sort = sort.rewrite(searcher); {code} It'd be nice if somehow Lucene's Sort.java maintained a rewritten boolean flag so that a bug like this would have been caught earlier during development. Maybe that's the solution, maybe not. I noticed some indentation/spacing problems in ExpandComponent.java. I'll fix them in a separate commit. expand.sort doesn't work with function queries -- Key: SOLR-6118 URL: https://issues.apache.org/jira/browse/SOLR-6118 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: David Smiley Assignee: David Smiley Attachments: SOLR-6118_expand_sort_rewrite.patch The new ExpandComponent doesn't support function queries in the {{expand.sort}} parameter, such as geodist(). Here's the stack trace if you try: {noformat} 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..) at org.apache.lucene.search.SortField.getComparator(SortField.java:433) at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:66) at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079) at org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.<init>(ExpandComponent.java:310) at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011560#comment-14011560 ] ASF subversion and git services commented on SOLR-6118: --- Commit 1598138 from [~dsmiley] in branch 'dev/trunk' [ https://svn.apache.org/r1598138 ] SOLR-6118: expand.sort bug for function queries; needed to sort.rewrite(searcher) expand.sort doesn't work with function queries -- Key: SOLR-6118 URL: https://issues.apache.org/jira/browse/SOLR-6118 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: David Smiley Assignee: David Smiley Attachments: SOLR-6118_expand_sort_rewrite.patch The new ExpandComponent doesn't support function queries in the {{expand.sort}} parameter, such as geodist(). Here's the stack trace if you try: {noformat} 527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..) at org.apache.lucene.search.SortField.getComparator(SortField.java:433) at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:66) at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133) at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079) at org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.<init>(ExpandComponent.java:310) at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011568#comment-14011568 ] Joel Bernstein commented on SOLR-5831: -- If your Query class implements PostFilter and ScoreFilter then the SolrIndexSearcher will make sure the scorer is present during docset retrieval. ScoreFilter is just a flag, no methods. It's little things like this that make me believe this would be better as a pluggable collector. SOLR-5973 is now committed. You can also see a pluggable collector example with SOLR-6088. Scale score PostFilter -- Key: SOLR-5831 URL: https://issues.apache.org/jira/browse/SOLR-5831 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.7 Reporter: Peter Keegan Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch The ScaleScoreQParserPlugin is a PostFilter that performs score scaling. This is an alternative to using a function query wrapping a scale() wrapping a query(). For example: select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq} The problem with this query is that it has to scale every hit. Usually, only the returned hits need to be scaled, but there may be use cases where the number of hits to be scaled is greater than the returned hit count, but less than or equal to the total hit count. Sample syntax: fq={!scalescore l=0.0 u=1.0 maxscalehits=1 func=sum(product(sscore(),0.75),product(field(myfield),0.25))} l=0.0 u=1.0 // Scale scores to values between 0-1, inclusive maxscalehits=1 // The maximum number of result scores to scale (-1 = all hits, 0 = results 'page' size) func=... // Apply the composite function to each hit. The scaled score value is accessed by the 'score()' value source All parameters are optional. The defaults are: l=0.0 u=1.0 maxscalehits=0 (result window size) func=(null) Note: this patch is not complete, as it contains no test cases and may not conform to all the guidelines in http://wiki.apache.org/solr/HowToContribute. I would appreciate any feedback on the usability and implementation. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
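A skeletal illustration of Joel's suggestion; PostFilter, ScoreFilter, ExtendedQueryBase and DelegatingCollector are the real org.apache.solr.search types, while the class itself and its trivial collector are placeholders, not the actual SOLR-5831 code:
{code:java}
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;
import org.apache.solr.search.ScoreFilter;

// Sketch: implementing ScoreFilter (a marker interface with no methods)
// alongside PostFilter signals SolrIndexSearcher to keep scores available
// during docset retrieval, so the collector's setScorer() is still called.
public class ScaleScorePostFilterSketch extends ExtendedQueryBase implements PostFilter, ScoreFilter {
  @Override
  public boolean getCache() { return false; } // post-filters typically opt out of caching

  @Override
  public int getCost() { return 101; } // cost > 100 marks this as a post-filter

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector(); // a real impl would scale scores here
  }
}
{code}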
[jira] [Commented] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.
[ https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011600#comment-14011600 ] Erick Erickson commented on SOLR-4502: -- bq: and I argue not with a SolrConfig object either Hmmm. What about running EmbeddedSolrServer as part of the MapReduceIndexerTool, especially when the configs are kept in ZooKeeper and the index is backed by HDFS? ...and the thigh bone is connected to the arm bone ShardHandlerFactory not initialized in CoreContainer when creating a Core manually. --- Key: SOLR-4502 URL: https://issues.apache.org/jira/browse/SOLR-4502 Project: Solr Issue Type: Bug Affects Versions: 4.1 Reporter: Michael Aspetsberger Assignee: Mark Miller Labels: NPE Fix For: 4.9, 5.0 We are using an embedded solr server for our unit testing purposes. In our scenario, we create a {{CoreContainer}} using only the solr-home path, and then create the cores manually using a {{CoreDescriptor}}. While the creation appears to work fine, it hits an NPE when it handles the search: {quote} Caused by: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) {quote} According to http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E , this is due to a missing {{CoreContainer#load}}. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011615#comment-14011615 ]

ASF subversion and git services commented on SOLR-6118:
--------------------------------------------------------

Commit 1598147 from [~dsmiley] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598147 ]

SOLR-6118: expand.sort bug for function queries; needed to sort.rewrite(searcher)

expand.sort doesn't work with function queries
----------------------------------------------

Key: SOLR-6118
URL: https://issues.apache.org/jira/browse/SOLR-6118
Project: Solr
Issue Type: Bug
Affects Versions: 4.8
Reporter: David Smiley
Assignee: David Smiley
Attachments: SOLR-6118_expand_sort_rewrite.patch

The new ExpandComponent doesn't support function queries in the {{expand.sort}} parameter, such as geodist() for example. Here's the stack trace if you try:

{noformat}
527561 [qtp1458849419-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..)
at org.apache.lucene.search.SortField.getComparator(SortField.java:433)
at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.init(FieldValueHitQueue.java:66)
at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:171)
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1133)
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
at org.apache.solr.handler.component.ExpandComponent$GroupExpandCollector.init(ExpandComponent.java:310)
at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:203)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
{noformat}
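The commit message points at the fix: rewrite the Sort against the current searcher before building the collector. A rough sketch of the pattern, assuming the Lucene 4.x TopFieldCollector signature (the method and variable names are illustrative, not the actual patch):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldCollector;

public class RewriteBeforeCollect {
  // 'sort' may contain function-query SortFields, e.g. from expand.sort=geodist() asc
  static TopFieldCollector newCollector(Sort sort, int limit, IndexSearcher searcher)
      throws IOException {
    // Resolves per-searcher state in the SortFields; skipping this step is what
    // produced the IllegalStateException in the stack trace above.
    Sort rewritten = sort.rewrite(searcher);
    return TopFieldCollector.create(rewritten, limit,
        false /* fillFields */, false /* trackDocScores */,
        false /* trackMaxScore */, true /* docsScoredInOrder */);
  }
}
{code}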
[jira] [Resolved] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved SOLR-6118.
--------------------------------
Resolution: Fixed
Fix Version/s: 5.0, 4.9
[jira] [Commented] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011635#comment-14011635 ]

Peter Keegan commented on SOLR-5831:
------------------------------------

Thanks, that was a simple fix. But if the results are coming from the cache, why does the PostFilter collection have to be rerun? I totally agree that there are a lot of little details that make it tricky to implement a PostFilter. We'll likely go to production with it for the short term, though, since we're running on 4.6.1. Can the pluggable collector framework be patched into 4.6.1? (When I looked at it a while ago, it didn't seem so.)

Peter
[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011670#comment-14011670 ]

David Smiley commented on SOLR-6118:
------------------------------------

FYI I also committed minor improvements pertaining to Java 5 generics.
[jira] [Created] (LUCENE-5712) Remove Similarity.queryNorm
Michael McCandless created LUCENE-5712:
---------------------------------------

Summary: Remove Similarity.queryNorm
Key: LUCENE-5712
URL: https://issues.apache.org/jira/browse/LUCENE-5712
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Michael McCandless
Fix For: 4.9, 5.0

This method is a no-op for ranking within one query, causes confusion for users making their own Similarity impls, and is unnecessary for, and makes it harder to switch the default to, more modern scoring models like BM25.
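To see why it's a no-op for ranking: queryNorm multiplies every hit's score by the same constant. A toy Similarity that disables it, as a sketch (not part of any patch on this issue):

{code:java}
import org.apache.lucene.search.similarities.DefaultSimilarity;

// Returning a constant changes the absolute score values but not the relative
// order of hits within a single query -- which is why the method can go away
// without affecting ranking.
public class NoQueryNormSimilarity extends DefaultSimilarity {
  @Override
  public float queryNorm(float sumOfSquaredWeights) {
    return 1.0f;
  }
}
{code}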
[jira] [Commented] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011723#comment-14011723 ]

Joel Bernstein commented on SOLR-5831:
--------------------------------------

The main DocSet isn't cached. So, if you pull a DocList from the QueryResultCache, Solr needs to regenerate the DocSet for faceting etc.
[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011969#comment-14011969 ]

ASF subversion and git services commented on SOLR-6118:
--------------------------------------------------------

Commit 1598193 from [~dsmiley] in branch 'dev/trunk' [ https://svn.apache.org/r1598193 ]

SOLR-6118: CHANGES.txt
[jira] [Commented] (SOLR-6118) expand.sort doesn't work with function queries
[ https://issues.apache.org/jira/browse/SOLR-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011972#comment-14011972 ]

ASF subversion and git services commented on SOLR-6118:
--------------------------------------------------------

Commit 1598194 from [~dsmiley] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598194 ]

Merged from trunk SOLR-6118: CHANGES.txt [from revision 1598193]
[jira] [Commented] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup
[ https://issues.apache.org/jira/browse/SOLR-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011974#comment-14011974 ]

David Smiley commented on SOLR-5868:
------------------------------------

There's a lingering entry for this in CHANGES.txt on the 4x branch, below the Getting Started section; probably as a result of a merge problem. If this issue isn't resolved then the entry should be removed; if it is resolved then the entry should be moved to the right spot.

HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup
--------------------------------------------------------------------------------------------------

Key: SOLR-5868
URL: https://issues.apache.org/jira/browse/SOLR-5868
Project: Solr
Issue Type: Improvement
Affects Versions: 4.7
Reporter: Steve Davids
Assignee: Mark Miller
Fix For: 4.9, 5.0
Attachments: SOLR-5868.patch, SOLR-5868.patch

The default HttpClient hostname verifier is the BROWSER_COMPATIBLE_HOSTNAME_VERIFIER, which verifies that the hostname being connected to matches the hostname presented within the certificate. This is meant to protect clients that are making external requests out across the internet, but requests within the Solr cluster should be trusted and can be relaxed to simplify the SSL/certificate setup process.
[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment
[ https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012035#comment-14012035 ]

ASF subversion and git services commented on SOLR-6029:
--------------------------------------------------------

Commit 1598196 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598196 ]

SOLR-6029: fix smoke test failure

CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment
--------------------------------------------------------------------------------------------------------------

Key: SOLR-6029
URL: https://issues.apache.org/jira/browse/SOLR-6029
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.7.1
Reporter: Greg Harris
Assignee: Joel Bernstein
Priority: Minor
Fix For: 4.8.1, 4.9
Attachments: SOLR-6029.patch

CollapsingQParserPlugin misidentifies whether a document is found in a segment if the docid previously existed in the segment, i.e. was deleted. The relevant bit of CollapsingQParserPlugin needs to be changed from:

-if(doc != -1) {
+if((doc != -1) && (doc != DocsEnum.NO_MORE_DOCS)) {

What happens is that if the doc is not found, the returned value is DocsEnum.NO_MORE_DOCS. This would then get set in the fq bitSet array as the doc location, causing an ArrayIndexOutOfBoundsException, as the array is only as big as maxDocs.
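The guard in context, as a hedged sketch (the surrounding lookup code is paraphrased, not copied from the plugin; names are illustrative):

{code:java}
import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.FixedBitSet;

public class ElevatedDocGuard {
  // 'bits' is sized to maxDoc; 'docsEnum' comes from looking up an elevated document.
  static void markIfPresent(DocsEnum docsEnum, FixedBitSet bits) throws IOException {
    int doc = (docsEnum == null) ? -1 : docsEnum.nextDoc();
    // nextDoc()/advance() return NO_MORE_DOCS (Integer.MAX_VALUE) when the doc
    // was deleted from the segment; setting that value in a maxDoc-sized bitset
    // is the ArrayIndexOutOfBoundsException described above.
    if (doc != -1 && doc != DocIdSetIterator.NO_MORE_DOCS) {
      bits.set(doc);
    }
  }
}
{code}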
[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment
[ https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012034#comment-14012034 ]

ASF subversion and git services commented on SOLR-6029:
--------------------------------------------------------

Commit 1598195 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1598195 ]

SOLR-6029: fix smoke test failure
Request for discussion: Solr plugins
Hi,

I would like to (re-)initiate a discussion about Solr support for plugin life-cycle (publish, discover, download, dependency management). Triggered by a discussion on the Solr mailing list: http://search-lucene.com/m/QTPaIv50e1&subj=Re+Contribute+QParserPlugin

My main points:

1) Plugins/Modules/packages seem to be a very core part of most modern projects.
2) Solr is extremely modular on the implementation level.
3) The community has been slowly building various plugins for Solr, but without any way to announce/share them.
4) ElasticSearch has plugins, and it's been consistently pointed out as a positive point (good on them).
5) Solr, frankly, is getting rather pudgy. Or possibly beyond mere pudgy. This is becoming especially noticeable by comparison with ElasticSearch, but also with the increasing frequency of releases. I mentioned this issue a couple of times in the past under different angles (bundling Javadoc, compressing files, easy onboarding, etc.).
6) Solr has so many features now that nobody will explore them all; yet people still download them and - sometimes - get confused by all the directories, jars and locations. Polish support alone has a significant number of jars (not sure about file sizes). I am not even talking about map-reduce+morphline in the recent release.
7) Solr is already published as a set of Maven jars with dependencies expressed between components.
8) Apart from making initial downloads smaller, having a proper module system (publish, discover, download, install) provides incentives for people to push packages out and creates a stronger community.

Now, I know that some of the weight might be addressed in Solr 5 by not bundling the war file as well as the libs. And some of the ElasticSearch comparison is due to the different philosophical approach (kitchen sink vs. not even bundling the Admin UI). And any individual person does not download Solr packages that often (I might be an exception). But I still think we need the discussion.

I would especially love to see a discussion of the lowest hanging fruit. Even if we cannot decompose Solr itself right now, maybe we can introduce an additional package handling mechanism and then retrofit Solr into it. In terms of skin-in-the-game, I would be happy to build and operate a package publish/discovery system/website if Solr were actually able to support one. :-)

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
[jira] [Commented] (SOLR-6062) Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather than being combined in a single dismax query)
[ https://issues.apache.org/jira/browse/SOLR-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012039#comment-14012039 ]

Ron Mayer commented on SOLR-6062:
---------------------------------

Regarding the original/linked issue: it allowed the same field to be passed through a pf parameter with differing slop values, the intent being that those scores would be combined rather than the max being used across those fields. The observation that led to using the same field with different slop values was that a document was likely to be interesting if either: many of the words in searched clauses were in the same paragraph (a pretty large slop value); or many pairs of words from search clauses were in the same adjective/noun clauses of the text (a quite small slop value, to make a search for 'old hairy cat' rank well against 'hairy old cat').

If I understand right, it sounds to me like what Michael described would continue to be good for those cases. I'm traveling this week, but have some test cases comparing the ranking of SOLR-2058 vs. human-sorted documents that I can run when I'm back Thursday of next week.

Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather than being combined in a single dismax query)
------------------------------------------------------------------------------------------------------------------------------------------------------

Key: SOLR-6062
URL: https://issues.apache.org/jira/browse/SOLR-6062
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
Attachments: combined-phrased-dismax.patch

https://issues.apache.org/jira/browse/SOLR-2058 subtly changed how phrase queries, created through the pf, pf2 and pf3 parameters, are merged into the main user query. For the query 'term1 term2' with pf2:[field1, field2, field3] we now get (omitting the non-phrase query section for clarity):

{code:java}
main query
DisjunctionMaxQuery((field1:"term1 term2"^1.0)~0.1)
DisjunctionMaxQuery((field2:"term1 term2"^1.0)~0.1)
DisjunctionMaxQuery((field3:"term1 term2"^1.0)~0.1)
{code}

Prior to this change, we had:

{code:java}
main query
DisjunctionMaxQuery((field1:"term1 term2"^1.0 | field2:"term1 term2"^1.0 | field3:"term1 term2"^1.0)~0.1)
{code}

The upshot being that if the phrase query "term1 term2" appears in multiple fields, it will get a significant boost over the previous implementation.
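For readers less familiar with the query shapes above: the pre-SOLR-2058 form corresponds to one DisjunctionMaxQuery whose clauses are the per-field phrase queries, so a phrase matching several fields scores max-plus-tiebreak rather than a sum. A rough Lucene 4.x sketch of building that combined form (class and method names are illustrative, not Solr's actual parser code):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.PhraseQuery;

public class CombinedPhraseDismax {
  static DisjunctionMaxQuery combined(String[] fields, String t1, String t2) {
    DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f); // 0.1 = tie breaker
    for (String field : fields) {
      PhraseQuery pq = new PhraseQuery();
      pq.add(new Term(field, t1));
      pq.add(new Term(field, t2));
      dmq.add(pq); // one clause per field; score = max clause + tie * the rest
    }
    return dmq;
  }
}
{code}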
[jira] [Created] (LUCENE-5713) DocValues related CheckIndex test failure
David Smiley created LUCENE-5713:
---------------------------------

Summary: DocValues related CheckIndex test failure
Key: LUCENE-5713
URL: https://issues.apache.org/jira/browse/LUCENE-5713
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 5.0
Reporter: David Smiley

The following reproduces for me and [~varunshenoy] on trunk:

lucene/spatial % ant test -Dtestcase=SpatialOpRecursivePrefixTreeTest -Dtests.method=testContains -Dtests.seed=3AD27D1EB168088A

{noformat}
[junit4]   1> Strategy: RecursivePrefixTreeStrategy(SPG:(GeohashPrefixTree(maxLevels:2,ctx:SpatialContext.GEO)))
[junit4]   1> CheckReader failed
[junit4]   1> test: field norms.........OK [0 fields]
[junit4]   1> test: terms, freq, prox...OK [207 terms; 208 terms/docs pairs; 0 tokens]
[junit4]   1> test: stored fields.......OK [8 total field count; avg 2 fields per doc]
[junit4]   1> test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
[junit4]   1> test: docvalues...........ERROR [dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0]
[junit4]   1> java.lang.RuntimeException: dv for field: SpatialOpRecursivePrefixTreeTest has -1 ord but is not marked missing for doc: 0
[junit4]   1>   at org.apache.lucene.index.CheckIndex.checkSortedDocValues(CheckIndex.java:1414)
[junit4]   1>   at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1536)
[junit4]   1>   at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1367)
[junit4]   1>   at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:229)
[junit4]   1>   at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:216)
[junit4]   1>   at org.apache.lucene.util.LuceneTestCase.newSearcher(LuceneTestCase.java:1597)
{noformat}

A 1-in-500 random condition to check the index on newSearcher was hit, triggering this. DocValues used to not be enabled for this spatial test, but [~rcmuir] added it recently as part of the move to the DocValues API in lieu of the FieldCache API, and because the DisjointSpatialFilter uses getDocsWithField (though nothing else). That probably doesn't have anything to do with whatever the problem here is, though.
[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012065#comment-14012065 ]

Shai Erera commented on LUCENE-5680:
------------------------------------

I think it's ready, so if there are no objections I'll commit later today.

Allow updating multiple DocValues fields atomically
---------------------------------------------------

Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we can allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
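For context, a sketch of the kind of atomic multi-field update this issue enables, assuming the IndexWriter.updateDocValues(Term, Field...) method the patch introduces (the field names here are illustrative, mirroring the field/control-field pattern from the tests):

{code:java}
import java.io.IOException;

import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class AtomicDvUpdate {
  // Both fields are applied as one update, so readers never observe one
  // field's new value without the other's.
  static void bump(IndexWriter writer, String id, long value) throws IOException {
    writer.updateDocValues(new Term("id", id),
        new NumericDocValuesField("counter", value),
        new NumericDocValuesField("counter_control", value));
  }
}
{code}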
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012084#comment-14012084 ]

Alexander S. commented on SOLR-5463:
------------------------------------

If, as David mentioned, Solr will add it only if it is not there, this should keep the ability for users to manually specify another key and order when that is required (a rare case, it seems).

Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
---------------------------------------------------------------------------------------------------

Key: SOLR-5463
URL: https://issues.apache.org/jira/browse/SOLR-5463
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 4.7, 5.0
Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch

I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.

I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie breaker)
** N can be any number you want per page
** start must be 0
** \* denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need
{panel}
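The basic-usage loop above maps directly onto SolrJ. A sketch against the 4.7+ client API (the query, sort field, and page size are placeholders):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class DeepPager {
  static void pageAll(SolrServer server) throws Exception {
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);
    q.setSort(SolrQuery.SortClause.asc("id")); // uniqueKey as the tie breaker
    String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = server.query(q);
      // ... consume rsp.getResults() ...
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) {
        break; // mark stopped changing: no more results
      }
      cursorMark = next;
    }
  }
}
{code}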
[jira] [Created] (SOLR-6119) TestReplicationHandler attempts to remove open folders
Dawid Weiss created SOLR-6119:
------------------------------

Summary: TestReplicationHandler attempts to remove open folders
Key: SOLR-6119
URL: https://issues.apache.org/jira/browse/SOLR-6119
Project: Solr
Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor

TestReplicationHandler has weird logic around the 'snapDir' variable. It attempts to remove snapshot folders, even though they're not closed yet. My recent patch uncovered the bug, but I don't know how to fix it cleanly -- the test itself seems to be very fragile (for example, I don't understand the 'namedBackup' variable, which is always set to true, yet there are conditionals around it).
[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders
[ https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012096#comment-14012096 ]

ASF subversion and git services commented on SOLR-6119:
--------------------------------------------------------

Commit 1598206 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598206 ]

SOLR-6119: a quick workaround for the problem of removing files that are still open during the test.
[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders
[ https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012094#comment-14012094 ]

ASF subversion and git services commented on SOLR-6119:
--------------------------------------------------------

Commit 1598205 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1598205 ]

SOLR-6119: a quick workaround for the problem of removing files that are still open during the test.
[jira] [Commented] (LUCENE-5713) DocValues related CheckIndex test failure
[ https://issues.apache.org/jira/browse/LUCENE-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012101#comment-14012101 ]

Robert Muir commented on LUCENE-5713:
-------------------------------------

I did not add docvalues to the lucene/spatial module. It still uses the fieldcache. I hope it's clear that this is just a fieldcache bug :)
[jira] [Updated] (LUCENE-5713) FieldCache related test failure
[ https://issues.apache.org/jira/browse/LUCENE-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5713:
--------------------------------
Summary: FieldCache related test failure (was: DocValues related CheckIndex test failure)