[jira] Created: (LUCENE-1791) Enhance QueryUtils and CheckHits to wrap everything they check in MultiReader/MultiSearcher
Enhance QueryUtils and CheckHits to wrap everything they check in MultiReader/MultiSearcher
---
Key: LUCENE-1791
URL: https://issues.apache.org/jira/browse/LUCENE-1791
Project: Lucene - Java
Issue Type: Test
Reporter: Hoss Man

The methods in CheckHits and QueryUtils are in a good position to take any Searcher they are given and not only test it, but also test MultiReader and MultiSearcher constructs built around them.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime Meridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740439#action_12740439 ]

Bill Bell commented on LUCENE-1781:
---

Michael - Please rerun your tests. The two normalization functions are probably no longer needed, but they are there as an added check. I am using the algorithm from "Destination point given distance and bearing from start point" at http://www.movable-type.co.uk/scripts/latlong.html

Thanks.

Large distances in Spatial go beyond Prime Meridian
---
Key: LUCENE-1781
URL: https://issues.apache.org/jira/browse/LUCENE-1781
Project: Lucene - Java
Issue Type: Bug
Components: contrib/spatial
Affects Versions: 2.9
Environment: All
Reporter: Bill Bell
Assignee: Michael McCandless
Fix For: 3.1
Attachments: LLRect.java, LLRect.java, LUCENE-1781.patch

http://amidev.kaango.com/solr/core0/select?fl=*&json.nl=map&wt=json&radius=5000&rows=20&lat=39.5500507&q=honda&qt=geo&long=-105.7820674

I get an error from Solr when the bounding box calculated for the distance extends past 90 degrees of latitude.
Aug 4, 2009 1:54:00 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.IllegalArgumentException: Illegal lattitude value 93.1558669413734
    at org.apache.lucene.spatial.geometry.FloatLatLng.<init>(FloatLatLng.java:26)
    at org.apache.lucene.spatial.geometry.shape.LLRect.createBox(LLRect.java:93)
    at org.apache.lucene.spatial.tier.DistanceUtils.getBoundary(DistanceUtils.java:50)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoxShape(CartesianPolyFilterBuilder.java:47)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoundingArea(CartesianPolyFilterBuilder.java:109)
    at org.apache.lucene.spatial.tier.DistanceQueryBuilder.<init>(DistanceQueryBuilder.java:61)
    at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:151)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1328)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
    at java.lang.Thread.run(Thread.java:619)
[jira] Issue Comment Edited: (LUCENE-1781) Large distances in Spatial go beyond Prime Meridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740439#action_12740439 ]

Bill Bell edited comment on LUCENE-1781 at 8/7/09 1:03 AM:
---

Michael - Please rerun your tests. The two normalization functions are probably no longer needed, but they are there as an added check. This algorithm is standard; several web sites use it alongside the Haversine formula. One example is "Destination point given distance and bearing from start point" at http://www.movable-type.co.uk/scripts/latlong.html

Thanks.
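For reference, the spherical "destination point given distance and bearing" formula from the cited page can be sketched in Java roughly as follows. This is a minimal sketch, not the LLRect code in the patch: the class name `DestinationPoint`, its methods, and the mean earth radius of 3958.76 miles are assumptions. Because the latitude comes out of `Math.asin`, it cannot leave the ±90 degree range, and a path that goes over a pole shows up as a 180 degree change in longitude instead of an illegal latitude.

```java
/**
 * Hypothetical helper (not part of contrib/spatial): the spherical
 * "destination point" formula from http://www.movable-type.co.uk/scripts/latlong.html
 */
final class DestinationPoint {
    /** Assumed mean earth radius in miles. */
    static final double EARTH_RADIUS_MILES = 3958.76;

    private DestinationPoint() {}

    /**
     * Returns {lat, lng} in degrees of the point reached by travelling
     * distanceMiles from (latDeg, lngDeg) along the initial bearing
     * bearingDeg (0 = north, 90 = east).
     */
    static double[] destination(double latDeg, double lngDeg, double bearingDeg, double distanceMiles) {
        double lat1 = Math.toRadians(latDeg);
        double lng1 = Math.toRadians(lngDeg);
        double bearing = Math.toRadians(bearingDeg);
        double delta = distanceMiles / EARTH_RADIUS_MILES; // angular distance in radians

        // asin() keeps the latitude inside [-90, 90] degrees by construction.
        double lat2 = Math.asin(Math.sin(lat1) * Math.cos(delta)
                + Math.cos(lat1) * Math.sin(delta) * Math.cos(bearing));
        double lng2 = lng1 + Math.atan2(Math.sin(bearing) * Math.sin(delta) * Math.cos(lat1),
                Math.cos(delta) - Math.sin(lat1) * Math.sin(lat2));

        return new double[] { Math.toDegrees(lat2), normalizeLng(Math.toDegrees(lng2)) };
    }

    /** Wraps a longitude in degrees into [-180, 180). */
    static double normalizeLng(double lngDeg) {
        return ((lngDeg + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
    }
}
```

With the coordinates from the failing request, `destination(39.5500507, -105.7820674, 0, 5000)` goes over the north pole and lands at roughly latitude 68.08 with the longitude shifted by 180 degrees, rather than producing a latitude above 90.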
[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime Meridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740478#action_12740478 ]

Michael McCandless commented on LUCENE-1781:
---

Thanks for the updated patch, Bill! That's a good improvement (taking into account that the miles per degree of longitude vary with latitude), but isn't that fix orthogonal to the normalization issue? I.e., one could still easily overflow lat or lng with a large enough number of miles. E.g., I added 6000 miles as a test case in TestCartesian, and if I turn off the normalization, it hits the same exception (Illegal lattitude value 94.77745787739758).

And I'm still concerned that the normalization fails to properly cross the north (or south) pole, which requires flipping the lng whenever the lat is too high; instead it seems to incorrectly bounce the point back? Am I missing something?
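The pole-crossing concern can be illustrated with a small normalization routine: when a computed latitude overshoots ±90 degrees, the point has gone over the pole, so the latitude is reflected back and the longitude is shifted by 180 degrees, instead of only bouncing the latitude back. This is a sketch under assumptions: `PoleNormalizer`, `wrap180` and `normalize` are hypothetical names, and it works on plain degrees rather than the patch's FloatLatLng/LLRect types.

```java
/** Hypothetical sketch: fold an out-of-range lat/lng back onto the globe. */
final class PoleNormalizer {
    private PoleNormalizer() {}

    /** Wraps any angle in degrees into [-180, 180). */
    static double wrap180(double deg) {
        return ((deg + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
    }

    /** Returns {lat, lng} with lat in [-90, 90] and lng in [-180, 180). */
    static double[] normalize(double latDeg, double lngDeg) {
        double lat = wrap180(latDeg);
        double lng = lngDeg;
        if (lat > 90.0) {
            // Went over the north pole: come back down on the far side of the globe.
            lat = 180.0 - lat;
            lng += 180.0;
        } else if (lat < -90.0) {
            // Went over the south pole.
            lat = -180.0 - lat;
            lng += 180.0;
        }
        return new double[] { lat, wrap180(lng) };
    }
}
```

For example, the latitude from the reported exception, `normalize(93.1558669413734, -105.7820674)`, becomes about (86.844, 74.218): the latitude folds back below 90 and the longitude moves to the opposite meridian.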
[jira] Commented: (LUCENE-1782) Rename OriginalQueryParserHelper
[ https://issues.apache.org/jira/browse/LUCENE-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740481#action_12740481 ]

Michael McCandless commented on LUCENE-1782:
---

bq. I did not see the readme.txt for the StandardSyntaxParser.jj, but everything else looks good

It's README.javacc, under contrib/queryparser.

OK, I'll commit shortly!

Rename OriginalQueryParserHelper
---
Key: LUCENE-1782
URL: https://issues.apache.org/jira/browse/LUCENE-1782
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/*
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1782.patch, LUCENE-1782.patch

We should rename the new QueryParser so it's clearer that it's Lucene's default QueryParser going forward, and not just a temporary bridge to a future new QueryParser.

How about we rename oal.queryParser.original -> oal.queryParser.standard (can't use "default": it's a Java keyword)? Then leave the OriginalQueryParserHelper under that package, but simply rename it to QueryParser?

This way, if we create other sub-packages in the future, e.g. ComplexPhraseQueryParser, they too can have a QueryParser class under them, to make it clear that's the top class you use to parse queries.
[jira] Commented: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740482#action_12740482 ]

Robert Muir commented on LUCENE-1628:
---

I have been looking this over; I think this one is ready. Any comments/concerns?

Persian Analyzer
---
Key: LUCENE-1628
URL: https://issues.apache.org/jira/browse/LUCENE-1628
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt

A simple Persian analyzer. I measured TREC scores with the benchmark package against http://ece.ut.ac.ir/DBRG/Hamshahri/ :

SUMMARY              SimpleAnalyzer   PersianAnalyzer
Search Seconds       0.012            0.004
DocName Seconds      0.020            0.011
Num Points           981.015          987.692
Num Good Points      33.738           36.123
Max Good Points      36.185           36.185
Average Precision    0.374            0.481
MRR                  0.667            0.833
Recall               0.905            0.998
Precision At 1       0.585            0.754
Precision At 2       0.531            0.715
Precision At 3       0.513            0.646
Precision At 4       0.496            0.646
Precision At 5       0.486            0.631
Precision At 6       0.487            0.621
Precision At 7       0.479            0.593
Precision At 8       0.465            0.577
Precision At 9       0.458            0.573
Precision At 10      0.460            0.566
Precision At 11      0.453            0.572
Precision At 12      0.453            0.562
Precision At 13      0.445            0.554
Precision At 14      0.438            0.549
Precision At 15      0.438            0.542
Precision At 16      0.438            0.538
Precision At 17      0.429            0.533
Precision At 18      0.429            0.527
Precision At 19      0.419            0.525
Precision At 20      0.415            0.518
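To put the Persian Analyzer summary numbers in perspective, the relative change from SimpleAnalyzer to PersianAnalyzer can be computed directly from the reported figures. The helper below is a hypothetical name, not part of the benchmark package; it only does the arithmetic for a few headline metrics.

```java
/** Relative change between two benchmark runs (hypothetical helper). */
final class RelativeGain {
    private RelativeGain() {}

    /** Returns (after - before) / before, e.g. 0.25 for a 25% improvement. */
    static double gain(double before, double after) {
        return (after - before) / before;
    }

    public static void main(String[] args) {
        String[] metrics = { "Average Precision", "MRR", "Recall", "Precision At 10" };
        double[] simple = { 0.374, 0.667, 0.905, 0.460 };   // SimpleAnalyzer
        double[] persian = { 0.481, 0.833, 0.998, 0.566 };  // PersianAnalyzer
        for (int i = 0; i < metrics.length; i++) {
            System.out.println(String.format("%-18s %+.1f%%", metrics[i], 100 * gain(simple[i], persian[i])));
        }
    }
}
```

Running `main` prints gains of about +28.6% for average precision, +24.9% for MRR, +10.3% for recall and +23.0% for precision at 10.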
[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740492#action_12740492 ]

Michael McCandless commented on LUCENE-1789:
---

It is nice that DocValues gives us the freedom to do this, but I'm not sure we should, because it's a sizable performance trap. I.e., we'll be silently inserting a call to ReaderUtil.subSearcher on every doc value lookup (vs. previously, when it was a single top-level array lookup). While client code that has relied on this in the past will nicely continue to function properly if we make this change, its performance is going to silently take a [possibly sizable] hit.

In general, with Lucene, we can do the per-segment switching up high (which is what the core now does, exclusively), or we can do it down low (creating MultiTermDocs, MultiTermEnum, MultiTermPositions, MultiDocValues, etc.), which has sizable performance costs. It's also costly for us because we'll have N different places where we must create and maintain a MultiXXX class. I would love to someday deprecate all of the down low switching classes :)

In the core I think we should always switch up high. We've already done this w/ searching and collection/sorting. In LUCENE-1771 we're fixing IndexSearcher.explain to do so as well. With external code, I'd like over time to strongly encourage only switching up high as well.

Maybe it'd be best if we could somehow allow this down low switching for 2.9, but 1) warn that you'll see a performance hit right off, 2) deprecate it, and 3) somehow state that in 3.0 you'll have to send only a SegmentReader to this API, instead.

E.g., imagine an app that created an external custom HitCollector that calls, say, FloatFieldSource on the top reader in order to use a float value per doc in each collect() call.
On upgrading to 2.9, this app will already have to make the switch to the Collector API, which'd be a great time for them to also switch to pulling these float values per segment. But if we make the proposed change here, the app could in fact just keep working off the top-level values (e.g., if the ctor in their class is pulling these values), thinking everything is fine when in fact there is a sizable, silent perf hit. I'd prefer in 2.9 for them to also switch their DocValues lookup to be per segment.

[Aside: once we gain clarity on LUCENE-831, hopefully we can do away with oal.search.function.FieldCacheSource, {Byte,Short,Int,Ord,ReverseOrd}FieldSource, etc. I.e., these classes basically copy what FieldCache does, but expose a per-doc method call instead of a fixed array lookup.]

getDocValues should provide a MultiReader DocValues abstraction
---
Key: LUCENE-1789
URL: https://issues.apache.org/jira/browse/LUCENE-1789
Project: Lucene - Java
Issue Type: Improvement
Reporter: Hoss Man
Priority: Minor
Fix For: 2.9

When scoring a ValueSourceQuery, the scoring code calls ValueSource.getValues(reader) on *each* leaf-level subreader, so DocValues instances are backed by the individual FieldCache entries of the subreaders. But if client code were to inadvertently call getValues() on a MultiReader (or DirectoryReader), it would wind up using the outer FieldCache.

Since getValues(IndexReader) returns DocValues, we have an advantage here that we don't have with the FieldCache API (which is required to provide direct array access). getValues(IndexReader) could be implemented so that *IF* a caller inadvertently passes in a reader with non-null subReaders, getValues could generate a DocValues instance for each of the subReaders, and then wrap them in a composite MultiDocValues.
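The "up high" switching recommended in the LUCENE-1789 discussion can be sketched without Lucene on the classpath: values are fetched once per segment and then read with segment-local doc ids, so each per-doc lookup is a plain array access, and the segment's docBase is added only when a global doc id is needed. `PerSegmentCollector` and its methods are hypothetical stand-ins loosely modelled on the 2.9 Collector pattern of `setNextReader(reader, docBase)` followed by `collect(doc)`; they are not Lucene's API, and the threshold filter is just an arbitrary example of per-doc work.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "up high" switching: resolve the segment once, then read plain arrays. */
final class PerSegmentCollector {
    private final float threshold;
    private float[] current;  // values of the segment currently being collected
    private int docBase;      // global doc id of that segment's first doc
    private final List<Integer> hits = new ArrayList<>();

    PerSegmentCollector(float threshold) {
        this.threshold = threshold;
    }

    /** Called once per segment, in the spirit of Collector.setNextReader(reader, docBase). */
    void setNextSegment(float[] segmentValues, int docBase) {
        this.current = segmentValues;
        this.docBase = docBase;
    }

    /** Called for each doc, with a segment-local doc id. */
    void collect(int doc) {
        if (current[doc] > threshold) {   // plain array access, no per-doc switching
            hits.add(docBase + doc);      // convert to a global id only when needed
        }
    }

    /** Drives the collector over every doc of every segment, as a searcher would. */
    static List<Integer> run(List<float[]> segments, float threshold) {
        PerSegmentCollector c = new PerSegmentCollector(threshold);
        int base = 0;
        for (float[] seg : segments) {
            c.setNextSegment(seg, base);
            for (int doc = 0; doc < seg.length; doc++) {
                c.collect(doc);
            }
            base += seg.length;
        }
        return c.hits;
    }
}
```

The point of the design is that the cost of finding the right segment is paid once per segment rather than once per lookup, which is exactly what a composite MultiDocValues cannot avoid.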
[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740494#action_12740494 ]

Michael McCandless commented on LUCENE-1789:
---

Or... how about if we made a separate helper class, whose purpose was to accept a top-level reader and do down low switching to this new MultiDocValues class? This class would be deprecated, i.e., exist only in 2.9 to help external usage of the DocValues API migrate to up high switching.

However, you'd have to explicitly create this class. E.g., in the normal DocValues classes we throw an exception if you pass in a top-level reader, noting clearly that you could 1) switch to this helper class (at a sizable per-lookup performance hit), or 2) switch to looking up your values per segment?

This way at least it'd be much clearer to the external consumer what the down low switching class costs. It'd make the decision explicit, not silent, on upgrading to 2.9.
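The explicit opt-in proposed for DocValues could look roughly like this: the normal per-segment entry point rejects a composite (top-level) reader with a message that names both alternatives. `SketchReader`, `getSubReaders()` and `ValuesGuard.requireSegmentReader` are hypothetical stand-ins. In Lucene 2.9 the analogous check would presumably consult IndexReader.getSequentialSubReaders(), which returns null for a reader that is not composed of sub-readers, but how an actual patch would do it is an assumption.

```java
/** Hypothetical minimal reader abstraction for this sketch. */
interface SketchReader {
    /** Returns the sub-readers, or null if this reader is a single segment. */
    SketchReader[] getSubReaders();
}

final class ValuesGuard {
    private ValuesGuard() {}

    /** Rejects top-level (composite) readers with an actionable message. */
    static void requireSegmentReader(SketchReader reader) {
        SketchReader[] subs = reader.getSubReaders();
        if (subs != null) {
            throw new IllegalArgumentException(
                "DocValues must be requested per segment, but got a composite reader with "
                + subs.length + " sub-readers. Either look up values per segment, or "
                + "explicitly use the (slower, deprecated) multi-reader helper.");
        }
    }
}
```

Failing loudly here is what turns the silent per-lookup cost into a visible decision for the upgrading user.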
[jira] Resolved: (LUCENE-1782) Rename OriginalQueryParserHelper
[ https://issues.apache.org/jira/browse/LUCENE-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1782. Resolution: Fixed
[jira] Issue Comment Edited: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740328#action_12740328 ]

Hoss Man edited comment on LUCENE-1789 at 8/7/09 7:16 AM:
---

This idea originated in LUCENE-1749; see these comments...
https://issues.apache.org/jira/browse/LUCENE-1749?focusedCommentId=12740155#action_12740155
https://issues.apache.org/jira/browse/LUCENE-1749?focusedCommentId=12740256#action_12740256
https://issues.apache.org/jira/browse/LUCENE-1749?focusedCommentId=12740278#action_12740278

I've marked this for 2.9 for now. I think it's a nice-to-have in 2.9, because unlike general FieldCache usage, the API is abstract enough that we can protect our users from mistakes; but I don't personally think it's critical that we do this if no one else wants to take a stab at it.

(EDIT: shorter versions of URLs to prevent horizontal scroll)
JDK 1.5 in Analyzers
Looks like there's more Java 1.5 code in contrib/analyzers, even though the smartcn build says 1.4. java.util.Arrays.hashCode(char[]) only exists since Java 1.5:

compile-core:
    [mkdir] Created dir: /lucene/java/lucene-clean/build/contrib/analyzers/smartcn/classes/java
    [javac] Compiling 18 source files to /lucene/java/lucene-clean/build/contrib/analyzers/smartcn/classes/java
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java:94: hashCode() in java.lang.Object cannot be applied to (char[])
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java:94: incompatible types
    [javac] found   : java.lang.String
    [javac] required: int
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java:54: hashCode() in java.lang.Object cannot be applied to (char[])
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java:54: incompatible types
    [javac] found   : java.lang.String
    [javac] required: int
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] Note: /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
    [javac] 4 errors
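One 1.4-compatible way around the smartcn compile errors is to replace `Arrays.hashCode(charArray)` with a small hand-written loop. The helper below is a hypothetical name, not something already in contrib/analyzers. It uses the recurrence documented for the Java 5 `Arrays.hashCode(char[])` (start at 1, then `31 * h + c` for each element, and 0 for a null array), so it should produce identical values; it uses an indexed `for` loop because the enhanced `for` loop is also 1.5-only.

```java
/** Hypothetical Java 1.4-compatible replacement for Arrays.hashCode(char[]). */
final class CharArrayHash {
    private CharArrayHash() {}

    static int hash(char[] a) {
        if (a == null) {
            return 0; // same convention as Arrays.hashCode(char[])
        }
        int result = 1;
        for (int i = 0; i < a.length; i++) {
            result = 31 * result + a[i];
        }
        return result;
    }
}
```

In SegToken and SegTokenPair the failing line would then read `result = prime * result + CharArrayHash.hash(charArray);`.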
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740575#action_12740575 ]

Yonik Seeley commented on LUCENE-1768:
---

It feels like going that route would add a lot of code and complexity. If the user already knows how to create a range query in code, it's much more straightforward to just do

{code}
if ("money".equals(field))
  return new NumericRangeQuery(field, ...);
else
  return super.getRangeQuery(field, ...);
{code}

NumericRange support for new query parser
---
Key: LUCENE-1768
URL: https://issues.apache.org/jira/browse/LUCENE-1768
Project: Lucene - Java
Issue Type: New Feature
Components: QueryParser
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 2.9

It would be good to specify some type of schema for the query parser in the future, so that it can automatically create NumericRangeQuery for different numeric types. It would then be possible to index a numeric value (double, float, long, int) using NumericField, and the query parser would know which type of field this is and so correctly create a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5].

There is currently no way to find out from the index whether a field is numeric, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest. The only difference from the current handling of RangeQuery is then the instantiation of the correct Query type and the conversion of the entered numeric values (a simple Number.valueOf(...) cast of the user-entered numbers). Everything else is identical; NumericRangeQuery also supports the MTQ rewrite modes (as it is an MTQ).

Another thing is a change in Date semantics. There are some strange flags in the current parser that tell it how to handle dates.
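The LUCENE-1768 description says the only real difference from ordinary range handling is choosing the right Query type and converting the user-entered bounds to numbers. That conversion step can be sketched on its own. Since `java.lang.Number` itself has no `valueOf`, the boxed types' `valueOf` methods stand in for it here. `NumericType`, `NumericBounds` and the per-field type map are hypothetical stand-ins for FieldConfig settings; a real implementation would read the type from the ConfigHandler and pass the converted bounds on to NumericRangeQuery.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-field numeric types, standing in for FieldConfig settings. */
enum NumericType { INT, LONG, FLOAT, DOUBLE }

final class NumericBounds {
    private final Map<String, NumericType> fieldTypes = new HashMap<>();

    void setType(String field, NumericType type) {
        fieldTypes.put(field, type);
    }

    /** True if the field was configured as numeric. */
    boolean isNumeric(String field) {
        return fieldTypes.containsKey(field);
    }

    /**
     * Converts one user-entered bound; "*" means open-ended and becomes null.
     * Throws NumberFormatException for malformed numbers and
     * IllegalArgumentException for fields not configured as numeric.
     */
    Number parseBound(String field, String text) {
        NumericType type = fieldTypes.get(field);
        if (type == null) {
            throw new IllegalArgumentException("field not configured as numeric: " + field);
        }
        if ("*".equals(text)) {
            return null;
        }
        switch (type) {
            case INT:   return Integer.valueOf(text);
            case LONG:  return Long.valueOf(text);
            case FLOAT: return Float.valueOf(text);
            default:    return Double.valueOf(text);
        }
    }
}
```

For a range such as [1.567..*] on a field configured as DOUBLE, this yields a lower bound of 1.567 and a null (open) upper bound; fields that are not configured fall through to the ordinary text range handling via `isNumeric`.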
[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740574#action_12740574 ] Hoss Man commented on LUCENE-1789: -- {quote} While client code that has relied on this in the past will nicely continue to function properly, if we make this change, its performance is going to silently take a [possibly sizable] hit. {quote} Correct: a change like this could cause 2.9 to introduce a _time_ based performance hit from the added method call to resolve the sub(reader|docvalue) on each method call ... but if we don't have a change like this, 2.9 could introduce a _memory_ based performance hit from the other FieldCache changes, as any client code accessing DocValues for the top level reader will create a duplication of the whole array. Incidentally: I'm willing to believe you that the time based perf hit would be high, but my instinct is that it wouldn't be that bad: the DocValues API already introduces at least one method call per doc lookup (two depending on datatype). Adding a second method call to delegate to a sub-DocValues instance doesn't seem that bad (especially since a new MultiDocValues class could get the subReader list and compute the docId offsets on init, and then reuse them on each method call). bq. In the core I think we should always switch up high. (In case there is any confusion: I wasn't suggesting that we stop using up high switching on DocValues in code included in the Lucene dist; I was suggesting that if someone uses DocValues directly in their code (against a top level reader) then we help them out by giving them the down low switching ... so expected usages wouldn't pay the added time based hit, just unexpected usages (which would be saved from the memory hit).) {quote} Maybe it'd be best if we could somehow allow this down low switching for 2.9, but 1) warn that you'll see a performance hit right off, 2) deprecate it, and 3) somehow state that in 3.0 you'll have to send only a SegmentReader to this API, instead. 
{quote} That would get into really sticky territory for people writing custom IndexReaders (or using FilterIndexReader). bq. But, if we make the proposed change here, the app could in fact just keep working off the top-level values (eg if the ctor in their class is pulling these values), thinking everything is fine when in fact there is a sizable, silent perf hit. I agree ... but unless I'm missing something about the code on the trunk, that situation already exists: the developer might switch to using the Collector API, but nothing about the current trunk will prevent/warn him that this...
{code}
ValueSource vs = new ValueSource(aFieldIAlsoSortOn);
IndexReader r = getCurrentReaderThatCouldBeAMultiReader();
DocValues vals = vs.getDocValues(r);
{code}
...could have a sizable, silent, _memory_ perf hit in 2.9 (ValueSource.getValues has a javadoc indicating that caching will be done on the IndexReader passed in, but your comment suggests that if 2.9 were released today (with the current trunk) people upgrading would have some obvious way of noticing that they need to pass a sub reader to getValues). getDocValues should provide a MultiReader DocValues abstraction --- Key: LUCENE-1789 URL: https://issues.apache.org/jira/browse/LUCENE-1789 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man Priority: Minor Fix For: 2.9 When scoring a ValueSourceQuery, the scoring code calls ValueSource.getValues(reader) on *each* leaf level subreader -- so DocValues instances are backed by the individual FieldCache entries of the subreaders -- but if client code were to inadvertently call
getValues() on a MultiReader (or DirectoryReader) they would wind up using the outer FieldCache. Since getValues(IndexReader) returns DocValues, we have an advantage here that we don't have with the FieldCache API (which is required to provide direct array access). getValues(IndexReader) could be implemented so that *IF* a caller inadvertently passes in a reader with non-null subReaders, getValues could generate a DocValues instance for each of the subReaders, and then wrap them in a composite MultiDocValues.
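A minimal sketch of the composite described in this issue: the doc-id offsets (doc bases) are computed once at construction, and each lookup binary-searches them to find the owning sub-reader before delegating. `SubValues`, `ArraySubValues` and `MultiValues` are hypothetical simplifications of DocValues (float values only), not Lucene classes; the per-lookup binary search is the extra time cost debated in this thread.

```java
// Hypothetical, simplified DocValues: one float per document of a segment.
interface SubValues {
    float floatVal(int doc);
    int maxDoc();
}

class ArraySubValues implements SubValues {
    private final float[] values;

    ArraySubValues(float[] values) { this.values = values; }

    public float floatVal(int doc) { return values[doc]; }

    public int maxDoc() { return values.length; }
}

// Composite view over per-segment values, addressed with top-level doc ids.
class MultiValues implements SubValues {
    private final SubValues[] subs;
    private final int[] docBases; // docBases[i] = first top-level doc id of subs[i]
    private final int maxDoc;

    MultiValues(SubValues[] subs) {
        this.subs = subs;
        this.docBases = new int[subs.length];
        int base = 0;
        for (int i = 0; i < subs.length; i++) { // offsets computed once, on init
            docBases[i] = base;
            base += subs[i].maxDoc();
        }
        this.maxDoc = base;
    }

    // Binary search for the sub whose doc id range contains doc.
    int subIndex(int doc) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (doc < docBases[mid]) {
                hi = mid - 1;
            } else if (doc > docBases[mid]) {
                lo = mid + 1;
            } else {
                // Empty subs share their base with the next sub; take the last match.
                while (mid + 1 < docBases.length && docBases[mid + 1] == doc) {
                    mid++;
                }
                return mid;
            }
        }
        return hi; // largest index whose base is below doc
    }

    public float floatVal(int doc) {
        int i = subIndex(doc);                      // the extra per-lookup cost
        return subs[i].floatVal(doc - docBases[i]); // delegate with a local doc id
    }

    public int maxDoc() { return maxDoc; }

    public static void main(String[] args) {
        MultiValues values = new MultiValues(new SubValues[] {
            new ArraySubValues(new float[] {1f, 2f, 3f}),
            new ArraySubValues(new float[] {}),
            new ArraySubValues(new float[] {10f, 20f})
        });
        System.out.println(values.floatVal(3)); // 10.0
    }
}
```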
[jira] Updated: (LUCENE-1790) Boosting Function Term Query
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1790: Description: Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is that the payload score for a doc is not the average of all the payloads; instead, a function is applied to them. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it. (was: Similar to the BoostingTermQuery, the BoostingMaxTermQuery is a SpanTermQuery, but the difference is the payload score for a doc is not the average of all the payloads, but the maximum instead.) Summary: Boosting Function Term Query (was: Boosting Max Term Query) Boosting Function Term Query Key: LUCENE-1790 URL: https://issues.apache.org/jira/browse/LUCENE-1790 Project: Lucene - Java Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 2.9 Attachments: LUCENE-1790.patch Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is that the payload score for a doc is not the average of all the payloads; instead, a function is applied to them. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it.
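To illustrate the idea in the description, the per-document payload score can come from a pluggable function instead of being hard-wired to the average. `PayloadScoreFunction`, `AverageFunction` and `MaxFunction` below are hypothetical and deliberately simplified (one call over all payload scores of a document); they are not the PayloadFunction API in the attached patch, and the neutral score of 1 for "no payloads" is an assumption.

```java
// Hypothetical, simplified stand-in for a pluggable payload-scoring function:
// it receives all payload scores seen for one document and returns one value.
interface PayloadScoreFunction {
    float docScore(float[] payloadScores);
}

class AverageFunction implements PayloadScoreFunction {
    public float docScore(float[] scores) {
        if (scores.length == 0) {
            return 1f; // assumption: neutral score when no payloads were seen
        }
        float sum = 0f;
        for (float s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }
}

class MaxFunction implements PayloadScoreFunction {
    public float docScore(float[] scores) {
        if (scores.length == 0) {
            return 1f; // same assumption as above
        }
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        return max;
    }
}

class PayloadFunctionDemo {
    public static void main(String[] args) {
        float[] payloads = {1f, 4f, 7f};
        PayloadScoreFunction[] functions = {new AverageFunction(), new MaxFunction()};
        for (PayloadScoreFunction f : functions) {
            System.out.println(f.getClass().getSimpleName() + " -> " + f.docScore(payloads));
        }
    }
}
```

Under this shape, the old BoostingTermQuery behaviour corresponds to plugging in the averaging function, and the earlier "max" variant is just another implementation.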
[jira] Updated: (LUCENE-1790) Boosting Function Term Query
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1790: Attachment: LUCENE-1790.patch Refactors BoostingTermQuery to be a BoostingFunctionQuery. Adds in several PayloadFunction implementations. All tests pass. Will commit today or tomorrow. Boosting Function Term Query Key: LUCENE-1790 URL: https://issues.apache.org/jira/browse/LUCENE-1790 Project: Lucene - Java Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 2.9 Attachments: LUCENE-1790.patch, LUCENE-1790.patch
[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740604#action_12740604 ] Michael McCandless commented on LUCENE-1789: bq. Correct: a change like this could cause 2.9 to introduce a time based performance hit from the added method call to resolve the sub(reader|docvalue) on each method call ... but if we don't have a change like this, 2.9 could introduce a memory based performance hit from the other FieldCache changes, as any client code accessing DocValues for the top level reader will create a duplication of the whole array. True, and of the two, I agree a hidden time cost is the lesser evil. But I'd prefer not to hide the cost, i.e., to encourage/force an explicit choice when users upgrade to 2.9. If we can't think of some realistic way to do that, then I agree we should go forward with the current approach... bq. Incidentally: I'm willing to believe you that the time based perf hit would be high, but my instinct is that it wouldn't be that bad: the DocValues API already introduces at least one method call per doc lookup (two depending on datatype). Adding a second method call to delegate to a sub-DocValues instance doesn't seem that bad (especially since a new MultiDocValues class could get the subReader list and compute the docId offsets on init, and then reuse them on each method call) It's the added binary search in ReaderUtil.subSearcher that worries me. {quote} bq. In the core I think we should always switch up high. (In case there is any confusion: I wasn't suggesting that we stop using up high switching on DocValues in code included in the Lucene dist; I was suggesting that if someone uses DocValues directly in their code (against a top level reader) then we help them out by giving them the down low switching ... so expected usages wouldn't pay the added time based hit, just unexpected usages (which would be saved from the memory hit)) {quote} Understood.
We are only talking about external usages of these APIs, and even then, exceptionally advanced usage. I.e., users who make their own ValueSourceQuery and then run it against an IndexSearcher will be fine. It's only people who directly invoke getValues, w/ some random reader, that hit the hidden cost. {quote} bq. But, if we make the proposed change here, the app could in fact just keep working off the top-level values (eg if the ctor in their class is pulling these values), thinking everything is fine when in fact there is a sizable, silent perf hit. I agree ... but unless I'm missing something about the code on the trunk, that situation already exists: the developer might switch to using the Collector API, but nothing about the current trunk will prevent/warn him that this... ValueSource vs = new ValueSource(aFieldIAlsoSortOn); IndexReader r = getCurrentReaderThatCouldBeAMultiReader(); DocValues vals = vs.getDocValues(r); ...could have a sizable, silent, memory perf hit in 2.9 (ValueSource.getValues has a javadoc indicating that caching will be done on the IndexReader passed in, but your comment suggests that if 2.9 were released today (with the current trunk) people upgrading would have some obvious way of noticing that they need to pass a sub reader to getValues) {quote} How about this: we add a new param to the ctors of the value sources, called (say) acceptMultiReader. It has 3 values:
- NO means an exception is thrown on seeing a top reader (where top reader means any reader whose getSequentialSubReaders is non-null)
- YES_BURN_TIME means accept the top reader and make a MultiDocValues
- YES_BURN_MEMORY means use the top reader against the field cache
We deprecate the existing ctors, so on moving to 3.0 you have to make an explicit choice, but default it to YES_BURN_TIME. One benefit of making the choice explicit is that apps with memory to burn may in fact choose to burn it. Would this give a clean migration path forward?
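A minimal sketch of the three-way choice proposed above. Only the three values (NO, YES_BURN_TIME, YES_BURN_MEMORY) and the "top reader = non-null sub-readers" test come from the comment; the enum name `MultiReaderPolicy`, the `ReaderStub` stand-in for IndexReader and the descriptive return strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IndexReader: a "top reader" has non-null sub-readers.
class ReaderStub {
    final List<ReaderStub> subs; // null for a leaf (segment) reader

    ReaderStub(List<ReaderStub> subs) { this.subs = subs; }

    boolean isTopReader() { return subs != null; }
}

// The three choices from the proposal; the enum name itself is hypothetical.
enum MultiReaderPolicy { NO, YES_BURN_TIME, YES_BURN_MEMORY }

class ValueSourceSketch {
    private final MultiReaderPolicy policy;

    ValueSourceSketch(MultiReaderPolicy policy) { this.policy = policy; }

    // Returns a description of the strategy getValues would use for this reader.
    String getValues(ReaderStub reader) {
        if (!reader.isTopReader()) {
            return "per-segment values";
        }
        switch (policy) {
            case NO:
                throw new IllegalArgumentException("top-level reader passed; use its sub-readers");
            case YES_BURN_TIME:
                return "MultiDocValues over " + reader.subs.size() + " sub-readers";
            case YES_BURN_MEMORY:
                return "field cache entry for the top-level reader";
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        ReaderStub top = new ReaderStub(Arrays.asList(new ReaderStub(null), new ReaderStub(null)));
        for (MultiReaderPolicy p : MultiReaderPolicy.values()) {
            try {
                System.out.println(p + ": " + new ValueSourceSketch(p).getValues(top));
            } catch (IllegalArgumentException e) {
                System.out.println(p + ": rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

Making the policy a constructor argument is what forces the explicit choice once the old constructors are removed; a leaf reader is unaffected by any of the three values.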
getDocValues should provide a MultiReader DocValues abstraction --- Key: LUCENE-1789 URL: https://issues.apache.org/jira/browse/LUCENE-1789 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man Priority: Minor Fix For: 2.9
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740608#action_12740608 ] Michael McCandless commented on LUCENE-1768: bq. You could still do something similar by simply overriding RangeQueryNodeBuilder.build(QueryNode queryNode), but this is not clean (it is kind of a hack). What's the cleaner way to do this? E.g., could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the standard.processors package, that overrides postProcessNode to do its own conversion? NumericRange support for new query parser - Key: LUCENE-1768 URL: https://issues.apache.org/jira/browse/LUCENE-1768 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 2.9
Re: JDK 1.5 in Analyzers
I am SO looking forward to 3.0 ;) I'll fix. Mike

On Fri, Aug 7, 2009 at 10:34 AM, Grant Ingersoll <gsing...@apache.org> wrote:
Looks like more 1.5 in contrib/analyzers, even though the smartcn build says 1.4:

compile-core:
    [mkdir] Created dir: /lucene/java/lucene-clean/build/contrib/analyzers/smartcn/classes/java
    [javac] Compiling 18 source files to /lucene/java/lucene-clean/build/contrib/analyzers/smartcn/classes/java
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java:94: hashCode() in java.lang.Object cannot be applied to (char[])
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java:94: incompatible types
    [javac] found : java.lang.String
    [javac] required: int
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java:54: hashCode() in java.lang.Object cannot be applied to (char[])
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java:54: incompatible types
    [javac] found : java.lang.String
    [javac] required: int
    [javac] result = prime * result + Arrays.hashCode(charArray);
    [javac] ^
    [javac] Note: /lucene/java/lucene-clean/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
    [javac] 4 errors
[jira] Commented: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740624#action_12740624 ] Michael McCandless commented on LUCENE-1749: Maybe we should simply print a warning, e.g. to System.err, on detecting that 2X RAM usage has occurred, pointing people to the sanity checker? We could e.g. do it once only so we don't spam the stderr logs... FieldCache introspection API Key: LUCENE-1749 URL: https://issues.apache.org/jira/browse/LUCENE-1749 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Hoss Man Priority: Minor Fix For: 2.9 Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
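McCandless's suggestion (warn once, to System.err, when the same field appears to be cached for both a reader and one of its sub-readers) can be sketched as follows. `DoubleCacheWarner`, `recordEntry`, the string reader ids and the message text are all hypothetical; the sketch only tracks one level of reader nesting, and the sanity checker in the attached patches may detect this differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bookkeeping for cache insertions, one level of nesting only:
// each entry is (reader id, parent reader id or null, field).
class DoubleCacheWarner {
    private final Set<String> cached = new HashSet<String>();        // "reader/field"
    private final Set<String> cachedByChild = new HashSet<String>(); // "parent/field"
    private boolean warned = false;

    // Records an entry; returns true if the same field is now cached for both
    // a reader and its parent (the roughly 2X RAM situation).
    synchronized boolean recordEntry(String readerId, String parentId, String field) {
        boolean duplicate = cachedByChild.contains(readerId + "/" + field)
                || (parentId != null && cached.contains(parentId + "/" + field));
        cached.add(readerId + "/" + field);
        if (parentId != null) {
            cachedByChild.add(parentId + "/" + field);
        }
        if (duplicate && !warned) {
            warned = true; // warn only once so stderr is not spammed
            System.err.println("WARNING: field '" + field + "' is cached for a reader and"
                    + " its sub-reader; see the FieldCache sanity checker");
        }
        return duplicate;
    }

    synchronized boolean hasWarned() { return warned; }

    public static void main(String[] args) {
        DoubleCacheWarner warner = new DoubleCacheWarner();
        warner.recordEntry("seg1", "top", "price");
        warner.recordEntry("top", null, "price");   // prints the warning
        warner.recordEntry("seg2", "top", "price"); // duplicate, but no second warning
    }
}
```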
Re: Sorting cleanup and FieldCacheImpl.Entry confusion
I don't know why Entry has int type and String locale, either. I agree it'd be cleaner for FieldSortedHitQueue to store these on its own, privately. Note that FieldSortedHitQueue is deprecated in favor of FieldValueHitQueue, and that FieldValueHitQueue doesn't cache comparators anymore. Mike On Thu, Aug 6, 2009 at 8:07 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote: Hey everybody, over in LUCENE-1749 I'm trying to make sanity checking of the FieldCache possible, and I'm banging my head into a few walls, and hoping people can help me fill in the gaps about how sorting w/FieldCache is *supposed* to work. For starters: I was confused about why some debugging code wasn't showing the Locale specified when getting the String[] cache for Locale.US. Looking at FieldSortedHitQueue.comparatorStringLocale, I see that it calls FieldCache.DEFAULT.getStrings(reader, field) and doesn't pass the Locale at all -- which makes me wonder why FieldCacheImpl.Entry bothers having a locale member at all? ... it seems like the only purpose is so FieldSortedHitQueue can abuse the Entry object as a key for its own static final FieldCacheImpl.Cache Comparators ... but couldn't it just use its own key object and keep FieldCacheImpl.Entry simpler? Ditto for the int type property of FieldCacheImpl.Entry, which has the comment // which SortField type ... it's used by FieldSortedHitQueue in its Comparators cache (and getCachedComparator) but FieldCacheImpl never uses it; by the time the FieldCache is accessed, the type has already been translated into the appropriate method (getInts, getBytes, etc...). If FieldSortedHitQueue used its own private inner class for its comparator cache, the FieldCacheImpl.Entry code could eliminate a lot of cruft, and the class would get much simpler. Does anyone know a good reason *why* it's implemented the way it currently is? Or is this simply the end result of code gradually being refactored out of FieldCacheImpl and into FieldSortedHitQueue?
-Hoss
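What Hoss suggests (FieldSortedHitQueue keeping its own private key for its comparator cache instead of reusing FieldCacheImpl.Entry) might look roughly like this. `ComparatorKey` and its fields are hypothetical; the point is that the sort type and Locale live only in the sorting code's key, so the field cache's own key can stay minimal.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical private cache key for sort comparators: (field, sort type, locale).
// Keeping it private to the sorting code leaves the field cache key free of
// members the cache itself never uses.
final class ComparatorKey {
    private final String field;
    private final int sortType;  // a SortField-style type constant
    private final Locale locale; // null when no locale applies

    ComparatorKey(String field, int sortType, Locale locale) {
        this.field = field;
        this.sortType = sortType;
        this.locale = locale;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ComparatorKey)) return false;
        ComparatorKey other = (ComparatorKey) o;
        return field.equals(other.field)
                && sortType == other.sortType
                && (locale == null ? other.locale == null : locale.equals(other.locale));
    }

    @Override
    public int hashCode() {
        int h = field.hashCode();
        h = 31 * h + sortType;
        h = 31 * h + (locale == null ? 0 : locale.hashCode());
        return h;
    }

    public static void main(String[] args) {
        final int stringSort = 3; // hypothetical type constant for this demo
        Map<ComparatorKey, String> comparators = new HashMap<ComparatorKey, String>();
        comparators.put(new ComparatorKey("title", stringSort, Locale.US), "US title comparator");
        System.out.println(comparators.get(new ComparatorKey("title", stringSort, Locale.US)));
    }
}
```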
Re: JDK 1.5 in Analyzers
On Fri, Aug 7, 2009 at 6:17 PM, Michael McCandless <luc...@mikemccandless.com> wrote: I am SO looking forward to 3.0 ;) Oh man! Me too!
RE: svn commit: r802085 - in /lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm: SegToken.java SegTokenPair.java
By the way: o.a.l.util.ArrayUtil contains a hashCode impl for char arrays.

- Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
Sent: Friday, August 07, 2009 6:48 PM
To: java-comm...@lucene.apache.org
Subject: svn commit: r802085 - in /lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm: SegToken.java SegTokenPair.java

Author: mikemccand
Date: Fri Aug 7 16:48:09 2009
New Revision: 802085

URL: http://svn.apache.org/viewvc?rev=802085view=rev
Log:
fix smartcn to be JDK 1.4 only

Modified:
    lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java
    lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java

Modified: lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java?rev=802085r1=802084r2=802085view=diff
==
--- lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java (original)
+++ lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegToken.java Fri Aug 7 16:48:09 2009
@@ -91,7 +91,9 @@
   public int hashCode() {
     final int prime = 31;
     int result = 1;
-    result = prime * result + Arrays.hashCode(charArray);
+    for(int i=0;i<charArray.length;i++) {
+      result = prime * result + charArray[i];
+    }
     result = prime * result + endOffset;
     result = prime * result + index;
     result = prime * result + startOffset;

Modified: lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java?rev=802085r1=802084r2=802085view=diff
==
--- lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java (original)
+++ lucene/java/trunk/contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegTokenPair.java Fri Aug 7 16:48:09 2009
@@ -51,7 +51,9 @@
   public int hashCode() {
     final int prime = 31;
     int result = 1;
-    result = prime * result + Arrays.hashCode(charArray);
+    for(int i=0;i<charArray.length;i++) {
+      result = prime * result + charArray[i];
+    }
     result = prime * result + from;
     result = prime * result + to;
     long temp;
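The committed loop is a JDK 1.4-compatible equivalent of `java.util.Arrays.hashCode(char[])` (which only exists since Java 5): both start from 1 and fold in each char with multiplier 31. The small sketch below checks that on a sample array; `CharArrayHash` is a hypothetical helper name, and Uwe's suggested ArrayUtil helper is deliberately not used here. One side effect worth noting: the old line computed `31 * 1 + Arrays.hashCode(charArray)`, while the new loop leaves `result` equal to `Arrays.hashCode(charArray)` itself, so SegToken/SegTokenPair hash values differ from before. That should not matter unless those values were stored somewhere.

```java
import java.util.Arrays;

class CharArrayHash {
    // JDK 1.4-compatible hash of a char[]: same recurrence as the committed loop.
    static int hash(char[] chars) {
        int result = 1;
        for (int i = 0; i < chars.length; i++) {
            result = 31 * result + chars[i];
        }
        return result;
    }

    public static void main(String[] args) {
        char[] sample = "SegTokenPair".toCharArray();
        // Arrays.hashCode(char[]) only exists since Java 5; used here just to compare.
        System.out.println(hash(sample) == Arrays.hashCode(sample)); // true
    }
}
```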
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740638#action_12740638 ] Uwe Schindler commented on LUCENE-1607: --- Committed rev 802095. String.intern() faster alternative -- Key: LUCENE-1607 URL: https://issues.apache.org/jira/browse/LUCENE-1607 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Yonik Seeley Fix For: 2.9 Attachments: intern.patch, LUCENE-1607-contrib.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch By using our own interned string pool on top of the default one, String.intern() can be greatly optimized. On my setup (Java 6) this alternative runs ~15.8x faster for already interned strings, and ~2.2x faster for 'new String(interned)'. For Java 5 and 4 the speedup is lower, but still considerable.
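The idea in the description (a pool of our own in front of `String.intern()`) can be sketched with a concurrent map that only ever stores the canonical instance returned by `String.intern()`, so callers can keep comparing interned strings with `==`. This is only an illustration of the idea under that assumption: `CachingInterner` is a hypothetical name, the committed implementation may be structured quite differently, and the speedups quoted above were not measured with this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a pool in front of String.intern(): the map only ever holds the
// canonical instance returned by String.intern(), so == comparisons stay valid.
// This sketch never evicts entries.
class CachingInterner {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<String, String>();

    String intern(String s) {
        String cached = pool.get(s);
        if (cached != null) {
            return cached;             // fast path: skips the JVM intern table
        }
        String canonical = s.intern(); // slow path: JVM-wide canonical instance
        String previous = pool.putIfAbsent(canonical, canonical);
        return previous != null ? previous : canonical;
    }

    public static void main(String[] args) {
        CachingInterner interner = new CachingInterner();
        String fresh = new String("body"); // equal to, but not the same object as, "body"
        System.out.println(interner.intern(fresh) == "body"); // true: literals are interned
    }
}
```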
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740643#action_12740643 ] Luis Alves commented on LUCENE-1768: Hi Yonik, As I said before, you can do that in RangeQueryNodeBuilder.build(QueryNode queryNode), but it's ugly and it is not what we intended for the new flexible query parser. The new flexible query parser does not rely on method overriding the way the old one does, so solutions that worked in the old query parser, like overriding a method, have to be implemented programmatically. Your approach requires creating a new class and overriding a method; you still need to create an instance of your QueryParser, and it is not reusable. Here is a sample of your approach:
{code}
class YonikQueryParser extends QueryParser {
  Query getRangeQuery(field,...) {
    if ("money".equals(field))
      return new NumericRangeQuery(field,...);
    else
      return super.getRangeQuery(field,...);
  }
}
...
QueryParser yqp = new YonikQueryParser(...);
yqp.parse(query);
{code}
vs what I am proposing:
{code}
Map<CharSequence, RangeTools.Type> rangeTypes = new HashMap<CharSequence, RangeTools.Type>();
rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC, RangeUtils.NumericType.Type.FLOAT, NumericUtils.PRECISION_STEP_DEFAULT));
StandardQueryParser qp = new StandardQueryParser();
qp.setRangeTypes(rangeTypes);
qp.parse(query);
{code}
The second approach is programmatic, does not require a new class or overriding a method, is reusable by other users, and is backward compatible, meaning we can integrate it into the current flexible query parser and deliver this feature in 2.9 without affecting any current use case. Your approach is not compatible: it requires a new class and is not programmatic. It's not reusable by other users (we can't commit your code to lucene), since fields are hard-coded.
Also, the approach I am proposing is very similar to setFieldsBoost and setDateResolution, already available in the old QP and the new flexible query parser. I also want to say that the approaches for extending the old QP and the new flexible query parser are never going to be similar; they are completely different implementations. NumericRange support for new query parser - Key: LUCENE-1768 URL: https://issues.apache.org/jira/browse/LUCENE-1768 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 2.9
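A self-contained sketch of the configuration-driven alternative Luis describes, combined with the value conversion mentioned in the issue description (parsing the user-entered bounds into the configured numeric type). `NumericType`, `RangeConfigSketch`, `setRangeType` and `buildRange` are hypothetical names, not the RangeUtils/RangeTools API proposed in the comment; an open bound (`*`, as in `[1.567..*]`) is represented by `null`, and the returned strings merely describe which query would be built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field numeric configuration for range parsing.
enum NumericType { INT, LONG, FLOAT, DOUBLE }

class RangeConfigSketch {
    private final Map<String, NumericType> rangeTypes = new HashMap<String, NumericType>();

    void setRangeType(String field, NumericType type) {
        rangeTypes.put(field, type);
    }

    // Converts one user-entered bound; "*" means open-ended and maps to null.
    static Number convert(NumericType type, String text) {
        if ("*".equals(text)) {
            return null;
        }
        switch (type) {
            case INT:    return Integer.valueOf(text);
            case LONG:   return Long.valueOf(text);
            case FLOAT:  return Float.valueOf(text);
            case DOUBLE: return Double.valueOf(text);
            default:     throw new AssertionError(type);
        }
    }

    // Describes the query that would be built: numeric if the field is configured,
    // otherwise the usual term range.
    String buildRange(String field, String lower, String upper) {
        NumericType type = rangeTypes.get(field);
        if (type == null) {
            return "TermRange(" + field + ": " + lower + " TO " + upper + ")";
        }
        return "NumericRange<" + type + ">(" + field + ": "
                + convert(type, lower) + " TO " + convert(type, upper) + ")";
    }

    public static void main(String[] args) {
        RangeConfigSketch parser = new RangeConfigSketch();
        parser.setRangeType("money", NumericType.FLOAT);
        System.out.println(parser.buildRange("money", "1.787", "19.5"));
        System.out.println(parser.buildRange("title", "a", "m"));
    }
}
```

Unlike the subclassing sketch, the field knowledge here is data handed to a reusable parser object, which is the property Luis argues for; Yonik's reply below questions whether that configuration is any more reusable in practice.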
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740659#action_12740659 ] Yonik Seeley commented on LUCENE-1768: -- bq. It's not reusable by other users (we can't commit your code to lucene) Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC... That's the application-specific configuration code and doesn't need (or want) to be committed. Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user *may* want to do. Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type)? I'm sure there must be, but there are so many classes in the new QP that I'm having a little difficulty finding my way around. NumericRange support for new query parser - Key: LUCENE-1768 URL: https://issues.apache.org/jira/browse/LUCENE-1768 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 2.9
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740662#action_12740662 ] Luis Alves commented on LUCENE-1768:
{quote}
What's the cleaner way to do this? EG could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the standard.processors package, that overrides postProcessNode to do its own conversion?
{quote}
For Yonik's simple requirement, you could use one of two options.

Option 1 (more flexible):
- make your own ParametricRangeQueryNodeProcessor, subclassing the current one, that returns NumericQueryNodes where needed
- create a NumericQueryNode that extends RangeQueryNode (no extra code needed)
- create a NumericQueryNodeBuilder that handles NumericQueryNodes, and register it in the StandardQueryTreeBuilder map, e.g. setBuilder(NumericQueryNode.class, new NumericQueryNodeBuilder()). RangeQueryNodes will still be handled normally by the RangeQueryNodeBuilder.

Option 2 (less flexible):
- make your own RangeQueryNodeBuilder subclassing the current one (e.g. NumericQueryNodeBuilder), and register it in the StandardQueryTreeBuilder map, e.g. setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder())

Option 1 implements the correct usage of the APIs. It's more flexible, and the dirty work is done in the processor pipeline.
Option 2 is not the intended use case for the APIs. It requires less code and it will work, but the builder will be performing the tasks the processor should be doing.

NumericRange support for new query parser - Key: LUCENE-1768 URL: https://issues.apache.org/jira/browse/LUCENE-1768 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 2.9 It would be good to specify some type of schema for the query parser in future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double, float, long, int) using NumericField, and the query parser would know which type of field this is and so correctly create a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. There is currently no way to determine from the index whether a field is numeric, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest. The only difference from the current handling of RangeQuery is then the instantiation of the correct Query type and the conversion of the entered numeric values (a simple Number.valueOf(...) cast of the user-entered numbers). Everything else is identical: NumericRangeQuery also supports the MTQ rewrite modes (as it is an MTQ). Another thing is a change in Date semantics: there are some strange flags in the current parser that tell it how to handle dates.
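Option 1's split between the processor pipeline and a class-keyed builder map can be illustrated outside Lucene. This is a minimal sketch under stated assumptions: RangeNode, NumericRangeNode and PipelineSketch are hypothetical stand-ins (not the real queryParser classes), built queries are rendered as strings, and the builder lookup only matches the exact node class. A processor rewrites range nodes on configured numeric fields into a dedicated node type, and the registry (in the spirit of setBuilder(NumericQueryNode.class, ...)) then picks the matching builder.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-ins for the parser's query-node classes
// (not the real org.apache.lucene.queryParser types).
class RangeNode {
    final String field;
    final String lower;
    final String upper;

    RangeNode(String field, String lower, String upper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
    }
}

// Option 1: a dedicated node type that simply extends the range node.
class NumericRangeNode extends RangeNode {
    NumericRangeNode(String field, String lower, String upper) {
        super(field, lower, upper);
    }
}

class PipelineSketch {
    // Processor step: rewrite range nodes on numeric fields into NumericRangeNodes.
    static RangeNode process(RangeNode node, Set<String> numericFields) {
        if (!(node instanceof NumericRangeNode) && numericFields.contains(node.field)) {
            return new NumericRangeNode(node.field, node.lower, node.upper);
        }
        return node;
    }

    // Builder registry keyed by node class, in the spirit of setBuilder(Class, builder).
    // Simplification: only the exact class is looked up, not its superclasses.
    static final Map<Class<? extends RangeNode>, Function<RangeNode, String>> BUILDERS = new HashMap<>();

    static {
        BUILDERS.put(RangeNode.class,
                n -> "TermRange(" + n.field + ":" + n.lower + " TO " + n.upper + ")");
        BUILDERS.put(NumericRangeNode.class,
                n -> "NumericRange(" + n.field + ":" + Double.valueOf(n.lower)
                        + " TO " + Double.valueOf(n.upper) + ")");
    }

    // Builder step: pick the builder registered for the node's class.
    static String build(RangeNode node) {
        return BUILDERS.get(node.getClass()).apply(node);
    }
}
```

Because the numeric handling lives in a processor plus its own builder, the term-range builder stays untouched; Option 2 would instead put the field check inside the range builder itself.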
[jira] Commented: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.
[ https://issues.apache.org/jira/browse/LUCENE-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740667#action_12740667 ] Michael McCandless commented on LUCENE-1771:
Patch looks good!
* Looks like you need to svn rm src/java/org/apache/lucene/search/QueryWeight.java
* Some javadocs still reference QueryWeight
* Why do we need this in Weight?
{code}
public Explanation explain(IndexReader reader, int doc) throws IOException {
  return explain(null, reader, doc);
}
{code}
I.e., do we think there are places outside of Lucene that invoke Weight.explain directly?

Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter. Key: LUCENE-1771 URL: https://issues.apache.org/jira/browse/LUCENE-1771 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Mark Miller Assignee: Mark Miller Fix For: 2.9 Attachments: LUCENE-1771.bc-tests.patch, LUCENE-1771.patch, LUCENE-1771.patch, LUCENE-1771.patch, LUCENE-1771.patch, LUCENE-1771.patch, LUCENE-1771.patch
[jira] Updated: (LUCENE-1790) Boosting Function Term Query
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1790: Attachment: LUCENE-1790.patch
Next take on this:
1. Added an includeSpanScore flag, which allows you to ignore the score from the TermQuery part of the score and only count the payload.
2. Deprecated Similarity.scorePayload(String fieldName, ...) in favor of a similar method that also takes the doc id. Now, in theory, you could have different scoring for payloads based on different documents, fields, etc. The old method just calls the new one and passes in a NO_DOC_ID_PROVIDED value (-1).
3. Added a marker interface named PayloadQuery and marked the various payload queries with it. This could be useful for queries that work with other PayloadQueries (a narrower grouping than the fact that they are SpanQueries).
I really do intend to commit this :-)

Boosting Function Term Query Key: LUCENE-1790 URL: https://issues.apache.org/jira/browse/LUCENE-1790 Project: Lucene - Java Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 2.9 Attachments: LUCENE-1790.patch, LUCENE-1790.patch, LUCENE-1790.patch Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is that the payload score for a doc is not the average of all the payloads; instead, a function is applied to them. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it.
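The idea of applying a function to all of a document's payloads (rather than always averaging), together with the includeSpanScore flag from item 1, can be modeled in a few lines. In this sketch PayloadFunction, AveragePayloadFunction, MaxPayloadFunction and PayloadScoringSketch use simplified, hypothetical signatures rather than the ones in the patch, and the way includeSpanScore combines the scores (multiplying the span score by the payload score, or using the payload score alone) is an assumption.

```java
// Hypothetical, simplified model (not the signatures in the LUCENE-1790 patch).
interface PayloadFunction {
    // Fold the next payload score into the running score; seenBefore counts earlier payloads.
    float currentScore(float runningScore, float payloadScore, int seenBefore);

    // Turn the running score into the document's final payload score.
    float docScore(float runningScore, int numPayloadsSeen);
}

class AveragePayloadFunction implements PayloadFunction {
    public float currentScore(float runningScore, float payloadScore, int seenBefore) {
        return runningScore + payloadScore;  // accumulate a sum
    }

    public float docScore(float runningScore, int numPayloadsSeen) {
        return numPayloadsSeen > 0 ? runningScore / numPayloadsSeen : 1f;  // neutral if none
    }
}

class MaxPayloadFunction implements PayloadFunction {
    public float currentScore(float runningScore, float payloadScore, int seenBefore) {
        return seenBefore == 0 ? payloadScore : Math.max(runningScore, payloadScore);
    }

    public float docScore(float runningScore, int numPayloadsSeen) {
        return numPayloadsSeen > 0 ? runningScore : 1f;
    }
}

class PayloadScoringSketch {
    // Either combine the payload score with the span (term) score, or use the payload alone.
    static float score(float[] payloads, PayloadFunction fn, float spanScore, boolean includeSpanScore) {
        float running = 0f;
        for (int i = 0; i < payloads.length; i++) {
            running = fn.currentScore(running, payloads[i], i);
        }
        float payloadScore = fn.docScore(running, payloads.length);
        return includeSpanScore ? spanScore * payloadScore : payloadScore;
    }
}
```

Swapping AveragePayloadFunction for MaxPayloadFunction changes only how payloads are folded, which matches the description's point that BoostingTermQuery becomes the average-function special case.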
[jira] Updated: (LUCENE-1790) Add Boosting Function Term Query and Some Payload Query refactorings
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1790: Description: Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is the payload score for a doc is not the average of all the payloads, but applies a function to them instead. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it. Also add marker interface to indicate PayloadQuery types. Refactor Similarity.scorePayload to also take in the doc id. was:Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is the payload score for a doc is not the average of all the payloads, but applies a function to them instead. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it. Lucene Fields: [Patch Available] (was: [Patch Available, New]) Summary: Add Boosting Function Term Query and Some Payload Query refactorings (was: Boosting Function Term Query)
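The scorePayload refactoring in the updated description, where the old method forwards to a new overload that also receives the doc id (passing NO_DOC_ID_PROVIDED, i.e. -1, as Grant's earlier comment on this issue says), is a common backward-compatibility pattern. The sketch below is hypothetical: SimilaritySketch and EvenDocBoostSimilarity are illustrative names, and the parameter list is reduced to (docId, fieldName, payload) rather than copied from Lucene's Similarity.

```java
// Hypothetical, simplified stand-in for Similarity; the parameter lists are
// reduced for illustration and are not Lucene's real signatures.
class SimilaritySketch {
    // Sentinel passed by the old signature, which has no doc id to offer.
    static final int NO_DOC_ID_PROVIDED = -1;

    /** @deprecated use {@link #scorePayload(int, String, byte[])} instead. */
    @Deprecated
    float scorePayload(String fieldName, byte[] payload) {
        // Old callers keep working: forward to the new method with the sentinel.
        return scorePayload(NO_DOC_ID_PROVIDED, fieldName, payload);
    }

    // New, doc-id-aware signature; the default ignores payloads entirely.
    float scorePayload(int docId, String fieldName, byte[] payload) {
        return 1.0f;
    }
}

// Illustrative subclass that scores payloads differently per document.
class EvenDocBoostSimilarity extends SimilaritySketch {
    @Override
    float scorePayload(int docId, String fieldName, byte[] payload) {
        if (docId == NO_DOC_ID_PROVIDED || payload == null || payload.length == 0) {
            return 1.0f;  // no doc id (old call path) or no payload: stay neutral
        }
        float value = payload[0];
        return docId % 2 == 0 ? 2 * value : value;  // arbitrary per-document rule
    }
}
```

Subclasses that only override the new method are still reached through the old one, which is what keeps existing callers working during the deprecation period.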
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740718#action_12740718 ] Luis Alves commented on LUCENE-1768:
{quote}
Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC... That's the application specific configuration code and doesn't need (or want) to be committed.
{quote}
You are correct; I was describing the use case from the user's perspective. That code was an example of how to use the APIs if we implement them in the future; those APIs are not currently available.
{quote}
Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do.
{quote}
I'm not sure what to say here, so I'll point to the documentation that we currently have. You can read https://issues.apache.org/jira/secure/attachment/12410046/QueryParser_restructure_meetup_june2009_v2.pdf and the javadocs for package org.apache.lucene.queryParser.core and class org.apache.lucene.queryParser.standard.StandardQueryParser. You can also look at the TestSpanQueryParserSimpleSample JUnit test for another example of how the APIs can be used in a completely different way.
The new QueryParser was designed to be extensible, to allow the implementation of language extensions or different languages, and to have reusable components like the processors and builders. We use SyntaxParsers, Processors and Builders, all of which are replaceable components at runtime. Any user can build their own pipeline and create new processors, builders and query nodes, and integrate them with the existing ones to create the features they require.
Some of the features are:
- Syntax tree optimization
- Syntax tree expansion
- Syntax tree validation and error reporting
- Tokenization and normalization of the query
- Makes it easy to create extensions
- Support for translation of error messages
- Allows users to plug and play processors and builders without having to modify Lucene code
- Allows Lucene users to implement features much faster
- Allows users to change default behavior in an easy way without having to modify Lucene code
{quote}
Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type)? I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around.
{quote}
{code}
// RangeQueryNodeBuilder.build() is declared to return TermRangeQuery, so a builder
// that may return a NumericRangeQuery implements StandardQueryBuilder (whose build()
// returns Query) and delegates everything else to the stock RangeQueryNodeBuilder.
class NumericQueryNodeBuilder implements StandardQueryBuilder {
  private final RangeQueryNodeBuilder termRangeBuilder = new RangeQueryNodeBuilder();

  public Query build(QueryNode queryNode) throws QueryNodeException {
    RangeQueryNode rangeNode = (RangeQueryNode) queryNode;
    if ("money".equals(rangeNode.getField().toString())) {
      // do whatever you need here with queryNode, e.g. convert the bounds and
      // return NumericRangeQuery.newXxxRange("money", ...);
    }
    return termRangeBuilder.build(queryNode);
  }
}

public void testNewRangeQueryBuilder() throws Exception {
  StandardQueryParser qp = new StandardQueryParser();
  // register the custom builder for all RangeQueryNodes
  QueryTreeBuilder builder = (QueryTreeBuilder) qp.getQueryBuilder();
  builder.setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder());

  // getLocalizedDate(...) is assumed to be the helper from the query parser tests
  String startDate = getLocalizedDate(2002, 1, 1, false);
  String endDate = getLocalizedDate(2002, 1, 4, false);
  StandardAnalyzer oneStopAnalyzer = new StandardAnalyzer();
  qp.setAnalyzer(oneStopAnalyzer);
  Query a = qp.parse("date:[" + startDate + " TO " + endDate + "]", null);
  System.out.println(a);
}
{code}
[jira] Issue Comment Edited: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740718#action_12740718 ] Luis Alves edited comment on LUCENE-1768 at 8/7/09 1:43 PM:
{quote}
Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC... That's the application specific configuration code and doesn't need (or want) to be committed.
{quote}
You are correct; I was describing the use case from the user's perspective. That code was an example of how to use the APIs if we implement them in the future; those APIs are not currently available.
{quote}
Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do.
{quote}
I'm not sure what to say here, so I'll point to the documentation that we currently have. You can read https://issues.apache.org/jira/secure/attachment/12410046/QueryParser_restructure_meetup_june2009_v2.pdf and the javadocs for package org.apache.lucene.queryParser.core and class org.apache.lucene.queryParser.standard.StandardQueryParser. You can also look at the TestSpanQueryParserSimpleSample JUnit test for another example of how the APIs can be used in a completely different way.
The new QueryParser was designed to be extensible, to allow the implementation of language extensions or different languages, and to have reusable components like the processors and builders. We use SyntaxParsers, Processors and Builders, all of which are replaceable components at runtime. Any user can build their own pipeline and create new processors, builders and query nodes, and integrate them with the existing ones to create the features they require.
Some of the features are:
- Syntax tree optimization
- Syntax tree expansion
- Syntax tree validation and error reporting
- Tokenization and normalization of the query
- Makes it easy to create extensions
- Support for translation of error messages
- Allows users to plug and play processors and builders without having to modify Lucene code
- Allows Lucene users to implement features much faster
- Allows users to change default behavior in an easy way without having to modify Lucene code
{quote}
Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type)? I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around.
{quote}
Below is the Java code for option 2. It's not the recommended way to use the new query parser, but it is the shortest way to do what you want.
{code}
// RangeQueryNodeBuilder.build() is declared to return TermRangeQuery, so a builder
// that may return a NumericRangeQuery implements StandardQueryBuilder (whose build()
// returns Query) and delegates everything else to the stock RangeQueryNodeBuilder.
class NumericQueryNodeBuilder implements StandardQueryBuilder {
  private final RangeQueryNodeBuilder termRangeBuilder = new RangeQueryNodeBuilder();

  public Query build(QueryNode queryNode) throws QueryNodeException {
    RangeQueryNode rangeNode = (RangeQueryNode) queryNode;
    if ("money".equals(rangeNode.getField().toString())) {
      // do whatever you need here with queryNode, e.g. convert the bounds and
      // return NumericRangeQuery.newXxxRange("money", ...);
    }
    return termRangeBuilder.build(queryNode);
  }
}

public void testNewRangeQueryBuilder() throws Exception {
  StandardQueryParser qp = new StandardQueryParser();
  // register the custom builder for all RangeQueryNodes
  QueryTreeBuilder builder = (QueryTreeBuilder) qp.getQueryBuilder();
  builder.setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder());

  // getLocalizedDate(...) is assumed to be the helper from the query parser tests
  String startDate = getLocalizedDate(2002, 1, 1, false);
  String endDate = getLocalizedDate(2002, 1, 4, false);
  StandardAnalyzer oneStopAnalyzer = new StandardAnalyzer();
  qp.setAnalyzer(oneStopAnalyzer);
  Query a = qp.parse("date:[" + startDate + " TO " + endDate + "]", null);
  System.out.println(a);
}
{code}
[jira] Commented: (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740728#action_12740728 ] Uwe Schindler commented on LUCENE-1768: ---
To go back to the idea of why I opened the issue (and I think this is also Mike's intention): from what you see on java-user, where users ask questions about how to use Lucene, most users are not aware of the fact that they can create Query classes themselves. Most example code on the list is just: "I have such a query string and I pass it to Lucene and it does not work as expected." It is hard to explain to them that they should simply not use a query parser for their queries and just instantiate the query classes directly. For such users it is even harder to customize this query parser.
My intention is: make the RangeQueryNodeBuilder configurable somehow, as Luis proposed, so that you can set the type of a field (something we do not have in Lucene currently). If the type is undefined or explicitly set to string/term, create a TermRangeQuery. If it is set to any numeric type, create a NumericRangeQuery.newXxxRange(field, ...). The same can currently be done by the original Lucene query parser, but only for dates (and it is really a hack using this DateField class). I simply want to extend it so that you can say: this field is of type 'int', and the correct range query is created for it automatically. Because the old query parser is now deprecated, I want to do it for the new one. This would also be an incentive for new users to throw away the old parser and use the new one, because it can be configured easily to create numeric ranges in addition to term ranges.
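Uwe's proposal (a per-field type that decides between a term range and a numeric range, with Number.valueOf-style conversion of the user's bounds) can be sketched without any Lucene classes. FieldType and RangeQueryFactory are hypothetical names, the resulting queries are rendered as strings, and "*" is mapped to a null (open) bound; in Lucene, the numeric branch would instead call one of the NumericRangeQuery.newXxxRange(field, ...) factories.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field type configuration (not an existing Lucene class).
enum FieldType { STRING, INT, LONG, FLOAT, DOUBLE }

class RangeQueryFactory {
    private final Map<String, FieldType> fieldTypes = new HashMap<>();

    void setFieldType(String field, FieldType type) {
        fieldTypes.put(field, type);
    }

    // Convert a user-entered bound to the configured numeric type; "*" means open-ended.
    static Number parseBound(String text, FieldType type) {
        if ("*".equals(text)) {
            return null;
        }
        switch (type) {
            case INT:    return Integer.valueOf(text);
            case LONG:   return Long.valueOf(text);
            case FLOAT:  return Float.valueOf(text);
            case DOUBLE: return Double.valueOf(text);
            default:     throw new IllegalArgumentException("not a numeric type: " + type);
        }
    }

    private static String show(Number n) {
        return n == null ? "*" : n.toString();
    }

    String create(String field, String lower, String upper, boolean minInclusive, boolean maxInclusive) {
        String open = minInclusive ? "[" : "{";
        String close = maxInclusive ? "]" : "}";
        FieldType type = fieldTypes.getOrDefault(field, FieldType.STRING);
        if (type == FieldType.STRING) {
            // Undefined or explicitly string/term field: fall back to a term range.
            return "TermRange(" + field + ":" + open + lower + " TO " + upper + close + ")";
        }
        Number min = parseBound(lower, type);
        Number max = parseBound(upper, type);
        return "NumericRange<" + type + ">(" + field + ":" + open + show(min) + " TO " + show(max) + close + ")";
    }
}
```

Keeping the type map next to the range builder means the only decision made per query is which branch to take, which matches Uwe's point that everything else about range handling stays the same.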
Re: SpanQuery and Spans optimizations
On Aug 6, 2009, at 5:09 PM, Grant Ingersoll wrote:
> On Aug 6, 2009, at 5:06 PM, Shai Erera wrote:
>> Only w/ ScoreDocs we reuse the same instance. So I guess we'd like to do the same here. Seems like providing a TopSpansCollector is what you want, only unlike TopFieldCollector, which populates the fields post search, you'd like to do it during search.
>
> Bingo, but I think the collection functionality needs to be on Collector, as I'd hate to lose out on functionality that the other impls have to offer, or to have to recreate them.

Hmm, maybe I can get at this info from the setScorer capabilities. Then I would just need a place to hang the data... Maybe it would just take having the SpanScorer implementation provide a wee bit more access to its structures...
[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740798#action_12740798 ] Bill Bell commented on LUCENE-1781: ---
Everything is working except when you use a large area like 1 miles. I get no results at this distance when crossing the anti-meridian (180 degrees). Most of the time this is fine, but specifically when -181 becomes 178 there appears to be an issue somewhere else in the code, and nothing is returned. I believe this code is good; the issue is somewhere else. Maybe lower left is no longer lower left, and upper right is no longer upper right? The box is probably too big for the other algorithms. Not sure what else to check. How is it being used? Regardless, this section appears right.
Start here:
ctr 39.3209801,-111.0937311
Distance: 7200
boxCorners: before norm 22.100623434197477,21.15746490712925
boxCorners: normLng 22.100623434197477,21.15746490712925
boxCorners: distance: d 7200.0
boxCorners: ctr 39.3209801,-111.0937311
boxCorners: normLat 22.100623434197477,21.15746490712925
boxCorners: before norm -43.22565169384456,-181.34791600031286 -- note -181
boxCorners: normLng -43.22565169384456,178.65208399968714 -- Note 178
boxCorners: distance: d 7200.0
boxCorners: ctr 39.3209801,-111.0937311
boxCorners: normLat -43.22565169384456,178.65208399968714
corner 1054.4155877284288
I do get results from Hawaii crossing this at 10,000 miles.
boxCorners: before norm 6.201324582593365,-0.012709669713800501
boxCorners: normLng 6.201324582593365,-0.012709669713800501
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat 6.201324582593365,-0.012709669713800501
boxCorners: before norm -41.508634930577436,-302.4840293070323 -- note -302
boxCorners: normLng -41.508634930577436,57.5159706929677 -- note 57
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat -41.508634930577436,57.5159706929677
corner 1464.4660940672625
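In the 7200-mile log, the west edge of the box starts at -181.35 and is normalized to 178.65, so the "lower left" longitude ends up numerically greater than the "upper right" one (21.16). One plausible reason for the empty result, offered as an assumption rather than a confirmed diagnosis, is that downstream code treats the box as a single min..max longitude range. The sketch below (hypothetical names, not taken from LLRect) wraps longitudes into [-180, 180) and splits a box that crosses the anti-meridian into two ordinary intervals; it reproduces the -181.35 to 178.65 and -302.48 to 57.52 normalizations from the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the spatial contrib.
class LngUtils {
    // Wrap any longitude into [-180, 180). The double remainder can be negative,
    // so add 360 before taking it a second time.
    static double normalizeLng(double lng) {
        double wrapped = ((lng + 180.0) % 360.0 + 360.0) % 360.0;  // now in [0, 360)
        return wrapped - 180.0;
    }

    // Returns one or two {min, max} longitude intervals covering the box's span.
    static List<double[]> lngIntervals(double westLng, double eastLng) {
        double west = normalizeLng(westLng);
        double east = normalizeLng(eastLng);
        List<double[]> out = new ArrayList<>();
        if (west <= east) {
            out.add(new double[] {west, east});
        } else {
            // After wrapping, the west edge is numerically east of the east edge:
            // the box crosses 180 degrees, so cover both sides of the anti-meridian.
            out.add(new double[] {west, 180.0});
            out.add(new double[] {-180.0, east});
        }
        return out;
    }
}
```

Each interval can then be searched separately and the results OR-ed together. The sketch does not handle a box wider than 360 degrees of longitude, which would need to collapse to the full circle instead.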
[jira] Updated: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated LUCENE-1781: -- Attachment: (was: LLRect.java)
[jira] Updated: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated LUCENE-1781: -- Attachment: LLRect.java
Added flipping for latitudes beyond 90 degrees, if needed. See comment.
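The attachment note describes the fix for the "Illegal lattitude value 93.1558669413734" error only as flipping. One common interpretation, assumed here rather than taken from the attached LLRect.java, is that a point that overshoots a pole reappears on the opposite meridian: the latitude is reflected back inside [-90, 90] and the longitude shifts by 180 degrees. LatLngFlip is a hypothetical name.

```java
// Sketch of one reading of "flipping" (an assumption; not the attached LLRect.java).
class LatLngFlip {
    // Wrap a longitude into [-180, 180).
    static double normalizeLng(double lng) {
        return ((lng + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
    }

    // Returns {lat, lng} with lat in [-90, 90] and lng in [-180, 180).
    // Only valid for overshoots of up to 180 degrees past a pole.
    static double[] flip(double lat, double lng) {
        if (lat > 90.0) {
            lat = 180.0 - lat;   // e.g. 93.16 becomes 86.84
            lng += 180.0;        // continue down the opposite meridian
        } else if (lat < -90.0) {
            lat = -180.0 - lat;  // e.g. -95 becomes -85
            lng += 180.0;
        }
        return new double[] {lat, normalizeLng(lng)};
    }
}
```

For a search box whose edge passes over a pole, another common choice is to clamp the latitude to 90 (or -90) and widen the longitude range to the whole circle, since the box then contains the pole; which behavior fits depends on how the box is used downstream.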
[jira] Issue Comment Edited: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740798#action_12740798 ] Bill Bell edited comment on LUCENE-1781 at 8/7/09 5:48 PM: --- Everything is working except when you use a large area like 1 miles. I get no results at this distance when crossing the anti-meridian (180 degrees). Most of the time this is fine, but specifically when -181 becomes 178 there appears to be an issue somewhere else in the code and nothing is returned. I believe this code is good, the issue is somewhere else. Maybe lower left is no longer lower left, and upper right is no longer upper right? The box is probably too big for the other algorithms. Not sure what else to check. How it is being used? Regardless this section appears right. Start here: ctr 39.3209801,-111.0937311 Distance: 7200 boxCorners: before norm 22.100623434197477,21.15746490712925 boxCorners: normLng 22.100623434197477,21.15746490712925 boxCorners: distance: d 7200.0 boxCorners: ctr 39.3209801,-111.0937311 boxCorners: normLat 22.100623434197477,21.15746490712925 boxCorners: before norm -43.22565169384456,-181.34791600031286 note -181 boxCorners: normLng -43.22565169384456,178.65208399968714 Note 178 boxCorners: distance: d 7200.0 boxCorners: ctr 39.3209801,-111.0937311 boxCorners: normLat -43.22565169384456,178.65208399968714 corner 1054.4155877284288 I do get results from Hawaii crossing this at 10,000 miles. 
This works:

boxCorners: before norm 6.201324582593365,-0.012709669713800501
boxCorners: normLng 6.201324582593365,-0.012709669713800501
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat 6.201324582593365,-0.012709669713800501
boxCorners: before norm -41.508634930577436,-302.4840293070323   (note -302)
boxCorners: normLng -41.508634930577436,57.5159706929677   (note 57)
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat -41.508634930577436,57.5159706929677
corner 1464.4660940672625
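The `normLng` step in these logs amounts to folding a longitude back into the range [-180, 180). A minimal Java sketch, assuming a standalone helper named `normalizeLng` (hypothetical, not the actual LLRect code), reproduces both wraps noted above: -181.35 becomes 178.65 and -302.48 becomes 57.52.

```java
public class LngNormalize {

    // Hypothetical helper (not the LLRect code): folds any longitude in degrees
    // into the half-open range [-180, 180).
    public static double normalizeLng(double lng) {
        // Java's % keeps the sign of the dividend, so add 360 and apply % again
        // to get a remainder in [0, 360) before shifting back by 180.
        double shifted = ((lng + 180.0) % 360.0 + 360.0) % 360.0;
        return shifted - 180.0;
    }

    public static void main(String[] args) {
        // Raw corner longitudes taken from the boxCorners logs
        System.out.println(normalizeLng(-181.34791600031286)); // about 178.652
        System.out.println(normalizeLng(-302.4840293070323));  // about 57.516
        System.out.println(normalizeLng(-111.0937311));        // about -111.094 (already in range)
    }
}
```

Note that after this step a corner that started west of the center can end up with a numerically larger longitude than the opposite corner, which is one possible way "lower left is no longer lower left".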
[jira] Issue Comment Edited: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740798#action_12740798 ] Bill Bell edited comment on LUCENE-1781 at 8/7/09 5:59 PM:
---
Everything is working except when you use a large area like 1 miles. I get no results at this distance when crossing the anti-meridian (180 degrees). Most of the time this is fine, but specifically when -181 becomes 178 there appears to be an issue somewhere else in the code, and nothing is returned.

I believe this code is good; the issue is somewhere else. Maybe lower left is no longer lower left, and upper right is no longer upper right? The box is probably too big for the other algorithms. I am not sure what else to check. How is it being used? Regardless, this section appears right.

Start here: ctr 39.3209801,-111.0937311 Distance: 7200

boxCorners: before norm 22.100623434197477,21.15746490712925
boxCorners: normLng 22.100623434197477,21.15746490712925
boxCorners: distance: d 7200.0
boxCorners: ctr 39.3209801,-111.0937311
boxCorners: normLat 22.100623434197477,21.15746490712925
boxCorners: before norm -43.22565169384456,-181.34791600031286   (note -181)
boxCorners: normLng -43.22565169384456,178.65208399968714   (note 178)
boxCorners: distance: d 7200.0
boxCorners: ctr 39.3209801,-111.0937311
boxCorners: normLat -43.22565169384456,178.65208399968714
corner 1054.4155877284288

I do get results from Hawaii crossing this at 10,000 miles.
This works:

boxCorners: before norm 6.201324582593365,-0.012709669713800501
boxCorners: normLng 6.201324582593365,-0.012709669713800501
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat 6.201324582593365,-0.012709669713800501
boxCorners: before norm -41.508634930577436,-302.4840293070323   (note -302)
boxCorners: normLng -41.508634930577436,57.5159706929677   (note 57)
boxCorners: distance: d 1.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat -41.508634930577436,57.5159706929677
corner 1464.4660940672625

Note: this one does not get any results. Note the 4.815339955430126 difference. Very weird.

boxCorners: distance: d 10500.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat 0.8114618951495843,4.815339955430126
boxCorners: before norm -37.88735182208723,-310.6222696081052
boxCorners: normLng -37.88735182208723,49.37773039189477
boxCorners: distance: d 10500.0
boxCorners: ctr 19.8986819,-155.6658568
boxCorners: normLat -37.88735182208723,49.37773039189477
corner 1537.6893987706253
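One plausible reading of the failing 10500-mile case: the raw west edge -310.62 normalizes to 49.38, which is now larger than the east edge 4.815, so a plain min/max longitude test can never match. A hedged Java sketch of a wrap-aware longitude span follows; `LngRange`, `naiveContains`, `contains` and `pieces` are hypothetical names (not Lucene classes), and treating 49.38 as the west edge and 4.815 as the east edge is an assumption read off the log.

```java
public class LngRange {
    // Hypothetical type (not Lucene's Rectangle/LLRect). Both edges are
    // assumed to be normalized to [-180, 180).
    final double west;
    final double east;

    public LngRange(double west, double east) {
        this.west = west;
        this.east = east;
    }

    // The plain min/max test: only meaningful when west <= east.
    public static boolean naiveContains(double west, double east, double lng) {
        return lng >= west && lng <= east;
    }

    // Wrap-aware test: when west > east the span crosses the antimeridian,
    // so it is the union of [west, 180) and [-180, east].
    public boolean contains(double lng) {
        if (west <= east) {
            return lng >= west && lng <= east;
        }
        return lng >= west || lng <= east;
    }

    // Splits the span into at most two non-wrapping intervals that a simple
    // min/max filter can handle one at a time.
    public double[][] pieces() {
        if (west <= east) {
            return new double[][] { { west, east } };
        }
        return new double[][] { { west, 180.0 }, { -180.0, east } };
    }

    public static void main(String[] args) {
        // Assumed edges from the failing log: west 49.38 (from -310.62), east 4.815
        LngRange r = new LngRange(49.37773039189477, 4.815339955430126);
        double center = -155.6658568; // Hawaii center from the log
        System.out.println(naiveContains(r.west, r.east, center)); // false: min/max matches nothing
        System.out.println(r.contains(center));                    // true
        System.out.println(r.pieces().length);                     // 2
    }
}
```

With west > east the naive test `lng >= west && lng <= east` is false for every longitude, which would be consistent with "nothing is returned"; splitting into two pieces lets each half be filtered with the existing min/max logic.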
[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740831#action_12740831 ] Bill Bell commented on LUCENE-1781:
---
I did some additional research. The current Spatial code ONLY works for one hemisphere at a time: it does a simple min/max comparison on the lat/long values. This makes the whole solution unusable for areas that span from one hemisphere into another. Specifically, Rectangle.java, getBoundary, etc. need to work on a circle.

The first step is to build a rectangle when lat goes from -90 to +89 and long goes from -180 to +179, etc.:

new Rectangle(ll.getLng(), ll.getLat(), ur.getLng(), ur.getLat())

At least LLRect appears correct now... The next step is to fix the CartesianPolyFilterBuilder.
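A commonly used alternative to a per-hemisphere min/max box, computing the query box from a center and radius, handles both edge cases raised in this thread. When a pole falls inside the circle, latitude is clamped to ±90 and longitude widened to the full range, so no corner can exceed 90 degrees as in the original "Illegal lattitude value" exception. Otherwise the longitude half-width is asin(sin r / cos lat) and the edges are wrapped across the antimeridian. This is a sketch under stated assumptions (spherical Earth, mean radius 3958.8 miles, distances in miles; `BoundingBox` and `boundingBox` are hypothetical names), not Lucene's `DistanceUtils.getBoundary`.

```java
import java.util.Arrays;

public class BoundingBox {
    // Assumption: spherical Earth with this mean radius; distances are in miles.
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Hypothetical helper (not DistanceUtils.getBoundary). Returns
    // {minLat, minLng, maxLat, maxLng} in degrees. If minLng > maxLng the
    // box crosses the antimeridian and must be queried as two pieces.
    public static double[] boundingBox(double latDeg, double lngDeg, double distMiles) {
        double lat = Math.toRadians(latDeg);
        double lng = Math.toRadians(lngDeg);
        double r = distMiles / EARTH_RADIUS_MILES; // angular radius in radians

        double minLat = lat - r;
        double maxLat = lat + r;
        double minLng;
        double maxLng;

        if (minLat > -Math.PI / 2 && maxLat < Math.PI / 2) {
            // Largest longitude offset of any point on the circle (the tangent meridians).
            double deltaLng = Math.asin(Math.sin(r) / Math.cos(lat));
            minLng = lng - deltaLng;
            maxLng = lng + deltaLng;
            // Wrap across the antimeridian instead of producing values like -181.
            if (minLng < -Math.PI) {
                minLng += 2 * Math.PI;
            }
            if (maxLng > Math.PI) {
                maxLng -= 2 * Math.PI;
            }
        } else {
            // A pole lies inside the circle: clamp latitude at the pole
            // and cover every longitude.
            minLat = Math.max(minLat, -Math.PI / 2);
            maxLat = Math.min(maxLat, Math.PI / 2);
            minLng = -Math.PI;
            maxLng = Math.PI;
        }
        return new double[] {
            Math.toDegrees(minLat), Math.toDegrees(minLng),
            Math.toDegrees(maxLat), Math.toDegrees(maxLng)
        };
    }

    public static void main(String[] args) {
        // Center from the original report, radius assumed to be 5000 miles:
        // the pole branch keeps maxLat at 90 rather than letting it pass 90.
        System.out.println(Arrays.toString(boundingBox(39.5500507, -105.7820674, 5000)));
        // A box near the antimeridian: minLng > maxLng marks the wrap.
        System.out.println(Arrays.toString(boundingBox(0.0, 179.0, 200)));
    }
}
```

A box whose minLng exceeds maxLng would then be split into two longitude ranges before filtering, rather than compared with a single min/max.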