[jira] [Created] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
Ted Sullivan created SOLR-7539: -- Summary: Add a QueryAutofilteringComponent for query introspection using indexed metadata Key: SOLR-7539 URL: https://issues.apache.org/jira/browse/SOLR-7539 Project: Solr Issue Type: New Feature Reporter: Ted Sullivan Priority: Minor

The Query Autofiltering Component provides a method of inferring user intent by matching noun phrases that are typically used for faceted navigation and turning them into Solr filter or boost queries (depending on configuration settings), so that more precise user queries can be met with more precise results. The algorithm uses a longest-contiguous-phrase-match strategy, which allows it to disambiguate queries where single terms are ambiguous but phrases are not. It works when there is structured information in the form of String fields that are normally used for faceted navigation. It works across fields by building a map of search term to index field using the Lucene FieldCache (UninvertingReader). This enables users to create free-text, multi-term queries that combine attributes across facet fields - as if they had searched and then navigated through several facet layers. To address the problem of the exact-match-only semantics of String fields, support for synonyms (including multi-term synonyms) and stemming was added.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
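The longest-contiguous-phrase-match idea described above can be sketched outside Solr. The map and facet values below are hypothetical stand-ins for the term-to-field map the component builds from the FieldCache; this is a minimal illustration of the matching strategy, not the component's actual code.

```java
import java.util.*;

public class AutofilterSketch {
    // Hypothetical index metadata: facet value -> field name.
    // In the real component this map is built from the Lucene
    // FieldCache / UninvertingReader; here it is hard-coded.
    static final Map<String, String> VALUE_TO_FIELD = Map.of(
        "red", "color",
        "red sox", "team",
        "shirt", "product_type");

    // Greedy longest-contiguous-phrase match: at each position try the
    // longest remaining token span first, so "red sox" wins over "red".
    static List<String> toFilters(String query) {
        String[] tokens = query.toLowerCase(Locale.ROOT).split("\\s+");
        List<String> filters = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            boolean matched = false;
            for (int j = tokens.length; j > i; j--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, j));
                String field = VALUE_TO_FIELD.get(phrase);
                if (field != null) {
                    filters.add(field + ":\"" + phrase + "\"");
                    i = j;
                    matched = true;
                    break;
                }
            }
            if (!matched) i++; // unmatched token: leave it to the main query
        }
        return filters;
    }

    public static void main(String[] args) {
        System.out.println(toFilters("red sox shirt"));
    }
}
```

For the query "red sox shirt" the sketch emits a team filter for the phrase "red sox" rather than a color filter for the single term "red", which is the disambiguation behavior the description claims.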
RE: Recent Java 9 commit (e5b66323ae45) breaks fsync on directory
Hi Brian, many thanks for opening this issue! I agree with Alan that adding an OpenOption would be a good possibility. In any case, as Files only contains static methods, we could still add a "utility" method that forces file/directory buffers to disk, which just uses the new open option under the hood. That way the FileSystem SPI interfaces do not need to be modified and only have to take care of the new OpenOption (if supported). There is one additional issue we found recently on MacOSX, but this is only slightly related to the one here. It looks like on MacOSX, FileChannel#force is mostly a noop regarding syncing data to disk, because the underlying operating system requires a "special" fcntl to force buffers to the disk device: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html: "For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail." This different behavior breaks the guarantees of FileChannel#force on MacOSX (as described in the Javadocs). So the MacOSX FileSystemProvider implementation should use this special fcntl to force file buffers to disk. Should I open a bug report on bugs.sun.com?
Uwe - Uwe Schindler uschind...@apache.org ASF Member, Apache Lucene PMC / Committer Bremen, Germany http://lucene.apache.org/ From: nio-dev [mailto:nio-dev-boun...@openjdk.java.net] On Behalf Of Brian Burkhalter Sent: Wednesday, May 13, 2015 12:26 AM To: nio-dev Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; Balchandra Vaidya Subject: Re: Recent Java 9 commit (e5b66323ae45) breaks fsync on directory I have created an enhancement issue here: https://bugs.openjdk.java.net/browse/JDK-8080235 Brian On May 12, 2015, at 3:10 PM, Brian Burkhalter brian.burkhal...@oracle.com wrote: I will create an issue now and post the ID.
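For context, here is a minimal sketch of what plain NIO offers today for forcing data to the device; the file name and byte payload are arbitrary. The comments mark where the macOS semantics discussed above diverge.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceDemo {
    // Write three bytes to a temp file and force them to the device.
    static long writeAndForce() {
        try {
            Path file = Files.createTempFile("force-demo", ".bin");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                // force(true) maps to fsync(2): durable on Linux, but on
                // MacOSX only a transfer toward the drive's cache -- a durable
                // flush there would additionally need the F_FULLFSYNC fcntl,
                // which plain NIO has no way to issue.
                ch.force(true);
            }
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndForce());
    }
}
```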
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541638#comment-14541638 ] Noble Paul commented on SOLR-6220: -- [~mewmewball] That's the plan. It should be added for addReplica, createShard and splitShard.

Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch

h1.Objective
Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we should be able to control the allocation of replicas, or later change it to suit the needs of the system. All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during
* collection creation
* shard splitting
* add replica
* createShard

There are two aspects to how replicas are placed: snitch and placement.

h2.snitch
How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch

h2.ImplicitSnitch
This is shipped by default with Solr. Users do not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch:
# cores : number of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : these are values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup.
It is possible to use rules like {{D.key:expectedVal,shard:*}}

h2.Rules
This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed on to the collection CREATE API as a multivalued parameter "rule". The values will be saved in the state of the collection as follows:
{code:javascript}
{
  "mycollection": {
    "snitch": {"class": "ImplicitSnitch"},
    "rules": [
      {"cores": "4-"},
      {"replica": 1, "shard": "*", "node": "*"},
      {"disk": "100+"}
    ]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{\*\*}} meaning ANY; the default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly

h3.How are nodes picked up?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space are given priority.
If everything else is equal, nodes with fewer cores are given higher priority.

h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules:
{noformat}
#Example requirement: use only one replica of a shard on a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible.
#This ensures that if no node has 100GB of disk, nodes are picked in order of size: an 85GB node would be picked over an 80GB node.
disk:100~
{noformat}
Examples:
{noformat}
#in each rack there can be at most two replicas of a given shard
rack:*,shard:*,replica:<3
//in each rack there can be at most two replicas of ANY shard
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be at most one replica of EACH shard
node:*,shard:*,replica:1-
#in each node there should be at most one replica of ANY shard
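The affinity sort described above (more free disk first, and with everything else equal, fewer cores first) can be sketched in plain Java. The Node class and its tag values are hypothetical stand-ins for the snitch-provided tags; this is an illustration of the stated preference order, not Solr's actual placement code.

```java
import java.util.*;

public class NodeAffinitySort {
    // Hypothetical node snapshot carrying the tags the ImplicitSnitch provides.
    public static final class Node {
        public final String name;
        public final int freeDiskGb;
        public final int cores;
        public Node(String name, int freeDiskGb, int cores) {
            this.name = name;
            this.freeDiskGb = freeDiskGb;
            this.cores = cores;
        }
    }

    // Sort for a "disk:100+" style rule: more free disk is preferred,
    // and ties are broken in favor of nodes with fewer cores.
    public static List<String> placementOrder(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.<Node>comparingInt(n -> n.freeDiskGb).reversed()
            .thenComparingInt(n -> n.cores));
        List<String> names = new ArrayList<>();
        for (Node n : sorted) names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(placementOrder(List.of(
            new Node("n1", 80, 2), new Node("n2", 120, 8), new Node("n3", 120, 4))));
    }
}
```

With the sample nodes, n3 sorts ahead of n2 despite equal disk because it has fewer cores, and n1 comes last with the least disk.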
[jira] [Commented] (SOLR-7143) MoreLikeThis Query Parser does not handle multiple field names
[ https://issues.apache.org/jira/browse/SOLR-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541695#comment-14541695 ] Jens Wille commented on SOLR-7143: -- Hi Anshum, can you say anything about the status of this issue? Can you give me any pointers as to what I might be able to do?

MoreLikeThis Query Parser does not handle multiple field names -- Key: SOLR-7143 URL: https://issues.apache.org/jira/browse/SOLR-7143 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 5.0 Reporter: Jens Wille Assignee: Anshum Gupta Attachments: SOLR-7143.patch, SOLR-7143.patch

The newly introduced MoreLikeThis Query Parser (SOLR-6248) does not return any results when supplied with multiple fields in the {{qf}} parameter. To reproduce within the techproducts example, compare:
{code}
curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name%7DMA147LL/A'
curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=features%7DMA147LL/A'
curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name,features%7DMA147LL/A'
{code}
The first two queries return 8 and 5 results, respectively. The third query doesn't return any results (not even the matched document). In contrast, the MoreLikeThis Handler works as expected (accounting for the default {{mintf}} and {{mindf}} values in SimpleMLTQParser):
{code}
curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name&mlt.mintf=1&mlt.mindf=1'
curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=features&mlt.mintf=1&mlt.mindf=1'
curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name,features&mlt.mintf=1&mlt.mindf=1'
{code}
After adding the following line to {{example/techproducts/solr/techproducts/conf/solrconfig.xml}}:
{code:language=XML}
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
{code}
the first two queries return 7 and 4 results, respectively (excluding the matched document).
The third query returns 7 results, as one would expect.
[jira] [Commented] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541721#comment-14541721 ] Ted Sullivan commented on SOLR-7539: Initial patch uploaded. I have published a blog article explaining the rationale of this component, etc at http://lucidworks.com/blog/query-autofiltering-revisited-can-precise/
[jira] [Created] (SOLR-7538) Get count of facet.pivot on distinct combination of fields
Nagabhushan created SOLR-7538: - Summary: Get count of facet.pivot on distinct combination of fields Key: SOLR-7538 URL: https://issues.apache.org/jira/browse/SOLR-7538 Project: Solr Issue Type: Task Environment: 4.10 Reporter: Nagabhushan Priority: Trivial

Hi, I need to get the action-wise count in a campaign, using facet.pivot=campaignId,action to get it. Example data (campaignId,id,action):
1,1,a
1,1,a
1,2,a
1,2,b
When I do facet.pivot I get {a:3, b:1}; facet counts duplicate rows. I need counts distinct by the combination of campaignId,id,action, which is {a:2, b:1}. Thanks,
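The distinct-combination count the reporter wants can be computed client-side as a workaround. The helper below is a hypothetical illustration over the example rows, not a Solr API: deduplicate the (campaignId, id, action) tuples first, then count actions.

```java
import java.util.*;

public class DistinctPivotCount {
    // Rows of (campaignId, id, action); count actions over distinct
    // (campaignId, id, action) combinations instead of raw rows.
    static Map<String, Long> distinctActionCounts(List<String[]> rows) {
        Set<List<String>> distinct = new LinkedHashSet<>();
        for (String[] r : rows) distinct.add(List.of(r)); // dedupe full tuples
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> r : distinct)
            counts.merge(r.get(2), 1L, Long::sum);        // count by action
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", "1", "a"},
            new String[] {"1", "1", "a"},
            new String[] {"1", "2", "a"},
            new String[] {"1", "2", "b"});
        System.out.println(distinctActionCounts(rows)); // {a=2, b=1}
    }
}
```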
[jira] [Updated] (SOLR-7508) SolrParams.toMultiMap() does not handle arrays
[ https://issues.apache.org/jira/browse/SOLR-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Scheffler updated SOLR-7508: --- Attachment: SOLRJ-7508.patch Provided patch to fix the issue.

SolrParams.toMultiMap() does not handle arrays -- Key: SOLR-7508 URL: https://issues.apache.org/jira/browse/SOLR-7508 Project: Solr Issue Type: Bug Components: SolrJ Affects Versions: 5.0, 5.1 Reporter: Thomas Scheffler Labels: easyfix, easytest Attachments: SOLRJ-7508.patch

The following JUnit test shows what I mean:
{code}
ModifiableSolrParams params = new ModifiableSolrParams();
String[] paramValues = new String[] { "title:junit", "author:john" };
String paramName = "fq";
params.add(paramName, paramValues);
NamedList<Object> namedList = params.toNamedList();
assertEquals("parameter values are not equal", paramValues, namedList.get(paramName));
Map<String, String[]> multiMap = SolrParams.toMultiMap(namedList);
assertEquals("Expected " + paramValues.length + " values", paramValues.length, multiMap.get(paramName).length);
{code}
The first {{assertEquals()}} will run fine, while the last one triggers the error. Suddenly the length of the array is 1, and its value for {{fq}} is something like {{[Ljava.lang.String;@6f09c9c0}}. Looking into the code, I see that the toMultiMap() method does not even look for arrays.
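The expected behavior can be sketched independently of SolrJ: when flattening alternating name/value pairs (as a NamedList holds them) into a multi-map, a String[] value should contribute each of its elements rather than the array's toString(). The helper below is illustrative, not the actual SolrParams code.

```java
import java.util.*;

public class MultiMapSketch {
    // Flatten alternating (name, value) pairs into a Map<String, String[]>,
    // expanding String[] values into their elements -- the case the
    // reported bug misses.
    static Map<String, String[]> toMultiMap(Object... nameValuePairs) {
        Map<String, List<String>> acc = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            String name = (String) nameValuePairs[i];
            Object v = nameValuePairs[i + 1];
            List<String> vals = acc.computeIfAbsent(name, k -> new ArrayList<>());
            if (v instanceof String[]) {
                vals.addAll(Arrays.asList((String[]) v)); // expand the array
            } else {
                vals.add(String.valueOf(v));
            }
        }
        Map<String, String[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : acc.entrySet())
            out.put(e.getKey(), e.getValue().toArray(new String[0]));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> m =
            toMultiMap("fq", new String[] {"title:junit", "author:john"});
        System.out.println(m.get("fq").length); // 2, not 1
    }
}
```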
[jira] [Updated] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Sullivan updated SOLR-7539: --- Attachment: SOLR-7539.patch
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541840#comment-14541840 ] Uwe Schindler commented on LUCENE-6450: --- bq. As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term Cool! :-)

Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541850#comment-14541850 ] Karl Wright commented on LUCENE-6480: - So my idea for an (x,y,z)-based geohash is as follows:
- three bits per split iteration: each split halves the x, y and z ranges
- the initial range for each dimension is -1 to 1, thus size 2
- the first split determines the sign, and is thus backwards: e.g. -1 <= x < 0 yields bit 0, 0 <= x <= 1 yields bit 1
- the second bit splits the range, e.g. 00 means -0.5 <= x < 0
Questions:
- Q1: how precise is it, to fit in a long? A: 64/3 = 21 splits with 1 bit left over. The cell size is 2/(2^21) = 2^(-20) of the unit range, which scaled by the earth's radius (~6371 km) is 6.07585906982421875 meters
- Q2: how to quickly convert to a geocode value? A: need bit manipulation of mantissa and exponent for this; requires further thought (and maybe a hash change)
- Q3: how to quickly convert back to usable (x,y,z) from a geocode value? A: first, geo3d has to tolerate imprecision in evaluation. It does, possibly excepting small GeoCircles. Otherwise, similar bit manipulation of mantissa and exponent in a double.
Once there's a reversible packing method, it's pretty trivial to make use of all geo3d shapes.

Extend Simple GeoPointField Type to 3d --- Key: LUCENE-6480 URL: https://issues.apache.org/jira/browse/LUCENE-6480 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] proposes a simple GeoPointField type to lucene core. This field uses 64bit encoding of 2 dimensional points to construct sorted term representations of GeoPoints (aka: GeoHashing). This feature investigates adding support for encoding 3 dimensional GeoPoints, either by extending GeoPointField to a Geo3DPointField or adding an additional 3d constructor.
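The per-dimension range-split encoding Karl describes can be sketched directly: 21 successive splits of [-1, 1], emitting one bit per split, with the first bit deciding the sign. The encode/decode pair below is an illustration of that scheme with assumed names, not geo3d code; the round-trip error is bounded by half the final cell size.

```java
public class SplitEncode {
    // Encode one coordinate in [-1, 1] by 21 successive range splits,
    // one bit per split: bit 1 keeps the upper half, bit 0 the lower.
    static long encode21(double x) {
        double lo = -1.0, hi = 1.0;
        long bits = 0;
        for (int i = 0; i < 21; i++) {
            double mid = (lo + hi) / 2;
            bits <<= 1;
            if (x >= mid) { bits |= 1; lo = mid; } else { hi = mid; }
        }
        return bits;
    }

    // Decode back to the center of the final interval; the error is at
    // most half the cell size of 2 / 2^21.
    static double decode21(long bits) {
        double lo = -1.0, hi = 1.0;
        for (int i = 20; i >= 0; i--) {
            double mid = (lo + hi) / 2;
            if (((bits >>> i) & 1) == 1) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        double x = 0.123456;
        double err = Math.abs(decode21(encode21(x)) - x);
        System.out.println(err < 2.0 / (1 << 21)); // within one cell
    }
}
```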
[jira] [Created] (SOLR-7540) SSLMigrationTest urlScheme isn't tested properly
Steve Rowe created SOLR-7540: Summary: SSLMigrationTest urlScheme isn't tested properly Key: SOLR-7540 URL: https://issues.apache.org/jira/browse/SOLR-7540 Project: Solr Issue Type: Bug Reporter: Steve Rowe Priority: Minor I noticed that {{SSLMigrationTest.assertReplicaInformation(urlScheme)}} only checks that a replica's base URL *starts with* the given URL scheme - since the urlScheme can only be http or https, this check will always succeed when the given urlScheme is http.
[jira] [Commented] (SOLR-7540) SSLMigrationTest urlScheme isn't tested properly
[ https://issues.apache.org/jira/browse/SOLR-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541868#comment-14541868 ] Steve Rowe commented on SOLR-7540: -- This fixes it:
{noformat}
Index: solr/core/src/test/org/apache/solr/cloud/SSLMigrationTest.java
===================================================================
--- solr/core/src/test/org/apache/solr/cloud/SSLMigrationTest.java	(revision 1679199)
+++ solr/core/src/test/org/apache/solr/cloud/SSLMigrationTest.java	(working copy)
@@ -103,7 +103,7 @@
     assertEquals("Wrong number of replicas found", 4, replicas.size());
     for(Replica replica : replicas) {
       assertTrue("Replica didn't have the proper urlScheme in the ClusterState",
-          StringUtils.startsWith(replica.getStr(ZkStateReader.BASE_URL_PROP), urlScheme));
+          StringUtils.startsWith(replica.getStr(ZkStateReader.BASE_URL_PROP), urlScheme + ":"));
     }
   }
{noformat}
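The underlying pitfall is easy to demonstrate in isolation (the class name and URL below are made up): "https" itself starts with "http", so a bare prefix check can never fail when the expected scheme is http, while appending ":" anchors the match at the scheme delimiter.

```java
public class SchemePrefixCheck {
    // Anchored scheme check: require the ":" delimiter after the scheme.
    static boolean hasScheme(String baseUrl, String scheme) {
        return baseUrl.startsWith(scheme + ":");
    }

    public static void main(String[] args) {
        String url = "https://node1:8983/solr";
        System.out.println(url.startsWith("http"));  // true -- the bug
        System.out.println(hasScheme(url, "http"));  // false -- fixed
        System.out.println(hasScheme(url, "https")); // true
    }
}
```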
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541831#comment-14541831 ] Nicholas Knize commented on LUCENE-6450: bq. does lucene efficiently support field types of that length? Yes. This patch (and PackedQuadTree) uses longs for encoding 2d points. I went ahead and opened a separate issue [LUCENE-6480 | https://issues.apache.org/jira/browse/LUCENE-6480] for investigating the 3d case so we can carry the discussion over there. The goal for this field is to provide a framework for search so all we have to worry about is trying out different encoding techniques. As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term. The index will be slightly bigger but it should provide the foundation for faster search on large polygons and bounding boxes. I'll add mercator projection after to reduce precision error over large search regions and then switch to geo3d and benchmark.
[jira] [Updated] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Sullivan updated SOLR-7539: --- Fix Version/s: Trunk
[jira] [Updated] (SOLR-5750) Backup/Restore API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-5750: Attachment: SOLR-5750.patch First pass at the feature.

*BACKUP:*
Required params - collection, name, location
Example API: {{/admin/collections?action=backup&name=my_backup&location=/my_location&collection=techproducts}}
It will create a backup directory called my_backup inside the given location, in which it will store the following:
/my_location
/my_backup
/shard1
/shard2
/zk_backup
/zk_backup/configs/configName (the config which was being used for the backed-up collection)
/zk_backup/collection_state.json (the cluster state for that collection is always stored in collection_state.json)
/backup.properties (metadata about the backup)
If you have set up any aliases or roles or any other special property, those will not be backed up. They might not be that useful to restore, as the backup could be restored into some other cluster. We can add that later if it's required.

*BACKUPSTATUS:*
Required params - name
Example API: {{/admin/collections?action=backupstatus&name=my_backup}}

*RESTORE:*
Required params - collection, name, location
Example API: {{/admin/collections?action=restore&name=my_backup&location=/my_location&collection=techproducts_restored}}
You can't restore into an existing collection. Provide a collection name where you want to restore the index into. The restore process will create a collection similar to the backed-up collection and restore the indexes. Restoring into the same collection would be simple to add, but in that case we should only restore the indexes.

*RESTORESTATUS:*
Required params - name
Example API: {{/admin/collections?action=restorestatus&name=my_backup}}

Would appreciate a review on this.
I'll work on adding more tests.

Backup/Restore API for SolrCloud Key: SOLR-5750 URL: https://issues.apache.org/jira/browse/SOLR-5750 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Varun Thacker Fix For: Trunk, 5.2 Attachments: SOLR-5750.patch

We should have an easy way to do backups and restores in SolrCloud. The ReplicationHandler supports a backup command which can create snapshots of the index, but that is too little. The command should be able to back up:
# Snapshots of all indexes, or indexes from the leader, or the shards
# Config set
# Cluster state
# Cluster properties
# Aliases
# Overseer work queue?
A restore should be able to completely restore the cloud, i.e. no manual steps required other than bringing nodes back up or setting up a new cloud cluster. SOLR-5340 will be a part of this issue.
[jira] [Created] (SOLR-7541) CollectionsHandler#createNodeIfNotExists is a duplicate of ZkCmdExecutor#ensureExists
Varun Thacker created SOLR-7541: --- Summary: CollectionsHandler#createNodeIfNotExists is a duplicate of ZkCmdExecutor#ensureExists Key: SOLR-7541 URL: https://issues.apache.org/jira/browse/SOLR-7541 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Priority: Minor Looks like CollectionsHandler#createNodeIfNotExists is a duplicate of ZkCmdExecutor#ensureExists. Both do the same thing, so we could remove CollectionsHandler#createNodeIfNotExists. Also, looking at {{ZkCmdExecutor#ensureExists(final String path, final byte[] data, CreateMode createMode, final SolrZkClient zkClient)}}, the createMode parameter is getting discarded.
[jira] [Created] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
Nicholas Knize created LUCENE-6480: -- Summary: Extend Simple GeoPointField Type to 3d Key: LUCENE-6480 URL: https://issues.apache.org/jira/browse/LUCENE-6480 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541933#comment-14541933 ] Nicholas Knize commented on LUCENE-6480: It sounds much like the simple morton interleaving I'm using for the 2D case? But since you're using an extra bit for the 3rd dimension you lose precision in the horizontal direction. We could start w/ that as a phase one? Instead of worrying about the sign bit, the values in the 2D case are scaled 0:360, 0:180 and divided into 32 bits per lat/lon (see GeoUtils.java). Extending to 3D, divide 0:360, 0:180, 0:?? by 21 and extend BitUtil.interleave to the 3 value case. It's super fast since it's done by bit twiddling using magic numbers (although the magic numbers will need to be reworked). The question is the max value of the altitude? The larger the value the less precise, but you could conceivably go as far as 3,300 (km) to cover the earth's atmosphere? Maybe that's configurable. As a phase 2 there has been some work in this area for 3 and 4d hilbert order (still using 64 bit), which will better preserve locality. (I mentioned it in a comment in the previous issue). Locality is important since it will drive the complexity of the range search and how much the postings list will actually help (e.g. stepping one unit in the 3rd dimension can result in a boundary range that requires post-filtering a significant number of high precision terms). The more I think about it, this might be efficiently done using a statically computed lookup table (we'd have to tinker)? i.e., one hilbert order for the 3d unit cube is 000, 001, 101, 100, 110, 111, 011, 010, and the order of the suboctants at each succeeding level is a permutation of this base unit cube. For example, the next rotated level (for suboctant 000) gives the binary order: 000 000, 000 010, 000 110, 000 100, 000 101, 000 111, 000 011, 000 001. 
There's a paper that describes how to compute the suboctant permutation rather efficiently, and it could be statically computed and represented using 1. base unit ordering, 2. substitution list. So for level 2, each suboctant ordering is: base order (000, 001, 101, 100, 110, 111, 011, 010), substitution list (2 8) (3 5), (2 8 4) (3 7 5), (2 8 4) (3 7 5), (1 3) (2 4) (5 7) (6 8), (1 3) (2 4) (5 7) (6 8), (1 5 7) (2 4 6), (1 7) (4 6). Something to think about as an enhancement. I'll try to find the paper.
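The bit-interleaving scheme discussed above can be sketched without Lucene's magic-number implementation. This is a hypothetical illustration only: the class name, the loop-based interleave (Lucene's BitUtil uses bit-twiddling magic numbers instead), and the 0:3,300 km altitude range are assumptions for the sake of the example, not Lucene code.

```java
// Sketch: quantize coordinates and interleave their bits into one long term.
// 2D: 32 bits per dimension; 3D: 21 bits per dimension fit in 63 bits.
public class InterleaveSketch {
    // 2D morton: bit i of a goes to bit 2i, bit i of b goes to bit 2i+1.
    static long interleave2(long a, long b) {
        long r = 0;
        for (int i = 0; i < 32; i++) {
            r |= ((a >>> i) & 1L) << (2 * i);
            r |= ((b >>> i) & 1L) << (2 * i + 1);
        }
        return r;
    }
    // 3D extension: three 21-bit values, one bit from each in rotation.
    static long interleave3(long a, long b, long c) {
        long r = 0;
        for (int i = 0; i < 21; i++) {
            r |= ((a >>> i) & 1L) << (3 * i);
            r |= ((b >>> i) & 1L) << (3 * i + 1);
            r |= ((c >>> i) & 1L) << (3 * i + 2);
        }
        return r;
    }
    // Scale a value from [min, max) onto an n-bit unsigned range,
    // e.g. lon shifted from -180:180 onto 0:360 before quantizing.
    static long quantize(double v, double min, double max, int bits) {
        return (long) (((v - min) / (max - min)) * (1L << bits));
    }
    public static void main(String[] args) {
        long lon = quantize(-122.42, -180.0, 180.0, 21);
        long lat = quantize(37.77, -90.0, 90.0, 21);
        long alt = quantize(1500.0, 0.0, 3_300_000.0, 21); // assumed altitude range
        System.out.println(Long.toBinaryString(interleave3(lon, lat, alt)));
    }
}
```

Note how going from 2D to 3D drops per-dimension precision from 32 bits to 21, which is the horizontal-precision loss mentioned in the comment.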
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3111 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3111/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrClient.testReliability Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at __randomizedtesting.SeedInfo.seed([7DD14AAD805F40C:C615C9EC796325A5]:0) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:576) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.client.solrj.TestLBHttpSolrClient.testReliability(TestLBHttpSolrClient.java:219) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) Caused by:
[jira] [Comment Edited] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541933#comment-14541933 ] Nicholas Knize edited comment on LUCENE-6480 at 5/13/15 1:43 PM: - It sounds much like the simple morton interleaving I'm using for the 2D case? But since you're using an extra bit for the 3rd dimension you lose precision in the horizontal direction. We could start w/ that as a phase one? Instead of worrying about the sign bit, the values in the 2D case are scaled 0:360, 0:180 and divided into 32 bits per lat/lon (see GeoUtils.java). Extending to 3D, divide 0:360, 0:180, 0:?? by 21 and extend BitUtil.interleave to the 3 value case. It's super fast since it's done by bit twiddling using magic numbers (although the magic numbers will need to be reworked). The question is the max value of the altitude? The larger the value the less precise, but you could conceivably go as far as 3,300 (km) to cover the earth's atmosphere? Maybe that's configurable. As a phase 2 there has been some work in this area for 3 and 4d hilbert order (still using 64 bit), which will better preserve locality. (I mentioned it in a comment in the previous issue). Locality is important since it will drive the complexity of the range search and how much the postings list will actually help (e.g. stepping one unit in the 3rd dimension can result in a boundary range that requires post-filtering a significant number of high precision terms). The more I think about it, this might be efficiently done using a statically computed lookup table (we'd have to tinker)? i.e., one hilbert order for the 3d unit cube is 000, 001, 101, 100, 110, 111, 011, 010, and the order of the suboctants at each succeeding level is a permutation of this base unit cube. For example, the next rotated level (for suboctant 000) gives the binary order: 000 000, 000 010, 000 110, 000 100, 000 101, 000 111, 000 011, 000 001. There's a paper that describes how to compute the suboctant permutation rather efficiently, and it could be statically computed and represented using 1. base unit ordering, 2. substitution list. So for level 2, each suboctant ordering is: base order (000, 001, 101, 100, 110, 111, 011, 010), substitution list (2 8) (3 5), (2 8 4) (3 7 5), (2 8 4) (3 7 5), (1 3) (2 4) (5 7) (6 8), (1 3) (2 4) (5 7) (6 8), (1 5 7) (2 4 6), (1 7) (4 6). Something to think about as an enhancement. I'll try to find the paper as I've got this worked out in my notebook from some previous work (lol).
[jira] [Updated] (SOLR-7540) SSLMigrationTest urlScheme isn't tested properly
[ https://issues.apache.org/jira/browse/SOLR-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-7540: - Description: I noticed that {{SSLMigrationTest.assertReplicaInformation(urlScheme)}} only checks that replicas' base url *starts with* the given url scheme - since the urlScheme can only be http or https, this check will always succeed when the given urlScheme is http. (was: I noticed that {{SSLMigrationTest.assertReplicaInformation(urlScheme)}} only checks that a replicas' base url *starts with* the given url scheme - since the urlScheme can only be http or https, this check will always succeed when the given urlScheme is http.) SSLMigrationTest urlScheme isn't tested properly Key: SOLR-7540 URL: https://issues.apache.org/jira/browse/SOLR-7540 Project: Solr Issue Type: Bug Reporter: Steve Rowe Priority: Minor
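The flaw described above is easy to demonstrate: every "https" URL also starts with the string "http", so a prefix check cannot tell the two schemes apart. A minimal illustration (the helper names here are mine, not the test's actual code; anchoring on the "://" delimiter is one possible fix):

```java
// Demonstrates why startsWith(urlScheme) always passes for urlScheme "http".
public class SchemeCheck {
    // The flawed form of the assertion: prefix match on the bare scheme.
    static boolean badCheck(String baseUrl, String urlScheme) {
        return baseUrl.startsWith(urlScheme);
    }
    // A stricter form: include the "://" delimiter so "http" cannot match "https".
    static boolean goodCheck(String baseUrl, String urlScheme) {
        return baseUrl.startsWith(urlScheme + "://");
    }
    public static void main(String[] args) {
        // An https replica passes the bad check for scheme "http": a false positive.
        System.out.println(badCheck("https://node1:8983/solr", "http"));  // true
        System.out.println(goodCheck("https://node1:8983/solr", "http")); // false
    }
}
```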
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541852#comment-14541852 ] Karl Wright commented on LUCENE-6480: - I'll attach code snippets for packing and unpacking ASAP, but it may not be until this weekend.
[jira] [Commented] (SOLR-6213) StackOverflowException in Solr cloud's leader election
[ https://issues.apache.org/jira/browse/SOLR-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541958#comment-14541958 ] Mark Miller commented on SOLR-6213: --- Of course it should be improved. StackOverflowException in Solr cloud's leader election -- Key: SOLR-6213 URL: https://issues.apache.org/jira/browse/SOLR-6213 Project: Solr Issue Type: Bug Affects Versions: 4.10, Trunk Reporter: Dawid Weiss Priority: Critical Attachments: stackoverflow.txt This is what's causing test hangs (at least on FreeBSD, LUCENE-5786), possibly on other machines too. The problem is stack overflow from looped calls in: {code} org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:313) org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:448) org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:212) org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
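The repeating frames above show joinElection re-entering itself on every rejoin, so each retry adds six frames until the stack overflows. A minimal, hypothetical sketch (none of these names are Solr's actual classes) of the general remedy, replacing recursive retry with a bounded loop so stack depth stays constant:

```java
// Sketch: a recursive retry pattern rewritten as an iterative loop.
// Each failed attempt falls through to the next loop iteration instead of
// calling back into itself, so the stack cannot grow with the retry count.
public class RetryLoop {
    interface Attempt { boolean run() throws Exception; }

    // Returns true once the attempt succeeds, false when retries are exhausted.
    static boolean retry(Attempt attempt, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            try {
                if (attempt.run()) return true; // e.g. joined the election queue
            } catch (Exception e) {
                // swallow and retry, e.g. after a transient ZooKeeper error
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        boolean ok = retry(() -> ++calls[0] >= 3, 10); // succeeds on the 3rd try
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

With a retry cap, repeated election failures surface as an error instead of the unbounded recursion seen in the trace.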
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541960#comment-14541960 ] Nicholas Knize edited comment on LUCENE-6450 at 5/13/15 2:12 PM: - yes yes! That's the idea anyway. I've tinkered with this a bit already. It took the same amount of time to build the Automaton as it did the ranges (no surprises since it used the same logic to union binaryIntervals) but queries were on the order of 10x slower (0.8sec/query on 60M points). Thinking maybe there's some optimization to the automaton that needs to be done? I figured first make progress here and post a separate issue for the automaton WIP. was (Author: nknize): yes yes! That's the idea anyway. I've tinkered with this a bit already. It took the same amount of time to build the Automaton as it did the ranges (no surprises since it used the same logic) but queries were on the order of 10x slower (0.8sec/query on 60M points). Thinking maybe there's some optimization to the automaton that needs to be done? I figured first make progress here and post a separate issue for the automaton WIP. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). 
This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
[jira] [Comment Edited] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542001#comment-14542001 ] Karl Wright edited comment on LUCENE-6480 at 5/13/15 2:28 PM: -- bq. The question is the max value of the altitude? To clarify, geo3d is not using (lat,lon,altitude) tuples. When I said (x,y,z) I meant unit sphere (x,y,z), where z = sin(lat), x = cos(lat)*cos(lon), y = cos(lat)*sin(lon). The reason you'd want to pack (x,y,z) instead of just (lat,lon) is that computing cosines and sines is quite expensive, so you don't want to be constructing a geo3d.GeoPoint using lat/lon at document scoring time. Instead you'd want to unpack the (x,y,z) values directly from the Geo3DPointField. The range of *all three* parameters in this case is -1 to 1, which is how I came up with the packing resolution I did. bq. Locality is important since it will drive the complexity of the range search and how much the postings list will actually help. The reason you need (x,y,z) instead of (lat,lon) at scoring time is because geo3d determines whether a point is within the shape using math that requires points to be in that form. If you do that, then the evaluation of membership is blindingly fast. The splitting proposal does have locality.
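The unit-sphere mapping in Karl's comment can be written down directly: z = sin(lat), x = cos(lat)*cos(lon), y = cos(lat)*sin(lon), with all three coordinates falling in [-1, 1], matching the packing range he describes. A small sketch (class and method names are mine, not geo3d's):

```java
// Sketch of the unit-sphere projection described in the comment.
// Inputs are degrees; outputs are (x, y, z) on the unit sphere.
public class UnitSphere {
    static double[] toXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg), lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),  // x
            Math.cos(lat) * Math.sin(lon),  // y
            Math.sin(lat)                   // z
        };
    }
    public static void main(String[] args) {
        double[] p = toXYZ(37.77, -122.42);
        // A point on the unit sphere satisfies x^2 + y^2 + z^2 = 1.
        System.out.println(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]);
    }
}
```

Precomputing these trig values at index time is exactly the point of the comment: unpacking (x, y, z) from the field avoids the sin/cos calls at scoring time.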
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542050#comment-14542050 ] Nicholas Knize commented on LUCENE-6480: bq. ...when I said (x,y,z) I meant unit sphere (x,y,z). Ah, yes, reprojecting would be the right way. So why not just use ECEF then instead of the unit sphere? It's a better approximation of the earth. Or have you tried this and the few extra trig computations impaired performance? Could try SloppyMath in that case and evaluate the performance/precision trade-off? bq. ...you are basically using recursive descent, intersecting with the ordering in the posting list.. No. Using the terms dictionary and only checking high precision terms for boundary ranges and using the postings list for lower resolution terms completely contained. bq. Is membership of a point within the shape sufficient? Core geo search is meant for simple use cases, points only, contains only. In that case, if a point is contained by a query bbox or polygon it is added to the result set. Anything more advanced than this (e.g., DE9IM) is intended for the shape module.
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542081#comment-14542081 ] Michael McCandless commented on LUCENE-6480: I'm not following closely here (yet!) but just wanted to say: we shouldn't feel like we must use at most 8 bytes to encode lat+lon+altitude, since we are indexing into arbitrary byte[] terms in the postings ... I mean, the fewer bytes the better, but there's not a hard limit of 8.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541948#comment-14541948 ] David Smiley commented on LUCENE-6450: -- bq. As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term. The index will be slightly bigger but it should provide the foundation for faster search on large polygons and bounding boxes. If I'm not mistaken, the term auto-prefixing that Mike worked on means we need not do that here, especially just for point data; no?
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542001#comment-14542001 ] Karl Wright commented on LUCENE-6480: - bq. The question is the max value of the altitude? To clarify, geo3d is not using (lat,lon,altitude) tuples. When I said (x,y,z) I meant unit sphere (x,y,z), where z = sin(lat), x = cos(lat)*cos(lon), y = cos(lat)*sin(lon). The reason you'd want to pack (x,y,z) instead of just (lat,lon) is that computing cosines and sines is quite expensive, so you don't want to be constructing a geo3d.GeoPoint using lat/lon at document scoring time. Instead you'd want to unpack the (x,y,z) values directly from the Geo3DPointField. The range of *all three* parameters in this case is -1 to 1, which is how I came up with the packing resolution I did. bq. Locality is important since it will drive the complexity of the range search and how much the postings list will actually help The reason you need (x,y,z) instead of (lat,lon) at scoring time is because geo3d determines whether a point is within the shape using math that requires points to be in that form. If you do that, then the evaluation of membership is blindingly fast. The splitting proposal does have locality. Extend Simple GeoPointField Type to 3d --- Key: LUCENE-6480 URL: https://issues.apache.org/jira/browse/LUCENE-6480 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] proposes a simple GeoPointField type to lucene core. This field uses 64bit encoding of 2 dimensional points to construct sorted term representations of GeoPoints (aka: GeoHashing). This feature investigates adding support for encoding 3 dimensional GeoPoints, either by extending GeoPointField to a Geo3DPointField or adding an additional 3d constructor. 
[jira] [Created] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
Steve Rowe created SOLR-7542:
-----------------------------

Summary: Schema API: Can't remove single dynamic copy field directive
Key: SOLR-7542
URL: https://issues.apache.org/jira/browse/SOLR-7542
Project: Solr
Issue Type: Bug
Affects Versions: 5.1
Reporter: Steve Rowe
Fix For: 5.2

In a managed schema containing just a single dynamic copy field directive - i.e. one with a glob source or destination - deleting the copy field directive fails. For example, the default configset (data_driven_schema_configs) has such a schema: the {{*}}->{{\_text\_}} copy field directive is the only one. To reproduce:

{noformat}
bin/solr start -c
bin/solr create my_solr_coll
curl "http://localhost:8983/solr/my_solr_coll/schema" -d '{delete-copy-field:{source:*, dest:_text_}}'
{noformat}

The deletion fails, and an NPE is logged:

{noformat}
ERROR - 2015-05-13 12:37:36.780; [my_solr_coll shard1 core_node1 my_solr_coll_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException
	at org.apache.solr.schema.IndexSchema.getCopyFieldProperties(IndexSchema.java:1450)
	at org.apache.solr.schema.IndexSchema.getNamedPropertyValues(IndexSchema.java:1406)
	at org.apache.solr.schema.IndexSchema.persist(IndexSchema.java:390)
	at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:120)
	at org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
	at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
	[...]
{noformat}
[jira] [Assigned] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe reassigned SOLR-7542:
--------------------------------

Assignee: Steve Rowe
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541960#comment-14541960 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

yes yes! That's the idea anyway. I've tinkered with this a bit already. It took the same amount of time to build the Automaton as it did the ranges (no surprise, since it uses the same logic), but queries were on the order of 10x slower (0.8 sec/query on 60M points). Thinking maybe there's some optimization to the automaton that needs to be done? I figured first make progress here and post a separate issue for the automaton WIP.

Add simple encoded GeoPointField type to core
---------------------------------------------

Key: LUCENE-6450
URL: https://issues.apache.org/jira/browse/LUCENE-6450
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch

At the moment all spatial capabilities, including basic point-based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core, along with GeoBoundingBoxQuery and GeoPolygonQuery classes in the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit-twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms, deferring the more expensive mathematics to the smaller candidate sets.
[jira] [Comment Edited] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542001#comment-14542001 ]

Karl Wright edited comment on LUCENE-6480 at 5/13/15 2:29 PM. The edit only adjusted the formatting of the cos(lat)*cos(lon) and cos(lat)*sin(lon) formulas; the comment text is otherwise unchanged from the one above.
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542019#comment-14542019 ]

Karl Wright commented on LUCENE-6480:
-------------------------------------

It also occurs to me that I don't fully understand how you intend to perform a fast search involving geo3d for records that are within a specified geo3d shape. Perhaps you could clarify in general terms how you foresee doing that? I get that you are basically using recursive descent, intersecting with the ordering in the postings list, but then I get fuzzy. What kind of boolean decisions need to be made? Is membership of a point within the shape sufficient? Point me at the technique as written up elsewhere if you like...
Re: Recent Java 9 commit (e5b66323ae45) breaks fsync on directory
Hi Uwe,

On May 13, 2015, at 2:27 AM, Uwe Schindler <uschind...@apache.org> wrote:

> many thanks for opening this issue!

You're welcome!

> I agree with Alan that adding an OpenOption would be a good possibility. In any case, as Files only contains static methods, we could still add a "utility" method that forces file/directory buffers to disk, which just uses the new open option under the hood. That way the FileSystem SPI interfaces do not need to be modified and just need to take care of the new OpenOption (if supported).

I started to investigate both avenues. Alan says he has some notes on previous work on the OpenOption avenue, and I would like to see them before proceeding much further.

> There is one additional issue we found recently on Mac OS X, but it is only slightly related to the one here. It looks like on Mac OS X, FileChannel#force is mostly a no-op regarding syncing data to disk, because the underlying operating system requires a "special" fcntl to force buffers to the disk device. From https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html: "For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail." This different behavior breaks the guarantees of FileChannel#force on Mac OS X (as described in the Javadocs), so the Mac OS X FileSystemProvider implementation should use this special fcntl to force file buffers to disk.

Thanks for mentioning this. I read all about the F_FULLFSYNC situation yesterday in the OS X man pages.

> Should I open a bug report on bugs.sun.com?

I don't think there is any need. Perhaps we can simply handle the OS X variant under this issue unless someone objects.

Thanks,
Brian
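For context, the directory-fsync pattern this thread is about can be sketched as below. This is a minimal illustration, not Lucene's actual helper (Lucene wraps this pattern in org.apache.lucene.util.IOUtils): open a FileChannel on the directory itself and force it, so that directory metadata (e.g. a just-committed file's rename) reaches disk. The Java 9 change under discussion makes the open call reject directories, which is why a dedicated OpenOption or Files utility method is being proposed:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectoryFsync {
    // Open the directory itself for read and force its metadata to disk.
    // Works on POSIX systems with Java 8; on JDKs with the change discussed
    // here, FileChannel.open rejects directories and this throws IOException.
    static void fsyncDirectory(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }

    public static void main(String[] args) {
        try {
            fsyncDirectory(Paths.get("."));
            System.out.println("directory metadata forced to disk");
        } catch (IOException e) {
            // The failure mode reported in this thread.
            System.out.println("could not fsync directory: " + e);
        }
    }
}
```

Note that even where this succeeds, the Mac OS X caveat above still applies: force(true) maps to fsync(2), which on that platform does not guarantee the drive's own cache is flushed without F_FULLFSYNC.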
[jira] [Commented] (SOLR-7531) Config API is merging certain key names together
[ https://issues.apache.org/jira/browse/SOLR-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542083#comment-14542083 ]

ASF subversion and git services commented on SOLR-7531:
-------------------------------------------------------

Commit 1679223 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1679223 ]
SOLR-7531: added a test

Config API is merging certain key names together
------------------------------------------------

Key: SOLR-7531
URL: https://issues.apache.org/jira/browse/SOLR-7531
Project: Solr
Issue Type: Bug
Affects Versions: 5.0, 5.1
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
Fix For: Trunk, 5.2

Starting from a new Solr 5.0 install:

{code}
./bin/solr start -e schemaless
curl 'http://localhost:8983/solr/gettingstarted/config' > config.json
{code}

Open config.json and note that there is a key called autoCommmitMaxDocs under the updateHandler section.

{code}
curl 'http://localhost:8983/solr/gettingstarted/config' -H 'Content-type:application/json' -d '{set-property : {updateHandler.autoCommit.maxDocs : 5000}}'
curl 'http://localhost:8983/solr/gettingstarted/config' > config.json
{code}

Open config.json and note that both updateHandler > autoCommit > maxDocs and updateHandler > autoCommitMaxDocs are now set to 5000.
[jira] [Commented] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542039#comment-14542039 ]

Steve Rowe commented on SOLR-7542:
----------------------------------

The issue is that the schema can't be persisted once there are no more dynamic copy fields (glob copy field directives). One workaround is to first add another copy field directive (e.g. {{\*unlikely_field_suffix}}->{{\_text\_}}). The copy field directive that you want to remove ({{*}}->{{\_text\_}} in our example) can then be successfully deleted. The fix is a null check on the internal array containing the dynamic copy fields. AFAICT, this is also a problem in schemas that start out with zero dynamic copy fields - in that case I think it won't be possible to make any schema modifications at all.
[jira] [Commented] (SOLR-7531) Config API is merging certain key names together
[ https://issues.apache.org/jira/browse/SOLR-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542041#comment-14542041 ]

ASF subversion and git services commented on SOLR-7531:
-------------------------------------------------------

Commit 1679221 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1679221 ]
SOLR-7531: config API shows a few keys merged together
[jira] [Updated] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-7542:
-----------------------------

Attachment: SOLR-7542.patch

Patch with a test that fails before applying the fix and succeeds afterward. I've added null checks on all accesses to the dynamic copy fields array in IndexSchema. Committing shortly.
[jira] [Created] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
Kevin Watters created SOLR-7543:
-------------------------------

Summary: Create GraphQuery that allows graph traversal as a query operator.
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor

I have a GraphQuery that I implemented a long time back that allows a user to specify a seedQuery to identify which documents to start graph traversal from. It then gathers up the edge ids for those documents and optionally applies an additional filter. The query is then re-executed continually until no new edge ids are identified. I am currently hosting this code at https://github.com/kwatters/solrgraph and would like to work with the community to get some feedback and ultimately get it committed back in as a Lucene query.
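The traversal strategy the issue describes - seed, gather edge ids, re-execute until no new ids appear - is essentially an iterative frontier expansion. A minimal index-agnostic sketch (the Map here is a hypothetical stand-in for the edge field lookups the real GraphQuery performs against the index):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FrontierTraversal {
    // edges: docId -> edge ids that document points at (hypothetical in-memory "index").
    // seeds: result of the seedQuery. Returns all ids reachable from the seeds.
    static List<Integer> traverse(Map<Integer, List<Integer>> edges, Set<Integer> seeds) {
        Set<Integer> visited = new HashSet<>(seeds);
        Deque<Integer> frontier = new ArrayDeque<>(seeds);
        // Keep expanding until no new edge ids are identified, per the issue.
        while (!frontier.isEmpty()) {
            int doc = frontier.poll();
            for (int edge : edges.getOrDefault(doc, Collections.emptyList())) {
                if (visited.add(edge)) {   // only new ids re-enter the frontier
                    frontier.add(edge);
                }
            }
        }
        List<Integer> result = new ArrayList<>(visited);
        Collections.sort(result);
        return result;
    }
}
```

In the Solr version, each frontier expansion is itself a query against the edge field (optionally AND-ed with the extra filter), so the loop terminates for the same reason: the visited set grows monotonically and is bounded by the index size.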
[jira] [Comment Edited] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542399#comment-14542399 ] Anshum Gupta edited comment on SOLR-7275 at 5/13/15 6:40 PM: - Patch with test. I think this is good to go now. Any feedback would be appreciated. was (Author: anshumg): Patch with test. I think this is good to go now. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return an {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities e.g. how to extract the name of the collection or other information from the incoming request as there are multiple ways to specify the target collection for a request. Similarly request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request - SolrDispatchFilter - isAuthorized(context) - Process/Return. 
{code}
public interface SolrAuthorizationPlugin {
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}
{code}

{code}
public class SolrRequestContext {
  UserInfo; // Will contain user context from the authentication layer.
  HTTPRequest request;
  Enum OperationType; // Correlated with user roles.
  String[] CollectionsAccessed;
  String[] FieldsAccessed;
  String Resource;
}
{code}

{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized();
}
{code}

User Roles:
* Admin
* Collection Level:
** Query
** Update
** Admin

Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security-system-specific code out of Solr.
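The snippets in the issue description are intentionally pseudocode. A compilable rendering of the same design might look as follows - the member types (String for UserInfo, the enum constants) are assumptions made only so the sketch compiles, not the API the patch actually defines:

```java
import java.util.List;

// Sketch of the proposed plugin contract; names follow the issue description,
// member types are placeholders chosen for compilability.
interface SolrAuthorizationPlugin {
    SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}

enum OperationType { ADMIN, QUERY, UPDATE }

class SolrRequestContext {
    String userInfo;                 // user context from the authentication layer
    OperationType operationType;     // correlated with user roles
    List<String> collectionsAccessed;
    List<String> fieldsAccessed;
    String resource;
}

class SolrAuthorizationResponse {
    boolean authorized;
    public boolean isAuthorized() { return authorized; }
}

// A trivial allow-all plugin illustrates the Request -> isAuthorized(context)
// -> Process/Return flow; a real plugin would consult Ranger, Sentry, etc.
class AllowAllPlugin implements SolrAuthorizationPlugin {
    public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
        SolrAuthorizationResponse resp = new SolrAuthorizationResponse();
        resp.authorized = true;
        return resp;
    }
}
```

The point of the context object, as the description notes, is that a plugin never parses the raw request itself - Solr populates the context once, however the collection or handler was specified.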
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7275:
-------------------------------

Attachment: SOLR-7275.patch

Patch with test. I think this is good to go now.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542429#comment-14542429 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------

Looks OK regarding my comments about subclassing. One thing: could you make the fields final in the query? Query should be immutable, so the min/max lat/lon doubles and the polygon array should be unmodifiable. I will have a closer look later; I just skimmed through the patch.
[jira] [Commented] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542474#comment-14542474 ]

David Smiley commented on LUCENE-6371:
--------------------------------------

I really like this design, because it enables one to build a highlighter that is accurate (so-called "query debugging"). That's a huge bonus I wasn't expecting from this patch (based on the issue title/description). But I think something is missing - SpanCollector.collectLeaf doesn't provide access to the SpanQuery, or perhaps the Term, that is being collected.

Might SpanCollector.DEFAULT be renamed to NO_OP? Same for BufferedSpanCollector.NO_OP. I think NO_OP is clearer about what this implementation does.

PayloadSpanCollector should use BytesRefArray instead of an ArrayList<byte[]>, and it can return this from getPayloads().

What is the purpose of the start/end position params to collectLeaf()? No implementation uses them (on the consumer or implementer side) and I'm not sure how they might be used.

Improve Spans payload collection
--------------------------------

Key: LUCENE-6371
URL: https://issues.apache.org/jira/browse/LUCENE-6371
Project: Lucene - Core
Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
Attachments: LUCENE-6371.patch

Spin-off from LUCENE-6308; see the comments there from around 23 March 2015.
[jira] [Resolved] (SOLR-7531) Config API is merging certain key names together
[ https://issues.apache.org/jira/browse/SOLR-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-7531. -- Resolution: Pending Closed Config API is merging certain key names together Key: SOLR-7531 URL: https://issues.apache.org/jira/browse/SOLR-7531 Project: Solr Issue Type: Bug Affects Versions: 5.0, 5.1 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: Trunk, 5.2 Starting from a new Solr 5.0 install {code} ./bin/solr start -e schemaless curl 'http://localhost:8983/solr/gettingstarted/config' > config.json {code} Open config.json and note that there is a key called autoCommmitMaxDocs under the updateHandler section. {code} curl 'http://localhost:8983/solr/gettingstarted/config' -H 'Content-type:application/json' -d '{set-property : {updateHandler.autoCommit.maxDocs : 5000}}' curl 'http://localhost:8983/solr/gettingstarted/config' > config.json {code} Open config.json and note that both the value of updateHandler > autoCommit > maxDocs and updateHandler > autoCommitMaxDocs is now set to 5000
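The intended behavior of a dotted property name like updateHandler.autoCommit.maxDocs — addressing a nested path rather than also creating a literal flattened key — can be sketched with plain maps. The helper below is hypothetical and is not Solr's Config API code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of dotted-path property setting as the Config API intends it:
// "updateHandler.autoCommit.maxDocs" should address a nested path only, not
// additionally produce a merged flat key. Hypothetical helper, not Solr code.
public class DottedPathSketch {
    @SuppressWarnings("unchecked")
    static void setNested(Map<String, Object> root, String dottedKey, Object value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> node = root;
        // Walk (creating as needed) every segment except the last.
        for (int i = 0; i < parts.length - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(
                    parts[i], k -> new HashMap<String, Object>());
        }
        node.put(parts[parts.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        setNested(config, "updateHandler.autoCommit.maxDocs", 5000);
        // One nested path, no flattened sibling key:
        System.out.println(config); // {updateHandler={autoCommit={maxDocs=5000}}}
    }
}
```

The bug report describes the opposite: the value appears both at the nested path and under a merged key name.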
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2254 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2254/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.MultiThreadedOCPTest.test Error Message: Captured an uncaught exception in thread: Thread[id=1487, name=parallelCoreAdminExecutor-629-thread-14, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1487, name=parallelCoreAdminExecutor-629-thread-14, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest] at __randomizedtesting.SeedInfo.seed([41BFB6F148D0F9A9:C9EB892BE62C9451]:0) Caused by: java.lang.AssertionError: Too many closes on SolrCore at __randomizedtesting.SeedInfo.seed([41BFB6F148D0F9A9]:0) at org.apache.solr.core.SolrCore.close(SolrCore.java:1138) at org.apache.solr.common.util.IOUtils.closeQuietly(IOUtils.java:31) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:535) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:494) at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:628) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:213) at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1249) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Build Log: [...truncated 9808 lines...] 
[junit4] Suite: org.apache.solr.cloud.MultiThreadedOCPTest [junit4] 2 Creating dataDir: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build/solr-core/test/J1/temp/solr.cloud.MultiThreadedOCPTest 41BFB6F148D0F9A9-001/init-core-data-001 [junit4] 2 295271 T1169 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (false) [junit4] 2 295271 T1169 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: /b/ [junit4] 2 295274 T1169 oasc.ZkTestServer.run STARTING ZK TEST SERVER [junit4] 2 295275 T1170 oasc.ZkTestServer$2$1.setClientPort client port:0.0.0.0/0.0.0.0:0 [junit4] 2 295275 T1170 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server [junit4] 2 295377 T1169 oasc.ZkTestServer.run start zk server on port:54899 [junit4] 2 295377 T1169 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider [junit4] 2 295379 T1169 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper [junit4] 2 295387 T1177 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@3867cbb3 name:ZooKeeperConnection Watcher:127.0.0.1:54899 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None [junit4] 2 295388 T1169 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper [junit4] 2 295388 T1169 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider [junit4] 2 295388 T1169 oascc.SolrZkClient.makePath makePath: /solr [junit4] 2 295398 T1169 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider [junit4] 2 295400 T1169 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper [junit4] 2 295404 T1180 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@1cb44d54 name:ZooKeeperConnection Watcher:127.0.0.1:54899/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None 
[junit4] 2 295405 T1169 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper [junit4] 2 295405 T1169 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider [junit4] 2 295405 T1169 oascc.SolrZkClient.makePath makePath: /collections/collection1 [junit4] 2 295412 T1169 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards [junit4] 2 295420 T1169 oascc.SolrZkClient.makePath makePath: /collections/control_collection [junit4] 2 295425 T1169 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards [junit4] 2 295429 T1169 oasc.AbstractZkTestCase.putConfig put /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml [junit4] 2 295429 T1169 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml [junit4] 2 295435 T1169 oasc.AbstractZkTestCase.putConfig put
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481_WIP.patch First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements. *GeoPointField* Index Time: 640.24 sec Index Size: 4.4G Mean Query Time: 0.02 sec Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue.
[jira] [Comment Edited] (SOLR-7143) MoreLikeThis Query Parser does not handle multiple field names
[ https://issues.apache.org/jira/browse/SOLR-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542663#comment-14542663 ] Anshum Gupta edited comment on SOLR-7143 at 5/13/15 8:40 PM: - Hi Jens, Sorry but I haven't been able to get to this all this while. Here's what we need to get working: # Way to specify multiple values for a field within the local params. e.g.: {code:title=SOLR-2798 would solve this} http://localhost:8983/solr/techproducts/select?q={!mlt qf=foo qf=bar}docid {code} # We also need to support parameter dereferencing as you suggested, considering we don't want to get involved with commas: {code} http://localhost:8983/solr/techproducts/select?q={!mlt qf=$mlt.fl}docid&mlt.fl=foo&mlt.fl=bar {code} Supporting commas would interfere with the syntax used for things like bf, e.g. {{bf=recip(rord(creationDate),1,1000,1000)}} If you have time and the motivation, it'd be great if you contribute a patch for this. We may already have parts of it from the existing patch. was (Author: anshumg): Hi Jens, Sorry but I haven't been able to get to this all this while. Here's what we need to get working: # Way to specify multiple values for a field within the local params. e.g.: {code:title=SOLR-2798 would solve this} http://localhost:8983/solr/techproducts/select?q={!mlt qf=foo qf=bar}docid {code} # We also need to support parameter dereferencing as you suggested, considering we don't want to get involved with commas: {code} http://localhost:8983/solr/techproducts/select?q={!mlt qf=$mlt.fl}docid&mlt.fl=foo&mlt.fl=bar {code} Supporting commas would interfere with the syntax used for things like bf, e.g.
{{bf=recip(rord(creationDate),1,1000,1000)}} MoreLikeThis Query Parser does not handle multiple field names -- Key: SOLR-7143 URL: https://issues.apache.org/jira/browse/SOLR-7143 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 5.0 Reporter: Jens Wille Assignee: Anshum Gupta Attachments: SOLR-7143.patch, SOLR-7143.patch The newly introduced MoreLikeThis Query Parser (SOLR-6248) does not return any results when supplied with multiple fields in the {{qf}} parameter. To reproduce within the techproducts example, compare: {code} curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name%7DMA147LL/A' curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=features%7DMA147LL/A' curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name,features%7DMA147LL/A' {code} The first two queries return 8 and 5 results, respectively. The third query doesn't return any results (not even the matched document). In contrast, the MoreLikeThis Handler works as expected (accounting for the default {{mintf}} and {{mindf}} values in SimpleMLTQParser): {code} curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name&mlt.mintf=1&mlt.mindf=1' curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=features&mlt.mintf=1&mlt.mindf=1' curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name,features&mlt.mintf=1&mlt.mindf=1' {code} After adding the following line to {{example/techproducts/solr/techproducts/conf/solrconfig.xml}}: {code:language=XML} <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> {code} The first two queries return 7 and 4 results, respectively (excluding the matched document). The third query returns 7 results, as one would expect.
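The parameter-dereferencing idea discussed in this thread — a local-param value of the form $mlt.fl resolving to one or more request parameters — can be sketched with plain maps. The resolve helper below is hypothetical and is not the Solr implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch of local-param dereferencing: a value beginning with "$" is looked
// up in the request parameters, which may carry several values (e.g. the
// request string mlt.fl=foo&mlt.fl=bar). Hypothetical helper, not Solr code.
public class ParamDerefSketch {
    static List<String> resolve(String localParamValue,
                                Map<String, List<String>> requestParams) {
        if (localParamValue.startsWith("$")) {
            // Dereference: all values of the named request parameter.
            List<String> vals = requestParams.get(localParamValue.substring(1));
            return vals != null ? vals : List.of();
        }
        // Literal value: a single field name.
        return List.of(localParamValue);
    }

    public static void main(String[] args) {
        Map<String, List<String>> req =
                Map.of("mlt.fl", Arrays.asList("name", "features"));
        System.out.println(resolve("$mlt.fl", req)); // [name, features]
        System.out.println(resolve("name", req));    // [name]
    }
}
```

This sidesteps commas entirely, so function queries like bf=recip(rord(creationDate),1,1000,1000) stay unambiguous.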
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_45) - Build # 4807 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4807/ Java: 64bit/jdk1.8.0_45 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.core.TestArbitraryIndexDir.testLoadNewIndexDir Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([E296220DB3BA94EB:BCC99352D230443]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:794) at org.apache.solr.core.TestArbitraryIndexDir.testLoadNewIndexDir(TestArbitraryIndexDir.java:128) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=*[count(//doc)=1] xml response was: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><result name="response" numFound="0" start="0"></result> </response> request was:q=id:2&qt=standard&start=0&rows=20&version=2.2
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7275.patch Accidentally added the SimpleSolrAuthorizationPlugin to the last patch. Removing it. Also added 2 more static file extensions to ignore for authz purposes. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer.
HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr.
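The interface pseudocode in the issue description can be filled in as a compilable toy version. The interface and type names mirror the description (SolrAuthorizationPlugin, SolrRequestContext, SolrAuthorizationResponse); the role-based logic in AdminOnlyPlugin is invented purely for illustration and is not an actual Solr plugin.

```java
import java.util.Set;

// Compilable toy version of the interfaces sketched in SOLR-7275's
// description. The admin-set logic is invented for illustration only.
public class AuthzSketch {
    enum OperationType { QUERY, UPDATE, ADMIN }

    static class SolrRequestContext {
        final String user;
        final OperationType op;
        SolrRequestContext(String user, OperationType op) { this.user = user; this.op = op; }
    }

    static class SolrAuthorizationResponse {
        private final boolean authorized;
        SolrAuthorizationResponse(boolean authorized) { this.authorized = authorized; }
        boolean isAuthorized() { return authorized; }
    }

    interface SolrAuthorizationPlugin {
        SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
    }

    // Example plugin: only users in the admin set may perform ADMIN operations.
    static class AdminOnlyPlugin implements SolrAuthorizationPlugin {
        private final Set<String> admins;
        AdminOnlyPlugin(Set<String> admins) { this.admins = admins; }
        public SolrAuthorizationResponse isAuthorized(SolrRequestContext ctx) {
            boolean ok = ctx.op != OperationType.ADMIN || admins.contains(ctx.user);
            return new SolrAuthorizationResponse(ok);
        }
    }

    public static void main(String[] args) {
        SolrAuthorizationPlugin plugin = new AdminOnlyPlugin(Set.of("alice"));
        System.out.println(plugin.isAuthorized(
                new SolrRequestContext("alice", OperationType.ADMIN)).isAuthorized()); // true
        System.out.println(plugin.isAuthorized(
                new SolrRequestContext("bob", OperationType.ADMIN)).isAuthorized());   // false
    }
}
```

The point of the context object stands out here: the plugin sees only (user, operation) and never touches request parsing.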
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-6371: -- Attachment: LUCENE-6371.patch Updated patch: * collectLeaf() now takes PostingsEnum and Term * the default impls are renamed to NO_OP Changing from Collection<byte[]> to BytesRefArray is a great idea, but I'd like to do that in a separate issue as that affects the external SpanQuery API a fair amount. This patch currently only changes internals. Improve Spans payload collection Key: LUCENE-6371 URL: https://issues.apache.org/jira/browse/LUCENE-6371 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Attachments: LUCENE-6371.patch, LUCENE-6371.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542521#comment-14542521 ] Noble Paul commented on SOLR-7275: -- We need to tackle the modification of security.json pretty soon, but that can be dealt with separately. The security.json needs to be watched and the plugin needs to be notified of the change. That should not prevent us from committing this. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
{code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr.
[jira] [Commented] (SOLR-7143) MoreLikeThis Query Parser does not handle multiple field names
[ https://issues.apache.org/jira/browse/SOLR-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542663#comment-14542663 ] Anshum Gupta commented on SOLR-7143: Hi Jens, Sorry but I haven't been able to get to this all this while. Here's what we need to get working: # Way to specify multiple values for a field within the local params. e.g.: {code:title=SOLR-2798 would solve this} http://localhost:8983/solr/techproducts/select?q={!mlt qf=foo qf=bar}docid {code} # We also need to support parameter dereferencing as you suggested, considering we don't want to get involved with commas: {code} http://localhost:8983/solr/techproducts/select?q={!mlt qf=$mlt.fl}docid&mlt.fl=foo&mlt.fl=bar {code} Supporting commas would interfere with the syntax used for things like bf, e.g. {{bf=recip(rord(creationDate),1,1000,1000)}} MoreLikeThis Query Parser does not handle multiple field names -- Key: SOLR-7143 URL: https://issues.apache.org/jira/browse/SOLR-7143 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 5.0 Reporter: Jens Wille Assignee: Anshum Gupta Attachments: SOLR-7143.patch, SOLR-7143.patch The newly introduced MoreLikeThis Query Parser (SOLR-6248) does not return any results when supplied with multiple fields in the {{qf}} parameter. To reproduce within the techproducts example, compare: {code} curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name%7DMA147LL/A' curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=features%7DMA147LL/A' curl 'http://localhost:8983/solr/techproducts/select?q=%7B!mlt+qf=name,features%7DMA147LL/A' {code} The first two queries return 8 and 5 results, respectively. The third query doesn't return any results (not even the matched document).
In contrast, the MoreLikeThis Handler works as expected (accounting for the default {{mintf}} and {{mindf}} values in SimpleMLTQParser): {code} curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name&mlt.mintf=1&mlt.mindf=1' curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=features&mlt.mintf=1&mlt.mindf=1' curl 'http://localhost:8983/solr/techproducts/mlt?q=id:MA147LL/A&mlt.fl=name,features&mlt.mintf=1&mlt.mindf=1' {code} After adding the following line to {{example/techproducts/solr/techproducts/conf/solrconfig.xml}}: {code:language=XML} <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> {code} The first two queries return 7 and 4 results, respectively (excluding the matched document). The third query returns 7 results, as one would expect.
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.8.0_45) - Build # 12491 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12491/ Java: 32bit/jdk1.8.0_45 -server -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.cloud.CloudExitableDirectoryReaderTest.test Error Message: No live SolrServers available to handle this request:[https://127.0.0.1:33066/_/aa/collection1] Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[https://127.0.0.1:33066/_/aa/collection1] at __randomizedtesting.SeedInfo.seed([9951855D48D4B18A:1105BA87E628DC72]:0) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:355) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1086) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.queryServer(AbstractFullDistribZkTestBase.java:1425) at org.apache.solr.cloud.CloudExitableDirectoryReaderTest.assertPartialResults(CloudExitableDirectoryReaderTest.java:102) at org.apache.solr.cloud.CloudExitableDirectoryReaderTest.doTimeoutTests(CloudExitableDirectoryReaderTest.java:86) at org.apache.solr.cloud.CloudExitableDirectoryReaderTest.test(CloudExitableDirectoryReaderTest.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at
[jira] [Commented] (SOLR-7468) Kerberos authentication module
[ https://issues.apache.org/jira/browse/SOLR-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542913#comment-14542913 ] Anshum Gupta commented on SOLR-7468: Here's some feedback: # Can we avoid the addition of the extra Servlet Filter (KerberosFilter)? Now that SDF is essentially a wrapper, perhaps we could reuse the wrapper. # If we do #1, we also wouldn't need the change/hack in the JettySolrRunner. Also, no change would be needed in MiniSolrCloudCluster. # Minor but important: I noticed a lot of unused imports; you should clean those up. Kerberos authentication module -- Key: SOLR-7468 URL: https://issues.apache.org/jira/browse/SOLR-7468 Project: Solr Issue Type: New Feature Components: security Reporter: Ishan Chattopadhyaya Attachments: SOLR-7468.patch, SOLR-7468.patch, SOLR-7468.patch SOLR-7274 introduces a pluggable authentication framework. This issue provides a Kerberos plugin implementation.
[jira] [Commented] (LUCENE-6319) Delegating OneMerge
[ https://issues.apache.org/jira/browse/LUCENE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542945#comment-14542945 ] Elliott Bradshaw commented on LUCENE-6319: -- Just thought I'd ping on this. Any thoughts? Greatest patch ever? Seriously though, I know this touches a lot of hardcore internal classes, so I get it if people are wary. If anyone has any suggestions of a different route, I'm more than happy to explore it. Delegating OneMerge --- Key: LUCENE-6319 URL: https://issues.apache.org/jira/browse/LUCENE-6319 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Elliott Bradshaw Attachments: SOLR-6319.patch In trying to integrate SortingMergePolicy into ElasticSearch, I ran into an issue where the custom merge logic was being stripped out by IndexUpgraderMergeSpecification. Related issue here: https://github.com/elasticsearch/elasticsearch/issues/9731 In an endeavor to fix this, I attempted to create a DelegatingOneMerge that could be used to chain the different MergePolicies together. I quickly discovered this to be impossible, due to the direct member variable access of OneMerge by IndexWriter and other classes. It would be great if this variable access could be privatized and the consuming classes modified to use the appropriate getters and setters. Here's an example DelegatingOneMerge and modified OneMerge. https://gist.github.com/ebradshaw/e0b74e9e8d4976ab9e0a https://gist.github.com/ebradshaw/d72116a014f226076303 The downside here is that this would require an API change, as there are three public variables in OneMerge: estimatedMergeBytes, segments and totalDocCount. These would have to be moved behind public getters. Without this change, I'm not sure how we could get the SortingMergePolicy working in ES, but if anyone has any other suggestions I'm all ears! Thanks!
[jira] [Updated] (LUCENE-6459) [suggest] Query Interface for suggest API
[ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-6459: - Attachment: LUCENE-6459.patch Updated Patch: - don't allow zero length suggestion values - improve docs - added tests [suggest] Query Interface for suggest API - Key: LUCENE-6459 URL: https://issues.apache.org/jira/browse/LUCENE-6459 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 5.1 Reporter: Areek Zillur Assignee: Areek Zillur Fix For: Trunk, 5.x, 5.1 Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch This patch factors out common indexing/search API used by the recently introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. The motivation is to provide a query interface for FST-based fields (*SuggestField* and *ContextSuggestField*) for enabling suggestion scoring and more powerful automaton queries. Previously, only prefix ‘queries’ with index-time weights were supported but we can also support: * Prefix queries expressed as regular expressions: get suggestions that match multiple prefixes ** Example: _star\[wa\|tr\]_ matches _starwars_ and _startrek_ * Fuzzy Prefix queries supporting scoring: get typo tolerant suggestions scored by how close they are to the query prefix ** Example: querying for _seper_ will score _separate_ higher then _superstitious_ * Context Queries: get suggestions boosted and/or filtered based on their indexed contexts (meta data) ** Example: get typo tolerant suggestions on song names with prefix _like a roling_ boosting songs with genre _rock_ and _indie_ ** Example: get suggestion on all file names starting with _finan_ only for _user1_ and _user2_ h3. Suggest API {code} SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader); CompletionQuery query = ... TopSuggestDocs suggest = searcher.suggest(query, num); {code} h3. 
CompletionQuery *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A *CompletionQuery* produces a *CompletionWeight*, which allows *CompletionQuery* implementations to pass in an automaton that will be intersected with a FST and allows boosting and meta data extraction from the intersected partial paths. A *CompletionWeight* produces a *CompletionScorer*. A *CompletionScorer* executes a Top N search against the FST with the provided automaton, scoring and filtering all matched paths. h4. PrefixCompletionQuery Return documents with values that match the prefix of an analyzed term text Documents are sorted according to their suggest field weight. {code} PrefixCompletionQuery(Analyzer analyzer, Term term) {code} h4. RegexCompletionQuery Return documents with values that match the prefix of a regular expression Documents are sorted according to their suggest field weight. {code} RegexCompletionQuery(Term term) {code} h4. FuzzyCompletionQuery Return documents with values that has prefixes within a specified edit distance of an analyzed term text. Documents are ‘boosted’ by the number of matching prefix letters of the suggestion with respect to the original term text. {code} FuzzyCompletionQuery(Analyzer analyzer, Term term) {code} h5. Scoring {{suggestion_weight + (global_maximum_weight * boost)}} where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all integers. {{boost = # of prefix characters matched}} h4. ContextQuery Return documents that match a {{CompletionQuery}} filtered and/or boosted by provided context(s). {code} ContextQuery(CompletionQuery query) contextQuery.addContext(CharSequence context, int boost, boolean exact) {code} *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query suggestions boosted and/or filtered by contexts h5. 
Scoring {{suggestion_weight + (global_maximum_weight * context_boost)}} where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} are all integers When used with {{FuzzyCompletionQuery}}, {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}} h3. Context Suggest Field To use {{ContextQuery}}, use {{ContextSuggestField}} instead of {{SuggestField}}. Any {{CompletionQuery}} can be used with {{ContextSuggestField}}, the default behaviour is to return suggestions from *all* contexts. {{Context}} for every completion hit can be accessed through {{SuggestScoreDoc#context}}. {code} ContextSuggestField(String name, CollectionCharSequence contexts, String value, int weight) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
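The two scoring formulas quoted above are plain integer arithmetic; a small sketch under the assumption that all weights and boosts fit in an int (the method names are illustrative, not Lucene's):

```java
// Scoring arithmetic from the issue text, not Lucene's implementation.
class CompletionScoring {
    /** FuzzyCompletionQuery: boost = number of matching prefix characters. */
    static long fuzzyScore(int suggestionWeight, int globalMaxWeight, int matchedPrefixChars) {
        return suggestionWeight + (long) globalMaxWeight * matchedPrefixChars;
    }

    /** ContextQuery wrapping a FuzzyCompletionQuery: the boosts add up. */
    static long contextFuzzyScore(int suggestionWeight, int globalMaxWeight,
                                  int contextBoost, int fuzzyBoost) {
        return suggestionWeight + (long) globalMaxWeight * (contextBoost + fuzzyBoost);
    }
}
```

Because the boost is multiplied by the *global* maximum weight, any suggestion matching one more prefix character (or one more boosted context) always outranks every suggestion that does not, regardless of its own index-time weight.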
[jira] [Commented] (SOLR-6273) Cross Data Center Replication
[ https://issues.apache.org/jira/browse/SOLR-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542953#comment-14542953 ] Erick Erickson commented on SOLR-6273: -- [~arcadius] Sorry it took a while to get back to you, but currently CDCR is active-passive, not active-active so the scenario you asked about shouldn't arise. Cross Data Center Replication - Key: SOLR-6273 URL: https://issues.apache.org/jira/browse/SOLR-6273 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Erick Erickson Attachments: SOLR-6273-trunk.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch This is the master issue for Cross Data Center Replication (CDCR) described at a high level here: http://heliosearch.org/solr-cross-data-center-replication/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542165#comment-14542165 ] Karl Wright commented on LUCENE-6480: - Depends on the application. With 64 bits the resolution can be 6.07 meters, which seems probably good enough for most. Extend Simple GeoPointField Type to 3d --- Key: LUCENE-6480 URL: https://issues.apache.org/jira/browse/LUCENE-6480 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] proposes a simple GeoPointField type to lucene core. This field uses 64bit encoding of 2 dimensional points to construct sorted term representations of GeoPoints (aka: GeoHashing). This feature investigates adding support for encoding 3 dimensional GeoPoints, either by extending GeoPointField to a Geo3DPointField or adding an additional 3d constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
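The 6.07 m figure can be sanity-checked with back-of-envelope arithmetic: packing three Cartesian coordinates into 64 bits leaves roughly 21 bits per dimension, over a coordinate range of about one Earth diameter. The diameter constant and bit split below are assumptions for illustration, not taken from the patch:

```java
// Back-of-envelope check of the ~6 m resolution quoted above.
class GeoResolution {
    static final double EARTH_DIAMETER_METERS = 12_742_000.0; // mean, approximate

    /** Smallest distinguishable step along one axis for a given bit budget. */
    static double resolutionMeters(int bitsPerDimension) {
        return EARTH_DIAMETER_METERS / (1L << bitsPerDimension);
    }
}
```

With 21 bits per dimension this comes out to about 6.08 m, consistent with the number quoted in the comment.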
[jira] [Updated] (SOLR-7503) Recovery after ZK session expiration happens in a single thread for all cores in a node
[ https://issues.apache.org/jira/browse/SOLR-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-7503: - Attachment: SOLR-7503.patch Simple patch that registers cores in the background after ZK session expiration. I had to add some getter methods for the ExecutionService in the ZkContainer so that it is available to the ZkController when needed (iff cc is not null). I didn't want to use a new ExecutionService since the one setup by ZkContainer seemed most appropriate for this work, but you can't expose ZkContainer directly in ZkController because it's only a server-side thing. Recovery after ZK session expiration happens in a single thread for all cores in a node --- Key: SOLR-7503 URL: https://issues.apache.org/jira/browse/SOLR-7503 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.1 Reporter: Shalin Shekhar Mangar Assignee: Timothy Potter Labels: impact-high Fix For: Trunk, 5.2 Attachments: SOLR-7503.patch Currently cores are registered in parallel in an executor. However, when there's a ZK expiration, the recovery, which also happens in the register call, happens in a single thread: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L300 We should make these happen in parallel as well so that recovery after ZK expiration doesn't take forever. Thanks to [~mewmewball] for catching this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
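The shape of the fix is to push each register call onto an executor instead of looping serially. A hedged sketch of that idea only; the method name and the Runnable body are stand-ins for the ZkController/ZkContainer internals the patch actually touches:

```java
// Illustrative only: submit each core registration to a pool instead of
// registering serially, so recovery after ZK expiration runs in parallel.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelRecovery {
    static int recoverAll(List<String> coreNames) {
        int threads = Math.max(1, Math.min(coreNames.size(),
                Runtime.getRuntime().availableProcessors()));
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicInteger recovered = new AtomicInteger();
        for (String core : coreNames) {
            executor.submit(() -> {
                // placeholder for the real zkController.register(core, ...) call
                recovered.incrementAndGet();
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return recovered.get();
    }
}
```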
[jira] [Reopened] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reopened SOLR-6968: Doing some perf testing, i found that the SPARSE representation of HLL can cause some heinous response times for large sets -- a minor part of the issue seems to be the slower insertion rate compared to the FULL representation (documented), but a much bigger factor is that _merging_ multiple (large) SPARSE HLLs is almost 10x slower than merging FULL HLLs of the same size. it might be worth adding tuning options and/or heuristics to control if/when SPARSE representation should be used (in cases where folks have smaller sets and care more about memory than speed), but for now i'm just going to disable it. add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.2 Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation)
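The SPARSE-vs-FULL merge gap has a simple structural explanation. A toy illustration of the two register layouts (this is not the actual java-hll implementation): FULL keeps every register in a flat array and merges with a cache-friendly per-index max, while SPARSE keeps only non-zero registers in a map, saving memory for small sets at the cost of hashing, boxing, and allocation on every merge:

```java
// Toy register merge for the two HLL layouts discussed above.
import java.util.HashMap;
import java.util.Map;

class HllRegisters {
    /** FULL merge: one linear pass over a flat array. */
    static byte[] mergeFull(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    /** SPARSE merge: map lookups and boxed Bytes per non-zero register. */
    static Map<Integer, Byte> mergeSparse(Map<Integer, Byte> a, Map<Integer, Byte> b) {
        Map<Integer, Byte> out = new HashMap<>(a);
        b.forEach((idx, reg) -> out.merge(idx, reg, (x, y) -> x >= y ? x : y));
        return out;
    }
}
```

For a large set, nearly every register is non-zero, so the sparse map is as big as the array but pays per-entry overhead on both insert and merge, which matches the ~10x merge slowdown observed in the perf tests.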
[jira] [Commented] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542124#comment-14542124 ] ASF subversion and git services commented on SOLR-7542: --- Commit 1679229 from [~steve_rowe] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1679229 ] SOLR-7542: Schema API: Can't remove single dynamic copy field directive (merged trunk r1679225) Schema API: Can't remove single dynamic copy field directive Key: SOLR-7542 URL: https://issues.apache.org/jira/browse/SOLR-7542 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 5.2 Attachments: SOLR-7542.patch In a managed schema containing just a single dynamic copy field directive - i.e. a glob source or destination - deleting the copy field directive fails. For example, the default configset (data_driven_schema_configs) has such a schema: the {{*}}-{{\_text\_}} copy field directive is the only one. To reproduce: {noformat} bin/solr start -c bin/solr create my_solr_coll curl http://localhost:8983/solr/my_solr_coll/schema; -d'{delete-copy-field:{source:*, dest:_text_}}' {noformat} The deletion fails, and an NPE is logged: {noformat} ERROR - 2015-05-13 12:37:36.780; [my_solr_coll shard1 core_node1 my_solr_coll_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException at org.apache.solr.schema.IndexSchema.getCopyFieldProperties(IndexSchema.java:1450) at org.apache.solr.schema.IndexSchema.getNamedPropertyValues(IndexSchema.java:1406) at org.apache.solr.schema.IndexSchema.persist(IndexSchema.java:390) at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:120) at org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94) at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) [...] {noformat}
[jira] [Resolved] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-7542. -- Resolution: Fixed Committed to trunk and branch_5x. Schema API: Can't remove single dynamic copy field directive Key: SOLR-7542 URL: https://issues.apache.org/jira/browse/SOLR-7542 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 5.2 Attachments: SOLR-7542.patch In a managed schema containing just a single dynamic copy field directive - i.e. a glob source or destination - deleting the copy field directive fails. For example, the default configset (data_driven_schema_configs) has such a schema: the {{*}}-{{\_text\_}} copy field directive is the only one. To reproduce: {noformat} bin/solr start -c bin/solr create my_solr_coll curl http://localhost:8983/solr/my_solr_coll/schema; -d'{delete-copy-field:{source:*, dest:_text_}}' {noformat} The deletion fails, and an NPE is logged: {noformat} ERROR - 2015-05-13 12:37:36.780; [my_solr_coll shard1 core_node1 my_solr_coll_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException at org.apache.solr.schema.IndexSchema.getCopyFieldProperties(IndexSchema.java:1450) at org.apache.solr.schema.IndexSchema.getNamedPropertyValues(IndexSchema.java:1406) at org.apache.solr.schema.IndexSchema.persist(IndexSchema.java:390) at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:120) at org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94) at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) [...] {noformat}
[jira] [Commented] (SOLR-7531) Config API is merging certain key names together
[ https://issues.apache.org/jira/browse/SOLR-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542085#comment-14542085 ] ASF subversion and git services commented on SOLR-7531: --- Commit 1679224 from [~noble.paul] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1679224 ] SOLR-7531: config API shows a few keys merged together Config API is merging certain key names together Key: SOLR-7531 URL: https://issues.apache.org/jira/browse/SOLR-7531 Project: Solr Issue Type: Bug Affects Versions: 5.0, 5.1 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: Trunk, 5.2 Starting from a new Solr 5.0 install {code} ./bin/solr start -e schemaless curl 'http://localhost:8983/solr/gettingstarted/config' config.json {code} Open config.json and note that there is a key called autoCommmitMaxDocs under the updateHandler section. {code} curl 'http://localhost:8983/solr/gettingstarted/config' -H 'Content-type:application/json' -d '{set-property : {updateHandler.autoCommit.maxDocs : 5000}}' curl 'http://localhost:8983/solr/gettingstarted/config' config.json {code} Open config.json and note that both the value of updateHandler autoCommit maxDocs and updateHandler autoCommitMaxDocs is now set to 5000 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
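The symptom above (both `autoCommit.maxDocs` and `autoCommitMaxDocs` appearing) comes from treating the dotted property name as a literal key instead of a path into nested config. A hedged sketch of the intended behavior, not Solr's actual code:

```java
// Write a dotted property like "updateHandler.autoCommit.maxDocs" into a
// nested map structure instead of storing the dotted string as one flat key.
import java.util.HashMap;
import java.util.Map;

class DottedKeys {
    @SuppressWarnings("unchecked")
    static void setProperty(Map<String, Object> config, String dottedKey, Object value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> node = config;
        for (int i = 0; i < parts.length - 1; i++) {
            // descend, creating intermediate maps as needed
            node = (Map<String, Object>) node.computeIfAbsent(
                    parts[i], k -> new HashMap<String, Object>());
        }
        node.put(parts[parts.length - 1], value);
    }
}
```

With this split, only `updateHandler → autoCommit → maxDocs` is written, and no merged `autoCommitMaxDocs` key can appear.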
[jira] [Commented] (LUCENE-6480) Extend Simple GeoPointField Type to 3d
[ https://issues.apache.org/jira/browse/LUCENE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542149#comment-14542149 ] Karl Wright commented on LUCENE-6480: - bq. So why not just use ECEF then instead of the unit sphere? I presume your question is about geo3D in general. There's quite a bit of math in Geo3D that relies on GeoPoints being on the unit sphere. For that reason, using the unit sphere, or projecting to it at least, is preferred. If you are only doing containment of a point, you may not run into some of the more complex Geo3D math, but if you are determining relationships of bounding boxes to shapes, or finding the bounding box of a shape, you can't dodge being on the unit sphere. bq. Or have you tried this and the few extra trig computations impaired performance? If you mean trying to map points on the earth onto the unit sphere, then it was simply unnecessary for our application. The maximum error you can get, as I stated before, by using a sphere rather than a real earth model is a few meters. I maintain that doing such a mapping at indexing time is probably straightforward, at some performance expense, but I view this as beyond the bounds of this project. Extend Simple GeoPointField Type to 3d --- Key: LUCENE-6480 URL: https://issues.apache.org/jira/browse/LUCENE-6480 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] proposes a simple GeoPointField type to lucene core. This field uses 64bit encoding of 2 dimensional points to construct sorted term representations of GeoPoints (aka: GeoHashing). This feature investigates adding support for encoding 3 dimensional GeoPoints, either by extending GeoPointField to a Geo3DPointField or adding an additional 3d constructor. 
[jira] [Updated] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6968: --- Attachment: SOLR-6968_nosparse.patch patch that worked well in my perf tests to disable SPARSE (and optimize some ram usage when merging EMPTY) ... will commit once {{ant precommit test}} finishes. add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.2 Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968_nosparse.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-6371: -- Attachment: LUCENE-6371.patch I've been playing around with various APIs for this, and I think this one works reasonably well. Spans.isPayloadAvailable() and getPayload() are replaced with a collect() method that takes a SpanCollector. If you want to get payloads from a Spans, you do the following: {code:java} PayloadSpanCollector collector = new PayloadSpanCollector(); while (spans.nextStartPosition() != NO_MORE_POSITIONS) { collector.reset(); spans.collect(collector); doSomethingWith(collector.getPayloads()); } {code} The actual job of collecting information from postings lists is devolved to the collector itself (via SpanCollector.collectLeaf(), called from TermSpans.collect()). The API is made slightly complicated by the need to buffer collected information in NearOrderedSpans, because the algorithm there moves child spans on eagerly when finding the smallest possible match, so by the time collect() is called we're out of position. This is dealt with using a BufferedSpanCollector, with collectCandidate(Spans) and accept() methods. The default (No-op) collector has a no-op implementation of this, which should get optimized away by HotSpot, meaning that we don't need to have separate implementations for collecting and non-collecting algorithms, and can do away with PayloadNearOrderedSpans. This patch also moves the PayloadCheck queries to the .payloads package, which tidies things up a bit. All tests pass. Improve Spans payload collection Key: LUCENE-6371 URL: https://issues.apache.org/jira/browse/LUCENE-6371 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Attachments: LUCENE-6371.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. 
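The collector pattern described in the patch notes can be sketched as follows. The names (SpanCollector, PayloadSpanCollector, the no-op default) mirror the description above but are stand-ins, not the committed Lucene API; the point is that a shared no-op instance lets one traversal code path serve both collecting and non-collecting queries, with HotSpot expected to eliminate the empty calls:

```java
// Stand-in sketch of the SpanCollector API described in the patch notes.
import java.util.ArrayList;
import java.util.List;

interface SpanCollector {
    void collectLeaf(byte[] payload);
    void reset();

    /** Shared no-op collector: traversal code calls it unconditionally. */
    SpanCollector NO_OP = new SpanCollector() {
        @Override public void collectLeaf(byte[] payload) { /* intentionally empty */ }
        @Override public void reset() { /* intentionally empty */ }
    };
}

class PayloadSpanCollector implements SpanCollector {
    private final List<byte[]> payloads = new ArrayList<>();

    @Override public void collectLeaf(byte[] payload) { payloads.add(payload); }
    @Override public void reset() { payloads.clear(); }

    List<byte[]> getPayloads() { return payloads; }
}
```

This is why the no-op default means PayloadNearOrderedSpans can go away: the ordered-near algorithm always "collects", and when nobody wants payloads the collection is free.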
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction. testRandomTiny passes but there is one failure in testRandom with the following: {noformat} ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true {noformat} {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom [junit4] Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false [junit4]at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0) [junit4]at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302) [junit4]at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204) [junit4]at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130) [junit4]at java.lang.Thread.run(Thread.java:745) {noformat} This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end. 
Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3114 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3114/ 1 tests failed. REGRESSION: org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeMixedAdds Error Message: soft529 wasn't fast enough Stack Trace: java.lang.AssertionError: soft529 wasn't fast enough at __randomizedtesting.SeedInfo.seed([24CCE5AE308765DB:75181C2E81F4557C]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeMixedAdds(SoftAutoCommitTest.java:111) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) Build Log: [...truncated 9443 lines...] [junit4] Suite: org.apache.solr.update.SoftAutoCommitTest [junit4] 2 Creating dataDir:
[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543224#comment-14543224 ] Anshum Gupta commented on SOLR-7275: Thanks for the feedback Noble. Right, as of now, a node restart would be required for security.json to be re-read. I'll create another issue for that and as I understand, you don't have an objection to committing this, right? :-) Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return an {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities e.g. how to extract the name of the collection or other information from the incoming request as there are multiple ways to specify the target collection for a request. Similarly request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request - SolrDispatchFilter - isAuthorized(context) - Process/Return. 
{code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
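To make the proposed flow concrete, here is a toy implementation of the interface sketched above. The context class is reduced to the two fields the example needs (the real SolrRequestContext carries user info, collections accessed, etc.), and the role map is an illustrative stand-in for whatever backing system (Ranger, Sentry) a real plugin would consult:

```java
// Toy plugin against the interface proposed in the issue: allow a request
// only if the user's role set contains the requested operation type.
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class SolrAuthorizationResponse {
    private final boolean authorized;
    SolrAuthorizationResponse(boolean authorized) { this.authorized = authorized; }
    public boolean isAuthorized() { return authorized; }
}

class SolrRequestContext {
    final String user;
    final String operationType; // e.g. "QUERY", "UPDATE", "ADMIN"
    SolrRequestContext(String user, String operationType) {
        this.user = user;
        this.operationType = operationType;
    }
}

interface SolrAuthorizationPlugin {
    SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}

class RoleBasedPlugin implements SolrAuthorizationPlugin {
    private final Map<String, Set<String>> allowedOps; // user -> permitted operations

    RoleBasedPlugin(Map<String, Set<String>> allowedOps) { this.allowedOps = allowedOps; }

    @Override
    public SolrAuthorizationResponse isAuthorized(SolrRequestContext ctx) {
        Set<String> ops = allowedOps.getOrDefault(ctx.user, Collections.emptySet());
        return new SolrAuthorizationResponse(ops.contains(ctx.operationType));
    }
}
```

Note the plugin never parses the HTTP request itself; everything it needs arrives pre-extracted in the context object, which is exactly the separation the issue argues for.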
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.7.0_80) - Build # 4687 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4687/ Java: 64bit/jdk1.7.0_80 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.core.TestArbitraryIndexDir.testLoadNewIndexDir Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([9DE047CFE9B60DA3:74BAFCF7772F9D0B]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:794) at org.apache.solr.core.TestArbitraryIndexDir.testLoadNewIndexDir(TestArbitraryIndexDir.java:128) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=*[count(//doc)=1] xml response was: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><result name="response" numFound="0" start="0"/> </response> request
[jira] [Commented] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542228#comment-14542228 ] ASF subversion and git services commented on SOLR-6968: --- Commit 1679241 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1679241 ] SOLR-6968: perf tweak: eliminate use of SPARSE storage option since it has some pathologically bad behavior for some set sizes (particularly when merging shard responses) add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.2 Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968_nosparse.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation)
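To illustrate why hyperloglog is attractive compared to calcDistinct: it estimates the number of distinct values in fixed memory by tracking, per hash bucket, the longest run of leading zero bits observed. Solr's actual implementation uses a dedicated HLL library; the toy sketch below (the `TinyHLL` name, the MD5-based hash, and the textbook bias/small-range constants are all assumptions, not Solr's code) just shows the shape of the algorithm:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Toy HyperLogLog: NOT Solr's implementation, just the core idea.
class TinyHLL {
    private final int p;        // index bits; m = 2^p registers
    private final int m;
    private final byte[] registers;

    TinyHLL(int p) { this.p = p; this.m = 1 << p; this.registers = new byte[m]; }

    void add(String value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p));            // top p bits choose a register
        long rest = h << p;                          // remaining bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    long cardinality() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) { sum += Math.pow(2, -r); if (r == 0) zeros++; }
        double alpha = 0.7213 / (1 + 1.079 / m);     // standard bias correction
        double est = alpha * m * (double) m / sum;   // harmonic-mean estimator
        if (est <= 2.5 * m && zeros > 0) {           // small-range (linear counting) correction
            est = m * Math.log((double) m / zeros);
        }
        return Math.round(est);
    }

    private static long hash64(String s) {
        try {  // MD5 is overkill but gives well-distributed bits with no dependencies
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

With p=14 the sketch occupies 16 KB no matter how many values are added, and merging shard responses is just a register-wise max, whereas calcDistinct must collect and merge every distinct value.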
[jira] [Created] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
Nicholas Knize created LUCENE-6481: -- Summary: Improve GeoPointField type to only visit high precision boundary terms Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue.
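The boundary-only idea above can be shown with a hypothetical 1-D analogue: divide the query interval into fixed-size cells; cells fully inside the interval can be accepted wholesale from their postings lists, and only the clipped boundary cells need per-term checks. All names here (`BoundaryRanges`, `Cell`, `cover`) are invented for illustration and are not Lucene APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 1-D analogue of visiting only boundary ranges: cells fully
// inside [lo, hi] match wholesale; partial cells need individual term checks.
class BoundaryRanges {
    record Cell(long start, long end, boolean needsTermCheck) {}

    static List<Cell> cover(long lo, long hi, long cellSize) {
        List<Cell> cells = new ArrayList<>();
        long firstCellStart = Math.floorDiv(lo, cellSize) * cellSize;
        for (long s = firstCellStart; s <= hi; s += cellSize) {
            long e = s + cellSize - 1;                 // inclusive cell end
            boolean partial = s < lo || e > hi;        // clipped by the query?
            cells.add(new Cell(Math.max(s, lo), Math.min(e, hi), partial));
        }
        return cells;
    }
}
```

For a large query interval, almost all cells are interior, so almost all matching terms are collected without being individually inspected; that is the saving the issue is after.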
[jira] [Commented] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542290#comment-14542290 ] ASF subversion and git services commented on SOLR-6968: --- Commit 1679250 from hoss...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1679250 ] SOLR-6968: perf tweak: eliminate use of SPARSE storage option since it has some pathologically bad behavior for some set sizes (particularly when merging shard responses) (merge r1679241) add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.2 Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968_nosparse.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation)
[jira] [Commented] (SOLR-7542) Schema API: Can't remove single dynamic copy field directive
[ https://issues.apache.org/jira/browse/SOLR-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542087#comment-14542087 ] ASF subversion and git services commented on SOLR-7542: --- Commit 1679225 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1679225 ] SOLR-7542: Schema API: Can't remove single dynamic copy field directive Schema API: Can't remove single dynamic copy field directive Key: SOLR-7542 URL: https://issues.apache.org/jira/browse/SOLR-7542 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 5.2 Attachments: SOLR-7542.patch In a managed schema containing just a single dynamic copy field directive - i.e. a glob source or destination - deleting the copy field directive fails. For example, the default configset (data_driven_schema_configs) has such a schema: the {{*}} -> {{\_text\_}} copy field directive is the only one. To reproduce: {noformat} bin/solr start -c bin/solr create my_solr_coll curl "http://localhost:8983/solr/my_solr_coll/schema" -d '{"delete-copy-field":{"source":"*", "dest":"_text_"}}' {noformat} The deletion fails, and an NPE is logged: {noformat} ERROR - 2015-05-13 12:37:36.780; [my_solr_coll shard1 core_node1 my_solr_coll_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException at org.apache.solr.schema.IndexSchema.getCopyFieldProperties(IndexSchema.java:1450) at org.apache.solr.schema.IndexSchema.getNamedPropertyValues(IndexSchema.java:1406) at org.apache.solr.schema.IndexSchema.persist(IndexSchema.java:390) at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:120) at org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94) at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) [...] {noformat}
[jira] [Resolved] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-6968. Resolution: Fixed add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.2 Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968_nosparse.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation)
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch I've decided to go ahead and just add a minor updated patch for commit consideration and save performance improvements for new issues like [LUCENE-6481 | https://issues.apache.org/jira/browse/LUCENE-6481]. This enables other patches, like the BKD-tree [LUCENE-6477 | https://issues.apache.org/jira/browse/LUCENE-6477] to use the helper classes provided by this patch, and other contributors to iterate improvements on this new field type. Patch includes the following updates: * Changed GeoPointIn*Query to subclass MultiTermQuery instead of NumericRangeQuery leaving NRQ unchanged * Removed unused DocValues from GeoPointField.FieldType (reduces sized of index for now) * Updated javadocs to reflect issues with large queries. * 2 space indent formatting Benchmarks are roughly the same, with a moderately reduced index size. *GeoPointField* Index Time: 160.545 sec Index Size: 1.3G Mean Query Time: 0.104 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). 
This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
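The "simple bit twiddling" mentioned above is Morton (Z-order) interleaving. Below is a hedged sketch of one way to pack lat/lon into a single long; the `MortonGeo` class name, the scaling, and the bit layout are illustrative assumptions, not necessarily what GeoPointField actually commits to:

```java
// Hypothetical sketch of Morton (Z-order) interleaving of lat/lon into one
// long, in the spirit of the encoding described above.
class MortonGeo {

    // Scale a coordinate in [min, max] onto an unsigned 32-bit integer.
    private static long scale(double v, double min, double max) {
        return (long) ((v - min) / (max - min) * 0xFFFFFFFFL);
    }

    private static double unscale(long v, double min, double max) {
        return v / (double) 0xFFFFFFFFL * (max - min) + min;
    }

    // Spread the low 32 bits of v onto the even bit positions of a long.
    private static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    // Inverse of spread: collect the even bit positions back into 32 bits.
    private static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1))  & 0x3333333333333333L;
        v = (v | (v >>> 2))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4))  & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8))  & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    // lon occupies the even bits, lat the odd bits.
    static long encode(double lat, double lon) {
        return spread(scale(lon, -180, 180)) | (spread(scale(lat, -90, 90)) << 1);
    }

    static double decodeLat(long code) { return unscale(compact(code >>> 1), -90, 90); }

    static double decodeLon(long code) { return unscale(compact(code), -180, 180); }
}
```

Because spatially close points share long prefixes of their interleaved codes, a bounding box maps to a modest set of contiguous ranges of such terms, which is what the multi-phase range filtering exploits.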
[jira] [Commented] (SOLR-7537) Could not find or load main class org.apache.solr.util.SimplePostTool
[ https://issues.apache.org/jira/browse/SOLR-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542374#comment-14542374 ] Erik Hatcher commented on SOLR-7537: This works for me on Solr 5.1.0, doing `bin/solr create -c gettingstarted` and cd'ing into bin (an atypical thing to do, not what the quick start says to do, by the way) and running your command on the built-in docs directory of the install, `sh post -c gettingstarted ../docs`. Is your dist/solr-core-5.1.0.jar there? Something seems broken in your environment. Could not find or load main class org.apache.solr.util.SimplePostTool - Key: SOLR-7537 URL: https://issues.apache.org/jira/browse/SOLR-7537 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 5.1 Environment: Windows 8.1, cygwin4.3.33 Reporter: Peng Li In solr-5.1.0/bin folder, I typed below command ../doc folder has readme.docx sh post -c gettingstarted ../doc And I got below exception: c:\Java\jdk1.8.0_20/bin/java -classpath /cygdrive/c/Users/lipeng/_Main/Servers/solr-5.1.0/dist/solr-core-5.1.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool ../doc Error: Could not find or load main class org.apache.solr.util.SimplePostTool I followed instruction from here: http://lucene.apache.org/solr/quickstart.html Can you help me to take a look at? Thank you!
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7275.patch Patch that adds request type info for /select [READ] and /update [WRITE] requests. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return an {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities e.g. how to extract the name of the collection or other information from the incoming request as there are multiple ways to specify the target collection for a request. Similarly request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request - SolrDispatchFilter - isAuthorized(context) - Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. 
String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr.
Re: Why morphlines code is in Solr?
https://github.com/kite-sdk/kite/tree/master/kite-morphlines/kite-morphlines-solr-core/src/main/java/org/kitesdk/morphline/solr and here is the Solr codebase https://github.com/apache/lucene-solr/tree/trunk/solr/contrib/morphlines-core/src/java/org/apache/solr/morphlines/solr essentially the same files are in both codebases. Ideally, the source should be maintained in one project and the jar should be referenced from the other project --Noble On Wed, May 13, 2015 at 3:29 AM, Shawn Heisey apa...@elyograg.org wrote: On 5/12/2015 2:14 PM, Noble Paul wrote: When I said jar dependency, I did not mean that we check in the jar. We use httpclient, but if you check out lucene trunk you don't get the httpclient jar; the build process will add it to the distribution. Doesn't that describe what happens with morphlines? The build process adds it to the distribution. If I'm completely missing the point of what you're saying, I'll shut up and let you elaborate and discuss it with other people who know what's going on. Thanks, Shawn -- - Noble Paul
RE: Why morphlines code is in Solr?
Hi, I think his question was why the morphlines contrib *source code* is in Solr at all. He argues that we could simply fetch the pre-built contrib module from Maven and not have a fork of the whole module in Solr. Indeed, I also don't like that there are 2 almost similar variants of the morphlines contrib... Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Tuesday, May 12, 2015 11:59 PM To: dev@lucene.apache.org Subject: Re: Why morphlines code is in Solr? On 5/12/2015 2:14 PM, Noble Paul wrote: When I said jar dependency, I did not mean that we check in the jar. We use httpclient, but if you check out lucene trunk you don't get the httpclient jar; the build process will add it to the distribution. Doesn't that describe what happens with morphlines? The build process adds it to the distribution. If I'm completely missing the point of what you're saying, I'll shut up and let you elaborate and discuss it with other people who know what's going on. Thanks, Shawn
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7275.patch This patch filters out authz and context creation for *.png and *.html requests. There were a lot of those coming in for the new Admin UI. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return an {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities e.g. how to extract the name of the collection or other information from the incoming request as there are multiple ways to specify the target collection for a request. Similarly request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request - SolrDispatchFilter - isAuthorized(context) - Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. 
String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541578#comment-14541578 ] Karl Wright commented on LUCENE-6450: - I have some ideas for a geohash given (x,y,z) values that may turn out to be of interest. This geohash would have acceptable precision (a few meters) when packed in a long (64 bits). Question: does lucene efficiently support field types of that length? Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch Updated patch to make Query fields final. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-6481: --- Attachment: LUCENE-6481.patch New patch, starting from [~nknize]'s and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives? E.g. with this patch when I run: {noformat} ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true {noformat} It fails with this: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny [junit4] Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.18027939545 lon=-167.14191331870592 expected true but got: false deleted?=false [junit4]at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0) [junit4]at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301) [junit4]at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203) [junit4]at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125) [junit4]at java.lang.Thread.run(Thread.java:745) {noformat} The test case should be easy-ish to debug: it only indexes at most a few 10s of points... Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. 
This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue.