[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053912#comment-17053912 ] Ishan Chattopadhyaya commented on SOLR-13749:
-
Even though I feel there shouldn't be a separate qparser at all, {!xcjf} is a cryptic choice for the query parser name. {!ccjoin} or {!xcjoin} would've been better names.

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Blocker
> Fix For: 8.5
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This ticket includes two query parsers.
> The first is the "cross-collection join filter" (XCJF) query parser. It can call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second is the hash range query parser: you specify a field name and a hash range, and only the documents that would have hashed to that range are returned.
> The XCJF parser does an intersection based on join keys between two collections. The local collection is the collection that you are searching against; the remote collection is the collection that contains the join keys you want to use as a filter.
> Each shard participating in the distributed request executes a query against the remote collection. If the local collection is set up with the compositeId router to be routed on the join key field, a hash range query is applied to the remote collection query so that it only matches documents that are potential matches for the documents in the local shard/core.
>
> Here's some vocabulary to help with the descriptions of the various parameters.
>
> ||Term||Description||
> |Local Collection|The main collection that is being queried.|
> |Remote Collection|The collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper. zkHost and solrUrl are both optional, and at most one of them should be specified. If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See note|The query to be executed against the external Solr collection to retrieve the set of join key values. Note: the query can be passed at the end of the string or as the "v" parameter. It's recommended to use query parameter substitution with the "v" parameter to ensure no issues arise with the default query parsers.|
> |routed|Optional|true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard. This parameter improves the performance of the cross-collection join, but it depends on the local collection being routed by the toField. If this parameter is not specified, the XCJF query will try to determine the correct value automatically.|
> |ttl|Optional|The length of time, in seconds, that an XCJF query in the cache will be considered valid. Defaults to 3600 (one hour). The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results. After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_|Optional|Any normal Solr parameter can also be specified as a local param.|
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
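To make the parameter table above concrete, here is a minimal sketch of composing the two filter-query strings as local params. This is illustrative only: the collection and field names ("parts", "vin") and the hash_range parameter names (f, l, u) are assumptions, not confirmed by this ticket.

```java
// Sketch: building the XCJF and hash-range filter-query strings described above.
// Collection/field names ("parts", "vin") and the hash_range parameter names
// (f, l, u) are illustrative assumptions.
public class XcjfExamples {

    /** Cross-collection join filter: pull join keys from a remote collection. */
    static String xcjfFilter(String collection, String from, String to, String remoteQuery) {
        return "{!xcjf collection=" + collection
                + " from=" + from
                + " to=" + to
                + " v='" + remoteQuery + "'}";
    }

    /** Hash-range filter: match docs whose hashed field value falls in [lower, upper]. */
    static String hashRangeFilter(String field, int lower, int upper) {
        return "{!hash_range f=" + field + " l=" + lower + " u=" + upper + "}";
    }

    public static void main(String[] args) {
        // e.g. fq={!xcjf collection=parts from=vin to=vin v='category:wheels'}
        System.out.println(xcjfFilter("parts", "vin", "vin", "category:wheels"));
        System.out.println(hashRangeFilter("vin", 0, 1000));
    }
}
```

Per the description, the "v" form (query parameter substitution) is the recommended way to pass the remote query, rather than appending it to the end of the local-params string.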
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053906#comment-17053906 ] Ishan Chattopadhyaya commented on SOLR-13749:
-
bq. Basically, lets enhance JoinQParserPlugin to know when to use this new implementation instead of adding a new query parser that looks like the current one. The existing one already has a "method" and branches. Can we get this in ASAP for 8.5 please?
Fully agree. Users shouldn't have to worry about using {!join} or {!xcjf} based on whether the other collection is co-located on the same nodes or not.
bq. There is already a long standing precedent for having different query parsers for join operations that are significantly different. child, parent, and join. Why would the xcjf parser be treated by a different set of standards?
All of these are conceptually different and reasonably clear in terms of nomenclature ({!child} and {!parent} for nested/block join, {!join} for generic joins).
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053903#comment-17053903 ] Ishan Chattopadhyaya commented on SOLR-13749:
-
+1 to consolidating within the existing JoinQParserPlugin. It already supports cross-core join, which shouldn't be conceptually different from cross-collection join (from the users' point of view). We should minimize "query parser sprawl". I've not looked at the implementation here, but I really hope this new functionality can play well with SOLR-13350. For context, I had to do some surgery on the JoinQParserPlugin in order to make it play nicely.
[jira] [Commented] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053895#comment-17053895 ] Munendra S N commented on SOLR-13893:
-
[^SOLR-13893.patch] Patch against master.

> BlobRepository looks at the wrong system variable (runtme.lib.size)
> --
>
> Key: SOLR-13893
> URL: https://issues.apache.org/jira/browse/SOLR-13893
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Munendra S N
> Priority: Major
> Attachments: SOLR-13893.patch, SOLR-13893.patch, SOLR-13893.patch
>
> Tim Swetland on the user's list pointed out this line in BlobRepository:
> private static final long MAX_JAR_SIZE =
>     Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 * 1024)));
> "runtme" can't be right.
> [~ichattopadhyaya] [~noblepaul] what's your opinion?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
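A minimal sketch of what the corrected lookup might look like. The corrected spelling "runtime.lib.size" and the back-compat fallback to the old misspelled property are assumptions drawn from this discussion, not the contents of the attached patch.

```java
// Sketch of a corrected lookup for the max jar size. The intended property name
// ("runtime.lib.size") and the back-compat fallback to the old misspelled
// "runtme.lib.size" are assumptions based on this discussion, not the actual patch.
public class MaxJarSizeSketch {
    private static final long DEFAULT_MAX_JAR_SIZE = 5L * 1024 * 1024; // 5 MB

    static long maxJarSize() {
        // Fall back to the old misspelled property, then to the 5 MB default.
        String old = System.getProperty("runtme.lib.size", String.valueOf(DEFAULT_MAX_JAR_SIZE));
        return Long.parseLong(System.getProperty("runtime.lib.size", old));
    }
}
```

With neither property set, this yields the same 5 MB default as the original line.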
[jira] [Updated] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13893:
-
Attachment: SOLR-13893.patch
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053894#comment-17053894 ] Munendra S N commented on SOLR-13944:
-
[^SOLR-13944.patch] Thanks [~tflobbe] for the review. I have addressed the comments.

> CollapsingQParserPlugin throws NPE instead of bad request
> --
>
> Key: SOLR-13944
> URL: https://issues.apache.org/jira/browse/SOLR-13944
> Project: Solr
> Issue Type: Bug
> Affects Versions: 7.3.1
> Reporter: Stefan
> Assignee: Munendra S N
> Priority: Minor
> Attachments: SOLR-13944.patch, SOLR-13944.patch
>
> I noticed the following NPE:
> {code:java}
> java.lang.NullPointerException at
> org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) at
> org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) at
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584)
> {code}
> If I am correct, the problem was already addressed in SOLR-8807. The fix was not working in this case, though, because of a syntax error in the query (I used the local-params syntax twice instead of combining it). The relevant part of the query is:
> {code:java}
> ={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}
> {code}
> After discussing that on the mailing list, I was asked to open a ticket, because this situation should result in a bad request instead of a NullPointerException (see
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E])
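The syntax error above comes from stacking two local-params blocks back to back. Since `tag` is a generic local param, folding it into a single `{!collapse ...}` block is the assumed intended form; this sketch just contrasts the two strings, using the field and sort values from the report.

```java
// Contrast of the malformed query from the report with a combined local-params
// form. Since "tag" is a generic local param, folding it into the single
// {!collapse} block is the assumed intended usage.
public class CollapseParamsExample {
    // Malformed: two adjacent local-params blocks, which the parser cannot combine.
    static final String MALFORMED =
        "{!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}";

    // Valid: one local-params block carrying both the tag and the collapse params.
    static final String COMBINED =
        "{!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}";
}
```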
[jira] [Updated] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13944:
-
Attachment: SOLR-13944.patch
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053888#comment-17053888 ] Kevin Watters commented on SOLR-13749:
-
dismax, edismax, payload_check, payload_score... there's already a lot of query parser sprawl. This query parser doesn't support any of the scoring of the existing join query parser, so there would be a lot of details to address before the functionality could be merged into the standard join query parser; otherwise we would have to answer questions like "why doesn't score=max work on the join query?". Sure, we could add that at some point, but the xcjf query is a filtering-only operation. It does not do any scoring, and for me that is pretty fundamentally different from the current join query parser.
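The scoring distinction drawn in this thread can be illustrated with two filter strings. This is a sketch: the field and collection names are hypothetical, score=max is the existing {!join} parameter referenced in the comment, and the xcjf string simply has no scoring parameter at all.

```java
// Illustration of the distinction drawn above: {!join} can carry a score mode,
// while xcjf is filter-only. Field and collection names here are hypothetical.
public class JoinVsXcjf {
    // Existing join parser with scoring (score=max is one of its modes).
    static final String SCORING_JOIN =
        "{!join from=parent_id to=id score=max}price:[100 TO *]";

    // XCJF: filtering only; there is no score parameter in its vocabulary.
    static final String XCJF_FILTER =
        "{!xcjf collection=remoteParts from=parent_id to=id v='price:[100 TO *]'}";
}
```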
[jira] [Comment Edited] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053883#comment-17053883 ] David Smiley edited comment on SOLR-13749 at 3/7/20, 4:43 AM:
-
The {!join} QParser has the params fromIndex, from, and to that align with XCJF's similar parameters (to & from are the same; fromIndex is "collection"). This is not true of {!parent} and {!child}. Yes, XCJF has _additional_ parameters and a cache etc., but they don't change the fundamental semantics (meaning). For years, users have been able to use {!join} to match a foreign index to the target index of the request, and the foreign index has been able to be a collection name. It has limitations (same node). What's awesome on this issue is that we're lifting that same-machine restriction. I appreciate that the functionality to do that requires fundamentally different code (which users don't care about) and that there are tuning knobs. This has been the story for {!join} for a long time as it gained the ability to do scoring, which required different code. [~mkhl] you may have an opinion here as someone who has put effort into {!join} over some years. (BTW, boy is it hard to type query parser syntax in JIRA with its escaping :-)
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053883#comment-17053883 ] David Smiley commented on SOLR-13749: - The {{!join}} QParser has the params {{fromIndex}}, {{from}}, and {{to}} that align with XCJF functionality of similar parameters ({{to}} & {{from}} are the same; {{fromIndex}} is "collection"). This is not true of {{!parent}} and {{!child}}. Yes, XCJF has _additional_ parameters and a cache etc., but they don't change the fundamental semantics (meaning). For years, users have been able to use {{!join}} to match a foreign index to the target index of the request, and the foreign index has been able to be a collection name. It has limitations (same node). What's awesome on this issue is that we're lifting that same-machine restriction. I appreciate that the functionality to do that requires fundamentally different code (which users don't care about) and there are tuning knobs. This has been the story for {{!join}} for a long time as it gained the ability to do scoring, which required different code. [~mkhl] you may have an opinion here as someone who has put effort into {{!join}} over some years.
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053862#comment-17053862 ] Kevin Watters commented on SOLR-13749: -- I'm a bit surprised by this. There is already a long-standing precedent for having different query parsers for join operations that are significantly different: child, parent, and join. Why would the xcjf parser be held to a different set of standards? Are there other query parsers that have been introduced in the past that were forced to be consolidated into an existing parser, or is this a new precedent?
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053754#comment-17053754 ] David Smiley commented on SOLR-14040: - I overlooked that it was documented; I don't know why I didn't notice this when I looked for it before (at least I thought I did). [~noble.paul] I propose I comment out the documentation of it so as to reduce exposure to the problem, maybe only on the 8x & 8.5 release branches. I'll spend time working on the resolution in SOLR-14232 _(a linked issue, and really where this whole conversation should be happening, not here)_.
> solr.xml shareSchema does not work in SolrCloud
> ---
>
> Key: SOLR-14040
> URL: https://issues.apache.org/jira/browse/SOLR-14040
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Blocker
> Fix For: 8.5
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> solr.xml has a shareSchema boolean option that can be toggled from the default of false to true in order to share IndexSchema objects within the Solr node. This is silently ignored in SolrCloud mode. The pertinent code is {{org.apache.solr.core.ConfigSetService#createConfigSetService}}, which creates a CloudConfigSetService that is not related to the SchemaCaching class. This may not be a big deal in SolrCloud, which tends not to deal well with many cores per node, but I'm working on changing that.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053789#comment-17053789 ] Cassandra Targett commented on SOLR-14173: -- I've worked on this on & off for the past couple of months and have it pretty close to done. From my list of Known Issues before:
* -The fancy tab thing for multiple code examples in one section isn't styled right when you click other tabs- Fixed now.
* -The top nav won't be responsive in smaller screens- Finally figured out how to do this responsively.
* -Behavior of sidebar on smaller screens could be improved- This is better now.
* -Still many overlapping CSS rules for elements and many unused CSS rules to be cleaned up- Did a lot of cleanup here.
* Sidebar requires too much scrolling - Phase 2 will trim this down.
* -Now unused CSS/JS files haven't been deleted yet-
* -Search box shows results in the sidebar nav - I wasn't able to see this until yesterday and not sure how I feel about it. At any rate, I haven't worked with it much yet and it needs more work- I worked on this extensively and think it's OK now.
* -Home page (index.html) needs some additional love- I don't remember what I meant by this, but I worked on that too and it's fine now.

Besides fixing these things, I also changed the left nav a little bit in terms of colors of child pages, etc. I don't want to spend too much time on that, since Phase 2 will be a re-org that will require some additional work in this area. I've updated the demo site at https://people.apache.org/~ctargett/RefGuideRedesign/ as I've gone along; it's up to date with my latest changes today. I have not pushed the branch in a while; I think I will actually make a new branch, since this one is really far behind and trying to update it gives me weird merge conflicts I really don't want to deal with. Comments? Feedback?
> Ref Guide Redesign > -- > > Key: SOLR-14173 > URL: https://issues.apache.org/jira/browse/SOLR-14173 > Project: Solr > Issue Type: Improvement > Components: documentation >Reporter: Cassandra Targett >Assignee: Cassandra Targett >Priority: Major > > The current design of the Ref Guide was essentially copied from a > Jekyll-based documentation theme > (https://idratherbewriting.com/documentation-theme-jekyll/), which had a > couple important benefits for that time: > * It was well-documented and since I had little experience with Jekyll and > its Liquid templates and since I was the one doing it, I wanted to make it as > easy on myself as possible > * It was designed for documentation specifically so took care of all the > things like inter-page navigation, etc. > * It helped us get from Confluence to our current system quickly > It had some drawbacks, though: > * It wasted a lot of space on the page > * The theme was built for Markdown files, so did not take advantage of the > features of the {{jekyll-asciidoc}} plugin we use (the in-page TOC being one > big example - the plugin could create it at build time, but the theme > included JS to do it as the page loads, so we use the JS) > * It had a lot of JS and overlapping CSS files. While it used Bootstrap it > used a customized CSS on top of it for theming that made modifications > complex (it was hard to figure out how exactly a change would behave) > * With all the stuff I'd changed in my bumbling way just to get things to > work back then, I broke a lot of the stuff Bootstrap is supposed to give us > in terms of responsiveness and making the Guide usable even on smaller screen > sizes. > After upgrading the Asciidoctor components in SOLR-12786 and stopping the PDF > (SOLR-13782), I wanted to try to set us up for a more flexible system. 
> We need it for things like Joel's work on the visual guide for streaming expressions (SOLR-13105), and in order to implement other ideas we might have on how to present information in the future.
> I view this issue as a phase 1 of an overall redesign that I've already started in a local branch. I'll explain in a comment the changes I've already made, and will use this issue to create and push a branch where we can discuss in more detail.
> Phase 1 here will be under-the-hood CSS/JS changes + overall page layout changes.
> Phase 2 (issue TBD) will be a wholesale re-organization of all the pages of the Guide.
> Phase 3 (issue TBD) will explore moving us from Jekyll to another static site generator that is better suited for our content format, file types, and build conventions.
[jira] [Commented] (SOLR-11359) An autoscaling/suggestions endpoint to recommend operations
[ https://issues.apache.org/jira/browse/SOLR-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053828#comment-17053828 ] Noble Paul commented on SOLR-11359: --- There is no specific URL; you can hit any node in the cluster with that URI & it should work just fine. No, we don't want this to be run automatically. Users should make a conscious decision to run a command that's given by the API. It's safer that way.
> An autoscaling/suggestions endpoint to recommend operations
> ---
>
> Key: SOLR-11359
> URL: https://issues.apache.org/jira/browse/SOLR-11359
> Project: Solr
> Issue Type: New Feature
> Components: AutoScaling
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Attachments: SOLR-11359.patch
>
> Autoscaling can make suggestions to users on what operations they can perform to improve the health of the cluster. The suggestions will have the following information:
> * http end point
> * http method (POST, DELETE)
> * command payload
[jira] [Commented] (LUCENE-9170) wagon-ssh Maven HTTPS issue
[ https://issues.apache.org/jira/browse/LUCENE-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053748#comment-17053748 ] Ishan Chattopadhyaya commented on LUCENE-9170: -- [~romseygeek], I wasn't able to build the Solr package from the 8.4 release branch, so I opened this issue. If you aren't able to build during your release process, we can revisit this. In light of that, do you think this is a blocker? I leave the judgement to you. Also, to my extremely basic understanding of our build system, the patch I submitted is the best solution; it worked for me. I don't know if this is the best solution, so I'm not in a position to resolve this issue myself without help from experts at the build system.
> wagon-ssh Maven HTTPS issue
> ---
>
> Key: LUCENE-9170
> URL: https://issues.apache.org/jira/browse/LUCENE-9170
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.5
> Attachments: LUCENE-9170.patch, LUCENE-9170.patch
>
> When I do, from lucene/ in branch_8_4:
> ant -Dversion=8.4.2 generate-maven-artifacts
> I see that wagon-ssh is being resolved from http://repo1.maven.org/maven2 instead of the https equivalent. This is surprising to me, since I can't find the http URL anywhere.
> Here's my log: https://paste.centos.org/view/be2d3f3f
> This is a critical issue since releases won't work without this.
[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-13749: Priority: Blocker (was: Major) I understand your points, but they don't sway my opinion: params can differ based on the method, and the whitelist thing is optional (it only applies to multi-cluster). With 8.5 out soon, and if nobody has time to develop this further at the moment, I think we have to _do something_ here to prevent a back-compat concern:
* Option A: document in an obvious way (i.e. some call-out box) that the name & parameters will likely change without back-compat. In the project we sometimes throw out the word "experimental" a lot, but here I'm just claiming the syntax/way it's invoked will change; I'm making no quality judgement on what's underneath.
* Option B: comment it out, making it invisible.
* Option C: remove from 8x/8.5; leave in master.
Please pick the one that suits you, Gus. They are all fine with me. BTW, that whitelist thing reminds me heavily of the _existing_ "shardsWhitelist" feature (see distributed-requests.adoc). It's not clear to me if we need a new mechanism here.
[jira] [Reopened] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reopened SOLR-13749: -
> Example solrconfig.xml changes:
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="routerField">vin</str>
> </queryParser>
>
> <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin"/>
> {code}
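The hash range query underpinning the {{routed}} optimization keeps only documents whose join-key hash falls in a given shard's slice of the hash space. A toy illustration of the idea follows; it uses {{String.hashCode()}} purely for illustration (Solr's compositeId router actually uses a MurmurHash variant), and none of the names below are Solr APIs:

```java
public class HashRangeSketch {
    // Toy stand-in for a hash function over join-key values. This is NOT
    // Solr's actual compositeId hashing; String.hashCode() is illustrative only.
    static int hash(String joinKey) {
        return joinKey.hashCode();
    }

    // A shard "owns" a contiguous slice of the hash space; a hash-range query
    // keeps only documents whose join-key hash lands inside that slice.
    static boolean inRange(String joinKey, int min, int max) {
        int h = hash(joinKey);
        return h >= min && h <= max;
    }

    public static void main(String[] args) {
        // With two shards splitting the int hash space in half, each shard only
        // fetches the join keys that can possibly match its own documents.
        System.out.println(inRange("vin-12345", Integer.MIN_VALUE, -1));
        System.out.println(inRange("vin-12345", 0, Integer.MAX_VALUE));
    }
}
```

Each shard applying such a filter to its remote-collection query is what lets the routed join retrieve only the join keys that could match locally.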
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053847#comment-17053847 ] Noble Paul commented on SOLR-14040: --- I've created SOLR-14311 to address the shortcomings.
[jira] [Created] (SOLR-14311) Shared schema should not have access to core level classes
Noble Paul created SOLR-14311:
-
Summary: Shared schema should not have access to core level classes
Key: SOLR-14311
URL: https://issues.apache.org/jira/browse/SOLR-14311
Project: Solr
Issue Type: New Feature
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul
Assignee: David Smiley

When a schema is shared, it should not have access to the classes of a specific core. The core may come and go, but the shared schema may continue to live. So how do we implement that? If a schema is shared, create a new {{SolrResourceLoader}} specifically for that schema object. The classpath should be the same as the classpath for the SRL in {{CoreContainer}}. As and when we implement loading schema plugins from packages, they too should be accessible. The SRL created for this schema should be able to load resources from ZK in the path {{/configs/}}. This SRL should be discarded when the schema is destroyed.
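The lifecycle described above, with one schema instance shared across cores and a resource loader scoped to the schema's configset rather than to any core, can be sketched abstractly. All class and method names below are simplified stand-ins, not Solr's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedSchemaCache {
    // Toy stand-ins for Solr's IndexSchema and SolrResourceLoader.
    static class ResourceLoader {
        final String configSetPath;
        ResourceLoader(String configSetPath) { this.configSetPath = configSetPath; }
    }
    static class Schema {
        final ResourceLoader loader;   // schema-scoped, not core-scoped
        Schema(ResourceLoader loader) { this.loader = loader; }
    }

    private final Map<String, Schema> byConfigSet = new ConcurrentHashMap<>();

    // Cores sharing a configset get the same Schema instance; its loader reads
    // from the configset path (e.g. under /configs/ in ZK), never from a core's
    // own directories or classloader.
    Schema schemaFor(String configSet) {
        return byConfigSet.computeIfAbsent(configSet,
                cs -> new Schema(new ResourceLoader("/configs/" + cs)));
    }

    // When the schema is dropped, its loader goes with it.
    void discard(String configSet) { byConfigSet.remove(configSet); }

    public static void main(String[] args) {
        SharedSchemaCache cache = new SharedSchemaCache();
        System.out.println(cache.schemaFor("products").loader.configSetPath);
    }
}
```

The point of the sketch is the ownership relation: the loader's lifetime follows the shared schema, so no core's comings and goings can invalidate it.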
[jira] [Resolved] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-14040.
---
Resolution: Resolved
[jira] [Commented] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053820#comment-17053820 ] Lucene/Solr QA commented on SOLR-13893: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 43s{color} | {color:green} core in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13893 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995879/SOLR-13893.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / c73d2c1 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/701/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/701/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch, SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
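Because the property key in the quoted line is misspelled ("runtme.lib.size"), a user setting the intended system property is silently ignored and the hardcoded 5 MB default always wins. A minimal sketch of the corrected lookup follows; the property name "runtime.lib.size" is an assumption here, since the name actually chosen in the SOLR-13893 patch is not shown in this thread.

```java
// Sketch of the corrected lookup (property name "runtime.lib.size" is assumed,
// not confirmed by the patch text quoted above).
public class MaxJarSize {
    static final long DEFAULT = 5L * 1024 * 1024; // 5 MB default

    static long maxJarSize() {
        // The bug: with the key spelled "runtme.lib.size", any -D override of
        // the correctly spelled property was never read.
        return Long.parseLong(
                System.getProperty("runtime.lib.size", String.valueOf(DEFAULT)));
    }

    public static void main(String[] args) {
        // With no property set, the default applies.
        System.out.println(maxJarSize());
    }
}
```

A unit test that sets the property and asserts the parsed value would have caught the typo immediately.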
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053745#comment-17053745 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-13944: -- I don't know much about this part of the code, so I'm just replying based on the patch. As an optimization, can we keep the {{if (!fq.startsWith("{!collapse")) {}} and *only* do the full parse in case of {{true}}? {code:java} +} catch (SyntaxError e) { + // shouldn't happen as filters are already validated } {code} Maybe throw an AssertionError() here, wdyt? > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it).
The > relevant part of the query is: > {code:java} > ={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullPointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E])
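The review comment above suggests that the "shouldn't happen" catch block should fail loudly rather than swallow the exception. A hedged sketch of that pattern is below; the class and method names are illustrative stand-ins, not the actual Solr types.

```java
// Illustration of the reviewer's suggestion: a catch branch that is believed
// unreachable (filters were validated upstream) should surface a programming
// error via AssertionError instead of silently ignoring it.
// All names here are hypothetical, not the real CollapsingQParserPlugin code.
public class UnreachableCatch {
    static class SyntaxError extends Exception {
        SyntaxError(String msg) { super(msg); }
    }

    static boolean parse(String fq) throws SyntaxError {
        if (!fq.startsWith("{!")) throw new SyntaxError("not a local-params query: " + fq);
        return fq.startsWith("{!collapse");
    }

    static boolean isCollapse(String fq) {
        try {
            return parse(fq);
        } catch (SyntaxError e) {
            // Filters were already validated, so re-parsing should never fail;
            // rethrowing as AssertionError makes a latent bug visible.
            throw new AssertionError("validated filter failed to re-parse: " + fq, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isCollapse("{!collapse field=productId}")); // true
    }
}
```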
[jira] [Updated] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-14040: -- Priority: Major (was: Blocker) > solr.xml shareSchema does not work in SolrCloud
[jira] [Updated] (SOLR-14311) Shared schema should not have access to core level classes
[ https://issues.apache.org/jira/browse/SOLR-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-14311: -- Fix Version/s: 8.6 > Shared schema should not have access to core level classes > -- > > Key: SOLR-14311 > URL: https://issues.apache.org/jira/browse/SOLR-14311 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: David Smiley >Priority: Major > Fix For: 8.6 > > > When a schema is shared, it should not have access to the classes of a > specific core. The core may come and go but the shared schema may continue > to live. So how do we implement that? > If a schema is shared, create a new {{SolrResourceLoader}} specifically for > that schema object. The classpath should be the same as the classpath for the > SRL in {{CoreContainer}}. As and when we implement loading schema plugins > from packages, they too should be accessible. The SRL created for this schema > should be able to load resources from ZK in the path > {{/configs/}}. This SRL should be discarded when the schema > is destroyed
[jira] [Commented] (SOLR-11725) json.facet's stddev() function should be changed to use the "Corrected sample stddev" formula
[ https://issues.apache.org/jira/browse/SOLR-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053842#comment-17053842 ] Yonik Seeley commented on SOLR-11725: - +1 to commit > json.facet's stddev() function should be changed to use the "Corrected sample > stddev" formula > - > > Key: SOLR-11725 > URL: https://issues.apache.org/jira/browse/SOLR-11725 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Chris M. Hostetter >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-11725.patch, SOLR-11725.patch, SOLR-11725.patch > > > While working on some equivalence tests/demonstrations for > {{facet.pivot+stats.field}} vs {{json.facet}} I noticed that the {{stddev}} > calculations done between the two code paths can be measurably different, and > realized this is due to them using very different code... > * {{json.facet=foo:stddev(foo)}} > ** {{StddevAgg.java}} > ** {{Math.sqrt((sumSq/count)-Math.pow(sum/count, 2))}} > * {{stats.field=\{!stddev=true\}foo}} > ** {{StatsValuesFactory.java}} > ** {{Math.sqrt(((count * sumOfSquares) - (sum * sum)) / (count * (count - > 1.0D)))}} > Since I'm not really a math guy, I consulted with a bunch of smart math/stat > nerds I know online to help me sanity check whether these equations (somehow) > reduced to each other (in which case the discrepancies I was seeing in my > results might have just been due to the order of intermediate operation > execution & floating point rounding differences). > They confirmed that the two bits of code are _not_ equivalent to each other, > and explained that the code JSON Faceting is using is equivalent to the > "Uncorrected sample stddev" formula, while StatsComponent's code is > equivalent to the "Corrected sample stddev" formula...
> https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation > When I told them that stuff like this is why no one likes mathematicians and > pressed them to explain which one was the "most canonical" (or "most > generally applicable" or "best") definition of stddev, I was told that: > # This is something statisticians frequently disagree on > # Practically speaking the diff between the calculations doesn't tend to > differ significantly when count is "very large" > # _"Corrected sample stddev" is more appropriate when comparing two > distributions_ > Given that: > * the primary usage of computing the stddev of a field/function against a > Solr result set (or against a sub-set of results defined by a facet > constraint) is probably to compare that distribution to a different Solr > result set (or to compare N sub-sets of results defined by N facet > constraints) > * the size of the sets of documents (values) can be relatively small when > computing stats over facet constraint sub-sets > ...it seems like {{StddevAgg.java}} should be updated to use the "Corrected > sample stddev" equation.
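The two formulas quoted in the issue can be put side by side to see the discrepancy directly; for small counts (exactly the facet sub-set case discussed above) the uncorrected estimate is systematically lower than the corrected one. This is a self-contained sketch of the two equations, not the actual StddevAgg/StatsValuesFactory code.

```java
// The "uncorrected" (json.facet StddevAgg) and "corrected" (StatsComponent)
// sample stddev formulas from the issue, computed over the same accumulators.
public class Stddev {
    // json.facet: sqrt(sumSq/count - (sum/count)^2)
    static double uncorrected(double sum, double sumSq, long count) {
        return Math.sqrt((sumSq / count) - Math.pow(sum / count, 2));
    }

    // stats.field: sqrt((count*sumSq - sum^2) / (count*(count-1)))
    static double corrected(double sum, double sumSq, long count) {
        return Math.sqrt(((count * sumSq) - (sum * sum)) / (count * (count - 1.0D)));
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        // For n=4: uncorrected = sqrt(1.25) ~ 1.118, corrected = sqrt(5/3) ~ 1.291,
        // a ~15% relative difference that shrinks as count grows.
        System.out.println(uncorrected(sum, sumSq, xs.length));
        System.out.println(corrected(sum, sumSq, xs.length));
    }
}
```

The gap between the two estimates vanishes as count grows, matching point 2 of the statisticians' answer above.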
[GitHub] [lucene-solr] beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r389208349 ## File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java ## @@ -1684,58 +1685,39 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) { } private void waitForCoreNodeName(CoreDescriptor descriptor) { -int retryCount = 320; -log.debug("look for our core node name"); -while (retryCount-- > 0) { - final DocCollection docCollection = zkStateReader.getClusterState() - .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName()); - if (docCollection != null && docCollection.getSlicesMap() != null) { -final Map slicesMap = docCollection.getSlicesMap(); -for (Slice slice : slicesMap.values()) { - for (Replica replica : slice.getReplicas()) { -// TODO: for really large clusters, we could 'index' on this - -String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP); -String core = replica.getStr(ZkStateReader.CORE_NAME_PROP); - -String msgNodeName = getNodeName(); -String msgCore = descriptor.getName(); - -if (msgNodeName.equals(nodeName) && core.equals(msgCore)) { - descriptor.getCloudDescriptor() - .setCoreNodeName(replica.getName()); - getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor); - return; -} - } +log.debug("waitForCoreNodeName >>> look for our core node name"); +try { + zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> { Review comment: Agreed bout having too many settings, we're already drowning in them. Looking back looks like the number was added as part of SOLR-9140 and there's no comment of where the "320" came from. As well, there's another retry number [here](https://github.com/apache/lucene-solr/pull/1297/files#diff-d5e1be02f6f0c397e18380598aa62b3dR476) of "30" but no idea why. So we already have 2 different numbers of retries. 
If the numbers come from empirical experiments then I agree with them being constants, but because they seem arbitrary they look like good candidates for per-application tuning. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r389193738 ## File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java Review comment: In general, I think it's good to have knobs, but there's definitely the possibility of having too many things available to configure and overwhelming operators. Can you describe what conditions would lead to wanting to tweak this?
[GitHub] [lucene-solr] beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r389132693 ## File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java Review comment: If this change is being made, should the number of retries be configurable? This hardcoded value seems to be used a lot in the code.
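The refactor debated in the review thread above replaces hand-rolled "retry N times with a sleep" loops with a bounded wait on a state predicate. A generic, self-contained sketch of that pattern follows; it is a simplified stand-in for ZkStateReader.waitForState (which blocks on ZooKeeper watcher events rather than polling), not the real API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic illustration of replacing a fixed retry-count loop with a
// predicate-plus-timeout wait. Real Solr code waits on ZK watcher
// notifications; polling here just keeps the sketch self-contained.
public class StateWait {
    static boolean waitFor(Supplier<Boolean> predicate, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() < deadline) {
            if (predicate.get()) return true; // condition met, return immediately
            Thread.sleep(50);                 // watcher-driven code would block instead
        }
        return predicate.get(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 100,
                             5, TimeUnit.SECONDS);
        System.out.println(ok); // the condition becomes true well before the timeout
    }
}
```

Expressed this way, the magic "320" in the original loop becomes a single timeout value, which is easier to reason about (and, if needed, to make configurable) than a retry count whose effective duration depends on the sleep interval.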
[GitHub] [lucene-solr] dnhatn commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed
dnhatn commented on issue #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed URL: https://github.com/apache/lucene-solr/pull/1215#issuecomment-595913520 Closes in favor of https://github.com/apache/lucene-solr/pull/1319.
[GitHub] [lucene-solr] dnhatn closed pull request #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed
dnhatn closed pull request #1215: LUCENE-9164: Ignore ACE on tragic event if IW is closed URL: https://github.com/apache/lucene-solr/pull/1215
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053695#comment-17053695 ] Cassandra Targett commented on SOLR-14040: -- bq. As I understand shared-schema (SolrCloud or not) is still in development and not documented The page {{format-of-solr-xml.adoc}} does show a {{shareSchema}} parameter, which I think is what's being talked about here, so it is actually already documented. Since it is documented, whenever a feature doesn't work in a pretty common use case (like running Solr in SolrCloud mode), we should document the limitation when it exists in a release that a user would actually use. bq. If we document the limitation, shouldn't we also document the feature itself, but actually it is not ready... difficult. Is there a section specific to "coming" features? You're right, that situation would be awkward and we wouldn't do it - if it's not documented it's like it otherwise doesn't exist so mentioning a limitation in something that doesn't exist would be misleading, really. We don't put info about future features in the Ref Guide as a general rule (unless it's to say something like some changes may be coming in the future, but even then, it's not usually that relevant - it could be years before it arrives). If something is implemented but we consider it experimental, that's fine to document as long as we're clear about its status as Experimental in that documentation. Such docs should also include known limitations (which may or may not be resolved in the future) whenever possible. 
> solr.xml shareSchema does not work in SolrCloud
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053682#comment-17053682 ] Mike Drob commented on LUCENE-9266: --- As I'm working on this I'm discovering other issues present as well, will fix them all in a single patch if they remain small enough. > ant nightly-smoke fails due to presence of build.gradle > --- > > Key: LUCENE-9266 > URL: https://issues.apache.org/jira/browse/LUCENE-9266 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Priority: Major > > Seen on Jenkins - > [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console] > > Reproduced locally.
[jira] [Updated] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated LUCENE-9266: -- Parent: LUCENE-9077 Issue Type: Sub-task (was: Task) > ant nightly-smoke fails due to presence of build.gradle
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053679#comment-17053679 ] Mike Drob commented on LUCENE-9266: --- Porting the nightly smoke to gradle is outside of the scope of what I want to do here. > ant nightly-smoke fails due to presence of build.gradle
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053668#comment-17053668 ] Xin-Chun Zhang commented on LUCENE-9136: 1. My personal git branch: [https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat]. 2. The vector format is as follows, !image-2020-03-07-01-25-58-047.png|width=535,height=297! Structure of the IVF index meta is as follows, !image-2020-03-07-01-27-12-859.png|width=606,height=276! Structure of the IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309! 3. The ann-benchmarks tool can be found at: [https://github.com/irvingzhang/ann-benchmarks]. Benchmark results (single thread, 2.5GHz * 2 CPU, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N is the size of the dataset): 1) Glove-1.2M-25D-Angular: index build + training cost 706s, qps: 18.8~49.6, recall: 76.8%~99.7% !glove-25-angular.png|width=653,height=450! 2) Glove-1.2M-100D-Angular: index build + training cost 2487s, qps: 12.2~38.3, recall 65.8%~96.3% !glove-100-angular.png|width=671,height=462! 3) Sift-1M-128D-Euclidean: index build + training cost 2397s, qps 14.8~38.2, recall 71.1%~99.2% !sift-128-euclidean.png|width=684,height=471! > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high-dimensional vector, the vector retrieval (VR) method is then > applied to search for the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry, from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, with no plan for supporting a Java interface, making them hard > to integrate into Java projects and inaccessible to those who are not familiar with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories: > # Tree-based algorithms, such as KD-tree; > # Hashing methods, such as LSH (Locality Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-based algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat can be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene has made great progress. The issue draws the attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has a > smaller index size but requires k-means clustering, while HNSW is faster at > query time (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. > Another advantage is that IVFFlat can be faster and more accurate when > GPU parallel computing is enabled (currently not supported in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want more choices. > The latest branch is >
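The two-phase IVFFlat search described above (coarse centroids at index time, nprobe bucket scans at query time) can be sketched as a toy. This is purely illustrative and not the patch's implementation: centroids are picked naively instead of trained with k-means, and distances are brute-force.

```java
import java.util.*;

// Toy sketch of the IVFFlat idea: vectors are bucketed under the nearest of k
// coarse centroids at index time; at query time only the nprobe closest
// buckets are scanned exhaustively. Raising nprobe trades speed for recall,
// as described in the issue. Centroid "training" here is a naive stand-in.
public class ToyIvfFlat {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s; // squared Euclidean distance suffices for ranking
    }

    final double[][] centroids;
    final List<List<double[]>> buckets = new ArrayList<>();

    ToyIvfFlat(double[][] data, int k) {
        centroids = Arrays.copyOf(data, k); // stand-in for k-means training
        for (int i = 0; i < k; i++) buckets.add(new ArrayList<>());
        for (double[] v : data) buckets.get(nearestCentroid(v)).add(v);
    }

    int nearestCentroid(double[] v) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++)
            if (dist(v, centroids[c]) < dist(v, centroids[best])) best = c;
        return best;
    }

    double[] search(double[] q, int nprobe) {
        // Rank centroids by distance to the query, then scan only nprobe buckets.
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(c -> dist(q, centroids[c])));
        double[] best = null;
        for (int p = 0; p < Math.min(nprobe, order.length); p++)
            for (double[] v : buckets.get(order[p]))
                if (best == null || dist(q, v) < dist(q, best)) best = v;
        return best;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {10, 10}, {0, 1}, {10, 11}, {5, 5}};
        ToyIvfFlat idx = new ToyIvfFlat(data, 2);
        System.out.println(Arrays.toString(idx.search(new double[]{0.2, 0.9}, 1)));
    }
}
```

With nprobe equal to the number of centroids, the search degenerates to exact brute force, which is why IVFFlat can in principle reach 100% recall.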
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Comment: was deleted (was: 1. My personal git branch: [https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat]. 2. The vector format is as follows, !image-2020-03-07-01-25-58-047.png|width=535,height=297! Structure of IVF index meta is as follows, !image-2020-03-07-01-27-12-859.png|width=606,height=276! Structure of IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309! 3. Ann-benchmark tool could be found in: [https://github.com/irvingzhang/ann-benchmarks]. Benchmark results (Single Thread, 2.5GHz * 2CPU, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N the size of dataset): 1) Glove-1.2M-25D-Angular: index build + training cost 706s, qps: 18.8~49.6, recall: 76.8%~99.7% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583504416262-89784074-c9dc-4489-99a1-5e4b3c76e5fc.png|width=624,height=430! 2) Glove-1.2M-100D-Angular: index build + training cost 2487s, qps: 12.2~38.3, recall 65.8%~96.3% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583510066130-b4fbcb29-8ad7-4ff2-99ce-c52f7c27826e.png|width=679,height=468! 3) Sift-1M-128D-Euclidean: index build + training cost 2397s, qps 14.8~38.2, recall 71.1%~99.2% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583515010497-20b74f41-72c3-48ce-a929-1cbfbd6a6423.png|width=691,height=476! 
) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. 
> IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat can be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene has made great progress. The issue has drawn the attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > a smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. > Another advantage is that IVFFlat can be faster and more accurate when > GPU parallel computing is enabled (currently not supported in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want more choices.
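To make the IVFFlat idea described above concrete, here is a minimal, hypothetical sketch in plain Python (no relation to the actual Lucene patch): vectors are clustered with k-means, each vector is assigned to an inverted list under its nearest centroid, and a query scans only the `nprobe` lists whose centroids are closest, so recall grows with `nprobe` up to exact search.

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.dist(a, b)

def nearest(centroids, v):
    # Index of the centroid closest to v.
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], v))

def kmeans(vectors, k, iters=10):
    # Plain Lloyd's iterations, seeded from a random sample of the data.
    centroids = random.sample(vectors, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for i, b in enumerate(buckets):
            if b:  # keep the old centroid if its bucket is empty
                centroids[i] = [sum(xs) / len(b) for xs in zip(*b)]
    return centroids

class IVFFlat:
    def __init__(self, vectors, k):
        self.centroids = kmeans(vectors, k)
        # One inverted list ("posting list" of raw vectors) per centroid.
        self.lists = [[] for _ in range(k)]
        for v in vectors:
            self.lists[nearest(self.centroids, v)].append(v)

    def search(self, q, topk, nprobe):
        # Visit only the nprobe inverted lists nearest to q; raising nprobe
        # trades speed for recall, reaching exact search at nprobe == k.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dist(self.centroids[i], q))
        cands = [v for i in order[:nprobe] for v in self.lists[i]]
        return sorted(cands, key=lambda v: dist(v, q))[:topk]
```

With `nprobe` equal to the number of centroids, every list is scanned and the result matches brute-force search, which is why the description says IVFFlat can in theory reach 100% recall.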
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: sift-128-euclidean.png
> The latest branch is > [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: glove-25-angular.png
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: glove-100-angular.png
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053653#comment-17053653 ] Lucene/Solr QA commented on SOLR-13944: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 57s{color} | {color:green} core in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 82m 23s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13944 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995852/SOLR-13944.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / c73d2c1 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/699/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/699/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. 
> CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > ={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullPointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E])
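For illustration, the syntax error described in the report stacks two adjacent local-params blocks instead of combining them. The comparison below is a hypothetical reconstruction (the parameter name `fq` and the combined form are assumptions about intent; `tag` is a generic local param accepted by any query parser):

```text
# What was sent: two adjacent local-params blocks (only the first is parsed as local params)
fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}

# Presumably intended: a single combined local-params block
fq={!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}
```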
[jira] [Updated] (SOLR-14073) Fix segment look ahead NPE in CollapsingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-14073: -- Attachment: SOLR-14073.patch > Fix segment look ahead NPE in CollapsingQParserPlugin > - > > Key: SOLR-14073 > URL: https://issues.apache.org/jira/browse/SOLR-14073 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14073.patch, SOLR-14073.patch, SOLR-14073.patch > > > The CollapsingQParserPlugin has a bug: if every segment is not visited > during the collect, it throws an NPE. This causes the CollapsingQParserPlugin > to not work when used with any feature that short-circuits the segments > during the collect. This includes using the CollapsingQParserPlugin twice in > the same query, and the time-limiting collector.
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053640#comment-17053640 ] Michael Froh commented on LUCENE-8962: -- bq. With a slightly refactored IW we can share the merge logic and let the reader re-write itself since we are talking about very small segments the overhead is very small. This would in turn mean that we are doing the work twice ie. the IW would do its normal work and might merge later etc. Just to provide a bit more context, for the case where my team uses this change, we're replicating the index (think Solr master/slave) from "writers" to many "searchers", so we're avoiding doing the work many times. An earlier (less invasive) approach I tried to address the small flushed segments problem was roughly: call commit on writer, hard link the commit files to another filesystem directory to "clone" the index, open an IW on that directory, merge small segments on the clone, let searchers replicate from the clone. That approach does mean that the merging work happens twice (since the "real" index doesn't benefit from the merge on the clone), but it doesn't involve any changes in Lucene. Maybe that less-invasive approach is a better way to address this. It's certainly more consistent with [~simonw]'s suggestion above. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. 
> However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate many small segments during {{refresh}}, and this then > adds search-time cost, as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
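The "clone via hard links" workaround Michael Froh describes can be sketched at the filesystem level. This hypothetical Python helper links a committed index's files into a clone directory; the Lucene-specific steps (opening an IndexWriter on the clone and merging its small segments there) are deliberately omitted:

```python
import os

def clone_index(src_dir, dst_dir):
    """Hard-link every file of a committed index directory into dst_dir.

    Hard links make the clone nearly free: no bytes are copied, and because
    Lucene index files are write-once, the clone cannot diverge from the
    original. A merge performed on the clone writes *new* files into
    dst_dir, leaving the source index untouched.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(dst_dir, name))
```

As the comment notes, this is why the approach costs the merge work twice: the "real" index never sees the merged segments, only replicas of the clone do.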
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053641#comment-17053641 ] Kevin Watters commented on SOLR-13749: -- Having a local param like method=xcjf could trigger the xcjf query parser if we want, but there are some complications. Currently, XCJF benefits greatly from some additional configuration for that query parser to specify the field on which a collection has been routed. The current join query parsers aren't defined by default in solrconfig.xml. By merging together the functionality of these 2 query parsers, we might want to explicitly define the join query parser in the Solr config by default. Additionally, there are many query parsers beyond xcjf that are really join query parsers: "child" and "parent" should also be considered "join" query parsers if we want to fully go to a consolidated join query parser model. We'll try to be responsive to issues on this ticket; however, I'm not sure how much bandwidth we will have for larger refactors related to xcjf. My preference would be that we leave it as is. This is what we were asked to develop and contribute back, so we'd like to keep it as close to the original contribution as possible. If we collectively want to wrangle all of those join parsers into a single consolidated join query parser, perhaps we could track that as a different issue/ticket. > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross-collection join filter" (XCJF) query parser. 
It calls out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser, which lets you specify a field > name and a hash range; only the documents that would have > hashed into that range are returned. > The XCJF query parser performs an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is set up with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The Lucene query that executes a search to get back a set of join > keys from a remote collection.| > |HashRangeQuery|The Lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither zkHost nor solrUrl is specified, the local Zookeeper cluster > will be used. 
| > |solrUrl|Optional|The URL of the external Solr node to be queried| > |from|Required|The join key field name in the external collection| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local
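As a rough illustration of the HashRangeQuery contract described above, the sketch below keeps only documents whose field hash falls in a given range. Note this is purely illustrative: Solr's compositeId router hashes routing keys with MurmurHash3, so the CRC32 used here is only a stand-in, and the function names are hypothetical.

```python
import zlib

def hash_range_filter(docs, field, lo, hi):
    """Keep only docs whose hash of `field` falls in [lo, hi].

    This mimics the hash range query parser's contract: each shard owns a
    hash range, so filtering the remote query by the local shard's range
    returns only join keys that could possibly match locally.
    CRC32 is a stand-in for Solr's actual MurmurHash3 routing hash.
    """
    out = []
    for doc in docs:
        h = zlib.crc32(str(doc[field]).encode())  # unsigned 32-bit value
        if lo <= h <= hi:
            out.append(doc)
    return out
```

Splitting the full 32-bit hash space into disjoint ranges partitions the documents, which is exactly what lets each local shard fetch only its own slice of join keys.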
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028879 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { +private volatile boolean closed = false; +private final Semaphore permits = new Semaphore(Integer.MAX_VALUE); +private final Queue queue = new ConcurrentLinkedQueue<>(); +private final IndexWriter writer; + +CloseableQueue(IndexWriter writer) { + this.writer = writer; +} + +private void tryAcquire() { + if (permits.tryAcquire() == false) { +throw new AlreadyClosedException("queue is closed"); + } + if (closed) { +throw new AlreadyClosedException("queue is closed"); + } +} + +boolean add(Event event) { + tryAcquire(); + try { +return queue.add(event); + } finally { +permits.release(); + } +} + +void processEvents() throws IOException { + tryAcquire(); + try { +processEventsInternal(); + }finally { Review comment: nit: space after `}` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
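The Semaphore trick in the diff above — a queue that hands out one permit per operation from a huge pool, so that close() can drain every permit and thereby wait for all in-flight callers — can be sketched roughly as follows. This is a loose Python translation, not the actual patch; it deviates slightly from the diff by releasing the permit before raising on a closed queue, and it shrinks the permit pool for practicality.

```python
import threading

class CloseableQueue:
    """Queue whose close() waits for in-flight operations, then rejects new ones.

    Each operation briefly holds one permit; close() drains *all* permits,
    so it blocks until every in-flight add/process has finished, and any
    later try-acquire fails fast because the pool is empty.
    """
    MAX_PERMITS = 1 << 10  # stand-in for Integer.MAX_VALUE in the diff

    def __init__(self):
        self._permits = threading.Semaphore(self.MAX_PERMITS)
        self._queue = []
        self._closed = False

    def _try_acquire(self):
        if not self._permits.acquire(blocking=False):
            raise RuntimeError("queue is closed")
        if self._closed:
            self._permits.release()
            raise RuntimeError("queue is closed")

    def add(self, event):
        self._try_acquire()
        try:
            self._queue.append(event)
        finally:
            self._permits.release()

    def process_events(self):
        self._try_acquire()
        try:
            while self._queue:
                self._queue.pop(0)()  # events are callables in this sketch
        finally:
            self._permits.release()

    def close(self):
        self._closed = True
        # Acquire every permit: blocks until all in-flight operations finish,
        # and leaves the pool empty so later operations are rejected.
        for _ in range(self.MAX_PERMITS):
            self._permits.acquire()
```

The design point of the huge permit count is that normal operations never contend: they always find a free permit, and only close() ever waits.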
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028289 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { +private volatile boolean closed = false; +private final Semaphore permits = new Semaphore(Integer.MAX_VALUE); +private final Queue queue = new ConcurrentLinkedQueue<>(); +private final IndexWriter writer; + +CloseableQueue(IndexWriter writer) { + this.writer = writer; +} + +private void tryAcquire() { + if (permits.tryAcquire() == false) { +throw new AlreadyClosedException("queue is closed"); + } + if (closed) { +throw new AlreadyClosedException("queue is closed"); + } +} + +boolean add(Event event) { + tryAcquire(); + try { +return queue.add(event); + } finally { +permits.release(); + } +} + +void processEvents() throws IOException { + tryAcquire(); + try { +processEventsInternal(); + }finally { +permits.release(); + } +} +private void processEventsInternal() throws IOException { Review comment: nit: add a new line
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389029514 ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ## @@ -3773,7 +3774,58 @@ public void testRefreshAndRollbackConcurrently() throws Exception { stopped.set(true); indexer.join(); refresher.join(); + if (w.getTragicException() != null) { +w.getTragicException().printStackTrace(); Review comment: I think we don't need to print the stack trace here.
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028879

File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -299,7 +300,76 @@ static int getActualMaxDocs() (same hunk as above, comment on processEvents)
+    void processEvents() throws IOException {
+      tryAcquire();
+      try {
+        processEventsInternal();
+      }finally {

Review comment: nit: space after `{`
[GitHub] [lucene-solr] atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
URL: https://github.com/apache/lucene-solr/pull/1294#issuecomment-595884480

@jpountz Raised another iteration, please let me know your thoughts and comments.
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053623#comment-17053623 ] Xin-Chun Zhang commented on LUCENE-9136:

1. My personal git branch: https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat

2. The vector format is as follows: !image-2020-03-07-01-25-58-047.png|width=535,height=297!
   Structure of the IVF index meta is as follows: !image-2020-03-07-01-27-12-859.png|width=606,height=276!
   Structure of the IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309!

3. The ann-benchmark tool can be found at: https://github.com/irvingzhang/ann-benchmarks
   Benchmark results (single thread, 2.5GHz * 2 CPUs, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N is the size of the dataset):
   1) Glove-1.2M-25D-Angular: index build + training cost 706s, QPS 18.8~49.6, recall 76.8%~99.7% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583504416262-89784074-c9dc-4489-99a1-5e4b3c76e5fc.png|width=624,height=430!
   2) Glove-1.2M-100D-Angular: index build + training cost 2487s, QPS 12.2~38.3, recall 65.8%~96.3% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583510066130-b4fbcb29-8ad7-4ff2-99ce-c52f7c27826e.png|width=679,height=468!
   3) Sift-1M-128D-Euclidean: index build + training cost 2397s, QPS 14.8~38.2, recall 71.1%~99.2% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583515010497-20b74f41-72c3-48ce-a929-1cbfbd6a6423.png|width=691,height=476!
> Introduce IVFFlat to Lucene for ANN similarity search
> -
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
> Attachments: image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, image-2020-03-07-01-27-12-859.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high-dimensional vector, the vector retrieval (VR) method is then applied to search for the relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, which makes them hard to integrate into Java projects or for those who are not familiar with C/C++ [https://github.com/facebookresearch/faiss/issues/105].
> The algorithms for vector retrieval can be roughly classified into four categories:
> # Tree-based algorithms, such as KD-tree;
> # Hashing methods, such as LSH (Locality-Sensitive Hashing);
> # Product quantization based algorithms, such as IVFFlat;
> # Graph-based algorithms, such as HNSW, SSG, NSG;
> where IVFFlat and HNSW are the most popular among all the VR algorithms. IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it is hard for HNSW to improve its accuracy*. In theory, IVFFlat can achieve a 100% recall ratio.
> Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. The issue draws the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster in query (no training required) but requires extra storage for saving graphs [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who are faced with very different scenarios and want more choices.
> The latest branch is [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat]
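To make the nprobe trade-off described above concrete, here is a toy Java sketch (illustrative only, not the LUCENE-9136 code or its API): index time buckets each vector under its nearest centroid, and query time scans only the `nprobe` nearest buckets, so a larger `nprobe` raises recall at the cost of more distance computations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy IVFFlat sketch (illustrative, not the LUCENE-9136 implementation).
// Centroids are assumed precomputed (e.g. by k-means); vectors are bucketed
// under their nearest centroid, and search scans only nprobe buckets.
final class IvfFlat {
  private final float[][] centroids;
  private final Map<Integer, List<float[]>> buckets = new HashMap<>();

  IvfFlat(float[][] centroids) { this.centroids = centroids; }

  static float squaredDistance(float[] a, float[] b) {
    float d = 0f;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
  }

  private int nearestCentroid(float[] v) {
    int best = 0;
    for (int c = 1; c < centroids.length; c++) {
      if (squaredDistance(v, centroids[c]) < squaredDistance(v, centroids[best])) {
        best = c;
      }
    }
    return best;
  }

  void add(float[] vector) {
    buckets.computeIfAbsent(nearestCentroid(vector), k -> new ArrayList<>()).add(vector);
  }

  // Scan the nprobe buckets whose centroids are closest to the query.
  // Higher nprobe => higher recall, more distance computations.
  float[] search(float[] query, int nprobe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(c -> squaredDistance(query, centroids[c])));
    float[] best = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : buckets.getOrDefault(order[p], List.of())) {
        if (best == null || squaredDistance(query, v) < squaredDistance(query, best)) {
          best = v;
        }
      }
    }
    return best;
  }
}
```

With nprobe equal to the number of centroids this degenerates to exact (flat) search, which is why IVFFlat can in principle reach 100% recall.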
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053620#comment-17053620 ] Michael Sokolov commented on LUCENE-8962:

Based on [~simonw]'s recent comments in github, plus difficulty getting tests to pass consistently (apparently there are more failing tests in Elasticland), we should probably revert for now, at least from the 8.x and 8.5 branches. I am tied up for the moment, but will be able to do the revert this weekend.

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Priority: Major
> Fix For: 8.5
>
> Attachments: LUCENE-8962_demo.png
>
> Time Spent: 9.5h
> Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory segments to disk and open an {{IndexReader}} to search them, and this is typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} will accumulate and write many small segments during {{refresh}}, and this then adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if given a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to optionally kick off the merge policy to merge segments below some threshold before opening the near-real-time reader? It'd be a bit tricky because while we are waiting for merges, indexing may continue, and new segments may be flushed, but those new segments shouldn't be included in the point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
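The "merge segments below some threshold" idea in the description can be sketched as a plain selection function. This is a hedged toy sketch, not Lucene's MergePolicy API: given the byte sizes of the segments just flushed by refresh, it proposes the small ones as one merge candidate so the near-real-time reader would see a single coalesced segment instead of many tiny ones.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (not Lucene's MergePolicy API): pick the freshly flushed
// segments whose size falls below a threshold and propose merging them
// together. Segments at or above the threshold are left alone.
final class SmallSegmentSelector {
  // Returns indices of segments to merge, or an empty list when fewer
  // than two segments qualify (merging a single segment is pointless).
  static List<Integer> selectSmallSegments(long[] segmentSizesBytes, long thresholdBytes) {
    List<Integer> candidate = new ArrayList<>();
    for (int i = 0; i < segmentSizesBytes.length; i++) {
      if (segmentSizesBytes[i] < thresholdBytes) {
        candidate.add(i);
      }
    }
    return candidate.size() >= 2 ? candidate : List.of();
  }
}
```

The hard part discussed in the issue is not this selection but the bookkeeping around it: excluding segments flushed after the refresh started while including the newly merged ones.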
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136:
Attachment: image-2020-03-07-01-27-12-859.png
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136:
Attachment: image-2020-03-07-01-25-58-047.png
[GitHub] [lucene-solr] atris commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r389038791

File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java

@@ -211,6 +213,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) {
   assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader();
   reader = context.reader();
   this.executor = executor;
+  this.sliceExecutionControlPlane = executor == null ? null : getSliceExecutionControlPlane(executor);
+  this.readerContext = context;
+  leafContexts = context.leaves();
+  this.leafSlices = executor == null ? null : slices(leafContexts);
+ }
+
+ // Package private for testing
+ IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutionControlPlane sliceExecutionControlPlane) {
+  assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader();
+  reader = context.reader();
+  this.executor = executor;
+  this.sliceExecutionControlPlane = executor == null ? null : sliceExecutionControlPlane;

Review comment: Not sure if I understood your point. The passed in instance is the one being assigned to the member?
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136:
Attachment: image-2020-03-07-01-22-06-132.png
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136:
Attachment: (was: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png)
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136:
Comment: was deleted

(was: The index format of IVFFlat is organized as follows: !1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png!
In general, the number of centroids lies within the interval [4 * sqrt(N), 16 * sqrt(N)], where N is the data set size. We use (4 * sqrt(N)) as the actual number of centroids, denoted by c, to balance accuracy against computational load. The full data set is used for training if its size is no larger than 200,000; otherwise (128 * c) points are selected after shuffling, in order to accelerate training.
Experiments have been conducted on a large data set (sift1M, http://corpus-texmex.irisa.fr/) to verify the implementation of IVFFlat. The base data set (sift_base.fvecs) contains 1,000,000 vectors with 128 dimensions, and 10,000 queries (sift_query.fvecs) are used for recall testing. The recall ratio follows Recall = (recalled vectors in groundTruth) / (number of queries * TopK), where number of queries = 10,000 and TopK = 100. The results are as follows (single thread and single segment):

||nprobe||avg. search time (ms)||recall (%)||
|8|16.3827|44.24|
|16|16.5834|58.04|
|32|19.2031|71.55|
|64|24.7065|83.30|
|128|34.9165|92.03|
|256|60.5844|97.18|

The test code can be found at https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/KnnIvfAndGraphPerformTester.java )
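The recall formula used in the benchmark above is simple enough to sketch directly. A minimal Java version (names are illustrative, not the KnnIvfAndGraphPerformTester code): for each query, count how many retrieved ids appear in the ground-truth top-K, then divide by (number of queries * TopK).

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the recall metric from the benchmark:
// recall = (retrieved ids found in ground-truth top-K) / (numQueries * topK).
final class RecallMetric {
  // retrieved[q] and groundTruth[q] hold result ids for query q.
  static double recall(int[][] retrieved, int[][] groundTruth, int topK) {
    long hits = 0;
    for (int q = 0; q < retrieved.length; q++) {
      Set<Integer> truth = new HashSet<>();
      for (int i = 0; i < topK; i++) {
        truth.add(groundTruth[q][i]);
      }
      for (int i = 0; i < Math.min(topK, retrieved[q].length); i++) {
        if (truth.contains(retrieved[q][i])) {
          hits++;
        }
      }
    }
    return (double) hits / ((double) retrieved.length * topK);
  }
}
```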
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: (was: image-2020-02-16-15-05-02-451.png)
> The latest branch is > [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Comment: was deleted (was: Hi, [~jtibshirani], thanks for your suggestions! ??"I wonder if this clustering-based approach could fit more closely in the current search framework. In the current prototype, we keep all the cluster information on-heap. We could instead try storing each cluster as its own 'term' with a postings list. The kNN query would then be modelled as an 'OR' over these terms."?? In the previous implementation ([https://github.com/irvingzhang/lucene-solr/commit/eb5f79ea7a705595821f73f80a0c5752061869b2]), the cluster information is divided into two parts – meta (.ifi) and data (.ifd) as shown in the following figure, where each cluster with a postings list is stored in the data file (.ifd) and not kept on-heap. A major concern with this implementation is its read performance for cluster data, since reading is a very frequent operation in kNN search. I will test and check the performance. !image-2020-02-16-15-05-02-451.png! ??"Because of this concern, it could be nice to include benchmarks for index time (in addition to QPS)..."?? Many thanks! I will check the links you mentioned and consider optimizing the clustering cost. In addition, more benchmarks will be added soon. h2. *UPDATE – Feb. 24, 2020* I have added a new implementation of the IVF index, marked as ***V2 under the package org.apache.lucene.codecs.lucene90. In the current implementation, the IVF index is divided into two files with suffixes .ifi and .ifd, respectively. The .ifd file is read when cluster information is needed.
The experiments are conducted on the dataset sift1M (test code: [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/KnnIvfPerformTester.java]); detailed results are as follows: # add documents -- 3921 ms; # commit -- 3912286 ms (mainly spent on k-means training; 10 iterations, 4000 centroids, 512,000 vectors in total used for training); # R@100 search time and recall ratio are listed in the following table ||nprobe||avg. search time (ms)||recall ratio (%)|| |8|28.0755|44.154| |16|27.1745|57.9945| |32|32.986|71.7003| |64|40.4082|83.50471| |128|50.9569|92.07929| |256|73.923|97.150894| Compared with the on-heap implementation of the IVF index, the query time increases significantly (22%~71%). Actually, the IVF index is composed of unique docIDs and will not take up too much memory. *There is a small argument about whether to keep the cluster information on-heap or not. Hope to hear more suggestions.* )
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053589#comment-17053589 ] David Smiley commented on LUCENE-9258: -- Makes sense to me; your test is perfect. I'm curious; how did you see this at a higher level (e.g. Solr or ES)? The issue title & details here are a bit geeky / low-level and I'm trying to think of a good CHANGES.txt entry that might be more meaningful to users. > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.7.2, 8.4 >Reporter: Michele Palmia >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_ , the latter instantiates a new iterator on > _SortedDocValues_ regardless of the fact that the underlying field can > actually be of a different type (e.g. a _SortedSetDocValues_ processed > through a _SortedSetSelector_).
[jira] [Assigned] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-9258: Assignee: David Smiley
[GitHub] [lucene-solr] bruno-roustant commented on issue #1301: LUCENE-9254: UniformSplit supports FST off-heap.
bruno-roustant commented on issue #1301: LUCENE-9254: UniformSplit supports FST off-heap. URL: https://github.com/apache/lucene-solr/pull/1301#issuecomment-595856255 Updated after LUCENE-9257 removed FSTLoadMode. Now FST is off-heap by default. It is possible to force it with a boolean in the UniformSplitPostingsFormat. Also, FST is always on-heap if there is block encoding/decoding.
[jira] [Comment Edited] (SOLR-11359) An autoscaling/suggestions endpoint to recommend operations
[ https://issues.apache.org/jira/browse/SOLR-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052544#comment-17052544 ] Megan Carey edited comment on SOLR-11359 at 3/6/20, 4:38 PM: - Would it be possible to explicitly return the URL to hit for applying the suggestion? i.e. rather than return an HTTP method, operation type, etc. just return the constructed URL for executing the action? Also, are you considering writing a cron to periodically execute these suggestions? Or was the intention for these to be manually applied? [~noble.paul] was (Author: megancarey): Would it be possible to explicitly return the URL to hit for applying the suggestion? i.e. rather than return an HTTP method, operation type, etc. just return the constructed URL for executing the action? Also, are you considering writing a cron to periodically execute these suggestions? > An autoscaling/suggestions endpoint to recommend operations > --- > > Key: SOLR-11359 > URL: https://issues.apache.org/jira/browse/SOLR-11359 > Project: Solr > Issue Type: New Feature > Components: AutoScaling >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Attachments: SOLR-11359.patch > > > Autoscaling can make suggestions to users on what operations they can perform > to improve the health of the cluster > The suggestions will have the following information > * http end point > * http method (POST,DELETE) > * command payload
[jira] [Assigned] (SOLR-11725) json.facet's stddev() function should be changed to use the "Corrected sample stddev" formula
[ https://issues.apache.org/jira/browse/SOLR-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-11725: --- Assignee: Munendra S N > json.facet's stddev() function should be changed to use the "Corrected sample > stddev" formula > - > > Key: SOLR-11725 > URL: https://issues.apache.org/jira/browse/SOLR-11725 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Chris M. Hostetter >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-11725.patch, SOLR-11725.patch, SOLR-11725.patch > > > While working on some equivalence tests/demonstrations for > {{facet.pivot+stats.field}} vs {{json.facet}} I noticed that the {{stddev}} > calculations done between the two code paths can be measurably different, and > realized this is due to them using very different code... > * {{json.facet=foo:stddev(foo)}} > ** {{StddevAgg.java}} > ** {{Math.sqrt((sumSq/count)-Math.pow(sum/count, 2))}} > * {{stats.field=\{!stddev=true\}foo}} > ** {{StatsValuesFactory.java}} > ** {{Math.sqrt(((count * sumOfSquares) - (sum * sum)) / (count * (count - > 1.0D)))}} > Since I'm not really a math guy, I consulted with a bunch of smart math/stat > nerds I know online to help me sanity check whether these equations (somehow) > reduce to each other (in which case the discrepancies I was seeing in my > results might have just been due to the order of intermediate operation > execution & floating point rounding differences). > They confirmed that the two bits of code are _not_ equivalent to each other, > and explained that the code JSON Faceting is using is equivalent to the > "Uncorrected sample stddev" formula, while StatsComponent's code is > equivalent to the "Corrected sample stddev" formula...
> https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation > When I told them that stuff like this is why no one likes mathematicians and > pressed them to explain which one was the "most canonical" (or "most > generally applicable" or "best") definition of stddev, I was told that: > # This is something statisticians frequently disagree on > # Practically speaking the diff between the calculations doesn't tend to > differ significantly when count is "very large" > # _"Corrected sample stddev" is more appropriate when comparing two > distributions_ > Given that: > * the primary usage of computing the stddev of a field/function against a > Solr result set (or against a sub-set of results defined by a facet > constraint) is probably to compare that distribution to a different Solr > result set (or to compare N sub-sets of results defined by N facet > constraints) > * the size of the sets of documents (values) can be relatively small when > computing stats over facet constraint sub-sets > ...it seems like {{StddevAgg.java}} should be updated to use the "Corrected > sample stddev" equation.
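The two formulas quoted in the description can be compared directly. The sketch below is standalone arithmetic — not Solr's actual StddevAgg or StatsValuesFactory classes — showing that the corrected (Bessel) form is always at least as large as the uncorrected one, with the gap shrinking as count grows.

```java
// Side-by-side implementation of the two stddev formulas from the issue
// description (illustrative only, not the Solr classes themselves).
class StddevFormulas {
    // json.facet style ("uncorrected"): sqrt(sumSq/count - (sum/count)^2)
    static double uncorrected(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        int n = xs.length;
        return Math.sqrt(sumSq / n - Math.pow(sum / n, 2));
    }

    // stats.field style ("corrected"): sqrt((n*sumSq - sum^2) / (n*(n-1)))
    static double corrected(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        int n = xs.length;
        return Math.sqrt((n * sumSq - sum * sum) / (n * (n - 1.0)));
    }
}
```

For the sample {2, 4, 4, 4, 5, 5, 7, 9} the uncorrected form gives exactly 2.0 while the corrected form gives sqrt(32/7) ≈ 2.138 — a measurable difference at small counts, which is the motivation for the change.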
[jira] [Updated] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13893: Attachment: SOLR-13893.patch > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch, SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion?
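For illustration, here is the shape of the pattern being discussed, as standalone code rather than the actual BlobRepository source: because `System.getProperty` falls back to its default whenever the key is absent, a misspelled key means any user-supplied `runtime.lib.size` (assuming that was the intended name) is silently ignored.

```java
// Illustrative only: mirrors the quoted BlobRepository line to show why the
// misspelled property key is a real bug, not just a cosmetic typo.
class MaxJarSize {
    // Reads a size limit from a system property, defaulting to 5 MB.
    static long read(String key) {
        return Long.parseLong(System.getProperty(key, String.valueOf(5 * 1024 * 1024)));
    }
}
```

With the typo, setting the intended property has no effect and the 5 MB default always wins; the attached patch presumably corrects the key so overrides are honored.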
[jira] [Commented] (SOLR-14289) Solr may attempt to check Chroot after already having connected once
[ https://issues.apache.org/jira/browse/SOLR-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053561#comment-17053561 ] Mike Drob commented on SOLR-14289: -- [~dsmiley] - seems like we're working on similar problems around speeding up core startup - can you take a look at this and let me know what you think? > Solr may attempt to check Chroot after already having connected once > > > Key: SOLR-14289 > URL: https://issues.apache.org/jira/browse/SOLR-14289 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Attachments: Screen Shot 2020-02-26 at 2.56.14 PM.png > > Time Spent: 10m > Remaining Estimate: 0h > > On server startup, we will attempt to load the solr.xml from zookeeper if we > have the right properties set, and then later when starting up the core > container will take time to verify (and create) the chroot even if it is the > same string that we already used before. We can likely skip the second > short-lived zookeeper connection to speed up our startup sequence a little > bit. > > See this attached image from thread profiling during startup. > !Screen Shot 2020-02-26 at 2.56.14 PM.png!
[jira] [Updated] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13893: Status: Patch Available (was: Open)
[jira] [Commented] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053562#comment-17053562 ] Munendra S N commented on SOLR-13893: - [^SOLR-13893.patch] Slightly modified patch
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053545#comment-17053545 ] Nhat Nguyen commented on LUCENE-8962: - Some engine tests in Elasticsearch are failing because of this change. I am working to backport them to Lucene so that we can catch similar issues in Lucene. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will write many small segments during {{refresh}}, and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053517#comment-17053517 ] David Smiley commented on LUCENE-8103: -- Notice that {{TwoPhaseIterator.asDocIdSetIterator(tpi);}} will return an implementation whose {{advance(docId)}} method will move beyond the passed-in docID and call matches until it finds a match. That is a waste _if the user of this DISI doesn't care what the next matching document is when the approximation doesn't match_. So QueryValueSource's exists() method could work with the approximation first and, if that matches, then and only then call TPI.matches. If there is no TPI then the scorer's DISI is accurate. > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in the "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if one is offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in.
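The exists() pattern described in the comment can be sketched with simplified stand-ins for Lucene's approximation/matches split. These are hypothetical interfaces, not the real org.apache.lucene.search API: the cheap approximation is consulted first, and the expensive matches() check runs only when the approximation already sits on the requested doc — so a non-match never pays for advancing to the next hit.

```java
// Hypothetical stand-ins for Lucene's TwoPhaseIterator contract (not the real
// API), illustrating the suggested exists() optimization.
class TwoPhaseExists {
    interface Approximation { int docID(); int advance(int target); } // cheap superset iterator
    interface TwoPhase { Approximation approximation(); boolean matches(); } // expensive confirmation

    // exists(doc): true iff doc is in the approximation AND confirmed by matches().
    // Crucially, a "no" answer never advances past doc hunting for the next match.
    static boolean exists(TwoPhase tpi, int doc) {
        Approximation approx = tpi.approximation();
        int cur = approx.docID();
        if (cur < doc) cur = approx.advance(doc);
        return cur == doc && tpi.matches();
    }
}
```

Contrast this with wrapping via an asDocIdSetIterator-style adapter, whose advance(doc) would keep calling matches() on subsequent docs until one passes — wasted work when the caller only wanted a yes/no for this doc.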
[jira] [Assigned] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-13199: --- Assignee: Munendra S N > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at 
org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? -1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br].
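The defensive shape of a fix for the NPE above can be sketched with plain `java.util.BitSet` standing in for Lucene's BitSet (the real patch would live in ChildDocTransformer against QueryBitSetProducer's contract): since the producer returns null for a segment with no parents, the null must be checked before calling prevSetBit.

```java
import java.util.BitSet;

// Illustrative guard only (java.util.BitSet, not org.apache.lucene.util.BitSet):
// a producer may return null to mean "no parent docs in this segment", so the
// lookup must treat null the same as "no parent before this root".
class ParentLookup {
    static int prevParent(BitSet parents, int segRootId) {
        if (parents == null || segRootId == 0) {
            return -1;                                  // no parents, or root is the first doc
        }
        return parents.previousSetBit(segRootId - 1);   // may itself be -1, which is fine
    }
}
```

The original code only handled the segRootId == 0 case; adding the null branch covers the empty-DocIdSet path that the fuzzer exposed.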
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053452#comment-17053452 ] Munendra S N commented on SOLR-13199: - [^SOLR-13199.patch] The NPE still occurs when used without a nestedPath field. I have removed the version check, which wasn't required. When the parentFilter string is specified but resolves to {{null}} after parsing, the parentFilter is now set to {{MatchNoDocsQuery}}. [~dsmiley] Could you please review this once?
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at 
org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? -1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
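The failure mode above reduces to a missing null check: {{QueryBitSetProducer.getBitSet}} returns null when a segment's DocIdSet is empty, and {{ChildDocTransformer.transform}} dereferences the result unconditionally. The sketch below isolates that pattern with a plain {{java.util.BitSet}} stand-in; it is only an illustration of the guard, not the actual Lucene code (the real classes and method names differ, e.g. Lucene's {{prevSetBit}} vs the JDK's {{previousSetBit}}), and the attached patch addresses the problem at a different point.

```java
import java.util.BitSet;

// Simplified, self-contained illustration of the null-return hazard described
// above; names mirror the Lucene code but this is NOT the actual implementation.
public class BitSetGuardSketch {

    // Stand-in for QueryBitSetProducer.getBitSet: returns null when the
    // segment contains no matching parent documents (empty DocIdSet).
    static BitSet getBitSet(boolean segmentHasParents) {
        if (!segmentHasParents) {
            return null; // the case the unguarded caller fails to handle
        }
        BitSet bits = new BitSet();
        bits.set(5); // pretend doc 5 is a parent in this segment
        return bits;
    }

    // Guarded version of the prevSetBit lookup from ChildDocTransformer.transform.
    static int segPrevRootId(BitSet segParentsBitSet, int segRootId) {
        if (segParentsBitSet == null) {
            return -1; // no parents in this segment; nothing precedes segRootId
        }
        return segRootId == 0 ? -1 : segParentsBitSet.previousSetBit(segRootId - 1);
    }

    public static void main(String[] args) {
        // The unguarded original would throw NullPointerException on this call:
        System.out.println(segPrevRootId(getBitSet(false), 3)); // -1
        System.out.println(segPrevRootId(getBitSet(true), 7));  // 5
    }
}
```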
[jira] [Updated] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13199: Status: Patch Available (was: Open)
[jira] [Updated] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13199: Attachment: SOLR-13199.patch
[GitHub] [lucene-solr] bruno-roustant closed pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.
bruno-roustant closed pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. URL: https://github.com/apache/lucene-solr/pull/1328 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant resolved LUCENE-9257. Fix Version/s: 8.6 Resolution: Fixed Thanks reviewers! > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Fix For: 8.6 > > Time Spent: 1.5h > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053432#comment-17053432 ] ASF subversion and git services commented on LUCENE-9257: - Commit e7a61eadf6d2f3c722c791e7470a79b2e919cdeb in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e7a61ea ] LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode, Reader attributes and openedFromWriter.
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053427#comment-17053427 ] David Smiley commented on LUCENE-8962: -- Thanks so much for your input Simon! We need to fight the complexity here. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! 
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053412#comment-17053412 ] ASF subversion and git services commented on LUCENE-9257: - Commit c73d2c15ba7c5936715408807184c99ab7cfdfd4 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c73d2c1 ] LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.
[GitHub] [lucene-solr] sigram opened a new pull request #1329: SOLR-14275: Policy calculations are very slow for large clusters and large operations
sigram opened a new pull request #1329: SOLR-14275: Policy calculations are very slow for large clusters and large operations URL: https://github.com/apache/lucene-solr/pull/1329 # Description See JIRA for the explanation of the problem. # Solution Try and reduce the combinatoric explosion in the candidate placements. Use caching more effectively. # Tests Manual performance tests using the scenario.txt attached to JIRA. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant closed pull request #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific.
bruno-roustant closed pull request #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific. URL: https://github.com/apache/lucene-solr/pull/1305 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on issue #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific.
bruno-roustant commented on issue #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific. URL: https://github.com/apache/lucene-solr/pull/1305#issuecomment-595761243 Replaced by https://github.com/apache/lucene-solr/pull/1320 to always keep FST off-heap. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053400#comment-17053400 ] Bruno Roustant commented on LUCENE-9257: While preparing the port to the 8x branch I saw that I had forgotten a significant cleanup: the openedFromWriter boolean, which was also added to support the FSTLoadMode logic, so I am removing it as well. For visibility I added PR #1328, but I'll commit it immediately.
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.
bruno-roustant opened a new pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. URL: https://github.com/apache/lucene-solr/pull/1328 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11725) json.facet's stddev() function should be changed to use the "Corrected sample stddev" formula
[ https://issues.apache.org/jira/browse/SOLR-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053387#comment-17053387 ] Munendra S N commented on SOLR-11725: - I'm planning to commit this weekend (only to master), let me know if there are any concerns > json.facet's stddev() function should be changed to use the "Corrected sample > stddev" formula > - > > Key: SOLR-11725 > URL: https://issues.apache.org/jira/browse/SOLR-11725 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-11725.patch, SOLR-11725.patch, SOLR-11725.patch > > > While working on some equivalence tests/demonstrations for > {{facet.pivot+stats.field}} vs {{json.facet}} I noticed that the {{stddev}} > calculations done between the two code paths can be measurably different, and > realized this is due to them using very different code... > * {{json.facet=foo:stddev(foo)}} > ** {{StddevAgg.java}} > ** {{Math.sqrt((sumSq/count)-Math.pow(sum/count, 2))}} > * {{stats.field=\{!stddev=true\}foo}} > ** {{StatsValuesFactory.java}} > ** {{Math.sqrt(((count * sumOfSquares) - (sum * sum)) / (count * (count - > 1.0D)))}} > Since I'm not really a math guy, I consulted with a bunch of smart math/stat > nerds I know online to help me sanity check whether these equations (somehow) > reduced to each other (in which case the discrepancies I was seeing in my > results might have just been due to the order of intermediate operation > execution & floating point rounding differences). > They confirmed that the two bits of code are _not_ equivalent to each other, > and explained that the code JSON Faceting is using is equivalent to the > "Uncorrected sample stddev" formula, while StatsComponent's code is > equivalent to the "Corrected sample stddev" formula... 
> https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation > When I told them that stuff like this is why no one likes mathematicians and > pressed them to explain which one was the "most canonical" (or "most > generally applicable" or "best") definition of stddev, I was told that: > # This is something statisticians frequently disagree on > # Practically speaking the diff between the calculations doesn't tend to > differ significantly when count is "very large" > # _"Corrected sample stddev" is more appropriate when comparing two > distributions_ > Given that: > * the primary usage of computing the stddev of a field/function against a > Solr result set (or against a sub-set of results defined by a facet > constraint) is probably to compare that distribution to a different Solr > result set (or to compare N sub-sets of results defined by N facet > constraints) > * the size of the sets of documents (values) can be relatively small when > computing stats over facet constraint sub-sets > ...it seems like {{StddevAgg.java}} should be updated to use the "Corrected > sample stddev" equation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
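The difference between the two formulas quoted above is easy to see numerically. Below is a minimal, self-contained sketch in plain Java (not Solr's actual {{StddevAgg}} or {{StatsValuesFactory}} classes) of the uncorrected formula used by JSON Faceting and the corrected (Bessel) formula used by StatsComponent; on a tiny sample they disagree noticeably, and the gap shrinks as the count grows.

```java
// Minimal numeric sketch of the two formulas quoted above (not Solr's actual
// classes): uncorrected vs. corrected sample standard deviation.
public class StddevSketch {

    // json.facet's formula: sqrt(sumSq/count - (sum/count)^2)  -- uncorrected
    static double uncorrected(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        double n = xs.length;
        return Math.sqrt(sumSq / n - Math.pow(sum / n, 2));
    }

    // StatsComponent's formula: sqrt((n*sumSq - sum^2) / (n*(n-1)))  -- corrected
    static double corrected(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        double n = xs.length;
        return Math.sqrt((n * sumSq - sum * sum) / (n * (n - 1.0)));
    }

    public static void main(String[] args) {
        double[] small = {2.0, 4.0, 6.0};
        // On three values the two formulas visibly disagree:
        System.out.println(uncorrected(small)); // ~1.633
        System.out.println(corrected(small));   // 2.0
    }
}
```

This illustrates point 2 of the comment: for small facet-bucket counts the choice of formula matters, which is the motivation for switching {{StddevAgg}} to the corrected form.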
[jira] [Updated] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13944: Status: Patch Available (was: Open) > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix was > not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). 
The > relevant part of the query is: > {code:java} > fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullPointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053385#comment-17053385 ] Munendra S N commented on SOLR-13944: - [^SOLR-13944.patch] Initial patch for fixing the NPE. The query is valid: the default defType for fq is lucene, so the localParams syntax is parsed, but the case of a tagged collapse filter wasn't handled in SOLR-8807 (it did a simple string match). Here, I have replaced it with filter parsing; without that we can't know whether there is a collapse filter or not. {noformat} fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'} {noformat} [~tflobbe] As you had asked the user to create the JIRA issue, I would prefer if you could take a look at this patch
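The pitfall described in this thread can be illustrated with a toy example: a prefix string match only recognizes a collapse filter when {!collapse} is the very first local-params block, so prepending {!tag=...} hides it. The sketch below is a simplified stand-in, not the real CollapsingQParserPlugin code; in particular, the real fix parses each filter into a query object rather than scanning strings, and the substring check here is only a rough proxy for that.

```java
// Toy sketch (NOT the actual Solr code) of why a plain prefix match misses
// the collapse filter when another local-params block precedes it.
public class CollapseDetectSketch {

    // The naive check: only matches when the fq literally starts with the
    // collapse local-params prefix.
    static boolean naiveStringMatch(String fq) {
        return fq.startsWith("{!collapse");
    }

    // Rough stand-in for a parse-aware check that inspects every filter:
    // finds the collapse block anywhere in the fq, not just at the front.
    static boolean parseAwareMatch(String fq) {
        return fq.contains("{!collapse");
    }

    public static void main(String[] args) {
        String tagged =
            "{!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}";
        System.out.println(naiveStringMatch(tagged)); // false -> collapse undetected, later NPE
        System.out.println(parseAwareMatch(tagged));  // true  -> filter is recognized
    }
}
```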
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053381#comment-17053381 ] ASF subversion and git services commented on LUCENE-8962: - Commit 90aced5a51f92ffd6e97449eb7c44aacc643c8a3 in lucene-solr's branch refs/heads/branch_8x from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=90aced5 ] LUCENE-8962: Split test case (#1313) * LUCENE-8962: Simplify test case The testMergeOnCommit test case was trying to verify too many things at once: basic semantics of merge on commit and proper behavior when a bunch of indexing threads are writing and committing all at once. Now we just verify basic behavior, with strict assertions on invariants, while leaving it to MockRandomMergePolicy to enable merge on commit in existing test cases to verify that indexing generally works as expected and no new unexpected exceptions are thrown. * LUCENE-8962: Only update toCommit if merge was committed The code was previously assuming that if mergeFinished() was called and isAborted() was false, then the merge must have completed successfully. Instead, we should know for sure if a given merge was committed, and only then update our pending commit SegmentInfos. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. 
> However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
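Conceptually, the selection step such a refresh-time policy would perform is simple: group the freshly flushed segments that fall below a size threshold into one candidate merge and leave the rest to normal background merging. The sketch below is not Lucene's actual MergePolicy API — it models segments as plain byte sizes, and the class and method names are made up — it only illustrates that selection step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Lucene code) of the selection step a
// refresh-time merge policy might perform: gather segments smaller than a
// threshold into one candidate merge, leaving larger segments untouched.
public class SmallSegmentSelector {

    /**
     * Returns the indices of segments whose size is below the threshold.
     * In a real policy these would become a single merge; segments at or
     * above the threshold are left for normal background merging.
     */
    public static List<Integer> selectSmallSegments(long[] segmentSizes, long threshold) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < threshold) {
                candidates.add(i);
            }
        }
        // A single tiny segment is not worth a merge on its own.
        return candidates.size() >= 2 ? candidates : List.of();
    }
}
```

The hard part the issue describes is not this selection but the bookkeeping around it: the refreshed reader must see the merged result while excluding any segments flushed after the refresh began, which is why a naive custom merge policy on top of today's API gets "hackity".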
[jira] [Updated] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13944: Attachment: SOLR-13944.patch > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). 
The > relevant part of the query is: > {code:java} > ={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullPointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E])
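For readers hitting the same trap: the reporter's own diagnosis above is that the local-params syntax was used twice instead of being combined into one block. As an illustration only (field names taken from the report), the broken and intended forms look like this:

```text
# Broken: two adjacent local-params blocks (the form that triggered the NPE)
fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}

# Intended: a single combined local-params block carrying both the tag and the collapse parameters
fq={!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}
```

The ticket's point stands either way: the broken form should be rejected with a 400 Bad Request rather than surfacing a NullPointerException from the collapse collector.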
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053361#comment-17053361 ] Dawid Weiss commented on LUCENE-9241: - I wasn't really that much concerned; just pointing out the (sad) fact of how it's implemented for Windows. > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch > > > Currently each test jvm has Xmx of 512M. With a modern macbook pro this is > 4GB which is pretty crazy. > On the other hand, if we fix a few edge cases, tests can work with lower > heaps such as 128M. This can save many gigabytes (also it finds interesting > memory waste/issues).
[jira] [Created] (SOLR-14310) Expose solr logs with basic filters via HTTP
Noble Paul created SOLR-14310: - Summary: Expose solr logs with basic filters via HTTP Key: SOLR-14310 URL: https://issues.apache.org/jira/browse/SOLR-14310 Project: Solr Issue Type: Sub-task Security Level: Public (Default Security Level. Issues are Public) Reporter: Noble Paul
[jira] [Created] (SOLR-14309) Expose GC logs via an HTTP API
Noble Paul created SOLR-14309: - Summary: Expose GC logs via an HTTP API Key: SOLR-14309 URL: https://issues.apache.org/jira/browse/SOLR-14309 Project: Solr Issue Type: Sub-task Security Level: Public (Default Security Level. Issues are Public) Reporter: Noble Paul
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053344#comment-17053344 ] Noble Paul commented on SOLR-13942: --- I've opened a new PR and added more tests. Please review. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data
[GitHub] [lucene-solr] noblepaul opened a new pull request #1327: SOLR-13942: /api/cluster/zk/* to fetch raw ZK data
noblepaul opened a new pull request #1327: SOLR-13942: /api/cluster/zk/* to fetch raw ZK data URL: https://github.com/apache/lucene-solr/pull/1327 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053339#comment-17053339 ] ASF subversion and git services commented on SOLR-13942: Commit a8e7895c3007f3aa7e58bc52fb610416e80850a6 in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a8e7895 ] Revert "SOLR-13942: /api/cluster/zk/* to fetch raw ZK data" This reverts commit 2044f8c83ebb0775d76b1e96c168ca936701abd4. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053338#comment-17053338 ] ASF subversion and git services commented on SOLR-13942: Commit 4cf37ade3531305d508e383b9c16a0c5690bacae in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4cf37ad ] Revert "SOLR-13942: /api/cluster/zk/* to fetch raw ZK data" This reverts commit bc6fa3b65060b17a88013a0378f4a9d285067d82. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data
[GitHub] [lucene-solr] janhoy commented on issue #404: Comment to explain how to use URLClassifyProcessorFactory
janhoy commented on issue #404: Comment to explain how to use URLClassifyProcessorFactory URL: https://github.com/apache/lucene-solr/pull/404#issuecomment-595723278 @ohtwadi Do you want to address the review comment so we can merge this?
[GitHub] [lucene-solr] janhoy closed pull request #880: Tweak header format.
janhoy closed pull request #880: Tweak header format. URL: https://github.com/apache/lucene-solr/pull/880
[jira] [Updated] (LUCENE-9033) Update Release docs and scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9033: Description: *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* [PR#1326|https://github.com/apache/lucene-solr/pull/1326] This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-guide/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? was: *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-guide/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? 
> Update Release docs and scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] > Janhoy has started on this, but will likely not finish before the 8.5 release > *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] > page:* I suggest we deprecate this page if folks are happy with > releaseWizard, which should encapsulate all steps and details, and can also > generate an HTML TODO document per release. > *publish-solr-ref-guide.sh:* > [PR#1326|https://github.com/apache/lucene-solr/pull/1326] This script can be > deleted, not in use since we do not publish PDF anymore > *(/) solr-ref-guide/src/meta-docs/publish.adoc:* Done > > There may be other places affected, such as other WIKI pages?