[jira] [Commented] (SOLR-12901) Make UnifiedHighlighter the default

2020-05-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114536#comment-17114536
 ] 

David Smiley commented on SOLR-12901:
-

{{hl.preserveMulti}} is not supported yet, and is possibly more important than 
some of the others for multi-value fields.  It's not too hard if we also assume 
{{hl.bs.type=WHOLE}} for this case, which makes sense to me anyway.
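
For reference, a request exercising that combination might look like this once 
support lands (hypothetical example against the techproducts configset):
{code}
$ curl "http://localhost:8983/solr/techproducts/select?q=memory&hl=on&hl.method=unified&hl.fl=features&hl.preserveMulti=true&hl.bs.type=WHOLE"
{code}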

> Make UnifiedHighlighter the default
> ---
>
> Key: SOLR-12901
> URL: https://issues.apache.org/jira/browse/SOLR-12901
> Project: Solr
>  Issue Type: Improvement
>  Components: highlighter
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: master (9.0)
>
>
> I think the UnifiedHighlighter should be the default in 8.0.  It's faster and 
> more accurate than alternatives.
> The original highlighter however has some benefits:
> * Different passage/snippet delineation options; somewhat more flexible, 
> though with no i18n BreakIterator-based option.
> * Seems to handle some "special" Queries and/or QueryParsers by default 
> better -- namely SurroundQParser.  Though SOLR-12895 will address this UH 
> issue.
> * Considers boosts in the query when computing a passage score
> * hl.alternateField, hl.maxAlternateFieldLength, hl.highlightAlternate 
> options.  The UH instead has an hl.defaultSummary boolean.
> See 
> https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley opened a new pull request #1531: SOLR-11334: Split some field lists better

2020-05-22 Thread GitBox


dsmiley opened a new pull request #1531:
URL: https://github.com/apache/lucene-solr/pull/1531


   https://issues.apache.org/jira/browse/SOLR-11334



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9380) Fix auxiliary class warnings in Lucene

2020-05-22 Thread Erick Erickson (Jira)
Erick Erickson created LUCENE-9380:
--

 Summary: Fix auxiliary class warnings in Lucene
 Key: LUCENE-9380
 URL: https://issues.apache.org/jira/browse/LUCENE-9380
 Project: Lucene - Core
  Issue Type: Improvement
 Environment: There are only three and they're entirely simple so I'll 
fix them up. Since they're in Lucene, I thought it should be a separate JIRA.
Reporter: Erick Erickson
Assignee: Erick Erickson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14474) Fix remaining auxiliary class warnings in Solr

2020-05-22 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-14474:
--
Summary: Fix remaining auxiliary class warnings in Solr  (was: Fix 
auxiliary class warnings in Solr core)

> Fix remaining auxiliary class warnings in Solr
> ---
>
> Key: SOLR-14474
> URL: https://issues.apache.org/jira/browse/SOLR-14474
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> We have quite a number of situations where multiple classes are declared in a 
> single source file, which is a poor practice. I ran across a bunch of these 
> in solr/core, and [~mdrob] fixed some of these in SOLR-14426. [~dsmiley] 
> looked at those and thought that it would have been better to just move a 
> particular class to its own file. And [~uschindler] do you have any comments?
> I have a fork with a _bunch_ of changes to get warnings out, which include 
> moving more than a few classes into static inner classes, including the one 
> Mike did. I do NOT intend to commit this, it's too big/sprawling, but it does 
> serve to show a variety of situations. See: 
> https://github.com/ErickErickson/lucene-solr/tree/jira/SOLR-10810 for how 
> ugly it all looks. I intend to break this wodge down into smaller tasks and 
> start over now that I have a clue as to the scope. And do ignore the generics 
> changes as well as the consequences of upgrading Apache Commons CLI; those 
> need to be their own JIRA.
> What I'd like to do is agree on some guidelines for when to move classes to 
> their own file and when to move them to static inner classes.
> Some things I saw; reference the fork for the changes (again, I won't check 
> that in):
> 1> DocValuesAcc has no fewer than 9 classes that could be moved inside the 
> main class, but they all become "static abstract". And take 
> "DoubleSortedNumericDVAcc" in that class: it gets extended in 4 other 
> files. How would all that get resolved? How many of them would people 
> recommend moving into their own files? Do we want to proliferate all those? 
> And so on for the plethora of other classes in 
> org.apache.solr.search.facet. (See the sketch at the end of this 
> description.)
> This is particularly thorny because the choices are roughly a zillion new 
> classes or a zillion edits.
> Does the idea of abstract vs. concrete classes make any difference? IOW, if 
> we change an abstract class to a nested class, then maybe we just have to 
> change the class(es) that extend it?
> 2> StatsComponent.StatsInfo probably should be its own file?
> 3> FloatCmp, LongCmp, DoubleCmp all declare classes with "Comp" rather than 
> "Cmp". Those files should just be renamed.
> 4> JSONResponseWriter. ???
> 5> FacetRangeProcessor seems like it needs its own file
> 6> FacetRequestSorted seems like it needs its own file
> 7> FacetModule
> So what I'd like going forward is to agree on some guidelines to resolve 
> whether to move a class to its own file or make it nested (probably static). 
> Not hard-and-fast rules, just something to cut down on the rework due to 
> objections.
> And what about backporting to 8x? My suggestion is to backport what's 
> easy/doesn't break back-compat in order to make keeping the two branches in 
> sync easier.
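> A sketch of what moving such a class to a static nested class implies for 
> subclasses in other files ("SomeOtherAcc" is illustrative, not a real class):
> {code:java}
> public class DocValuesAcc {
>   // formerly an auxiliary top-level class in the same source file
>   abstract static class DoubleSortedNumericDVAcc { /* ... */ }
> }
> 
> // elsewhere in the package, the extension must now qualify the host class:
> class SomeOtherAcc extends DocValuesAcc.DoubleSortedNumericDVAcc { /* ... */ }
> {code}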



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-22 Thread Viral Gandhi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viral Gandhi updated LUCENE-9378:
-
Description: 
Lucene 8.5.1 includes a change to always [compress 
BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused 
a ~30% reduction in our red-line QPS (throughput). 

We think users should be given some way to opt in to this compression feature 
instead of it always being enabled, which can have a substantial query-time 
cost, as we saw during our upgrade. [~mikemccand] suggested one possible 
approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
default Codec and picking the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here's a related issue for adding a benchmark covering BINARY doc values 
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
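
A minimal sketch of such an opt-out codec (note: the {{Mode}} constructor on 
{{Lucene80DocValuesFormat}} is the proposal here, not existing API):
{code:java}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene84.Lucene84Codec;

// A custom Codec that delegates everything to the default codec except the
// doc-values format, which is pinned to the hypothetical UNCOMPRESSED mode.
public class UncompressedDVCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDVCodec() {
    super("UncompressedDVCodec", new Lucene84Codec());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}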

  was:
Lucene 8.5.1 includes a change to always [compress 
BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused 
(~30%) reduction in our red-line QPS (throughput). 

We think users should be given some way to opt-in for this compression feature 
instead of always being enabled which can have a substantial query time cost as 
we saw during our upgrade. [~mikemccand] suggested one possible approach by 
introducing a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) 
and allowing users to create a custom Codec subclassing the default Codec and 
pick the format they want.

Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here's related issues for adding benchmark covering BINARY doc values 
query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]


> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused (~30%) reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt-in for this compression 
> feature instead of always being enabled which can have a substantial query 
> time cost as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's related issues for adding benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-05-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114407#comment-17114407
 ] 

David Smiley commented on SOLR-13749:
-

RE routerField: I understand, but it's also sometimes a little inconvenient to 
edit the solrconfig.xml.  Consider reading routerField from a request parameter 
when it isn't specified in the config; a sketch follows.
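
A rough sketch of that fallback (names illustrative, not an actual patch):
{code:java}
// Prefer a routerField given as a local param on the query; otherwise fall
// back to whatever was configured in solrconfig.xml (which may be null).
String routerField = localParams.get("routerField");
if (routerField == null) {
  routerField = configuredRouterField; // from the parser plugin's init args
}
{code}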

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It 
> can call out to a remote collection to get a set of join keys to be used as 
> a filter against the local collection, doing an intersection based on join 
> keys between the 2 collections.
> The second one is the Hash Range query parser, which lets you specify a 
> field name and a hash range; only the documents that would have hashed to 
> that range are returned.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is set up with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither of zkHost or solrUrl are specified, the local Zookeeper cluster 
> will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional 
> )|
> |from|Required|The join key field name in the external collection ( required 
> )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered 
> valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate 
> results.  
> After the ttl period has expired, the XCJF query will re-execute the join 
> against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local 
> param.|
>  
> Example Solr Config.xml changes:
>  
>  <cache name="hash_vin"
>         class="solr.LRUCache"
>         size="128"
>         initialSize="0"
>         regenerator="solr.NoOpRegenerator"/>
>   
>  <queryParser 

[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114398#comment-17114398
 ] 

Christine Poerschke commented on SOLR-13289:


{quote}{quote}would or wouldn't a minExactHits=100 request make use of a 
minExactHits=1000
{quote}
It wouldn't. Right now, it just checks for equality. We could improve this for 
sure; that said, I'm wondering how useful that would be in practice: people 
issuing the same request on the same index with different minExactHits. While 
that could certainly happen, I'm not sure how common it is.
{quote}
I agree: {{minExactCount=100}} and {{minExactCount=1000}} on the same index 
might be uncommon, but a query with a {{minExactCount}} restriction being able 
to use a cache entry from a query without one might be more interesting. 
Anyhow, I've speculatively opened 
[https://github.com/apache/lucene-solr/pull/1530], though perhaps a new ticket 
would be clearer since this one is now closed.
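
For illustration, the compatibility rule the PR is aiming at could be as simple 
as this (field names are my assumption, not the actual patch):
{code:java}
// An entry cached without minExactCount was computed with exact counts
// (effectively minExactCount == Integer.MAX_VALUE), so it satisfies any
// request whose required exactness threshold is the same or lower.
static boolean cachedEntrySatisfies(int cachedMinExactCount, int requestedMinExactCount) {
  return cachedMinExactCount >= requestedMinExactCount;
}
{code}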

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Fix For: master (9.0), 8.6
>
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke opened a new pull request #1530: (work-in-progress) SOLR-13289: QueryResultKey.[nc_]minExactCount

2020-05-22 Thread GitBox


cpoerschke opened a new pull request #1530:
URL: https://github.com/apache/lucene-solr/pull/1530


   Allow with-minExactCount searches to use query cache entries from 
without-minExactCount searches.
   
   https://issues.apache.org/jira/browse/SOLR-13289



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14280) SolrConfig logging not helpful

2020-05-22 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114348#comment-17114348
 ] 

Jason Gerlowski commented on SOLR-14280:


bq. I only fixed one line but maybe it would be better to replace all the 
similar log lines.
By similar log lines, are you referring to similar places in SolrConfig?  Or 
are you just referring generically to places where we log warnings/errors 
without including the exception message?

For what it's worth, I'd be happy to commit this as-is.  There's definitely 
other places where we can improve our logging, but I don't want to let the 
perfect get in the way of the good.  If you've got other places you'd like to 
address as part of this jira, feel free.  But if you don't have immediate plans 
for that, I'll merge your one-liner patch as is so people can start benefiting 
from it.

> SolrConfig logging not helpful
> --
>
> Key: SOLR-14280
> URL: https://issues.apache.org/jira/browse/SOLR-14280
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14280-01.patch
>
>
> SolrConfig prints out a warning message if it's not able to add files to the 
> classpath, but this message is not too helpful:
> {noformat}
> o.a.s.c.SolrConfig Couldn't add files from 
> /opt/cloudera/parcels/CDH-7.1.1-1.cdh7.1.1.p0.1850855/lib/solr/dist filtered 
> by solr-langid-\d.*\.jar to classpath: 
> /opt/cloudera/parcels/CDH-7.1.1-1.cdh7.1.1.p0.1850855/lib/solr/
> dist {noformat}
> The reason should be at the end of the log message, but it just repeats the 
> problematic file name.
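> For reference, a plausible one-liner along the lines of the attached patch 
> (variable names illustrative) is to pass the caught exception to the logger, 
> so SLF4J appends its message and stack trace:
> {code:java}
> log.warn("Couldn't add files from {} filtered by {} to classpath", baseDir, regex, e);
> {code}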



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14510) Remove deprecations added with the BlockMax WAND support

2020-05-22 Thread Tomas Eduardo Fernandez Lobbe (Jira)
Tomas Eduardo Fernandez Lobbe created SOLR-14510:


 Summary: Remove deprecations added with the BlockMax WAND support
 Key: SOLR-14510
 URL: https://issues.apache.org/jira/browse/SOLR-14510
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (9.0)
Reporter: Tomas Eduardo Fernandez Lobbe


I'd like to remove some of the deprecations added by the work in SOLR-13289. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-05-22 Thread Dan Fox (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114307#comment-17114307
 ] 

Dan Fox commented on SOLR-13749:


{quote}Instead of {{method=ccjoin}}, lets do {{method=crossCollection}}. Having 
"join" in there is redundant given the context. And no reason to be 
ultra-concise.
{quote}
{quote}Lets make this also work for the default. In 
{{org.apache.solr.search.JoinQParserPlugin#parse}} which detects explicit vs 
default method, you could modify that if the default index method fails, make 
an attempt with crossCollection. Also, maybe tweak the exception of the 
existing fromIndex check failure to mention the new method (or not; your 
preference).
{quote}
Sounds good to me - I'll work on making those changes.
{quote}Can't routerField be an (optional) query _parameter_ instead of 
demanding pre-configuration?
{quote}
It could be.  The reason I'd made that configurable at the plugin level is that 
the routerField can only ever be one field for any given collection (the field 
that the document IDs were prefixed with at index time), so it should have the 
same value for every query against that collection.  If it was a query 
parameter, and you forgot to set that parameter, you might end up with awful 
performance, or if you set it to the wrong value, you could end up with 
incomplete results.  So I thought it'd be better to have it configured in the 
plugin, so it would only have to be set one time, and then the query parser 
would automatically do the right thing, rather than having to think about it 
every time you write a join query.

There is also a "routed" parameter you can set on the query to tell it that 
your collection was routed by the join key, which you can use without 
pre-configuration.
{quote}Can you please remove CrossCollectionJoinQParserPlugin or explain why it 
should stay?
{quote}
Yeah, I guess there's no reason for that to exist anymore.  I'll take it out.
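
A rough sketch of that fallback in {{JoinQParserPlugin#parse}} (helper names 
are illustrative, not committed code):
{code:java}
// With no explicit method= local param, try the existing local "index" join
// first; if fromIndex doesn't resolve to a local core, retry cross-collection.
try {
  return parseAsLocalIndexJoin(localParams);
} catch (SolrException e) {
  return parseAsCrossCollectionJoin(localParams);
}
{code}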

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It 
> can call out to a remote collection to get a set of join keys to be used as 
> a filter against the local collection, doing an intersection based on join 
> keys between the 2 collections.
> The second one is the Hash Range query parser, which lets you specify a 
> field name and a hash range; only the documents that would have hashed to 
> that range are returned.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is set up with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither of zkHost or solrUrl are specified, the local Zookeeper cluster 
> will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional 
> )|
> |from|Required|The join key field name in the external collection ( required 
> )|
> |to|Required|The join key field name in the local collection|
> |v|See 

[jira] [Commented] (SOLR-14467) inconsistent server errors combining relatedness() with allBuckets:true

2020-05-22 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114300#comment-17114300
 ] 

Michael Gibney commented on SOLR-14467:
---

Oh, and in light of this conversation, I'm guessing we should probably 
force-push/overwrite all of my most recent (last 5, SOLR-14467-related commits) 
at [PR 
#751|https://github.com/apache/lucene-solr/pull/751/commits/add5d8ed168faf872ed62212d388f09284fd04b8]?

> inconsistent server errors combining relatedness() with allBuckets:true
> ---
>
> Key: SOLR-14467
> URL: https://issues.apache.org/jira/browse/SOLR-14467
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14467.patch, SOLR-14467_test.patch
>
>
> While working on randomized testing for SOLR-13132 I discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manually reproduce this, but I have been 
> able to trigger the failures with a trivial patch to 
> {{TestCloudJSONFacetSKG}}, which I will attach.
> Based on the nature of the failures it looks like it may have something to do 
> with multiple segments of different sizes, and/or resizing the SlotAccs?
> The relatedness() function doesn't have many (any?) existing tests in place 
> that leverage "allBuckets", so this is probably a bug that has always existed 
> -- it's possible it may be excessively cumbersome to fix, and we might 
> need/want to just document that incompatibility and add some code to try to 
> detect if the user combines these options and, if so, fail with a 400 error?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14495) Fix or suppress warnings in solr/search/function

2020-05-22 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14495.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Fix or suppress warnings in solr/search/function
> 
>
> Key: SOLR-14495
> URL: https://issues.apache.org/jira/browse/SOLR-14495
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14495) Fix or suppress warnings in solr/search/function

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114293#comment-17114293
 ] 

ASF subversion and git services commented on SOLR-14495:


Commit 5e7be63ca7fbd234526fceb2d7a0594a54f90670 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e7be63 ]

SOLR-14495: Fix or suppress warnings in solr/search/function


> Fix or suppress warnings in solr/search/function
> 
>
> Key: SOLR-14495
> URL: https://issues.apache.org/jira/browse/SOLR-14495
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14495) Fix or suppress warnings in solr/search/function

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114295#comment-17114295
 ] 

ASF subversion and git services commented on SOLR-14495:


Commit 675956c0041b18d48a7c059ea458c49f5310d74a in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=675956c ]

SOLR-14495: Fix or suppress warnings in solr/search/function


> Fix or suppress warnings in solr/search/function
> 
>
> Key: SOLR-14495
> URL: https://issues.apache.org/jira/browse/SOLR-14495
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14467) inconsistent server errors combining relatedness() with allBuckets:true

2020-05-22 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114234#comment-17114234
 ] 

Michael Gibney commented on SOLR-14467:
---

This sounds good to me.

One general question: when {{SlotContext.isAllBuckets()==true}}, what would be 
returned by {{SlotContext.getSlotQuery()}}? I agree that using the base 
DocSet/domain for the entire facet as the slotContext in SpecialSlotAcc would 
be misleading. (I had also wondered about using {{field:[* TO *]}}, which would 
be slightly different, but equally misleading for all the same reasons you 
identified.) So if there's no single slotQuery that _wouldn't_ be misleading in 
the allBuckets case, the options could be:
 # treat {{isAllBuckets()==true}} as mutually exclusive with 
{{getSlotQuery()}}, and have the latter either return {{null}} or throw an 
{{IllegalStateException}} (see the sketch after option 2's code)?
 # allow SlotContext to wrap the original (i.e. not allBuckets) slot and 
slotContext, e.g.:
{code:java}
public static final class SlotContext {
  private final boolean isAllBuckets;
  private final Query slotQuery;
  private int originalSlot = -1;
  private IntFunction<SlotContext> originalContext;

  /** constructs a normal instance */
  public SlotContext(Query slotQuery) {
    this.slotQuery = slotQuery;
    this.isAllBuckets = false;
  }

  /** constructs an allBuckets instance */
  public SlotContext() {
    this.slotQuery = null;
    this.isAllBuckets = true;
  }

  public Query getSlotQuery() {
    return isAllBuckets ? originalContext.apply(originalSlot).getSlotQuery() : slotQuery;
  }

  public boolean isAllBuckets() {
    return isAllBuckets;
  }

  /** provides access to the original slot, if desired? */
  public int getOriginalSlot() {
    return originalSlot;
  }

  /** called by SpecialSlotAcc before passing to collectAcc/otherAccs */
  public void updateAllBuckets(int originalSlot, IntFunction<SlotContext> originalContext) {
    this.originalSlot = originalSlot;
    this.originalContext = originalContext;
  }
}
{code}
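
For option 1, a minimal sketch of the mutually-exclusive variant (same class; 
my illustration rather than a concrete proposal) could be:
{code:java}
public Query getSlotQuery() {
  if (isAllBuckets) {
    // there is no single slot query that wouldn't be misleading for allBuckets
    throw new IllegalStateException("getSlotQuery() is undefined when isAllBuckets()==true");
  }
  return slotQuery;
}
{code}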

FWIW I thought a little more about why I proceeded under the assumption that 
{{relatedness()}} would be meaningful for {{allBuckets}}. I actually _do_ think 
it could be relevant, but in a way that (upon further reflection) I think can 
only be practically calculated using sweep collection. And for that matter, 
considering the problems and awkwardness you identified with making 
{{RelatednessAgg}} directly aware of {{allBucketsSlot}} and "double-counting", 
it could (I think?) best be supported by extending the concept of "sweep 
collection" to cover "normal" {{allBuckets}} collection. I'm not exactly sure 
what that would look like, but in any event it seems clear that it would be a 
different issue, if there's even any interest in pursuing it.

To briefly expand on why I think {{relatedness()}} might be meaningful for 
{{allBuckets}}: at a high level, say you have 5 buckets returned out of 10 
total buckets, and each of the 5 returned is perfectly correlated 
(relatedness==1.0). Despite this, you have no way of knowing how "special" 
these buckets are in the overall context of the field you're faceting on ... 
perhaps all 10 bucket values are perfectly correlated; or perhaps the 5 buckets 
that weren't returned are perfectly _negatively_ correlated 
(relatedness==-1.0). I _think_ that calculating relatedness over allBuckets 
(with fgCount="sum(fgCount) over all buckets", bgCount="sum(bgCount) over all 
buckets", and fgSize and bgSize each multiplied by the total number of buckets) 
should give you a meaningful way of normalizing/contextualizing relatedness 
scores of individual buckets. But the non-sweep implementation of relatedness, 
being driven by the presence of values in the base domain, would be a bad fit 
for this, since it ignores all buckets not represented in the base domain 
(regardless of whether they might have values in the fgSet or bgSet that would 
be relevant to calculating a meaningful "allBuckets" relatedness score).

> inconsistent server errors combining relatedness() with allBuckets:true
> ---
>
> Key: SOLR-14467
> URL: https://issues.apache.org/jira/browse/SOLR-14467
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14467.patch, SOLR-14467_test.patch
>
>
> While working on randomized testing for SOLR-13132 I discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manually reproduce this, but I have been 
> able to trigger the failures with a trivial patch to 
> {{TestCloudJSONFacetSKG}}, which I will attach.
> Based on the nature 

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1527: SOLR-14384 Stack SolrRequestInfo

2020-05-22 Thread GitBox


dsmiley commented on a change in pull request #1527:
URL: https://github.com/apache/lucene-solr/pull/1527#discussion_r429352375



##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -38,7 +40,13 @@
 
 
 public class SolrRequestInfo {
-  protected final static ThreadLocal<SolrRequestInfo> threadLocal = new ThreadLocal<>();
+
+  protected final static int capacity = 150;
+
+  protected final static ThreadLocal<Deque<SolrRequestInfo>> threadLocal = ThreadLocal.withInitial(() -> {
+    Deque<SolrRequestInfo> stack = new ArrayDeque<>(capacity);

Review comment:
   I think you can simply do a method reference to ArrayDeque::new.  No 
need to initialize with an internal capacity.  Honestly I'd prefer a LinkedList 
because in practice this stack will be extremely small (zero, one, sometimes 
two, very unlikely more).  But I leave that impl choice to you.
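   A minimal sketch of that suggestion (same field, just without the capacity):
{code:java}
protected final static ThreadLocal<Deque<SolrRequestInfo>> threadLocal =
    ThreadLocal.withInitial(ArrayDeque::new);
{code}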

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -38,7 +40,13 @@
 
 
 public class SolrRequestInfo {
-  protected final static ThreadLocal<SolrRequestInfo> threadLocal = new ThreadLocal<>();
+
+  protected final static int capacity = 150;

Review comment:
   This should be MAX_STACK_SIZE I think.  
   Can anyone ( @mkhludnev ?) venture to guess how, realistically, we might 
reach upwards of 150?  I can't imagine more than a few, let alone 150.

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +60,48 @@
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-    return threadLocal.get();
+    if (threadLocal.get().isEmpty()) return null;
+    return threadLocal.get().peek();
   }
 
   public static void setRequestInfo(SolrRequestInfo info) {
-    // TODO: temporary sanity check... this can be changed to just an assert in the future
-    SolrRequestInfo prev = threadLocal.get();
-    if (prev != null) {
-      log.error("Previous SolrRequestInfo was not closed!  req={}", prev.req.getOriginalParams());
-      log.error("prev == info : {}", prev.req == info.req, new RuntimeException());
+    if (info == null) {
+      throw new IllegalArgumentException("SolrRequestInfo is null");
+    } else {
+      if (threadLocal.get().size() <= capacity) {
+        threadLocal.get().push(info);
+      } else {
+        log.error("SolrRequestInfo Stack is full");
+      }
     }
-    assert prev == null;
-
-    threadLocal.set(info);
   }
 
   public static void clearRequestInfo() {
-    try {
-      SolrRequestInfo info = threadLocal.get();
-      if (info != null && info.closeHooks != null) {
-        for (Closeable hook : info.closeHooks) {
-          try {
-            hook.close();
-          } catch (Exception e) {
-            SolrException.log(log, "Exception during close hook", e);
-          }
+    if (threadLocal.get().isEmpty()) {
+      log.error("clearRequestInfo called too many times");
+    } else {
+      SolrRequestInfo info = threadLocal.get().pop();
+      closeHooks(info);
+    }
+  }
+
+  public static void reset() {

Review comment:
   Add javadoc please

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +60,48 @@
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-    return threadLocal.get();
+    if (threadLocal.get().isEmpty()) return null;
+    return threadLocal.get().peek();
   }
 
   public static void setRequestInfo(SolrRequestInfo info) {

Review comment:
   Add javadoc please

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +60,48 @@
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-    return threadLocal.get();
+    if (threadLocal.get().isEmpty()) return null;
+    return threadLocal.get().peek();
   }
 
   public static void setRequestInfo(SolrRequestInfo info) {
-    // TODO: temporary sanity check... this can be changed to just an assert in the future
-    SolrRequestInfo prev = threadLocal.get();
-    if (prev != null) {
-      log.error("Previous SolrRequestInfo was not closed!  req={}", prev.req.getOriginalParams());
-      log.error("prev == info : {}", prev.req == info.req, new RuntimeException());
+    if (info == null) {
+      throw new IllegalArgumentException("SolrRequestInfo is null");
+    } else {
+      if (threadLocal.get().size() <= capacity) {
+        threadLocal.get().push(info);
+      } else {
+        log.error("SolrRequestInfo Stack is full");
+      }
     }
-    assert prev == null;
-
-    threadLocal.set(info);
   }
 
   public static void clearRequestInfo() {

Review comment:
   Add javadoc please





This is an automated 

[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler

2020-05-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114230#comment-17114230
 ] 

David Smiley commented on SOLR-14470:
-

Ok; makes sense.

> Add streaming expressions to /export handler
> 
>
> Key: SOLR-14470
> URL: https://issues.apache.org/jira/browse/SOLR-14470
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Export Writer, streaming expressions
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Many streaming scenarios would greatly benefit from the ability to perform 
> partial rollups (or other transformations) as early as possible, in order to 
> minimize the amount of data that has to be sent from shards to the 
> aggregating node.
> This can be implemented as a subset of streaming expressions that processes 
> the data directly inside each local {{ExportHandler}} and outputs only the 
> records from the resulting stream. 
> Conceptually it would be similar to the way Hadoop {{Combiner}} works. As is 
> the case with {{Combiner}}, because the input data is processed in batches 
> there would be no guarantee that only 1 record per unique sort values would 
> be emitted - in fact, in most cases multiple partial aggregations would be 
> emitted. Still, in many scenarios this would allow reducing the amount of 
> data to be sent by several orders of magnitude.
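> For example (collection and field names made up, and assuming an {{input()}} 
> source bound to the shard-local export stream, with the /export sort set to 
> "dept asc"), a partial rollup inside each {{ExportHandler}} might look like:
> {code}
> rollup(input(), over="dept", sum(salary))
> {code}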



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-05-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114228#comment-17114228
 ] 

David Smiley commented on SOLR-13749:
-

My _only_ concern with a non-whitelisted zkHost is that you could join from 
another cluster (you call this the "destination" but I find that orientation 
confusing) and _maybe_ somehow that could be used to get that data out?  I 
don't know how it could; doesn't seem realistically useful to a hacker.  And 
besides, additional network or other security measures could exist to further 
protect from that.  So nevermind.

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It 
> can call out to a remote collection to get a set of join keys to be used as 
> a filter against the local collection, doing an intersection based on join 
> keys between the 2 collections.
> The second one is the Hash Range query parser, which lets you specify a 
> field name and a hash range; only the documents that would have hashed to 
> that range are returned.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is set up with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither of zkHost or solrUrl are specified, the local Zookeeper cluster 
> will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional 
> )|
> |from|Required|The join key field name in the external collection ( required 
> )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered 
> valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate 
> results.  
> After the ttl period has expired, the XCJF query will re-execute the join 
> against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local 
> param.|
>  
> Example Solr Config.xml changes:
>  
>  <cache name="hash_vin"
>  

[jira] [Created] (SOLR-14509) Make minExactCount's default configurable

2020-05-22 Thread Tomas Eduardo Fernandez Lobbe (Jira)
Tomas Eduardo Fernandez Lobbe created SOLR-14509:


 Summary: Make minExactCount's default configurable
 Key: SOLR-14509
 URL: https://issues.apache.org/jira/browse/SOLR-14509
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Tomas Eduardo Fernandez Lobbe


In SOLR-13289 we added the {{minExactCount}} parameter for using the BlockMax 
WAND algorithm. Currently, when unset, the default {{minExactCount}} is 
{{Integer.MAX_VALUE}}. It would be nice to be able to set this default in 
{{solrconfig.xml}}, so that users don't need to send it per request (or 
configure it in all request handlers).
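
Until then, the workaround is a per-handler default, e.g.:
{code:xml}
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <int name="minExactCount">1000</int>
  </lst>
</requestHandler>
{code}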



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread Tomas Eduardo Fernandez Lobbe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Eduardo Fernandez Lobbe resolved SOLR-13289.
--
Fix Version/s: 8.6
   master (9.0)
   Resolution: Fixed

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Fix For: master (9.0), 8.6
>
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114205#comment-17114205
 ] 

ASF subversion and git services commented on SOLR-13289:


Commit 60e5cff87f56d9c4aebc5aeab63e10bd24440087 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=60e5cff ]

SOLR-13289: Add Support for BlockMax WAND (#1456)

Add support for BlockMax WAND via a minExactHits parameter. Hits will be 
counted accurately at least until this value, and above that, the count will be 
an approximation. In distributed search requests, the count will be per shard, 
so potentially the count will be accurately counted until numShards * 
minExactHits. The response will include the value numFoundExact which can be 
true (The value in numFound is exact) or false (the value in numFound is an 
approximation).
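
An approximate-count response then carries a fragment like this (values are 
illustrative):
{code}
"response": {"numFound": 15123, "numFoundExact": false, "start": 0, "docs": [ ... ]}
{code}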


> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114207#comment-17114207
 ] 

ASF subversion and git services commented on SOLR-13289:


Commit d5f8aab8614f61348782b09de8443c22e7c26bd2 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d5f8aab ]

SOLR-13289: Rename minExactHits to minExactCount (#1511)


> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114206#comment-17114206
 ] 

ASF subversion and git services commented on SOLR-13289:


Commit 62a3476c89afb81b6ab07a2b3dbd6b27a6634fe7 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=62a3476 ]

SOLR-13289: Use the final collector's scoreMode (#1517)

This is needed in case a PostFilter changes the scoreMode


> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114208#comment-17114208
 ] 

ASF subversion and git services commented on SOLR-13289:


Commit d97e6fe821a8025a66ba6d0f1d558a68d7789aa5 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d97e6fe ]

SOLR-13289: Add Refguide changes (#1501)



> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114209#comment-17114209
 ] 

ASF subversion and git services commented on SOLR-13289:


Commit c8bfe974b26a8963ef20d1fbb283c8a4dddc52b6 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c8bfe97 ]

SOLR-13289: Add CHANGES entry to 8.x


> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-11334) UnifiedSolrHighlighter returns an error when hl.fl delimited by ", "

2020-05-22 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114145#comment-17114145
 ] 

Lucene/Solr QA commented on SOLR-11334:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 
14s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-11334 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12885761/SOLR-11334.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 78f4a5b8ff8 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/755/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/755/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.
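
As a rough illustration of what the attached patch addresses (a hedged
sketch with a hypothetical helper, not the patch itself): splitting hl.fl on
commas and whitespace while discarding zero-length tokens avoids the
"undefined field" error shown in the description below.

{code}
import java.util.Arrays;

public class FieldListSplitter {
  // Split an hl.fl value such as "name, manu" into {"name", "manu"},
  // dropping empty tokens so a zero-length string is never treated as a
  // field name.
  public static String[] split(String hlFl) {
    return Arrays.stream(hlFl.split("[,\\s]+"))
        .filter(token -> !token.isEmpty())
        .toArray(String[]::new);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(split("name, manu"))); // [name, manu]
  }
}
{code}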



> UnifiedSolrHighlighter returns an error when hl.fl delimited by ", "
> 
>
> Key: SOLR-11334
> URL: https://issues.apache.org/jira/browse/SOLR-11334
> Project: Solr
>  Issue Type: Bug
>  Components: highlighter
>Affects Versions: 6.6
> Environment: Ubuntu 17.04 (GNU/Linux 4.10.0-33-generic x86_64)
> Java HotSpot 64-Bit Server VM(build 25.114-b01, mixed mode)
>Reporter: Yasufumi Mizoguchi
>Priority: Trivial
> Attachments: SOLR-11334.patch
>
>
> UnifiedSolrHighlighter (hl.method=unified) misjudges a zero-length string as 
> a field name and returns an error when hl.fl is delimited by ", ".
> request:
> {code}
> $ curl -XGET 
> "http://localhost:8983/solr/techproducts/select?fl=name,%20manu&hl.fl=name,%20manu&hl.method=unified&hl=on&indent=on&q=corsair&wt=json"
> {code}
> response:
> {code}
> {
>   "responseHeader":{
> "status":400,
> "QTime":8,
> "params":{
>   "q":"corsair",
>   "hl":"on",
>   "indent":"on",
>   "fl":"name, manu",
>   "hl.fl":"name, manu",
>   "hl.method":"unified",
>   "wt":"json"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 
> (PC 3200) System Memory - Retail",
> "manu":"Corsair Microsystems Inc."},
>   {
> "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 
> 400 (PC 3200) Dual Channel Kit System Memory - Retail",
> "manu":"Corsair Microsystems Inc."}]
>   },
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"undefined field ",
> "code":400}}
> {code}
> DefaultHighlighter's response:
> {code}
> {
>   "responseHeader":{
> "status":0,
> "QTime":5,
> "params":{
>   "q":"corsair",
>   "hl":"on",
>   "indent":"on",
>   "fl":"name, manu",
>   "hl.fl":"name, manu",
>   "hl.method":"original",
>   "wt":"json"}},
>   "response":{"numFound":2,"start":0,"docs":[
>  

[jira] [Resolved] (SOLR-14443) Make SolrLogPostTool resilient to unexpected requests

2020-05-22 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-14443.

Fix Version/s: 8.6
   master (9.0)
   Resolution: Fixed

> Make SolrLogPostTool resilient to unexpected requests
> -
>
> Key: SOLR-14443
> URL: https://issues.apache.org/jira/browse/SOLR-14443
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Trivial
> Fix For: master (9.0), 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When SolrLogPostTool parses log messages corresponding to incoming requests, 
> it sets various predefined fields based on the parameters on the request.  
> For example, it sets a rows_i field, a wt_s field, and so on.
> This logic works for most requests, but if the log-parser encounters requests 
> with multiple of these params (e.g. rows), it will blithely add them to the 
> SolrInputDocument, and error out when Solr rejects the eventual update 
> request because it is attempting to put multiple values into a single-valued 
> field.
> We can do two things to fix this.
> # Make SolrLogPostTool's "posting" code resilient to individual update 
> failures. It doesn't make any sense to crash the entire posting routine just 
> because one batch (or one log message) was malformed.
> # Tweak the field parsing logic to be more resilient to the specific 
> "redundant query params" case I encountered here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14443) Make SolrLogPostTool resilient to unexpected requests

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114122#comment-17114122
 ] 

ASF subversion and git services commented on SOLR-14443:


Commit 322fe54715a646525b7b0c0717977092ca456be0 in lucene-solr's branch 
refs/heads/branch_8x from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=322fe54 ]

SOLR-14443: Make SolrLogPostTool resilient to odd requests (#1525)
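
As a rough illustration of the "redundant query params" guard described
below (a hedged sketch; the helper name and the Map shape are assumptions,
not the committed code):

{code}
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class LogFieldGuard {
  // When a logged request carries a param more than once (e.g.
  // rows=10&rows=20), keep only the first value so a single-valued field
  // such as rows_i never receives multiple values.
  public static void setSingleValued(SolrInputDocument doc, String field,
                                     Map<String, String[]> params, String param) {
    String[] values = params.get(param);
    if (values != null && values.length > 0) {
      doc.setField(field, values[0]);
    }
  }
}
{code}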


> Make SolrLogPostTool resilient to unexpected requests
> -
>
> Key: SOLR-14443
> URL: https://issues.apache.org/jira/browse/SOLR-14443
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When SolrLogPostTool parses log messages corresponding to incoming requests, 
> it sets various predefined fields based on the parameters on the request.  
> For example, it sets a rows_i field, a wt_s field, and so on.
> This logic works for most requests, but if the log-parser encounters requests 
> with multiple of these params (e.g. rows), it will blithely add them to the 
> SolrInputDocument, and error out when Solr rejects the eventual update 
> request because it is attempting to put multiple values into a single-valued 
> field.
> We can do two things to fix this.
> # Make SolrLogPostTool's "posting" code resilient to individual update 
> failures. It doesn't make any sense to crash the entire posting routine just 
> because one batch (or one log message) was malformed.
> # Tweak the field parsing logic to be more resilient to the specific 
> "redundant query params" case I encountered here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14508) TopicStreams do not allow use of collection aliases

2020-05-22 Thread Jonathan Nightingale (Jira)
Jonathan Nightingale created SOLR-14508:
---

 Summary: TopicStreams do not allow use of collection aliases
 Key: SOLR-14508
 URL: https://issues.apache.org/jira/browse/SOLR-14508
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Affects Versions: 8.1, 7.4
Reporter: Jonathan Nightingale


I'm trying to use the SolrJ TopicStream class to listen for data in a 
collection by referencing its alias. It complains that there are no slices 
for that alias. On investigation, it looked like the SolrJ code was 
explicitly NOT resolving aliases in TopicStream.

I made some minor changes, rebuilt, and it seems to work: in the two places 
where getSlices is called, I set the useAlias flag to true.

{code}
protected void constructStreams() throws IOException {
  try {
    ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader();
    // last argument is the useAlias flag, changed from false to true
    Slice[] slices = CloudSolrStream.getSlices(this.collection, zkStateReader, true);
{code}

and also here:

{code}
private void getCheckpoints() throws IOException {
  this.checkpoints = new HashMap<>();
  ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader();
  // last argument is the useAlias flag, changed from false to true
  Slice[] slices = CloudSolrStream.getSlices(this.collection, zkStateReader, true);
{code}

I don't know how many versions this affects, but I'm sure it's a bunch. 
Also, I hit the same errors using the REST API as I did using SolrJ, so this 
SolrJ-side change alone would probably not fix that path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6966) Contribution: Codec for index-level encryption

2020-05-22 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114085#comment-17114085
 ] 

Bruno Roustant commented on LUCENE-6966:


+1

I'm going to work on this simple Directory-based approach soon. I've created 
LUCENE-9379 to track it as a separate issue.

I'll try to draw inspiration from the previous work (see the related links), 
and I'll share my plan up front to start the discussion.

> Contribution: Codec for index-level encryption
> --
>
> Key: LUCENE-6966
> URL: https://issues.apache.org/jira/browse/LUCENE-6966
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Renaud Delbru
>Priority: Major
>  Labels: codec, contrib
> Attachments: Encryption Codec Documentation.pdf, LUCENE-6966-1.patch, 
> LUCENE-6966-2-docvalues.patch, LUCENE-6966-2.patch
>
>
> We would like to contribute a codec, developed as part of an engagement with 
> a customer, that enables the encryption of sensitive data in the index. We 
> think this could be of interest to the community.
> Below is a description of the project.
> h1. Introduction
> In comparison with approaches where all data is encrypted (e.g., file system 
> encryption, index output / directory encryption), encryption at a codec level 
> enables more fine-grained control on which block of data is encrypted. This 
> is more efficient since less data has to be encrypted. This also gives more 
> flexibility such as the ability to select which field to encrypt.
> Some of the requirements for this project were:
> * The performance impact of the encryption should be reasonable.
> * The user can choose which field to encrypt.
> * Key management: During the life cycle of the index, the user can provide a 
> new version of his encryption key. Multiple key versions should co-exist in 
> one index.
> h1. What is supported?
> - Block tree terms index and dictionary
> - Compressed stored fields format
> - Compressed term vectors format
> - Doc values format (prototype based on an encrypted index output) - this 
> will be submitted as a separate patch
> - Index upgrader: command to upgrade all the index segments with the latest 
> key version available.
> h1. How is it implemented?
> h2. Key Management
> One index segment is encrypted with a single key version. An index can have 
> multiple segments, each one encrypted using a different key version. The key 
> version for a segment is stored in the segment info.
> The provided codec is abstract, and a subclass is responsible for providing an 
> implementation of the cipher factory. The cipher factory is responsible for 
> creating a cipher instance based on a given key version.
> h2. Encryption Model
> The encryption model is based on AES/CBC with padding. The initialisation 
> vector (IV) is reused for performance reasons, but only on a per-format and 
> per-segment basis.
> While IV reuse is usually considered bad practice, the CBC mode is somewhat 
> resilient to IV reuse. The only "leak" of information this could lead to is 
> being able to tell that two encrypted blocks of data start with the same 
> prefix. However, it is unlikely that two data blocks in an index segment will 
> start with the same data:
> - Stored Fields Format: Each encrypted data block is a compressed block 
> (~4kb) of one or more documents. It is unlikely that two compressed blocks 
> start with the same data prefix.
> - Term Vectors: Each encrypted data block is a compressed block (~4kb) of 
> terms and payloads from one or more documents. It is unlikely that two 
> compressed blocks start with the same data prefix.
> - Term Dictionary Index: The term dictionary index is encoded and encrypted 
> in one single data block.
> - Term Dictionary Data: Each data block of the term dictionary encodes a set 
> of suffixes. It is unlikely to have two dictionary data blocks sharing the 
> same prefix within the same segment.
> - DocValues: A DocValues file will be composed of multiple encrypted data 
> blocks. It is unlikely to have two data blocks sharing the same prefix within 
> the same segment (each one encodes a list of values associated with a 
> field).
> To the best of our knowledge, this model should be safe. However, it would be 
> good if someone with security expertise in the community could review and 
> validate it. 
> h1. Performance
> We report here a performance benchmark we did on an early prototype based on 
> Lucene 4.x. The benchmark was performed on the Wikipedia dataset where all 
> the fields (id, title, body, date) were encrypted. Only the block tree terms 
> and compressed stored fields format were tested at that time. 
> h2. Indexing
> The indexing throughput slightly decreased and is roughly 15% less than with 
> the 

[jira] [Commented] (LUCENE-7368) Remove queryNorm

2020-05-22 Thread Dumitru Daniliuc (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114084#comment-17114084
 ] 

Dumitru Daniliuc commented on LUCENE-7368:
--

You are right: our custom Similarity implementation did not override 
{{queryNorm()}}, so it fell back to {{Similarity.queryNorm()}}, which always 
returned 1.0f. Thanks for your explanation and for helping us debug this!
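
For anyone else hitting this, a minimal sketch of the 6.x override point (an
illustration only; ClassicSimilarity already normalizes this way, the point
is that a custom Similarity must not inherit a queryNorm() that returns
1.0f):

{code}
import org.apache.lucene.search.similarities.ClassicSimilarity;

public class MySimilarity extends ClassicSimilarity {
  // Without this 1/sqrt(sumOfSquaredWeights) normalization, the IDF factor
  // in queryWeight is never cancelled and scores gain an extra IDF term.
  @Override
  public float queryNorm(float sumOfSquaredWeights) {
    return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
  }
}
{code}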

> Remove queryNorm
> 
>
> Key: LUCENE-7368
> URL: https://issues.apache.org/jira/browse/LUCENE-7368
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Major
> Fix For: 7.0
>
> Attachments: LUCENE-7368.patch
>
>
> Splitting LUCENE-7347 into smaller tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9379) Directory based approach for index encryption

2020-05-22 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-9379:
--

 Summary: Directory based approach for index encryption
 Key: LUCENE-9379
 URL: https://issues.apache.org/jira/browse/LUCENE-9379
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Bruno Roustant
Assignee: Bruno Roustant


The goal is to provide optional encryption of the index, with a scope limited 
to an encryptable Lucene Directory wrapper.

Encryption is at rest on disk, not in memory.

This simple approach should fit any Codec, since it is orthogonal to the 
codec layer and modifies APIs as little as possible.

Use a standard encryption method, and limit the performance/memory impact as 
much as possible.

Determine how callers provide encryption keys; they must not be stored on disk.
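
A minimal sketch of the shape such a wrapper could take (an assumption for
illustration, not a committed design; the cipher wiring is elided):

{code}
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;

public abstract class EncryptingDirectory extends FilterDirectory {
  protected EncryptingDirectory(Directory in) {
    super(in);
  }

  @Override
  public IndexOutput createOutput(String name, IOContext context) throws IOException {
    // Encrypt bytes on the way to disk.
    return encrypt(in.createOutput(name, context));
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    // Decrypt bytes on the way back into memory.
    return decrypt(in.openInput(name, context));
  }

  // Implementations wrap the streams with a cipher keyed by a
  // caller-provided key, which is never stored on disk.
  protected abstract IndexOutput encrypt(IndexOutput out) throws IOException;
  protected abstract IndexInput decrypt(IndexInput in) throws IOException;
}
{code}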



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija merged pull request #1525: SOLR-14443: Make SolrLogPostTool resilient to odd requests

2020-05-22 Thread GitBox


gerlowskija merged pull request #1525:
URL: https://github.com/apache/lucene-solr/pull/1525


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14443) Make SolrLogPostTool resilient to unexpected requests

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114071#comment-17114071
 ] 

ASF subversion and git services commented on SOLR-14443:


Commit 78f4a5b8ff854861ac6ad17c27016e222463e54c in lucene-solr's branch 
refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=78f4a5b ]

SOLR-14443: Make SolrLogPostTool resilient to odd requests (#1525)



> Make SolrLogPostTool resilient to unexpected requests
> -
>
> Key: SOLR-14443
> URL: https://issues.apache.org/jira/browse/SOLR-14443
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When SolrLogPostTool parses log messages corresponding to incoming requests, 
> it sets various predefined fields based on the parameters on the request.  
> For example, it sets a rows_i field, a wt_s field, and so on.
> This logic works for most requests, but if the log-parser encounters requests 
> with multiple of these params (e.g. rows), it will blithely add them to the 
> SolrInputDocument, and error out when Solr rejects the eventual update 
> request because it is attempting to put multiple values into a single-valued 
> field.
> We can do two things to fix this.
> # Make SolrLogPostTool's "posting" code resilient to individual update 
> failures. It doesn't make any sense to crash the entire posting routine just 
> because one batch (or one log message) was malformed.
> # Tweak the field parsing logic to be more resilient to the specific 
> "redundant query params" case I encountered here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9330) Make SortField responsible for index sorting

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114059#comment-17114059
 ] 

ASF subversion and git services commented on LUCENE-9330:
-

Commit ee2c798cd8891c7c4c0f857c54f3be97d9e32883 in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ee2c798 ]

LUCENE-9330: Fix Lucene70Norms test in backwards-codecs


> Make SortField responsible for index sorting
> 
>
> Key: LUCENE-9330
> URL: https://issues.apache.org/jira/browse/LUCENE-9330
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Index sorting is currently handled inside Sorter and MultiSorter, with 
> hard-coded implementations dependent on SortField types.  This means that you 
> can't sort by custom SortFields, and also that the logic for handling 
> specific sort types is split between several unrelated classes.
> SortFields should instead be able to implement their own index sorting 
> methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9330) Make SortField responsible for index sorting

2020-05-22 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9330.
---
Fix Version/s: 8.6
 Assignee: Alan Woodward
   Resolution: Fixed

> Make SortField responsible for index sorting
> 
>
> Key: LUCENE-9330
> URL: https://issues.apache.org/jira/browse/LUCENE-9330
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Index sorting is currently handled inside Sorter and MultiSorter, with 
> hard-coded implementations dependent on SortField types.  This means that you 
> can't sort by custom SortFields, and also that the logic for handling 
> specific sort types is split between several unrelated classes.
> SortFields should instead be able to implement their own index sorting 
> methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9330) Make SortField responsible for index sorting

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114026#comment-17114026
 ] 

ASF subversion and git services commented on LUCENE-9330:
-

Commit ad0cefb8ec279e4fae0b104d487deb36ff2f9d27 in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad0cefb ]

LUCENE-9330: Make SortFields responsible for index sorting and serialization 
(#1440)

This commit adds a new class IndexSorter which handles how a sort should be 
applied
to documents in an index:

* how to serialize/deserialize sort info in the segment header
* how to sort documents within a segment
* how to sort documents from merging segments

SortField has a getIndexSorter() method, which will return null if the sort 
cannot be used
to sort an index (eg if it uses scores or other query-dependent values). This 
also requires a
new Codec as there is a change to the SegmentInfoFormat
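
In code terms, the new contract reads roughly like this (a sketch against
the API described above, not an excerpt from the patch):

{code}
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class IndexSortCheck {
  // A Sort is usable as an index sort only if every SortField supplies an
  // IndexSorter; query-dependent sorts (e.g. by score) return null and must
  // be rejected.
  public static boolean canIndexSort(Sort sort) {
    for (SortField sortField : sort.getSort()) {
      if (sortField.getIndexSorter() == null) {
        return false;
      }
    }
    return true;
  }
}
{code}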


> Make SortField responsible for index sorting
> 
>
> Key: LUCENE-9330
> URL: https://issues.apache.org/jira/browse/LUCENE-9330
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Alan Woodward
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Index sorting is currently handled inside Sorter and MultiSorter, with 
> hard-coded implementations dependent on SortField types.  This means that you 
> can't sort by custom SortFields, and also that the logic for handling 
> specific sort types is split between several unrelated classes.
> SortFields should instead be able to implement their own index sorting 
> methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7368) Remove queryNorm

2020-05-22 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114021#comment-17114021
 ] 

Adrien Grand commented on LUCENE-7368:
--

It looks to me like the problem is in 6.6.6, not 7.7.2. Seeing queryNorm=1 
suggests that your custom similarity incompletely implements query 
normalization. See for instance what the same explanation looks like with 
ClassicSimilarity: the IDF factor of the queryWeight gets cancelled by the 
queryNorm, and only fieldWeight retains an IDF factor.

> Remove queryNorm
> 
>
> Key: LUCENE-7368
> URL: https://issues.apache.org/jira/browse/LUCENE-7368
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Major
> Fix For: 7.0
>
> Attachments: LUCENE-7368.patch
>
>
> Splitting LUCENE-7347 into smaller tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-22 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114010#comment-17114010
 ] 

Adrien Grand commented on LUCENE-9378:
--

Indeed, the code suggests that block decompression happens only once per 
block. I'm not very familiar with the facets tasks; do they consume all docs 
by any chance? A side effect of bulk-decoding multiple values at once is that 
selective queries get slower, because they likely decompress values they 
don't need, while queries that match most documents, like MatchAllDocsQuery, 
might get faster.

Another factor that probably plays a role here is how compressible the data is. 
The compression logic we're using is fast when data is barely compressible and 
gets slower if the data is highly compressible. So depending on how 
compressible the data is, performance results could be extremely different. 
Maybe we should update the Disk usage tool 
(https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/DiskUsage.70.java)
 to work with the Lucene84 and Lucene86 codecs to get a clearer picture about 
the storage savings on a per-field basis.
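
Separately, the mode-based escape hatch the description proposes could look
roughly like this (a hedged sketch; Lucene84DocValuesFormat.Mode is the
*hypothetical* API under discussion, mirroring Lucene50StoredFieldsFormat.Mode,
and does not exist in 8.5.1):

{code}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
// Hypothetical: today's Lucene84DocValuesFormat has no Mode constructor.
import org.apache.lucene.codecs.lucene84.Lucene84DocValuesFormat;

public class UncompressedBinaryDocValuesCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new Lucene84DocValuesFormat(Lucene84DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedBinaryDocValuesCodec() {
    super("UncompressedBinaryDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    // Segments written with this codec store BINARY doc values
    // uncompressed, trading disk space for query-time throughput.
    return dvFormat;
  }
}
{code}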

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene84DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] romseygeek merged pull request #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization

2020-05-22 Thread GitBox


romseygeek merged pull request #1440:
URL: https://github.com/apache/lucene-solr/pull/1440


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9330) Make SortField responsible for index sorting

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113989#comment-17113989
 ] 

ASF subversion and git services commented on LUCENE-9330:
-

Commit de2bad9039054af753bc2c847565f63f05f4fdd7 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=de2bad9 ]

LUCENE-9330: Make SortFields responsible for index sorting and serialization 
(#1440)

This commit adds a new class IndexSorter which handles how a sort should be 
applied
to documents in an index:

* how to serialize/deserialize sort info in the segment header
* how to sort documents within a segment
* how to sort documents from merging segments

SortField has a getIndexSorter() method, which will return null if the sort 
cannot be used
to sort an index (eg if it uses scores or other query-dependent values). This 
also requires a
new Codec as there is a change to the SegmentInfoFormat

> Make SortField responsible for index sorting
> 
>
> Key: LUCENE-9330
> URL: https://issues.apache.org/jira/browse/LUCENE-9330
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Alan Woodward
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Index sorting is currently handled inside Sorter and MultiSorter, with 
> hard-coded implementations dependent on SortField types.  This means that you 
> can't sort by custom SortFields, and also that the logic for handling 
> specific sort types is split between several unrelated classes.
> SortFields should instead be able to implement their own index sorting 
> methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6966) Contribution: Codec for index-level encryption

2020-05-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113965#comment-17113965
 ] 

Juraj Jurčo commented on LUCENE-6966:
-

+1. I also hope this is not dead; we would appreciate this feature as well.

> Contribution: Codec for index-level encryption
> --
>
> Key: LUCENE-6966
> URL: https://issues.apache.org/jira/browse/LUCENE-6966
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Renaud Delbru
>Priority: Major
>  Labels: codec, contrib
> Attachments: Encryption Codec Documentation.pdf, LUCENE-6966-1.patch, 
> LUCENE-6966-2-docvalues.patch, LUCENE-6966-2.patch
>
>
> We would like to contribute a codec, developed as part of an engagement with 
> a customer, that enables the encryption of sensitive data in the index. We 
> think this could be of interest to the community.
> Below is a description of the project.
> h1. Introduction
> In comparison with approaches where all data is encrypted (e.g., file system 
> encryption, index output / directory encryption), encryption at a codec level 
> enables more fine-grained control on which block of data is encrypted. This 
> is more efficient since less data has to be encrypted. This also gives more 
> flexibility such as the ability to select which field to encrypt.
> Some of the requirements for this project were:
> * The performance impact of the encryption should be reasonable.
> * The user can choose which field to encrypt.
> * Key management: During the life cycle of the index, the user can provide a 
> new version of his encryption key. Multiple key versions should co-exist in 
> one index.
> h1. What is supported?
> - Block tree terms index and dictionary
> - Compressed stored fields format
> - Compressed term vectors format
> - Doc values format (prototype based on an encrypted index output) - this 
> will be submitted as a separate patch
> - Index upgrader: command to upgrade all the index segments with the latest 
> key version available.
> h1. How is it implemented?
> h2. Key Management
> One index segment is encrypted with a single key version. An index can have 
> multiple segments, each one encrypted using a different key version. The key 
> version for a segment is stored in the segment info.
> The provided codec is abstract, and a subclass is responsible for providing an 
> implementation of the cipher factory. The cipher factory is responsible for 
> creating a cipher instance based on a given key version.
> h2. Encryption Model
> The encryption model is based on AES/CBC with padding. The initialisation 
> vector (IV) is reused for performance reasons, but only on a per-format and 
> per-segment basis.
> While IV reuse is usually considered bad practice, the CBC mode is somewhat 
> resilient to IV reuse. The only "leak" of information this could lead to is 
> being able to tell that two encrypted blocks of data start with the same 
> prefix. However, it is unlikely that two data blocks in an index segment will 
> start with the same data:
> - Stored Fields Format: Each encrypted data block is a compressed block 
> (~4kb) of one or more documents. It is unlikely that two compressed blocks 
> start with the same data prefix.
> - Term Vectors: Each encrypted data block is a compressed block (~4kb) of 
> terms and payloads from one or more documents. It is unlikely that two 
> compressed blocks start with the same data prefix.
> - Term Dictionary Index: The term dictionary index is encoded and encrypted 
> in one single data block.
> - Term Dictionary Data: Each data block of the term dictionary encodes a set 
> of suffixes. It is unlikely to have two dictionary data blocks sharing the 
> same prefix within the same segment.
> - DocValues: A DocValues file will be composed of multiple encrypted data 
> blocks. It is unlikely to have two data blocks sharing the same prefix within 
> the same segment (each one encodes a list of values associated with a 
> field).
> To the best of our knowledge, this model should be safe. However, it would be 
> good if someone with security expertise in the community could review and 
> validate it. 
> h1. Performance
> We report here a performance benchmark we did on an early prototype based on 
> Lucene 4.x. The benchmark was performed on the Wikipedia dataset where all 
> the fields (id, title, body, date) were encrypted. Only the block tree terms 
> and compressed stored fields format were tested at that time. 
> h2. Indexing
> The indexing throughput slightly decreased and is roughly 15% less than with 
> the base Lucene. 
> The merge time slightly increased by 35%.
> There was no significant difference in terms of index size.
> h2. Query Throughput
> With respect to query throughput, we 

[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-22 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113960#comment-17113960
 ] 

Michael McCandless commented on LUCENE-9378:


Thanks for running the luceneutil benchmarks, Michael Sokolov!
{quote}Interestingly, BrowseDateTaxoFacets shows a big improvement! But 
otherwise we see a pretty significant degradation in performance.
{quote}
That is fascinating, because faceting uses BINARY DV to hold all ordinals.  I 
wonder whether the BINARY DV compression somehow makes faceting faster!?  Could 
you try running the tasks w/ normal relevance sort to see impact on 
{{BrowseDateTaxoFacets}}?   (So we can separate "sorting by BINARY compressed" 
from "faceting on BINARY compressed").

Robert Muir also suggested this idea: have we verified that the block 
decompression only happens once per block, when we {{.advance}} to multiple 
(increasing) docids in the block?  The sizable performance hits are so big in 
the results above that it makes me wonder if we are accidentally decompressing 
on every {{.advance}} rather than once per block.

Also, I wonder why the original benchmarks on the issue didn't uncover similar 
performance changes.

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene84DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13325) Add a collection selector to ComputePlanAction

2020-05-22 Thread Shalin Shekhar Mangar (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-13325.
--
  Assignee: Shalin Shekhar Mangar
Resolution: Fixed

Thanks [~ab] for the review!

> Add a collection selector to ComputePlanAction
> --
>
> Key: SOLR-13325
> URL: https://issues.apache.org/jira/browse/SOLR-13325
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: master (9.0), 8.6
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Similar to SOLR-13273, it'd be nice to have a collection selector that 
> applies to compute plan action. An example use-case would be to selectively 
> add replicas on new nodes for certain collections only.
> Here is a selector that returns collections that match the given collection 
> property/value pair:
> {code}
> "collection": {"property_name": "property_value"}
> {code}
> Here's another selector that returns collections that have the given policy 
> applied
> {code}
> "collection": {"#policy": "policy_name"}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13325) Add a collection selector to ComputePlanAction

2020-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113848#comment-17113848
 ] 

ASF subversion and git services commented on SOLR-13325:


Commit 7020713da11363963926a6db33f246dd743fcd1c in lucene-solr's branch 
refs/heads/branch_8x from Shalin Shekhar Mangar
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7020713 ]

SOLR-13325: Add a collection selector to ComputePlanAction (#1512)

ComputePlanAction now supports a collection selector of the form `collections: 
{policy: my_policy}` which can be used to select multiple collections that 
match collection property/value pairs. This is useful to maintain a whitelist 
of collections for which actions should be taken without needing to hard-code 
the collection names. The collection hints are pushed down to the policy engine 
so operations for non-matching collections are not computed at all. The 
AutoAddReplicasPlanAction now becomes a thin shim over ComputePlanAction and 
simply adds a collection selector for the collection property 
autoAddReplicas=true.

(cherry picked from commit 338671e511b753955f7186e7063cd95824cdf4e0)
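
For anyone wiring this up, a hedged sketch of where the selector sits in a
trigger configuration (the trigger name and timings are illustrative, and
the endpoint shown assumes the v2 autoscaling API):

{code}
POST /api/cluster/autoscaling
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "actions": [
      {
        "name": "compute_plan",
        "class": "solr.ComputePlanAction",
        "collections": {"policy": "my_policy"}
      },
      {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
    ]
  }
}
{code}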


> Add a collection selector to ComputePlanAction
> --
>
> Key: SOLR-13325
> URL: https://issues.apache.org/jira/browse/SOLR-13325
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Shalin Shekhar Mangar
>Priority: Major
> Fix For: master (9.0), 8.6
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Similar to SOLR-13273, it'd be nice to have a collection selector that 
> applies to compute plan action. An example use-case would be to selectively 
> add replicas on new nodes for certain collections only.
> Here is a selector that returns collections that match the given collection 
> property/value pair:
> {code}
> "collection": {"property_name": "property_value"}
> {code}
> Here's another selector that returns collections that have the given policy 
> applied
> {code}
> "collection": {"#policy": "policy_name"}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14507) Option to pass solr.hdfs.home in API backup/restore calls

2020-05-22 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113784#comment-17113784
 ] 

Lucene/Solr QA commented on SOLR-14507:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
21s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 
17s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
44s{color} | {color:green} solrj in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14507 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13003698/SOLR-14507.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 338671e511b |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/754/testReport/ |
| modules | C: solr/core solr/solrj U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/754/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.
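
For context, a hedged SolrJ sketch of the backup call the description below
discusses (today's API; the proposed solr.hdfs.home override would be an
additional, not-yet-existing parameter on this request, and the host and
paths are made up):

{code}
import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class HdfsBackupExample {
  public static void main(String[] args) throws SolrServerException, IOException {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      CollectionAdminRequest.Backup backup =
          CollectionAdminRequest.backupCollection("myCollection", "myBackup")
              // Today this location is constrained by solr.hdfs.home in solr.xml.
              .setLocation("hdfs://namenode:8020/backups")
              .setRepositoryName("hdfs");
      backup.process(client);
    }
  }
}
{code}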



> Option to pass solr.hdfs.home in API backup/restore calls
> -
>
> Key: SOLR-14507
> URL: https://issues.apache.org/jira/browse/SOLR-14507
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Reporter: Haley Reeve
>Priority: Major
> Attachments: SOLR-14507.patch
>
>
> The Solr backup/restore API has an optional parameter for specifying the 
> directory to back up to. However, the HdfsBackupRepository class doesn't use 
> this location when creating the HDFS Filesystem object. Instead it uses the 
> solr.hdfs.home setting configured in solr.xml. This functionally means that 
> the backup location, which can be passed to the API call dynamically, is 
> limited by the static home directory defined in solr.xml. This requirement 
> means that if the solr.hdfs.home path and backup location don't share the 
> same URI scheme and hostname, the backup will fail, even if the backup could 
> otherwise have been written to the specified location successfully.
> If we had the option to pass the solr.hdfs.home path as part of the API call, 
> it would remove this limitation on the backup location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org