Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
Ah! that's significant. The latency is likely due to building the OrdinalMap (which maps segment ords to global ords) ... "dvhash" (assuming the relevant fields are not multivalued) will very likely work; "dvhash" doesn't map to global ords, so doesn't need to build the OrdinalMap (which gets
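A minimal sketch, in Python, of what trying "dvhash" could look like as a JSON Facet request body. The helper name, the facet label `by_process`, and the choice of `processId` as the facet field (a field named elsewhere in this thread) are assumptions for illustration only; `method` is the terms-facet parameter under discussion.

```python
import json


def build_dvhash_facet_request(query, field, limit=10):
    """Build a JSON request body with one terms facet that asks for "dvhash"."""
    return {
        "query": query,
        "limit": 0,  # return no documents, only facet counts
        "facet": {
            # "by_process" is an arbitrary label chosen for this sketch
            "by_process": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "method": "dvhash",
            }
        },
    }


if __name__ == "__main__":
    body = build_dvhash_facet_request("resultId:x", "processId")
    print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a POST to the core's /query endpoint; whether it actually helps depends on the field being single-valued, as noted above.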

Re: Solr Slack Workspace

2021-02-05 Thread Anshum Gupta
Hey Ishan, Thanks for doing this. Is this the ASF Slack space or something else? On Tue, Feb 2, 2021 at 2:04 AM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Hi all, > I've created an invite link for the Slack workspace: > https://s.apache.org/solr-slack. > Please test it out.

Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
> Does this happen on a warm searcher (are subsequent requests with no intervening updates _ever_ fast?)? Subsequent response times are very fast if the searcher remains open. As a control test, I faceted on the same field that I used in the q param. 1. Start solr 2. Execute q=resultId:x=0 =>

Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
Apologies, I missed deducing from the request url that you're already talking strictly about single-shard requests (so everything I was suggesting about shards.preference etc. is not applicable). "dvhash" is still worth a try though, esp. with `numFound` being 943 (out of 185 million!). Does this

Re: Solr Slack Workspace

2021-02-05 Thread Justin Sweeney
Worked for me and a few others, thanks for doing that! On Tue, Feb 2, 2021 at 5:04 AM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Hi all, > I've created an invite link for the Slack workspace: > https://s.apache.org/solr-slack. > Please test it out. I'll send a broader

Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
Ok. I'll try that. Meanwhile, a query on resultId gets a subsecond response. But the immediate next query for faceting takes 40+secs. The core has 185million docs and 63GB index size. curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x=0' {

timeouts when update sent to non-Leader

2021-02-05 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
We have a problem on a 3.5gig collection running Solr7.4 (we will soon upgrade to Solr8.5.2). Users were often encountering timeout errors of the type shown below. My colleague found a blog post at

Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
`resultId` sounds like it might be a relatively high-cardinality field (lots of unique values)? What's your number of shards, and replicas per shard? SOLR-15008 (note: not a bug) describes a situation that may be fundamentally similar to yours (though to be sure it's impossible to say for sure

Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
Hello, I am seeing very slow response from json faceting against a single core (though core is shard leader in a collection). Fields processId and resultId are non-multivalued, indexed and docvalues string (not text). Soft Commit = 5sec (opensearcher=true) and Hard Commit = 10sec because new

RE: Authentication for all but selects

2021-02-05 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
What works for us is having something like this at the bottom of security.json: { "name":"open_select", "path":"/select/*", "role":null, "index":9}, { "name":"catch-all-nocollection", "collection":null, "path":"/*",

Re: Extract a list of the most recent field values?

2021-02-05 Thread Alexandre Rafalovitch
Rewriting: *) https://lucene.apache.org/solr/guide/8_8/json-request-api.html#json-parameter-merging , there is a way to represent most (all?) of the structure with json.x parameter. *) Also, you can create custom Request Handlers in solrconfig.xml with a lot of those parameters either as defaults

Re: Clarification on term facet method dvhash

2021-02-05 Thread Michael Gibney
Happy to help! If I'm correctly reading the block of code linked to above, "dvhash" is silently ignored for multi-valued fields. So probably not much performance difference there ;-) On Fri, Feb 5, 2021 at 2:12 PM ufuk yılmaz wrote: > This is a huge help Mr. Gibney thank you! > > One thing I

Sv: Extract a list of the most recent field values?

2021-02-05 Thread Hullegård , Jimi
Ah, I never thought about grouping on date ranges, and nesting the faceting like that. Interesting! I managed to do a quick test query that seems to give me what I want: { "query": "*:*", "filter": "+category:* +modified:[NOW/DAY-60DAYS TO *]", "limit": 0, "facet": {
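The facet section of the query above is cut off in this archive, so the following is not a reconstruction of it. It is only a generic Python sketch of the idea being discussed: a range facet over `modified` with a nested terms facet on `category`. The facet labels `per_day` and `categories`, the one-day gap, the `end` value, and the helper name are all assumptions.

```python
import json


def build_recent_categories_request(days=60, gap="+1DAY"):
    """Build a request body: date-range buckets, each with nested category counts."""
    window_start = f"NOW/DAY-{days}DAYS"
    return {
        "query": "*:*",
        "filter": f"+category:* +modified:[{window_start} TO *]",
        "limit": 0,
        "facet": {
            # Outer facet: one bucket per gap over the look-back window
            "per_day": {
                "type": "range",
                "field": "modified",
                "start": window_start,
                "end": "NOW/DAY+1DAY",
                "gap": gap,
                # Inner facet: categories seen within each date bucket
                "facet": {
                    "categories": {"type": "terms", "field": "category"}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_recent_categories_request(), indent=2))
```

Reading the buckets from newest to oldest and collecting categories in first-seen order would give a "most recently used first" list, at the resolution of the chosen gap.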

RE: Clarification on term facet method dvhash

2021-02-05 Thread ufuk yılmaz
This is a huge help Mr. Gibney thank you! One thing I can add is I tried dvhash with a string multi-valued field, it worked and didn’t throw any error but I don’t know if it got silently ignored or just worked. Sent from Mail for Windows 10 From: Michael Gibney Sent: 05 February 2021 20:52

Authentication for all but selects

2021-02-05 Thread Robert Douglas
Hello all, We are working on some migrations and we want to be incorporating authentication more uniformly across all our installations of Solr, but we are getting stuck on allowing Select statements through without authentication while having authentication on with RBAP for everything else.

Re: Clarification on term facet method dvhash

2021-02-05 Thread Michael Gibney
Correction!: wrt "dvhash" and numeric types, it looks like I had it exactly backwards! single-valued numeric types _do_ use (even default to) "dvhash" ... sorry about that! I stand by the rest of the previous message though, which applies at a minimum to string-like fields. On Fri, Feb 5, 2021 at

Re: Clarification on term facet method dvhash

2021-02-05 Thread Michael Gibney
> Performance and resource is still affected by 30M unique values of T right? Yes. The main performance issue would be the per-request allocation of a 30M-element `long[]` for "dv" or "uif" methods (which are by far the most common methods in practice). With low enough request volume and large
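To put the 30M-element `long[]` in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes 8 bytes per Java `long` and ignores array header overhead and any other per-request structures; the function name is hypothetical.

```python
def accumulator_bytes(cardinality, bytes_per_slot=8):
    """Approximate size of one per-request array with one slot per unique value."""
    return cardinality * bytes_per_slot


if __name__ == "__main__":
    size = accumulator_bytes(30_000_000)
    # 30,000,000 slots * 8 bytes each
    print(f"{size} bytes, about {size / 10**6:.0f} MB per request")
```

At roughly 240 MB per request under these assumptions, a handful of concurrent facet requests can allocate on the order of a gigabyte of short-lived heap.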

Clarification on term facet method dvhash

2021-02-05 Thread ufuk yılmaz
Hello, I’m using Solr 8.4. Very excited about performance improvements in 8.8: http://joelsolr.blogspot.com/2021/01/optimizations-coming-to-solr.html As I understand it, the main determinant of performance and RAM usage of a terms facet is the cardinality of the field in the whole collection, but not the

Re: Extract a list of the most recent field values?

2021-02-05 Thread Alexandre Rafalovitch
This feels like basic faceting on category, but you are trying to make a latest record, rather than count as a sorting/grouping principle. How about using JSON Facets? https://lucene.apache.org/solr/guide/8_8/json-facet-api.html I would do the first level as range facet and do your dates at

Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
Hi Leon, Feel free to create a JIRA issue https://issues.apache.org/jira/secure/Dashboard.jspa and then open a GitHub pull request to fix the example name. The documentation is in asciidoc format at: https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src with names matching those on

Re: 404 Errors on update/extract

2021-02-05 Thread nq
Hi Alex, Thanks a lot for your help! I have tested the same using the 'techproducts' example as proposed, and it worked fine. You are right, the documentation seems to be outdated in this aspect. I have just reviewed the solrconfig.xml of the 'schemaless' example and found all the Solr

Sv: Extract a list of the most recent field values?

2021-02-05 Thread Hullegård , Jimi
Hi Emir, But that page says: "The field that is being collapsed on. The field must be a single valued String, Int or Float" And the field in question is a multi value field. And when I try using fq={!collapse field=myField} I get: "org.apache.solr.search.SyntaxError: Collapsing not supported

Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
I think the extract handler is not defined in schemaless. This may be a change from before and the documentation is out of sync. Can you try 'techproducts' example instead of schemaless: bin/solr stop (if you are still running it) bin/solr start -e techproducts Then the import command. The Tika

Re: Extract a list of the most recent field values?

2021-02-05 Thread Emir Arnautović
Hi Jimi, It seems to me that you could get the results using the collapsing query parser: https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html HTH, Emir -- Monitoring - Log Management - Alerting -

Extract a list of the most recent field values?

2021-02-05 Thread Hullegård , Jimi
Hi, Say we have a bunch of documents in Solr, and each document has a multi value field "category". Now I would like to get the N most recently used categories, ordered so that the most recently used category comes first and then in falling order. My simplistic solution to this would be: 1.

404 Errors on update/extract

2021-02-05 Thread nq
Hi, I am new to Solr and tried to follow the guide to upload PDF data using Tika, on Solr 8.7.0 (running on Debian 10): https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html but I get an HTTP 404 error when trying to import the file. In the solr

Empty shard1 - -:{"shard1":[]} cannot add new replicas

2021-02-05 Thread Dirk Wintergruen
Dear all, I cannot add or remove any replicas of one collection. Diagnostics in the log file show empty shards "gmpg-fulltext3":{"shard1":[]}, see below. What can I do? ng.error.diagnostics.3897285248187441 { "sortedNodes":[{

Re: Recovering forever after upgrade to 8.8.0: Timeout waiting for collection state

2021-02-05 Thread Henrik B A
On Fri, Feb 5, 2021 at 10:58 AM Henrik Brautaset Aronsen wrote: > After upgrading our Solr Cloud collections from 8.7.0 to 8.8.0 I struggle > to get a consistent state. We have 8 servers hosting 3 collections, with > shards/replicas spread over all the servers. > > All replicas on solr3577 is

Recovering forever after upgrade to 8.8.0: Timeout waiting for collection state

2021-02-05 Thread Henrik B A
After upgrading our Solr Cloud collections from 8.7.0 to 8.8.0 I struggle to get a consistent state. We have 8 servers hosting 3 collections, with shards/replicas spread over all the servers. All replicas on solr3577 is in "Recovering" state, and is repeating every five minutes:

Re: Excessive logging 8.8.0

2021-02-05 Thread Markus Jelsma
Thanks! Op do 4 feb. 2021 om 20:04 schreef Chris Hostetter : > > FWIW: that log message was added to branch_8x by 3c02c9197376 as part of > SOLR-15052 ... it's based on master commit 8505d4d416fd -- but that does > not add that same logging message ... so it definitely smells like a > mistake to

Recovering forever after upgrade to 8.8.0: Timeout waiting for collection state

2021-02-05 Thread Henrik Brautaset Aronsen
Hi! After upgrading our Solr Cloud collections from 8.7.0 to 8.8.0 I struggle to get a consistent state. We have 8 servers hosting 3 collections, with shards/replicas spread over all the servers. All replicas on solr3577 is in "Recovering" state, and is repeating every five minutes:

Unable to connect to an 8.8.0 Solr Cloud database via API

2021-02-05 Thread Flowerday, Matthew J
Hi There I have been checking out the latest (8.8.0) SolrCloud database (using Zookeeper 3.6.2) against our application which talks to Solr via the Solr API (I am not too sure of the details as I am not a java developer unfortunately!). The software has Solr 8.7.0/ZooKeeper 3.6.2 libraries