Facets based on sampling

2016-11-04 Thread John Davis
Hi, I am trying to improve the performance of queries with facets. I understand that for queries with high facet cardinality and a large number of results, the current facet computation algorithms can be slow as they are trying to loop across all docs and facet values. Does there exist an option to

Empty facets on TextField

2016-10-18 Thread John Davis
Hi, I have converted one of my fields from StrField to TextField and am not getting back any facets for that field. Here's the exact configuration of the TextField. I have tested it with 6.2.0 on a fresh instance and it repros consistently. From reading through past archives and documentation, it

Re: Empty facets on TextField

2016-10-18 Thread John Davis
docValues would still exist in the index for this > field (just with no values), and that normal faceting would use those. > Forcing facet.method=enum forces the use of the index instead of > docvalues (or the fieldcache if the field is configured w/o > docvalues). > > -Yonik > >
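A minimal sketch of the facet.method=enum workaround mentioned in the reply above, assuming a local Solr at http://localhost:8983 with a core named "mycore" and a field named "my_text_field" (all three are placeholders, not taken from the thread). It only builds the request URL; sending it is left to whichever HTTP client is in use.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method="enum"):
    """Build a /select URL that facets on `field` with an explicit facet.method.

    base_url and field are hypothetical placeholders for illustration.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),              # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),   # enum walks the indexed terms, not docValues
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = facet_query_url("http://localhost:8983/solr/mycore", "my_text_field")
print(url)
```

If facets come back with facet.method=enum but not without it, that is consistent with the explanation quoted above: the default path is reading docValues that exist but hold no values.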

Re: Empty facets on TextField

2017-01-06 Thread John Davis
gment that has them. > > -Yonik > > > > > > On Tue, Oct 18, 2016 at 10:09 PM, John Davis <johndavis925...@gmail.com> > wrote: > >> Thanks. Is there a way around to not starting fresh and forcing the > reindex > >> to remove docValues? > >

Schemaless detecting multivalued fields

2017-10-19 Thread John Davis
Hi, I know about the schemaless configuration defaulting to multivalued fields of the corresponding type. I was just wondering if there was a way to first detect if the incoming value is a list or a singleton, and based on that pick the corresponding type. Ideally if the value is a long then use

Really slow facet performance in 6.6

2017-10-23 Thread John Davis
Hello, We are seeing really slow facet performance with the new solr release. This is on an index of 2M documents. A few things we've tried: 1. method=uif, however that didn't help much (the facet fields have docValues=false since they are multi-valued). Debug info below. 2. changing query (q=) that

Re: Facets based on sampling

2017-10-23 Thread John Davis
Docvalues don't work for multivalued fields. I just started a separate thread with more debug info. It is a bit surprising why facet computation is so slow even when the query matches hundreds of docs. On Mon, Oct 23, 2017 at 6:53 AM, alessandro.benedetti wrote: > Hi John,

Re: SolrCloud

2017-12-15 Thread John Davis
> new_collection, basically all your routing is the same. You can create > aliases pointing to multiple collections or specify multiple > collections on the query, don't know if that fits your use case or not > though. > > > Best, > Erick > > On Fri, Dec 15, 2017 at 9:03 AM, Joh

SolrCloud

2017-12-15 Thread John Davis
Hello, We are thinking about migrating to SolrCloud. Our current setup is: 1. Multiple replicas and shards. 2. Each query typically hits a single shard only. 3. We have an external system that assigns a document to a shard based on its origin and is also used by solr clients when querying to find

Solr index size statistics

2017-12-02 Thread John Davis
Hello, Is there a way to get index size statistics for a given solr instance? For example, broken down by each field stored or indexed. The only things I know of are running du on the index data files and getting counts per field indexed/stored, however each field can be quite different wrt size. Thanks John

Re: Facets based on sampling

2017-10-24 Thread John Davis
ypes) fields. What you need to > do is to convert your analysed field to multivalue string field - that > requires changes in indexing flow. > > > > HTH, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection > > Solr & Elasticsearch Con

Re: Facets based on sampling

2017-10-20 Thread John Davis
> On Fri, Nov 4, 2016 at 3:02 PM, John Davis <johndavis925...@gmail.com> > wrote: > > Hi, > > I am trying to improve the performance of queries with facets. I > understand > > that for queries with high facet cardinality and large number results the > > current

Re: Sort by payload value

2018-05-25 Thread John Davis
> guts of the payload > calcs. > > FYI, ties are broken by the internal Lucene doc ID. If the theory that > you are getting > no matches, then your sort order is determined by this value which you > don't really > have much access to. > > Best, > Erick > > On Thu, May

Sort by payload value

2018-05-24 Thread John Davis
Hello, We are trying to use payload values as described in [1] and are running into issues when issuing *sort by* payload value. Would appreciate any pointers to what we might be doing wrong. We are running solr 6.6.0. * Here's the payload value definition:

Matching within list fields

2018-01-29 Thread John Davis
Hi there! We have a use case where we'd like to search within a list field, however the search should not match across different elements in the list field -- all terms should match a single element in the list. For example, if the field is a list of comments on a product, search should be able to

Solr needs a restart to recover from "No space left on device"

2018-02-06 Thread John Davis
Hi there! We ran out of disk on our solr instance. However even after cleaning up the disk solr server did not realize that there is free disk available. It only got fixed after a restart. Is this a known issue? Or are there workarounds that don't require a restart? Thanks John

Index size by document fields

2018-08-04 Thread John Davis
Hi, Is there a way to monitor the size of the index broken down by individual fields across documents? I understand there are different parts - the inverted index and the stored fields - and an estimate would be a good start. Thanks John

Ignored fields and copyfield

2018-08-06 Thread John Davis
Hi there, If a field is set as "ignored" (indexed=false, stored=false), can it still be used as the source for another field in a copyField directive, which might index/store it? John

Re: What causes new searcher to be created?

2019-03-10 Thread John Davis
s that until a new searcher is created all the > > newly indexed docs will not be visible > > This should be the case. So regardless of what the admin says, _can_ > you see newly indexed documents? > > Best, > Erick > > > On Mar 9, 2019, at 7:24 PM, John Davis

What causes new searcher to be created?

2019-03-09 Thread John Davis
Hi there, I couldn't find an answer to this in the docs: if openSearcher is set to false in the autocommit with no softcommits, what triggers a new one to be created? My assumption is that until a new searcher is created all the newly indexed docs will not be visible. Based on the solr admin

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
If what you describe is the case for range query [* TO *], why would lucene not optimize field:* in a similar way? On Wed, Apr 17, 2019 at 10:36 AM Shawn Heisey wrote: > On 4/17/2019 10:51 AM, John Davis wrote: > > Can you clarify why field:[* TO *] is lot more efficient than field:*

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
wrote: > On 4/17/2019 1:21 PM, John Davis wrote: > > If what you describe is the case for range query [* TO *], why would > lucene > > not optimize field:* similar way? > > I don't know. Low level lucene operation is a mystery to me. > > I have seen first-hand that

Optimizing fq query performance

2019-04-13 Thread John Davis
Hi there, We noticed a sizable performance degradation when we add certain fq filters to the query even though the result set does not change between the two queries. I would've expected solr to optimize internally by picking the most constrained fq filter first, but maybe my understanding is

Re: Optimizing fq query performance

2019-04-14 Thread John Davis
for indexed fields because all terms for the > field need to be iterated (e.g. does term1 match doc1, does term2 match > doc1, etc) > One can optimize this by indexing a term in a different field to turn it > into a single term query (i.e. exists:field1) > > -Yonik > >
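The optimization described in the reply above (indexing a marker term in a separate field so that an existence check becomes a single-term query such as exists:field1) could be sketched on the indexing side roughly as follows. The field name "exists", the helper name and the sample document are assumptions for illustration; the sketch also assumes "exists" is defined in the schema as a multivalued string field.

```python
def add_exists_field(doc, exists_field="exists"):
    """Return a copy of `doc` with an extra field naming every populated field.

    `exists_field` is a hypothetical field name, assumed to be a multivalued
    string field in the schema.
    """
    populated = sorted(
        name
        for name, value in doc.items()
        if name != exists_field and value not in (None, "", [])
    )
    enriched = dict(doc)            # leave the caller's document untouched
    enriched[exists_field] = populated
    return enriched


doc = {"id": "1", "field1": "some text", "field2": ""}
print(add_exists_field(doc))
```

With documents indexed this way, fq=exists:field1 selects the same documents as fq=field1:* without iterating over every term of field1.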

Re: Solr Heap Usage

2019-06-02 Thread John Davis
see: https://issues.apache.org/jira/browse/SOLR-12962. > > In short, there’s not enough information until you dive in and test > bunches of stuff to tell. > > Best, > Erick > > > > On Jun 2, 2019, at 2:22 AM, John Davis > wrote: > > > > This makes sense, any

Adding Multiple JSON Documents

2019-06-02 Thread John Davis
Hi there, I was looking at the solr documentation for indexing multiple documents via json and noticed an inconsistency in the docs. Should the POST url be /update/json/docs instead of just /update? It does look like the former works, unless both will work just fine?

Re: Solr Heap Usage

2019-06-04 Thread John Davis
overhead associated with it. On Tue, Jun 4, 2019 at 8:03 AM Erick Erickson wrote: > I need to update that, didn’t understand the bits about retaining internal > memory structures at the time. > > > On Jun 4, 2019, at 2:10 AM, John Davis > wrote: > > > > Erick -

Re: Solr Heap Usage

2019-06-07 Thread John Davis
figure out questions like number of shards/replicas, heap size, memory etc. > Hard data, good process and regular testing will trump guesswork every time > > Greg > > On Tue, Jun 4, 2019 at 9:22 AM John Davis > wrote: > > > You might want to test with softcommit of hours

Solr Heap Usage

2019-06-01 Thread John Davis
I've read a bunch of the wikis on solr heap usage and wanted to confirm my understanding of what solr uses the heap for: 1. Indexing new documents - until committed? if not how long are the new documents kept in heap? 2. Merging segments - does solr load the entire segment in memory or

Re: Solr Heap Usage

2019-06-04 Thread John Davis
they’d be something like this: > Do a hard commit with openSearcher=false every 60 seconds. > Do a soft commit every 5 minutes. > > I’d actually be surprised if you were able to measure differences between > those settings and just hard commit with openSearcher=true every 60 > s
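As a rough sketch, the commit settings suggested in the reply above (hard commit with openSearcher=false every 60 seconds, soft commit every 5 minutes) could be expressed as a Config API "set-property" payload. The helper name is hypothetical, and the property names should be checked against the Config API documentation for the Solr version in use; the same values can equally be set in the autoCommit and autoSoftCommit sections of solrconfig.xml.

```python
import json


def commit_settings_payload(hard_commit_secs=60, soft_commit_secs=300):
    """Build a Config API payload for the commit intervals discussed above.

    Times are converted to milliseconds, which is what maxTime expects.
    """
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_commit_secs * 1000,
            "updateHandler.autoCommit.openSearcher": False,
            "updateHandler.autoSoftCommit.maxTime": soft_commit_secs * 1000,
        }
    }


payload = commit_settings_payload()
print(json.dumps(payload, indent=2))
```

The payload would be POSTed as JSON to the core's or collection's /config endpoint; the exact URL depends on the deployment.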

Re: Solr Heap Usage

2019-06-02 Thread John Davis
and does streaming merge it shouldn't matter? On Sat, Jun 1, 2019 at 9:24 AM Walter Underwood wrote: > > On May 31, 2019, at 11:27 PM, John Davis > wrote: > > > > 2. Merging segments - does solr load the entire segment in memory or > chunks > > of it? if la

Re: Enabling/disabling docValues

2019-06-10 Thread John Davis
ly happens…. > > Best, > Erick > > P.S. I _think_ Lucene tries to use the definition from the first segment, > but since whether the lists of segments to be merged don’t look at the > field definitions at all. Whether the first segment in the list has > SortableText or not

Enabling/disabling docValues

2019-06-09 Thread John Davis
Hi there, We recently changed a field from TextField + no docValues to SortableTextField which has docValues enabled by default. Once I did this I do not see any facet values for the field. I know that once all the docs are re-indexed facets should work again, however can someone clarify the

Re: Enabling/disabling docValues

2019-06-11 Thread John Davis
& resources, and if we empower power users to understand the system better it will help making more informed tradeoffs. On Tue, Jun 11, 2019 at 6:52 AM Gus Heck wrote: > On Mon, Jun 10, 2019 at 10:53 PM John Davis > wrote: > > > You have made many assumptions which might not

Re: Enabling/disabling docValues

2019-06-09 Thread John Davis
tructing low-level analysis chains. > > So I’d _strongly_ recommend you re-index your corpus to a new collection > with the current definition, then perhaps use CREATEALIAS to seamlessly > switch. > > Best, > Erick > > > On Jun 9, 2019, at 12:50 PM, John Davis
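The re-index-then-alias approach recommended above ends with a Collections API CREATEALIAS call (SolrCloud only). A minimal sketch of building that request, where the Solr URL, the alias name "products" and the collection name "products_v2" are placeholders rather than values from the thread:

```python
from urllib.parse import urlencode


def create_alias_url(solr_url, alias, collections):
    """Build a Collections API URL that points `alias` at `collections`.

    All argument values used below are hypothetical examples.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{solr_url}/admin/collections?{urlencode(params)}"


url = create_alias_url("http://localhost:8983/solr", "products", ["products_v2"])
print(url)
```

Clients that query the alias rather than a concrete collection name do not need to change when the alias is later repointed at a newly re-indexed collection.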

Facet count incorrect

2019-05-22 Thread John Davis
Hi there - Our facet counts are incorrect for a particular field and I suspect it is because we changed the type of the field from StrField to TextField. Two questions: 1. If we do re-index all the documents in the index, would these counts get fixed? 2. Is there a "safe" way of changing field

Re: Facet count incorrect

2019-05-23 Thread John Davis
leValued or vice versa (particularly with docValues) > etc. are all “fraught”. > > My usual reply is “if you’re going to reindex everything anyway, why not > just do it to a new collection and alias when you’re done?” It’s much safer. > > Best, > Erick > > > On May 22, 2019,

Re: Optimizing fq query performance

2019-04-18 Thread John Davis
FYI https://issues.apache.org/jira/browse/SOLR-11437 https://issues.apache.org/jira/browse/SOLR-12488 On Thu, Apr 18, 2019 at 7:24 AM Shawn Heisey wrote: > On 4/17/2019 11:49 PM, John Davis wrote: > > I did a few tests with our instance solr-7.4.0 and field:* vs field:[* TO > >

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
Can you clarify why field:[* TO *] is a lot more efficient than field:*? On Sun, Apr 14, 2019 at 12:14 PM Shawn Heisey wrote: > On 4/13/2019 12:58 PM, John Davis wrote: > > We noticed a sizable performance degradation when we add certain fq > filters > > to the query even tho

Solr Payloads

2019-09-20 Thread John Davis
We are using solr payload field and noticed the values extracted using payload() sometimes don't match the value stored in the field. Is there a lossy encoding for the payload value? fq=payload_field:*, fl=payload_field,payload(payload_field, 573131) "payload_field":

Blocking certain queries

2020-02-03 Thread John Davis
Hello, Is there a way to block certain queries in solr? For example, a delete for *:*, or if there is a known query that causes problems, can these be blocked at the solr server layer?

SolrDeletionPolicy & Core Reload

2021-01-02 Thread John Davis
Hi, Does Core Reload pick up changes to SolrDeletionPolicy in solrconfig.xml or does the solr server need to be restarted? And what would be the best way to check the current values of SolrDeletionPolicy (eg