Log slow queries to SQL Database using Log4j2 (JDBC)

2020-05-25 Thread Krönert Florian
Hi everyone, for our Solr instance I have the requirement that all queries should be logged, so that we can later analyze which search texts were queried most often. We're using Solr 8.3.1 with the official Docker image, hosted on Azure. My approach for implementing this was now to
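The direction the message is heading (a Log4j2 JDBC appender writing query log events to a database) might look roughly like the sketch below. The table name, column names, and JNDI datasource are invented for illustration; only the Log4j2 JDBC appender itself is implied by the subject line.

```xml
<!-- Hypothetical log4j2.xml fragment: route query log events into a
     database table via Log4j2's JDBC appender. Table, column, and
     datasource names are placeholders, not taken from the message. -->
<JDBC name="QueryLogDb" tableName="solr_query_log">
  <DataSource jndiName="java:/comp/env/jdbc/LoggingDataSource"/>
  <Column name="event_ts" isEventTimestamp="true"/>
  <Column name="logger" pattern="%logger"/>
  <Column name="message" pattern="%message"/>
</JDBC>
```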

unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Hi, I have field: and configuration: true unified true content_txt_sk_highlight 2 true Doing a query with hl.bs.type=SENTENCE takes around 1000 - 1300 ms, which is really slow. The same query with hl.bs.type=WORD takes 8 - 45 ms. Is this normal behaviour or should I create an issue? thanks, m.
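A minimal sketch of the two queries being compared, built with the standard library. The host, collection name, and search term are placeholder assumptions; only the hl.* parameters (unified method, the content_txt_sk_highlight field, 2 snippets, and the hl.bs.type variants) come from the thread.

```python
from urllib.parse import urlencode

def highlight_query(bs_type: str) -> str:
    """Build a Solr select URL using the unified highlighter.

    Host, collection, and search term are hypothetical; only the
    hl.* settings mirror those discussed in the thread.
    """
    params = {
        "q": "content_txt_sk_highlight:slovensko",  # placeholder term
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": "content_txt_sk_highlight",
        "hl.snippets": "2",
        "hl.bs.type": bs_type,  # SENTENCE vs. WORD is what differs in timing
    }
    return "http://localhost:8983/solr/mycoll/select?" + urlencode(params)

slow = highlight_query("SENTENCE")  # ~1000-1300 ms per the report
fast = highlight_query("WORD")      # ~8-45 ms per the report
```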

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
I did the same test on Solr 8.4.1 and response times are the same for both hl.bs.type=SENTENCE and hl.bs.type=WORD. m. On Monday, 25 May 2020 15:28:24 CEST, Michal Hlavac wrote: Hi, I have field: and configuration: true unified true content_txt_sk_highlight 2 true Doing query with

Default Values and Missing Field Queries

2020-05-25 Thread Chris Dempsey
I'm new to Solr and made an honest stab at finding this info in the docs. I'm working on an update to an existing large collection in Solr 7.7 to add a BoolField to mark documents as "soft deleted" or not. My understanding is that updating the schema will mean the new field will only exist and have a

Re: Default Values and Missing Field Queries

2020-05-25 Thread Erick Erickson
Try q=*:* -boolfield:false And it's not as costly as you might think; there's special handling for *:* queries. And if you put that in an fq clause instead, the result set will be put into the filter cache and reused, assuming you want to do this repeatedly. BTW, Solr doesn't use strict
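The two variants Erick describes can be sketched as parameter sets; the field name "boolfield", the host, and the collection are placeholders. Lucene query syntax is field:value, so the negative clause is written -boolfield:false.

```python
from urllib.parse import urlencode

# Variant 1: exclusion folded into the main query.
main_query = {"q": "*:* -boolfield:false"}

# Variant 2: exclusion moved to fq, so Solr can cache the matching
# doc set in the filter cache and reuse it across repeated queries.
cached_query = {"q": "*:*", "fq": "-boolfield:false"}

# Host and collection name are placeholders, not from the thread.
url = "http://localhost:8983/solr/mycoll/select?" + urlencode(cached_query)
```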

Re: Default Values and Missing Field Queries

2020-05-25 Thread Chris Dempsey
Thanks for the clarification and pointers Erick! Much appreciated! On Mon, May 25, 2020 at 11:18 AM Erick Erickson wrote: > Try q=*:* -boolfield=false > > And it's not as costly as you might think, there's special handling for *:* > queries. And if you put that in an fq clause instead, the

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Yes, I have no problems in 8.4.1, only 8.5.1. Also yes, those are multi-page PDF files. m. On Monday, 25 May 2020 19:11:31 CEST, David Smiley wrote: > Wow that's terrible! > So this problem is for SENTENCE in particular, and it's a regression in > 8.5? I'll see if I can reproduce this with the

How to control ranking based on into which field a hit is found

2020-05-25 Thread Steven White
Hi everyone, I index my data from the DB into their own fields. I then use copyField to copy the values of all fields into _ALL_FIELDS_, which I created. In my edismax handler I use _ALL_FIELDS_ for "df". Here is what my edismax config looks like: explicit edismax *:* AND
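The handler configuration being described might look roughly like this solrconfig.xml sketch. Only defType=edismax and df=_ALL_FIELDS_ come from the message; reading the truncated fragment "explicit edismax *:* AND" as echoParams, defType, q.alt, and q.op values is a guess, and the handler name is a placeholder.

```xml
<!-- Hypothetical sketch; only edismax and df=_ALL_FIELDS_ are
     stated in the message, the rest is inferred or invented. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="q.op">AND</str>
    <str name="df">_ALL_FIELDS_</str>
  </lst>
</requestHandler>
```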

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread David Smiley
Wow that's terrible! So this problem is for SENTENCE in particular, and it's a regression in 8.5? I'll see if I can reproduce this with the Lucene benchmark module. I figure you have some meaty text, like "page" size or longer? ~ David On Mon, May 25, 2020 at 10:38 AM Michal Hlavac wrote: >

Re: How to control ranking based on into which field a hit is found

2020-05-25 Thread Erick Erickson
Try something like q=whatever OR q=id:whatever^1000 I'd put it in quotes for the id= clause, and do look at what the parsed query looks like when you specify =query. The reason I recommend this is you'll no doubt try something like q=id:download MOD2012A manual without quotes and be very

Re: How to control ranking based on into which field a hit is found

2020-05-25 Thread Steven White
Thanks Erick. OR'ing ID:"MOD2012A"^1000 with the original query will not always guarantee that the record with the matching ID will be the #1 hit on the list, or will it? Also, why did you boost by a factor of 1000? I never figured out what the number means for boosting. I have seen 10,

Solr Deletes

2020-05-25 Thread Dwane Hall
Hey Solr users, I'd really appreciate some community advice if somebody can spare some time to assist me. My question relates to initially deleting a large amount of unwanted data from a Solr Cloud collection, and then advice on best patterns for managing delete operations on a regular
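A delete-by-query request of the kind the question is about can be sketched as follows. The field, query, host, and collection name are placeholder assumptions; only the general delete-by-query mechanism is implied by the thread.

```python
import json

def delete_by_query_body(query: str) -> bytes:
    """Serialize the JSON body Solr's update handler expects for a
    delete-by-query. The query argument here is a placeholder."""
    return json.dumps({"delete": {"query": query}}).encode("utf-8")

body = delete_by_query_body("soft_deleted:true")

# Sending it requires a running Solr, so it is left commented out;
# the URL and commit parameter are illustrative only:
# from urllib import request
# req = request.Request(
#     "http://localhost:8983/solr/mycoll/update?commit=true",
#     data=body, headers={"Content-Type": "application/json"})
# request.urlopen(req)
```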

Re: Log slow queries to SQL Database using Log4j2 (JDBC)

2020-05-25 Thread Walter Underwood
I would back up and do this a different way, with off-the-shelf parts. Send the logs to syslog or your favorite log aggregator. From there, configure something that puts them into an ELK stack (Elasticsearch, Logstash, Kibana). A commercial version of this is logz.io.
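The first step of Walter's suggestion, shipping the logs to syslog, can be sketched with Log4j2's built-in Syslog appender. The host, port, facility, and the slow-request logger name are assumptions for illustration, not taken from the message.

```xml
<!-- Hypothetical log4j2.xml fragment: forward slow-query log events
     to a local syslog daemon, which a Logstash/ELK pipeline can then
     consume. Host, port, and logger name are placeholders. -->
<Configuration>
  <Appenders>
    <Syslog name="QuerySyslog" host="localhost" port="514"
            protocol="UDP" facility="LOCAL0"/>
  </Appenders>
  <Loggers>
    <Logger name="org.apache.solr.core.SolrCore.SlowRequest"
            level="info" additivity="false">
      <AppenderRef ref="QuerySyslog"/>
    </Logger>
  </Loggers>
</Configuration>
```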

Re: How to control ranking based on into which field a hit is found

2020-05-25 Thread Erick Erickson
If you boost it high enough it should, but you're right, it's not guaranteed. The number is "whatever works"; it's just a number the score is multiplied by. But another approach, not costly but guaranteed to work, would be to have your app do a real-time get on the ID in parallel with the main query.
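Erick's pattern of OR-ing a quoted, heavily boosted ID clause onto the user's query can be sketched as below. The ID field name and the example text come from the thread; wrapping the user text in parentheses is an assumption for illustration.

```python
def boosted_query(user_text: str, doc_id: str, boost: int = 1000) -> str:
    """Combine the user's query with a quoted, boosted ID clause.

    Quoting the ID keeps a multi-token value from being split into
    separate clauses by the query parser, which is the pitfall Erick
    warns about with q=id:download MOD2012A manual.
    """
    return f'({user_text}) OR ID:"{doc_id}"^{boost}'

q = boosted_query("download MOD2012A manual", "MOD2012A")
# q == '(download MOD2012A manual) OR ID:"MOD2012A"^1000'
```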

RE: Indexing huge data onto solr

2020-05-25 Thread Srinivas Kashyap
Hi Erick, thanks for the response below. The link you provided holds good if you have a single entity where you can join the tables and index it. But in our scenario we have nested entities joining different tables, as shown below: db-data-config.xml: (table 1 join
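A nested-entity DataImportHandler configuration of the kind described usually has the shape sketched below. All table, column, driver, and datasource names here are invented placeholders; the actual db-data-config.xml was cut off in the message.

```xml
<!-- Hypothetical db-data-config.xml sketch of nested DIH entities;
     every name below is a placeholder, not from the message. -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/db" user="solr"/>
  <document>
    <entity name="parent" query="SELECT id, name FROM parent_table">
      <entity name="child"
              query="SELECT detail FROM child_table WHERE parent_id = '${parent.id}'"/>
    </entity>
  </document>
</dataConfig>
```

Each child entity runs its query once per parent row, which is part of why nested configurations index slowly on large tables.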