Re: Why does Solr sort on _docid_ with rows=0 ?

2020-02-28 Thread Walter Underwood
docid is the natural order of the posting lists, so there is no sorting effort.
I expect that means “don’t sort”.
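
For reference, here is a minimal sketch of the request being discussed. It assumes the parameter names are distrib, sort and rows: the archived log line lost them, so they are inferred from the subject line and the values shown. The core URL and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Assumed parameter names: the archived log line shows only the values, so
# "distrib", "sort" and "rows" are inferred, not quoted from the source.
ZOMBIE_CHECK_PARAMS = [
    ("q", "*:*"),
    ("distrib", "false"),
    ("sort", "_docid_ asc"),  # index order, i.e. effectively "don't sort"
    ("rows", "0"),            # ask for no documents back
]


def build_check_url(core_url):
    """Return a /select URL that matches everything but fetches no rows."""
    return core_url.rstrip("/") + "/select?" + urlencode(ZOMBIE_CHECK_PARAMS)


if __name__ == "__main__":
    # Hypothetical core URL, for illustration only.
    print(build_check_url("http://localhost:8983/solr/mycore"))
```

The sort clause here only names index order; it does not by itself explain the QTime in the log.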

Also, cross-posting is probably not good. I’m replying only to solr-user.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 28, 2020, at 5:42 PM, S G  wrote:
> 
> So no one knows this then?
> It seems like a good opportunity to get some performance!
> 
> On Tue, Feb 25, 2020 at 2:01 PM S G  wrote:
> 
>> Hi,
>> 
>> I see a lot of such queries in my Solr 7.6.0 logs:
>> 
>> 
>> *path=/select
>> params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
>> hits=287128180 status=0 QTime=7173*
>> After some searching, this seems to be the code that fires the above:
>> 
>> https://github.com/apache/lucene-solr/blob/f80e8e11672d31c6e12069d2bd12a28b92e5a336/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBSolrClient.java#L89-L101
>> 
>> Can someone explain why Solr is doing this?
>> Note that "hits" is a very large value; could that be
>> impacting performance?
>> 
>> If you want to check a zombie server, shouldn't there be a much less
>> expensive way to do a health-check instead?
>> 
>> Thanks
>> SG
>> 
>> 
>> 
>> 



Re: Why does Solr sort on _docid_ with rows=0 ?

2020-02-28 Thread S G
So no one knows this then?
It seems like a good opportunity to get some performance!

On Tue, Feb 25, 2020 at 2:01 PM S G  wrote:

> Hi,
>
> I see a lot of such queries in my Solr 7.6.0 logs:
>
>
> *path=/select
> params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
> hits=287128180 status=0 QTime=7173*
> After some searching, this seems to be the code that fires the above:
>
> https://github.com/apache/lucene-solr/blob/f80e8e11672d31c6e12069d2bd12a28b92e5a336/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBSolrClient.java#L89-L101
>
> Can someone explain why Solr is doing this?
> Note that "hits" is a very large value; could that be
> impacting performance?
>
> If you want to check a zombie server, shouldn't there be a much less
> expensive way to do a health-check instead?
>
> Thanks
> SG
>
>
>
>


Re: Limiting access to /admin path

2020-02-28 Thread Jesús Roca
Yes, it works!

Thanks a lot!

On Fri, Feb 28, 2020 at 20:15, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:

> I have found that for admin commands you may need to include
> "collection":null
>   {
> "name":"admin-info-system2",
> "path":"/admin/*",
> "collection":null,
> "role":"*"}
>
>
> -----Original Message-----
> From: Jesús Roca 
> Sent: Friday, February 28, 2020 2:10 PM
> To: solr-user@lucene.apache.org
> Subject: Limiting access to /admin path
>
>  Hello,
>
> I have a Solr 7.7.2 instance with basic authentication.
>
> Does anyone know how to restrict access to the /admin path to authenticated
> users only?
> For example to:
>
> https://localhost:8983/solr/admin/info/system
>
> When I access that endpoint, this is the log entry generated:
> 2020-02-28 18:05:58.896 INFO  (qtp694316372-17) [   ] o.a.s.s.HttpSolrCall
> [admin] webapp=null path=/admin/info/system params={} status=0 QTime=36
>
> I have added the following custom permission, but it doesn't block the
> unauthenticated request to that section:
>
> "permissions":[
>   {
> "name":"admin-info-system",
> "path":"/admin/info/system",
> "role":"*"}
>   ],
>
> If I create the following custom permissions with different paths:
>
> "permissions":[
>   {
> "name":"admin-info-system1",
> "path":"/select/*",
> "role":"*"},
>   {
> "name":"admin-info-system2",
> "path":"/admin/*",
> "role":"*"}
>   ],
>
> Then I have to authenticate when I query a collection, but I can still
> access /admin/info/system or /admin/collections?action=CLUSTERSTATUS
>
> In short, I don't know how to block unauthenticated access to the /admin path
> without adding the blockUnknown=true attribute; but if I do that, every
> request will have to be authenticated, which I don't want.
>
> Thanks in advance!
>


SolrCloud location for solr.xml

2020-02-28 Thread Mike Drob
Hi Searchers!

I was recently looking at some of the start-up logic for Solr and was
interested in cleaning it up a little bit. However, I'm not sure how common
certain deployment scenarios are. Specifically is anybody doing the
following combination:

* Using SolrCloud (i.e. state stored in zookeeper)
* Loading solr.xml from a local solr home rather than zookeeper

Much appreciated! Thanks,
Mike


RE: Limiting access to /admin path

2020-02-28 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I have found that for admin commands you may need to include "collection":null
  {
    "name":"admin-info-system2",
    "path":"/admin/*",
    "collection":null,
    "role":"*"}

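As a minimal sketch of the rule above, this is how the permission could be built programmatically. It assumes the rule is sent as a "set-permission" command to Solr's authorization API; the helper names are hypothetical, and the values are the ones from this thread.

```python
import json


def admin_permission(name="admin-info-system2", path="/admin/*", role="*"):
    """Build the permission from the thread as a plain dict."""
    # None serializes to JSON null, the "collection":null suggested above
    # for admin (non-collection) paths.
    return {"name": name, "path": path, "collection": None, "role": role}


def set_permission_command(permission):
    """Wrap the rule in a set-permission command (assumed request shape)."""
    return json.dumps({"set-permission": permission})


if __name__ == "__main__":
    print(set_permission_command(admin_permission()))
```

The point of the sketch is only that "collection" must be present and null rather than omitted.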

-----Original Message-----
From: Jesús Roca  
Sent: Friday, February 28, 2020 2:10 PM
To: solr-user@lucene.apache.org
Subject: Limiting access to /admin path

 Hello,

I have a Solr 7.7.2 instance with basic authentication.

Does anyone know how to restrict access to the /admin path to authenticated
users only?
For example to:

https://localhost:8983/solr/admin/info/system

When I access that endpoint, this is the log entry generated:
2020-02-28 18:05:58.896 INFO  (qtp694316372-17) [   ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={} status=0 QTime=36

I have added the following custom permission, but it doesn't block the
unauthenticated request to that section:

"permissions":[
  {
"name":"admin-info-system",
"path":"/admin/info/system",
"role":"*"}
  ],

If I create the following custom permissions with different paths:

"permissions":[
  {
"name":"admin-info-system1",
"path":"/select/*",
"role":"*"},
  {
"name":"admin-info-system2",
"path":"/admin/*",
"role":"*"}
  ],

Then I have to authenticate when I query a collection, but I can still
access /admin/info/system or /admin/collections?action=CLUSTERSTATUS

In short, I don't know how to block unauthenticated access to the /admin path
without adding the blockUnknown=true attribute; but if I do that, every
request will have to be authenticated, which I don't want.

Thanks in advance!


Limiting access to /admin path

2020-02-28 Thread Jesús Roca
 Hello,

I have a Solr 7.7.2 instance with basic authentication.

Does anyone know how to restrict access to the /admin path to authenticated
users only?
For example to:

https://localhost:8983/solr/admin/info/system

When I access that endpoint, this is the log entry generated:
2020-02-28 18:05:58.896 INFO  (qtp694316372-17) [   ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={} status=0 QTime=36

I have added the following custom permission, but it doesn't block the
unauthenticated request to that section:

"permissions":[
  {
"name":"admin-info-system",
"path":"/admin/info/system",
"role":"*"}
  ],

If I create the following custom permissions with different paths:

"permissions":[
  {
"name":"admin-info-system1",
"path":"/select/*",
"role":"*"},
  {
"name":"admin-info-system2",
"path":"/admin/*",
"role":"*"}
  ],

Then I have to authenticate when I query a collection, but I can still
access /admin/info/system or /admin/collections?action=CLUSTERSTATUS

In short, I don't know how to block unauthenticated access to the /admin path
without adding the blockUnknown=true attribute; but if I do that, every
request will have to be authenticated, which I don't want.

Thanks in advance!


Re: Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Paras,

Thank you! This is all very helpful. I'm going to read through your answer a
couple more times and follow up if I have any more questions!

Best,
Audrey

On 2/28/20, 8:08 AM, "Paras Lehana"  wrote:

Hey Audrey,

> Users often skip results and go straight to vanilla search even though
> their query is displayed in the top of the suggestions list


Yes, we do track this in another metric. This behaviour is more
prevalent for shorter terms like "tea" and "bag". In any case, we measure
MRR to quantify how high we are able to show suggestions to users.
Since we include only terms selected via Auto-Suggest in the universe
for the calculation, searches where users skip Auto-Suggest won't be
counted. I think we can safely exclude these if you're using MRR to measure
how well you order your result set. Still, if you want to include them,
you can always compare the search term with the last result set and include
them in MRR - you're actually right that users may be skipping the lower
positions even if the intended suggestion is available. Our MRR stands at
68%, and 75% of all suggestions are selected from position #1 or #2.


> So acceptance rate = # of suggestions taken / total queries issued?


Yes. The total queries issued should ideally be those where Auto-Suggest
was selected or could have been selected, i.e. we exclude voice searches. We
try to include as many as possible of the searches that were made by typing
in the search bar; that's how we have fine-tuned our tracking over months.
You're right about the general formula - searches via Auto-Suggest divided
by total searches.


> And Selection to Display = # of suggestions taken (this would only be 1, if
> the not-taken suggestions are given 0s) / total suggestions displayed? If
> the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?


Yup. Please note that this is calculated per session of Auto-Suggest. Let
the formula be S/D. We take D (Display) as 1, not 3, when a user queries
for "bag" (b, ba, bag). If the S (Selection) was made in the last
display, it is also 1. If a user selects "bag" after writing "ba", we don't
say that S=0, D=1 for "b" and S=1, D=1 for "ba". For this, we already track
APL (Average Prefix Length). S/D is calculated per search and thus, here
S=1, D=1 for search "bag". Thus, for a single search, S/D can be either 0
or 1 - you're right, it's binary!

Hope this helps. Loved your questions! :)

On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Paras,
>
> Thank you for this response! Yes, you are being clear.
>
> Regarding the assumptions you make for MRR, do you have any research
> papers to confirm that these user behaviors have been observed? I only ask
> because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf
> talks about how users often skip results and go straight to vanilla search
> even though their query is displayed in the top of the suggestions list
> (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go
> against general IR intuition. This is only one paper, of course, but it
> seems that user research of QAC is hard to come by otherwise.
>
> So acceptance rate = # of suggestions taken / total queries issued ?
> And Selection to Display = # of suggestions taken (this would only be 1,
> if the not-taken suggestions are given 0s) / total suggestions displayed ?
>
> If the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?
>
> Best,
> Audrey
>
>
> 
> From: Paras Lehana 
> Sent: Thursday, February 27, 2020 2:58:25 AM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation
>
> Hi Audrey,
>
> For MRR, we assume that if a suggestion is selected, it's relevant. It's
> also assumed that the user will always click the highest relevant
> suggestion. Thus, we calculate position selection for each selection. If
> I'm still not understanding your question correctly, feel free to contact
> me personally (hangouts?).
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> > first reply seem the same to me. What is the difference between the two?
>
>
> I was expecting you to ask this - I should 

Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Paras Lehana
Hey Audrey,

> Users often skip results and go straight to vanilla search even though
> their query is displayed in the top of the suggestions list


Yes, we do track this in another metric. This behaviour is more
prevalent for shorter terms like "tea" and "bag". In any case, we measure
MRR to quantify how high we are able to show suggestions to users.
Since we include only terms selected via Auto-Suggest in the universe
for the calculation, searches where users skip Auto-Suggest won't be
counted. I think we can safely exclude these if you're using MRR to measure
how well you order your result set. Still, if you want to include them,
you can always compare the search term with the last result set and include
them in MRR - you're actually right that users may be skipping the lower
positions even if the intended suggestion is available. Our MRR stands at
68%, and 75% of all suggestions are selected from position #1 or #2.
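
A minimal sketch of MRR as described above, assuming each Auto-Suggest selection is logged as the 1-based position of the chosen suggestion. The function name and input format are hypothetical and not taken from this thread. Searches that skipped Auto-Suggest are simply absent from the input, matching the "universe" described above.

```python
def mean_reciprocal_rank(selected_positions):
    """Average of 1/position over all selections made via Auto-Suggest.

    selected_positions: 1-based positions of the suggestions users picked.
    Returns 0.0 for an empty log rather than dividing by zero.
    """
    if not selected_positions:
        return 0.0
    return sum(1.0 / pos for pos in selected_positions) / len(selected_positions)


if __name__ == "__main__":
    # Four selections: two from position 1, one from 2, one from 4.
    print(mean_reciprocal_rank([1, 1, 2, 4]))
```

With this definition, a log where every selection comes from position #1 gives an MRR of 1.0, and selections from lower positions pull it down.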


> So acceptance rate = # of suggestions taken / total queries issued?


Yes. The total queries issued should ideally be those where Auto-Suggest
was selected or could have been selected, i.e. we exclude voice searches. We
try to include as many as possible of the searches that were made by typing
in the search bar; that's how we have fine-tuned our tracking over months.
You're right about the general formula - searches via Auto-Suggest divided
by total searches.


> And Selection to Display = # of suggestions taken (this would only be 1, if
> the not-taken suggestions are given 0s) / total suggestions displayed? If
> the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?


Yup. Please note that this is calculated per session of Auto-Suggest. Let
the formula be S/D. We take D (Display) as 1, not 3, when a user queries
for "bag" (b, ba, bag). If the S (Selection) was made in the last
display, it is also 1. If a user selects "bag" after writing "ba", we don't
say that S=0, D=1 for "b" and S=1, D=1 for "ba". For this, we already track
APL (Average Prefix Length). S/D is calculated per search and thus, here
S=1, D=1 for search "bag". Thus, for a single search, S/D can be either 0
or 1 - you're right, it's binary!
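
A minimal sketch of the two ratios, following the distinction drawn earlier in this thread: Acceptance Rate is taken over all searches, while S/D only counts searches where suggestions were displayed. The record type and field names are hypothetical; one record stands for one search, not one keystroke.

```python
from dataclasses import dataclass


@dataclass
class Search:
    suggestions_displayed: bool  # D: were any suggestions shown?
    suggestion_selected: bool    # S: did the user pick one?


def acceptance_rate(searches):
    """Searches made via Auto-Suggest divided by all searches."""
    if not searches:
        return 0.0
    return sum(s.suggestion_selected for s in searches) / len(searches)


def selection_to_display(searches):
    """Selections divided by searches where suggestions were displayed."""
    displayed = [s for s in searches if s.suggestions_displayed]
    if not displayed:
        return 0.0
    return sum(s.suggestion_selected for s in displayed) / len(displayed)


if __name__ == "__main__":
    log = [Search(True, True), Search(True, False),
           Search(False, False), Search(True, True)]
    print(acceptance_rate(log), selection_to_display(log))
```

The search with no suggestions displayed lowers Acceptance Rate but is left out of S/D, which is the only difference between the two functions.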

Hope this helps. Loved your questions! :)

On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Paras,
>
> Thank you for this response! Yes, you are being clear.
>
> Regarding the assumptions you make for MRR, do you have any research
> papers to confirm that these user behaviors have been observed? I only ask
> because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf
> talks about how users often skip results and go straight to vanilla search
> even though their query is displayed in the top of the suggestions list
> (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go
> against general IR intuition. This is only one paper, of course, but it
> seems that user research of QAC is hard to come by otherwise.
>
> So acceptance rate = # of suggestions taken / total queries issued ?
> And Selection to Display = # of suggestions taken (this would only be 1,
> if the not-taken suggestions are given 0s) / total suggestions displayed ?
>
> If the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?
>
> Best,
> Audrey
>
>
> 
> From: Paras Lehana 
> Sent: Thursday, February 27, 2020 2:58:25 AM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation
>
> Hi Audrey,
>
> For MRR, we assume that if a suggestion is selected, it's relevant. It's
> also assumed that the user will always click the highest relevant
> suggestion. Thus, we calculate position selection for each selection. If
> I'm still not understanding your question correctly, feel free to contact
> me personally (hangouts?).
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> > first reply seem the same to me. What is the difference between the two?
>
>
> I was expecting you to ask this - I should have explained a bit more.
> Acceptance Rate is the share of all searches that were made through
> Auto-Suggest. The value for Selection to Display, by contrast, is 1 if a
> selection is made given that suggestions were displayed, and 0 otherwise.
> Here, the cases where results are displayed form the universal set.
> Acceptance Rate counts 0 even for searches where no selection was made
> because there were no results, while S/D does not count these - it only
> counts cases where results were displayed.
>
> Hope I'm clear. :)
>
> On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
>  wrote:
>
> > This article
> >
> 

Re: Solr Cloud on Docker?

2020-02-28 Thread Jan Høydahl
We have been experimenting with Solr Cloud in Docker for a while,
and tried to do some optimizations by turning off swap on the host completely.
However, that quickly led to OOM crashes, although we had 8G physical memory,
a 4G heap, and Solr holding just a few thousand docs.

This makes me suspect that the common advice of disabling swap may not work
that well when Docker is involved, due to how Docker manages memory. I found
some evidence that Docker will often use swap first even if physical memory
is available.

We ended up enabling swap again, but reduced vm.swappiness to 1 instead.

So any information or best practices regarding swap/swappiness, and perhaps
also the docker options --memory-swap and --memory-swappiness, would be welcome.
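
To make the two docker options concrete, here is a minimal sketch that assembles a docker run command line. The image name, the memory sizes and the helper function are hypothetical, and the values are not a recommendation; whether these settings actually help is exactly the open question above.

```python
def docker_run_args(image, memory_gb, swappiness=1, allow_swap=True):
    """Build an argv list for `docker run` with memory and swap limits."""
    mem = f"{memory_gb}g"
    # --memory-swap is the combined memory + swap limit, so setting it equal
    # to --memory leaves the container with no swap at all.
    swap_total = f"{memory_gb * 2}g" if allow_swap else mem
    return [
        "docker", "run", "-d",
        "--memory", mem,
        "--memory-swap", swap_total,
        "--memory-swappiness", str(swappiness),
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image tag, for illustration only.
    print(" ".join(docker_run_args("solr:8", memory_gb=8)))
```

Passing allow_swap=False corresponds to the "no swap" experiment at container level, while swappiness=1 mirrors the host setting mentioned above.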

Any experience from you guys would be valuable!

Jan

> On 5 Feb 2020, at 20:01, Karl Stoney wrote:
> 
> Nothing much to add to the below apart from we also successfully run solr on 
> kubernetes.  It took some implementation effort but we're now at a point 
> where we can do `kubectl scale --replicas=x statefulset/solr` and increase 
> capacity in minutes with solr's autoscaling taking care of the new shard 
> creation.
> 
> Very happy.
> 
> From: Dominique Bejean
> Sent: 05 February 2020 17:53
> To: Dwane Hall <dwaneh...@hotmail.com>
> Cc: Scott Stults; solr-user@lucene.apache.org
> Subject: Re: Solr Cloud on Docker?
> 
> Thank you Dwane. Great info :)
> 
> 
> On Wed, Feb 5, 2020 at 11:49, Dwane Hall wrote:
> 
>> Hey Dominique,
>> 
>> From a memory management perspective I don't do any container resource
>> limiting specifically in Docker (although as you mention you certainly
>> can).  In our circumstances these hosts are used specifically for Solr so I
>> planned and tested my capacity beforehand. We have ~768G of RAM on each of
>> these 5 hosts so with 20x16G heaps we had ~320G of heap being used by Solr,
>> some overhead for Docker and the other OS services leaving ~400G for the OS
>> cache and whatever wants to grab it on each host. Not everyone will have
>> servers this large which is why we really had to take advantage of multiple
>> Solr instances/host and Docker became important for our cluster operation
>> management.  Our disks are not SSDs either, and all instances write to the
>> same RAID 5 spinner, which is bind mounted to the containers.  With this
>> configuration we've been able to achieve consistent median response times
>> of under 500ms across the largest collection, though obviously this varies
>> with query type (no terms, leading wildcards etc.).  Our QPS is not huge,
>> ranging from 2-20/sec, but if we need to scale further or speed up response
>> times there are certainly wins that can be made at the disk level.  For our
>> current circumstances we're very content with the deployment.
>> 
>> I'm not sure if you've read Toke's blog on his experiences at the Royal
>> Danish Library but I found it really useful when capacity planning and
>> recommend reading it (
>> https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
>> ).
>> 
>> As always it's recommended to test for your own conditions and best of luck
>> with your deployment!
>> 
>> Dwane
>> 
>> --
>> *From:* Scott Stults
>> *Sent:* Thursday, 30 January 2020 1:45 AM
>> *To:* solr-user@lucene.apache.org
>> *Subject:* Re: Solr Cloud on Docker?
>> 
>> One of our clients has been running a big Solr Cloud (100-ish nodes, TB
>> index, billions of docs) in kubernetes for over a year and it's been
>> wonderful. I think during that time the biggest scrapes we got were when we
>> ran out of disk space. Performance and reliability have been solid
>> otherwise. Like Dwane alluded to, a lot of operations pitfalls can be
>> avoided if you do your Docker orchestration through kubernetes.
>> 
>> 
>> k/r,
>> Scott
>> 
>> On Tue, Jan 28, 2020 at 3:34 AM Dominique Bejean <dominique.bej...@eolya.fr>
>> wrote:
>> 
>>> Hi  Dwane,
>>> 
>>> Thank you for sharing this great solr/docker user story.
>>> 
>>> 

Re: Regarding Installation of Solr 5.1.0 as Paas in AZure

2020-02-28 Thread Jan Høydahl
Hi

You need to provide a lot more details if you wish to have useful answers.
First, we don’t know what PAAS Azure means (we can guess, but it won’t help).
So you need to walk us through exactly what you are trying to do, i.e.

* What operating system are you trying to install on?
* How exactly did you do the install?
* Exactly how do you start the Solr service, with what parameters and what
  solr.in.sh?
* When you start it, what logs do you see (solr.log file)?
* The 404 you are talking about: exactly where do you see it, after visiting
  what URL?

Paste as much config and logs as you can, then perhaps someone can spot the
error. You may have to use Dropbox, a gist, etc. to share big files or long
pieces of text.

You may want to review
https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists for more
in-depth advice on how to use this mailing list.

Jan

> On 27 Feb 2020, at 14:20, Shrinivasan Mohanakrishnan wrote:
> 
> Hi Team,
> 
> Is it possible to install Solr 5.1.0 as PaaS in Azure? My Sitecore is
> 8.2 Update 5, which supports only Solr 5.1.0. I tried to install it following
> various articles, but it throws a 404 error. Can anyone please help with this?
> 
> Thanks in Advance.
> 
> Regards,
> Shrinivasan.
> 