RE: solr url control

2018-03-05 Thread Becky Bonner
Thank you for your response.  Our dev instance is not SolrCloud, but we will be 
implementing SolrCloud in our staging and production environments.  I was afraid 
you were going to tell me that the substructure was not supported. I was hoping 
that core autodiscovery would keep the path.  Thanks for your help. 

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, March 2, 2018 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: solr url control

On 3/2/2018 10:29 AM, Becky Bonner wrote:
> We are trying to set up one Solr server for several applications, each with a 
> different collection.  Is there a way to have 2 collections under one 
> folder and the URL be something like this:
> http://mysolrinstance.com/solr/myParent1/collection1
> http://mysolrinstance.com/solr/myParent1/collection2
> http://mysolrinstance.com/solr/myParent2
> http://mysolrinstance.com/solr/myParent3

No. I am not aware of any way to set up a hierarchy like this. Collections and 
cores have one identifier for their names.  You could use myparent1_collection1 
as a name.
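
As an illustration of that flat naming (the core names and config path below are 
examples only, not anything from this thread), the prefixed cores could be 
created in standalone mode with the bin/solr tool, and each one is then 
addressed directly by its name:

  bin/solr create_core -c myparent1_collection1 -d /path/to/shared/configset
  bin/solr create_core -c myparent1_collection2 -d /path/to/shared/configset

  # resulting URLs stay flat:
  http://mysolrinstance.com/solr/myparent1_collection1/select?q=*:*
  http://mysolrinstance.com/solr/myparent1_collection2/select?q=*:*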

Implementing such a hierarchy would likely be difficult for the dev team, and 
would probably be a large source of bugs for several releases after it first 
became available.  I don't think a feature like this is likely to happen.

Later, you said "We would not want the data from one collection to ever show up 
in another collection query."  That's not ever going to happen unless the 
software making the query explicitly requests it, and it will need to know 
details about the indexes in your Solr server to be able to do it successfully. 
 FYI: People who cannot be trusted shouldn't ever have direct access to your 
Solr installation.

Are you running SolrCloud?  I ask because if you're not, then the terminology 
for each index isn't a "collection" ... it's a core.  This is a pedantic 
statement, but you'll get better answers if your terminology is correct.

If you were running SolrCloud, it would be extremely unlikely for you to have a 
directory structure like you describe.  SolrCloud normally handles all core 
creation behind the scenes and isn't going to set up a directory structure like 
that.

Information about how core discovery works:

https://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29#Finding_cores
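
For reference, a sketch (not from the thread) of how core discovery treats a 
nested layout: Solr walks SOLR_HOME looking for core.properties files, and each 
discovered core takes its name from the name property in that file (or from the 
containing directory by default), not from the directories above it, so the URL 
stays flat no matter how deep the instance directory sits:

  SOLR_HOME/
    myParent1/
      collection1/
        core.properties    # contains: name=myparent1_collection1
        conf/
        data/

  # still addressed as:
  http://mysolrinstance.com/solr/myparent1_collection1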

Thanks,
Shawn



RE: solr url control

2018-03-02 Thread Becky Bonner
So the thing is ... these collections all have unique schemas and the data are 
unrelated to each other.  And we do a lot of field queries on the content.  We 
would not want the data from one collection to ever show up in another 
collection query.  They are used by different audiences and have different 
security requirements as well.  We want to keep them separated.  

While it is not required that the URLs include the myParentX prefix, it would be 
consistent with our current implementation, which we are upgrading from 4.6 to 
7.2.  This was a very simple task under Apache, but I can't figure out how to do 
it in Solr 7.

-Original Message-
From: Becky Bonner 
Sent: Friday, March 2, 2018 1:11 PM
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: RE: solr url control

Sorry Webster - I meant to make this a new question ... but accidentally sent 
it. You wrote:
From: Webster Homer [mailto:webster.ho...@sial.com] 
Sent: Friday, March 2, 2018 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging 
solrcloud searches

Becky,
This should have been its own question.

SolrCloud is different from standalone Solr: the configurations live in 
ZooKeeper and the index is created under SOLR_HOME. You might want to rethink 
your solution. What problem are you trying to solve with that layout? Would it 
be solved by creating the Parent1 collection with 2 shards?
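
If that route were explored, a minimal sketch of the Collections API call (the 
collection and configset names here are illustrative; note also that the two 
shards would share a single schema/configset, which may not fit data sets with 
very different schemas):

  http://mysolrinstance.com/solr/admin/collections?action=CREATE&name=myParent1&numShards=2&replicationFactor=1&collection.configName=myParent1_config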

-Original Message-
From: Becky Bonner 
Sent: Friday, March 2, 2018 11:29 AM
To: solr-user@lucene.apache.org
Subject: solr url control

We are trying to set up one Solr server for several applications, each with a 
different collection.  Is there a way to have 2 collections under one 
folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3


We organized it like that under the solr folder but the URLs to the collections 
do not include the "myParent1".
This makes the names of my collections more confusing because you can't tell 
what application they belong to.  It wasn’t a problem until we had 2 
collections for one of the apps.


RE: solr url control

2018-03-02 Thread Becky Bonner
Sorry Webster - I meant to make this a new question ... but accidentally sent 
it. You wrote:
From: Webster Homer [mailto:webster.ho...@sial.com] 
Sent: Friday, March 2, 2018 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging 
solrcloud searches

Becky,
This should have been its own question.

SolrCloud is different from standalone Solr: the configurations live in 
ZooKeeper and the index is created under SOLR_HOME. You might want to rethink 
your solution. What problem are you trying to solve with that layout? Would it 
be solved by creating the Parent1 collection with 2 shards?

-Original Message-
From: Becky Bonner 
Sent: Friday, March 2, 2018 11:29 AM
To: solr-user@lucene.apache.org
Subject: solr url control

We are trying to set up one Solr server for several applications, each with a 
different collection.  Is there a way to have 2 collections under one 
folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3


We organized it like that under the solr folder but the URLs to the collections 
do not include the "myParent1".
This makes the names of my collections more confusing because you can't tell 
what application they belong to.  It wasn’t a problem until we had 2 
collections for one of the apps.


solr url control

2018-03-02 Thread Becky Bonner
We are trying to set up one Solr server for several applications, each with a 
different collection.  Is there a way to have 2 collections under one 
folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3


We organized it like that under the solr folder but the URLs to the collections 
do not include the "myParent1".
This makes the names of my collections more confusing because you can't tell 
what application they belong to.  It wasn’t a problem until we had 2 
collections for one of the apps.


RE: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-03-02 Thread Becky Bonner
We are trying to set up one Solr server for several applications, each with a 
different collection.  Is there a way to have 2 collections under one 
folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3


We organized it like that under the solr folder but the URLs to the collections 
do not include the "myParent1".
This makes the names of my collections more confusing because you can't tell 
what application they belong to.  It wasn’t a problem until we had 2 
collections for one of the apps.




-Original Message-
From: Webster Homer [mailto:webster.ho...@sial.com] 
Sent: Friday, March 2, 2018 10:29 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging 
solrcloud searches

I am trying to test if enabling the stats cache as suggested by Erick would also 
address this issue. I added this to my solrconfig.xml:

 

I executed queries and saw no differences. Then I re-indexed the data, and again 
I saw no differences in behavior.
Then I found SOLR-10952. It seems we need to disable the queryResultCache for 
the global stats cache to work.
I've never disabled this before. I edited the solrconfig.xml, setting the sizes 
to 0. I'm not sure whether this is how to disable the cache or not.



I also set this:
   0

Then I uploaded the solrconfig.xml and reloaded the collection. It still made no 
difference. Do I need to restart Solr for this to take effect?
When I look in the admin console, the queryResultCache still seems to have the 
old settings.

Does enabling statsCache require a Solr restart too? Does enabling the 
statsCache require that the data be re-indexed? The documentation on this 
feature is skimpy.
Is there a way to see if it's enabled in the Admin Console?
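
For reference, a minimal sketch of the two solrconfig.xml fragments under 
discussion (the statsCache class shown is only one of the available 
implementations and is an assumption here, since the actual element was stripped 
from the archived message):

  <!-- distributed IDF; ExactStatsCache is one of the available implementations -->
  <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

  <!-- per SOLR-10952, keep the queryResultCache from interfering with global
       stats: setting the sizes to 0 is one approach, and removing or
       commenting out the element also disables the cache -->
  <queryResultCache class="solr.LRUCache"
                    size="0"
                    initialSize="0"
                    autowarmCount="0"/>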

On Tue, Feb 27, 2018 at 9:31 AM, Webster Homer 
wrote:

> Emir,
>
> Using tlog replica types addresses my immediate problem.
>
> The secondary issue is that all of our searches show inconsistent results.
> These are all normal paging use cases. We regularly test our 
> relevancy, and these differences create confusion in the testers. 
> Moreover, we are migrating from Endeca which has very consistent results.
>
> I'm hoping that using the global stats cache will make the other 
> searches more stable. I think we will eventually move to favoring tlog 
> replicas. We have a couple of collections where NRT makes sense, but 
> those collections don't need to return data in relevancy order. I 
> think NRT should be considered a niche use case for a search engine; 
> tlog and pull replicas are a much better fit for a search engine 
> (imho)
>
> On Tue, Feb 27, 2018 at 4:01 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>
>> Hi Webster,
>> Since you are returning all hits, returning the last page is almost 
>> as heavy for Solr as returning all documents. Maybe you should 
>> consider just returning one large page and completely avoid this issue.
>> I agree with you that this should be handled by Solr. ES solved this 
>> issue with the “preference” search parameter, where you can set a session 
>> id as the preference and it will stick to the same shards. I guess you 
>> could try a similar thing on your own, but that would require you to send 
>> a list of shards as a parameter for your search and balance it for 
>> different sessions.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
>> Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 26 Feb 2018, at 21:03, Webster Homer  wrote:
>> >
>> > Erick,
>> >
>> > No, we didn't look at that. I will add it to the list. We have not 
>> > seen performance issues with Solr. We have much slower technologies 
>> > in our stack. This project was to replace a system that was too slow.
>> >
>> > Thank you, I will look into it
>> >
>> > Webster
>> >
>> >> On Mon, Feb 26, 2018 at 1:13 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> Did you try enabling distributed IDF (statsCache)? See:
>> >> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
>> >>
>> >> It may not totally fix the issue, but it's worth trying. It does 
>> >> come with a performance penalty of course.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer <webster.ho...@sial.com> wrote:
>> >>> Thanks Shawn, I had settled on this as a solution.
>> >>>
>> >>> All our use cases for Solr are to return results in order of relevancy 
>> >>> to the query, so having a deterministic sort would defeat that purpose. 
>> >>> Since we wanted to be able to return all the results for a query, I 
>> >>> originally looked at using the Streaming API, but that doesn't support 
>> >>> returning results sorted by relevancy

RE: solr usage reporting

2018-01-25 Thread Becky Bonner
That would work for a single server, but collecting the logs from the farm would 
be problematic since we would have logs from all nodes and replicas from all 
the members of the farm.  We would then need to weed out what we are interested 
in and combine it. It would be better if there were a way to query it within 
Solr.  I think something in Solr would be best ... a separate collection that 
can be queried and reports generated from it.  The log does have the basic info 
we need though.
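
A sketch of how such a reporting collection could be queried once each search 
request was logged into it (the collection name and fields here - querylog, 
query_term, hits, timestamp - are assumptions, not anything Solr provides out 
of the box):

  # top 1000 query terms for a date range, against /solr/querylog/select
  q=*:*
  rows=0
  fq=timestamp:[2018-01-01T00:00:00Z TO 2018-02-01T00:00:00Z]
  facet=true
  facet.field=query_term
  facet.limit=1000

  # for the "no results" report, add:
  fq=hits:0

Facet counts are returned in descending order by default, so facet.limit=1000 
gives the top 1000 terms directly.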


-Original Message-
From: Marco Reis [mailto:m...@marcoreis.net] 
Sent: Thursday, January 25, 2018 11:14 AM
To: solr-user@lucene.apache.org
Subject: Re: solr usage reporting

One way is to collect the logs from your server and then use another tool to 
generate your report.


On Thu, Jan 25, 2018 at 2:59 PM Becky Bonner <bbon...@teleflora.com> wrote:

> Hi all,
> We are in the process of replacing our Google Search Appliance with 
> SOLR
> 7.1 and are needing one last piece of our requirements.  We provide a 
> monthly report to our business that shows the top 1000 query terms 
> requested during the date range as well as the query terms requested 
> that contained no results.  Is there a way to log the requests and 
> later query Solr for these results? Or is there a plugin to add this 
> functionality?
>
> Your help appreciated.
> Bcubed
>
>
> --
Marco Reis
Software Engineer
http://marcoreis.net
https://github.com/masreis
+55 61 9 81194620


solr usage reporting

2018-01-25 Thread Becky Bonner
Hi all,
We are in the process of replacing our Google Search Appliance with SOLR 7.1 
and need one last piece of our requirements.  We provide a monthly 
report to our business that shows the top 1000 query terms requested during the 
date range as well as the query terms requested that contained no results.  Is 
there a way to log the requests and later query Solr for these results? Or is 
there a plugin to add this functionality?

Your help appreciated.
Bcubed