RE: solr url control
Thank you for your response. Our dev instance is not running SolrCloud, but we will be implementing SolrCloud in our staging and production environments. I was afraid you were going to tell me that the substructure was not supported; I was hoping that core autodiscovery would keep the path. Thanks for your help.

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, March 2, 2018 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: solr url control

On 3/2/2018 10:29 AM, Becky Bonner wrote:
> We are trying to set up one Solr server for several applications, each with a
> different collection. Is there a way to have 2 collections under one
> folder and the URL be something like this:
> http://mysolrinstance.com/solr/myParent1/collection1
> http://mysolrinstance.com/solr/myParent1/collection2
> http://mysolrinstance.com/solr/myParent2
> http://mysolrinstance.com/solr/myParent3

No. I am not aware of any way to set up a hierarchy like this. Collections and cores have a single identifier for their names. You could use myparent1_collection1 as a name. Implementing such a hierarchy would likely be difficult for the dev team, and would probably be a large source of bugs for several releases after it first became available. I don't think a feature like this is likely to happen.

Later, you said, "We would not want the data from one collection to ever show up in another collection query." That is never going to happen unless the software making the query explicitly requests it, and it would need to know details about the indexes in your Solr server to do so successfully. FYI: people who cannot be trusted should never have direct access to your Solr installation.

Are you running SolrCloud? I ask because if you're not, then the correct term for each index isn't a "collection" ... it's a core. This is a pedantic point, but you'll get better answers if your terminology is correct.
If you were running SolrCloud, it would be extremely unlikely for you to have a directory structure like the one you describe. SolrCloud normally handles all core creation behind the scenes and isn't going to set up a directory structure like that.

Information about how core discovery works:
https://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29#Finding_cores

Thanks,
Shawn
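A footnote on the core discovery behavior linked above: in standalone (non-cloud) Solr, discovery walks subdirectories of SOLR_HOME until it finds a core.properties file, and that file can set the core's name explicitly. The on-disk grouping can therefore be kept even though the URL namespace stays flat. A sketch (directory and core names are illustrative, not from the original thread):

```
# $SOLR_HOME layout -- discovery descends until it finds core.properties
myParent1/
  collection1/
    core.properties      # contains: name=myParent1_collection1
    conf/ ... data/ ...
  collection2/
    core.properties      # contains: name=myParent1_collection2

# URLs remain flat regardless of the nesting:
#   http://mysolrinstance.com/solr/myParent1_collection1/select?q=...
```

When the name property is absent, the core is named after its directory; setting it keeps a parent prefix visible in the URL even though the path itself is not preserved.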
RE: solr url control
So the thing is ... these collections all have very different schemas, and the data are unrelated to each other. And we do a lot of field queries on the content. We would not want the data from one collection to ever show up in another collection's query. They are used by different audiences, with different security requirements, as well. We want to keep them separated. While it is not required that the URLs include the myParentX, it would be consistent with our current implementation; we are upgrading from 4.6 to 7.2. This was a very simple task under Apache, but I can't figure out how to do this in Solr 7.

-----Original Message-----
From: Becky Bonner
Sent: Friday, March 2, 2018 1:11 PM
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: RE: solr url control

Sorry Webster - I meant to make this a new question ... but accidentally sent it.

You wrote:
From: Webster Homer [mailto:webster.ho...@sial.com]
Sent: Friday, March 2, 2018 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

Becky,
This should have been its own question. SolrCloud is different from standalone Solr: the configurations live in ZooKeeper and the index is created under SOLR_HOME. You might want to rethink your solution. What problem are you trying to solve with that layout? Would it be solved by creating the Parent1 collection with 2 shards?

-----Original Message-----
From: Becky Bonner
Sent: Friday, March 2, 2018 11:29 AM
To: solr-user@lucene.apache.org
Subject: solr url control

We are trying to set up one Solr server for several applications, each with a different collection.
Is there a way to have 2 collections under one folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3

We organized it like that under the solr folder, but the URLs to the collections do not include "myParent1". This makes the names of our collections more confusing because you can't tell which application they belong to. It wasn't a problem until we had 2 collections for one of the apps.
RE: solr url control
Sorry Webster - I meant to make this a new question ... but accidentally sent it.

You wrote:
From: Webster Homer [mailto:webster.ho...@sial.com]
Sent: Friday, March 2, 2018 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

Becky,
This should have been its own question. SolrCloud is different from standalone Solr: the configurations live in ZooKeeper and the index is created under SOLR_HOME. You might want to rethink your solution. What problem are you trying to solve with that layout? Would it be solved by creating the Parent1 collection with 2 shards?

-----Original Message-----
From: Becky Bonner
Sent: Friday, March 2, 2018 11:29 AM
To: solr-user@lucene.apache.org
Subject: solr url control

We are trying to set up one Solr server for several applications, each with a different collection. Is there a way to have 2 collections under one folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3

We organized it like that under the solr folder, but the URLs to the collections do not include "myParent1". This makes the names of our collections more confusing because you can't tell which application they belong to. It wasn't a problem until we had 2 collections for one of the apps.
solr url control
We are trying to set up one Solr server for several applications, each with a different collection. Is there a way to have 2 collections under one folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3

We organized it like that under the solr folder, but the URLs to the collections do not include "myParent1". This makes the names of our collections more confusing because you can't tell which application they belong to. It wasn't a problem until we had 2 collections for one of the apps.
RE: NRT replicas miss hits and return duplicate hits when paging solrcloud searches
We are trying to set up one Solr server for several applications, each with a different collection. Is there a way to have 2 collections under one folder and the URL be something like this:
http://mysolrinstance.com/solr/myParent1/collection1
http://mysolrinstance.com/solr/myParent1/collection2
http://mysolrinstance.com/solr/myParent2
http://mysolrinstance.com/solr/myParent3

We organized it like that under the solr folder, but the URLs to the collections do not include "myParent1". This makes the names of our collections more confusing because you can't tell which application they belong to. It wasn't a problem until we had 2 collections for one of the apps.

-----Original Message-----
From: Webster Homer [mailto:webster.ho...@sial.com]
Sent: Friday, March 2, 2018 10:29 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

I am trying to test whether enabling the stats cache, as suggested by Erick, would also address this issue. I added this to my solrconfig.xml. I executed queries and saw no differences. Then I re-indexed the data; again I saw no differences in behavior.

Then I found SOLR-10952. It seems we need to disable the queryResultCache for the global stats cache to work. I've never disabled this before. I edited the solrconfig.xml, setting the sizes to 0. I'm not sure if this is how to disable the cache or not. I also set this: 0

Then I uploaded the solrconfig.xml and reloaded the collection. It still made no difference. Do I need to restart Solr for this to take effect? When I look in the admin console, the queryResultCache still seems to have the old settings. Does enabling statsCache require a Solr restart too? Does enabling the statsCache require that the data be re-indexed? The documentation on this feature is skimpy. Is there a way to see if it's enabled in the Admin Console?
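The XML elements in the message above were evidently stripped by the list archive, so the exact configuration tried is unknown. For readers, a typical configuration of this kind might look like the sketch below — ExactStatsCache is only one of several statsCache implementations, and the zero-size queryResultCache entry is an assumption about one way to disable that cache, not a quote from the original mail:

```xml
<!-- solrconfig.xml: enable distributed IDF via a global stats cache.
     ExactStatsCache is assumed here; LocalStatsCache is the default. -->
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

<query>
  <!-- One way to effectively disable queryResultCache (cf. SOLR-10952):
       set its sizes to 0. -->
  <queryResultCache class="solr.LRUCache"
                    size="0"
                    initialSize="0"
                    autowarmCount="0"/>
</query>
```

Config changes like these take effect on core reload when the updated solrconfig.xml is in place (in SolrCloud, after uploading to ZooKeeper and reloading the collection).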
On Tue, Feb 27, 2018 at 9:31 AM, Webster Homer wrote:
> Emir,
>
> Using tlog replica types addresses my immediate problem.
>
> The secondary issue is that all of our searches show inconsistent results.
> These are all normal paging use cases. We regularly test our relevancy,
> and these differences create confusion in the testers. Moreover, we are
> migrating from Endeca, which has very consistent results.
>
> I'm hoping that using the global stats cache will make the other searches
> more stable. I think we will eventually move to favoring tlog replicas.
> We have a couple of collections where NRT makes sense, but those
> collections don't need to return data in relevancy order. I think NRT
> should be considered a niche use case for a search engine; tlog and pull
> replicas are a much better fit for a search engine (imho).
>
> On Tue, Feb 27, 2018 at 4:01 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>> Hi Webster,
>> Since you are returning all hits, returning the last page is almost as
>> heavy for Solr as returning all documents. Maybe you should consider
>> just returning one large page and completely avoid this issue.
>> I agree with you that this should be handled by Solr. ES solved this
>> issue with a "preference" search parameter where you can set a session
>> id as the preference and it will stick to the same shards. I guess you
>> could try a similar thing on your own, but that would require you to
>> send a list of shards as a parameter for your search and balance it for
>> different sessions.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>> > On 26 Feb 2018, at 21:03, Webster Homer wrote:
>> >
>> > Erick,
>> >
>> > No, we didn't look at that. I will add it to the list. We have not
>> > seen performance issues with Solr. We have much slower technologies
>> > in our stack. This project was to replace a system that was too slow.
>> >
>> > Thank you, I will look into it.
>> >
>> > Webster
>> >
>> > On Mon, Feb 26, 2018 at 1:13 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> Did you try enabling distributed IDF (statsCache)? See:
>> >> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
>> >>
>> >> It may not totally fix the issue, but it's worth trying. It does
>> >> come with a performance penalty, of course.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer <webster.ho...@sial.com> wrote:
>> >>> Thanks Shawn, I had settled on this as a solution.
>> >>>
>> >>> All our use cases for Solr return results in order of relevancy to
>> >>> the query, so having a deterministic sort would defeat that purpose.
>> >>> Since we wanted to be able to return all the results for a query, I
>> >>> originally looked at using the Streaming API, but that doesn't
>> >>> support returning results sorted by relevancy.
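Emir's "stick to the same shards" idea is not a built-in Solr feature, but a client can approximate it by hashing a session id to pick one replica per shard and passing the result as Solr's shards parameter, so repeated paging requests from one session hit the same cores and see consistent scores. A minimal sketch (hosts, ports, and core names are hypothetical):

```python
import hashlib

def shards_param(session_id, shard_replicas):
    """Build a stable Solr 'shards' parameter value: the same session id
    always selects the same replica within each shard."""
    picks = []
    for shard_name in sorted(shard_replicas):
        replicas = shard_replicas[shard_name]
        # Hash session id + shard name to get a stable replica index.
        digest = hashlib.md5((session_id + shard_name).encode()).hexdigest()
        picks.append(replicas[int(digest, 16) % len(replicas)])
    # Comma separates shards; each entry here is one chosen replica.
    return ",".join(picks)

# Hypothetical two-shard, two-replica cluster layout.
layout = {
    "shard1": ["solr1:8983/solr/coll_shard1_replica1",
               "solr2:8983/solr/coll_shard1_replica2"],
    "shard2": ["solr1:8983/solr/coll_shard2_replica1",
               "solr2:8983/solr/coll_shard2_replica2"],
}
```

The client would append `&shards=` plus this value to each query for the session; the trade-off, as Emir notes, is that the client must know and balance the shard list itself.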
RE: solr usage reporting
That would work for a single server, but collecting the logs from the farm would be problematic, since we would have logs from all nodes and replicas from all the members of the farm. We would then need to weed out what we are interested in and combine it. It would be better if there were a way to query it within Solr. I think something in Solr would be best ... a separate collection that can be queried and have reports generated from it. The log does have the basic info we need, though.

-----Original Message-----
From: Marco Reis [mailto:m...@marcoreis.net]
Sent: Thursday, January 25, 2018 11:14 AM
To: solr-user@lucene.apache.org
Subject: Re: solr usage reporting

One way is to collect the log from your server and then use another tool to generate your report.

On Thu, Jan 25, 2018 at 2:59 PM Becky Bonner <bbon...@teleflora.com> wrote:
> Hi all,
> We are in the process of replacing our Google Search Appliance with Solr
> 7.1 and need one last piece of our requirements. We provide a monthly
> report to our business that shows the top 1000 query terms requested
> during the date range, as well as the query terms requested that
> returned no results. Is there a way to log the requests and later query
> Solr for these results? Or is there a plugin to add this functionality?
>
> Your help appreciated.
> Bcubed

--
Marco Reis
Software Engineer
http://marcoreis.net
https://github.com/masreis
+55 61 9 81194620
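The log-based approach Marco describes can be sketched in a few lines: Solr's request log records the query parameters and the hit count per request, so top terms and zero-result terms fall out of a simple scan. The line format below is an assumption about typical Solr 7.x solr.log output, and the collection name and queries are illustrative:

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Matches request-log entries of the (assumed) form:
#   ... path=/select params={q=roses&rows=10} hits=42 status=0 QTime=3
LINE_RE = re.compile(r"params=\{([^}]*)\}\s+hits=(\d+)")

def tally(log_lines):
    """Return (term_counts, zero_hit_counts) over the q= values seen."""
    terms, zero_hits = Counter(), Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        q = parse_qs(m.group(1)).get("q", [None])[0]
        if q is None:
            continue
        terms[q] += 1
        if int(m.group(2)) == 0:
            zero_hits[q] += 1
    return terms, zero_hits

# Two illustrative log lines standing in for a month of solr.log data.
sample = [
    "2018-01-25 10:00:01 INFO o.a.s.c.S.Request [products] "
    "webapp=/solr path=/select params={q=roses&rows=10} hits=42 status=0 QTime=3",
    "2018-01-25 10:00:05 INFO o.a.s.c.S.Request [products] "
    "webapp=/solr path=/select params={q=plutonium&rows=10} hits=0 status=0 QTime=1",
]
terms, zero_hits = tally(sample)
```

`terms.most_common(1000)` would give the monthly top-1000 report; for a farm, the per-node logs would first need to be concatenated (or shipped to the "separate collection" suggested above) and the date range filtered.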
solr usage reporting
Hi all,
We are in the process of replacing our Google Search Appliance with Solr 7.1 and need one last piece of our requirements. We provide a monthly report to our business that shows the top 1000 query terms requested during the date range, as well as the query terms requested that returned no results. Is there a way to log the requests and later query Solr for these results? Or is there a plugin to add this functionality?

Your help appreciated.
Bcubed