Re: No Live SolrServer available to handle this request
When I look at the Solr logs I find the exception below:

Caused by: java.io.IOException: Invalid JSON type java.lang.String, expected Map
    at org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)
    at org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.decodeInput(PreAnalyzedField.java:345)
    at org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.access$000(PreAnalyzedField.java:280)
    at org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer$1.setReader(PreAnalyzedField.java:375)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
    at org.apache.lucene.search.uhighlight.AnalysisOffsetStrategy.tokenStream(AnalysisOffsetStrategy.java:58)
    at org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnums(MemoryIndexOffsetStrategy.java:106)
    ... 37 more

I am setting a lot of fields (fq, score, highlight, etc.) and then putting them into the SolrQuery.

On Wed, Dec 6, 2017 at 11:22 AM, Selvam Raman wrote:
> When I fire a query it returns the doc as expected. (Example: q=synthesis)
>
> I am facing the problem when I include a wildcard character in the query. (Example: q=synthesi*)
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://localhost:8983/solr/Metadata2:
> org.apache.solr.client.solrj.SolrServerException:
> No live SolrServers available to handle this request:
> [/solr/Metadata2_shard1_replica1,
> solr/Metadata2_shard2_replica2,
> solr/Metadata2_shard1_replica2]
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
No Live SolrServer available to handle this request
When I fire a query it returns the doc as expected. (Example: q=synthesis)

I am facing the problem when I include a wildcard character in the query. (Example: q=synthesi*)

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/Metadata2: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[/solr/Metadata2_shard1_replica1, solr/Metadata2_shard2_replica2, solr/Metadata2_shard1_replica2]

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Re: Dataimport handler showing idle status with multiple shards
From: Shawn Heisey
Reply-To: "solr-user@lucene.apache.org"
Date: Tuesday, December 5, 2017 at 1:31 PM
To: "solr-user@lucene.apache.org"
Subject: Re: Dataimport handler showing idle status with multiple shards

On 12/5/2017 10:47 AM, Sarah Weissman wrote:
> I’ve recently been using the dataimport handler to import records from a database into a Solr cloud collection with multiple shards. I have 6 dataimport handlers configured on 6 different paths, all running simultaneously against the same DB. I’ve noticed that when I do this I often get “idle” status from the DIH even when the import is still running. The percentage of the time I get an “idle” response seems proportional to the number of shards. I.e., with 1 shard it always shows me non-idle status, with 2 shards I see idle about half the time I check the status, and with 96 shards it seems to be showing idle almost all the time. I can see the size of each shard increasing, so I’m sure the import is still going. I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. Does anyone know why the DIH would report idle when it’s running? e.g.:
> curl http://myserver:8983/solr/collection/dataimport6

To use DIH with SolrCloud, you should be sending your request directly to a shard replica core, not the collection, so that you can be absolutely certain that the import command and the status command are going to the same place. You MIGHT need to also have a distrib=false parameter on the request, but I do not know whether that is required to prevent the load balancing on the dataimport handler.

Thanks for the information, Shawn. I am relatively new to Solr cloud and I am used to running the dataimport from the admin dashboard, where it happens at the collection level, so I find it surprising that the right way to do this is at the core level.
So, if I want to be able to check the status of my data import for N cores I would need to create N different data import configs that manually partition the collection and start each different config on a different core? That seems like it could get confusing. And then if I wanted to grow or shrink my shards I’d have to rejigger my data import configs every time. I kind of expect a distributed index to hide these details from me. I only have one node at the moment, and I don’t understand how Solr cloud works internally well enough to understand what it means for the data import to be running on a shard vs. a node. It would be nice if doing a status query would at least tell you something, like the number of documents last indexed on that core, even if nothing is currently running. That way at least I could extrapolate how much longer the operation will take.
RE: Multiple cores versus a "source" field.
Thanks Walter. Your case does apply, as both data stores do indeed cover the same kind of material, with many important terms in common. "source" + fq: coming up.

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, 5 December 2017 5:51 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Multiple cores versus a "source" field.

One more opinion on source field vs separate collections for multiple corpora.

Index statistics don’t really settle down until at least 100k documents. Below that, idf is pretty noisy. With Ultraseek, we used pre-calculated frequency data for collections under 10k docs.

If your corpora have similar word statistics, you might get more predictable relevance with a single collection. For example, if you have data sheets and press releases, but they are both about test instruments, then you might get some advantage from having more data points about the “text” and “title” fields.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Dec 4, 2017, at 7:17 PM, Phil Scadden wrote:
>
> Thanks Eric. I have already followed the solrj indexing very closely - I have to do a lot of manipulation at indexing time. The other blog article is very interesting, as I do indeed use "year" (year of publication) and it is very frequently used to filter queries. I will have a play with that now.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, 5 December 2017 4:11 p.m.
> To: solr-user
> Subject: Re: Multiple cores versus a "source" field.
>
> That's the unpleasant part of semi-structured documents (PDF, Word, whatever). You never know the relationship between raw size and indexable text.
>
> Basically, anything that you don't care to contribute to _scoring_ is often better in an fq clause. You can also use {!cache=false} to bypass actually using the cache if you know it's unlikely to be reused.
> Two other points:
>
> 1> you can offload the parsing to clients rather than Solr and gain more control over the process (assuming you haven't already). Here's a blog:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> 2> One reason to not go to fq clauses (except if you use {!cache=false}) is if you are using bare NOW in your clauses; for, say, ranges, one common construct is fq=date:[NOW-1DAY TO NOW]. Here's another blog on the subject:
> https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
>
> Best,
> Erick
>
> On Mon, Dec 4, 2017 at 6:08 PM, Phil Scadden wrote:
>>> You'll have a few economies of scale I think with a single core, but frankly I don't know if they'd be enough to measure. You say the docs are "quite large" though, are you talking books? Magazine articles? Is 20K large, or are they 20M?
>>
>> Technical reports. Sometimes up to 200MB PDFs, but that would include a lot of imagery. More typically 20MB. A 140MB PDF contained only 400KB of text.
>>
>> Thanks for the tip on fq: I will put that into code now, as I have other fields used in similar fashion.
>>
>> Notice: This email and any attachments are confidential and may not be used, published or redistributed without the prior written consent of the Institute of Geological and Nuclear Sciences Limited (GNS Science). If received in error please destroy and immediately notify GNS Science. Do not copy or disclose the contents.
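Erick's advice about non-scoring filters can be sketched by composing the request parameters programmatically. This is an illustrative sketch, not the poster's actual code: the field names (source, date) and query values are invented, while the {!cache=false} local param and the NOW-rounding caveat follow the advice and blog posts quoted above.

```python
from urllib.parse import urlencode

# Illustrative sketch: a non-scoring "source" filter that skips the
# filter cache (unlikely to be reused), plus a date filter rounded to
# NOW/DAY so repeated requests produce an identical, cacheable fq.
params = [
    ("q", "synthesis"),
    ("fq", "{!cache=false}source:reports"),    # bypass the filter cache
    ("fq", "date:[NOW/DAY-1DAY TO NOW/DAY]"),  # rounded, so it caches well
]
query_string = urlencode(params)
```

A bare fq=date:[NOW-1DAY TO NOW] changes every millisecond, so each request produces a new cache entry; rounding to NOW/DAY keeps the string stable across requests.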
Re: Dataimport handler showing idle status with multiple shards
On 12/5/2017 10:47 AM, Sarah Weissman wrote:
> I’ve recently been using the dataimport handler to import records from a database into a Solr cloud collection with multiple shards. I have 6 dataimport handlers configured on 6 different paths, all running simultaneously against the same DB. I’ve noticed that when I do this I often get “idle” status from the DIH even when the import is still running. The percentage of the time I get an “idle” response seems proportional to the number of shards. I.e., with 1 shard it always shows me non-idle status, with 2 shards I see idle about half the time I check the status, and with 96 shards it seems to be showing idle almost all the time. I can see the size of each shard increasing, so I’m sure the import is still going. I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. Does anyone know why the DIH would report idle when it’s running? e.g.:
> curl http://myserver:8983/solr/collection/dataimport6

When you send a DIH request to the collection name, SolrCloud is going to load balance that request across the cloud, just like it would with any other request. Solr will look at the list of all responding nodes that host part of the collection and send multiple such requests to different cores (shards/replicas) across the cloud. If there are four cores in the collection and the nodes hosting them are all working, then each of those cores would only see requests to /dataimport about one fourth of the time.

DIH imports happen at the core level, NOT the collection level, so when you start an import on a collection with four cores in the cloud, only one of those four cores is actually going to be doing the import; the rest of them are idle.

This behavior should happen with any version, so I would expect it in 6.1 as well as 7.1.
To use DIH with SolrCloud, you should be sending your request directly to a shard replica core, not the collection, so that you can be absolutely certain that the import command and the status command are going to the same place. You MIGHT need to also have a distrib=false parameter on the request, but I do not know whether that is required to prevent the load balancing on the dataimport handler.

A similar question came to this list two days ago, and I replied to that one yesterday.
http://lucene.472066.n3.nabble.com/Dataimporter-status-tp4365602p4365879.html

Somebody did open an issue a LONG time ago about this problem:
https://issues.apache.org/jira/browse/SOLR-3666

I just commented on the issue.

Thanks,
Shawn
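Shawn's advice above can be sketched as a small helper that targets each replica core directly instead of the collection, so the status request hits the same core the import was started on. The host and core names below are hypothetical, and distrib=false is included per the "MIGHT need" caveat above.

```python
# Hypothetical sketch: build per-core DIH status URLs. Sending the
# status command to a specific core avoids SolrCloud load-balancing
# it to a replica that is not running the import (which reports "idle").
def core_status_urls(base_url, core_names, handler="dataimport"):
    return [
        f"{base_url}/solr/{core}/{handler}?command=status&distrib=false"
        for core in core_names
    ]

urls = core_status_urls(
    "http://myserver:8983",
    ["collection_shard1_replica1", "collection_shard2_replica1"],
)
```

With 2 shards, a collection-level status request has roughly a 1-in-2 chance of landing on the importing core, which matches the "idle about half the time" observation above; per-core requests make the answer deterministic.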
Dataimport handler showing idle status with multiple shards
Hi,

I’ve recently been using the dataimport handler to import records from a database into a Solr cloud collection with multiple shards. I have 6 dataimport handlers configured on 6 different paths, all running simultaneously against the same DB. I’ve noticed that when I do this I often get “idle” status from the DIH even when the import is still running. The percentage of the time I get an “idle” response seems proportional to the number of shards. I.e., with 1 shard it always shows me non-idle status, with 2 shards I see idle about half the time I check the status, and with 96 shards it seems to be showing idle almost all the time. I can see the size of each shard increasing, so I’m sure the import is still going. I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. Does anyone know why the DIH would report idle when it’s running? e.g.:

curl http://myserver:8983/solr/collection/dataimport6
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config6.xml"]],
  "status":"idle",
  "importResponse":"",
  "statusMessages":{}}

Thanks,
Sarah
Re: SolrIndexSearcher count
No custom code at all.

On Dec 5, 2017 10:31 PM, "Erick Erickson" wrote:
> Do you have any custom code in the mix anywhere?
>
> On Tue, Dec 5, 2017 at 5:02 AM, Rick Dig wrote:
> > Hello all,
> > is it normal to have many instances (100+) of SolrIndexSearchers open at the same time? Our heap analysis shows this to be the case.
> >
> > We have autoCommit for every 5 minutes, with openSearcher=true. Would this close the old searcher and create a new one, or just create a new one with the old one still not getting dereferenced? If so, when do the older searchers get cleaned up?
> >
> > thanks for your help
> > -rakshit
Re: Skewed IDF in multi lingual index, again
It is challenging, as the performance of different use cases and domains will be very dependent on the use case (there's no one globally perfect relevance solution). But a good set of metrics to see *generally* how stock Solr performs across a reasonable set of verticals would be nice. My philosophy about Lucene-based search is that it's not a solution, but rather a framework that should have sane defaults but large amounts of configurability.

For example, I'm not sure there's a globally "right" answer for maxDoc vs docCount. Problems with docCount come into play when a corpus usually has an empty field, but it's occasionally filled out. This creates a strong bias against matches in that usually-empty field, when previously a match in that field was weighted very highly. For example, if a product catalog has a user-editable tag field that is rarely used, and a product description, such as:

Product Name: Nice Pants!
Product Description: Come wear these pants!
Tags: [blue] [acid-wash]

Product Name: Acid Wash Pants
Product Description: Come wear these pants!
Tags: (empty)

In this case, the IDF for the acid-wash match in tags is very low using docCount, whereas with maxDocs it was very high. Not sure what the right answer is, but there is often a desire to want more complete docs to be boosted much higher, which the "maxDocs" method does.

Another case where docCount can be a problem is copy fields: with copy fields, you care that the original field had terms, even if for some reason they were removed in the analysis chain. This can happen with some methods we use for simple entity extraction.

Further, the definitions of BM25, etc. rely on corpus-level document frequency for a term and don't have a concept of fields. BM25F can mostly be implemented with BlendedTermQuery, which blends doc frequencies across fields: http://opensourceconnections.com/blog/2016/10/19/bm25f-in-lucene/

On Tue, Dec 5, 2017 at 10:28 AM alessandro.benedetti wrote:
> Thanks Yonik and thanks Doug.
> I agree with Doug in adding a few generic test corpora that Jenkins automatically runs some metrics on, to verify that Apache Lucene/Solr changes don't affect a golden truth too much. This can of course be very complex, but I think it is a direction the Apache Lucene/Solr community should work on.
>
> Given that, I do believe that in this case, moving from maxDocs (field independent) to docCount (field dependent) was a good move (and this specific multi-language use case is an example).
>
> Actually, I also believe that theoretically docCount (field dependent) is still better than maxDocs. This is because docCount (field dependent) represents a state in time associated with the current index, while maxDocs represents a historical consideration. A corpus of documents can change over time, and how rare a term is can change drastically (take a highly dynamic domain such as news).
>
> Doug, were you able to generalise and abstract any considerations from what happened to your customers and why they got regressions moving from maxDocs to docCount (field dependent)?
>
> -----
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

--
Consultant, OpenSource Connections. Contact info at http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
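Doug's rarely-used-tags example can be made concrete with a quick calculation using Lucene's BM25 idf formula, log(1 + (N - n + 0.5) / (n + 0.5)); the corpus sizes below are invented purely for illustration of the maxDoc-vs-docCount skew.

```python
import math

def bm25_idf(n_docs, doc_freq):
    # Lucene's BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5)),
    # where N is the document count and n the term's doc frequency.
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

# Invented numbers: 100,000 docs in the index, but only 500 of them
# have the rarely-used "tags" field; "acid-wash" appears in 10 of those.
max_docs = 100_000   # field-independent denominator (old maxDoc behavior)
doc_count = 500      # docs that actually have "tags" (docCount behavior)
term_df = 10

idf_old = bm25_idf(max_docs, term_df)   # term looks extremely rare
idf_new = bm25_idf(doc_count, term_df)  # rare only within the sparse field
```

With maxDoc the tags match gets a much larger idf than with docCount, which is exactly the "strong bias against matches in that usually-empty field" described above.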
Re: SolrIndexSearcher count
Do you have any custom code in the mix anywhere?

On Tue, Dec 5, 2017 at 5:02 AM, Rick Dig wrote:
> Hello all,
> is it normal to have many instances (100+) of SolrIndexSearchers open at the same time? Our heap analysis shows this to be the case.
>
> We have autoCommit for every 5 minutes, with openSearcher=true. Would this close the old searcher and create a new one, or just create a new one with the old one still not getting dereferenced? If so, when do the older searchers get cleaned up?
>
> thanks for your help
> -rakshit
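For reference, the setup described in the question corresponds to a solrconfig.xml fragment like this sketch. Normally the previous searcher is closed once in-flight requests release their references to it, which is why Erick asks about custom code: something holding a searcher reference is the usual suspect when 100+ instances stay on the heap.

```xml
<!-- Sketch matching the described setup: a hard commit every
     5 minutes that also opens a new searcher. -->
<autoCommit>
  <maxTime>300000</maxTime>   <!-- 5 minutes, in milliseconds -->
  <openSearcher>true</openSearcher>
</autoCommit>
```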
Re: Logging in Solrcloud
HTTP request log, not solr.log. This is intra-cluster:

10.98.15.241 - - [29/Oct/2017:23:59:57 +0000] "POST //sc16.prod2.cloud.cheggnet.com:8983/solr/questions_shard4_replica8/auto HTTP/1.1" 200 194

This is from outside (yes, we have long queries):

10.98.15.110 - - [29/Oct/2017:23:59:58 +0000] "GET //solr-cloud.prod2.cheggnet.com:8983/solr/questions/srp?qt=%2Fsrp=jack+and+jill+are+maneuvering+a+2800+kg+boat+near+a+dock.+initially+the+boat%27s+position+is+m+and+its+speed+is+1.9+m%2Fs.+as+the+boat+moves+to+position+m%2C+jack+exerts+a+force+n+and+jill+exerts=source%3Atbs=0=2=true=jack+and+jill+are+maneuvering+a+2800+kg+boat+near+a+dock.+initially+the+boat%27s+position+is+m+and+its+speed+is+1.9+m%2Fs.+as+the+boa

In your case, “gettingstarted_shard1_replica_n2” should mean that it is an intra-cluster request. Also, “distrib=false” means it is for a single core.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Dec 5, 2017, at 7:38 AM, Matzdorf, Stefan, Springer SBM DE wrote:
>
> first of all, I'm using Solr 7.1.0 ...
> I took a look into the logfile of Solr and see the following 2 log statements for query "test":
>
> 4350609 INFO (qtp1918627686-691) [c:gettingstarted s:shard1 r:core_node5 x:gettingstarted_shard1_replica_n2] o.a.s.c.S.Request [gettingstarted_shard1_replica_n2] webapp=/solr path=/select params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/|http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/=10=2=test=1512474643732=true=javabin} hits=0 status=0 QTime=0
>
> 4350615 INFO (qtp1918627686-20) [c:gettingstarted s:shard2 r:core_node8 x:gettingstarted_shard2_replica_n6] o.a.s.c.S.Request [gettingstarted_shard2_replica_n6] webapp=/solr path=/select params={q=test=on=json} hits=0 status=0 QTime=7
>
> Both were logged by the org.apache.solr.core.Request logger (I configured that to log on info level), but there is no information about what kind of request (GET/POST etc.) comes in. It just logs what you can see above. Do you use a different logger for that? (By logger in this case I mean the ones you can configure under the Logging/Level menu in the Solr UI, where you choose what you want to log.)
>
> Regards
> Matze
>
> --
> Stefan Matzdorf
> Software Engineer
> B2X Platform Development
>
> Springer Nature
> Heidelberger Platz 3, 14197 Berlin, Germany
> T +4903827975072
> stefan.matzd...@springer.com
> www.springernature.com
> ---
> Springer Nature is one of the world’s leading global research, educational and professional publishers, created in May 2015 through the combination of Nature Publishing Group, Palgrave Macmillan, Macmillan Education and Springer Science+Business Media.
> ---
> Springer Science+Business Media Deutschland GmbH
> Registered Office: Berlin / Amtsgericht Berlin-Charlottenburg, HRB 152987 B
> Directors: Derk Haank, Martin Mos, Dr. Ulrich Vest
>
> From: Walter Underwood
> Sent: Tuesday, 5 December 2017 16:20
> To: solr-user@lucene.apache.org
> Subject: Re: Logging in Solrcloud
>
> In 6.5.1, the intra-cluster requests are POST, which makes them easy to distinguish in the request logs. Also, the intra-cluster requests go to a specific core instead of to the collection. So we use the request logs and grep out the GET lines.
>
> We are considering fronting every Solr process with a local nginx server. That will allow us to limit concurrent connections. It will also give us a log of just the client requests.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Dec 5, 2017, at 4:25 AM, Matzdorf, Stefan, Springer SBM DE wrote:
>>
>> To be more precise and provide some more details, I tried to simplify the problem by using the Solr examples that were delivered with the Solr distribution. So I started bin/solr -e cloud, using 2 nodes, 2 shards and a replication of 2.
>>
>> To understand the following, it might be important to know which ports are used:
>> node 1: 8983 (leader for shard1 and shard2)
>> node 2: 7574 (no leader at all)
>>
>> In this example I searched for 3 terms in the following order: first on node 1 (8983 - leader) and then on node 2 (7574).
>>
>> Sample1 (q=test):
>> http://localhost:8983/solr/gettingstarted/select?indent=on=test=json
>>
>> produced logs:
>> 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/=10=2=test=1512474523045=true=javabin} hits=0 status=0 QTime=1
>> 2) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select
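Walter's "grep out the GET lines" approach can be sketched like this. The log path and entries are made up for illustration; real entries come from Jetty's request log, if enabled in your Jetty configuration.

```shell
# Made-up sample request log, standing in for Jetty's request log.
cat > /tmp/request.log <<'EOF'
10.0.0.1 - - [29/Oct/2017:23:59:57 +0000] "POST /solr/questions_shard4_replica8/select HTTP/1.1" 200 194
10.0.0.2 - - [29/Oct/2017:23:59:58 +0000] "GET /solr/questions/select?q=test HTTP/1.1" 200 512
EOF

# Keep only client-facing requests: in the setup described above,
# intra-cluster traffic is POST, so grepping for GET leaves the
# external queries.
grep ' "GET ' /tmp/request.log
```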
Re: Logging in Solrcloud
first of all, I'm using Solr 7.1.0 ...

I took a look into the logfile of Solr and see the following 2 log statements for query "test":

4350609 INFO (qtp1918627686-691) [c:gettingstarted s:shard1 r:core_node5 x:gettingstarted_shard1_replica_n2] o.a.s.c.S.Request [gettingstarted_shard1_replica_n2] webapp=/solr path=/select params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/|http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/=10=2=test=1512474643732=true=javabin} hits=0 status=0 QTime=0

4350615 INFO (qtp1918627686-20) [c:gettingstarted s:shard2 r:core_node8 x:gettingstarted_shard2_replica_n6] o.a.s.c.S.Request [gettingstarted_shard2_replica_n6] webapp=/solr path=/select params={q=test=on=json} hits=0 status=0 QTime=7

Both were logged by the org.apache.solr.core.Request logger (I configured that to log on info level), but there is no information about what kind of request (GET/POST etc.) comes in. It just logs what you can see above. Do you use a different logger for that? (By logger in this case I mean the ones you can configure under the Logging/Level menu in the Solr UI, where you choose what you want to log.)

Regards
Matze

--
Stefan Matzdorf
Software Engineer
B2X Platform Development

Springer Nature
Heidelberger Platz 3, 14197 Berlin, Germany
T +4903827975072
stefan.matzd...@springer.com
www.springernature.com

From: Walter Underwood
Sent: Tuesday, 5 December 2017 16:20
To: solr-user@lucene.apache.org
Subject: Re: Logging in Solrcloud

In 6.5.1, the intra-cluster requests are POST, which makes them easy to distinguish in the request logs. Also, the intra-cluster requests go to a specific core instead of to the collection. So we use the request logs and grep out the GET lines.

We are considering fronting every Solr process with a local nginx server. That will allow us to limit concurrent connections. It will also give us a log of just the client requests.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Dec 5, 2017, at 4:25 AM, Matzdorf, Stefan, Springer SBM DE wrote:
>
> To be more precise and provide some more details, I tried to simplify the problem by using the Solr examples that were delivered with the Solr distribution. So I started bin/solr -e cloud, using 2 nodes, 2 shards and a replication of 2.
>
> To understand the following, it might be important to know which ports are used:
> node 1: 8983 (leader for shard1 and shard2)
> node 2: 7574 (no leader at all)
>
> In this example I searched for 3 terms in the following order: first on node 1 (8983 - leader) and then on node 2 (7574).
> > Sample1 (q=test): >http://localhost:8983/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/=10=2=test=1512474523045=true=javabin} > hits=0 status=0 QTime=1 > 2) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/=10=2=test=1512474523045=true=javabin} > hits=0 status=0 QTime=1 > > > >http://localhost:7574/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={q=test=on=json} hits=0 status=0 QTime=17 > > ## > ## > > Sample2 (q=foo): >http://localhost:8983/solr/gettingstarted/select?indent=on=foo=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/=10=2=foo=1512474569299=true=javabin} > hits=0 status=0 QTime=0 > > > >http://localhost:7574/solr/gettingstarted/select?indent=on=foo=json > >produced logs: > 1)
Re: Skewed IDF in multi lingual index, again
Thanks Yonik and thanks Doug.

I agree with Doug in adding a few generic test corpora that Jenkins automatically runs some metrics on, to verify that Apache Lucene/Solr changes don't affect a golden truth too much. This can of course be very complex, but I think it is a direction the Apache Lucene/Solr community should work on.

Given that, I do believe that in this case, moving from maxDocs (field independent) to docCount (field dependent) was a good move (and this specific multi-language use case is an example).

Actually, I also believe that theoretically docCount (field dependent) is still better than maxDocs. This is because docCount (field dependent) represents a state in time associated with the current index, while maxDocs represents a historical consideration. A corpus of documents can change over time, and how rare a term is can change drastically (take a highly dynamic domain such as news).

Doug, were you able to generalise and abstract any considerations from what happened to your customers and why they got regressions moving from maxDocs to docCount (field dependent)?

-----
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Java profiler?
Anybody have a favorite profiler to use with Solr? I’ve been asked to look at why our queries are slow at a detail level. Personally, I think they are slow because they are so long, up to 40 terms.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Logging in Solrcloud
In 6.5.1, the intra-cluster requests are POST, which makes them easy to distinguish in the request logs. Also, the intra-cluster requests go to a specific core instead of to the collection. So we use the request logs and grep out the GET lines.

We are considering fronting every Solr process with a local nginx server. That will allow us to limit concurrent connections. It will also give us a log of just the client requests.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Dec 5, 2017, at 4:25 AM, Matzdorf, Stefan, Springer SBM DE wrote:
>
> To be more precise and provide some more details, I tried to simplify the problem by using the Solr examples that were delivered with the Solr distribution. So I started bin/solr -e cloud, using 2 nodes, 2 shards and a replication of 2.
>
> To understand the following, it might be important to know which ports are used:
> node 1: 8983 (leader for shard1 and shard2)
> node 2: 7574 (no leader at all)
>
> In this example I searched for 3 terms in the following order: first on node 1 (8983 - leader) and then on node 2 (7574).
> > Sample1 (q=test): >http://localhost:8983/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/=10=2=test=1512474523045=true=javabin} > hits=0 status=0 QTime=1 > 2) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/=10=2=test=1512474523045=true=javabin} > hits=0 status=0 QTime=1 > > > >http://localhost:7574/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={q=test=on=json} hits=0 status=0 QTime=17 > > ## > ## > > Sample2 (q=foo): >http://localhost:8983/solr/gettingstarted/select?indent=on=foo=json > >produced logs: > 1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/=10=2=foo=1512474569299=true=javabin} > hits=0 status=0 QTime=0 > > > >http://localhost:7574/solr/gettingstarted/select?indent=on=foo=json > >produced logs: > 1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={q=foo=on=json} hits=0 status=0 QTime=13 > > ## > ## > > Sample3 (q=test) NOTE- its the same query as in sample1: >http://localhost:8983/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/=10=2=test=1512474643732=true=javabin} > hits=0 status=0 QTime=0 > > 
>http://localhost:7574/solr/gettingstarted/select?indent=on=test=json > >produced logs: > 1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/=10=2=test=1512474627254=true=javabin} > hits=0 status=0 QTime=0 > 2) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select > params={q=test=on=json} hits=0 status=0 QTime=13 > > ## > ## > > Sample4 (q=baa): >http://localhost:8983/solr/gettingstarted/select?indent=on=baa=json > >produced logs: > 1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select > params={df=_text_=false=id=score=4=0=true=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/=10=2=baa=1512474709460=true=javabin} > hits=0 status=0 QTime=0 > > >
Re: Metadata passed with CURL (via literal) is not recognized by SOLR ...?
Ok, I found the solution myself. The reason for this behaviour was the "lowernames=true" configuration of the Tika request handler, which transformed "module-id" into "module_id". I added a fitting copyField to my schema and it seems to work now. Maybe this information is useful for someone ... of course, it is mentioned in the manual, but finding it is the problem if you don't know what you are looking for. ;) Regards Jan Mit freundlichen Grüßen/ With kind regards Jan Schluchtmann Systems Engineering Cluster Instruments VW Group Continental Automotive GmbH Division Interior ID S3 RM VDO-Strasse 1, 64832 Babenhausen, Germany Telefon/Phone: +49 6073 12-4346 Telefax: +49 6073 12-79-4346 Von: jan.christopher.schluchtmann-...@continental-corporation.com An: solr-user@lucene.apache.org Datum: 05.12.2017 11:02 Betreff: Metadata passed with CURL (via literal) is not recognized by SOLR ...? Hi! I am trying to index RTF files by uploading them to the Solr server with CURL. I am trying to pass the required metadata via the "literal.="-statement. The "id" and the "module-id" are mandatory in my schema. The "id" is recognized correctly, as one can see in the Solr response "doc=48a0xxx" ... but the "module-id" seems to be neglected. Why is that? Thanks in advance!!!
Here is the CURL-command I pass via Windows 10 Powershell: SOLR-REQUEST: curl.exe " http://localhost:8983/solr/ContiReqManCore/update/extract/?commit=true=48a04d8e5da651c5-000ba8a6-1=000d8181=FPK_Medium_19S1=%2FFPK_Medium_19S1=000ba8a6=PVVTS_Functional_FPK_Medium_19S1=%2FFPK_Medium_19S1%2F02_Quality%2F10_Verification-Validation%2FPVVTS_Functional_FPK_Medium_19S1=PVVTS_Funct_=1 " -F "object-ole=@D:\(...)\PVVTS_Funct_263.rtf" SOLR-RESPONSE: { "responseHeader":{ "status":400, "QTime":7}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"[doc=48a04d8e5da651c5-000ba8a6-1] missing required field: module-id", "code":400} } Mit freundlichen Grüßen/ With kind regards Jan Schluchtmann Systems Engineering Cluster Instruments VW Group Continental Automotive GmbH Division Interior ID S3 RM VDO-Strasse 1, 64832 Babenhausen, Germany Telefon/Phone: +49 6073 12-4346 Telefax: +49 6073 12-79-4346
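For readers hitting the same error: the copyField fix described in the follow-up can be sketched in schema.xml roughly like this. The field definitions below are assumptions (adjust types and attributes to the real schema); the point is that lowernames=true in the /update/extract handler rewrites "module-id" to "module_id", so a copyField bridges the two names. Alternatively, setting lowernames to false in the handler configuration avoids the rename altogether.

```xml
<!-- required field expected by the application -->
<field name="module-id" type="string" indexed="true" stored="true" required="true"/>

<!-- name actually produced by the extract handler when lowernames=true -->
<field name="module_id" type="string" indexed="true" stored="true"/>

<!-- copy the lowercased/underscored name back into the required field -->
<copyField source="module_id" dest="module-id"/>
```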
Implicit routing changes to compositeId while re-deploying configuration changes
Hi, I have implemented implicit routing with the below configuration. I created one default configset manually, 'AMS_Config', which contains the configuration files (schema, solrconfig, etc.). Using 'AMS_Config' I created 2 collections, model and workset, with the below command, which created 2 shards for each collection with 2 replicas per shard, one on each Solr instance.

Command: /admin/collections?action=CREATE&name=model&replicationFactor=2&router.name=implicit&router.field=dr&shards=shard1,shard2&maxShardsPerNode=2&collection.configName=AMS_Config

Collection detail: Model = Shard1, Shard2; Shard1 = node1, node2[leader]; Shard2 = node1[leader], node2

Configuration in the Solr admin UI for the model collection: shard count: 2, configName: AMS_Config, replicationFactor: 2, maxShardsPerNode: 2, router: implicit, autoAddReplicas: false

After this I indexed documents to a particular shard by setting the router.field value, e.g. dr = shard1.

Issue: After indexing the documents I made changes in the schema files and redeployed them to set the latest configs with the below command: zkcli.bat -cmd upconfig -confdir ../../solr/AMS_Config/conf -confname AMS_Config -z

This changes the router value from implicit to compositeId, and now my documents are indexed across all shards. Why does this happen, and how can I avoid it? Please do the needful. Regards, Ketan.
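For what it's worth, the usual flow after changing a configset is to re-upload it and then RELOAD the existing collections rather than recreate them, so the router settings stored in the collection state stay untouched. A sketch of that flow (the <zkHost> placeholder and the localhost URL are assumptions; collection and configset names are the ones from the message above):

```shell
# Re-upload the changed configset to ZooKeeper
zkcli.bat -cmd upconfig -confdir ../../solr/AMS_Config/conf -confname AMS_Config -z <zkHost>

# Reload the collections so they pick up the new config without being recreated
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=model"
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=workset"
```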
Re: Skewed IDF in multi lingual index, again
Just a piece of feedback from clients on the original docCount change. I have seen several cases with clients where the switch to docCount surprised and harmed relevance. More broadly, I’m concerned when we make these changes there’s not a testing process against test corpuses with judgments and relevance metrics to understand their impact. I see it mentioned in a JIRA from time to time that someone saw an improvement on a private collection in NDCG. And we have to take their word for it. Public testing of relevance against every build using stock settings could be extremely valuable and would more easily justify these changes. Something similar to the performance tests that are made. Sadly I can only complain now :) I wish I had time to work on something like this. Doug On Tue, Dec 5, 2017 at 7:38 AM Yonik Seeley wrote: > On Tue, Dec 5, 2017 at 5:15 AM, alessandro.benedetti > wrote: > > "Lucene/Solr doesn't actually delete documents when you delete them, it > > just marks them as deleted. I'm pretty sure that the difference between > > docCount and maxDoc is deleted documents. Maybe I don't understand what > > I'm talking about, but that is the best I can come up with. " > > > > Thanks Shawn, yes, that is correct and I was aware of it. > > I was curious of another difference : > > I think we confirmed that docCount is local to the field ( thanks Yonik > for > > that) so : > > > > docCount(index,field1)= # of documents in the index that currently have > > value(s) for field1 > > > > My question is : > > > > maxDocs(index,field1)= max # of documents in the index that had value(s) > for > > field1 > > > > OR > > > > maxDocs(index)= max # of documents that appeared in the index ( field > > independent) > > The latter. > I imagine that's why docCount was introduced (to avoid changing the > meaning of an existing term). 
> FWIW, the scoring change was made in > https://issues.apache.org/jira/browse/LUCENE-6711 for Lucene/Solr 6.0 > > -Yonik > -- Consultant, OpenSource Connections. Contact info at http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
SolrIndexSearcher count
Hello all, is it normal to have many instances (100+) of SolrIndexSearcher open at the same time? Our heap analysis shows this to be the case. We have autoCommit every 5 minutes with openSearcher=true; would this close the old searcher and create a new one, or just create a new one with the old one still not getting dereferenced? If so, when do the older searchers get cleaned up? Thanks for your help -rakshit
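For context on the question above: old searchers are reference-counted and are only closed once their count drops to zero, so a commit with openSearcher=true swaps in the new searcher but cannot close the old one while something (an in-flight request, or custom code that forgot to release its reference) still holds it. A toy sketch of that mechanism (not Solr's actual code):

```python
class Searcher:
    """Toy stand-in for a reference-counted searcher."""
    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the core itself holds one reference
        self.closed = False

    def incref(self):
        self.refcount += 1
        return self

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.closed = True  # only now are resources released

old = Searcher("searcher-1")
leaked = old.incref()       # e.g. a component that never releases its reference

# A commit with openSearcher=true registers a new searcher and the core
# drops its reference to the old one ...
new = Searcher("searcher-2")
old.decref()

# ... but the old searcher stays open until every borrower releases it.
assert not old.closed
leaked.decref()
assert old.closed
```

If heap dumps show 100+ live searchers, the usual suspect is something holding references without releasing them, not the commit cadence itself.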
Re: Skewed IDF in multi lingual index, again
On Tue, Dec 5, 2017 at 5:15 AM, alessandro.benedetti wrote: > "Lucene/Solr doesn't actually delete documents when you delete them, it > just marks them as deleted. I'm pretty sure that the difference between > docCount and maxDoc is deleted documents. Maybe I don't understand what > I'm talking about, but that is the best I can come up with. " > > Thanks Shawn, yes, that is correct and I was aware of it. > I was curious of another difference : > I think we confirmed that docCount is local to the field ( thanks Yonik for > that) so : > > docCount(index,field1)= # of documents in the index that currently have > value(s) for field1 > > My question is : > > maxDocs(index,field1)= max # of documents in the index that had value(s) for > field1 > > OR > > maxDocs(index)= max # of documents that appeared in the index ( field > independent) The latter. I imagine that's why docCount was introduced (to avoid changing the meaning of an existing term). FWIW, the scoring change was made in https://issues.apache.org/jira/browse/LUCENE-6711 for Lucene/Solr 6.0 -Yonik
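To illustrate why the LUCENE-6711 switch matters in a multi-lingual index: BM25's idf term is log(1 + (N - df + 0.5)/(df + 0.5)), and whether N is maxDocs (all documents in the index) or docCount (documents that have a value for this field) changes scores substantially when each language lives in its own sparsely populated field. A rough sketch, with made-up numbers for illustration:

```python
import math

def bm25_idf(n, df):
    # BM25 idf as used by Lucene's BM25Similarity
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

max_docs = 1_000_000    # all docs in the index, every language
doc_count = 50_000      # docs that actually have e.g. a title_de field
df = 1_000              # docs matching the term in title_de

idf_global = bm25_idf(max_docs, df)    # pre-6.0 behaviour (index-wide count)
idf_field = bm25_idf(doc_count, df)    # 6.0+ behaviour (per-field docCount)

# The per-field statistic yields a smaller idf for the same term, because
# the "universe" shrinks to documents that actually have the field.
assert idf_field < idf_global
print(f"idf(maxDocs)={idf_global:.3f}  idf(docCount)={idf_field:.3f}")
```

This is exactly the kind of shift Doug describes clients being surprised by: the formula did not change, only which document count feeds it.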
Re: Logging in Solrcloud
To be more precise and provide some more details, I tried to simplify the problem by using the Solr examples that are delivered with Solr. So I started bin/solr -e cloud, using 2 nodes, 2 shards and a replication factor of 2. To understand the following, it might be important to know which ports are used: node 1: 8983 (leader for shard1 and shard2); node 2: 7574 (no leader at all). In this example I searched for 3 terms in the following order: first on node 1 (8983 - leader) and then on node 2 (7574).

Sample1 (q=test):
http://localhost:8983/solr/gettingstarted/select?indent=on&q=test&wt=json
produced logs:
1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/&rows=10&version=2&q=test&NOW=1512474523045&isShard=true&wt=javabin} hits=0 status=0 QTime=1
2) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/&rows=10&version=2&q=test&NOW=1512474523045&isShard=true&wt=javabin} hits=0 status=0 QTime=1

http://localhost:7574/solr/gettingstarted/select?indent=on&q=test&wt=json
produced logs:
1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={q=test&indent=on&wt=json} hits=0 status=0 QTime=17

##
##

Sample2 (q=foo):
http://localhost:8983/solr/gettingstarted/select?indent=on&q=foo&wt=json
produced logs:
1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard1_replica_n1/|http://127.0.1.1:8983/solr/gettingstarted_shard1_replica_n2/&rows=10&version=2&q=foo&NOW=1512474569299&isShard=true&wt=javabin} hits=0 status=0 QTime=0

http://localhost:7574/solr/gettingstarted/select?indent=on&q=foo&wt=json
produced logs:
1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select params={q=foo&indent=on&wt=json} hits=0 status=0 QTime=13

##
##

Sample3 (q=test)
NOTE - it's the same query as in sample1:
http://localhost:8983/solr/gettingstarted/select?indent=on&q=test&wt=json
produced logs:
1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/&rows=10&version=2&q=test&NOW=1512474643732&isShard=true&wt=javabin} hits=0 status=0 QTime=0

http://localhost:7574/solr/gettingstarted/select?indent=on&q=test&wt=json
produced logs:
1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/&rows=10&version=2&q=test&NOW=1512474627254&isShard=true&wt=javabin} hits=0 status=0 QTime=0
2) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={q=test&indent=on&wt=json} hits=0 status=0 QTime=13

##
##

Sample4 (q=baa):
http://localhost:8983/solr/gettingstarted/select?indent=on&q=baa&wt=json
produced logs:
1) [gettingstarted_shard2_replica_n4] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://127.0.1.1:7574/solr/gettingstarted_shard2_replica_n4/|http://127.0.1.1:8983/solr/gettingstarted_shard2_replica_n6/&rows=10&version=2&q=baa&NOW=1512474709460&isShard=true&wt=javabin} hits=0 status=0 QTime=0

http://localhost:7574/solr/gettingstarted/select?indent=on&q=baa&wt=json
produced logs:
1) [gettingstarted_shard1_replica_n1] webapp=/solr path=/select params={q=baa&indent=on&wt=json} hits=0 status=0 QTime=12

Sorry for these messy logs; I'll try to summarize. For queries against node 1, the leading node, I never got those "short logs" containing just what I was querying. Instead I receive logs containing all this sharding information - sometimes two equivalent ones (see sample 1) and sometimes just one log (samples 2-4). Note that I got different logs for the same query/request (sample1 vs sample3). For queries against node 2, not leading anything, I got those "short logs" every time.
In addition to that, I also receive sometimes
Re: Skewed IDF in multi lingual index, again
"Lucene/Solr doesn't actually delete documents when you delete them, it just marks them as deleted. I'm pretty sure that the difference between docCount and maxDoc is deleted documents. Maybe I don't understand what I'm talking about, but that is the best I can come up with. " Thanks Shawn, yes, that is correct and I was aware of it. I was curious of another difference : I think we confirmed that docCount is local to the field ( thanks Yonik for that) so : docCount(index,field1)= # of documents in the index that currently have value(s) for field1 My question is : maxDocs(index,field1)= max # of documents in the index that had value(s) for field1 OR maxDocs(index)= max # of documents that appeared in the index ( field independent) Regards - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Issue with CDCR bootstrapping in Solr 7.1
Tom, Thank you for trying out a bunch of things with the CDCR setup. I am successfully able to replicate the exact issue on my setup; this is a problem. I have opened a JIRA for the same: https://issues.apache.org/jira/browse/SOLR-11724. Feel free to add any relevant details as you like. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 Medium: https://medium.com/@sarkaramrit2 On Tue, Dec 5, 2017 at 2:23 AM, Tom Peters wrote: > Not sure how it's possible. But I also tried using the _default config and > just adding in the source and target configuration to make sure I didn't > have something wonky in my custom solrconfig that was causing this issue. I > can confirm that until I restart the follower nodes, they will not receive > the initial index. > > > On Dec 1, 2017, at 12:52 AM, Amrit Sarkar > wrote: > > > > Tom, > > > > (and take care not to restart the leader node otherwise it will replicate > >> from one of the replicas which is missing the index). > > > > How is this possible? Ok I will look more into it. Appreciate if someone > > else also chimes in if they have similar issue. > > > > Amrit Sarkar > > Search Engineer > > Lucidworks, Inc. > > 415-589-9269 > > www.lucidworks.com > > Twitter http://twitter.com/lucidworks > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > Medium: https://medium.com/@sarkaramrit2 > > > > On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters wrote: > > > >> Hi Amrit, I tried issuing hard commits to the various nodes in the > target > >> cluster and it does not appear to cause the follower replicas to receive > >> the initial index. The only way I can get the replicas to see the > original > >> index is by restarting those nodes (and take care not to restart the > leader > >> node otherwise it will replicate from one of the replicas which is > missing > >> the index). 
> >> > >> > >>> On Nov 30, 2017, at 12:16 PM, Amrit Sarkar > >> wrote: > >>> > >>> Tom, > >>> > >>> This is very useful: > >>> > I found a way to get the follower replicas to receive the documents > from > the leader in the target data center, I have to restart the solr > >> instance > running on that server. Not sure if this information helps at all. > >>> > >>> > >>> You have to issue a hard commit on the target after the bootstrapping is done. > >>> Reloading makes the core open a new searcher. While an explicit commit is > >>> issued at the target leader after the BS is done, followers are left > >> unattended > >>> though the docs are copied over. > >>> > >>> Amrit Sarkar > >>> Search Engineer > >>> Lucidworks, Inc. > >>> 415-589-9269 > >>> www.lucidworks.com > >>> Twitter http://twitter.com/lucidworks > >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > >>> Medium: https://medium.com/@sarkaramrit2 > >>> > >>> On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters > >> wrote: > >>> > Hi Amrit, > > Starting with more documents doesn't appear to have made a difference. > This time I tried with >1000 docs. Here are the steps I took: > > 1. Deleted the collection on both the source and target DCs. > > 2. Recreated the collections. > > 3. Indexed >1000 documents on source data center, hard commit > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > >> done > solr01-a: 1368 > solr01-b: 1368 > solr01-c: 1368 > solr02-a: 0 > solr02-b: 0 > solr02-c: 0 > > 4. 
Enabled CDCR and checked docs > > $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START' > > $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s > $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; > >> done > solr01-a: 1368 > solr01-b: 1368 > solr01-c: 1368 > solr02-a: 0 > solr02-b: 0 > solr02-c: 1368 > > Some additional notes: > > * I do not have numRecordsToKeep defined in my solrconfig.xml, so I > >> assume > it will use the default of 100 > > * I found a way to get the follower replicas to receive the documents > >> from > the leader in the target data center, I have to restart the solr > >> instance > running on that server. Not sure if this information helps at all. > > > On Nov 30, 2017, at 11:22 AM, Amrit Sarkar > wrote: > > > > Hi Tom, > > > > I see what you are saying and I too think this is a bug, but I will > confirm > > once on the code. Bootstrapping should happen on all the nodes of the > > target. > > > > Meanwhile can you index more than 100 documents in the source and do > >> the > > exact same experiment again. Followers will not
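For reference, the CDCR control calls exercised in the steps above can also be inspected with the STATUS and QUEUES actions; hosts and collection names below follow the thread and are assumptions (a diagnostic sketch, not a fix for SOLR-11724):

```shell
# Start CDCR on the source cluster
curl "http://solr01-a:8080/solr/mycollection/cdcr?action=START"

# Inspect replication state and queued updates on the source
curl "http://solr01-a:8080/solr/mycollection/cdcr?action=STATUS"
curl "http://solr01-a:8080/solr/mycollection/cdcr?action=QUEUES"
```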
Metadata passed with CURL (via literal) is not recognized by SOLR ...?
Hi! I am trying to index RTF files by uploading them to the Solr server with CURL. I am trying to pass the required metadata via the "literal.="-statement. The "id" and the "module-id" are mandatory in my schema. The "id" is recognized correctly, as one can see in the Solr response "doc=48a0xxx" ... but the "module-id" seems to be neglected. Why is that? Thanks in advance!!! Here is the CURL command I pass via Windows 10 PowerShell: SOLR-REQUEST: curl.exe " http://localhost:8983/solr/ContiReqManCore/update/extract/?commit=true=48a04d8e5da651c5-000ba8a6-1=000d8181=FPK_Medium_19S1=%2FFPK_Medium_19S1=000ba8a6=PVVTS_Functional_FPK_Medium_19S1=%2FFPK_Medium_19S1%2F02_Quality%2F10_Verification-Validation%2FPVVTS_Functional_FPK_Medium_19S1=PVVTS_Funct_=1 " -F "object-ole=@D:\(...)\PVVTS_Funct_263.rtf" SOLR-RESPONSE: { "responseHeader":{ "status":400, "QTime":7}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"[doc=48a04d8e5da651c5-000ba8a6-1] missing required field: module-id", "code":400} } Mit freundlichen Grüßen/ With kind regards Jan Schluchtmann Systems Engineering Cluster Instruments VW Group Continental Automotive GmbH Division Interior ID S3 RM VDO-Strasse 1, 64832 Babenhausen, Germany Telefon/Phone: +49 6073 12-4346 Telefax: +49 6073 12-79-4346
Re: Logging in Solrcloud
Hi Stefan, I am not aware of an option to log only client-side queries, but I think that you can find a workaround with what you currently have. If you take a look at the log lines for a query that comes from the client and one that is the result of querying shards, you will see differences - the simplest one, if you are not using SolrJ for querying, would be the wt parameter: e.g. a client request might have wt=json while shard requests would have wt=javabin. There are also parameters that are added by Solr for internal calls, so just compare log lines and you will find some discriminator in your version of Solr. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5 Dec 2017, at 07:58, Matzdorf, Stefan, Springer SBM DE > wrote: > > Hey everybody, > > I have a question regarding query-request logging in SolrCloud. I've set the > "org.apache.solr.core.SolrCore.Request" logger to INFO level and it's > logging all those query requests. So far so good. BUT, as I'm running Solr in > cloud mode with 3 nodes and 3 shards per collection (with a replica of 3, > distributed across all 3 nodes), I get a logging statement from each node as > well as from each shard. That I get it from each node seems quite obvious to > me. Different server, different Solr instances... ok. But how could I avoid > also getting the logs from the shards themselves? > > My main problem is that I would like to measure, classify etc. my queries. > But for example if I would like to count the number of queries it gets a bit > weird. From one request sent to the cloud I got 5-7 logging statements (I > guess it depends on the results of found documents within a shard?!). > > > If I could get just one log statement per node per request (in my case 3) > that would be good. But even then, I have to do some math to get the exact values. > At first look it seems quite easy, dividing by 3, but that's sadly not the > case. 
So what happens if one node goes down? Then I would just get 2 > log statements. That's also the reason why I can't set the log level to INFO > on just one node. > > > > Long story short, is there a better way to log queries than setting > "org.apache.solr.core.SolrCore.Request" to INFO??? > > > Thanks in advance?
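Emir's discriminator idea can be sketched as a tiny log filter: count only lines that look like top-level client requests and skip the internal shard fan-out, which carries distributed-search parameters (isShard=true, wt=javabin, shards.purpose). This is a heuristic - a SolrJ client legitimately uses wt=javabin, as noted above - and the marker list is an assumption to adjust to your own logs:

```python
import re

SHARD_MARKERS = ("isShard=true", "wt=javabin", "shards.purpose=")

def is_client_request(log_line: str) -> bool:
    """Heuristic: internal shard sub-requests carry distributed-search
    parameters that client queries sent over HTTP normally lack."""
    m = re.search(r"params=\{(.*)\}", log_line)
    if not m:
        return False  # not a query log line at all
    params = m.group(1)
    return not any(marker in params for marker in SHARD_MARKERS)

lines = [
    "[gettingstarted_shard1_replica_n1] webapp=/solr path=/select "
    "params={q=test&indent=on&wt=json} hits=0 status=0 QTime=17",
    "[gettingstarted_shard1_replica_n1] webapp=/solr path=/select "
    "params={df=_text_&distrib=false&isShard=true&wt=javabin} hits=0 status=0 QTime=1",
]
client_queries = [line for line in lines if is_client_request(line)]
print(len(client_queries))  # counts only the top-level client request
```

Running something like this over the aggregated logs from all nodes sidesteps the "divide by 3" math entirely, and keeps working when a node goes down.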