Re: Re: SolrClient#updateByQuery?
Thanks for all these valuable inputs (from a main contributor)! The first thing I did was getting rid of "expungeDeletes". My "single-deletion" unit test then failed until I added the optimize param:

updateRequest.setParam( "optimize", "true" );

Does this make sense, or should I open a JIRA issue? How expensive is this "optimization"?

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Saturday, 27 January 2018 00:49
To: solr-user@lucene.apache.org
Subject: Re: AW: SolrClient#updateByQuery?

On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:
> Why do I want to do all this (dumb things)? The context is as follows: when a document is deleted in an index/core this deletion is not immediately reflected in the search results. Deletions are not really NRT (or has this changed?). Until now we "solved" this brutally by forcing a commit (with "expunge deletes"), till we noticed that this results in quite a "heavy load", to say the least.
> Now I have the idea to add a "deleted" flag to all the documents that is filtered on in all queries. When it comes to deletions, I would update the document's deleted flag and then effectively delete it. For a single deletion this is ok, but what if I need to re-index?

The deleteByQuery functionality is known to have some issues getting along with other things happening at the same time. For best performance and compatibility with concurrent operations, I would strongly recommend that you change all deleteByQuery calls into two steps: Do a standard query with fl=id (or whatever your uniqueKey field is), gather up the ID values (possibly with start/rows pagination or cursorMark), and then proceed to do one or more deleteById calls with those ID values. Both the query and the ID-based delete can coexist with other concurrent operations very well.

I would expect that doing atomic updates to a deleted field in your documents is going to be slower than the query/deleteById approach. I cannot be sure this is the case, but I think it would be. It should be a lot more friendly to NRT operation than deleteByQuery.

As Walter said, expungeDeletes will result in Solr doing a lot more work than it should, slowing things down even more. It also won't affect search results at all. Once the commit finishes and opens a new searcher, Solr will not include deleted documents in search results. The expungeDeletes parameter can make commits take a VERY long time.

I have no idea whether the issues surrounding deleteByQuery can be fixed or not.

Thanks,
Shawn
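Shawn's two-step replacement for deleteByQuery (gather IDs with a query, then delete in batches by ID) can be sketched roughly as follows. This is an illustrative Python sketch of the request payloads, not SolrJ; the batch size, helper names, and example IDs are assumptions, while the JSON shapes follow Solr's standard /select and /update APIs.

```python
import json

def chunked(ids, batch_size=500):
    """Split the gathered IDs into batches for separate deleteById calls."""
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

def select_ids_params(query, cursor_mark="*", rows=500, unique_key="id"):
    """Params for the /select request that gathers IDs. cursorMark
    pagination requires a sort that ends on the uniqueKey field."""
    return {"q": query, "fl": unique_key, "rows": rows,
            "sort": f"{unique_key} asc", "cursorMark": cursor_mark}

def delete_by_id_payload(ids):
    """JSON body for the /update handler: {"delete": [id1, id2, ...]}."""
    return json.dumps({"delete": list(ids)})

# Example: 1200 matching IDs become three deleteById batches.
ids = [f"doc{i}" for i in range(1200)]
batches = list(chunked(ids))
```

In SolrJ the same pattern would be a query loop followed by `SolrClient.deleteById(List<String>)` calls; the point is that both halves play well with concurrent indexing, unlike deleteByQuery.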
Re: 7.2.1 cluster dies within minutes after restart
On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> o.a.z.ClientCnxn Client session timed out, have not heard from server in 22130ms (although zkClientTimeOut is 3).

Are you absolutely certain that there is a setting for zkClientTimeout that is actually getting applied? The default value in Solr's example configs is 30 seconds, but the internal default in the code (when no configuration is found) is still 15. I have confirmed this in the code.

It looks like SolrCloud doesn't log the values it's using for things like zkClientTimeout. I think it should. https://issues.apache.org/jira/browse/SOLR-11915

Thanks,
Shawn
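For reference, the places where zkClientTimeout is typically set (values here are illustrative, mirroring the 30-second example default Shawn mentions, not a tuning recommendation):

```xml
<!-- solr.xml, inside the <solrcloud> section: falls back to the
     zkClientTimeout system property, else 30000 ms -->
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>
```

In a service install the same value is usually driven by `ZK_CLIENT_TIMEOUT` in /etc/default/solr.in.sh, which is worth checking when a configured timeout does not seem to be applied.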
Re: AW: SolrClient#updateByQuery?
On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:
> Why do I want to do all this (dumb things)? The context is as follows: when a document is deleted in an index/core this deletion is not immediately reflected in the search results. Deletions are not really NRT (or has this changed?). Until now we "solved" this brutally by forcing a commit (with "expunge deletes"), till we noticed that this results in quite a "heavy load", to say the least.
> Now I have the idea to add a "deleted" flag to all the documents that is filtered on in all queries. When it comes to deletions, I would update the document's deleted flag and then effectively delete it. For a single deletion this is ok, but what if I need to re-index?

The deleteByQuery functionality is known to have some issues getting along with other things happening at the same time. For best performance and compatibility with concurrent operations, I would strongly recommend that you change all deleteByQuery calls into two steps: Do a standard query with fl=id (or whatever your uniqueKey field is), gather up the ID values (possibly with start/rows pagination or cursorMark), and then proceed to do one or more deleteById calls with those ID values. Both the query and the ID-based delete can coexist with other concurrent operations very well.

I would expect that doing atomic updates to a deleted field in your documents is going to be slower than the query/deleteById approach. I cannot be sure this is the case, but I think it would be. It should be a lot more friendly to NRT operation than deleteByQuery.

As Walter said, expungeDeletes will result in Solr doing a lot more work than it should, slowing things down even more. It also won't affect search results at all. Once the commit finishes and opens a new searcher, Solr will not include deleted documents in search results. The expungeDeletes parameter can make commits take a VERY long time.

I have no idea whether the issues surrounding deleteByQuery can be fixed or not.

Thanks,
Shawn
Re: Bitnami, or other Solr on AWS recommendations?
On 1/26/2018 12:24 PM, TK Solr wrote:
> If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or is there a better way?

Solr has included an installer script for quite some time (since 5.0, I think) that works on many OSes that are not Windows. It has had the most testing on Linux, which I think is what an AWS instance will typically be running. Configuring the service is mostly done with the /etc/default/solr.in.sh file.

Getting help on this list is going to be a lot easier if you use the installer included with Solr. For help with the Bitnami image, your best bet would be Bitnami. They're going to know how they have customized the software.

Thanks,
Shawn
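For reference, the installer workflow Shawn mentions looks roughly like this on Linux (version number and paths are illustrative; see the "Taking Solr to Production" section of the Solr Reference Guide for the authoritative steps):

```shell
# Extract just the installer script from the downloaded archive
tar xzf solr-7.2.1.tgz solr-7.2.1/bin/install_solr_service.sh --strip-components=2
# Install Solr as a system service; service settings then live in
# /etc/default/solr.in.sh
sudo bash ./install_solr_service.sh solr-7.2.1.tgz
```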
Re: Bitnami, or other Solr on AWS recommendations?
Also shameless self-promotion, but my company (Fogbeam Labs) is about to launch a Solr / ManifoldCF powered Search-as-a-Service offering. If you'd like to learn more, shoot me an email at prho...@fogbeam.com and I'd be happy to give you the skinny.

Phil

This message optimized for indexing by NSA PRISM

On Fri, Jan 26, 2018 at 4:01 PM, Sameer Maggon wrote:
> Although this is shameless promotion, have you taken a look at SearchStax (https://www.searchstax.com)? Why not use a Solr-as-a-Service?
>
> On Fri, Jan 26, 2018 at 11:24 AM, TK Solr wrote:
>> If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or is there a better way?
>>
>> TK
>
> --
> Sameer Maggon
> Founder, SearchStax, Inc.
> https://www.searchstax.com
Re: Solr 7.2.1 - cursorMark and elevateIds
Ok, thanks for the clarification. I'll open a Jira issue.

On Fri, 26 Jan 2018 at 01:21, Yonik Seeley wrote:
> Yes, please open a JIRA issue.
> The elevate component modifies the sort parameter, and it looks like that doesn't play well with cursorMark, which needs to serialize/deserialize sort values.
> We can either fix the issue, or at a minimum provide a better error message if cursorMark is limited to sorting on "normal" fields only.
>
> -Yonik
>
> On Wed, Jan 24, 2018 at 3:19 PM, Greg Roodt wrote:
>> Given the technical nature of this problem, do you think I should try raising this on the developer group or raising a bug?
>>
>> On 24 January 2018 at 12:36, Greg Roodt wrote:
>>> Hi
>>>
>>> I'm trying to use the Query Elevation Component in conjunction with CursorMark pagination. It doesn't seem to work. I get an exception. Are these components meant to work together?
>>>
>>> This works:
>>> enableElevation=true&forceElevation=true&elevateIds=MAAMNqFV1dg
>>>
>>> This fails:
>>> cursorMark=*&enableElevation=true&forceElevation=true&elevateIds=MAAMNqFV1dg
>>>
>>> Here is the stacktrace:
>>>
>>> """
>>> 'trace'=>'java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.lucene.util.BytesRef
>>> at org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:1127)
>>> at org.apache.solr.schema.StrField.marshalSortValue(StrField.java:100)
>>> at org.apache.solr.search.CursorMark.getSerializedTotem(CursorMark.java:250)
>>> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1445)
>>> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:375)
>>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:303)
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>>> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>>> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>>> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>>> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>>> at org.eclipse.jetty.server.Server.handle(Server.java:534)
>>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>>> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>>> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>>> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>>> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>>> at java.lang.Thread.run(Thread.java:748)
>>> """
>>>
>>> Any idea what's going wrong?
>>>
>>> Greg
Re: Bitnami, or other Solr on AWS recommendations?
Although this is shameless promotion, have you taken a look at SearchStax (https://www.searchstax.com)? Why not use a Solr-as-a-Service?

On Fri, Jan 26, 2018 at 11:24 AM, TK Solr wrote:
> If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or is there a better way?
>
> TK

--
Sameer Maggon
Founder, SearchStax, Inc.
https://www.searchstax.com
Re: Bitnami, or other Solr on AWS recommendations?
I guess I'd say test with the image - especially if you're deploying a larger number of Solr boxes. We do a lot of them where I work and (unfortunately, for reasons I won't bother you with) can't use an image. The time it takes to install Solr is noticeable when we deploy Solr on our 100 plus EC2 instances.

Of course, if you need to customize Solr or the Solr server in any way, you can make your own by hand and then build an image from that for use as a base.

On Fri, Jan 26, 2018 at 12:24 PM, TK Solr wrote:
> If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or is there a better way?
>
> TK
Re: SolrClient#updateByQuery?
Wait. What do you mean by: "... this deletion is not immediately reflected in the search results..."? Like all other index operations this change won't be "visible" until the next commit, but expungeDeletes is (or should be) totally unnecessary. And very costly for reasons other than you might be aware of, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

If you commit after docs are deleted and _still_ see them in search results, that's a JIRA. That should simply _not_ be the case. Do note, however, that DBQ can take quite a long time to run. Is it possible that the delete isn't complete yet for some reason?

As for why there's no "update by query", it's actually fairly awful to deal with. Imagine in Solr's case what an updateByQuery with "set fieldX=32, q=*:*" would mean. In order for that to work:

1> It's possible that the update of single-valued docValues fields (which doesn't need all fields stored) could be made to work with that. That functionality is so new, though, that it hasn't been addressed (and I'm not totally sure it's possible).

Assuming the case isn't <1>, it would require
2> using atomic updates under the covers, meaning:
2a> all fields would have to be stored (a prerequisite for atomic updates)
2b> each and every document would be completely re-indexed. Inverted indexes don't lend themselves well to bulk updates.

FWIW,
Erick

On Fri, Jan 26, 2018 at 9:50 AM, Walter Underwood wrote:
> Use a filter query to filter out all the documents marked deleted.
>
> Don't use "expunge deletes", it does more than you want because it forces a merge. Just commit after sending the delete.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jan 26, 2018, at 8:55 AM, Clemens Wyss DEV wrote:
>>
>> Thx Emir!
>>
>>> You are thinking too RDBMS
>> maybe the DBQ "misled" me
>>
>>> The best you can do is select and send updates as a single bulk
>> how can I do "In-Place Updates" (https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates) from/through SolrJ?
>>
>>> Also use DBQ with caution - it does not work well with concurrent updates
>> we "prevent" this through sequentialization (per core)
>>
>> Why do I want to do all this (dumb things)? The context is as follows: when a document is deleted in an index/core this deletion is not immediately reflected in the search results. Deletions are not really NRT (or has this changed?). Until now we "solved" this brutally by forcing a commit (with "expunge deletes"), till we noticed that this results in quite a "heavy load", to say the least.
>> Now I have the idea to add a "deleted" flag to all the documents that is filtered on in all queries. When it comes to deletions, I would update the document's deleted flag and then effectively delete it. For a single deletion this is ok, but what if I need to re-index?
>>
>> -----Original Message-----
>> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
>> Sent: Friday, 26 January 2018 17:31
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrClient#updateByQuery?
>>
>> Hi Clemens,
>> You are thinking too RDBMS. You can use a query to select docs, but how would you provide the updated docs? I guess you could use this approach only for incremental updates or with some scripting language. That is not supported at the moment. The best you can do is select and send updates as a single bulk.
>>
>> Also use DBQ with caution - it does not work well with concurrent updates.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>> On 26 Jan 2018, at 17:10, Clemens Wyss DEV wrote:
>>>
>>> SolrClient has the method(s) deleteByQuery (which I make use of when I need to reindex).
>>> #updateByQuery does not exist. What if I want to "update all documents matching a query"?
>>>
>>> Thx
>>> Clemens
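Walter's filter-query suggestion amounts to adding an fq that excludes the flagged documents on every search. A minimal Python sketch of the query parameters; the boolean `deleted` field is Clemens' proposed application-level flag, not anything built into Solr:

```python
def search_params(user_query):
    """Standard search params plus a filter query that hides documents
    whose application-defined 'deleted' flag has been set. fq results
    are cached independently of q, so this stays cheap across queries."""
    return {"q": user_query, "fq": "-deleted:true"}

params = search_params("title:solr")
```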
cdcr replication of new collection doesn't replicate
We have just upgraded our QA Solr clouds to 7.2.0. We have 3 Solr clouds; collections in the first cloud replicate to the other 2. Existing collections, which we upgraded in place using the Lucene index upgrade tool, seem to behave correctly: data written to collections in the first environment replicates to the other 2.

We created a new collection with 2 shards, each with 2 replicas. The new collection uses TLOG replicas instead of NRT replicas. We configured CDCR similarly to the other collections, so that writes to the first cloud are sent to the other 2 clouds. However, we never see data appear in the target collections. We do see tlog files appear, and I can see CDCR update messages in the logs, but none of the cores ever get any data in them. So the tlogs accumulate but are never loaded into the target collections.

This doesn't seem correct. I'm at a loss as to what to do next. We will probably copy the index files from the one collection to the other two collections directly, but shouldn't CDCR be sending the data? Does CDCR work with TLOG replicas?
Bitnami, or other Solr on AWS recommendations?
If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or is there a better way?

TK
Re: Solr 4.8.1 multiple client updates the same collection
On 1/26/2018 4:23 AM, Vincenzo D'Amore wrote:
> The first client does the following:
>
> 1. rolls back all adds/deletes made to the index since the last commit (in case the previous client execution completed unsuccessfully).
> 2. reads data from sql server
> 3. updates solr documents
> 4. manually commits
>
> And *important*, once a day, the first client deletes all the existing documents and reindexes the entire collection from scratch.
>
> The second client is simpler, it manually commits after every atomic update.

The fact that one client is deleting everything and reindexing changes the landscape dramatically. Since I do not know anything about your setup, I'll make up a similar scenario and describe what I see as the potential problems.

Let's say that this theoretical index contains one million documents. A full reindex of this index takes 2 hours and starts at midnight. While the reindex is happening, the first client doesn't do "normal" updates. The second client runs every ten minutes (x:00, x:10, etc.), and is completely unaware of what the first client is doing.

At 12:01 AM, the full delete has happened to the "under construction" version of the index, and the reindex has been running for one minute. Everything is fine, anyone searching will have the full index available.

At 12:10 AM, let's imagine that the second client is going to update one document with the atomic update feature. If the full reindex has indexed that document, this will work, but if it hasn't, the atomic update is going to fail. For the purposes of this scenario, let's assume that the atomic update succeeds, and the second client does its commit. When the second client's commit finishes, the index will have a little over 80,000 documents in it, instead of one million, because all the documents were deleted and the reindex is only about eight percent complete. The same thing would also happen when autoSoftCommit gets triggered after an update, if autoSoftCommit is configured.

If the second client can be paused while the first client is reindexing, and you don't configure autoSoftCommit, then everything will be fine. But if the second client does its work while the reindex is underway, there will be problems.

Separate side issue: The fact that your first client does rollbacks could potentially roll back changes made by the second client, unless you can guarantee that the second client will wait until the first client is idle.

Thanks,
Shawn
Re: SolrClient#updateByQuery?
Use a filter query to filter out all the documents marked deleted.

Don't use "expunge deletes", it does more than you want because it forces a merge. Just commit after sending the delete.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 26, 2018, at 8:55 AM, Clemens Wyss DEV wrote:
>
> Thx Emir!
>
>> You are thinking too RDBMS
> maybe the DBQ "misled" me
>
>> The best you can do is select and send updates as a single bulk
> how can I do "In-Place Updates" (https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates) from/through SolrJ?
>
>> Also use DBQ with caution - it does not work well with concurrent updates
> we "prevent" this through sequentialization (per core)
>
> Why do I want to do all this (dumb things)? The context is as follows: when a document is deleted in an index/core this deletion is not immediately reflected in the search results. Deletions are not really NRT (or has this changed?). Until now we "solved" this brutally by forcing a commit (with "expunge deletes"), till we noticed that this results in quite a "heavy load", to say the least.
> Now I have the idea to add a "deleted" flag to all the documents that is filtered on in all queries. When it comes to deletions, I would update the document's deleted flag and then effectively delete it. For a single deletion this is ok, but what if I need to re-index?
>
> -----Original Message-----
> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
> Sent: Friday, 26 January 2018 17:31
> To: solr-user@lucene.apache.org
> Subject: Re: SolrClient#updateByQuery?
>
> Hi Clemens,
> You are thinking too RDBMS. You can use a query to select docs, but how would you provide the updated docs? I guess you could use this approach only for incremental updates or with some scripting language. That is not supported at the moment. The best you can do is select and send updates as a single bulk.
>
> Also use DBQ with caution - it does not work well with concurrent updates.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 26 Jan 2018, at 17:10, Clemens Wyss DEV wrote:
>>
>> SolrClient has the method(s) deleteByQuery (which I make use of when I need to reindex).
>> #updateByQuery does not exist. What if I want to "update all documents matching a query"?
>>
>> Thx
>> Clemens
Re:***UNCHECKED*** Limit Solr search to number of character/words (without changing index)
Hi Zahid,
If you want to allow searching only if the query is shorter than a certain number of terms / characters, I would do it before calling Solr, probably. Otherwise you could write a QueryParserPlugin (see [1]) and check that the query is sound before processing it.

See also: http://coding-art.blogspot.co.uk/2016/05/writing-custom-solr-query-parser-for.html

Cheers,
Diego

[1] https://wiki.apache.org/solr/SolrPlugins

From: solr-user@lucene.apache.org At: 01/26/18 13:24:36 To: solr-user@lucene.apache.org Cc: apa...@elyograg.org
Subject: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Hi All,

Is there any way I can restrict a Solr search query to look at only a specified number of characters/words (for searching purposes only, not for highlighting)?

*For example:*

*Indexed content:*
*I am a man of my words I am a lazy man...*

Search should consider only the part below (words=7 or characters=16):
*I am a man of my words*

If I search for *lazy*, no record should be found. If I search for *a*, 1 record should be found.

Thanks
Zahid Iqbal
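Diego's "check before calling Solr" suggestion could look like this client-side guard. A hedged sketch: the word/character limits echo the numbers in Zahid's example but are otherwise arbitrary, and this validates the query string, which is the interpretation Diego proposes (the index itself is unchanged):

```python
def query_within_limits(q, max_words=7, max_chars=16):
    """Client-side guard: reject a query that exceeds the configured
    word or character limits before it is ever sent to Solr."""
    return len(q.split()) <= max_words and len(q) <= max_chars
```

Anything stricter (truncating matching to the first N terms of each *document*) would need index-side work, e.g. a LimitTokenCountFilterFactory in the analysis chain, which the original question explicitly ruled out.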
7.2.1 cluster dies within minutes after restart
Hello,

We recently upgraded our clusters from 7.1 to 7.2.1. One collection (2 shards, 2 replicas) specifically is in a bad state almost continuously. After a proper restart the cluster is all green. Within minutes the logs are flooded with many bad omens:

o.a.z.ClientCnxn Client session timed out, have not heard from server in 22130ms (although zkClientTimeOut is 3).
o.a.s.c.Overseer could not read the data org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed: org.apache.solr.common.cloud.ZooKeeperException: A ZK error has occurred
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error trying to proxy request for url
etc etc etc
2018-01-26 16:43:31.419 WARN (OverseerAutoScalingTriggerThread-171411573518537026-logs4.gr.nl.openindex.io:8983_solr-n_001853) [ ] o.a.s.c.a.OverseerTriggerThread OverseerTriggerThread woken up but we are closed, exiting.

Soon most nodes are gone; maybe one is still green or yellow (recovering from another dead node). A point of interest is that this collection is always under maximum load, receiving hundreds of queries per node per second. We disabled the querying of the cluster and restarted it again; this time it kept running fine, and it continued to run fine even when we slowly restarted the tons of queries that need to be fired.

We just reverted the modifications above; the cluster now receives its full load of queries as soon as it is available, everything was restarted, and everything is suddenly fine again. We really have no clue why for days everything is fine, then we suddenly come into some weird flow (loaded with o.a.z.ClientCnxn Client session timed out msgs) and it takes several full restarts for things to settle down. Then all is fine until this afternoon, where for two hours the cluster kept dying almost instantly. And at this moment, all is well again, it seems.

The only steady companion when things go bad are the timeouts related to ZK. Under normal circumstances we do not time out due to GC; the heap is just 2 GB. Query response times are ~10 ms even under maximum load.

We would like to know why and how it enters a 'bad state' for no apparent reason. Any ideas?

Many thanks!
Markus

Side note: this cluster has always been a pain, but 7.2.1 made something worse; reverting to 7.1 is not possible due to the index being too new (there were no notes in CHANGES indicating an index incompatibility between these two minor versions).
Re: SolrClient#updateByQuery?
Thx Emir!

> You are thinking too RDBMS
maybe the DBQ "misled" me

> The best you can do is select and send updates as a single bulk
how can I do "In-Place Updates" (https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates) from/through SolrJ?

> Also use DBQ with caution - it does not work well with concurrent updates
we "prevent" this through sequentialization (per core)

Why do I want to do all this (dumb things)? The context is as follows: when a document is deleted in an index/core this deletion is not immediately reflected in the search results. Deletions are not really NRT (or has this changed?). Until now we "solved" this brutally by forcing a commit (with "expunge deletes"), till we noticed that this results in quite a "heavy load", to say the least.
Now I have the idea to add a "deleted" flag to all the documents that is filtered on in all queries. When it comes to deletions, I would update the document's deleted flag and then effectively delete it. For a single deletion this is ok, but what if I need to re-index?

-----Original Message-----
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
Sent: Friday, 26 January 2018 17:31
To: solr-user@lucene.apache.org
Subject: Re: SolrClient#updateByQuery?

Hi Clemens,
You are thinking too RDBMS. You can use a query to select docs, but how would you provide the updated docs? I guess you could use this approach only for incremental updates or with some scripting language. That is not supported at the moment. The best you can do is select and send updates as a single bulk.

Also use DBQ with caution - it does not work well with concurrent updates.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 26 Jan 2018, at 17:10, Clemens Wyss DEV wrote:
>
> SolrClient has the method(s) deleteByQuery (which I make use of when I need to reindex).
> #updateByQuery does not exist. What if I want to "update all documents matching a query"?
>
> Thx
> Clemens
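For the "deleted" flag idea discussed in this thread, the update request body would look like the sketch below. This is an illustrative Python rendering of Solr's JSON atomic-update syntax, not SolrJ; whether it executes as a true in-place update (rather than a full atomic re-index of the document) depends on the field meeting the in-place conditions in the linked guide page (single-valued, docValues, non-stored, non-indexed):

```python
import json

def mark_deleted_payload(doc_id, unique_key="id"):
    """JSON body for an atomic update that sets deleted=true on one
    document, so queries filtering on the flag hide it before the
    physical delete happens."""
    return json.dumps([{unique_key: doc_id, "deleted": {"set": True}}])

body = mark_deleted_payload("doc42")  # POST to /update with Content-Type: application/json
```

In SolrJ the equivalent is setting the field of a SolrInputDocument to a Map of `"set"` to the new value (e.g. `Collections.singletonMap("set", true)`) and sending it like a normal add.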
Re: SolrClient#updateByQuery?
Hi Clemens,
You are thinking too RDBMS. You can use a query to select docs, but how would you provide the updated docs? I guess you could use this approach only for incremental updates or with some scripting language. That is not supported at the moment. The best you can do is select and send updates as a single bulk.

Also use DBQ with caution - it does not work well with concurrent updates.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 26 Jan 2018, at 17:10, Clemens Wyss DEV wrote:
>
> SolrClient has the method(s) deleteByQuery (which I make use of when I need to reindex).
> #updateByQuery does not exist. What if I want to "update all documents matching a query"?
>
> Thx
> Clemens
SolrClient#updateByQuery?
SolrClient has the method(s) deleteByQuery (which I make use of when I need to reindex). #updateByQuery does not exist. What if I want to "update all documents matching a query"?

Thx
Clemens
Re: pf2
Emir Sow=false .. thanks for this! The problem seems to be due to a stopword. Everything is fine when I avoid stopwords in my query. The stopword might get removed in the query matching, but I would need to allow some slop perhaps for pf2. Thanks Rick On January 26, 2018 8:14:06 AM EST, "Emir Arnautović"wrote: >Hi Rick, >It does not work in any case or it does not work for some cases - e.g. >something like l’avion? Maybe you can try use sow=false and see if it >will help. > >Cheers, >Emir >-- >Monitoring - Log Management - Alerting - Anomaly Detection >Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 26 Jan 2018, at 13:38, Rick Leir wrote: >> >> Emir >> Thanks, I will do when I get off this bus. >> >> I have run the text thru the SolrAdmin Analyzer, it looks fine. >> >> According to the debugQuery output, individual words match in the qf, >but not the pair that pf2 should match. >> >> I compare the configs for English and French, and they are the same >apart from the analysis chain which is below. Only French fails. I will >take out filters one by one and attempt to find which is causing this. >> Cheers -- Rick >> >> On January 26, 2018 4:09:51 AM EST, "Emir Arnautović" > wrote: >>> Hi Rick, >>> Can you include sample of your query and text that should match. >>> >>> Thanks, >>> Emir >>> -- >>> Monitoring - Log Management - Alerting - Anomaly Detection >>> Solr & Elasticsearch Consulting Support Training - >http://sematext.com/ >>> >>> >>> On 25 Jan 2018, at 23:13, Rick Leir wrote: Hi all My pf2 keywords^11.0 works for english not for french. Here are the >>> fieldtypes, actually from two schema.xml's in separate cores. Solr >>> 5.2.2, edismax, q.op AND I suspect there are several problems with the french schema. Maybe >I >>> only needed to show the query analyzer, not the index analyzer? The pf2 does not show a match in the debugQuery=true output for the >>> French. However, a qf keywords^10.0 does show a match. 
The keywords >>> field is copyfielded into text, which is the df. Is there any other >>> field I should be showing? Thanks Rick >> [The English and French fieldtype definitions were quoted here, but the XML tags were stripped by the list archive; only attribute values such as mapping-ISOLatin1Accent.txt, synonyms.txt, lang/stopwords_en.txt, lang/stopwords_fr.txt, protwords.txt, lang/stemdict_en.txt, lang/stemdict_fr.txt and lang/contractions_fr.txt survive] >> >> -- >> Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
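On the slop idea above: with edismax, the slop applied to the pf2 bigram phrases is controlled by the ps2 parameter. A hypothetical request sketch (the query string is made up; field names and boosts are taken from this thread), allowing one position of slop so a word pair can still match across a removed stopword:

```
q=l’avion vole
defType=edismax
q.op=AND
qf=keywords^10.0
pf2=keywords^11.0
ps2=1
```

If ps2 is not set, pf2 phrases fall back to the global ps value, so setting ps2 explicitly makes the bigram slop independent of the full-phrase slop.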
Limit Solr search to number of characters/words (without changing index)
Hi All, Is there any way I can restrict a Solr search query to look at only a specified number of characters/words (for searching purposes only, not for highlighting)? *For example:* *Indexed content:* *I am a man of my words I am a lazy man...* The search should consider only the part below (words=7 or characters=16): *I am a man of my words* If I search for *lazy* no record should be found. If I search for *a* one record should be found. Thanks Zahid Iqbal
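One index-time approach worth noting (it requires reindexing, so it does not fully meet the "without changing index" constraint) is Solr's LimitTokenCountFilterFactory, which indexes only the first N tokens of a field. A sketch; the fieldtype name here is made up:

```xml
<fieldType name="text_first7" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Only the first 7 tokens are indexed; later words such as "lazy"
         in the example above never reach the index, so they cannot match. -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="7"/>
  </analyzer>
</fieldType>
```

A purely query-time limit on how far into a stored value matching goes is not something Solr offers out of the box, since matching happens against the inverted index, not the stored text.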
Re: pf2
Hi Rick, It does not work in any case or it does not work for some cases - e.g. something like l’avion? Maybe you can try using sow=false and see if it will help. Cheers, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 26 Jan 2018, at 13:38, Rick Leir wrote: > > Emir > Thanks, I will do when I get off this bus. > > I have run the text thru the SolrAdmin Analyzer, it looks fine. > > According to the debugQuery output, individual words match in the qf, but not > the pair that pf2 should match. > > I compared the configs for English and French, and they are the same apart > from the analysis chain which is below. Only French fails. I will take out > filters one by one and attempt to find which is causing this. > Cheers -- Rick > > On January 26, 2018 4:09:51 AM EST, "Emir Arnautović" > wrote: >> Hi Rick, >> Can you include a sample of your query and text that should match. >> >> Thanks, >> Emir >> -- >> Monitoring - Log Management - Alerting - Anomaly Detection >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ >> >> >> >>> On 25 Jan 2018, at 23:13, Rick Leir wrote: >>> >>> >>> >>> Hi all >>> My pf2 keywords^11.0 works for english not for french. Here are the >> fieldtypes, actually from two schema.xml's in separate cores. Solr >> 5.2.2, edismax, q.op AND >>> I suspect there are several problems with the french schema. Maybe I >> only needed to show the query analyzer, not the index analyzer? >>> >>> The pf2 does not show a match in the debugQuery=true output for the >> French. However, a qf keywords^10.0 does show a match. The keywords >> field is copyfielded into text, which is the df. Is there any other >> field I should be showing?
>>> Thanks >>> Rick >>> [The English and French fieldtype XML was quoted here, but the tags were stripped by the list archive; only attribute values such as mapping-ISOLatin1Accent.txt, synonyms.txt, lang/stopwords_en.txt, lang/stopwords_fr.txt, protwords.txt, lang/stemdict_en.txt, lang/stemdict_fr.txt and lang/contractions_fr.txt survive] >>> -- >>> Sorry for being brief. Alternate email is rickleir at yahoo dot com > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com
RE: solr usage reporting
Becky, There are excellent log analysis systems. Logstash? Awstats? I do not think Solr should do this. Some people index their logs into a separate Solr core for analysis, but it might be a challenge to do this in a useful way. Cheers -- Rick On January 25, 2018 2:56:01 PM EST, Becky Bonner wrote: >That would work for a single server but collecting the logs from the >farm would be problematic since we would have logs from all nodes and >replicas from all the members of the farm. We would then need to weed out >what we are interested in and combine. It would be better if there were >a way to query it within Solr. I think something in Solr would be best >... a separate collection that can be queried and reports generated >from it. The log does have the basic info we need though. > > >-Original Message- >From: Marco Reis [mailto:m...@marcoreis.net] >Sent: Thursday, January 25, 2018 11:14 AM >To: solr-user@lucene.apache.org >Subject: Re: solr usage reporting > >One way is to collect the log from your server and, then, use another >tool to generate your report. > > >On Thu, Jan 25, 2018 at 2:59 PM Becky Bonner >wrote: > >> Hi all, >> We are in the process of replacing our Google Search Appliance with >> SOLR >> 7.1 and are needing one last piece of our requirements. We provide a > >> monthly report to our business that shows the top 1000 query terms >> requested during the date range as well as the query terms requested >> that contained no results. Is there a way to log the requests and >> later query solr for these results? Or is there a plugin to add this >functionality? >> >> Your help appreciated. >> Bcubed >> >> >> -- >Marco Reis >Software Engineer >http://marcoreis.net >https://github.com/masreis >+55 61 9 81194620 -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
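Since Solr's request logs record both the q= parameter and the hit count, the monthly report Becky describes can be produced offline from the collected logs. A rough sketch, assuming a simplified request-log line format (real solr.log layouts vary by version and logging configuration, so the regex would need adjusting):

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Hypothetical request-log lines for illustration only.
sample_log = [
    "2018-01-25 ... path=/select params={q=solr+wiki&rows=10} hits=42 status=0 QTime=3",
    "2018-01-25 ... path=/select params={q=solr+wiki&rows=10} hits=42 status=0 QTime=2",
    "2018-01-25 ... path=/select params={q=gsa+migration&rows=10} hits=0 status=0 QTime=1",
]

def top_queries(lines, n=1000):
    """Count q= parameters and collect the queries that returned zero hits."""
    counts, zero_hits = Counter(), set()
    for line in lines:
        m = re.search(r"[{&]q=([^}&]+).*?hits=(\d+)", line)
        if m:
            q = unquote_plus(m.group(1))
            counts[q] += 1
            if m.group(2) == "0":
                zero_hits.add(q)
    return counts.most_common(n), zero_hits

top, zeros = top_queries(sample_log)
print(top)    # most frequent query strings with their counts
print(zeros)  # query strings that produced no results
```

For a multi-node farm the same script would run over the concatenated logs from all nodes, which is exactly the aggregation step Becky is worried about; shipping the logs to one place first (e.g. with Logstash, as Rick suggests) is the usual answer.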
Re: pf2
Emir Thanks, I will do when I get off this bus. I have run the text thru the SolrAdmin Analyzer, it looks fine. According to the debugQuery output, individual words match in the qf, but not the pair that pf2 should match. I compare the configs for English and French, and they are the same apart from the analysis chain which is below. Only French fails. I will take out filters one by one and attempt to find which is causing this. Cheers -- Rick On January 26, 2018 4:09:51 AM EST, "Emir Arnautović" wrote: >Hi Rick, >Can you include sample of your query and text that should match. > >Thanks, >Emir >-- >Monitoring - Log Management - Alerting - Anomaly Detection >Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 25 Jan 2018, at 23:13, Rick Leir wrote: >> >> >> >> Hi all >> My pf2 keywords^11.0 works for english not for french. Here are the >fieldtypes, actually from two schema.xml's in separate cores. Solr >5.2.2, edismax, q.op AND >> I suspect there are several problems with the french schema. Maybe I >only needed to show the query analyzer, not the index analyzer? >> >> The pf2 does not show a match in the debugQuery=true output for the >French. However, a qf keywords^10.0 does show a match. The keywords >field is copyfielded into text, which is the df. Is there any other >field I should be showing?
>> Thanks >> Rick >> [The English and French fieldtype XML was quoted here, but the tags were stripped by the list archive; only attribute values such as mapping-ISOLatin1Accent.txt, synonyms.txt, lang/stopwords_en.txt, lang/stopwords_fr.txt, protwords.txt, lang/stemdict_en.txt, lang/stemdict_fr.txt and lang/contractions_fr.txt survive] >> -- >> Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Solr 4.8.1 multiple client updates the same collection
Thanks Shawn, >> [autoCommit settings quoted here; the XML tags were stripped by the archive, leaving only a maxTime value and openSearcher=false] >> > > That autoCommit configuration is not affecting document visibility at all, > because openSearcher is set to false. Don't rush to change this -- this > kind of configuration is what you want. I would probably use one minute > here rather than five minutes, but again, don't be in a rush to change it > without some evidence that the change is needed. Well, I don't want to change this... I showed this configuration only to make the current situation clear. > > > But in order to have price updates as soon as possible, we're >> planning to add a second client that, even while the first client is >> running, should submit many atomic price updates. >> >> Now I'm worried about having two clients on the same collection; even >> if those clients can be orchestrated using a kind of semaphore, I'm >> afraid that those atomic commits could come too quickly or in the worst >> case might even overlap the other (first) client. >> > > Having multiple clients send updates should not be a problem. This is the > recommended way to increase indexing speed. > > When/where are the commits that open a new searcher happening? The > autoCommit settings aren't handling that. > > The first client does the following: 1. rolls back all adds/deletes made to the index since the last commit (in case the previous client execution completed unsuccessfully). 2. reads data from sql server 3. updates solr documents 4. manually commits And *important*, once a day, the first client deletes all the existing documents and reindexes the entire collection from scratch. The second client is simpler, it manually commits after every atomic update. > There are several ways to accomplish commits that make changes visible. > Three of them are what I would call "correct" to pair with your autoCommit > settings. > > One good option is to configure autoSoftCommit, with a value that > describes how long after the first update a commit will take place.
A > second option is to include a "commitWithin" parameter on every update > request, with a value that works similarly to autoSoftCommit. Another is > to send an update request that explicitly commits. Ideally those manual > commits will be soft commits. > > I would recommend one of the first two options, but be very careful about > making the intervals too short. The interval should be longer than it > takes for a typical commit to actually happen, probably two to three times > as long, or longer. I would not recommend manual commit requests unless > you can be sure that only one of your clients will send them, and that the > commits will be spaced far enough apart that they can't overlap. I am > using manual soft commits for updates on my own indexes. > > I could change the current configuration, adding autoSoftCommit and openSearcher=true. But I'm not sure how, with this configuration, I could still do the once-a-day full reindex. I mean, with the current configuration any change is visible only after the manual commit. So even when there is a full reindex, the entire new index is visible only when the client commits. I don't think I can do the same thing having autoSoftCommit with openSearcher=true. Right? -- Vincenzo D'Amore
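For reference, the combination Shawn recommends (hard commits with openSearcher=false for durability, plus autoSoftCommit for visibility) looks roughly like this in solrconfig.xml; the interval values here are only illustrative:

```xml
<autoCommit>
  <!-- hard commit: flushes the transaction log to disk for durability,
       but does not open a new searcher -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: opens a new searcher, making recent updates visible -->
  <maxTime>120000</maxTime>
</autoSoftCommit>
```

Note this does not conflict with Vincenzo's daily full reindex concern by itself: with autoSoftCommit enabled, partially reindexed data becomes visible as it arrives, which is exactly the behavior he is unsure about.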
Re: pf2
Hi Rick, Can you include a sample of your query and text that should match. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 25 Jan 2018, at 23:13, Rick Leir wrote: > > > > Hi all > My pf2 keywords^11.0 works for english not for french. Here are the > fieldtypes, actually from two schema.xml's in separate cores. Solr 5.2.2, > edismax, q.op AND > I suspect there are several problems with the french schema. Maybe I only > needed to show the query analyzer, not the index analyzer? > > The pf2 does not show a match in the debugQuery=true output for the French. > However, a qf keywords^10.0 does show a match. The keywords field is > copyfielded into text, which is the df. Is there any other field I should be > showing? > Thanks > Rick > > [The English and French fieldtype XML was quoted here, but the tags were stripped by the list archive; only attribute values such as mapping-ISOLatin1Accent.txt, synonyms.txt, lang/stopwords_en.txt, lang/stopwords_fr.txt, protwords.txt, lang/stemdict_en.txt, lang/stemdict_fr.txt and lang/contractions_fr.txt survive] > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com