Re: regex-urlfilter help
Yeah, I'm curious why this thread is being used to discuss that topic. I'll start a new thread for my questions. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-cannot-provide-index-service-after-a-large-GC-pause-but-core-state-in-ZK-is-still-active-tp4308942p4310302.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Very long young generation stop the world GC pause
Sorry, I misremembered. The swap is 16 GB. -- View this message in context: http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4310301.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Soft commit and reading data just after the commit
I didn't look much into the REALTIME GET handler. Thanks for mentioning that. I'm checking it now. On 19 Dec 2016 10:09, "Lasitha Wattaladeniya" wrote: > Hi all, > > Thanks for your replies, > > @dorian : the requirement is, we are showing a list of entries on a page. > For each user there's a read / unread flag. The data for the listing is > fetched from Solr, and you can see whether the entry was previously read or not. So > when a user views an entry by clicking, we update the database flag > to READ and use real-time indexing to update the Solr entry. So when the user > closes the full view of the entry and goes back to the entry listing page, the > data fetched from Solr should be updated to READ. That's the use case we > are trying to fix. > > @eric : thanks for the lengthy reply. So let's say I increase the > autoSoftCommit timeout to maybe 100 ms. In that case, do I have to wait > that long on the client side before calling search? What's the > correct way of achieving this? > > Regards, > Lasitha > > On 18 Dec 2016 23:52, "Erick Erickson" wrote: > >> 1 ms autocommit is far too frequent. And it's not >> helping you anyway. >> >> There is some lag between when a commit happens >> and when the docs are really available. The sequence is: >> 1> commit (soft or hard-with-openSearcher=true doesn't matter). >> 2> a new searcher is opened and autowarming starts >> 3> until the new searcher is opened, queries continue to be served by >> the old searcher >> 4> the new searcher is fully opened >> 5> _new_ requests are served by the new searcher. >> 6> the last request is finished by the old searcher and it's closed. >> >> So what's probably happening is that you send docs and then send a >> query and Solr is still in step <3>. You can look at your admin UI >> plugins/stats page or your log to see how long it takes for a >> searcher to open and adjust your expectations accordingly. 
>> >> If you want to fetch only the document (not try to get it by a >> search), Real Time Get is designed to ensure that you always get the >> most recent copy whether it's searchable or not. >> >> All that said, Solr wasn't designed for autocommits that are that >> frequent. That's why the documentation talks about _Near_ Real Time. >> You may need to adjust your expectations. >> >> Best, >> Erick >> >> On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha >> wrote: >> > There's a very high probability that you're using the wrong tool for the >> > job if you need 1ms softCommit time. Especially when you always need it >> (e.g. >> > there are apps where you need commit-after-insert very rarely). >> > >> > So explain what you're using it for ? >> > >> > On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya < >> watt...@gmail.com> >> > wrote: >> > >> >> Hi Furkan, >> >> >> >> Thanks for the links. I had read the first one but not the second one. >> I >> >> did read it after you sent it. So in my current solrconfig.xml, below >> >> are the configurations: >> >> >> >> <autoSoftCommit> >> >> <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime> >> >> </autoSoftCommit> >> >> >> >> <autoCommit> >> >> <maxTime>15000</maxTime> >> >> <openSearcher>false</openSearcher> >> >> </autoCommit> >> >> >> >> The problem I'm facing is, just after adding the documents to Solr >> using >> >> SolrJ, when I retrieve data from Solr I am not getting the updated >> results. >> >> This happens from time to time. Most of the time I get the correct data but >> on >> >> some occasions I get stale results. So, as you suggest, what is the best >> >> practice to use here? Should I wait 1 millisecond before calling for >> >> updated results ? 
>> >> >> >> Regards, >> >> Lasitha >> >> >> >> Lasitha Wattaladeniya >> >> Software Engineer >> >> >> >> Mobile : +6593896893 >> >> Blog : techreadme.blogspot.com >> >> >> >> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI > > >> >> wrote: >> >> >> >> > Hi Lasitha, >> >> > >> >> > First of all, did you check these: >> >> > >> >> > https://cwiki.apache.org/confluence/display/solr/Near+ >> >> Real+Time+Searching >> >> > https://lucidworks.com/blog/2013/08/23/understanding- >> >> > transaction-logs-softcommit-and-commit-in-sorlcloud/ >> >> > >> >> > after that, if you cannot adjust your configuration you can give more >> >> > information and we can find a solution. >> >> > >> >> > Kind Regards, >> >> > Furkan KAMACI >> >> > >> >> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya < >> >> watt...@gmail.com> >> >> > wrote: >> >> > >> >> >> Hi furkan, >> >> >> >> >> >> Thanks for your reply, it is generally a query heavy system. We are >> >> using >> >> >> realtime indexing for editing the available data >> >> >> >> >> >> Regards, >> >> >> Lasitha >> >> >> >> >> >> Lasitha Wattaladeniya >> >> >> Software Engineer >> >> >> >> >> >> Mobile : +6593896893 <+65%209389%206893> >> >> >> Blog : techreadme.blogspot.com >> >> >> >> >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI < >> furkankam...@gmail.com> >> >> >> wrote: >> >> >> >> >> >>> Hi Lasitha,
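Erick's six-step searcher lifecycle above can be sketched as a toy simulation (hypothetical timings and function names, not Solr code) to show why a query sent immediately after a commit may still see the old searcher:

```python
# Toy model of the commit -> warm -> swap sequence Erick describes.
# The timings are illustrative assumptions, not Solr internals.

def searcher_serving(query_time_ms, commit_time_ms, warm_duration_ms):
    """Return which searcher serves a query issued at query_time_ms.

    A commit at commit_time_ms opens a new searcher, but it only starts
    serving once autowarming finishes (commit + warm_duration). Until
    then, queries continue to be served by the old searcher (step 3).
    """
    if query_time_ms < commit_time_ms + warm_duration_ms:
        return "old"   # steps 2-3: new searcher still warming
    return "new"       # steps 4-5: new searcher fully opened

# A doc committed at t=0 with 500 ms of autowarming:
print(searcher_serving(1, 0, 500))    # "old" -- doc not yet visible
print(searcher_serving(600, 0, 500))  # "new" -- doc visible after warmup
```

This is why a 1 ms autoSoftCommit doesn't help: visibility is bounded by searcher-open time, not by the commit interval.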
Re: Soft commit and reading data just after the commit
Hi all, Thanks for your replies, @dorian : the requirement is, we are showing a list of entries on a page. For each user there's a read / unread flag. The data for listing is fetched from solr. And you can see the entry was previously read or not. So when a user views an entry by clicking. We are updating the database flag to READ and use real time indexing to update solr entry. So when the user close the full view of the entry and go back to entry listing page, the data fetched from solr should be updated to READ. That's the use case we are trying to fix. @eric : thanks for the lengthy reply. So let's say I increase the autosoftcommit time out to may be 100 ms. In that case do I have to wait much that time from client side before calling search ?. What's the correct way of achieving this? Regards, Lasitha On 18 Dec 2016 23:52, "Erick Erickson"wrote: > 1 ms autocommit is far too frequent. And it's not > helping you anyway. > > There is some lag between when a commit happens > and when the docs are really available. The sequence is: > 1> commit (soft or hard-with-opensearcher=true doesn't matter). > 2> a new searcher is opened and autowarming starts > 3> until the new searcher is opened, queries continue to be served by > the old searcher > 4> the new searcher is fully opened > 5> _new_ requests are served by the new searcher. > 6> the last request is finished by the old searcher and it's closed. > > So what's probably happening is that you send docs and then send a > query and Solr is still in step <3>. You can look at your admin UI > pluginst/stats page or your log to see how long it takes for a > searcher to open and adjust your expectations accordingly. > > If you want to fetch only the document (not try to get it by a > search), Real Time Get is designed to insure that you always get the > most recent copy whether it's searchable or not. > > All that said, Solr wasn't designed for autocommits that are that > frequent. 
That's why the documentation talks about _Near_ Real Time. > You may need to adjust your expectations. > > Best, > Erick > > On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha > wrote: > > There's a very high probability that you're using the wrong tool for the > > job if you need 1ms softCommit time. Especially when you always need it > (ex > > there are apps where you need commit-after-insert very rarely). > > > > So explain what you're using it for ? > > > > On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya < > watt...@gmail.com> > > wrote: > > > >> Hi Furkan, > >> > >> Thanks for the links. I had read the first one but not the second one. I > >> did read it after you sent. So in my current solrconfig.xml settings > below > >> are the configurations, > >> > >> > >>${solr.autoSoftCommit.maxTime:1} > >> > >> > >> > >> > >>15000 > >>false > >> > >> > >> The problem i'm facing is, just after adding the documents to solr using > >> solrj, when I retrieve data from solr I am not getting the updated > results. > >> This happens time to time. Most of the time I get the correct data but > in > >> some occasions I get wrong results. so as you suggest, what the best > >> practice to use here ? , should I wait 1 mili second before calling for > >> updated results ? > >> > >> Regards, > >> Lasitha > >> > >> Lasitha Wattaladeniya > >> Software Engineer > >> > >> Mobile : +6593896893 > >> Blog : techreadme.blogspot.com > >> > >> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI > >> wrote: > >> > >> > Hi Lasitha, > >> > > >> > First of all, did you check these: > >> > > >> > https://cwiki.apache.org/confluence/display/solr/Near+ > >> Real+Time+Searching > >> > https://lucidworks.com/blog/2013/08/23/understanding- > >> > transaction-logs-softcommit-and-commit-in-sorlcloud/ > >> > > >> > after that, if you cannot adjust your configuration you can give more > >> > information and we can find a solution. 
> >> > > >> > Kind Regards, > >> > Furkan KAMACI > >> > > >> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya < > >> watt...@gmail.com> > >> > wrote: > >> > > >> >> Hi furkan, > >> >> > >> >> Thanks for your reply, it is generally a query heavy system. We are > >> using > >> >> realtime indexing for editing the available data > >> >> > >> >> Regards, > >> >> Lasitha > >> >> > >> >> Lasitha Wattaladeniya > >> >> Software Engineer > >> >> > >> >> Mobile : +6593896893 <+65%209389%206893> > >> >> Blog : techreadme.blogspot.com > >> >> > >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI < > furkankam...@gmail.com> > >> >> wrote: > >> >> > >> >>> Hi Lasitha, > >> >>> > >> >>> What is your indexing / querying requirements. Do you have an index > >> >>> heavy/light - query heavy/light system? > >> >>> > >> >>> Kind Regards, > >> >>> Furkan KAMACI > >> >>> > >> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < > >> >>> watt...@gmail.com> > >> >>> wrote: > >> >>> > >> >>> > Hello devs,
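Real Time Get, mentioned in the replies above, is exposed through Solr's /get handler and returns the latest copy of a document whether or not it is searchable yet. A minimal sketch of building such a request (the base URL and collection name are placeholder assumptions):

```python
from urllib.parse import urlencode

def realtime_get_url(base_url, collection, ids):
    """Build a Real Time Get request for one or more document ids.

    The /get handler serves the most recent version of a document,
    even before the next soft commit makes it visible to searchers.
    """
    params = urlencode({"ids": ",".join(ids), "wt": "json"})
    return f"{base_url}/solr/{collection}/get?{params}"

url = realtime_get_url("http://localhost:8983", "entries", ["doc1", "doc2"])
print(url)  # http://localhost:8983/solr/entries/get?ids=doc1%2Cdoc2&wt=json
```

For Lasitha's read/unread use case, fetching the single updated entry this way sidesteps the commit-visibility lag entirely.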
Re: Very long young generation stop the world GC pause
Thanks a lot, Pushkar! And sorry for the late response. Our machine has 128 GB of OS RAM, and we run 2 Solr nodes on one machine. Each Solr node has a max heap size of 32 GB. And we do not have swap. -- View this message in context: http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4310291.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stemming with SOLR
Thank you all for the replies. I am considering the suggestions On 17 Dec 2016 01:50, "Susheel Kumar"wrote: > To handle irregular nouns ( > http://www.ef.com/english-resources/english-grammar/ > singular-and-plural-nouns/), > the simplest way is handle them using StemOverriderFactory. The list is > not so long. Or otherwise go for commercial solutions like basistech etc. > as Alex suggested oR you can customize Hunspell extensively to handle most > of them. > > Thanks, > Susheel > > On Thu, Dec 15, 2016 at 9:46 PM, Alexandre Rafalovitch > > wrote: > > > If you need the full fidelity solution taking care of multiple > > edge-cases, it could be worth looking at commercial solutions. > > > > > > http://www.basistech.com/ has one, including a free-level SAAS plan. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and > experienced > > > > > > On 15 December 2016 at 21:28, Lasitha Wattaladeniya > > wrote: > > > Hi all, > > > > > > Thanks for the replies, > > > > > > @eric, ahmet : since those stemmers are logical stemmers it won't work > on > > > words such as caught, ran and so on. So in our case it won't work > > > > > > @susheel : Yes I thought about it but problems we have is, the > documents > > we > > > index are some what large text, so copy fielding these into duplicate > > > fields will affect on the index time ( we have jobs to index data > > > periodically) and query time. I wonder why there isn't a correct > solution > > > to this > > > > > > Regards, > > > Lasitha > > > > > > Lasitha Wattaladeniya > > > Software Engineer > > > > > > Mobile : +6593896893 > > > Blog : techreadme.blogspot.com > > > > > > On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar > > > > wrote: > > > > > >> We did extensive comparison in the past for Snowball, KStem and > Hunspell > > >> and there are cases where one of them works better but not other or > > >> vice-versa. 
You may utilise all three of them by having 3 different > > fields > > >> (fieldTypes) and during query, search in all of them. > > >> > > >> For some of the cases where none of them works (e.g wolves, wolf > etc)., > > use > > >> StemOverriderFactory. > > >> > > >> HTH. > > >> > > >> Thanks, > > >> Susheel > > >> > > >> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan > > > > >> wrote: > > >> > > >> > Hi, > > >> > > > >> > KStemFilter returns legitimate English words, please use it. > > >> > > > >> > Ahmet > > >> > > > >> > > > >> > > > >> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya < > > >> > watt...@gmail.com> wrote: > > >> > Hello devs, > > >> > > > >> > I'm trying to develop this indexing and querying flow where it > > converts > > >> the > > >> > words to its original form (lemmatization). I was doing bit of > > research > > >> > lately but the information on the internet is very limited. I tried > > using > > >> > hunspellfactory but it doesn't convert the word to it's original > form, > > >> > instead it gives suggestions for some words (hunspell works for some > > >> > english words correctly but for some it gives multiple suggestions > or > > no > > >> > suggestions, i used the en_us.dic provided by openoffice) > > >> > > > >> > I know this is a generic problem in searching, so is there anyone > who > > can > > >> > point me to correct direction or some information :) > > >> > > > >> > Best regards, > > >> > Lasitha Wattaladeniya > > >> > Software Engineer > > >> > > > >> > Mobile : +6593896893 > > >> > Blog : techreadme.blogspot.com > > >> > > > >> > > >
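The StemmerOverrideFactory approach Susheel suggests can be pictured as a dictionary lookup applied before the algorithmic stemmer: irregular forms are mapped explicitly, and only the rest fall through. A hypothetical sketch (the override list and the deliberately naive suffix stemmer are illustrative, not any real Solr filter):

```python
# Conceptual model of a stem-override filter: irregular nouns/verbs are
# resolved by a dictionary first; remaining words reach the (toy)
# algorithmic stemmer standing in for Snowball/KStem/Hunspell.
OVERRIDES = {"caught": "catch", "ran": "run", "wolves": "wolf", "mice": "mouse"}

def naive_stem(word):
    # Toy suffix stripper; real stemmers are far more sophisticated.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem(word):
    return OVERRIDES.get(word, naive_stem(word))

print([stem(w) for w in ["running", "ran", "wolves", "cats"]])
# ['runn', 'run', 'wolf', 'cat']
```

Note how "ran" and "wolves" are only handled by the override table, which is exactly why suffix-based stemmers alone fail on irregular forms.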
Re: Solr on HDFS: Streaming API performance tuning
Ok, based on the stack trace I suspect one of your sort fields has NULL values, which in the 5x branch could produce null pointers if a segment had no values for a sort field. This is also fixed in the Solr 6x branch. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshiwrote: > Here is the stack trace. > > java.lang.NullPointerException > > at > org.apache.solr.client.solrj.io.comp.FieldComparator$2. > compare(FieldComparator.java:85) > > at > org.apache.solr.client.solrj.io.comp.FieldComparator. > compare(FieldComparator.java:92) > > at > org.apache.solr.client.solrj.io.comp.FieldComparator. > compare(FieldComparator.java:30) > > at > org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45) > > at > org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream$ > TupleWrapper.compareTo(CloudSolrStream.java:396) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream$ > TupleWrapper.compareTo(CloudSolrStream.java:381) > > at java.util.TreeMap.put(TreeMap.java:560) > > at java.util.TreeSet.add(TreeSet.java:255) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream._ > read(CloudSolrStream.java:366) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream. > read(CloudSolrStream.java:353) > > at > > *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator. 
> scala:101) > > at java.lang.Thread.run(Thread.java:745) > > 16/11/17 13:04:31 *ERROR* SolrStreamResultIterator:missing exponent > number: > char=A,position=106596 > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA' > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj' > > org.noggit.JSONParser$ParseException: missing exponent number: > char=A,position=106596 > BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA' > AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj' > > at org.noggit.JSONParser.err(JSONParser.java:356) > > at org.noggit.JSONParser.readExp(JSONParser.java:513) > > at org.noggit.JSONParser.readNumber(JSONParser.java:419) > > at org.noggit.JSONParser.next(JSONParser.java:845) > > at org.noggit.JSONParser.nextEvent(JSONParser.java:951) > > at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127) > > at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57) > > at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37) > > at > org.apache.solr.client.solrj.io.stream.JSONTupleStream. > next(JSONTupleStream.java:84) > > at > org.apache.solr.client.solrj.io.stream.SolrStream.read( > SolrStream.java:147) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next( > CloudSolrStream.java:413) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream._ > read(CloudSolrStream.java:365) > > at > org.apache.solr.client.solrj.io.stream.CloudSolrStream. > read(CloudSolrStream.java:353) > > > Thanks! > > On Fri, Dec 16, 2016 at 11:45 PM, Reth RM wrote: > > > If you could provide the json parse exception stack trace, it might help > to > > predict issue there. > > > > > > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi > > wrote: > > > > > Hi Joel, > > > > > > The only NON alpha-numeric characters I have in my data are '+' and > '/'. > > I > > > don't have any backslashes. 
> > > > > > If the special characters was the issue, I should get the JSON parsing > > > exceptions every time irrespective of the index size and irrespective > of > > > the available memory on the machine. That is not the case here. The > > > streaming API successfully returns all the documents when the index > size > > is > > > small and fits in the available memory. That's the reason I am > confused. > > > > > > Thanks! > > > > > > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein > > > wrote: > > > > > > > The Streaming API may have been throwing exceptions because the JSON > > > > special characters were not escaped. This was fixed in Solr 6.0. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Joel Bernstein > > > > http://joelsolr.blogspot.com/ > > > > > > > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi < > chetas.jo...@gmail.com> > > > > wrote: > > > > > > > > > Hello, > > > > > > > > > > I am running Solr 5.5.0. > > > > > It is a solrCloud of 50 nodes and I have the following config for > all > > > the > > > > > collections. > > > > > maxShardsperNode: 1 > > > > > replicationFactor: 1 > > > > > > > > > > I was using Streaming API to get back results from Solr. It worked > > fine > > > > for > > > > > a while until the index data size reached beyond 40 GB per shard > > (i.e. > > > > per > > > > > node). It started throwing JSON parsing exceptions while reading
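The NullPointerException in the stack trace above comes from comparing a sort field for which a segment has no value. The class of fix Joel refers to (null-safe comparison, missing values ordered deterministically) can be sketched in a few lines — a Python illustration of the idea, not the Solr 6x code:

```python
def null_safe_key(value):
    """Sort key that places missing (None) values after all real values,
    instead of blowing up when two values are compared and one is None."""
    return (value is None, value if value is not None else 0)

docs = [{"id": "a", "ts": 3}, {"id": "b", "ts": None}, {"id": "c", "ts": 1}]
ordered = sorted(docs, key=lambda d: null_safe_key(d["ts"]))
print([d["id"] for d in ordered])  # ['c', 'a', 'b'] -- None sorts last
```

Until an upgrade is possible, ensuring every document has a value for each sort field (or sorting on a field with required/default values) avoids triggering the 5x bug.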
Re: Confusing debug=timing parameter
Thank you Furkan. I am still a little confused. So I will shorten the response and post only the relevant pieces for easier understanding. "responseHeader": { "status": 0, "QTime": 2978 } "response": { "numFound": 1565135270, }, "debug": { "timing": { "time": 19320, "prepare": { "time": 4, "query": { "time": 3 }, "process": { "time": 19315, "query": { "time": 19309 } } } As I understand, QTime is the total time spent by the core. "process", "prepare" etc. are all the parts that together make the part of query processing. And so their times should approximately add up to the QTime. Numbers wise, I would have expected prepare-time + process-time <= QTime Or 4 + 19315 <= 2978 This is obviously not correct. Where am I making a mistake? Any pointers would be greatly appreciated. Thanks SG On Sun, Dec 18, 2016 at 4:40 AM, Furkan KAMACIwrote: > Hi, > > Let me explain you *time* *parameters in Solr*: > > *Timing* parameter of debug returns information about how long the query > took to process. > > *Query time* shows information of how long did it take in Solr to get the > search > results. It doesn't include reading bits from disk, etc. > > Also, there is another parameter named as *elapsed time*. It shows time > frame of the query sent to Solr and response is returned. Includes query > time, reading bits from disk, constructing the response and transmissioning > it, etc. > > Kind Regards, > Furkan KAMACI > > On Sat, Dec 17, 2016 at 6:43 PM, S G wrote: > > > Hi, > > > > I am using Solr 4.10 and its response time for the clients is not very > > good. > > Even though the Solr's plugin/stats shows less than 200 milliseconds, > > clients report several seconds in response time. > > > > So I tried using debug-timing parameter from the Solr UI and this is > what I > > got. > > Note how the QTime is 2978 while the time in debug-timing is 19320. > > > > What does this mean? 
> > How can Solr return a result in 3 seconds when time taken between two > > points in the same path is 20 seconds ? > > > > { > > "responseHeader": { > > "status": 0, > > "QTime": 2978, > > "params": { > > "q": "*:*", > > "debug": "timing", > > "indent": "true", > > "wt": "json", > > "_": "1481992653008" > > } > > }, > > "response": { > > "numFound": 1565135270, > > "start": 0, > > "maxScore": 1, > > "docs": [ > > > > ] > > }, > > "debug": { > > "timing": { > > "time": 19320, > > "prepare": { > > "time": 4, > > "query": { > > "time": 3 > > }, > > "facet": { > > "time": 0 > > }, > > "mlt": { > > "time": 0 > > }, > > "highlight": { > > "time": 0 > > }, > > "stats": { > > "time": 0 > > }, > > "expand": { > > "time": 0 > > }, > > "debug": { > > "time": 0 > > } > > }, > > "process": { > > "time": 19315, > > "query": { > > "time": 19309 > > }, > > "facet": { > > "time": 0 > > }, > > "mlt": { > > "time": 1 > > }, > > "highlight": { > > "time": 0 > > }, > > "stats": { > > "time": 0 > > }, > > "expand": { > > "time": 0 > > }, > > "debug": { > > "time": 5 > > } > > } > > } > > } > > } > > >
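One way to see the inconsistency SG points out is to add up the per-component times from the debug section and compare them with QTime. A sketch over the numbers from the response above (the dict is a trimmed copy of that response):

```python
# Trimmed copy of the debug=timing response quoted in the thread.
response = {
    "responseHeader": {"QTime": 2978},
    "debug": {"timing": {
        "time": 19320,
        "prepare": {"time": 4, "query": {"time": 3}},
        "process": {"time": 19315, "query": {"time": 19309}},
    }},
}

timing = response["debug"]["timing"]
component_total = timing["prepare"]["time"] + timing["process"]["time"]
print(component_total)                       # 19319 -- matches timing["time"] (19320)
print(response["responseHeader"]["QTime"])   # 2978  -- far smaller
```

The prepare and process components do add up to the debug section's own total, so the debug numbers are internally consistent; the mismatch is between that total and QTime, which is what needs explaining (e.g. how each is measured on a distributed request).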
Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question
Wow, thanks. So assuming I have a five node ensemble and one machine is rolling along as leader, am I correct to assume that as a leader becomes taxed it can lose the election and another takes over as leader? The leader actually floats about the ensemble under load? I was thinking the leader was merely for referential integrity and things stayed that way until a physical failure. This would all seem important when building indexes. I think I need to set up a sniffer. Identifying the node with a hash id seems very cool. If my app makes the call to the server with the appropriate shard, then there might only be messaging on the Zookeeper network. Is this a correct assumption? Is my terminology cross threaded? Oh well, time to build my first cluster. I wrote all my clients with single shard collections on a stand alone. Now I need to make sure my app is not a cluster buster. I feel like I am on the right path. Thanks and Best, GW On 18 December 2016 at 09:53, Dorian Hoxhawrote: > On Sun, Dec 18, 2016 at 3:48 PM, GW wrote: > > > Yeah, > > > > > > I'll look at the proxy you suggested shortly. > > > > I've discovered that the idea of making a zookeeper aware app is > pointless > > when scripting REST calls right after I installed libzookeeper. > > > > Zookeeper is there to provide the zookeeping for Solr: End of story. Me > > thinks > > > > I believe what really has to happen is: connect to the admin API to get > > status > > > > /solr/admin/collections?action=CLUSTERSTATUS > > > > I think it is more sensible to make a cluster aware app. > > > > 1 > name="shards"> > name="range">8000-7fffactive > name="replicas"> > name="core">FrogMerchants_shard1_replica1 > > http://10.128.0.2:8983/solr > name="node_name">10.128.0.2:8983_solr > name="state">active > name="leader">true > > > > I can get an array of nodes that have a state of active. So if I have 7 > > nodes that are state = active, I will have those in an array. 
Then I can > > use rand() funtion with an array count to select a node/url to post a > json > > string. It would eliminate the need for a load balancer. I think. > > > If you send to random(node), there is high chance(increasing with number of > nodes/shards) that node won't have the leader, so that node will also > redirect it to the leader. What you can do, is compute the hash of the 'id' > field locally. with hash-id you will get shard-id (because each shard has > the hash-range), and with shard, you will find the leader, and you will > find on which node the leader is (cluster-status) and send the request > directly to the leader and be certain that it won't be redirected again > (less network hops). > > > > //pseudo code > > > > $array_count = $count($active_nodes) > > > > $url_target = rand(0, $array_count); > > > > // creat a function to pull the url somthing like > > > > > > $url = get_solr_url($url_target); > > > > I have test sever on my bench. I'll spin up a 5 node cluster today, get > my > > app cluster aware and then get into some Solr indexes with Vi and totally > > screw with some shards. > > > > If I am correct I will post again. > > > > Best, > > > > GW > > > > On 15 December 2016 at 12:34, Shawn Heisey wrote: > > > > > On 12/14/2016 7:36 AM, GW wrote: > > > > I understand accessing solr directly. I'm doing REST calls to a > single > > > > machine. > > > > > > > > If I have a cluster of five servers and say three Apache servers, I > can > > > > round robin the REST calls to all five in the cluster? > > > > > > > > I guess I'm going to find out. :-) If so I might be better off just > > > > running Apache on all my solr instances. > > > > > > If you're running SolrCloud (which uses zookeeper) then sending > multiple > > > query requests to any node will load balance the requests across all > > > replicas for the collection. This is an inherent feature of SolrCloud. > > > Indexing requests will be forwarded to the correct place. 
> > > > > > The node you're sending to is a potential single point of failure, > which > > > you can eliminate by putting a load balancer in front of Solr that > > > connects to at least two of the nodes. As I just mentioned, SolrCloud > > > will do further load balancing to all nodes which are capable of > serving > > > the requests. > > > > > > I use haproxy for a load balancer in front of Solr. I'm not running in > > > Cloud mode, but a load balancer would also work for Cloud, and is > > > required for high availability when your client only connects to one > > > server and isn't cloud aware. > > > > > > http://www.haproxy.org/ > > > > > > Solr includes a cloud-aware Java client that talks to zookeeper and > > > always knows the state of the cloud. This eliminates the requirement > > > for a load balancer, but using that client would require that you write > > > your website in Java. > > > > > > The PHP clients are third-party software, and
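GW's pseudo-code can be sketched more concretely: parse a CLUSTERSTATUS-style structure, keep only replicas whose state is active, and pick one. The structure below is a simplified stand-in for the real response (real CLUSTERSTATUS output nests collections → shards → replicas with more fields):

```python
import random

# Simplified stand-in for a CLUSTERSTATUS response.
cluster = {
    "shard1": {
        "replicas": {
            "core_node1": {"base_url": "http://10.128.0.2:8983/solr",
                           "state": "active", "leader": "true"},
            "core_node2": {"base_url": "http://10.128.0.3:8983/solr",
                           "state": "active"},
            "core_node3": {"base_url": "http://10.128.0.4:8983/solr",
                           "state": "down"},
        }
    }
}

def active_urls(shards):
    """Collect base URLs of all replicas currently marked active."""
    return [r["base_url"]
            for shard in shards.values()
            for r in shard["replicas"].values()
            if r["state"] == "active"]

urls = active_urls(cluster)
print(sorted(urls))
# ['http://10.128.0.2:8983/solr', 'http://10.128.0.3:8983/solr']
target = random.choice(urls)  # random pick, as in GW's PHP sketch
```

Note the fix to GW's `rand(0, $array_count)`: picking from the list directly avoids the off-by-one, and as Dorian points out, sending updates straight to the replica flagged `leader` saves a network hop.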
Re: Separating Search and Indexing in SolrCloud
Stop. On 16 Dec 2016 at 3:31 PM, "Jaroslaw Rozanski" < m...@jarekrozanski.com> wrote: > Hi all, > > According to documentation, in normal operation (not recovery) in Solr > Cloud configuration the leader sends updates it receives to all the > replicas. > > This means all nodes in the shard perform the same effort to index a > single document. Correct? > > Is there then a benefit to *not* send search requests to the leader, but > only to replicas? > > Given an index- & search-heavy Solr Cloud system, is it possible to separate > search from indexing nodes? > > > RE: Solr 5.5.0 > > -- > Jaroslaw Rozanski | e: m...@jarekrozanski.com > 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D > >
Re: Separating Search and Indexing in SolrCloud
Analyzed documents. The transaction log stores the raw input. On Sun, Dec 18, 2016 at 5:32 AM, Jaroslaw Rozanskiwrote: > Hi Erick, > > > Not talking about separation any more. I merely summarized message from > Pushkar. As I said it was clear that it was not possible. > > > About the RAMBufferSizeMB, getting back to my original question, is this > buffer for storing update requests or ready to index, analyzed documents? > > Documentation suggests former, your first mention however suggests the > later. > > > Thanks, > Jaroslaw > > > On 18/12/16 02:16, Erick Erickson wrote: >> Yes indexing is adding stress. No you can't separate >> the two in SolrCloud. End of story, why beat it to death? >> You'll have to figure out the sharding strategy that >> meets your indexing and querying needs and live >> within that framework. I'd advise setting up a small >> cluster and driving it to its tipping point and extrapolating >> from there. Here's the long version of "the sizing exercise". >> >> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ >> >> My point that while indexing to Solr/Lucene there is >> additional pressure. That pressure has a fixed upper >> limit that doesn't grow with the number of docs. That's not >> true for searching, as you add more docs per node, the >> pressure (especially memory) increases. Concentrate >> your efforts there IMO. >> >> Best >> Erick >> >> >> >> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski >> wrote: >>> Hi Erick, >>> >>> So what does this buffer represent? What does it actually store? Raw >>> update request or analyzed document? >>> >>> The documentation suggest that it stores actual update requests. >>> >>> Obviously analyzed document can and will occupy much more space than raw >>> one. Also analysis with create a lot of new allocations and subsequent >>> GC work. 
>>> >>> Yes, you are probably right that search puts more stress and is main >>> memory user but combination of: >>> - non-trivial analysis, >>> - high volume of updates and >>> - search on the same node >>> >>> seems adding fuel to the fire. >>> >>> From previous response by Pushkar, it is clear that separation is not >>> achievable with existing SolrCloud mechanism. >>> >>> Thanks >>> >>> >>> On 17/12/16 20:24, Erick Erickson wrote: bq: I am more concerned with indexing memory requirements at volume By and large this isn't much of a problem. RAMBufferSizeMB in solrconfig.xml governs how much memory is consumed in Solr for indexing. When that limit is exceeded, the buffer is flushed to disk. I've rarely heard of indexing being a memory issue. Anecdotally I haven't seen throughput benefit with buffer sizes over 128M. You're correct in that master/slave style replication would use less memory on the slave, although there are other costs. I.e. rather than the data for document X being sent to the replicas once as in SolrCloud, that data is re-sent to the slave every time it's merged into a new segment. That said, memory issues are _far_ more prevalent on the search side of things so unless this is a proven issue in your environment I would fight other fires. Best, Erick On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski wrote: > Thanks, that issue looks interesting! > > On 16/12/16 16:38, Pushkar Raste wrote: >> This kind of separation is not supported yet. There however some work >> going on, you can read about it on >> https://issues.apache.org/jira/browse/SOLR-9835 >> >> This unfortunately would not support soft commits and hence would not be >> a >> good solution for near real time indexing. >> >> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" >> wrote: >> >>> Sorry, not what I meant. >>> >>> Leader is responsible for distributing update requests to replica. So >>> eventually all replicas have same state as leader. Not a problem. 
>>> >>> It is more about the performance of such. If I gather correctly normal >>> replication happens by standard update request. Not by, say, segment >>> copy. >>> >>> Which means update on leader is as "expensive" as on replica. >>> >>> Hence, if my understanding is correct, sending search request to replica >>> only, in index heavy environment, would bring no benefit. >>> >>> So the question is: is there a mechanism, in SolrCloud (not legacy >>> master/slave set-up) to make one node take a load of indexing which >>> other nodes focus on searching. >>> >>> This is not a question of SolrClient cause that is clear how to direct >>> search request to specific nodes. This is more about index optimization >>> so that certain nodes (ie. replicas) could suffer less due
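Erick's point that RAMBufferSizeMB caps indexing memory — the buffer is flushed to disk whenever the limit is exceeded, so pressure does not grow with document count — can be sketched as a toy model (illustrative numbers, not Lucene internals):

```python
def index_docs(doc_sizes_mb, ram_buffer_mb):
    """Return the indices at which the buffer is flushed: adding a doc
    that pushes the buffer past ram_buffer_mb writes a segment to disk
    and empties the buffer, so memory use stays bounded."""
    buffered, flushes = 0.0, []
    for i, size in enumerate(doc_sizes_mb):
        buffered += size
        if buffered >= ram_buffer_mb:
            flushes.append(i)   # segment flushed to disk here
            buffered = 0.0
    return flushes

# 10 docs of 40 MB each against a 128 MB buffer -> flush every 4th doc
print(index_docs([40] * 10, 128))  # [3, 7]
```

Doubling the number of documents doubles the number of flushes, not the peak memory — which is why, as Erick says, search-side memory (which does grow with index size) is usually the bigger concern.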
Re: Soft commit and reading data just after the commit
1 ms autocommit is far too frequent. And it's not helping you anyway. There is some lag between when a commit happens and when the docs are really available. The sequence is: 1> commit (soft or hard-with-openSearcher=true doesn't matter). 2> a new searcher is opened and autowarming starts 3> until the new searcher is opened, queries continue to be served by the old searcher 4> the new searcher is fully opened 5> _new_ requests are served by the new searcher. 6> the last request is finished by the old searcher and it's closed. So what's probably happening is that you send docs and then send a query and Solr is still in step <3>. You can look at your admin UI plugins/stats page or your log to see how long it takes for a searcher to open and adjust your expectations accordingly. If you want to fetch only the document (not try to get it by a search), Real Time Get is designed to ensure that you always get the most recent copy whether it's searchable or not. All that said, Solr wasn't designed for autocommits that are that frequent. That's why the documentation talks about _Near_ Real Time. You may need to adjust your expectations. Best, Erick On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha wrote: > There's a very high probability that you're using the wrong tool for the > job if you need 1ms softCommit time. Especially when you always need it (e.g. > there are apps that need commit-after-insert only rarely). > > So explain what you're using it for? > > On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya > wrote: > >> Hi Furkan, >> >> Thanks for the links. I had read the first one but not the second one. I >> did read it after you sent. So in my current solrconfig.xml settings below >> are the configurations, >> >> <autoSoftCommit> >> <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime> >> </autoSoftCommit> >> >> <autoCommit> >> <maxTime>15000</maxTime> >> <openSearcher>false</openSearcher> >> </autoCommit> >> >> The problem i'm facing is, just after adding the documents to solr using >> solrj, when I retrieve data from solr I am not getting the updated results. >> This happens from time to time. 
Most of the time I get the correct data but in >> some occasions I get wrong results. so as you suggest, what the best >> practice to use here ? , should I wait 1 mili second before calling for >> updated results ? >> >> Regards, >> Lasitha >> >> Lasitha Wattaladeniya >> Software Engineer >> >> Mobile : +6593896893 >> Blog : techreadme.blogspot.com >> >> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI >> wrote: >> >> > Hi Lasitha, >> > >> > First of all, did you check these: >> > >> > https://cwiki.apache.org/confluence/display/solr/Near+ >> Real+Time+Searching >> > https://lucidworks.com/blog/2013/08/23/understanding- >> > transaction-logs-softcommit-and-commit-in-sorlcloud/ >> > >> > after that, if you cannot adjust your configuration you can give more >> > information and we can find a solution. >> > >> > Kind Regards, >> > Furkan KAMACI >> > >> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya < >> watt...@gmail.com> >> > wrote: >> > >> >> Hi furkan, >> >> >> >> Thanks for your reply, it is generally a query heavy system. We are >> using >> >> realtime indexing for editing the available data >> >> >> >> Regards, >> >> Lasitha >> >> >> >> Lasitha Wattaladeniya >> >> Software Engineer >> >> >> >> Mobile : +6593896893 <+65%209389%206893> >> >> Blog : techreadme.blogspot.com >> >> >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI >> >> wrote: >> >> >> >>> Hi Lasitha, >> >>> >> >>> What is your indexing / querying requirements. Do you have an index >> >>> heavy/light - query heavy/light system? >> >>> >> >>> Kind Regards, >> >>> Furkan KAMACI >> >>> >> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < >> >>> watt...@gmail.com> >> >>> wrote: >> >>> >> >>> > Hello devs, >> >>> > >> >>> > I'm here with another problem i'm facing. I'm trying to do a commit >> >>> (soft >> >>> > commit) through solrj and just after the commit, retrieve the data >> from >> >>> > solr (requirement is to get updated data list). 
>> >>> > >> >>> > I'm using soft commit instead of the hard commit, is previously I got >> >>> an >> >>> > error "Exceeded limit of maxWarmingSearchers=2, try again later" >> >>> because of >> >>> > too many commit requests. Now I have removed the explicit commit and >> >>> has >> >>> > let the solr to do the commit using autoSoftCommit *(1 mili second)* >> >>> and >> >>> > autoCommit *(30 seconds)* configurations. Now I'm not getting any >> >>> errors >> >>> > when i'm committing frequently. >> >>> > >> >>> > The problem i'm facing now is, I'm not getting the updated data when >> I >> >>> > fetch from solr just after the soft commit. So in this case what are >> >>> the >> >>> > best practices to use ? to wait 1 mili second before retrieving data >> >>> after >> >>> > soft commit ? I don't feel like waiting from client side is a good >> >>> option. >> >>> > Please give me some help from your expert knowledge
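Erick's searcher-opening sequence above means a client cannot assume a document is searchable immediately after a soft commit. A minimal client-side sketch of the pattern discussed in this thread, polling until the update is visible instead of sleeping a fixed 1 ms; `fetch_flag` is a hypothetical stand-in for whatever search call the application actually makes:

```python
import time

def wait_until_visible(fetch_flag, doc_id, expected, timeout=5.0, interval=0.1):
    """Poll a search-based fetch until the updated value is visible.

    fetch_flag(doc_id) is a placeholder for the application's search call;
    it returns the READ/UNREAD flag as currently seen by the searcher,
    which may lag a soft commit while a new searcher warms (step 3 above).
    Returns True once the expected value appears, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_flag(doc_id) == expected:
            return True
        time.sleep(interval)
    return False
```

For fetching a single known document, Real Time Get (as Erick suggests) avoids the wait entirely, since it does not depend on a searcher being reopened.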
Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question
On Sun, Dec 18, 2016 at 3:48 PM, GW wrote: > Yeah, > > > I'll look at the proxy you suggested shortly. > > I've discovered that the idea of making a zookeeper aware app is pointless > when scripting REST calls right after I installed libzookeeper. > > Zookeeper is there to provide the zookeeping for Solr: End of story. Me > thinks > > I believe what really has to happen is: connect to the admin API to get > status > > /solr/admin/collections?action=CLUSTERSTATUS > > I think it is more sensible to make a cluster aware app. > > <str name="range">8000-7fff</str> <str name="state">active</str> ... <str name="core">FrogMerchants_shard1_replica1</str> > <str name="base_url">http://10.128.0.2:8983/solr</str> <str name="node_name">10.128.0.2:8983_solr</str> <str name="state">active</str> <str name="leader">true</str> > > I can get an array of nodes that have a state of active. So if I have 7 > nodes that are state = active, I will have those in an array. Then I can > use the rand() function with an array count to select a node/url to post a json > string. It would eliminate the need for a load balancer. I think. > If you send to a random node, there is a high chance (increasing with the number of nodes/shards) that the node won't host the leader, so that node will also redirect the request to the leader. What you can do is compute the hash of the 'id' field locally. With the hash of the id you get the shard-id (because each shard has a hash-range), with the shard you find the leader, and from the cluster status you find on which node the leader is, then send the request directly to the leader and be certain that it won't be redirected again (fewer network hops). > //pseudo code > > $array_count = count($active_nodes); > > $url_target = rand(0, $array_count - 1); > > // create a function to pull the url, something like > > $url = get_solr_url($url_target); > > I have a test server on my bench. I'll spin up a 5 node cluster today, get my > app cluster aware and then get into some Solr indexes with Vi and totally > screw with some shards. > > If I am correct I will post again. 
> > Best, > > GW > > On 15 December 2016 at 12:34, Shawn Heisey wrote: > > > On 12/14/2016 7:36 AM, GW wrote: > > > I understand accessing solr directly. I'm doing REST calls to a single > > > machine. > > > > > > If I have a cluster of five servers and say three Apache servers, I can > > > round robin the REST calls to all five in the cluster? > > > > > > I guess I'm going to find out. :-) If so I might be better off just > > > running Apache on all my solr instances. > > > > If you're running SolrCloud (which uses zookeeper) then sending multiple > > query requests to any node will load balance the requests across all > > replicas for the collection. This is an inherent feature of SolrCloud. > > Indexing requests will be forwarded to the correct place. > > > > The node you're sending to is a potential single point of failure, which > > you can eliminate by putting a load balancer in front of Solr that > > connects to at least two of the nodes. As I just mentioned, SolrCloud > > will do further load balancing to all nodes which are capable of serving > > the requests. > > > > I use haproxy for a load balancer in front of Solr. I'm not running in > > Cloud mode, but a load balancer would also work for Cloud, and is > > required for high availability when your client only connects to one > > server and isn't cloud aware. > > > > http://www.haproxy.org/ > > > > Solr includes a cloud-aware Java client that talks to zookeeper and > > always knows the state of the cloud. This eliminates the requirement > > for a load balancer, but using that client would require that you write > > your website in Java. > > > > The PHP clients are third-party software, and as far as I know, are not > > cloud-aware. > > > > https://wiki.apache.org/solr/IntegratingSolr#PHP > > > > Some advantages of using a Solr client over creating HTTP requests > > yourself: The code is easier to write, and to read. 
You generally do > > not need to worry about making sure that your requests are properly > > escaped for URLs, XML, JSON, etc. The response to the requests is > > usually translated into data structures appropriate to the language -- > > your program probably doesn't need to know how to parse XML or JSON. > > > > Thanks, > > Shawn > > > > >
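Dorian's suggestion earlier in this thread (hash the id locally, map the hash to a shard's range, then send straight to that shard's leader) can be sketched as follows. The shard map and `hash_fn` are hypothetical stand-ins: Solr's real compositeId router uses a MurmurHash3 variant over signed 32-bit ranges that may wrap, while this sketch uses unsigned ranges for simplicity:

```python
def pick_leader_url(doc_id, shards, hash_fn):
    """Return the leader URL of the shard whose hash range covers doc_id.

    shards maps shard name -> {"range": (lo, hi), "leader": url}; ranges
    here are unsigned 32-bit for simplicity. hash_fn is any 32-bit hash
    of the id string (a stand-in for the router's actual hash).
    """
    h = hash_fn(doc_id) & 0xFFFFFFFF  # clamp to unsigned 32-bit
    for name, info in shards.items():
        lo, hi = info["range"]
        if lo <= h <= hi:
            return info["leader"]
    raise ValueError("no shard covers hash %#010x" % h)
```

The shard ranges and leader locations come from the CLUSTERSTATUS response GW is already parsing, so the only extra work is the local hash.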
Re: Soft commit and reading data just after the commit
There's a very high probability that you're using the wrong tool for the job if you need 1ms softCommit time. Especially when you always need it (e.g. there are apps that need commit-after-insert only rarely). So explain what you're using it for? On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya wrote: > Hi Furkan, > > Thanks for the links. I had read the first one but not the second one. I > did read it after you sent. So in my current solrconfig.xml settings below > are the configurations, > > <autoSoftCommit> > <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime> > </autoSoftCommit> > > <autoCommit> > <maxTime>15000</maxTime> > <openSearcher>false</openSearcher> > </autoCommit> > > The problem i'm facing is, just after adding the documents to solr using > solrj, when I retrieve data from solr I am not getting the updated results. > This happens from time to time. Most of the time I get the correct data but in > some occasions I get wrong results. so as you suggest, what is the best > practice to use here? Should I wait 1 millisecond before calling for > updated results? > > Regards, > Lasitha > > Lasitha Wattaladeniya > Software Engineer > > Mobile : +6593896893 > Blog : techreadme.blogspot.com > > On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI > wrote: > > > Hi Lasitha, > > > > First of all, did you check these: > > > > https://cwiki.apache.org/confluence/display/solr/Near+ > Real+Time+Searching > > https://lucidworks.com/blog/2013/08/23/understanding- > > transaction-logs-softcommit-and-commit-in-sorlcloud/ > > > > after that, if you cannot adjust your configuration you can give more > > information and we can find a solution. > > > > Kind Regards, > > Furkan KAMACI > > > > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya < > watt...@gmail.com> > > wrote: > > > >> Hi furkan, > >> > >> Thanks for your reply, it is generally a query heavy system. 
We are > using > >> realtime indexing for editing the available data > >> > >> Regards, > >> Lasitha > >> > >> Lasitha Wattaladeniya > >> Software Engineer > >> > >> Mobile : +6593896893 <+65%209389%206893> > >> Blog : techreadme.blogspot.com > >> > >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI > >> wrote: > >> > >>> Hi Lasitha, > >>> > >>> What is your indexing / querying requirements. Do you have an index > >>> heavy/light - query heavy/light system? > >>> > >>> Kind Regards, > >>> Furkan KAMACI > >>> > >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < > >>> watt...@gmail.com> > >>> wrote: > >>> > >>> > Hello devs, > >>> > > >>> > I'm here with another problem i'm facing. I'm trying to do a commit > >>> (soft > >>> > commit) through solrj and just after the commit, retrieve the data > from > >>> > solr (requirement is to get updated data list). > >>> > > >>> > I'm using soft commit instead of the hard commit, is previously I got > >>> an > >>> > error "Exceeded limit of maxWarmingSearchers=2, try again later" > >>> because of > >>> > too many commit requests. Now I have removed the explicit commit and > >>> has > >>> > let the solr to do the commit using autoSoftCommit *(1 mili second)* > >>> and > >>> > autoCommit *(30 seconds)* configurations. Now I'm not getting any > >>> errors > >>> > when i'm committing frequently. > >>> > > >>> > The problem i'm facing now is, I'm not getting the updated data when > I > >>> > fetch from solr just after the soft commit. So in this case what are > >>> the > >>> > best practices to use ? to wait 1 mili second before retrieving data > >>> after > >>> > soft commit ? I don't feel like waiting from client side is a good > >>> option. > >>> > Please give me some help from your expert knowledge > >>> > > >>> > Best regards, > >>> > Lasitha Wattaladeniya > >>> > Software Engineer > >>> > > >>> > Mobile : +6593896893 > >>> > Blog : techreadme.blogspot.com > >>> > > >>> > >> > >> > > >
Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question
Yeah, I'll look at the proxy you suggested shortly. I've discovered that the idea of making a zookeeper aware app is pointless when scripting REST calls right after I installed libzookeeper. Zookeeper is there to provide the zookeeping for Solr: End of story. Me thinks I believe what really has to happen is: connect to the admin API to get status /solr/admin/collections?action=CLUSTERSTATUS I think it is more sensible to make a cluster aware app. <str name="range">8000-7fff</str> <str name="state">active</str> ... <str name="core">FrogMerchants_shard1_replica1</str> <str name="base_url">http://10.128.0.2:8983/solr</str> <str name="node_name">10.128.0.2:8983_solr</str> <str name="state">active</str> <str name="leader">true</str> I can get an array of nodes that have a state of active. So if I have 7 nodes that are state = active, I will have those in an array. Then I can use the rand() function with an array count to select a node/url to post a json string. It would eliminate the need for a load balancer. I think. //pseudo code $array_count = count($active_nodes); $url_target = rand(0, $array_count - 1); // create a function to pull the url, something like $url = get_solr_url($url_target); I have a test server on my bench. I'll spin up a 5 node cluster today, get my app cluster aware and then get into some Solr indexes with Vi and totally screw with some shards. If I am correct I will post again. Best, GW On 15 December 2016 at 12:34, Shawn Heisey wrote: > On 12/14/2016 7:36 AM, GW wrote: > > I understand accessing solr directly. I'm doing REST calls to a single > > machine. > > > > If I have a cluster of five servers and say three Apache servers, I can > > round robin the REST calls to all five in the cluster? > > > > I guess I'm going to find out. :-) If so I might be better off just > > running Apache on all my solr instances. > > If you're running SolrCloud (which uses zookeeper) then sending multiple > query requests to any node will load balance the requests across all > replicas for the collection. This is an inherent feature of SolrCloud. > Indexing requests will be forwarded to the correct place. 
> > The node you're sending to is a potential single point of failure, which > you can eliminate by putting a load balancer in front of Solr that > connects to at least two of the nodes. As I just mentioned, SolrCloud > will do further load balancing to all nodes which are capable of serving > the requests. > > I use haproxy for a load balancer in front of Solr. I'm not running in > Cloud mode, but a load balancer would also work for Cloud, and is > required for high availability when your client only connects to one > server and isn't cloud aware. > > http://www.haproxy.org/ > > Solr includes a cloud-aware Java client that talks to zookeeper and > always knows the state of the cloud. This eliminates the requirement > for a load balancer, but using that client would require that you write > your website in Java. > > The PHP clients are third-party software, and as far as I know, are not > cloud-aware. > > https://wiki.apache.org/solr/IntegratingSolr#PHP > > Some advantages of using a Solr client over creating HTTP requests > yourself: The code is easier to write, and to read. You generally do > not need to worry about making sure that your requests are properly > escaped for URLs, XML, JSON, etc. The response to the requests is > usually translated into data structures appropriate to the language -- > your program probably doesn't need to know how to parse XML or JSON. > > Thanks, > Shawn > >
Re: Soft commit and reading data just after the commit
Hi Furkan, Thanks for the links. I had read the first one but not the second one. I did read it after you sent. So in my current solrconfig.xml settings below are the configurations, <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime> </autoSoftCommit> <autoCommit> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit> The problem i'm facing is, just after adding the documents to solr using solrj, when I retrieve data from solr I am not getting the updated results. This happens from time to time. Most of the time I get the correct data but in some occasions I get wrong results. so as you suggest, what is the best practice to use here? Should I wait 1 millisecond before calling for updated results? Regards, Lasitha Lasitha Wattaladeniya Software Engineer Mobile : +6593896893 Blog : techreadme.blogspot.com On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI wrote: > Hi Lasitha, > > First of all, did you check these: > > https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching > https://lucidworks.com/blog/2013/08/23/understanding- > transaction-logs-softcommit-and-commit-in-sorlcloud/ > > after that, if you cannot adjust your configuration you can give more > information and we can find a solution. > > Kind Regards, > Furkan KAMACI > > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya > wrote: > >> Hi furkan, >> >> Thanks for your reply, it is generally a query heavy system. We are using >> realtime indexing for editing the available data >> >> Regards, >> Lasitha >> >> Lasitha Wattaladeniya >> Software Engineer >> >> Mobile : +6593896893 <+65%209389%206893> >> Blog : techreadme.blogspot.com >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI >> wrote: >> >>> Hi Lasitha, >>> >>> What are your indexing / querying requirements? Do you have an index >>> heavy/light - query heavy/light system? >>> >>> Kind Regards, >>> Furkan KAMACI >>> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < >>> watt...@gmail.com> >>> wrote: >>> >>> > Hello devs, >>> > >>> > I'm here with another problem i'm facing. 
I'm trying to do a commit >>> (soft >>> > commit) through solrj and just after the commit, retrieve the data from >>> > solr (requirement is to get updated data list). >>> > >>> > I'm using soft commit instead of the hard commit, is previously I got >>> an >>> > error "Exceeded limit of maxWarmingSearchers=2, try again later" >>> because of >>> > too many commit requests. Now I have removed the explicit commit and >>> has >>> > let the solr to do the commit using autoSoftCommit *(1 mili second)* >>> and >>> > autoCommit *(30 seconds)* configurations. Now I'm not getting any >>> errors >>> > when i'm committing frequently. >>> > >>> > The problem i'm facing now is, I'm not getting the updated data when I >>> > fetch from solr just after the soft commit. So in this case what are >>> the >>> > best practices to use ? to wait 1 mili second before retrieving data >>> after >>> > soft commit ? I don't feel like waiting from client side is a good >>> option. >>> > Please give me some help from your expert knowledge >>> > >>> > Best regards, >>> > Lasitha Wattaladeniya >>> > Software Engineer >>> > >>> > Mobile : +6593896893 >>> > Blog : techreadme.blogspot.com >>> > >>> >> >> >
Re: Separating Search and Indexing in SolrCloud
Hi Erick, Not talking about separation any more. I merely summarized the message from Pushkar. As I said, it was clear that it was not possible. About the RAMBufferSizeMB, getting back to my original question: is this buffer for storing update requests, or ready-to-index, analyzed documents? The documentation suggests the former; your first mention, however, suggests the latter. Thanks, Jaroslaw On 18/12/16 02:16, Erick Erickson wrote: > Yes indexing is adding stress. No you can't separate > the two in SolrCloud. End of story, why beat it to death? > You'll have to figure out the sharding strategy that > meets your indexing and querying needs and live > within that framework. I'd advise setting up a small > cluster and driving it to its tipping point and extrapolating > from there. Here's the long version of "the sizing exercise". > > https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ > > My point is that while indexing to Solr/Lucene there is > additional pressure. That pressure has a fixed upper > limit that doesn't grow with the number of docs. That's not > true for searching: as you add more docs per node, the > pressure (especially memory) increases. Concentrate > your efforts there IMO. > > Best > Erick > > > > On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski > wrote: >> Hi Erick, >> >> So what does this buffer represent? What does it actually store? Raw >> update request or analyzed document? >> >> The documentation suggests that it stores actual update requests. >> >> Obviously an analyzed document can and will occupy much more space than a raw >> one. Also analysis will create a lot of new allocations and subsequent >> GC work. >> >> Yes, you are probably right that search puts more stress and is the main >> memory user but the combination of: >> - non-trivial analysis, >> - high volume of updates and >> - search on the same node >> >> seems to add fuel to the fire. 
>> >> From previous response by Pushkar, it is clear that separation is not >> achievable with existing SolrCloud mechanism. >> >> Thanks >> >> >> On 17/12/16 20:24, Erick Erickson wrote: >>> bq: I am more concerned with indexing memory requirements at volume >>> >>> By and large this isn't much of a problem. RAMBufferSizeMB in >>> solrconfig.xml governs how much memory is consumed in Solr for >>> indexing. When that limit is exceeded, the buffer is flushed to disk. >>> I've rarely heard of indexing being a memory issue. Anecdotally I >>> haven't seen throughput benefit with buffer sizes over 128M. >>> >>> You're correct in that master/slave style replication would use less >>> memory on the slave, although there are other costs. I.e. rather than >>> the data for document X being sent to the replicas once as in >>> SolrCloud, that data is re-sent to the slave every time it's merged >>> into a new segment. >>> >>> That said, memory issues are _far_ more prevalent on the search side >>> of things so unless this is a proven issue in your environment I would >>> fight other fires. >>> >>> Best, >>> Erick >>> >>> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski >>> wrote: Thanks, that issue looks interesting! On 16/12/16 16:38, Pushkar Raste wrote: > This kind of separation is not supported yet. There however some work > going on, you can read about it on > https://issues.apache.org/jira/browse/SOLR-9835 > > This unfortunately would not support soft commits and hence would not be a > good solution for near real time indexing. > > On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" > wrote: > >> Sorry, not what I meant. >> >> Leader is responsible for distributing update requests to replica. So >> eventually all replicas have same state as leader. Not a problem. >> >> It is more about the performance of such. If I gather correctly normal >> replication happens by standard update request. Not by, say, segment >> copy. 
>> >> Which means update on leader is as "expensive" as on replica. >> >> Hence, if my understanding is correct, sending search request to replica >> only, in index heavy environment, would bring no benefit. >> >> So the question is: is there a mechanism, in SolrCloud (not legacy >> master/slave set-up) to make one node take a load of indexing which >> other nodes focus on searching. >> >> This is not a question of SolrClient cause that is clear how to direct >> search request to specific nodes. This is more about index optimization >> so that certain nodes (ie. replicas) could suffer less due to high >> volume indexing while serving search requests. >> >> >> >> >> On 16/12/16 12:35, Dorian Hoxha wrote: >>> The leader is the source of truth. You expect to make the replica the >>> source of truth or something???Doesn't make sense? >>>
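For reference, the indexing buffer Erick describes in this thread is set in the `<indexConfig>` section of solrconfig.xml; a sketch of a typical setting (100 MB is the default, and per Erick's observation values much above ~128 MB rarely improve throughput):

```xml
<indexConfig>
  <!-- In-memory indexing buffer; flushed to disk when exceeded -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>
```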
Re: Soft commit and reading data just after the commit
Hi Lasitha, First of all, did you check these: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ after that, if you cannot adjust your configuration you can give more information and we can find a solution. Kind Regards, Furkan KAMACI On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya wrote: > Hi furkan, > > Thanks for your reply, it is generally a query heavy system. We are using > realtime indexing for editing the available data > > Regards, > Lasitha > > Lasitha Wattaladeniya > Software Engineer > > Mobile : +6593896893 <+65%209389%206893> > Blog : techreadme.blogspot.com > > On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI > wrote: > >> Hi Lasitha, >> >> What are your indexing / querying requirements? Do you have an index >> heavy/light - query heavy/light system? >> >> Kind Regards, >> Furkan KAMACI >> >> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < >> watt...@gmail.com> >> wrote: >> >> > Hello devs, >> > >> > I'm here with another problem i'm facing. I'm trying to do a commit >> (soft >> > commit) through solrj and just after the commit, retrieve the data from >> > solr (requirement is to get updated data list). >> > >> > I'm using soft commit instead of the hard commit, as previously I got an >> > error "Exceeded limit of maxWarmingSearchers=2, try again later" >> because of >> > too many commit requests. Now I have removed the explicit commit and have >> > let Solr do the commit using the autoSoftCommit *(1 millisecond)* and >> > autoCommit *(30 seconds)* configurations. Now I'm not getting any errors >> > when i'm committing frequently. >> > >> > The problem i'm facing now is, I'm not getting the updated data when I >> > fetch from solr just after the soft commit. So in this case what are >> the >> > best practices to use? To wait 1 millisecond before retrieving data >> after >> > soft commit? 
I don't feel like waiting from client side is a good >> option. >> > Please give me some help from your expert knowledge >> > >> > Best regards, >> > Lasitha Wattaladeniya >> > Software Engineer >> > >> > Mobile : +6593896893 >> > Blog : techreadme.blogspot.com >> > >> > >
Re: Confusing debug=timing parameter
Hi, Let me explain the *time* *parameters in Solr*: The *timing* parameter of debug returns information about how long the query took to process. *Query time* (QTime) shows how long it took Solr to get the search results. It doesn't include reading bits from disk, etc. There is also another measure, *elapsed time*: the window from when the query is sent to Solr until the response is returned. It includes query time, reading bits from disk, constructing the response and transmitting it, etc. Kind Regards, Furkan KAMACI On Sat, Dec 17, 2016 at 6:43 PM, S G wrote: > Hi, > > I am using Solr 4.10 and its response time for the clients is not very > good. > Even though the Solr's plugin/stats shows less than 200 milliseconds, > clients report several seconds in response time. > > So I tried using the debug-timing parameter from the Solr UI and this is what I > got. > Note how the QTime is 2978 while the time in debug-timing is 19320. > > What does this mean? > How can Solr return a result in 3 seconds when the time taken between two > points in the same path is 20 seconds? > > { > "responseHeader": { > "status": 0, > "QTime": 2978, > "params": { > "q": "*:*", > "debug": "timing", > "indent": "true", > "wt": "json", > "_": "1481992653008" > } > }, > "response": { > "numFound": 1565135270, > "start": 0, > "maxScore": 1, > "docs": [ > > ] > }, > "debug": { > "timing": { > "time": 19320, > "prepare": { > "time": 4, > "query": { > "time": 3 > }, > "facet": { > "time": 0 > }, > "mlt": { > "time": 0 > }, > "highlight": { > "time": 0 > }, > "stats": { > "time": 0 > }, > "expand": { > "time": 0 > }, > "debug": { > "time": 0 > } > }, > "process": { > "time": 19315, > "query": { > "time": 19309 > }, > "facet": { > "time": 0 > }, > "mlt": { > "time": 1 > }, > "highlight": { > "time": 0 > }, > "stats": { > "time": 0 > }, > "expand": { > "time": 0 > }, > "debug": { > "time": 5 > } > } > } > } > } >
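The QTime-vs-elapsed distinction Furkan describes can be measured from the client side: compare the QTime Solr reports in the response header with the wall-clock time the client observes. This is a sketch only of that comparison (it does not explain the separate QTime-vs-debug-timing discrepancy in the question); `send_query` is a hypothetical stand-in for the HTTP call to the /select handler:

```python
import time

def query_with_overhead(send_query):
    """Compare server-reported QTime with client-observed elapsed time.

    send_query() stands in for an HTTP call to Solr returning the parsed
    JSON response. QTime (ms) covers only Solr's query processing; the
    client-side elapsed time additionally includes network transfer and
    response serialization, so elapsed - QTime is the overhead outside Solr.
    """
    t0 = time.monotonic()
    resp = send_query()
    elapsed_ms = (time.monotonic() - t0) * 1000.0
    qtime_ms = resp["responseHeader"]["QTime"]
    return qtime_ms, elapsed_ms, elapsed_ms - qtime_ms
```

A large gap between the two numbers points at network, serialization, or client-side parsing rather than the query itself.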
Re: Soft commit and reading data just after the commit
Hi furkan, Thanks for your reply, it is generally a query heavy system. We are using realtime indexing for editing the available data Regards, Lasitha Lasitha Wattaladeniya Software Engineer Mobile : +6593896893 Blog : techreadme.blogspot.com On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI wrote: > Hi Lasitha, > > What are your indexing / querying requirements? Do you have an index > heavy/light - query heavy/light system? > > Kind Regards, > Furkan KAMACI > > On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya > > wrote: > > > Hello devs, > > > > I'm here with another problem i'm facing. I'm trying to do a commit (soft > > commit) through solrj and just after the commit, retrieve the data from > > solr (requirement is to get updated data list). > > > > I'm using soft commit instead of the hard commit, as previously I got an > > error "Exceeded limit of maxWarmingSearchers=2, try again later" because > of > > too many commit requests. Now I have removed the explicit commit and have > > let Solr do the commit using the autoSoftCommit *(1 millisecond)* and > > autoCommit *(30 seconds)* configurations. Now I'm not getting any errors > > when i'm committing frequently. > > > > The problem i'm facing now is, I'm not getting the updated data when I > > fetch from solr just after the soft commit. So in this case what are the > > best practices to use? To wait 1 millisecond before retrieving data > after > > soft commit? I don't feel like waiting from client side is a good > option. > > Please give me some help from your expert knowledge > > > > Best regards, > > Lasitha Wattaladeniya > > Software Engineer > > > > Mobile : +6593896893 > > Blog : techreadme.blogspot.com > > >
Re: Soft commit and reading data just after the commit
Hi Lasitha, What are your indexing / querying requirements? Do you have an index heavy/light - query heavy/light system? Kind Regards, Furkan KAMACI On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya wrote: > Hello devs, > > I'm here with another problem i'm facing. I'm trying to do a commit (soft > commit) through solrj and just after the commit, retrieve the data from > solr (requirement is to get updated data list). > > I'm using soft commit instead of the hard commit, as previously I got an > error "Exceeded limit of maxWarmingSearchers=2, try again later" because of > too many commit requests. Now I have removed the explicit commit and have > let Solr do the commit using the autoSoftCommit *(1 millisecond)* and > autoCommit *(30 seconds)* configurations. Now I'm not getting any errors > when i'm committing frequently. > > The problem i'm facing now is, I'm not getting the updated data when I > fetch from solr just after the soft commit. So in this case what are the > best practices to use? To wait 1 millisecond before retrieving data after > soft commit? I don't feel like waiting from client side is a good option. > Please give me some help from your expert knowledge > > Best regards, > Lasitha Wattaladeniya > Software Engineer > > Mobile : +6593896893 > Blog : techreadme.blogspot.com >
Soft commit and reading data just after the commit
Hello devs, I'm here with another problem i'm facing. I'm trying to do a commit (soft commit) through solrj and just after the commit, retrieve the data from solr (requirement is to get updated data list). I'm using soft commit instead of the hard commit, as previously I got an error "Exceeded limit of maxWarmingSearchers=2, try again later" because of too many commit requests. Now I have removed the explicit commit and have let Solr do the commit using the autoSoftCommit *(1 millisecond)* and autoCommit *(30 seconds)* configurations. Now I'm not getting any errors when i'm committing frequently. The problem i'm facing now is, I'm not getting the updated data when I fetch from solr just after the soft commit. So in this case what are the best practices to use? To wait 1 millisecond before retrieving data after soft commit? I don't feel like waiting from client side is a good option. Please give me some help from your expert knowledge Best regards, Lasitha Wattaladeniya Software Engineer Mobile : +6593896893 Blog : techreadme.blogspot.com
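As discussed elsewhere in this thread, one mitigation is simply a longer soft-commit interval, so that new searchers are not reopened continuously. A sketch of the adjusted solrconfig.xml settings; the 100 ms value is illustrative only (it is the figure Lasitha floats in a follow-up), and the right value depends on how long searcher warming actually takes:

```xml
<autoSoftCommit>
  <!-- illustrative: 100 ms instead of 1 ms; updates become searchable
       within roughly this interval plus searcher warm-up time -->
  <maxTime>${solr.autoSoftCommit.maxTime:100}</maxTime>
</autoSoftCommit>
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```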