Re: Highlighting the search keywords
Nope, that is how it works. It is not in place.

> On 31 Jul 2018, at 21:57, Renuka Srishti wrote:
>
> Hi All,
>
> I was using highlighting in solr, solr gives highlighting results within
> the response but not included within the documents.
> Am i missing something? Can i configure so that it can show highlighted
> keywords matched within the documents.
>
> Thanks
> Renuka Srishti
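The reply above is correct: Solr returns highlighting in a separate `highlighting` section keyed by document id, and the client merges it back into the documents itself. A minimal sketch of that merge (the `id` and `title` field names are invented for illustration):

```python
# Merge Solr's separate "highlighting" section back into each document.
# The "sample" dict is a hand-made stand-in for a real Solr JSON
# response; the field names ("id", "title") are illustrative only.
def merge_highlights(response, hl_field="title"):
    snippets = response.get("highlighting", {})
    docs = response["response"]["docs"]
    for doc in docs:
        doc_hl = snippets.get(doc["id"], {})
        if hl_field in doc_hl:
            # Replace the stored value with the highlighted snippet(s).
            doc[hl_field] = " ... ".join(doc_hl[hl_field])
    return docs

sample = {
    "response": {"docs": [{"id": "1", "title": "Solr in Action"}]},
    "highlighting": {"1": {"title": ["<em>Solr</em> in Action"]}},
}
merged = merge_highlights(sample)
print(merged[0]["title"])  # <em>Solr</em> in Action
```

Documents with no entry in the `highlighting` section (no match in that field) are simply left unchanged.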
Re: Cannot train 2 or more features for Solr LTR using LIBLINEAR
Hi,

Does anyone have any information on this?

Regards,
Edwin

On Mon, 30 Jul 2018 at 11:15, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> I am using the Solr LTR in Solr 7.4.0, and I am trying to train an example
> learning model using LIBLINEAR.
>
> When I tried to run the code from train_and_upload_demo_model.py, I can
> only train one feature at a time. If I put more than one feature, then I
> will get the following error:
>
> Traceback (most recent call last):
>   File "train_and_upload_demo_model.py", line 182, in <module>
>     sys.exit(main())
>   File "train_and_upload_demo_model.py", line 169, in main
>     formatter.processQueryDocFeatureVector(fvGenerator,config["trainingFile"]);
>   File "/cygdrive/c/Users/edwin/Desktop/solr-7.4.0/contrib/ltr/myModel/libsvm_formatter.py", line 25, in processQueryDocFeatureVector
>     curListOfFv.append((relevance,self._makeFeaturesMap(featureVector)))
>   File "/cygdrive/c/Users/edwin/Desktop/solr-7.4.0/contrib/ltr/myModel/libsvm_formatter.py", line 35, in _makeFeaturesMap
>     featName,featValue = keyValuePairStr.split(":");
>
> ValueError: too many values to unpack
>
> Is there any way that we can train 2 or more features at the same time?
>
> Regards,
> Edwin
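For reference, the failing line is `featName,featValue = keyValuePairStr.split(":")`. One plausible cause (an assumption, not a confirmed diagnosis of the demo script) is a pair string containing more than one `:`, which makes the two-variable unpack fail; limiting the split to the first separator avoids the error:

```python
# Defensive rewrite of the failing line from libsvm_formatter.py.
# maxsplit=1 keeps any extra ":" inside the value part, so a string
# like "a:b:c" no longer raises "ValueError: too many values to unpack".
def parse_feature_pair(key_value_pair):
    feat_name, feat_value = key_value_pair.split(":", 1)
    return feat_name, feat_value

print(parse_feature_pair("origScore:1.5"))  # ('origScore', '1.5')
print(parse_feature_pair("a:b:c"))          # ('a', 'b:c')
```

Whether this is the actual cause depends on what the feature-vector strings look like in that setup; printing `keyValuePairStr` just before the split would confirm it.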
Re: SolrCloud: Different replicationFactor for different shards in same collection
This feels like more work than necessary, especially the bit: "which will
require modification in Solr code".

If your needs are to co-locate various groups of documents on specific
nodes, composite id (the default) routing has the ability to cluster docs
together; see the "document routing" section of
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html
You can also route queries to those shards only, see:
https://lucidworks.com/2013/06/13/solr-cloud-document-routing/

If that isn't sufficient, using "implicit" routing allows you to send
documents to specific shards. True, in both cases the _client_ has to
assign the doc to a particular shard based on whatever criteria you need,
but that seems like less work than changing Solr code.

Best,
Erick

On Tue, Jul 31, 2018 at 5:20 PM, Nawab Zada Asad Iqbal wrote:
> Thanks Erick
>
> This is for future. I am exploring to use a custom sharding scheme (which
> will require modification in Solr code) together with the benefits of
> SolrCloud.
>
> Thanks
> Nawab
>
> On Tue, Jul 31, 2018 at 4:51 PM, Erick Erickson wrote:
>> Sure, just use the Collections API ADDREPLICA command to add as many
>> replicas for specific shards as you want. There's no way to specify
>> that at creation time though.
>>
>> Some of the new autoscaling can do this automatically I believe.
>>
>> I have to ask what it is about your collection that this is true. If
>> you're using the default composite id routing, having one shard get
>> substantially more queries than the others is unexpected.
>>
>> If you're using implicit routing then it's perfectly understandable.
>>
>> Best,
>> Erick
>>
>> On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal wrote:
>> > Hi,
>> >
>> > I am looking at Solr 7.x and couldn't find an answer in the documentation.
>> > Is it possible to specify a different replicationFactor for different shards
>> > in the same collection? E.g. if a certain shard is receiving more queries
>> > than the rest of the collection, I would like to add more replicas for it
>> > to help with the query load.
>> >
>> > Thanks
>> > Nawab
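The composite-id routing Erick describes needs no server-side code: the client prefixes document ids with a shard key, and restricts queries with the `_route_` parameter. A hedged sketch (the tenant key and field values are invented):

```python
# Composite-id routing: documents sharing the prefix before "!" hash to
# the same shard, so the client controls co-location without changing
# Solr code. The "tenantA" shard key is purely illustrative.
def routed_id(shard_key, doc_id):
    return f"{shard_key}!{doc_id}"

def routed_query_params(shard_key, query):
    # _route_ tells Solr to query only the shard(s) owning this key.
    return {"q": query, "_route_": shard_key + "!"}

print(routed_id("tenantA", "doc42"))          # tenantA!doc42
print(routed_query_params("tenantA", "*:*"))
```

With implicit routing the idea is the same, except the client names the target shard directly instead of relying on the hash of the prefix.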
Re: SolrCloud: Different replicationFactor for different shards in same collection
Thanks Erick This is for future. I am exploring to use a custom sharding scheme (which will require modification in Solr code) together with the benefits of SolrCloud. Thanks Nawab On Tue, Jul 31, 2018 at 4:51 PM, Erick Erickson wrote: > Sure, just use the Collections API ADDREPLICA command to add as many > replicas for specific shards as you want. There's no way to specify > that at creation time though. > > Some of the new autoscaling can do this automatically I believe. > > I have to ask what it is about your collection that this is true. If > you're using the default composite id routing having one shard get > substantially more queries than the others is unexpected. > > If you're using implicit routing then it's perfectly understandable. > > Best, > Erick > > On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal > wrote: > > Hi, > > > > I am looking at Solr 7.x and couldn't find an answer in the > documentation. > > Is it possibly to specify different replicationFactor for different > shards > > in same collection? E.g. if a certain shard is receiving more queries > than > > rest of the collection I would like to add more replicas for it to help > > with the query load. > > > > > > > > Thanks > > Nawab >
Re: SolrCloud: Different replicationFactor for different shards in same collection
Sure, just use the Collections API ADDREPLICA command to add as many replicas for specific shards as you want. There's no way to specify that at creation time though. Some of the new autoscaling can do this automatically I believe. I have to ask what it is about your collection that this is true. If you're using the default composite id routing having one shard get substantially more queries than the others is unexpected. If you're using implicit routing then it's perfectly understandable. Best, Erick On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal wrote: > Hi, > > I am looking at Solr 7.x and couldn't find an answer in the documentation. > Is it possibly to specify different replicationFactor for different shards > in same collection? E.g. if a certain shard is receiving more queries than > rest of the collection I would like to add more replicas for it to help > with the query load. > > > > Thanks > Nawab
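The ADDREPLICA command mentioned above is a plain Collections API request; the sketch below only builds the URL and sends nothing (host, collection, and node names are placeholders):

```python
from urllib.parse import urlencode

# Build a Collections API ADDREPLICA request targeting one shard.
# Passing node= pins the new replica to a specific Solr node;
# omitting it lets Solr choose.
def addreplica_url(base, collection, shard, node=None):
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node:
        params["node"] = node
    return f"{base}/admin/collections?{urlencode(params)}"

url = addreplica_url("http://localhost:8983/solr", "mycoll", "shard1")
print(url)
```

Repeating the call per shard gives each shard its own replica count, which is exactly the "different replicationFactor per shard" effect the question asks for.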
SolrCloud: Different replicationFactor for different shards in same collection
Hi,

I am looking at Solr 7.x and couldn't find an answer in the documentation.
Is it possible to specify a different replicationFactor for different
shards in the same collection? E.g. if a certain shard is receiving more
queries than the rest of the collection, I would like to add more replicas
for it to help with the query load.

Thanks
Nawab
Re: sharding and placement of replicas
Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and both
were on either Solr instance (same port) that would be unexpected.
Problem is that this would have been a bug clear back in the Solr 4x days,
so the fact that you say you saw it on 6.6 would be unexpected.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18
> comment "If this is a provable and reproducible bug, and it's still a
> problem in the current stable branch"), I had only installed Solr7.4 on
> one host, and so I was testing with two nodes on the same host (different
> port numbers). I had previously had the same symptom when the two nodes
> were on different hosts, but that was with Solr6.6 -- I can try it again
> with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement. Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy. Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
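The host-awareness Shawn suggests could start from a simple grouping of live_nodes entries by hostname. This sketch assumes the usual `host:port_solr` node-name convention (treat that as an assumption, not a guarantee for every deployment):

```python
from collections import defaultdict

# Group SolrCloud live_nodes entries by host to detect several nodes
# (different ports) running on the same machine.
def nodes_per_host(live_nodes):
    by_host = defaultdict(list)
    for node in live_nodes:
        host = node.split(":", 1)[0]  # "hostA:8983_solr" -> "hostA"
        by_host[host].append(node)
    # Keep only hosts that run more than one node.
    return {h: n for h, n in by_host.items() if len(n) > 1}

shared = nodes_per_host(["hostA:8983_solr", "hostA:8984_solr", "hostB:8983_solr"])
print(shared)  # {'hostA': ['hostA:8983_solr', 'hostA:8984_solr']}
```

As Shawn notes, this only works when the hostname strings actually match; one machine registered under several names or addresses would still look like several hosts.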
Re: Search for a specific unicode char
To whom it may concern,

On 7/31/18 2:56 PM, tedsolr wrote:
> I'm having some trouble with non printable, but valid, UTF8 chars
> when exporting to Amazon Redshift. The export fails but I can't yet
> find this data in my Solr collection. How can I search, say from
> the admin console, for a particular character? I'm looking for
> U+001E and U+001F

Try copy/pasting from e.g.
https://www.fileformat.info/info/unicode/char/001e/browsertest.htm

Or url-decode this string (%1e) here:
https://meyerweb.com/eric/tools/dencoder/
and paste it into your search box.

Do you have the source-data for the index? Maybe it's easier to locate
the character in the source-data than in the index.

-chris
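The same characters can be produced and percent-encoded programmatically instead of via an online decoder; the field name below is illustrative:

```python
from urllib.parse import quote

# U+001E (record separator) and U+001F (unit separator) are valid but
# unprintable; embed them via escapes and percent-encode them so they
# can be pasted into a query URL.
rs, us = "\u001e", "\u001f"
query = f'myfield:"{rs}"'
encoded = quote(query)
print(encoded)  # myfield%3A%22%1E%22
```

The `%1E` in the output is the url-encoded form Chris refers to; the same approach works for `%1F`.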
Re: Solr Server crashes when requesting a result with too large resultRows
Georg,

On 7/31/18 12:33 PM, Georg Fette wrote:
> Yes it is only one of the processors that is at maximum capacity.

Ok.

> How do I do something like a thread-dump of a single thread ?

Here's how to get a thread dump of the whole JVM:
https://wiki.apache.org/tomcat/HowTo#How_do_I_obtain_a_thread_dump_of_my_running_webapp_.3F

The "tid" field of each thread is usually the same as the process-id
from a "top" or "ps" listing, except it's often shown in hex instead of
decimal. Have a look at this for some guidance:
http://javadrama.blogspot.com/2012/02/why-is-java-eating-my-cpu.html
Some tools dump the tid in hex, others in decimal. It's frustrating
sometimes.

> We run the Solr from the command line out-of-the-box and not in a
> code development environment. Are there parameters that can be
> configured so that the server creates dumps ?

You don't want this to happen automatically. Instead, you'll want to
trigger a dump manually for debugging purposes.

-chris

> Am 31.07.2018 um 15:07 schrieb Christopher Schultz:
>> Georg,
>>
>> On 7/31/18 4:39 AM, Georg Fette wrote:
>>> We run the server version 7.3.1. on a machine with 32GB RAM in a mode
>>> having -10g. When requesting a query with
>>> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
>>> the server takes all available memory up to 10GB and is then no longer
>>> accessible with one processor at 100%.
>>
>> Is it a single thread which takes the CPU or more than one? Can you
>> identify that thread and take a thread dump to get a backtrace for
>> that thread?
>>
>>> When we reduce the rows parameter to 1000 the query works. The query
>>> returns only 581 results. The documentation at
>>> https://wiki.apache.org/solr/CommonQueryParameters states that as the
>>> "rows" parameter a "ridiculously large value" may be used, but this
>>> could pose a problem. The number we used was Int.max from Java.
>>
>> Interesting. I wonder if Solr attempts to pre-allocate a result
>> buffer. Requesting 2147483647 rows can have an adverse effect on most
>> pre-allocated data structures.
>>
>> -chris
Re: Search for a specific unicode char
This is an example of what the data looks like:

  "SOURCEFILEID":"77907",
  "APPROP_GROUP_CODE_T":"F\uG\uR",
  "APPROP_GROUP_CODE_T_aggr":"F\uG\uR",
  "APPROP_GROUP_CODE_T_search":"F\uG\uR",
  "OBJECT_DESC_T":"OTHER PROFESSIONAL/TECHNICAL SERVICES",

That's a snippet from a query result. "\u" is a null value. I don't know
why this data is presenting in this style. I still don't know how to
search for one unicode character. A search using the value as shown above
does work: q=APPROP_GROUP_CODE_T:"F\uG\uR"

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Highlighting the search keywords
Hi All,

I was using highlighting in Solr. Solr gives highlighting results within
the response, but they are not included within the documents. Am I
missing something? Can I configure it so that it shows the highlighted
keywords matched within the documents?

Thanks
Renuka Srishti
RE: sharding and placement of replicas
In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Tuesday, July 31, 2018 2:26 PM To: solr-user@lucene.apache.org Subject: Re: sharding and placement of replicas On 7/27/2018 8:26 PM, Erick Erickson wrote: > Yes with some fiddling as far as "placement rules", start here: > https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html > > The idea (IIUC) is that you provide a snitch" that identifies what > "rack" the Solr instance is on and can define placement rules that > define "don't put more than one thingy on the same rack". "Thingy" > here is replica, shard, whatever as defined by other placement rules. I'd like to see an improvement in Solr's behavior when nothing has been configured in auto-scaling or rule-based replica placement. Configuring those things is certainly an option, but I think we can do better even without that config. I believe that Solr already has some default intelligence that keeps multiple replicas from ending up on the same *node* when possible ... I would like this to also be aware of *hosts*. Craig hasn't yet indicated whether there is more than one node per host, so I don't know whether the behavior he's seeing should be considered a bug. 
If somebody gives one machine multiple names/addresses and uses different hostnames in their SolrCloud config for one actual host, then it wouldn't be able to do any better than it does now, but if there are matches in the hostname part of different entries in live_nodes, then I think the improvement might be relatively easy. Not saying that I know what to do, but somebody who is familiar with the Collections API code can probably do it. Thanks, Shawn
Search for a specific unicode char
I'm having some trouble with non printable, but valid, UTF8 chars when exporting to Amazon Redshift. The export fails but I can't yet find this data in my Solr collection. How can I search, say from the admin console, for a particular character? I'm looking for U+001E and U+001F thanks! Solr 5.5.4
Re: sharding and placement of replicas
On 7/27/2018 8:26 PM, Erick Erickson wrote: > Yes with some fiddling as far as "placement rules", start here: > https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html > > The idea (IIUC) is that you provide a snitch" that identifies what > "rack" the Solr instance is on and can define placement rules that > define "don't put more than one thingy on the same rack". "Thingy" > here is replica, shard, whatever as defined by other placement rules. I'd like to see an improvement in Solr's behavior when nothing has been configured in auto-scaling or rule-based replica placement. Configuring those things is certainly an option, but I think we can do better even without that config. I believe that Solr already has some default intelligence that keeps multiple replicas from ending up on the same *node* when possible ... I would like this to also be aware of *hosts*. Craig hasn't yet indicated whether there is more than one node per host, so I don't know whether the behavior he's seeing should be considered a bug. If somebody gives one machine multiple names/addresses and uses different hostnames in their SolrCloud config for one actual host, then it wouldn't be able to do any better than it does now, but if there are matches in the hostname part of different entries in live_nodes, then I think the improvement might be relatively easy. Not saying that I know what to do, but somebody who is familiar with the Collections API code can probably do it. Thanks, Shawn
Fuzzy search 'sometimes' not working
Hello list, I currently observed a very strange behaviour of fuzzy searches with Solr Cloud 5.5.0. I have two identical documents in 2 different collections. Something like {name: "Tomas"}. I find the document in the first collection with a search like name:Thomass~2. But I don't find it in the second one! I triple checked everything (I find them both with name:Tomas, I find them both with name:Thomas~1, the solrconfig and schemas are identical), but I just don't see any reasonable explanation for it. Could it be that the functionality of fuzzy searching depends on the data of other documents in the collection; like a limit of how many "Thomas"'s there could be? Or on the amount of memory available? Could some race condition during indexing have removed the "Thomass" variant in one case? Anything non-deterministic? Any bug in that direction fixed since 5.5.0? Thanks a lot for any ideas, David -- David Frese +49 7071 70896 75 Active Group GmbH Hechinger Str. 12/1, 72072 Tübingen Registergericht: Amtsgericht Stuttgart, HRB 224404 Geschäftsführer: Dr. Michael Sperber
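For what it's worth, `name:Thomass~2` asks for terms within Levenshtein edit distance 2, and "Tomas" really is distance 2 from "Thomass" (drop the "h" and one "s"), so both collections should match it. A quick check of that arithmetic:

```python
# Plain dynamic-programming Levenshtein distance, to verify that a
# fuzzy query "Thomass"~2 should match a document containing "Tomas".
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("Thomass", "Tomas"))  # 2
print(levenshtein("Thomas", "Tomas"))   # 1
```

Since the distance itself is unambiguous, a collection that fails to match is worth investigating for analysis differences (e.g. stemming or filters changing the indexed term) rather than for the fuzzy math.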
Re: Solr Server crashes when requesting a result with too large resultRows
Hi Christoph,

Yes it is only one of the processors that is at maximum capacity. How do
I do something like a thread-dump of a single thread ? We run the Solr
from the command line out-of-the-box and not in a code development
environment. Are there parameters that can be configured so that the
server creates dumps ?

Greetings
Georg

Am 31.07.2018 um 15:07 schrieb Christopher Schultz:
> Georg,
>
> On 7/31/18 4:39 AM, Georg Fette wrote:
>> We run the server version 7.3.1. on a machine with 32GB RAM in a mode
>> having -10g. When requesting a query with
>> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
>> the server takes all available memory up to 10GB and is then no longer
>> accessible with one processor at 100%.
>
> Is it a single thread which takes the CPU or more than one? Can you
> identify that thread and take a thread dump to get a backtrace for that
> thread?
>
>> When we reduce the rows parameter to 1000 the query works. The query
>> returns only 581 results. The documentation at
>> https://wiki.apache.org/solr/CommonQueryParameters states that as the
>> "rows" parameter a "ridiculously large value" may be used, but this
>> could pose a problem. The number we used was Int.max from Java.
>
> Interesting. I wonder if Solr attempts to pre-allocate a result buffer.
> Requesting 2147483647 rows can have an adverse effect on most
> pre-allocated data structures.
>
> -chris

--
Dipl.-Inf. Georg Fette      Raum: B009
Universität Würzburg        Tel.: +49-(0)931-31-85516
Am Hubland                  Fax.: +49-(0)931-31-86732
97074 Würzburg              mail: georg.fe...@uni-wuerzburg.de
Re: Zookeeper / Solr interaction
Ok, your OOM errors are most likely due to trying to stuff too many
replicas into too little memory. You have 100 collections, 8 shards per
collection and 1 replica per shard. So if my math is right, you have 800
replicas total, 400 replicas per Solr instance.

6G of memory is very little for that many replicas, and any significant
number of docs in these collections will make matters worse. Anything to
do with sorting, faceting or grouping that works on fields that do not
use docValues will make it even worse yet.

All that said, I've certainly seen many more replicas than that operate
with a 3-node external ZK ensemble, so I doubt it's the case that you're
overloading ZK. There are some additional multipliers to the number of ZK
events having to do with the "overseer".

My best guess is that your focus on ZK events is a red herring, and that
if you distributed your replicas over more Solr nodes and/or were able to
allocate significantly more memory to each node, the problem would go
away. Note that the usual recommendation is that you allocate no more
than 50% of your available physical memory to the Java heap.

Best,
Erick

On Tue, Jul 31, 2018 at 8:06 AM, Zarski, Jacek wrote:
> Thanks for responding! That's some good info. Here are the answers to the
> questions you had...
>
> Solr has 6gb of heap
> We have 1 replica per shard at 8 shards per collection
> We currently have approximately 100 collections
> Zookeeper is an external ensemble each with their own server
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Monday, July 30, 2018 7:56 PM
> To: solr-user
> Subject: Re: Zookeeper / Solr interaction
>
> 1> Yeah, the interactions with ZK are quite chatty. Basically each
> replica may have several changes of state with ZooKeeper.
> Down->recovering->active. How many replicas do you have on a node?
>
> 2> Unfortunately I don't have much info on this point.
>
> 3> I would not expect OOMs on the Solr node while waiting for ZK to
> respond. How much heap are you allocating to Solr? How many replicas do
> you have?
>
> a> Plausible yes, but that many transactions seems quite high. How many
> replicas do you have on your Solr instance? One scenario here is that you
> have thousands of replicas. Is your ZK ensemble an external one? And are
> they running on separate hardware? Because this many transactions for
> only two Solr instances seems quite high, so I'm curious about a few more
> details of your setup: how many collections, shards and replicas are we
> talking here?
>
> Best,
> Erick
>
> On Mon, Jul 30, 2018 at 1:10 PM, Zarski, Jacek wrote:
>> Some information I forgot to include:
>> Solr version : 7.2.1
>> Zk version : 3.4.10
>>
>> -----Original Message-----
>> From: Zarski, Jacek
>> Sent: Monday, July 30, 2018 4:06 PM
>> To: solr-user@lucene.apache.org
>> Subject: Zookeeper / Solr interaction
>>
>> Hi,
>>
>> We have the following environment setup for zookeeper/solrcloud:
>>
>> 3 zookeeper ensemble
>> 2 Solr cloud servers
>>
>> I am writing you to further inquire about the interaction of solr and
>> zookeeper, in particular relating to transactions in the transaction
>> logs. I have a script running that logs the amount of transactions. I
>> am matching this log with snapshot timing and new log creation.
>>
>> After a problem arose in our PROD environment, I have tracked it to an
>> unrecommended configuration where logs and data were kept on the same
>> drive. Since then we have configured separate drives for logs and data
>> in that environment. The behavior that caused the problem was when a
>> snapshot was happening, a solr instance reported that it was unable to
>> establish a ZK leader. Following that failure, during recovery, 4 more
>> snapshots happened in short succession (10 minutes) on all 3 zk
>> servers, causing the whole environment to be unresponsive until restart
>> for 1.5 hours.
>>
>> I am currently working to recreate the problem and gather more
>> information on the cause and impact of snapshots. I have configured a
>> DEV environment with the same number of servers. I have changed the zk
>> configuration to again have the logs and data in the same drive and
>> directory. I am seeing that snapshots cause a degradation in
>> performance due to IO block but I would like more information on
>> transactions and snapshots to confirm this behavior and our suspicions.
>>
>> Here are the scenarios I would like more information about:
>>
>> 1. When the solr server is restarted, I see a huge influx of
>> transactions on the zookeeper transaction log. What is the solr
>> behavior that is causing this and is this normal?
>>
>> 2. There are scenarios where snapshots are being created without
>> reaching "snapCount" (snapCount=10) transactions. I have documented
>> snapshots at 17k and 45k transactions. In what scenarios would a
>> snapshot be created other than reaching "snapCount" transactions?
>>
>> 3. Since zk won't respond before writing to the transaction log... at
>> snapshot time (IO block) is it possible for the solr server to wait for
>> a response from zk causing all other writes to be buffered, resulting
>> in a full heap and therefore an out of memory failure on the solr node?
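Erick's replica math can be spelled out with the numbers Jacek gave earlier in the thread:

```python
# Replica math from the thread: 100 collections x 8 shards x 1 replica
# per shard, spread across 2 Solr nodes.
collections, shards, replicas_per_shard, solr_nodes = 100, 8, 1, 2
total_replicas = collections * shards * replicas_per_shard
per_node = total_replicas // solr_nodes
print(total_replicas, per_node)  # 800 400
```

At 6 GB of heap per node, that works out to roughly 15 MB of heap per replica before any documents, caches, or sorting structures are counted, which is why Erick calls the heap "very little" for this layout.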
RE: Zookeeper / Solr interaction
Thanks for responding! That's some good info. Here are the answers to the questions you had... Solr has 6gb of heap We have 1 replica per shard at 8 shards per collection We currently have approximately 100 collections Zookeeper is an external ensemble each with their own server -Original Message- From: Erick Erickson Sent: Monday, July 30, 2018 7:56 PM To: solr-user Subject: Re: Zookeeper / Solr interaction 1> Yeah, the interactions with ZK are quite chatty. Basically each replica may have several changes of state with ZooKeeper. Down->recovering->active. How many replicas do you have on a node? 2> Unfortunately I don't have much info on this point. 3> I would not expect OOMs on the Solr node while waiting for ZK to respond. How much heap are you allocating to Solr? How many replicas do you have? a> Plausible yes, but that many transactions seems quite high. How many replicas do you have on your Solr instance? One scenario here is that you have thousands of replicas. Is your ZK ensemble an external one? And are they running on separate hardware? Because this many transactions for only two Solr instances seems quite high so I'm curious about a few more details of your setup, how many collections, shards and replicas are we talking here? Best, Erick On Mon, Jul 30, 2018 at 1:10 PM, Zarski, Jacek wrote: > Some information I forgot to include: > Solr version : 7.2.1 > Zk version : 3.4.10 > > -Original Message- > From: Zarski, Jacek > Sent: Monday, July 30, 2018 4:06 PM > To: solr-user@lucene.apache.org > Subject: Zookeeper / Solr interaction > > Hi, > > We have the following environment setup for zookeeper/solrcould > > 3 zookeeper ensemble > 2 Solr cloud servers > > I am writing you to further inquire about the interaction of solr and > zookeeper. In particular relating to transactions in the transaction logs. I > have a script running that logs the amount of transactions. I am matching > this log with snapshot timing and new log creation. 
> > After a problem arose in our PROD environment, I have tracked it to an > unrecommended configuration where logs and data was kept on the same drive. > Since then we have configured separate drives for logs and data in that > environment. The behavior that caused the problem was when a snapshot was > happening, a solr instance reported that it was unable to establish a ZK > leader. Following that failure, during recovery, 4 more snapshots happened > in short succession(10 minutes) on all 3 zk servers causing the whole > environment to be unresponsive until restart for 1.5 hours. > > I am currently working to recreate the problem and gather more information on > the cause and impact of snapshots. I have configured a DEV environment with > the same number of servers. I have changed the zk configuration to again have > the logs and data in the same drive and directory. I am seeing that snapshots > cause a degredation in performance due to IO block but I would like more > information on transactions and snapshots to confirm this behavior and our > suspicions. > > Here are the scenarios I would like more information about: > > 1. When the solr server is restarted, I see a huge influx of > transactions on the zookeeper transaction log. What is the solr behavior that > is causing this and is this normal? > > 2. There is scenarios where snapshots are being created without > reaching "snapCount" (snapCount=10) transactions. I have documented > snapshots at 17k and 45k transactions. In what scenarios would a snapshot be > created other than reaching "snapCount" transactions? > > 3. Since zk won't respond before writing to the transaction log... at > Snapshot time(IO block) is it possible for the solr server to wait for a > response from zk causing all other writes to be buffered resulting in a full > heap and therefore an out of memory failure on the solr node? > > a. Now referencing question #1... 
When a solr node recovers, the influx > of transactions plus the continuing writes seems to be enough to trigger > another snapshot resulting in further downtime. Is this case plausible? > > Thanks, > Jacek
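While reproducing this, ZooKeeper's four-letter `mntr` command (available in the 3.4.x line used here) is a convenient way to watch znode, watch, and outstanding-request counts alongside the transaction-log script. Below is a small sketch that parses `mntr`'s tab-separated output; the sample values are hypothetical, not taken from this environment.

```python
def parse_mntr(output):
    """Parse ZooKeeper's `mntr` output (one `key<TAB>value` per line) into a dict."""
    stats = {}
    for line in output.strip().splitlines():
        key, _, value = line.partition("\t")
        stats[key] = value
    return stats

# Hypothetical sample of what `echo mntr | nc localhost 2181` might return:
sample = ("zk_version\t3.4.10\n"
          "zk_znode_count\t4521\n"
          "zk_watch_count\t1800\n"
          "zk_outstanding_requests\t0")
stats = parse_mntr(sample)
print(stats["zk_znode_count"])
```

Logging `zk_znode_count` and `zk_outstanding_requests` over time should make it easier to correlate the transaction influx on Solr restart with snapshot timing.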
Re: Solr Server crashes when requesting a result with too large resultRows
On 7/31/2018 2:39 AM, Georg Fette wrote: We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g. When requesting a query with q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%. When we reduce the rows parameter to 1000 the query works. The query returns only 581 results. This is happening because of the way that Solr prepares for searching. Objects are allocated in heap memory according to the rows value before the query even gets executed. If you run Solr on an operating system other than Windows, the resulting OutOfMemoryError will cause the Solr process to be killed. If it's running on Windows, Solr would stay running, but we have no way of knowing whether it would work *correctly* after OOME. https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_too_many_rows At the link above is a link to a blog post that covers the problem in great detail. https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/ With a rows parameter of over 2 billion, Solr (actually it's Lucene, which provides most of Solr's functionality) will allocate that many ScoreDoc objects, which needs about 60GB of heap memory. So it's not possible on your hardware. As you'll see if you read the blog post, Toke has some ideas about how to improve the situation. I don't think an issue has been filed, but I could be wrong about that. Right now, switching to cursorMark or the /export handler is a better way to get a very large result set. Thanks, Shawn
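Shawn's ~60GB figure can be sanity-checked with back-of-envelope arithmetic. Assuming roughly 28 bytes per ScoreDoc (object header plus a float score and an int doc id — an assumption for illustration, not a measured value):

```python
def estimated_scoredoc_heap_bytes(rows, bytes_per_scoredoc=28):
    """Rough heap needed when Lucene sizes its result priority queue from `rows`."""
    return rows * bytes_per_scoredoc

rows = 2147483647  # Integer.MAX_VALUE, as in the reported query
gb = estimated_scoredoc_heap_bytes(rows) / 10**9
print(f"~{gb:.0f} GB")  # ~60 GB, far beyond a 10 GB heap
```

The exact per-object cost varies by JVM and pointer compression, but any plausible value puts `rows=Integer.MAX_VALUE` far past what this machine can allocate.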
Re: Solr Server crashes when requesting a result with too large resultRows
Georg, On 7/31/18 4:39 AM, Georg Fette wrote: > We run the server version 7.3.1. on a machine with 32GB RAM in a > mode having -10g. > > When requesting a query with > > q={!boost > b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 > > the server takes all available memory up to 10GB and is then no > longer accessible with one processor at 100%. Is it a single thread which takes the CPU, or more than one? Can you identify that thread and take a thread dump to get a backtrace for it? > When we reduce the rows parameter to 1000 the query works. The > query returns only 581 results. > > The documentation at > https://wiki.apache.org/solr/CommonQueryParameters states that as > the "rows" parameter a "ridiculously large value" may be used, but > this could pose a problem. The number we used was Int.max from > Java. Interesting. I wonder if Solr attempts to pre-allocate a result buffer. Requesting 2147483647 rows can have an adverse effect on most pre-allocated data structures. 
-chris
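To answer Chris's question about which thread is pinning the CPU, a common approach on Linux is to run `top -H -p <solr-pid>` to find the hot thread's decimal thread id, then match it against the hexadecimal `nid=0x...` field in a `jstack` dump of the same process. A sketch of the id conversion (the helper name is mine):

```python
def tid_to_nid(tid):
    """Convert a decimal Linux thread id (as shown by `top -H`) to the
    hex `nid=0x...` form that jstack prints for each JVM thread."""
    return "0x%x" % tid

# e.g. if top -H shows thread 28542 at 100% CPU:
print(tid_to_nid(28542))  # search the jstack output for nid=0x6f7e
```

The stack frame under that `nid` usually names the code path responsible for the spin.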
Re: Solr Server crashes when requesting a result with too large resultRows
Yes, but 581 is the final number you got in the response, which is the result of the main query intersected with the filter query, so I wouldn't take that number into account. The main and the filter query are executed separately, so I guess (but I'm guessing because I don't know these internals) that this is where the "rows" parameter matters. Again, I'm guessing; I'm sure some Solr committer here can explain how things actually work. Best, Andrea On 31/07/18 11:12, Fette, Georg wrote: Hi Andrea, I agree that receiving too much data in one request is bad. But I was surprised that the query works with a lower but still very large rows parameter and that there is a threshold at which it crashes the server. Furthermore, it seems that the reason for the crash is not the size of the actual results because those are only 581. Greetings Georg Am 31.07.2018 um 10:53 schrieb Andrea Gazzarini: Hi Georg, I would say, without knowing your context, that this is not what Solr is supposed to do. You're asking to load everything in a single request/response and this poses a problem. Since, even if we assume it works, you would then iterate those results one by one or in blocks, an option would be to do this part (block scrolling) using Solr [2]. I suggest you have a look at * the export endpoint [1] * the cursor API [2] Best, Andrea [1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html [2] https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors On 31/07/18 10:44, Georg Fette wrote: Hello, We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g. When requesting a query with q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%. When we reduce the rows parameter to 1000 the query works. 
The query returns only 581 results. The documentation at https://wiki.apache.org/solr/CommonQueryParameters states that as the "rows" parameter a "ridiculously large value" may be used, but this could pose a problem. The number we used was Int.max from Java. Greetings Georg
NullPointerException in SolrMetricManager
Hi, we are seeing the following NPE sometimes when we delete a collection right after we modify the schema: 08:47:46.407 [zkCallback-5-thread-4] INFO org.apache.solr.rest.ManagedResource 209 processStoredData - Loaded initArgs {ignoreCase=true} for /schema/analysis/stopwords/text_ar 08:47:46.407 [zkCallback-5-thread-4] INFO org.apache.solr.rest.schema.analysis.ManagedWordSetResource 116 onManagedDataLoadedFromStorage - Loaded 119 words for /schema/analysis/stopwords/text_ar 08:47:46.407 [zkCallback-5-thread-4] INFO org.apache.solr.rest.ManagedResource 117 notifyObserversDuringInit - Notified 8 observers of /schema/analysis/stopwords/text_ar 08:47:46.407 [zkCallback-5-thread-4] INFO org.apache.solr.rest.RestManager 668 addRegisteredResource - Registered new managed resource /schema/analysis/stopwords/text_ar 08:47:46.408 [zkCallback-5-thread-4] INFO org.apache.solr.schema.IndexSchema 592 readSchema - Loaded schema solr-config/1.6 with uniqueid field id 08:47:46.408 [zkCallback-5-thread-4] INFO org.apache.solr.schema.ZkIndexSchemaReader 177 updateSchema - Finished refreshing schema in 411 ms 08:47:46.415 [qtp254749889-20] INFO org.apache.solr.core.SolrCore 1517 close - [donald.test-query-1533026857986_shard1_replica_n1] CLOSING SolrCore org.apache.solr.core.SolrCore@62ef7f0c 08:47:46.415 [qtp254749889-20] INFO org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing metric reporters for registry=solr.core.donald.test-query-1533026857986.shard1.replica_n1, tag=62ef7f0c 08:47:46.416 [qtp254749889-20] INFO org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing metric reporters for registry=solr.collection.donald.test-query-1533026857986.shard1.leader, tag=62ef7f0c 08:47:46.416 [Thread-20] INFO org.apache.solr.metrics.reporters.SolrJmxReporter 112 doInit - JMX monitoring for 'solr.core.donald.test-query-1533026857986.shard1.replica_n1' (registry 'solr.core.donald.test-query-1533026857986.shard1.replica_n1') enabled at server: 
com.sun.jmx.mbeanserver.JmxMBeanServer@2698dc7 08:47:46.417 [Thread-20] WARN org.apache.solr.cloud.ZkController 2689 lambda$fireEventListeners$6 - listener throws error org.apache.solr.common.SolrException: Unable to reload core [donald.test-query-1533026857986_shard1_replica_n1] at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1411) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.core.SolrCore.lambda$getConfListener$20(SolrCore.java:3029) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.cloud.ZkController.lambda$fireEventListeners$6(ZkController.java:2687) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181] Caused by: java.lang.NullPointerException at org.apache.solr.metrics.SolrMetricManager.loadShardReporters(SolrMetricManager.java:1146) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.metrics.SolrCoreMetricManager.loadReporters(SolrCoreMetricManager.java:92) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.core.SolrCore.(SolrCore.java:909) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.core.SolrCore.reload(SolrCore.java:663) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1390) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] ... 3 more regards, Hendrik
Re: Solr Server crashes when requesting a result with too large resultRows
Hi Andrea, I agree that receiving too much data in one request is bad. But I was surprised that the query works with a lower but still very large rows parameter and that there is a threshold at which it crashes the server. Furthermore, it seems that the reason for the crash is not the size of the actual results because those are only 581. Greetings Georg Am 31.07.2018 um 10:53 schrieb Andrea Gazzarini: Hi Georg, I would say, without knowing your context, that this is not what Solr is supposed to do. You're asking to load everything in a single request/response and this poses a problem. Since, even if we assume it works, you would then iterate those results one by one or in blocks, an option would be to do this part (block scrolling) using Solr [2]. I suggest you have a look at * the export endpoint [1] * the cursor API [2] Best, Andrea [1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html [2] https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors On 31/07/18 10:44, Georg Fette wrote: Hello, We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g. When requesting a query with q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%. When we reduce the rows parameter to 1000 the query works. The query returns only 581 results. The documentation at https://wiki.apache.org/solr/CommonQueryParameters states that as the "rows" parameter a "ridiculously large value" may be used, but this could pose a problem. The number we used was Int.max from Java. Greetings Georg -- - Dipl.-Inf. Georg Fette Raum: B009 Universität Würzburg Tel.: +49-(0)931-31-85516 Am Hubland Fax.: +49-(0)931-31-86732 97074 Würzburg mail: georg.fe...@uni-wuerzburg.de -
Is system-load based auto-scaling supported in latest Solr version
Hi, there From the Solr 7.x doc I know auto-scaling is triggered by number of replicas, just want to know if I can achieve auto-scaling based on system load dynamically? Appreciate your reply. Thanks, Xiaoming
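No reply appears in the thread, but for what it's worth, the Solr 7.4 autoscaling framework documents a `metric` trigger type that can fire on any published metric, which could include a load metric such as `os.systemLoadAverage` from the JVM registry. The sketch below only builds the request payload; the field names follow the ref-guide examples but should be verified against your version, and the trigger name and threshold are made-up values.

```python
import json

# Hedged sketch of a set-trigger payload for /solr/admin/autoscaling.
# "load_trigger" and the 4.0 threshold are hypothetical; check the Solr
# 7.x Ref Guide ("metric" trigger) for the exact supported fields.
payload = {
    "set-trigger": {
        "name": "load_trigger",
        "event": "metric",
        "waitFor": "30s",
        "metric": "metrics:solr.jvm:os.systemLoadAverage",
        "above": 4.0,
        "enabled": True,
        "actions": [
            {"name": "compute_plan", "class": "solr.ComputePlanAction"},
            {"name": "execute_plan", "class": "solr.ExecutePlanAction"},
        ],
    }
}
print(json.dumps(payload))  # POST this JSON to the autoscaling endpoint
```

Earlier 7.x releases may only support node-event triggers (nodeAdded/nodeLost), so this depends on running 7.4 or later.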
Re: Solr Server crashes when requesting a result with too large resultRows
Hi Georg, I would say, without knowing your context, that this is not what Solr is supposed to do. You're asking to load everything in a single request/response and this poses a problem. Since, even if we assume it works, you would then iterate those results one by one or in blocks, an option would be to do this part (block scrolling) using Solr [2]. I suggest you have a look at * the export endpoint [1] * the cursor API [2] Best, Andrea [1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html [2] https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors On 31/07/18 10:44, Georg Fette wrote: Hello, We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g. When requesting a query with q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%. When we reduce the rows parameter to 1000 the query works. The query returns only 581 results. The documentation at https://wiki.apache.org/solr/CommonQueryParameters states that as the "rows" parameter a "ridiculously large value" may be used, but this could pose a problem. The number we used was Int.max from Java. Greetings Georg
RE: Solr Server crashes when requesting a result with too large resultRows
Hello Georg, As you have seen, a high rows parameter is a bad idea. Use cursor mark [1] instead. Regards, Markus [1] https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html -Original message- > From:Georg Fette > Sent: Tuesday 31st July 2018 10:44 > To: solr-user@lucene.apache.org > Subject: Solr Server crashes when requesting a result with too large > resultRows > > Hello, > We run the server version 7.3.1. on a machine with 32GB RAM in a mode > having -10g. > When requesting a query with > q={!boost > b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 > the server takes all available memory up to 10GB and is then no longer > accessible with one processor at 100%. > When we reduce the rows parameter to 1000 the query works. The query > returns only 581 results. > The documentation at https://wiki.apache.org/solr/CommonQueryParameters > states that as the "rows" parameter a "ridiculously large value" may be > used, but this could pose a problem. The number we used was Int.max from > Java. > Greetings > Georg
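The cursorMark protocol Markus points to works like this: start with `cursorMark=*`, pass the `nextCursorMark` from each response into the next request, and stop when the cursor stops changing. A sketch of that loop, with the HTTP call abstracted into an injectable `fetch` callable (a hypothetical stand-in, not a SolrJ or pysolr API), so the paging logic is visible on its own:

```python
def iterate_with_cursor(fetch, page_size=1000):
    """Deep-page through a result set using Solr's cursorMark protocol.
    `fetch(cursor, rows)` must return a Solr-shaped response dict with
    response["response"]["docs"] and response["nextCursorMark"]."""
    cursor = "*"
    while True:
        resp = fetch(cursor, page_size)
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            return
        cursor = next_cursor

# Tiny demo with canned pages standing in for real HTTP responses:
pages = {
    "*": {"response": {"docs": [{"id": 1}, {"id": 2}]}, "nextCursorMark": "A"},
    "A": {"response": {"docs": [{"id": 3}]}, "nextCursorMark": "B"},
    "B": {"response": {"docs": []}, "nextCursorMark": "B"},
}
docs = list(iterate_with_cursor(lambda cursor, rows: pages[cursor]))
print([d["id"] for d in docs])  # [1, 2, 3]
```

Note that the real request must also include a `sort` with a tiebreak on the uniqueKey field (e.g. `sort=score desc,id asc`), or Solr will reject the cursor request.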
Solr Server crashes when requesting a result with too large resultRows
Hello, We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g. When requesting a query with q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647 the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%. When we reduce the rows parameter to 1000 the query works. The query returns only 581 results. The documentation at https://wiki.apache.org/solr/CommonQueryParameters states that as the "rows" parameter a "ridiculously large value" may be used, but this could pose a problem. The number we used was Int.max from Java. Greetings Georg