Re: Highlighting the search keywords

2018-07-31 Thread Nicolas Franck
Nope, that is how it works: the highlighting is not done in place in the documents.
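
The highlighted snippets come back in a separate "highlighting" section
of the response, keyed by the unique document id, and it is up to the
client to merge them back into the documents. A sketch of a wt=json
response (id and field values hypothetical):

  {
    "response": { "docs": [ { "id": "doc1", "title": "Solr in Action" } ] },
    "highlighting": {
      "doc1": { "title": ["<em>Solr</em> in Action"] }
    }
  }

So to show highlights "within" a document, look up each doc's id in the
highlighting map and substitute the returned snippets client-side.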

> On 31 Jul 2018, at 21:57, Renuka Srishti  wrote:
> 
> Hi All,
> 
> I was using highlighting in Solr. Solr gives highlighting results within
> the response, but they are not included within the documents.
> Am I missing something? Can I configure it so that it shows the highlighted
> keywords matched within the documents?
> 
> Thanks
> Renuka Srishti



Re: Cannot train 2 or more features for Solr LTR using LIBLINEAR

2018-07-31 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone have any information on this?

Regards,
Edwin

On Mon, 30 Jul 2018 at 11:15, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am using Solr LTR in Solr 7.4.0, and I am trying to train an example
> learning model using LIBLINEAR.
>
> When I try to run the code from train_and_upload_demo_model.py, I can
> only train one feature at a time. If I put in more than one feature, I
> get the following error:
>
> Traceback (most recent call last):
>
>   File "train_and_upload_demo_model.py", line 182, in 
>
> sys.exit(main())
>
>   File "train_and_upload_demo_model.py", line 169, in main
>
>
> formatter.processQueryDocFeatureVector(fvGenerator,config["trainingFile"]);
>
>   File
> "/cygdrive/c/Users/edwin/Desktop/solr-7.4.0/contrib/ltr/myModel/libsvm_formatter.py",
> line 25, in processQueryDocFeatureVector
>
> curListOfFv.append((relevance,self._makeFeaturesMap(featureVector)))
>
>   File
> "/cygdrive/c/Users/edwin/Desktop/solr-7.4.0/contrib/ltr/myModel/libsvm_formatter.py",
> line 35, in _makeFeaturesMap
>
> featName,featValue = keyValuePairStr.split(":");
>
> ValueError: too many values to unpack
>
>
> Is there any way to train 2 or more features at
> the same time?
>
> Regards,
> Edwin
>
>
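
For what it's worth, that ValueError comes from the line that splits each
name:value pair on ":", so it fires as soon as a pair string contains more
than one colon -- e.g. when the whole vector "f1:1.0,f2:2.0" reaches the
split because the pair delimiter the script expects (";") differs from the
one Solr emitted (","). A sketch of a more tolerant _makeFeaturesMap for
libsvm_formatter.py, assuming that delimiter mismatch is the cause:

  def _makeFeaturesMap(self, featureVectorStr):
      # Accept both ";" and "," as pair delimiters, and split each pair
      # only on the first ":" in case a value contains further colons.
      features = {}
      for pairStr in featureVectorStr.replace(";", ",").split(","):
          if not pairStr.strip():
              continue
          featName, featValue = pairStr.split(":", 1)
          features[featName] = float(featValue)
      return features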


Re: SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Erick Erickson
This feels like more work than necessary, especially the bit:
"which will require modification in Solr code".

If your needs are to co-locate various groups of documents
on specific nodes, composite id (the default) routing has
the ability to cluster docs together, see:
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html
the "document routing" section. You can also route queries to
those shards only, see:
https://lucidworks.com/2013/06/13/solr-cloud-document-routing/

If that isn't sufficient, using "implicit" routing allows you to
send documents to specific shards.

True, in both cases the _client_ has to assign the doc to a particular
shard based on whatever criteria you need, but that seems like less
work than changing Solr code.
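
For example (a sketch; ids and prefixes hypothetical): indexing a document
with the composite id

  tenantA!doc123

hashes the "tenantA" prefix to pick the shard, so all docs sharing that
prefix land together, and a query can then be limited to that shard with

  q=...&_route_=tenantA!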

Best,
Erick

On Tue, Jul 31, 2018 at 5:20 PM, Nawab Zada Asad Iqbal  wrote:
> Thanks Erick
>
>
> This is for the future. I am exploring using a custom sharding scheme
> (which will require modifications to the Solr code) together with the
> benefits of SolrCloud.
>
>
>
> Thanks
> Nawab
>
>
>
> On Tue, Jul 31, 2018 at 4:51 PM, Erick Erickson 
> wrote:
>
>> Sure, just use the Collections API ADDREPLICA command to add as many
>> replicas for specific shards as you want. There's no way to specify
>> that at creation time though.
>>
>> Some of the new autoscaling can do this automatically I believe.
>>
>> I have to ask what it is about your collection that makes this true. If
>> you're using the default composite id routing, having one shard get
>> substantially more queries than the others is unexpected.
>>
>> If you're using implicit routing then it's perfectly understandable.
>>
>> Best,
>> Erick
>>
>> On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal 
>> wrote:
>> > Hi,
>> >
>> > I am looking at Solr 7.x and couldn't find an answer in the
>> > documentation. Is it possible to specify a different replicationFactor
>> > for different shards in the same collection? E.g. if a certain shard is
>> > receiving more queries than the rest of the collection, I would like to
>> > add more replicas for it to help with the query load.
>> >
>> >
>> >
>> > Thanks
>> > Nawab
>>


Re: SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Nawab Zada Asad Iqbal
Thanks Erick


This is for the future. I am exploring using a custom sharding scheme
(which will require modifications to the Solr code) together with the
benefits of SolrCloud.



Thanks
Nawab



On Tue, Jul 31, 2018 at 4:51 PM, Erick Erickson 
wrote:

> Sure, just use the Collections API ADDREPLICA command to add as many
> replicas for specific shards as you want. There's no way to specify
> that at creation time though.
>
> Some of the new autoscaling can do this automatically I believe.
>
> I have to ask what it is about your collection that makes this true. If
> you're using the default composite id routing, having one shard get
> substantially more queries than the others is unexpected.
>
> If you're using implicit routing then it's perfectly understandable.
>
> Best,
> Erick
>
> On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal 
> wrote:
> > Hi,
> >
> > I am looking at Solr 7.x and couldn't find an answer in the
> > documentation. Is it possible to specify a different replicationFactor
> > for different shards in the same collection? E.g. if a certain shard is
> > receiving more queries than the rest of the collection, I would like to
> > add more replicas for it to help with the query load.
> >
> >
> >
> > Thanks
> > Nawab
>


Re: SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Erick Erickson
Sure, just use the Collections API ADDREPLICA command to add as many
replicas for specific shards as you want. There's no way to specify
that at creation time though.
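
For example (collection and shard names hypothetical):

  /admin/collections?action=ADDREPLICA&collection=myColl&shard=shard1

issued once per extra replica you want for the hot shard.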

Some of the new autoscaling can do this automatically I believe.

I have to ask what it is about your collection that makes this true. If
you're using the default composite id routing, having one shard get
substantially more queries than the others is unexpected.

If you're using implicit routing then it's perfectly understandable.

Best,
Erick

On Tue, Jul 31, 2018 at 4:12 PM, Nawab Zada Asad Iqbal  wrote:
> Hi,
>
> I am looking at Solr 7.x and couldn't find an answer in the documentation.
> Is it possible to specify a different replicationFactor for different
> shards in the same collection? E.g. if a certain shard is receiving more
> queries than the rest of the collection, I would like to add more replicas
> for it to help with the query load.
>
>
>
> Thanks
> Nawab


SolrCloud: Different replicationFactor for different shards in same collection

2018-07-31 Thread Nawab Zada Asad Iqbal
Hi,

I am looking at Solr 7.x and couldn't find an answer in the documentation.
Is it possible to specify a different replicationFactor for different
shards in the same collection? E.g. if a certain shard is receiving more
queries than the rest of the collection, I would like to add more replicas
for it to help with the query load.



Thanks
Nawab


Re: sharding and placement of replicas

2018-07-31 Thread Erick Erickson
Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both were on the same Solr instance (same port), that would be
unexpected.

The problem is that this would have been a bug clear back in the Solr 4.x
days, so the fact that you say you saw it on 6.6 would be unexpected.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 
> comment "If this is a provable and reproducible bug, and it's still a problem 
> in the current stable branch"), I had only installed Solr7.4 on one host, and 
> so I was testing with two nodes on the same host (different port numbers). I 
> had previously had the same symptom when the two nodes were on different 
> hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two 
> hosts and report back.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>


Re: Search for a specific unicode char

2018-07-31 Thread Christopher Schultz

To whom it may concern,

On 7/31/18 2:56 PM, tedsolr wrote:
> I'm having some trouble with non-printable, but valid, UTF-8 chars
> when exporting to Amazon Redshift. The export fails but I can't yet
> find this data in my Solr collection. How can I search, say from
> the admin console, for a particular character? I'm looking for
> U+001E and U+001F

Try copy/pasting from e.g.
https://www.fileformat.info/info/unicode/char/001e/browsertest.htm

Or url-decode this string (%1e) here:
https://meyerweb.com/eric/tools/dencoder/

and paste it into your search box.
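
A sketch of the same search against the select endpoint (collection and
field names hypothetical; %1E is U+001E url-encoded):

  curl "http://localhost:8983/solr/mycoll/select?q=some_string_field:*%1E*"

Note this can only match if the field is a string type that preserves
control characters; a tokenized text field may have stripped them at
index time.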

Do you have the source-data for the index? Maybe it's easier to locate
the character in the source-data than in the index.

-chris


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Christopher Schultz

Georg,

On 7/31/18 12:33 PM, Georg Fette wrote:
> Yes, it is only one of the processors that is at maximum capacity.

Ok.

> How do I do something like a thread dump of a single thread?

Here's how to get a thread dump of the whole JVM:
https://wiki.apache.org/tomcat/HowTo#How_do_I_obtain_a_thread_dump_of_my
_running_webapp_.3F

The "tid" field of each thread is usually the same as the process-id
from a "top" or "ps" listing, except it's often shown in hex instead
of decimal.

Have a look at this for some guidance:
http://javadrama.blogspot.com/2012/02/why-is-java-eating-my-cpu.html

Some tools dump the tid in hex, others in decimal. It's frustrating
sometimes.
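
A typical sequence on Linux (pid/tid values hypothetical):

  top -H -p 12345          # list the threads of the Solr JVM, note the hot TID
  printf '%x\n' 12399      # convert that TID to hex -> 306f
  jstack 12345 > dump.txt  # then grep dump.txt for nid=0x306f

The matching stack in the dump is the backtrace of the busy thread.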

> We run Solr from the command line out-of-the-box and not in a
> code development environment. Are there parameters that can be
> configured so that the server creates dumps?

You don't want this to happen automatically. Instead, you'll want to
trigger a dump manually for debugging purposes.

-chris


> On 31.07.2018 at 15:07, Christopher Schultz wrote:
>> Georg,
>>
>> On 7/31/18 4:39 AM, Georg Fette wrote:
>>> We run server version 7.3.1 on a machine with 32GB RAM in a
>>> mode having -Xmx10g.
>>>
>>> When requesting a query with
>>>
>>> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
>>>
>>> the server takes all available memory up to 10GB and is then
>>> no longer accessible with one processor at 100%.
>>
>> Is it a single thread which takes the CPU or more than one? Can
>> you identify that thread and take a thread dump to get a backtrace
>> for that thread?
>>
>>> When we reduce the rows parameter to 1000 the query
>>> works. The query returns only 581 results.
>>>
>>> The documentation at
>>> https://wiki.apache.org/solr/CommonQueryParameters states
>>> that as the "rows" parameter a "ridiculously large value" may
>>> be used, but this could pose a problem. The number we used
>>> was Int.max from Java.
>>
>> Interesting. I wonder if Solr attempts to pre-allocate a result
>> buffer. Requesting 2147483647 rows can have an adverse effect on
>> most pre-allocated data structures.
>>
>> -chris


Re: Search for a specific unicode char

2018-07-31 Thread tedsolr
This is an example of what the data looks like:

  "SOURCEFILEID":"77907",
"APPROP_GROUP_CODE_T":"F\uG\uR",
"APPROP_GROUP_CODE_T_aggr":"F\uG\uR",
"APPROP_GROUP_CODE_T_search":"F\uG\uR",
"OBJECT_DESC_T":"OTHER PROFESSIONAL/TECHNICAL SERVICES",

That's a snippet from the query results. "\u" is a null value. I don't
know why this data presents in this style. I still don't know how to
search for a single unicode character. A search using the value as shown
above does work:
q=APPROP_GROUP_CODE_T:"F\uG\uR"
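
If the immediate goal is just to keep those characters out of the
Redshift export, a minimal pre-export cleanup sketch (assuming the
retrieved field values are post-processed in Python):

  import re

  # U+001E (record separator) and U+001F (unit separator)
  CONTROL_CHARS = re.compile(r"[\x1e\x1f]")

  def clean(value):
      """Strip the separator control characters before export."""
      return CONTROL_CHARS.sub("", value)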





Highlighting the search keywords

2018-07-31 Thread Renuka Srishti
Hi All,

I was using highlighting in Solr. Solr gives highlighting results within
the response, but they are not included within the documents.
Am I missing something? Can I configure it so that it shows the highlighted
keywords matched within the documents?

Thanks
Renuka Srishti


RE: sharding and placement of replicas

2018-07-31 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 
comment "If this is a provable and reproducible bug, and it's still a problem 
in the current stable branch"), I had only installed Solr7.4 on one host, and 
so I was testing with two nodes on the same host (different port numbers). I 
had previously had the same symptom when the two nodes were on different hosts, 
but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and 
report back.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, July 31, 2018 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: sharding and placement of replicas

On 7/27/2018 8:26 PM, Erick Erickson wrote:
> Yes with some fiddling as far as "placement rules", start here:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> The idea (IIUC) is that you provide a "snitch" that identifies what
> "rack" the Solr instance is on and can define placement rules that
> define "don't put more than one thingy on the same rack". "Thingy"
> here is replica, shard, whatever as defined by other placement rules.

I'd like to see an improvement in Solr's behavior when nothing has been
configured in auto-scaling or rule-based replica placement.  Configuring
those things is certainly an option, but I think we can do better even
without that config.

I believe that Solr already has some default intelligence that keeps
multiple replicas from ending up on the same *node* when possible ... I
would like this to also be aware of *hosts*.

Craig hasn't yet indicated whether there is more than one node per host,
so I don't know whether the behavior he's seeing should be considered a bug.

If somebody gives one machine multiple names/addresses and uses
different hostnames in their SolrCloud config for one actual host, then
it wouldn't be able to do any better than it does now, but if there are
matches in the hostname part of different entries in live_nodes, then I
think the improvement might be relatively easy.  Not saying that I know
what to do, but somebody who is familiar with the Collections API code
can probably do it.

Thanks,
Shawn



Search for a specific unicode char

2018-07-31 Thread tedsolr
I'm having some trouble with non-printable, but valid, UTF-8 chars when
exporting to Amazon Redshift. The export fails but I can't yet find this
data in my Solr collection. How can I search, say from the admin console,
for a particular character? I'm looking for U+001E and U+001F

thanks!
Solr 5.5.4





Re: sharding and placement of replicas

2018-07-31 Thread Shawn Heisey
On 7/27/2018 8:26 PM, Erick Erickson wrote:
> Yes with some fiddling as far as "placement rules", start here:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> The idea (IIUC) is that you provide a "snitch" that identifies what
> "rack" the Solr instance is on and can define placement rules that
> define "don't put more than one thingy on the same rack". "Thingy"
> here is replica, shard, whatever as defined by other placement rules.

I'd like to see an improvement in Solr's behavior when nothing has been
configured in auto-scaling or rule-based replica placement.  Configuring
those things is certainly an option, but I think we can do better even
without that config.

I believe that Solr already has some default intelligence that keeps
multiple replicas from ending up on the same *node* when possible ... I
would like this to also be aware of *hosts*.

Craig hasn't yet indicated whether there is more than one node per host,
so I don't know whether the behavior he's seeing should be considered a bug.

If somebody gives one machine multiple names/addresses and uses
different hostnames in their SolrCloud config for one actual host, then
it wouldn't be able to do any better than it does now, but if there are
matches in the hostname part of different entries in live_nodes, then I
think the improvement might be relatively easy.  Not saying that I know
what to do, but somebody who is familiar with the Collections API code
can probably do it.

Thanks,
Shawn
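
For reference, the rule-based placement mentioned in the quoted text is
passed at collection-creation time; a sketch using the 6.6-era syntax
(collection name hypothetical; the rule itself is the ref guide's "at
most one replica per node" example):

  /admin/collections?action=CREATE&name=myColl&numShards=2
      &replicationFactor=2&rule=replica:<2,node:*

A custom snitch can additionally expose a "rack" (or "host") tag so the
same kind of rule can keep replicas off nodes that share a machine.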



Fuzzy search 'sometimes' not working

2018-07-31 Thread David Frese

Hello list,

I recently observed some very strange behaviour of fuzzy searches with
SolrCloud 5.5.0.


I have two identical documents in 2 different collections. Something 
like {name: "Tomas"}. I find the document in the first collection with a 
search like name:Thomass~2. But I don't find it in the second one!


I triple-checked everything (I find them both with name:Tomas, I find
them both with name:Thomas~1, and the solrconfig and schemas are
identical), but I just don't see any reasonable explanation for it.


Could it be that the functionality of fuzzy searching depends on the 
data of other documents in the collection; like a limit of how many 
"Thomas"'s there could be? Or on the amount of memory available? Could 
some race condition during indexing have removed the "Thomass" variant 
in one case? Anything non-deterministic? Any bug in that direction fixed 
since 5.5.0?



Thanks a lot for any ideas,
David

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Georg Fette

Hi Christoph,
Yes, it is only one of the processors that is at maximum capacity.
How do I do something like a thread dump of a single thread? We run
Solr from the command line out-of-the-box and not in a code development
environment. Are there parameters that can be configured so that the
server creates dumps?
Greetings
Georg

On 31.07.2018 at 15:07, Christopher Schultz wrote:


Georg,

On 7/31/18 4:39 AM, Georg Fette wrote:

We run server version 7.3.1 on a machine with 32GB RAM in a
mode having -Xmx10g.

When requesting a query with

q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647

the server takes all available memory up to 10GB and is then no
longer accessible with one processor at 100%.

Is it a single thread which takes the CPU or more than one? Can you
identify that thread and take a thread dump to get a backtrace for
that thread?


When we reduce the rows parameter to 1000 the query works. The
query returns only 581 results.

The documentation at
https://wiki.apache.org/solr/CommonQueryParameters states that as
the "rows" parameter a "ridiculously large value" may be used, but
this could pose a problem. The number we used was Int.max from
Java.

Interesting. I wonder if Solr attempts to pre-allocate a result
buffer. Requesting 2147483647 rows can have an adverse effect on most
pre-allocated data structures.

-chris



--
-
Dipl.-Inf. Georg Fette  Raum: B009
Universität WürzburgTel.: +49-(0)931-31-85516
Am Hubland  Fax.: +49-(0)931-31-86732
97074 Würzburg  mail: georg.fe...@uni-wuerzburg.de
-



Re: Zookeeper / Solr interaction

2018-07-31 Thread Erick Erickson
Ok, your OOM errors are most likely due to
trying to stuff too many replicas into too little memory.

You have 100 collections, 8 shards per collection and
1 replica per shard. So if my math is right, you have
800 replicas total, 400 replicas per Solr instance.

6G of memory is very little for that many replicas, and
any significant number of docs in these collections will
make matters worse. Anything to do with sorting,
faceting or grouping that works on fields that do not
use docValues will make it even worse yet.

All that said, I've certainly seen many more replicas
than that operate with a 3-node external ZK ensemble so
I doubt it's the case you're overloading ZK. There are
some additional multipliers to the number of ZK
events having to do with the "overseer".

My best guess is that your focus on ZK events is a red
herring, and that if you distributed your replicas over more
Solr nodes and/or were able to allocate significantly more
memory to each node the problem would go away.

Note that the usual recommendation is that you allocate no
more than 50% of your available physical memory to
the Java heap.
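
As a rough sketch (numbers hypothetical): on a host with 32G of physical
memory, that would mean starting each node with something like

  bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 -m 16g

leaving the other half of the RAM to the OS page cache that Lucene
depends on.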

Best,
Erick


On Tue, Jul 31, 2018 at 8:06 AM, Zarski, Jacek  wrote:
> Thanks for responding! That's some good info. Here are the answers to the 
> questions you had...
>
> Solr has 6gb of heap
> We have 1 replica per shard at 8 shards per collection
> We currently have approximately 100 collections
> Zookeeper is an external ensemble each with their own server
>
> -Original Message-
> From: Erick Erickson 
> Sent: Monday, July 30, 2018 7:56 PM
> To: solr-user 
> Subject: Re: Zookeeper / Solr interaction
>
> 1> Yeah, the interactions with ZK are quite chatty. Basically each
> replica may have several changes of state with ZooKeeper.
> Down->recovering->active. How many replicas do you have on a node?
>
> 2> Unfortunately I don't have much info on this point.
>
> 3> I would not expect OOMs on the Solr node while waiting for ZK to
> respond. How much heap are you allocating to Solr? How many replicas do you 
> have?
>
> a> Plausible yes, but that many transactions seems quite high. How
> many replicas do you have on your Solr instance? One scenario here is that 
> you have thousands of replicas. Is your ZK ensemble an external one? And are 
> they running on separate hardware? Because this many transactions for only 
> two Solr instances seems quite high so I'm curious about a few more details 
> of your setup, how many collections, shards and replicas are we talking here?
>
> Best,
> Erick
>
>
> On Mon, Jul 30, 2018 at 1:10 PM, Zarski, Jacek  
> wrote:
>> Some information I forgot to include:
>> Solr version : 7.2.1
>> Zk version : 3.4.10
>>
>> -Original Message-
>> From: Zarski, Jacek 
>> Sent: Monday, July 30, 2018 4:06 PM
>> To: solr-user@lucene.apache.org
>> Subject: Zookeeper / Solr interaction
>>
>> Hi,
>>
>> We have the following environment setup for ZooKeeper/SolrCloud:
>>
>> a 3-node ZooKeeper ensemble
>> 2 SolrCloud servers
>>
>> I am writing to inquire further about the interaction of Solr and
>> ZooKeeper, in particular relating to transactions in the transaction logs.
>> I have a script running that logs the number of transactions. I am matching
>> this log with snapshot timing and new log creation.
>>
>> After a problem arose in our PROD environment, I tracked it to an
>> unrecommended configuration where logs and data were kept on the same drive.
>> Since then we have configured separate drives for logs and data in that
>> environment. The behavior that caused the problem was that when a snapshot
>> was happening, a Solr instance reported that it was unable to establish a ZK
>> leader. Following that failure, during recovery, 4 more snapshots happened
>> in short succession (10 minutes) on all 3 zk servers, causing the whole
>> environment to be unresponsive for 1.5 hours until restart.
>>
>> I am currently working to recreate the problem and gather more information
>> on the cause and impact of snapshots. I have configured a DEV environment
>> with the same number of servers. I have changed the zk configuration to
>> again have the logs and data on the same drive and directory. I am seeing
>> that snapshots cause a degradation in performance due to IO blocking, but I
>> would like more information on transactions and snapshots to confirm this
>> behavior and our suspicions.
>>
>> Here are the scenarios I would like more information about:
>>
>> 1.   When the solr server is restarted, I see a huge influx of 
>> transactions on the zookeeper transaction log. What is the solr behavior 
>> that is causing this and is this normal?
>>
>> 2.   There are scenarios where snapshots are being created without
>> reaching "snapCount" (snapCount=10) transactions. I have documented
>> snapshots at 17k and 45k transactions. In what scenarios would a snapshot be
>> created other than reaching "snapCount" transactions?
>>
>> 3.   Since zk won't respond before writing to the transaction log... at
>> snapshot time (IO block) is it possible for the solr server to wait for a
>> response from zk, causing all other writes to be buffered, resulting in a
>> full heap and therefore an out-of-memory failure on the solr node?

RE: Zookeeper / Solr interaction

2018-07-31 Thread Zarski, Jacek
Thanks for responding! That's some good info. Here are the answers to the 
questions you had...

Solr has 6gb of heap
We have 1 replica per shard at 8 shards per collection 
We currently have approximately 100 collections
Zookeeper is an external ensemble each with their own server

-Original Message-
From: Erick Erickson  
Sent: Monday, July 30, 2018 7:56 PM
To: solr-user 
Subject: Re: Zookeeper / Solr interaction

1> Yeah, the interactions with ZK are quite chatty. Basically each
replica may have several changes of state with ZooKeeper.
Down->recovering->active. How many replicas do you have on a node?

2> Unfortunately I don't have much info on this point.

3> I would not expect OOMs on the Solr node while waiting for ZK to
respond. How much heap are you allocating to Solr? How many replicas do you 
have?

a> Plausible yes, but that many transactions seems quite high. How
many replicas do you have on your Solr instance? One scenario here is that you 
have thousands of replicas. Is your ZK ensemble an external one? And are they 
running on separate hardware? Because this many transactions for only two Solr 
instances seems quite high so I'm curious about a few more details of your 
setup, how many collections, shards and replicas are we talking here?

Best,
Erick


On Mon, Jul 30, 2018 at 1:10 PM, Zarski, Jacek  wrote:
> Some information I forgot to include:
> Solr version : 7.2.1
> Zk version : 3.4.10
>
> -Original Message-
> From: Zarski, Jacek 
> Sent: Monday, July 30, 2018 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Zookeeper / Solr interaction
>
> Hi,
>
We have the following environment setup for ZooKeeper/SolrCloud:

a 3-node ZooKeeper ensemble
2 SolrCloud servers
>
> I am writing to inquire further about the interaction of Solr and
> ZooKeeper, in particular relating to transactions in the transaction logs.
> I have a script running that logs the number of transactions. I am matching
> this log with snapshot timing and new log creation.
>
> After a problem arose in our PROD environment, I tracked it to an
> unrecommended configuration where logs and data were kept on the same drive.
> Since then we have configured separate drives for logs and data in that
> environment. The behavior that caused the problem was that when a snapshot
> was happening, a Solr instance reported that it was unable to establish a ZK
> leader. Following that failure, during recovery, 4 more snapshots happened
> in short succession (10 minutes) on all 3 zk servers, causing the whole
> environment to be unresponsive for 1.5 hours until restart.
>
> I am currently working to recreate the problem and gather more information
> on the cause and impact of snapshots. I have configured a DEV environment
> with the same number of servers. I have changed the zk configuration to
> again have the logs and data on the same drive and directory. I am seeing
> that snapshots cause a degradation in performance due to IO blocking, but I
> would like more information on transactions and snapshots to confirm this
> behavior and our suspicions.
>
> Here are the scenarios I would like more information about:
>
> 1.   When the solr server is restarted, I see a huge influx of 
> transactions on the zookeeper transaction log. What is the solr behavior that 
> is causing this and is this normal?
>
> 2.   There are scenarios where snapshots are being created without
> reaching "snapCount" (snapCount=10) transactions. I have documented
> snapshots at 17k and 45k transactions. In what scenarios would a snapshot be
> created other than reaching "snapCount" transactions?
>
> 3.   Since zk won't respond before writing to the transaction log... at
> snapshot time (IO block) is it possible for the solr server to wait for a
> response from zk, causing all other writes to be buffered, resulting in a full
> heap and therefore an out-of-memory failure on the solr node?
>
> a.   Now referencing question #1... When a solr node recovers, the influx 
> of transactions plus the continuing writes seems to be enough to trigger 
> another snapshot resulting in further downtime. Is this case plausible?
>
> Thanks,
> Jacek
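
For reference, the log/data separation described above is controlled in
zoo.cfg; a sketch (paths hypothetical):

  dataDir=/var/zookeeper/data         # snapshots
  dataLogDir=/disk2/zookeeper/txlog   # transaction log, ideally its own drive
  snapCount=100000                    # default

On question 2: ZooKeeper deliberately snapshots after a *random* number
of transactions in the range [snapCount/2+1, snapCount], so that the
servers in an ensemble don't all snapshot at once; snapshots below the
configured snapCount are therefore expected.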


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Shawn Heisey

On 7/31/2018 2:39 AM, Georg Fette wrote:
We run server version 7.3.1 on a machine with 32GB RAM in a mode
having -Xmx10g.


When requesting a query with

q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647


the server takes all available memory up to 10GB and is then no longer 
accessible with one processor at 100%.


When we reduce the rows parameter to 1000 the query works. The 
query returns only 581 results.


This is happening because of the way that Solr prepares for searching.  
Objects are allocated in heap memory according to the rows value before 
the query even gets executed.  If you run Solr on an operating system 
other than Windows, the resulting OutOfMemoryError will cause the Solr 
process to be killed.  If it's running on Windows, Solr would stay 
running, but we have no way of knowing whether it would work *correctly* 
after OOME.


https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_too_many_rows

At the link above is a link to a blog post that covers the problem in 
great detail.


https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/

With a rows parameter of over 2 billion, Solr (actually it's Lucene, 
which provides most of Solr's functionality) will allocate that many 
ScoreDoc objects, which needs about 60GB of heap memory.  So it's not 
possible on your hardware.
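
Rough arithmetic behind that number (per-object size is an estimate): a
ScoreDoc carries an int doc id, a float score and an int shard index,
which with the object header comes to roughly 28 bytes on a 64-bit JVM,
so 2,147,483,647 x ~28 bytes is on the order of 60GB before a single
document is fetched.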


As you'll see if you read the blog post, Toke has some ideas about how 
to improve the situation.  I don't think an issue has been filed, but I 
could be wrong about that.


Right now, switching to cursorMark or the /export handler is a better 
way to get a very large result set.


Thanks,
Shawn



Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Christopher Schultz

Georg,

On 7/31/18 4:39 AM, Georg Fette wrote:
> We run server version 7.3.1 on a machine with 32GB RAM in a
> mode having -Xmx10g.
> 
> When requesting a query with
> 
> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
> 
> the server takes all available memory up to 10GB and is then no
> longer accessible with one processor at 100%.

Is it a single thread which takes the CPU or more than one? Can you
identify that thread and take a thread dump to get a backtrace for
that thread?

> When we reduce the rows parameter to 1000 the query works. The
> query returns only 581 results.
> 
> The documentation at
> https://wiki.apache.org/solr/CommonQueryParameters states that as
> the "rows" parameter a "ridiculously large value" may be used, but
> this could pose a problem. The number we used was Int.max from 
> Java.

Interesting. I wonder if Solr attempts to pre-allocate a result
buffer. Requesting 2147483647 rows can have an adverse effect on most
pre-allocated data structures.

-chris


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Andrea Gazzarini
Yes, but 581 is the final number you got in the response, which is the
result of the main query intersected with the filter query, so I wouldn't
take this number into account. The main and the filter query are executed
separately, so I guess (but I'm guessing, because I don't know these
internals) that's where the "rows" parameter matters.


Again, I'm guessing; I'm sure some Solr committer here can explain
how things work.


Best,
Andrea

On 31/07/18 11:12, Fette, Georg wrote:

Hi Andrea,
I agree that receiving too much data in one request is bad. But I was 
surprised that the query works with a lower but still very large rows 
parameter and that there is a threshold at which it crashes the 
server. Furthermore, it seems that the reason for the crash is not the 
size of the actual results because those are only 581.

Greetings
Georg

On 31.07.2018 at 10:53, Andrea Gazzarini wrote:

Hi Georg,
I would say, without knowing your context, that this is not what Solr 
is supposed to do. You're asking to load everything in a single 
request/response and this poses a problem.
Since I guess that, even if we assume it works, you should then iterate
those results one by one or in blocks, an option would be to do this 
part (block scrolling) using Solr [2].

I suggest you to have a look at

 * the export endpoint [1]
 * the cursor API [2]

Best,
Andrea

[1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html
[2] 
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors


On 31/07/18 10:44, Georg Fette wrote:

Hello,
We run server version 7.3.1 on a machine with 32GB RAM in a
mode having -Xmx10g.

When requesting a query with
q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
the server takes all available memory up to 10GB and is then no 
longer accessible with one processor at 100%.
When we reduce the rows parameter to 1000 the query works. The 
query returns only 581 results.
The documentation at 
https://wiki.apache.org/solr/CommonQueryParameters states that as 
the "rows" parameter a "ridiculously large value" may be used, but 
this could pose a problem. The number we used was Int.max from Java.

Greetings
Georg










NullPointerException in SolrMetricManager

2018-07-31 Thread Hendrik Haddorp

Hi,

we are seeing the following NPE sometimes when we delete a collection 
right after we modify the schema:


08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.ManagedResource 209 processStoredData - Loaded 
initArgs {ignoreCase=true} for /schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.schema.analysis.ManagedWordSetResource 116 
onManagedDataLoadedFromStorage - Loaded 119 words for 
/schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.ManagedResource 117 notifyObserversDuringInit - 
Notified 8 observers of /schema/analysis/stopwords/text_ar
08:47:46.407 [zkCallback-5-thread-4] INFO 
org.apache.solr.rest.RestManager 668 addRegisteredResource - Registered 
new managed resource /schema/analysis/stopwords/text_ar
08:47:46.408 [zkCallback-5-thread-4] INFO 
org.apache.solr.schema.IndexSchema 592 readSchema - Loaded schema 
solr-config/1.6 with uniqueid field id
08:47:46.408 [zkCallback-5-thread-4] INFO 
org.apache.solr.schema.ZkIndexSchemaReader 177 updateSchema - Finished 
refreshing schema in 411 ms
08:47:46.415 [qtp254749889-20] INFO  org.apache.solr.core.SolrCore 1517 
close - [donald.test-query-1533026857986_shard1_replica_n1] CLOSING 
SolrCore org.apache.solr.core.SolrCore@62ef7f0c
08:47:46.415 [qtp254749889-20] INFO 
org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing 
metric reporters for 
registry=solr.core.donald.test-query-1533026857986.shard1.replica_n1, 
tag=62ef7f0c
08:47:46.416 [qtp254749889-20] INFO 
org.apache.solr.metrics.SolrMetricManager 1038 closeReporters - Closing 
metric reporters for 
registry=solr.collection.donald.test-query-1533026857986.shard1.leader, 
tag=62ef7f0c
08:47:46.416 [Thread-20] INFO 
org.apache.solr.metrics.reporters.SolrJmxReporter 112 doInit - JMX 
monitoring for 
'solr.core.donald.test-query-1533026857986.shard1.replica_n1' (registry 
'solr.core.donald.test-query-1533026857986.shard1.replica_n1') enabled 
at server: com.sun.jmx.mbeanserver.JmxMBeanServer@2698dc7
08:47:46.417 [Thread-20] WARN  org.apache.solr.cloud.ZkController 2689 
lambda$fireEventListeners$6 - listener throws error 
org.apache.solr.common.SolrException: Unable to reload core 
[donald.test-query-1533026857986_shard1_replica_n1]
 at 
org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1411) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.core.SolrCore.lambda$getConfListener$20(SolrCore.java:3029) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.cloud.ZkController.lambda$fireEventListeners$6(ZkController.java:2687) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]

 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NullPointerException
 at 
org.apache.solr.metrics.SolrMetricManager.loadShardReporters(SolrMetricManager.java:1146) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.metrics.SolrCoreMetricManager.loadReporters(SolrCoreMetricManager.java:92) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:909) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at org.apache.solr.core.SolrCore.reload(SolrCore.java:663) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]
 at 
org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1390) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - 
jpountz - 2018-06-18 16:55:13]

 ... 3 more

regards,
Hendrik


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Fette, Georg

Hi Andrea,
I agree that receiving too much data in one request is bad. But I was 
surprised that the query works with a lower but still very large rows 
parameter and that there is a threshold at which it crashes the server. 
Furthermore, it seems that the reason for the crash is not the size of 
the actual results because those are only 581.

Greetings
Georg

On 31.07.2018 at 10:53, Andrea Gazzarini wrote:

Hi Georg,
I would say, without knowing your context, that this is not what Solr 
is supposed to do. You're asking to load everything in a single 
request/response and this poses a problem.
Since I guess that, even if we assume it works, you should then iterate
those results one by one or in blocks, an option would be to do this 
part (block scrolling) using Solr [2].

I suggest you to have a look at

 * the export endpoint [1]
 * the cursor API [2]

Best,
Andrea

[1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html
[2] 
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors


On 31/07/18 10:44, Georg Fette wrote:

Hello,
We run server version 7.3.1 on a machine with 32GB RAM in a mode
having -Xmx10g.

When requesting a query with
q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
the server takes all available memory up to 10GB and is then no 
longer accessible with one processor at 100%.
When we reduce the rows parameter to 1000 the query works. The 
query returns only 581 results.
The documentation at 
https://wiki.apache.org/solr/CommonQueryParameters states that as the 
"rows" parameter a "ridiculously large value" may be used, but this 
could pose a problem. The number we used was Int.max from Java.

Greetings
Georg






--
-
Dipl.-Inf. Georg Fette  Raum: B009
Universität WürzburgTel.: +49-(0)931-31-85516
Am Hubland  Fax.: +49-(0)931-31-86732
97074 Würzburg  mail: georg.fe...@uni-wuerzburg.de
-



Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Georg Fette
We run server version 7.3.1 on a machine with 32GB RAM in a mode
having -Xmx10g.


When requesting a query with

q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647


the server takes all available memory up to 10GB and is then no longer 
accessible with one processor at 100%.


When we reduce the rows parameter to 1000 the query works. The query 
returns only 581 results.


The documentation at https://wiki.apache.org/solr/CommonQueryParameters 
states that as the "rows" parameter a "ridiculously large value" may be 
used, but this could pose a problem. The number we used was Int.max from 
Java.


Greetings
Georg

--
-
Dipl.-Inf. Georg Fette  Raum: B001
Universität WürzburgTel.: +49-(0)931-31-85516
Am Hubland  Fax.: +49-(0)931-31-86732
97074 Würzburg  mail: georg.fe...@uni-wuerzburg.de
-



Is system-load based auto-scaling supported in latest Solr version

2018-07-31 Thread Xiaoming Ma
Hi, there

From the Solr 7.x docs I know auto-scaling is triggered by the number of
replicas; I just want to know if I can achieve auto-scaling based on
system load dynamically.

Appreciate your reply.

Thanks,
Xiaoming


Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Andrea Gazzarini

Hi Georg,
I would say, without knowing your context, that this is not what Solr is 
supposed to do. You're asking to load everything in a single 
request/response and this poses a problem.
Since I guess that, even if we assume it works, you should then iterate
those results one by one or in blocks, an option would be to do this 
part (block scrolling) using Solr [2].

I suggest you to have a look at

 * the export endpoint [1]
 * the cursor API [2]

Best,
Andrea

[1] https://lucene.apache.org/solr/guide/6_6/exporting-result-sets.html
[2] 
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
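
A sketch of the export endpoint (collection and field names hypothetical):

  /solr/mycoll/export?q=*:*&sort=id asc&fl=id,name

Note that /export streams the entire sorted result set, but the fl and
sort fields must have docValues.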


On 31/07/18 10:44, Georg Fette wrote:

Hello,
We run server version 7.3.1 on a machine with 32GB RAM in a mode
having -Xmx10g.

When requesting a query with
q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
the server takes all available memory up to 10GB and is then no longer 
accessible with one processor at 100%.
When we reduce the rows parameter to 1000 the query works. The 
query returns only 581 results.
The documentation at 
https://wiki.apache.org/solr/CommonQueryParameters states that as the 
"rows" parameter a "ridiculously large value" may be used, but this 
could pose a problem. The number we used was Int.max from Java.

Greetings
Georg





RE: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Markus Jelsma
Hello Georg,

As you have seen, a high rows parameter is a bad idea. Use cursor mark [1] 
instead.

Regards,
Markus

[1] https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html
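
A sketch of the cursor-mark loop (collection and field names hypothetical):

  /solr/mycoll/select?q=*:*&sort=id asc&rows=1000&cursorMark=*

then pass the nextCursorMark value from each response as the cursorMark
of the next request, until nextCursorMark stops changing. The sort must
include the uniqueKey field.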
 
 
-Original message-
> From:Georg Fette 
> Sent: Tuesday 31st July 2018 10:44
> To: solr-user@lucene.apache.org
> Subject: Solr Server crashes when requesting a result with too large 
> resultRows
> 
> Hello,
> We run server version 7.3.1 on a machine with 32GB RAM in a mode
> having -Xmx10g.
> When requesting a query with
> q={!boost 
> b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
> the server takes all available memory up to 10GB and is then no longer 
> accessible with one processor at 100%.
> When we reduce the rows parameter to 1000 the query works. The query 
> returns only 581 results.
> The documentation at https://wiki.apache.org/solr/CommonQueryParameters 
> states that as the "rows" parameter a "ridiculously large value" may be 
> used, but this could pose a problem. The number we used was Int.max from 
> Java.
> Greetings
> Georg
> 
> -- 
> -
> Dipl.-Inf. Georg Fette  Raum: B001
> Universität WürzburgTel.: +49-(0)931-31-85516
> Am Hubland  Fax.: +49-(0)931-31-86732
> 97074 Würzburg  mail: georg.fe...@uni-wuerzburg.de
> -
> 
> 


Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Georg Fette

Hello,
We run server version 7.3.1 on a machine with 32GB RAM in a mode
having -Xmx10g.

When requesting a query with
q={!boost 
b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)&fq=string_field_type:catalog_entry&rows=2147483647
the server takes all available memory up to 10GB and is then no longer 
accessible with one processor at 100%.
When we reduce the rows parameter to 1000 the query works. The query 
returns only 581 results.
The documentation at https://wiki.apache.org/solr/CommonQueryParameters 
states that as the "rows" parameter a "ridiculously large value" may be 
used, but this could pose a problem. The number we used was Int.max from 
Java.

Greetings
Georg

--
-
Dipl.-Inf. Georg Fette  Raum: B001
Universität WürzburgTel.: +49-(0)931-31-85516
Am Hubland  Fax.: +49-(0)931-31-86732
97074 Würzburg  mail: georg.fe...@uni-wuerzburg.de
-