Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-10 Thread Hup Chen

There was another error which I think should be an indexing error.
The listprice below is a pdouble field; the update chain didn't ignore the 
error when it was sent bad data.

Response: {
  "responseHeader":{
    "status":400,
    "QTime":133551},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
    "code":400}}

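For reference, a tolerant update chain is typically declared along these lines in
solrconfig.xml (a sketch following the ref guide example; the chain contents shown
here are illustrative, not necessarily the poster's actual config):

<updateRequestProcessorChain name="tolerant-chain">
  <!-- catches per-document indexing errors, up to maxErrors, and lists
       them in the response header instead of failing the whole batch -->
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">100</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>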


From: Shawn Heisey 
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org 
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

On 6/9/2020 12:44 AM, Hup Chen wrote:
> Thanks for your reply, this is one of the examples where it fails.  POSTing 
> with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error 
> found in the title field.  I hope solr can simply skip this record and go 
> ahead and index the rest of the data.
>
> <add>
> <doc>
>   <field name="isbn13">9780373773244</field>
>   <field name="id">9780373773244</field>
>   <field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance)</field>
>   <field name="author">Lisa_Jackson</field>
> </doc>
> </add>
>
> curl 
> "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100"
>  -H 'Content-Type: text/xml; charset=utf-8' -d @data
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="maxErrors">100</int>
>   <int name="status">400</int>
>   <int name="QTime">0</int>
> </lst>
> <lst name="error">
>   <lst name="metadata">
>     <str name="error-class">org.apache.solr.common.SolrException</str>
>     <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
>   </lst>
>   <str name="msg">Illegal character ((CTRL-CHAR, code 26))
>  at [row,col {unknown-source}]: [1,225]</str>
>   <int name="code">400</int>
> </lst>
> </response>

I tried your example XML as it is shown in your original message, saved
to a file named "foo.xml", and didn't have any trouble.  I wasn't even
using the tolerant update processor.   I just fired up the techproducts
example on a solr-8.3.0 download I already had, added a field named
"isbn13" (string type) so the schema was compatible, and tried the
following command:

curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type:
text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by
an actual Ctrl-Z character.  When I did that, I got exactly the same
error you did.

A Ctrl-Z character (ascii code 26) is *NOT* a valid character for XML,
which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format
of the input ... it only ignores errors during *indexing*.  This error
occurred during the input parsing, not during indexing, so the update
processor could not ignore it.
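
If the source data can contain stray control characters, one workaround is to 
clean them out client-side before posting, since the XML parser rejects the 
document before the update chain ever runs. A minimal sketch (my own 
illustration, not code from this thread):

public final class XmlSanitizer {
    // Keeps only characters that are valid in XML 1.0:
    // 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) {
                out.appendCodePoint(cp);   // keep the valid character
            }                              // else drop it, e.g. Ctrl-Z (code 26)
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}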

Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-10 Thread Shawn Heisey

On 6/10/2020 12:13 PM, Ryan W wrote:

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?


Are you running Solr on Windows?   If you are, then a Java OOME will NOT 
cause Solr to stop.  On pretty much any other operating system, Solr 
will terminate when OOME occurs.  This termination will create a 
separate logfile that contains very little actual information; 
really the only thing it says is that the oom killer script was 
executed.  That logfile will have a filename like the following:


solr_oom_killer-8983-2019-08-11_22_57_56.log

If OOME is the reason Solr stops running, then the only place that 
exception will be logged is solr.log as far as I know ... but there 
exists a very real possibility that it won't actually be logged.  It 
could occur at a place in the code that does not have any logging.


At the URL below is an example of a logged OOME on a Solr server.  In 
this case, it wasn't memory that was exhausted; the error was logging an 
inability to start a new thread:


https://paste.apache.org/aznyg

Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-10 Thread Hup Chen
I would check "dmesg" first, to look for any hardware error messages.
Then use some system admin tools to monitor that server,
for instance top, vmstat, lsof, iostat ... or simply install a nice
free monitoring tool on the system, like monit, monitorix, or nagios.
Good luck!


From: Ryan W 
Sent: Thursday, June 11, 2020 2:13 AM
To: solr-user@lucene.apache.org 
Subject: Re: How to determine why solr stops running?

Hi all,

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that
I can see.  There is no apparent problem with the hardware or anything else
on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr
is running, or does it always start solr whether it is already running or
not?

Many thanks!
Ryan


On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson 
wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave  wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> been a hardware failure. Either the ram or the disk got a “glitch” and both
> of these are relatively fragile and wear and tear type parts of the
> machine, and should be expected to fail and be replaced from time to time.
> Solr is pretty aggressive with its logging so there are a lot of writes
> always happening and of course reads, if the disk has any issues or the
> memory it can lock it up and bring her down, more so if you have any
> spellcheck dictionaries or suggesters being built on start up.
> >
> > Just my experience with this, could be wrong (most likely wrong) but we
> always have extra drives and memory around the server room for this
> reason.  At least once or twice a year we will have a disk failure in the
> raid and need to swap in a new one.
> >
> > Good luck though, also solr should be logging it’s failures so it would
> be good to look there too
> >
> >> On Jun 9, 2020, at 2:35 AM, Shawn Heisey  wrote:
> >>
> >> On 5/14/2020 7:22 AM, Ryan W wrote:
> >>> I manage a site where solr has stopped running a couple times in the
> past
> >>> week. The server hasn't been rebooted, so that's not the reason.  What
> else
> >>> causes solr to stop running?  How can I investigate why this is
> happening?
> >>
> >> Any situation where Solr stops running and nobody requested the stop is
> a result of a serious problem that must be thoroughly investigated.  I
> think it's a bad idea for Solr to automatically restart when it stops
> unexpectedly.  Chances are that whatever caused the crash is going to
> simply make the crash happen again until the problem is solved.
> Automatically restarting could hide problems from the system administrator.
> >>
> >> The only way a Solr auto-restart would be acceptable to me is if it
> sends a high priority alert to the sysadmin EVERY time it executes an
> auto-restart.  It really is that bad of a problem.
> >>
> >> The causes of Solr crashes (that I can think of) include the following.
> I believe I have listed these four options from most likely to least likely:
> >>
> >> * Java OutOfMemoryError exceptions.  On non-windows systems, the
> "bin/solr" script starts Solr with an option that results in Solr's death
> anytime one of these exceptions occurs.  We do this because program
> operation is indeterminate and completely unpredictable when OOME occurs,
> so it's far safer to stop running.  That exception can be caused by several
> things, some of which actually do not involve memory at all.  If you're
> running on Windows via the bin\solr.cmd command, then this will not happen
> ... but OOME could still cause a crash, because as I already mentioned,
> program operation is unpredictable when OOME occurs.
> >>
> >> * The OS kills Solr because system memory is completely exhausted and
> Solr is the process using the most memory.  Linux calls this the
> "oom-killer" ... I am pretty sure something like it exists on most
> operating systems.
> >>
> >> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
> or data used by any of those.
> >>
> >> * A very serious bug in Solr's code that we haven't discovered yet.
> >>
> >> I included that last one simply for completeness.  A bug that causes a
> >> crash *COULD* exist, but as of right now, we have not seen any supporting
> >> evidence.

Re: [EXTERNAL] - SolR OOM error due to query injection

2020-06-10 Thread Isabelle Giguere
Hi Guilherme;

The only thing I can think of right now is the number of non-alphanumeric 
characters.

In the first 'q' in your examples, after resolving the character escapes, 1/3 
of the characters are non-alphanumeric (* / = , etc).

Maybe filter out queries that contain too many non-alphanumeric characters 
before sending the request to Solr ?  Whatever "too many" could be.
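
A rough sketch of that idea (my own illustration; the threshold is an 
assumption you would have to tune against legitimate traffic):

public final class QuerySanityCheck {
    // True when more than maxRatio of the characters are neither
    // letters/digits nor whitespace -- a crude injection heuristic.
    public static boolean looksLikeInjection(String q, double maxRatio) {
        if (q == null || q.isEmpty()) return false;
        long special = q.chars()
                .filter(c -> !Character.isLetterOrDigit(c)
                          && !Character.isWhitespace(c))
                .count();
        return (double) special / q.length() > maxRatio;
    }
}

Usage would be something like rejecting the request up front, e.g. 
if (looksLikeInjection(decodedQ, 0.33)) return a 400 before calling Solr.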

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Guilherme Viteri 
Sent: Wednesday, June 10, 2020 4:57 PM
To: solr-user@lucene.apache.org 
Subject: [EXTERNAL] - SolR OOM error due to query injection

Hi,

Environment: SolR 6.6.2, with org.apache.solr.solr-core:6.1.0. This setup has 
been running for at least 4 years without an OutOfMemory error. (it is 
never too late for an OOM…)

This week, our search tool has been attacked with 'sql injection'-like 
requests, and that led to an OOM. These requests weren't aggressive in the 
sense of stressing the server with an excessive number of hits; however, 5 to 
10 requests of this nature were enough to crash the server.

I’ve come across this link 
https://stackoverflow.com/questions/26862474/prevent-from-solr-query-injections-when-using-solrj, 
however, that’s not what I am after. In our case we do allow lucene queries 
and field searches like title:Title, and our ids contain dashes, so if they 
get escaped, then the search won’t work properly.

Does anyone have an idea ?

Cheers
G

Here are some of the requests that appeared in the logs in relation to the 
attack (see below: sorry it is messy)
query?q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F2%2A%28IF%28%28SELECT%2F%2A%2A%2F%2A%2F%2A%2A%2FFROM%2F%2A%2A%2F%28SELECT%2F%2A%2A%2FCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283235%3D3235%2C1%29%29%29%2C0x717a626271%2C0x78%29%29s%29%2C%2F%2A%2A%2F8446744073709551610%2C%2F%2A%2A%2F8446744073709551610%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22YBXk%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22YBXk=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F2%2A%28IF%28%28SELECT%2F%2A%2A%2F%2A%2F%2A%2A%2FFROM%2F%2A%2A%2F%28SELECT%2F%2A%2A%2FCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283235%3D3235%2C1%29%29%29%2C0x717a626271%2C0x78%29%29s%29%2C%2F%2A%2A%2F8446744073709551610%2C%2F%2A%2A%2F8446744073709551610%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22rDmG%22%3D%22rDmG=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F3641%2F%2A%2A%2FFROM%28SELECT%2F%2A%2A%2FCOUNT%28%2A%29%2CCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283641%3D3641%2C1%29%29%29%2C0x717a626271%2CFLOOR%28RAND%280%29%2A2%29%29x%2F%2A%2A%2FFROM%2F%2A%2A%2FINFORMATION_SCHEMA.PLUGINS%2F%2A%2A%2FGROUP%2F%2A%2A%2FBY%2F%2A%2A%2Fx%29a%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22dfkM%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22dfkM=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F3641%2F%2A%2A%2FFROM%28SELECT%2F%2A%2A%2FCOUNT%28%2A%29%2CCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283641%3D3641%2C1%29%29%29%2C0x717a626271%2CFLOOR%28RAND%280%29%2A2%29%29x%2F%2A%2A%2FFROM%2F%2A%2A%2FINFORMATION_SCHEMA.PLUGINS%2F%2A%2A%2FGROUP%2F%2A%2A%2FBY%2F%2A%2A%2Fx%29a%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22yBhx%22%3D%22yBhx=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F1695%3DCTXSYS.DRITHSX.SN%281695%2C%28CHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%28112%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%7C%7C%28SELECT%2F%2A%2A%2F%28CASE%2F%2A%2A%2FWHEN%2F%2A%2A%2F%281695%3D1695%29%2F%2A%2A%2FTHEN%2F%2A%2A%2F1%2F%2A%2A%2FELSE%2F%2A%2A%2F0%2F%2A%2A%2FEND%29%2F%2A%2A%2FFROM%2F%2A%2A%2FDUAL%29%7C%7CCHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%2898%29%7C%7CCHR%2898%29%7C%7CCHR%28113%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22eEdc%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22eEdc=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F1695%3DCTXSYS.DRITHSX.SN%281695%2C%28CHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%28112%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%7C%7C%28SELECT%2F%2A%2A%2F%28CASE%2F%2A%2A%2FWHEN%2F%2A%2A%2F%281695%3D1695%29%2F%2A%2A%2FTHEN%2F%2A%2A%2F1%2F%2A%2A%2FELSE%2F%2A%2A%2F0%2F%2A%2A%2FEND%29%2F%2A%2A%2FFROM%2F%2A%2A%2FDUAL%29%7C%7CCHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%2898%29%7C%7CCHR%2898%29%7C%7CCHR%28113%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22zAUD%22%3D%22zAUD=Homo%20sapiens=Reaction=Pathway=true


Use case of UTILIZENODE API

2020-06-10 Thread ChienHuaWang


While exploring the UTILIZENODE API to move replicas, I understand its behavior
depends on the preferences & autoscaling policies defined.
But I am wondering what the priority is for its decisions. Let's say I define
maximize freedisk & minimize heapUsage, and also set-cluster-policy as in the
example in the doc: {"replica": "<2", "shard": "#EACH", "node": "#ANY"}

Would it check violations first? How does it pick replicas to delete and add?
Sometimes I observed "No node can satisfy the rules", even when I tried to move
to a new node.

If I want to rebalance onto a new/existing node, what's the best approach? It
seems I have limited control over the UTILIZENODE API; would it be suggested
for use in production?

I'd appreciate it if anyone can share use cases of the UTILIZENODE API to move
replicas.
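
For anyone experimenting, the call itself can be issued from SolrJ roughly like
this (a sketch; the node name format host:port_solr and the error handling are
simplified):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class UtilizeNodeExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("action", "UTILIZENODE");
            params.set("node", "newhost:8983_solr");  // node to move replicas onto
            NamedList<Object> rsp = client.request(
                new GenericSolrRequest(SolrRequest.METHOD.GET,
                                       "/admin/collections", params));
            System.out.println(rsp);  // inspect which replicas were moved
        }
    }
}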




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SolR OOM error due to query injection

2020-06-10 Thread Guilherme Viteri
Hi,

Environment: SolR 6.6.2, with org.apache.solr.solr-core:6.1.0. This setup has 
been running for at least 4 years without an OutOfMemory error. (it is 
never too late for an OOM…)

This week, our search tool has been attacked with 'sql injection'-like 
requests, and that led to an OOM. These requests weren't aggressive in the 
sense of stressing the server with an excessive number of hits; however, 5 to 
10 requests of this nature were enough to crash the server.

I’ve come across this link 
https://stackoverflow.com/questions/26862474/prevent-from-solr-query-injections-when-using-solrj,
however, that’s not what I am after. In our case we do allow lucene queries and 
field searches like title:Title, and our ids contain dashes, so if they get 
escaped, then the search won’t work properly.

Does anyone have an idea ?

Cheers
G

Here are some of the requests that appeared in the logs in relation to the 
attack (see below: sorry it is messy)
query?q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F2%2A%28IF%28%28SELECT%2F%2A%2A%2F%2A%2F%2A%2A%2FFROM%2F%2A%2A%2F%28SELECT%2F%2A%2A%2FCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283235%3D3235%2C1%29%29%29%2C0x717a626271%2C0x78%29%29s%29%2C%2F%2A%2A%2F8446744073709551610%2C%2F%2A%2A%2F8446744073709551610%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22YBXk%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22YBXk=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F2%2A%28IF%28%28SELECT%2F%2A%2A%2F%2A%2F%2A%2A%2FFROM%2F%2A%2A%2F%28SELECT%2F%2A%2A%2FCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283235%3D3235%2C1%29%29%29%2C0x717a626271%2C0x78%29%29s%29%2C%2F%2A%2A%2F8446744073709551610%2C%2F%2A%2A%2F8446744073709551610%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22rDmG%22%3D%22rDmG=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F3641%2F%2A%2A%2FFROM%28SELECT%2F%2A%2A%2FCOUNT%28%2A%29%2CCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283641%3D3641%2C1%29%29%29%2C0x717a626271%2CFLOOR%28RAND%280%29%2A2%29%29x%2F%2A%2A%2FFROM%2F%2A%2A%2FINFORMATION_SCHEMA.PLUGINS%2F%2A%2A%2FGROUP%2F%2A%2A%2FBY%2F%2A%2A%2Fx%29a%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22dfkM%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22dfkM=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28SELECT%2F%2A%2A%2F3641%2F%2A%2A%2FFROM%28SELECT%2F%2A%2A%2FCOUNT%28%2A%29%2CCONCAT%280x717a707871%2C%28SELECT%2F%2A%2A%2F%28ELT%283641%3D3641%2C1%29%29%29%2C0x717a626271%2CFLOOR%28RAND%280%29%2A2%29%29x%2F%2A%2A%2FFROM%2F%2A%2A%2FINFORMATION_SCHEMA.PLUGINS%2F%2A%2A%2FGROUP%2F%2A%2A%2FBY%2F%2A%2A%2Fx%29a%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22yBhx%22%3D%22yBhx=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F1695%3DCTXSYS.DRITHSX.SN%281695%2C%28CHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%28112%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%7C%7C%28SELECT%2F%2A%2A%2F%28CASE%2F%2A%2A%2FWHEN%2F%2A%2A%2F%281695%3D1695%29%2F%2A%2A%2FTHEN%2F%2A%2A%2F1%2F%2A%2A%2FELSE%2F%2A%2A%2F0%2F%2A%2A%2FEND%29%2F%2A%2A%2FFROM%2F%2A%2A%2FDUAL%29%7C%7CCHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%2898%29%7C%7CCHR%2898%29%7C%7CCHR%28113%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22eEdc%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22eEdc=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F1695%3DCTXSYS.DRITHSX.SN%281695%2C%28CHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%28112%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%7C%7C%28SELECT%2F%2A%2A%2F%28CASE%2F%2A%2A%2FWHEN%2F%2A%2A%2F%281695%3D1695%29%2F%2A%2A%2FTHEN%2F%2A%2A%2F1%2F%2A%2A%2FELSE%2F%2A%2A%2F0%2F%2A%2A%2FEND%29%2F%2A%2A%2FFROM%2F%2A%2A%2FDUAL%29%7C%7CCHR%28113%29%7C%7CCHR%28122%29%7C%7CCHR%2898%29%7C%7CCHR%2898%29%7C%7CCHR%28113%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22zAUD%22%3D%22zAUD=Homo%20sapiens=Reaction=Pathway=true

q=IPP%22%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F4144%3DCONVERT%28INT%2C%28SELECT%2F%2A%2A%2FCHAR%28113%29%2BCHAR%28122%29%2BCHAR%28112%29%2BCHAR%28120%29%2BCHAR%28113%29%2B%28SELECT%2F%2A%2A%2F%28CASE%2F%2A%2A%2FWHEN%2F%2A%2A%2F%284144%3D4144%29%2F%2A%2A%2FTHEN%2F%2A%2A%2FCHAR%2849%29%2F%2A%2A%2FELSE%2F%2A%2A%2FCHAR%2848%29%2F%2A%2A%2FEND%29%29%2BCHAR%28113%29%2BCHAR%28122%29%2BCHAR%2898%29%2BCHAR%2898%29%2BCHAR%28113%29%29%29%2F%2A%2A%2FAND%2F%2A%2A%2F%28%28%28%22ePUW%22%2F%2A%2A%2FLIKE%2F%2A%2A%2F%22ePUW=Homo%20sapiens=Reaction=Pathway=true


Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Isabelle Giguere
Hi Jan;

Thank you for your reply.

This is security.json as seen in Zookeeper.  Credentials are admin / admin

{
  "authentication":{
    "blockUnknown":false,
    "realm":"MTM Solr",
    "forwardCredentials":true,
    "class":"solr.BasicAuthPlugin",
    "credentials":{"admin":"0rTOgObKYwzSyPoYuj2su2/90eQCfysF1aasxTx+wrc= +tCMmpawYYtTsp3JfkG9avb8bKZlm/IGTZirsufYvns="},
    "":{"v":2}},
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "permissions":[{
        "name":"all",
        "role":"admin"}],
    "user-role":{"admin":"admin"},
    "":{"v":8}}}

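In case it helps to reproduce this outside the browser, here is a quick SolrJ 
check against the alias with per-request credentials (a sketch; host and alias 
name taken from the mails below):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.MapSolrParams;

public class AliasAuthCheck {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            QueryRequest req = new QueryRequest(
                new MapSolrParams(Collections.singletonMap("q", "*:*")));
            req.setBasicAuthCredentials("admin", "admin");   // sent with this request
            QueryResponse rsp = req.process(client, "test"); // "test" is the alias
            System.out.println("numFound=" + rsp.getResults().getNumFound());
        }
    }
}
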
Thanks for feedback

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Jan Høydahl 
Sent: Wednesday, June 10, 2020 4:01 PM
To: solr-user@lucene.apache.org 
Subject: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

Please share your security.json file

Jan Høydahl

> On 10 Jun 2020, at 21:53, Isabelle Giguere wrote:
>
> Hi;
>
> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
> in the Solr Admin UI.  I can create collections and aliases, and I can index 
> documents in Solr.
>
> Collections : test1, test2
> Alias: test (combines test1, test2)
>
> Indexed document "solr-word.pdf" in collection test1
>
> Searching on a collection works:
> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> 
>
> But searching on an alias results in HTTP 401
> http://localhost:8983/solr/test/select?q=*:*&wt=xml
>
> Error from server at null: Expected mime type application/octet-stream but 
> got text/html.  Error 401 Authentication failed, Response code: 401.  HTTP 
> ERROR 401 Authentication failed, Response code: 401.
> URI: /solr/test1_shard1_replica_n1/select
> STATUS: 401
> MESSAGE: Authentication failed, Response code: 401
> SERVLET: default
>
> Even though https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> 8.5.0, I did try to start Solr with -Dsolr.http1=true, and 
> I set "forwardCredentials":true in security.json.
>
> Nothing works.  I just cannot use aliases when Solr is secured.
>
> Can anyone confirm if this may be a configuration issue, or if this could 
> possibly be a bug ?
>
> Thank you;
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>


Re: HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Jan Høydahl
Please share your security.json file

Jan Høydahl

> On 10 Jun 2020, at 21:53, Isabelle Giguere wrote:
> 
> Hi;
> 
> I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
> in the Solr Admin UI.  I can create collections and aliases, and I can index 
> documents in Solr.
> 
> Collections : test1, test2
> Alias: test (combines test1, test2)
> 
> Indexed document "solr-word.pdf" in collection test1
> 
> Searching on a collection works:
> http://localhost:8983/solr/test1/select?q=*:*&wt=xml
> 
> 
> But searching on an alias results in HTTP 401
> http://localhost:8983/solr/test/select?q=*:*&wt=xml
> 
> Error from server at null: Expected mime type application/octet-stream but 
> got text/html.  Error 401 Authentication failed, Response code: 401.  HTTP 
> ERROR 401 Authentication failed, Response code: 401.
> URI: /solr/test1_shard1_replica_n1/select
> STATUS: 401
> MESSAGE: Authentication failed, Response code: 401
> SERVLET: default
> 
> Even though https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
> 8.5.0, I did try to start Solr with -Dsolr.http1=true, and I set 
> "forwardCredentials":true in security.json.
> 
> Nothing works.  I just cannot use aliases when Solr is secured.
> 
> Can anyone confirm if this may be a configuration issue, or if this could 
> possibly be a bug ?
> 
> Thank you;
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 


HTTP 401 when searching on alias in secured Solr

2020-06-10 Thread Isabelle Giguere
Hi;

I'm using Solr 8.5.0.  I have uploaded security.json to Zookeeper.  I can log 
in the Solr Admin UI.  I can create collections and aliases, and I can index 
documents in Solr.

Collections : test1, test2
Alias: test (combines test1, test2)

Indexed document "solr-word.pdf" in collection test1

Searching on a collection works:
http://localhost:8983/solr/test1/select?q=*:*&wt=xml


But searching on an alias results in HTTP 401
http://localhost:8983/solr/test/select?q=*:*&wt=xml

Error from server at null: Expected mime type application/octet-stream but got 
text/html.  Error 401 Authentication failed, Response code: 401.  HTTP ERROR 
401 Authentication failed, Response code: 401.
URI: /solr/test1_shard1_replica_n1/select
STATUS: 401
MESSAGE: Authentication failed, Response code: 401
SERVLET: default

Even though https://issues.apache.org/jira/browse/SOLR-13510 is fixed in Solr 
8.5.0, I did try to start Solr with -Dsolr.http1=true, and I set 
"forwardCredentials":true in security.json.

Nothing works.  I just cannot use aliases when Solr is secured.

Can anyone confirm if this may be a configuration issue, or if this could 
possibly be a bug ?

Thank you;

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java




Re: How to determine why solr stops running?

2020-06-10 Thread Ryan W
Hi all,

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that
I can see.  There is no apparent problem with the hardware or anything else
on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr
is running, or does it always start solr whether it is already running or
not?

Many thanks!
Ryan


On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson 
wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave  wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> been a hardware failure. Either the ram or the disk got a “glitch” and both
> of these are relatively fragile and wear and tear type parts of the
> machine, and should be expected to fail and be replaced from time to time.
> Solr is pretty aggressive with its logging so there are a lot of writes
> always happening and of course reads, if the disk has any issues or the
> memory it can lock it up and bring her down, more so if you have any
> spellcheck dictionaries or suggesters being built on start up.
> >
> > Just my experience with this, could be wrong (most likely wrong) but we
> always have extra drives and memory around the server room for this
> reason.  At least once or twice a year we will have a disk failure in the
> raid and need to swap in a new one.
> >
> > Good luck though, also solr should be logging it’s failures so it would
> be good to look there too
> >
> >> On Jun 9, 2020, at 2:35 AM, Shawn Heisey  wrote:
> >>
> >> On 5/14/2020 7:22 AM, Ryan W wrote:
> >>> I manage a site where solr has stopped running a couple times in the
> past
> >>> week. The server hasn't been rebooted, so that's not the reason.  What
> else
> >>> causes solr to stop running?  How can I investigate why this is
> happening?
> >>
> >> Any situation where Solr stops running and nobody requested the stop is
> a result of a serious problem that must be thoroughly investigated.  I
> think it's a bad idea for Solr to automatically restart when it stops
> unexpectedly.  Chances are that whatever caused the crash is going to
> simply make the crash happen again until the problem is solved.
> Automatically restarting could hide problems from the system administrator.
> >>
> >> The only way a Solr auto-restart would be acceptable to me is if it
> sends a high priority alert to the sysadmin EVERY time it executes an
> auto-restart.  It really is that bad of a problem.
> >>
> >> The causes of Solr crashes (that I can think of) include the following.
> I believe I have listed these four options from most likely to least likely:
> >>
> >> * Java OutOfMemoryError exceptions.  On non-windows systems, the
> "bin/solr" script starts Solr with an option that results in Solr's death
> anytime one of these exceptions occurs.  We do this because program
> operation is indeterminate and completely unpredictable when OOME occurs,
> so it's far safer to stop running.  That exception can be caused by several
> things, some of which actually do not involve memory at all.  If you're
> running on Windows via the bin\solr.cmd command, then this will not happen
> ... but OOME could still cause a crash, because as I already mentioned,
> program operation is unpredictable when OOME occurs.
> >>
> >> * The OS kills Solr because system memory is completely exhausted and
> Solr is the process using the most memory.  Linux calls this the
> "oom-killer" ... I am pretty sure something like it exists on most
> operating systems.
> >>
> >> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
> or data used by any of those.
> >>
> >> * A very serious bug in Solr's code that we haven't discovered yet.
> >>
> >> I included that last one simply for completeness.  A bug that causes a
> crash *COULD* exist, but as of right now, we have not seen any supporting
> evidence.
> >>
> >> My guess is that Java OutOfMemoryError is the cause here, but I can't
> be certain.  If that is happening, then some resource (which might not be
> memory) is fully depleted.  We would need to see the full OutOfMemoryError
> exception in order to determine why it is happening. Sometimes the
> exception is logged in solr.log, sometimes it 

RE: Timeout issue while doing update operations from clients (using SolrJ)

2020-06-10 Thread Kommu, Vinodh K.
We are getting the following socket timeout exception when this error occurs. 
Any idea on this?

ERROR (updateExecutor-3-thread-1392-processing-n:hostname:1100_solr 
x:TestCollection_shard6_replica_n10 c:TestCollection s:shard6 r:core_node13) 
[c:TestCollection s:shard6 r:core_node13 x:TestCollection_shard6_replica_n10] 
o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.SolrServerException: 
Timeout occured while waiting response from server at: 
https://hostname:1100/solr/TestCollection_shard6_replica_n34
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:491)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260)
at 
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:326)
at 
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:315)
at 
org.apache.solr.update.SolrCmdDistributor.dt_access$675(SolrCmdDistributor.java)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$303(ExecutorUtil.java)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at 
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at 
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at 
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)

Thanks & Regards,
Vinodh

From: Kommu, Vinodh K.
Sent: Wednesday, June 10, 2020 3:41 PM
To: solr-user@lucene.apache.org
Subject: Timeout issue while doing update operations from clients (using SolrJ)

Hi,

Need some help in fixing intermittent timeout issue please. Recently we came 
across this 

Re: using solr to extract keywords from a long text?

2020-06-10 Thread Mikhail Khludnev
Hello, David.

From the code I notice that MoreLikeThisHandler consumes the request body
when there's no ?q= and analyzes it to do what you are asking for. I
see that the ref guide obscured this feature.
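
A quick way to try it is to POST the raw text as the request body with no q 
parameter (my own sketch, not from the ref guide; it assumes the default /mlt 
registration and a "text" field to compare against):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class MltBodyExample {
    public static void main(String[] args) throws Exception {
        String longText = "... the long text to extract keywords from ...";
        // No q= parameter, so MoreLikeThisHandler reads the body instead.
        URL url = new URL("http://localhost:8983/solr/mycore/mlt"
            + "?mlt.fl=text&mlt.interestingTerms=details&mlt.mintf=1&mlt.mindf=1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(longText.getBytes(StandardCharsets.UTF_8));
        }
        try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8")) {
            while (s.hasNextLine()) {
                System.out.println(s.nextLine());  // interestingTerms are in here
            }
        }
    }
}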

On Wed, Jun 10, 2020 at 4:37 PM David Zimmermann 
wrote:

> Dear solr community
>
> I’m supposed to extract keywords from long texts. I do have a solr index
> with a lot of documents from the same domain as my texts. So, I was
> wondering if I can use solr to extract those keywords. Ideally I would want
> to use the TF-IDF based “importantTerms” from the “more like this” function,
> but without indexing the text first. Is there a way to run a more like this
> query not based on a document id, but on a text supplied by the query? Or
> is there another way to achieve my goal?
>
> I have also been looking into using the /stream handler, but the solr core
> is set up as standalone and not in cloud mode.
>
> Best
> David



-- 
Sincerely yours
Mikhail Khludnev


using solr to extract keywords from a long text?

2020-06-10 Thread David Zimmermann
Dear solr community

I’m supposed to extract keywords from long texts. I do have a solr index with a 
lot of documents from the same domain as my texts. So, I was wondering if I can 
use solr to extract those keywords. Ideally I would want to use the TF-IDF based 
“importantTerms” from the “more like this” function, but without indexing the 
text first. Is there a way to run a more like this query not based on a 
document id, but on a text supplied by the query? Or is there another way to 
achieve my goal?

I have also been looking into using the /stream handler, but the solr core is 
set up as standalone and not in cloud mode.

Best
David

Re: Getting rid of zookeeper

2020-06-10 Thread matthew sporleder
FWIW -- zookeeper is pretty set-and-forget in my experience with
settings like autopurge.snapRetainCount, autopurge.purgeInterval, and
rotating the zookeeper.out stdout file.

It is a big hassle to set up the individual myid files and keep them in
sync with the server.$id=hostname entries in zoo.cfg but, again, one-time
pain.
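
For anyone setting this up later, the pieces mentioned above look roughly like
this (example values; hosts and paths are placeholders):

# zoo.cfg -- identical on every node
dataDir=/var/lib/zookeeper
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# /var/lib/zookeeper/myid -- differs per node and must match its server.N line
1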

I think smaller solr deployments could benefit from some easier
ability to configure the embedded zookeeper (like the improved zk
upconfig and friends) which might address this entire point?  The only
reason I don't run embedded zk (I use three small ec2's) is because
cpu/disk contention on the same server have burned me in the past.

On Wed, Jun 10, 2020 at 3:30 AM Jan Høydahl  wrote:
>
> Curator is just on the client (solr) side, to make it easier to integrate 
> with Zookeeper, right?
>
> If you study Elastic, they had terrible cluster stability a few years ago 
> since everything
> was too «dynamic» and «zero config». That led to the system outsmarting 
> itself when facing
> real-life network partitions and other failures. Solr did not have these 
> issues exactly because
> it relies on Zookeeper which is very static and hard to change (on purpose), 
> and thus delivers
> a strong, stable quorum. So what did Elastic do a couple years ago? They 
> adopted the same
> best practice as ZK, recommending 3 or 5 (statically defined) master nodes 
> that owns the
> cluster state.
>
> Solr could get rid of ZK the same way as KAFKA. But while KAFKA already has a
> distributed log they could replace ZK with (hey, Kafka IS a log), Solr would 
> need to add
> such a log, and it would need to be embedded in the Solr process to avoid 
> that extra runtime.
> I believe it could be done with Apache Ratis 
> (https://ratis.incubator.apache.org), which
> is a RAFT Java library. But I’m doubtful if the project has the bandwidth and 
> dedication right
> now to embark on such a project. It would probably be a multi-year effort, 
> first building
> abstractions on top of ZK, then moving one piece of ZK dependency over to 
> RAFT at a time,
> needing both systems in parallel, before at the end ZK could go away.
>
> I’d like to see it happen. Especially for smaller deployments it would be 
> fantastic.
>
> Jan
>
> > On 10 Jun 2020, at 01:03, Erick Erickson wrote:
> >
> > The intermediate solution is to migrate to Curator. I don’t know all the 
> > ins and outs
> > of that and whether or not it would be easier to setup and maintain.
> >
> > I do know that Zookeeper is deeply embedded in Solr and taking replacing it 
> > with
> > most anything would be a major pain.
> >
> > I’m also certain that rewriting Zookeeper is a rat-hole that would take a 
> > major
> > effort. If anyone would like to try it, all patches welcome.
> >
> > FWIW,
> > er...@curmudgeon.com
> >
> >> On Jun 9, 2020, at 6:01 PM, Dave  wrote:
> >>
> >> Is it horrible that I’m already burnt out from just reading that?
> >>
> >> I’m going to stick to the classic solr master slave set up for the 
> >> foreseeable future, at least that let’s me focus more on the search theory 
> >> rather than the back end system non stop.
> >>
> >>> On Jun 9, 2020, at 5:11 PM, Vincenzo D'Amore  wrote:
> >>>
> >>> My 2 cents: I have a few SolrCloud production installations; I would share
> >>> some thoughts of what I learned in the latest 4/5 years (fwiw) just as 
> >>> they
> >>> come out of my mind.
> >>>
> >>> - to configure a SolrCloud *production* Cluster you have to be a zookeeper
> >>> expert even if you only need Solr.
> >>> - the Zookeeper ensemble (3 or 5 zookeeper nodes) is recommended to run on
> >>> separate machines but for many customers this is too expensive. And for 
> >>> the
> >>> rest it is expensive just to have the instances (i.e. dockers). It is
> >>> expensive even to have people that know Zookeeper or even only train them.
> >>> - given the high availability function of a zookeeper cluster you have
> >>> to monitor it and promptly backup and restore. But it is hard to monitor
> >>> (and configure the monitoring) and it is even harder to backup and restore
> >>> (when it is running).
> >>> - You can't add or remove nodes in zookeeper when it is up. Only the 
> >>> latest
> >>> version should finally give the possibility to add/remove nodes when it is
> >>> running, but afak this is not still supported by SolrCloud (out of the 
> >>> box).
> >>> - many people fail when they try to run a SolrCloud cluster because it is
> >>> hard to set up, for example: SolrCloud zkcli runs poorly on windows.
> >>> - it is hard to admin the zookeeper remotely, basically there are no
> >>> utilities that let you easily list/read/write/delete files on a zookeeper
> >>> filesystem.
> >>> - it was really hard to create a zookeeper ensemble in kubernetes, only
> >>> recently appeared few solutions. This was so counter-productive for the
> >>> Solr project because now the world is moving to Kubernetes, and there is
> >>> basically no support.

Timeout issue while doing update operations from clients (using SolrJ)

2020-06-10 Thread Kommu, Vinodh K.
Hi,

Need some help in fixing an intermittent timeout issue please. Recently we came 
across this timeout issue during QA performance testing, when a streaming 
expression query which runs on a larger set of data (~60-80 million) from a 
client using SolrJ was timing out at exactly 2mins. Later this issue was fixed 
after bumping up the idle timeout property from its default of "60000"ms to 
"600000"ms (10mins). Now we are getting timeout exceptions again when update 
and delete operations are happening. To fix this, we have increased the 
following timeout settings in the solr.xml file across all solr nodes.


<solrcloud>
  <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:600000}</int>
</solrcloud>

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">${socketTimeout:600000}</int>
  <int name="connTimeout">${connTimeout:600000}</int>
</shardHandlerFactory>


However, even after increasing the above timeout properties to 10mins, we are 
still seeing timeout exceptions intermittently. Does any other setting need to 
be changed in solr, zookeeper, or the client? Any suggestions?
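
One more thing worth ruling out: the SolrJ client that sends the updates has 
its own socket/connection timeouts, independent of solr.xml. A sketch of 
setting them on the builder (method names as in SolrJ 7/8):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

HttpSolrClient client = new HttpSolrClient.Builder("https://hostname:1100/solr")
        .withConnectionTimeout(60000)   // ms to establish the connection
        .withSocketTimeout(600000)      // ms to wait for a response (10mins)
        .build();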


Thanks & Regards,
Vinodh



RE: Atomic updates with nested documents

2020-06-10 Thread Kaminski, Adi
Sure, np.
We did the same W/A for a long period, but eventually it impacted our 
application performance very much, and partial atomic updates to the parent 
doc improved this significantly (20-30x faster than whole docs).

Regards,
Adi

-Original Message-
From: Ludger Steens 
Sent: Wednesday, June 10, 2020 11:10 AM
To: solr-user@lucene.apache.org
Subject: AW: Atomic updates with nested documents

Hi Adi,

thank you for your reply!  Although I have to admit that this is not the 
response that I was hoping for.

Upgrading to Solr 8 is currently not possible for us because we found multiple 
issues when doing so  (see 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202005.mbox/%3Ce7dc73d4be2ac35404db0f6cfb75f905%40mail.gmail.com%3E).
We have now implemented a workaround and send the whole document with ChildDocs 
to Solr instead of doing an atomic update. This works as expected but is 
significantly slower.

Regards
Ludger

---
Best Employers ITK 2020 - 1st place for QAware, awarded by Great Place 
to Work
---

Ludger Steens
Softwarearchitekt


QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobil +49 175 7973969
mailto:ludger.ste...@qaware.de
https://www.qaware.de


Managing directors: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Register court: Munich
Commercial register number: HRB 163761
---
-Original Message-
From: Kaminski, Adi 
Sent: Sunday, June 7, 2020 8:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Atomic updates with nested documents

Hi Ludger,
We had the same issue with Solr 7.6, and after discussing with the community 
we found out that this partial update of a parent document without "harming" 
the parent-child association works only on Solr 8.1 or higher, and it also 
requires some prerequisites.

See the item below and its last comments for details:
https://issues.apache.org/jira/browse/SOLR-12638?focusedCommentId=16894628&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16894628

Eventually we moved to Solr 8.3 and it's working there as expected with 
the above mentioned changes.

Regards,
Adi

-Original Message-
From: Ludger Steens 
Sent: Friday, June 5, 2020 3:24 PM
To: solr-user@lucene.apache.org
Subject: Atomic updates with nested documents

Dear Community,



I am using Solr 7.7 and I am wondering how it is possible to do a partial 
update on nested documents / child documents.

Suppose I have committed the following documents to the index:

[
  {
    "id": "1",
    "testString": "1",
    "testInt": "1",
    "_childDocuments_": [
      {
        "id": "1.1",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      },
      {
        "id": "1.2",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      }
    ]
  }
]

The uniqueKey is id, all fields are indexed.



Now I want to update testInt to 2 on the parent document without losing the 
parent child relation (ChildDocTransformerFactory should still produce correct 
results).

I tried the following variants, both not successful:



*Variant 1:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    }
  }
]

The parent document is updated, but the ChildDocTransformerFactory does not 
return any child documents



*Variant 2:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    },
    "_childDocuments_": [
      {
        "id": {
          "set": "1.1"
        }
      },
      {
        "id": {
          "set": "1.2"
        }
      }
    ]
  }
]

Same result: Parent is updated, but ChildDocTransformerFactory does not return 
any child documents





Is there any other way of doing a partial update without losing the 
parent-child relation?

Resending the complete document with all attributes and children would work, 
but it is inefficient for us (we would have to load all documents from Solr 
before resending them).



Thanks in advance for your help



Ludger


--

*"Best Employers ITK 2020" - 1st place for QAware*, awarded by Great 
Place to Work 

--

Ludger Steens
Softwarearchitekt

QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobil +49 175 7973969
ludger.ste...@qaware.de
www.qaware.de
--

Managing directors: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Register court: Munich
Commercial register number: HRB 163761



AW: Atomic updates with nested documents

2020-06-10 Thread Ludger Steens
Hi Adi,

thank you for your reply!  Although I have to admit that this is not the
response that I was hoping for.

Upgrading to Solr 8 is currently not possible for us because we found
multiple issues when doing so  (see
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202005.mbox/%3Ce7dc73d4be2ac35404db0f6cfb75f905%40mail.gmail.com%3E).
We have now implemented a workaround and send the whole document with
ChildDocs to Solr instead of doing an atomic update. This works as expected
but is significantly slower.

Regards
Ludger

---
Best Employers ITK 2020 - 1st place for QAware
awarded by Great Place to Work
---

Ludger Steens
Softwarearchitekt


QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobil +49 175 7973969
mailto:ludger.ste...@qaware.de
https://www.qaware.de


Managing directors: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Register court: Munich
Commercial register number: HRB 163761
---
-Original Message-
From: Kaminski, Adi 
Sent: Sunday, June 7, 2020 8:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Atomic updates with nested documents

Hi Ludger,
We had the same issue with Solr 7.6, and after discussing with the community
we found out that this partial update of a parent document without "harming"
the parent-child association works only on Solr 8.1 or higher, and it also
requires some prerequisites.

See the item below and its last comments for details:
https://issues.apache.org/jira/browse/SOLR-12638?focusedCommentId=16894628&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16894628

Eventually we moved to Solr 8.3 and it's working there as expected with the
above mentioned changes.

Regards,
Adi

-Original Message-
From: Ludger Steens 
Sent: Friday, June 5, 2020 3:24 PM
To: solr-user@lucene.apache.org
Subject: Atomic updates with nested documents

Dear Community,



I am using Solr 7.7 and I am wondering how it is possible to do a partial
update on nested documents / child documents.

Suppose I have committed the following documents to the index:

[
  {
    "id": "1",
    "testString": "1",
    "testInt": "1",
    "_childDocuments_": [
      {
        "id": "1.1",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      },
      {
        "id": "1.2",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      }
    ]
  }
]

The uniqueKey is id, all fields are indexed.



Now I want to update testInt to 2 on the parent document without losing the
parent child relation (ChildDocTransformerFactory should still produce
correct results).

I tried the following variants, both not successful:



*Variant 1:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    }
  }
]

The parent document is updated, but the ChildDocTransformerFactory does not
return any child documents



*Variant 2:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    },
    "_childDocuments_": [
      {
        "id": {
          "set": "1.1"
        }
      },
      {
        "id": {
          "set": "1.2"
        }
      }
    ]
  }
]

Same result: Parent is updated, but ChildDocTransformerFactory does not
return any child documents





Is there any other way of doing a partial update without losing the
parent-child relation?

Resending the complete document with all attributes and children would work,
but it is inefficient for us (we would have to load all documents from Solr
before resending them).



Thanks in advance for your help



Ludger


--

*"Best Employers ITK 2020" - 1st place for QAware*, awarded by Great
Place to Work

--

Ludger Steens
Softwarearchitekt

QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobil +49 175 7973969
ludger.ste...@qaware.de
www.qaware.de
--

Managing directors: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Register court: Munich
Commercial register number: HRB 163761




Re: Getting rid of zookeeper

2020-06-10 Thread Jan Høydahl
Curator is just on the client (solr) side, to make it easier to integrate with 
Zookeeper, right?

If you study Elastic, they had terrible cluster stability a few years ago since 
everything
was too «dynamic» and «zero config». That led to the system outsmarting itself 
when facing
real-life network partitions and other failures. Solr did not have these issues 
exactly because
it relies on Zookeeper which is very static and hard to change (on purpose), 
and thus delivers
a strong, stable quorum. So what did Elastic do a couple years ago? They 
adopted the same
best practice as ZK, recommending 3 or 5 (statically defined) master nodes that 
owns the
cluster state.

Solr could get rid of ZK the same way as KAFKA. But while KAFKA already has a
distributed log they could replace ZK with (hey, Kafka IS a log), Solr would 
need to add
such a log, and it would need to be embedded in the Solr process to avoid that 
extra runtime.
I believe it could be done with Apache Ratis 
(https://ratis.incubator.apache.org), which 
is a RAFT Java library. But I’m doubtful if the project has the bandwidth and 
dedication right
now to embark on such a project. It would probably be a multi-year effort, 
first building
abstractions on top of ZK, then moving one piece of ZK dependency over to RAFT 
at a time,
needing both systems in parallel, before at the end ZK could go away.

I’d like to see it happen. Especially for smaller deployments it would be 
fantastic.

Jan

> On 10 Jun 2020, at 01:03, Erick Erickson wrote:
> 
> The intermediate solution is to migrate to Curator. I don’t know all the ins 
> and outs
> of that and whether or not it would be easier to setup and maintain.
> 
> I do know that Zookeeper is deeply embedded in Solr and taking replacing it 
> with
> most anything would be a major pain.
> 
> I’m also certain that rewriting Zookeeper is a rat-hole that would take a 
> major
> effort. If anyone would like to try it, all patches welcome.
> 
> FWIW,
> er...@curmudgeon.com
> 
>> On Jun 9, 2020, at 6:01 PM, Dave  wrote:
>> 
>> Is it horrible that I’m already burnt out from just reading that?
>> 
>> I’m going to stick to the classic solr master slave set up for the 
>> foreseeable future, at least that let’s me focus more on the search theory 
>> rather than the back end system non stop. 
>> 
>>> On Jun 9, 2020, at 5:11 PM, Vincenzo D'Amore  wrote:
>>> 
> >>> My 2 cents: I have a few SolrCloud production installations; I would share
>>> some thoughts of what I learned in the latest 4/5 years (fwiw) just as they
>>> come out of my mind.
>>> 
>>> - to configure a SolrCloud *production* Cluster you have to be a zookeeper
>>> expert even if you only need Solr.
>>> - the Zookeeper ensemble (3 or 5 zookeeper nodes) is recommended to run on
>>> separate machines but for many customers this is too expensive. And for the
>>> rest it is expensive just to have the instances (i.e. dockers). It is
>>> expensive even to have people that know Zookeeper or even only train them.
>>> - given the high availability function of a zookeeper cluster you have
>>> to monitor it and promptly backup and restore. But it is hard to monitor
>>> (and configure the monitoring) and it is even harder to backup and restore
>>> (when it is running).
>>> - You can't add or remove nodes in zookeeper when it is up. Only the latest
>>> version should finally give the possibility to add/remove nodes when it is
> >>> running, but afaik this is still not supported by SolrCloud (out of the box).
>>> - many people fail when they try to run a SolrCloud cluster because it is
>>> hard to set up, for example: SolrCloud zkcli runs poorly on windows.
>>> - it is hard to admin the zookeeper remotely, basically there are no
>>> utilities that let you easily list/read/write/delete files on a zookeeper
>>> filesystem.
>>> - it was really hard to create a zookeeper ensemble in kubernetes, only
>>> recently appeared few solutions. This was so counter-productive for the
>>> Solr project because now the world is moving to Kubernetes, and there is
>>> basically no support.
>>> - well, after all these troubles, when the solrcloud clusters are
>>> configured correctly then, well, they are solid (rock?). And even if few
>>> Solr nodes/replicas went down the entire cluster can restore itself almost
>>> automatically, but how much work.
>>> 
>>> Believe me, I like Solr, but at the end of this long journey, sometimes I
>>> would really use only paas/saas instead of having to deal with all these
>>> troubles.
>