Re: support Rich Document

2021-02-10 Thread Jörn Franke
You can store them on the filesystem and store a link to them in Solr. Your search 
application can then fetch them from the filesystem and serve them to the users. 

Alternatively, serve them via WebDAV, SharePoint, or whatever your organization 
has set as its standard.

It does not make sense to store them in Solr - they would just blow up the 
index without any value.
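
A rough sketch with SolrJ of what "link in Solr, original on the filesystem" can look like (the field names content and file_path are only placeholders and must exist in your schema):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithLink {
  public static void main(String[] args) throws Exception {
    try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      // searchable text extracted by Tika/OCR
      doc.addField("content", "extracted text ...");
      // link to the original file kept on the filesystem (or a WebDAV/SharePoint URL)
      doc.addField("file_path", "/data/originals/report.pdf");
      solr.add(doc);
      solr.commit();
    }
  }
}

The search application then reads file_path from the result and serves the original document itself.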

> Am 11.02.2021 um 05:08 schrieb Luke :
> 
> Hi,
> 
> I know Solr can index rich documents, but I have one requirement.
> 
> I have all kinds of documents, such as Word, PDF, Excel, PPT, JPG, etc.
> 
> When Solr indexes them with Tika or OCR, it extracts the text and saves it to
> Solr, but the formatting is lost, so when the user opens the document, it
> is not readable.
> 
> My question is whether Solr can keep the original documents somewhere, such as
> an external field, so that when I load documents the original document can be
> retrieved too.
> 
> thanks


Re: SSL using CloudSolrClient

2021-02-03 Thread Jörn Franke
PKI authentication only works between Solr nodes.

> Am 03.02.2021 um 21:27 schrieb Jörn Franke :
> 
> SSL is transport security. For authentication you have to use Basic, Kerberos, or 
> Hadoop authentication. You may also need to configure authorisation.
> 
>> Am 03.02.2021 um 21:22 schrieb ChienHuaWang :
>> 
>> Hi,
>> 
>> I am implementing SSL for the communication between Solr and the clients. The
>> clients connect to Solr via CloudSolrClient.
>> 
>> According to the doc
>> <https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html#index-a-document-using-cloudsolrclient>
>> the passwords should also be set in the clients.
>> However, in testing, the client still works fine without any change, even
>> with SSL enabled in Solr (the passwords are set up).
>> 
>> Is this the expected behavior? Could anyone share their experience and
>> advise how to verify SSL?
>> Appreciate any feedback.
>> 
>> 
>> 
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SSL using CloudSolrClient

2021-02-03 Thread Jörn Franke
SSL is transport security. For authentication you have to use Basic, Kerberos, or 
Hadoop authentication. You may also need to configure authorisation.

> Am 03.02.2021 um 21:22 schrieb ChienHuaWang :
> 
> Hi,
> 
> I am implementing SSL for the communication between Solr and the clients. The
> clients connect to Solr via CloudSolrClient.
> 
> According to the doc, the passwords should also be set in the clients.
> However, in testing, the client still works fine without any change, even
> with SSL enabled in Solr (the passwords are set up).
> 
> Is this the expected behavior?  Could anyone share the experience, and
> advise how to verify SSL?   
> Appreciate any feedback.
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SOLR 8.6.0 date Indexing Issues.

2020-11-20 Thread Jörn Franke
You should format the date according to the ISO standard: 

https://lucene.apache.org/solr/guide/6_6/working-with-dates.html

Eg. 2018-07-12T00:00:00Z

You can transform the date either in Solr or in your client 
pushing the document to Solr. 
All major programming languages have date utilities that allow you to do this 
transformation easily.
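
For example, a minimal sketch in Java (assuming the source value always has the form "12-Jul-18" with English month abbreviations):

import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ToSolrDate {
  public static void main(String[] args) {
    // parse the MongoDB value, e.g. "12-Jul-18"
    DateTimeFormatter in = DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH);
    LocalDate date = LocalDate.parse("12-Jul-18", in);
    // Solr expects ISO-8601 in UTC, e.g. 2018-07-12T00:00:00Z
    String solrDate = date.atStartOfDay(ZoneOffset.UTC).format(DateTimeFormatter.ISO_INSTANT);
    System.out.println(solrDate); // prints 2018-07-12T00:00:00Z
  }
}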


> Am 20.11.2020 um 21:50 schrieb Fiz N :
> 
> Hello Experts,
> 
> I am having  issues with indexing Date field in SOLR 8.6.0. I am indexing
> from MongoDB. In MongoDB the Format is as follows
> 
> 
> * "R_CREATION_DATE" : "12-Jul-18",  "R_MODIFY_DATE" : "30-Apr-19", *
> 
> In my Managed Schema I have the following entries.
> 
> 
> 
> 
> .
> 
> I am getting an error in the Solr log.
> 
> * org.apache.solr.common.SolrException: ERROR: [doc=mt_100] Error adding
> field 'R_MODIFY_DATE'='15-Jul-19' msg=Couldn't parse date because:
> Improperly formatted datetime: 15-Jul-19*
> 
> Please let me know how to handle this use case with the date format
> "12-JUL-18". What changes should I make to get it to work?
> 
> Thanks
> Fiz N.


Re: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Jörn Franke
I think it should be /etc/default/solr.in.sh,
and it should be executable for the user running Solr.
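
For example, in /etc/default/solr.in.sh (a sketch; pick values that fit your machine):

# increase the Solr heap (read by bin/solr at startup)
SOLR_HEAP="8g"
# alternatively, set min/max explicitly instead of SOLR_HEAP:
# SOLR_JAVA_MEM="-Xms8g -Xmx8g"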

> Am 18.11.2020 um 16:44 schrieb Bruno Mannina :
> 
> Yes. It was executable.
> 
> Must I create solr.in.sh by copying from solr.in.sh.orig? Is that the right 
> way?
> 
> -----Original Message-----
> From: Jörn Franke [mailto:jornfra...@gmail.com] 
> Sent: Wednesday, 18 November 2020 16:41
> To: solr-user@lucene.apache.org
> Subject: Re: Solr8.7 How to increase JVM-Memory ?
> 
> Did you make solr.in.sh executable ? Eg chmod a+x solr.in.sh ?
> 
>> Am 18.11.2020 um 16:33 schrieb Matheo Software :
>> 
>> 
>> Hi All,
>> 
>> For several years I have worked with an old version of Solr on Ubuntu, version 5.4.
>> Today I am testing version 8.7.
>> But I'm not able to change the JVM memory like in my 5.4 version.
>> 
>> Many answers on the web say to modify the solr.in.sh file, but in my case I 
>> only have /opt/solr/solr.in.sh.orig.
>> And if I change SOLR_HEAP to a new value like "8g", the Dashboard always shows 
>> 512MB.
>> I also tried to change SOLR_JAVA_MEM without success.
>> 
>> Of course, I restart Solr each time (service solr restart). No error in the 
>> log. Everything works fine, but no 8g of memory.
>> 
>> I also tried to copy solr.in.sh.orig to solr.in.sh; the result is always the 
>> same.
>> 
>> Could you help me ?
>> 
>> Cordialement, Best Regards
>> Bruno Mannina
>> www.matheo-software.com
>> www.patent-pulse.com
>> Tél. +33 0 970 738 743
>> Mob. +33 0 634 421 817
>> 
>> 
>> 
> 
> 
> 


Re: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Jörn Franke
Did you make solr.in.sh executable? E.g. chmod a+x solr.in.sh?

> Am 18.11.2020 um 16:33 schrieb Matheo Software :
> 
> 
> Hi All,
>  
> For several years I have worked with an old version of Solr on Ubuntu, version 5.4.
> Today I am testing version 8.7.
> But I'm not able to change the JVM memory like in my 5.4 version.
>  
> Many answers on the web say to modify the solr.in.sh file, but in my case I 
> only have /opt/solr/solr.in.sh.orig.
> And if I change SOLR_HEAP to a new value like "8g", the Dashboard always shows 
> 512MB.
> I also tried to change SOLR_JAVA_MEM without success.
>  
> Of course, I restart Solr each time (service solr restart). No error in the log. 
> Everything works fine, but no 8g of memory.
>  
> I also tried to copy solr.in.sh.orig to solr.in.sh; the result is always the 
> same.
>  
> Could you help me ?
>  
> Cordialement, Best Regards
> Bruno Mannina
> www.matheo-software.com
> www.patent-pulse.com
> Tél. +33 0 970 738 743
> Mob. +33 0 634 421 817
> 
>  
> 


Re: Unable to upload configuration with upconfig (Unable to read additional data from server)

2020-11-16 Thread Jörn Franke
I recommend using the Configset API:

https://lucene.apache.org/solr/guide/8_6/configsets-api.html


Especially if you need to secure ZK access with authentication/authorization, which 
is recommended.
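
For example, uploading a zipped configset could look roughly like this (a sketch; the configset name and paths are placeholders):

# zip the contents of the conf directory and upload it as a new configset
(cd /path/to/myconfig/conf && zip -r - *) > myconfig.zip
curl -X POST --header "Content-Type:application/octet-stream" \
     --data-binary @myconfig.zip \
     "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myconfig"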


> Am 16.11.2020 um 11:18 schrieb Maehr, Bernhard :
> 
> Hello guys,
> 
> I have set up a Kubernetes cluster of SOLR and Zookeeper with the publicly 
> available helm charts from 
> https://github.com/helm/charts/tree/master/incubator/solr
> SOLR and Zookeeper each seem to be running correctly with 3 pods. 
> SOLR is version 8.7.0, Zookeeper version 3.5.5.
> 
> As the next step I tried to upload the configsets to Zookeeper. This 
> should happen with CLI calls from outside SOLR and Zookeeper.
> 
> First I tried this with the zkCli.sh from Zookeeper. I was able to create a 
> directory and list directories. But as far as I understood it is not possible 
> to upload files with zkCli.sh, which is really a pity.
> 
> Because of this I tried to use the solr command from SOLR instead. 
> Unfortunately this tool sometimes works and sometimes not, even for basic 
> commands like 'ls', which always work with zkCli.sh.
> 
> This is the result of multiple calls - you can see that sometimes they work, 
> sometimes not.
> 
> WARN  - 2020-11-16 11:10:05.087; org.apache.zookeeper.ClientCnxn; An 
> exception was thrown while closing send thread for session 0x10029be8974001a. 
> => EndOfStreamException: Unable to read additional data from server sessionid 
> 0x10029be8974001a, likely server has closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
> additional data from server sessionid 0x10029be8974001a, likely server has 
> closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) 
> ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>  ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) 
> ~[zookeeper-3.6.2.jar:3.6.2]
> abbreviations.txt
> all_synonyms.txt
> WARN  - 2020-11-16 11:10:07.234; org.apache.zookeeper.ClientCnxn; An 
> exception was thrown while closing send thread for session 0x20029be8bd7001d. 
> => EndOfStreamException: Unable to read additional data from server sessionid 
> 0x20029be8bd7001d, likely server has closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
> additional data from server sessionid 0x20029be8bd7001d, likely server has 
> closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) 
> ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>  ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) 
> ~[zookeeper-3.6.2.jar:3.6.2]
> solrconfig.xml
> schema.xml
> solrconfig_query.xml
> word-delim-types.txt
> solrconfig_indexconfig.xml
> tokens.txt
> minus.txt
> WARN  - 2020-11-16 11:10:09.795; org.apache.zookeeper.ClientCnxn; An 
> exception was thrown while closing send thread for session 0x20029be8bd7001e. 
> => EndOfStreamException: Unable to read additional data from server sessionid 
> 0x20029be8bd7001e, likely server has closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
> additional data from server sessionid 0x20029be8bd7001e, likely server has 
> closed socket
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) 
> ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>  ~[zookeeper-3.6.2.jar:3.6.2]
>at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) 
> ~[zookeeper-3.6.2.jar:3.6.2]
> 
> 
> I also tried it with the older SOLR version 8.6.0 (because the included 
> Zookeeper libs are in version 3.5.7 there).
> Most times I get the error "org.apache.solr.common.cloud.ConnectionManager; 
> zkClient has disconnected".
> 
> Is this some kind of version incompatibility between SOLR and Zookeeper? A wrong 
> configuration of the Zookeepers, so that they are not able to communicate with each 
> other?
> 
> Any help appreciated
> Bernhard
> 


Re: Solr endpoint on the public internet

2020-10-08 Thread Jörn Franke


It is like opening a database to the Internet - you simply don’t do it and I 
don’t recommend it.

If you, despite the anti-pattern, want to do it: use the latest Solr version and 
put a reverse proxy in front. Always use authentication and authorization. Only 
allow a minimal set of API endpoints and no Admin UI. Limit the IPs that can access 
it. Do not use it for confidential data. 
If data (even public data!) gets leaked from your Solr instance, it is very bad 
for the reputation of your organisation.

Future versions will allow disabling security-problematic modules; better to wait for 
them. Still, I would not do it in the first place - you also would not open 
databases to the Internet. I also could not find a use case for which this is 
needed.

> Am 08.10.2020 um 20:27 schrieb Marco Aurélio :
> 
> Hi!
> 
> We're looking into the option of setting up search with Solr without an
> intermediary application. This would mean our backend would index data into
> Solr and we would have a public Solr endpoint on the internet that would
> receive search requests directly.
> 
> Since I couldn't find an existing solution similar to ours, I would like to
> know whether it's possible to secure Solr in a way that allows anyone
> read-only access to collections, and how to achieve that. Specifically
> because of this part of the documentation:
> 
> *No Solr API, including the Admin UI, is designed to be exposed to
> non-trusted parties. Tune your firewall so that only trusted computers and
> people are allowed access. Because of this, the project will not regard
> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> ask you to report such issues in JIRA.*
> Is there a way we can restrict access to Solr collections to read-only, so as
> to allow users to make search requests directly to it, or should we always
> keep our Solr instances completely private?
> 
> Thanks in advance!
> 
> Best regards,
> Marco Godinho


Re: Fetched but not Added Solr 8.6.2

2020-09-17 Thread Jörn Franke
The log file will tell you the issue.

> Am 17.09.2020 um 10:54 schrieb Anuj Bhargava :
> 
> We just installed Solr 8.6.2
> It is fetching the data but not adding
> 
> Indexing completed. *Added/Updated: 0 *documents. Deleted 0 documents.
> (Duration: 06s)
> Requests: 1 ,* Fetched: 100* 17/s, Skipped: 0 , Processed: 0
> 
> The *data-config.xml*
> 
> 
>driver="com.mysql.jdbc.Driver"
>batchSize="-1"
>autoReconnect="true"
>socketTimeout="0"
>connectTimeout="0"
>encoding="UTF-8"
>url="jdbc:mysql://zeroDateTimeBehavior=convertToNull"
>user="xxx"
>password="xxx"/>
>
>deltaQuery="select posting_id from countries where
> last_modified > '${dataimporter.last_index_time}'">
>
>
> 


Re: Solr Cloud 8.5.1 - HDFS and Erasure Coding

2020-09-16 Thread Jörn Franke
I am not aware of a test. However, keep
in mind that HDFS support will be deprecated.

Additionally, you can configure erasure coding in HDFS on a per-folder/per-file 
basis, so in the worst case you could just create the folder for Solr with 
the standard HDFS mode.

Erasure coding has several limitations (e.g. regarding the possibility to append, etc.), so I 
would be at least sceptical whether it works and would do extensive testing.

> Am 16.09.2020 um 15:41 schrieb Joe Obernberger :
> 
> Anyone use Solr with Erasure Coding on HDFS?  Is that supported?
> 
> Thank you
> 
> -Joe
> 


Re: Updating configset

2020-09-11 Thread Jörn Franke
I would go for the Solr REST API... especially if you have a secured ZK (e.g. 
with Kerberos). Then you need to manage access for humans only in Solr and not 
also in ZK.

> Am 11.09.2020 um 19:41 schrieb Erick Erickson :
> 
> Bin/solr zk upconfig...
> Bin/solr zk cp... For individual files.
> 
> Not as convenient as a nice API, but might let you get by...
> 
>> On Fri, Sep 11, 2020, 13:26 Houston Putman  wrote:
>> 
>> I completely agree, there should be a way to overwrite an existing
>> configSet.
>> 
>> Looks like https://issues.apache.org/jira/browse/SOLR-10391 already
>> exists,
>> so the work could be tracked there.
>> 
>> On Fri, Sep 11, 2020 at 12:36 PM Tomás Fernández Löbbe <
>> tomasflo...@gmail.com> wrote:
>> 
>>> I was in the same situation recently. I think it would be nice to have
>> the
>>> configset UPLOAD command be able to override the existing configset
>> instead
>>> of just fail (with a parameter such as override=true or something). We
>> need
>>> to be careful with the trusted/untrusted flag there, but that should be
>>> possible.
>>> 
 If we can’t modify the configset wholesale this way, is it possible to
>>> create a new configset and swap the old collection to it?
>>> You can create a new one and then call MODIFYCOLLECTION on the collection
>>> that uses it:
>>> 
>>> 
>> https://lucene.apache.org/solr/guide/8_6/collection-management.html#modifycollection-parameters
>>> .
>>> I've never used that though.
>>> 
>>> On Fri, Sep 11, 2020 at 7:26 AM Carroll, Michael (ELS-PHI) <
>>> m.carr...@elsevier.com> wrote:
>>> 
 Hello,
 
 I am running SolrCloud in Kubernetes with Solr version 8.5.2.
 
 Is it possible to update a configset being used by a collection using a
 SolrCloud API directly? I know that this is possible using the zkcli
>> and
>>> a
 collection RELOAD. We essentially want to be able to checkout our
>>> configset
 from source control, and then replace everything in the active
>> configset
>>> in
 SolrCloud (other than the schema.xml).
 
 We have a couple of custom plugins that use config files that reside in
 the configset, and we don’t want to have to rebuild the collection or
 access zookeeper directly if we don’t have to. If we can’t modify the
 configset wholesale this way, is it possible to create a new configset
>>> and
 swap the old collection to it?
 
 Best,
 Michael Carroll
 
>>> 
>> 


Re: Solr Schema API seems broken to me after 8.2.0

2020-09-08 Thread Jörn Franke
Can you check the log files of Solr?

It could be that after the upgrade some filesystem permissions no longer 
work.

> Am 08.09.2020 um 09:27 schrieb "jeanc...@gmail.com" :
> 
> Hey guys, good morning.
> 
> As I didn't get any reply for this one, is it ok then that I create the
> Jira ticket?
> 
> Best Regards,
> 
> *Jean Silva*
> 
> 
> https://github.com/jeancsil
> 
> https://linkedin.com/in/jeancsil
> 
> 
> 
>> On Fri, Aug 28, 2020 at 11:10 AM jeanc...@gmail.com 
>> wrote:
>> 
>> Hey everybody,
>> 
>> First of all, I wanted to say that this is my first time writing here. I
>> hope I don't do anything wrong.
>> I went to create the "bug" ticket and saw it would be a good idea to first
>> talk to some of you via IRC (didn't work for me or I did something wrong
>> after 20 years of not using it..)
>> 
>> I'm currently using Solr 8.1.1 in production and I use the Schema API to
>> create the necessary fields before starting to index my new data. (Reason,
>> the managed-schema would be big for me to take care of and I decided to
>> automate this process by using the REST API).
>> 
>> I started trying to upgrade *from 8.1.1* directly to *8.6.1*, and the
>> Python script I use to add some fields and analyzers started to *kill
>> Solr after some requests had finished successfully* without issues.
>> 
>> *To put it simply: I have to make sure that the fields that contain the
>> word "blablabla" in them are deleted and then recreated. I have ~33 of
>> them.*
>> 
>> The script works as expected but after some successful creations it kills
>> Solr!
>> 
>> This script was implemented in python and I thought that I might have done
>> something that doesn't work with Solr 8.6.1 anymore and decided to test it
>> with the *proper implementation of the library in Java*, SolrJ 8.6.1 as
>> well. The same error occurred. I also didn't see any change in the
>> documentation with regards to the request I was making.
>> 
>> Unfortunately I don't have any stacktrace from Solr as there were no
>> errors popping up in the console for me. The only thing I see was the
>> output of my script, saying that the *Remote closed connection without
>> response*:
>> ...
>> Traceback (most recent call last):
>>  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py",
>> line 677, in urlopen
>>chunked=chunked,
>>  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py",
>> line 426, in _make_request
>>six.raise_from(e, None)
>>  File "", line 3, in raise_from
>>  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py",
>> line 421, in _make_request
>>httplib_response = conn.getresponse()
>>  File "/usr/lib/python3.7/http/client.py", line 1336, in getresponse
>>response.begin()
>>  File "/usr/lib/python3.7/http/client.py", line 306, in begin
>>version, status, reason = self._read_status()
>>  File "/usr/lib/python3.7/http/client.py", line 275, in _read_status
>>raise RemoteDisconnected("Remote end closed connection without"
>> http.client.RemoteDisconnected: Remote end closed connection without
>> response
>> 
>> 
>> With *Java and SolrJ* matching the Solr version I was using, I got this:
>> 
>> Deleting field field_name_1
>> {responseHeader={status=0,QTime=2187}}
>> 
>> Deleting field field_name_2
>> {responseHeader={status=0,QTime=1571}}
>> 
>> Deleting field field_name_3
>> {responseHeader={status=0,QTime=1587}}
>> 
>> Deleting field field_name_4
>> Exception while deleting the field field_name_4:* IOException occured
>> when talking to server at: http://localhost:32783/solr/my_core_name
>> *
>> 
>> Deleting field field_name_5
>> Exception while deleting the field field_name_5:* IOException occured
>> when talking to server at: http://localhost:32783/solr/my_core_name
>> *
>> // THIS REPEATES FOR + 30 TIMES AND THEN THE MESSAGE CHANGES A BIT
>> 
>> Exception while deleting the field field_name_6:* Server refused
>> connection at:
>> http://localhost:32783/solr/my_core_name/schema?wt=javabin=2
>> *
>> Deleting field field_name_6
>> // REPEATS ALSO MANY TIMES
>> 
>> Maybe I need to run the same thing again with some different configuration
>> to help give you guys a hint on what the problem is?
>> 
>> To finish, I started to go "back in time" to see when this happened and
>> realized that *I can only upgrade from 8.1.1 to 8.2.1* without this
>> error happening. (I'm using the docker images from here, btw:
>> https://github.com/docker-solr/docker-solr)
>> 
>> Thank you very much and I hope I can also help with this if it's really a
>> bug.
>> 
>> Best Regards,
>> 
>> *Jean Silva*
>> 
>> 
>> https://github.com/jeancsil
>> 
>> https://linkedin.com/in/jeancsil
>> 
>> 


Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Jörn Franke
Maybe this can help you?
https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
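
If per-shard IDF differences are part of the problem, enabling one of the distributed IDF implementations described there is a one-line change in solrconfig.xml (a sketch based on that page):

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

Other implementations listed on that page are LocalStatsCache (the default), ExactSharedStatsCache and LRUStatsCache.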

On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis  wrote:

> HI all,
>
> On our current master/slave setup (no cloud), we use a a custom sorting
> function to get the first pass results (using the sort param), and then we
> use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
> topN, after sorting has completed and the order is correct.
>
> However, as we are migrating to SolrCloud (version 7.3.1) with multiple
> shards, this does not seem to work as expected. To my understanding, Solr
> collects the reranked results from the shards back on a single node to
> merge them, and then tries to re-apply sorting.
>
> We would expect the results to at least follow the sorting formula, even if
> this is not what we want. But this is still not the case, as the
> combination of the two (sorting + reranking) results in erratic ordering.
>
> Example result, where $sort_score is the sorting formula output, and score
> is the LTR re-ranked output:
>
> {"id": "152",
> "$sort_score": 17.38543,
> "score": 0.22140852
> },
> {"id": "2016",
> "$sort_score": 14.612957,
> "score": 0.19214153
> },
> { "id": "1523",
> "$sort_score": 14.4093275,
> "score": 0.26738763
> },
> { "id": "6704",
> "$sort_score": 13.956842,
> "score": 0.17357588
> },
> { "id": "6512",
> "$sort_score": 14.43907,
> "score": 0.11575622
> },
>
> We also tried with other simple re-rank queries apart from LTR, and the
> issue persisted.
>
> Could someone please help troubleshoot? Ideally, we would want to have the
> re-rank results merged on the single node, and not re-apply sorting.
>
> Thank you!
>


Re: Solr collections gets wiped on restart

2020-08-27 Thread Jörn Franke
Any logfiles after restart?

Which Solr version?

I would activate autopurge in Zookeeper

> Am 27.08.2020 um 10:49 schrieb "antonio.di...@bnpparibasfortis.com" 
> :
> 
> Good morning,
> 
> 
> I would like to get some help if possible.
> 
> 
> 
> We have a 3 node Solr cluster (ensemble) with apache-zookeeper 3.5.5.
> 
> It works fine until we need to restart one of the nodes. Then all the content 
> of the collection gets deleted.
> 
> This is a production environment, and every time there is a restart or a 
> crash in one of the services/servers we lose lots of time restoring the 
> collection and work.
> 
> This is the way we start the nodes:
> 
> su - ipls004p -c "/applis/24374-iplsp-00/IPLS/solr-8.3.0/bin/solr start 
> -cloud -p 8987 -h s01vl9918254 -s 
> /applis/24374-iplsp-00/IPLS/solr-8.3.0/cloud/node1/solr -z 
> s01vl9918254:2181,s01vl9918256:2181,s01vl9918258:2181 -force"
> 
> This is the zoo.cfg:
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/applis/24374-iplsp-00/IPLS/apache-zookeeper-3.5.5-bin/temp
> # the port at which the clients will connect
> clientPort=2181
> # the maximum number of client connections.
> # increase this if you need to handle more clients
> #maxClientCnxns=60
> #
> # Be sure to read the maintenance section of the
> # administrator guide before turning on autopurge.
> #
> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
> #
> # The number of snapshots to retain in dataDir
> #autopurge.snapRetainCount=3
> # Purge task interval in hours
> # Set to "0" to disable auto purge feature
> #autopurge.purgeInterval=1
> 4lw.commands.whitelist=mntr,conf,ruok
> 
> server.1=s01vl9918256:3889:3888
> server.2=s01vl9918258:3889:3888
> server.3=s01vl9918254:3889:3888
> #server.4=s01vl9918255:3889:3888
> 
> 
> 
> 
> 
> Thanks in advance
> 
> 
> Regards, Cordialement,
> Antonio Dinis
> TCC Web Portals Ops Engineers | BNP Paribas Fortis SA/NV
> T +32 (0)2 231 20994 // Brussels Marais +1
> TCC Web Portals
> 
> 
> 
> ==
> BNP Paribas Fortis disclaimer:
> http://www.bnpparibasfortis.com/e-mail-disclaimer.html
> 
> BNP Paribas Fortis privacy policy:
> http://www.bnpparibasfortis.com/privacy-policy.html
> 
> ==


Re: Real time index data

2020-08-26 Thread Jörn Franke
Maybe to add to this: additionally, try to batch the requests from the queue - 
don't do it one by one, but take n items at a time.
On the Solr side, also look at the configuration of soft commits vs. hard commits. 
Soft commits are relevant for defining how real-time this is and can be.
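
For example, in solrconfig.xml (a sketch; the values are illustrative, not recommendations):

<!-- hard commit: flush to disk regularly, but do not open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: controls how quickly new documents become searchable -->
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>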

> Am 26.08.2020 um 11:36 schrieb Jörn Franke :
> 
> You do not provide many details, but a queuing mechanism seems to be 
> appropriate for this use case.
> 
>> Am 26.08.2020 um 11:30 schrieb Tushar Arora :
>> 
>> Hi,
>> 
>> One of our use cases requires real-time indexing of data into Solr from a DB.
>> Approximately 30 rows are updated per second in the DB, and I also want these
>> to be updated in the index simultaneously.
>> Is a queuing mechanism like RabbitMQ helpful in my case?
>> Please suggest ways to achieve this.
>> 
>> Regards,
>> Tushar Arora


Re: Real time index data

2020-08-26 Thread Jörn Franke
You do not provide many details, but a queuing mechanism seems to be 
appropriate for this use case.

> Am 26.08.2020 um 11:30 schrieb Tushar Arora :
> 
> Hi,
> 
> One of our use cases requires real-time indexing of data into Solr from a DB.
> Approximately 30 rows are updated per second in the DB, and I also want these
> to be updated in the index simultaneously.
> Is a queuing mechanism like RabbitMQ helpful in my case?
> Please suggest ways to achieve this.
> 
> Regards,
> Tushar Arora


Re: SOLR Compatibility with Oracle Enterprise Linux 7

2020-08-24 Thread Jörn Franke
Yes, there should be no issues upgrading to RHEL7.
I assume you mean Solr 8.4.0. You can also use the latest Solr version.

Why not RHEL8?


> Am 24.08.2020 um 09:02 schrieb Wang, Ke :
> 


Re: Kerberos on windows device

2020-08-21 Thread Jörn Franke
Hi,

You can use ktpass if you are an AD administrator. The security.json does not 
differ from the one on Linux.

Please note that there are a lot of things with Kerberos that can 
go wrong, which is not a Solr issue but Kerberos complexity (e.g. correct DNS 
names, correct encryption type selected in ktpass, correct attributes set in AD) 
- contact an AD administrator with Kerberos experience to get the parameters 
for your AD.

> Am 21.08.2020 um 22:28 schrieb Vanalli, Ali A - DOT :
> 
> Is there a way to create a keytab file from Windows using the ktpass utility?  
> We are running SOLR from a Windows device and the example provided in the 
> documentation is only for a Linux server.
> 
> Also, please provide an example of security.json for Kerberos 
> authentication and authorization on a Windows device.
> 
> -Thanks
> 
> Ali Vanalli
> Enterprise Business Appl Services Unit
> Enterprise Services Section
> Bureau of Information Technology Services
> Wisconsin Department of Transportation
> ofc (608) 264-9960
> ali.vana...@dot.wi.gov
> 


Re: Trailing space issue with indexed data.

2020-08-18 Thread Jörn Franke
During indexing. Do they matter for search, ie would the search be different 
with/without them?

> Am 18.08.2020 um 19:57 schrieb Fiz N :
> 
> Hell SOLR Experts,
> 
> I am using SOLR 8.6 and indexing data from MSSQL DB.
> 
> after indexing is done I am seeing
> 
> “Page_number”:”1“,
> “Doc_name”:”  office 770 toll free “
> “Doc_text”:” From:  Hyan, gan \nTo:  Delacruz
>  Decruz \n“
> 
> I want to remove these empty spaces.
> 
> How do I achieve that?
> 
> Thanks
> Fiz Nadian.


Re: SOLR indexing takes longer time

2020-08-17 Thread Jörn Franke
The DIH is single-threaded and deprecated. Your best bet is to have a 
script/program extract the data from MongoDB and write it to Solr in batches 
using multiple threads. You will see significantly higher performance for your 
data.
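
A rough sketch of the batching side with SolrJ (the loop, collection name and field names are placeholders to be replaced by your MongoDB cursor and mapping; ConcurrentUpdateSolrClient handles the batching and background threads internally):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MongoToSolr {
  public static void main(String[] args) throws Exception {
    // queues documents and flushes them in batches using 4 background threads
    try (SolrClient solr = new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
            .withQueueSize(10000)
            .withThreadCount(4)
            .build()) {
      // placeholder loop: replace with a cursor over your MongoDB collection
      for (int i = 0; i < 200_000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("title_s", "document " + i);
        solr.add(doc);
      }
      // note: ConcurrentUpdateSolrClient reports indexing errors asynchronously,
      // so add error handling for production use
      solr.commit();
    }
  }
}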

> Am 17.08.2020 um 20:23 schrieb Abhijit Pawar :
> 
> Hello,
> 
> We are indexing some 200K plus documents in SOLR 5.4.1 with no shards /
> replicas and just single core.
> It takes almost 3.5 hours to index that data.
> I am using a data import handler to import data from the mongo database.
> 
> Is there something we can do to reduce the time taken to index?
> Will upgrade to newer version help?
> 
> Appreciate your help!
> 
> Regards,
> Abhijit


Re: DIH on SolrCloud

2020-08-13 Thread Jörn Franke
DIH is deprecated in current Solr versions. The general recommendation is to do the 
processing outside the Solr server and use the update handler (the normal one, not 
Cell) to add documents to the index. So you should avoid using DIH, as it is not 
future-proof.

If you need more time to migrate to a non-DIH solution:
I recommend looking at the log files of all servers to find the real error behind 
the issue. If you trigger DIH from node 2 in SolrCloud mode, that does not mean it 
is executed there!

What could go wrong:
other nodes do not have access to the files/database, or there is a parsing error or 
a script error.

> Am 13.08.2020 um 17:21 schrieb Issei Nishigata :
> 
> Hi, All
> 
> I'm using Solr 4.10 in SolrCloud mode.
> I have 10 nodes. One of the nodes is the Leader, the others are Replicas. (I will
> call them Node1 to Node10 for convenience.)
> -> 1 Shard, 1 Leader (Node1), 9 Replicas (Node2-10)
> Indexing always uses the DIH of Node2. Therefore, DIH may be executed when
> Node2 is a Leader or a Replica.
> Node2 is not forcibly set to Leader when DIH is executed.
> 
> At one point, when Node2 executed DIH in the Replica state, the following
> error in Node9 occurred.
> 
> [updateExecutor-1-thread-9737][ERROR][org.apache.solr.common.SolrException]
> - org.apache.solr.client.solrj.SolrServerException: IOException occured
> when talking to server at: http://samplehost:8983/solr/test_shard1_replica9
> 
> I think this is an error that occurred while sending data from Node2 to Node9, and
> Node9 couldn't respond for some reason.
> 
> The error occurs sometimes, however it is not reproducible, so the
> investigation is troublesome.
> Is there any possible cause for this problem? I am worried that it might be
> a Solr anti-pattern.
> The thing is, when running DIH on Node2 as a Replica, the above error occurs
> towards Node1 as the Leader,
> and then soon after, all the nodes might be returning to the index of
> Node1.
> Do you think my understanding makes sense?
> 
> If using DIH on SolrCloud is not recommended, please let me know about this.
> 
> Thanks,
> Issei


Re: Replicas in Recovery During Atomic Updates

2020-08-10 Thread Jörn Franke
How exactly do you ingest it with atomic updates? Is there an update 
processor in between? 

What are your settings for hard/soft commits?

For the shards going into recovery - do you have a log entry or something?

What is the Solr version?

How do you setup ZK?

> Am 10.08.2020 um 16:24 schrieb Anshuman Singh :
> 
> Hi,
> 
> We have a SolrCloud cluster with 10 nodes. We have 6B records ingested in
> the Collection. Our use case requires atomic updates ("inc") on 5 fields.
> Now almost 90% of documents are atomic updates, and as soon as we start our
> ingestion pipelines, multiple shards start going into recovery, sometimes
> all replicas of some shards go into down state.
> The ingestion rate is also too slow with atomic updates, 4-5k per second.
> We were able to ingest records without atomic updates at the rate of 50k
> records per second without any issues.
> 
> What I'm suspecting is, the fact that these "inc" atomic updates
> require fetching of fields before indexing can cause slow rates but what
> I'm not getting is, why are the replicas going into recovery?
> 
> Regards,
> Anshuman


Re: Solr + Parquets

2020-08-07 Thread Jörn Franke
DIH is deprecated and will be removed from Solr. You may still be 
able to install it as a plug-in; however, AFAIK nobody maintains it. Do not use 
it anymore.

You can write a custom Spark data source that writes to Solr, or do it in a 
Spark map step using SolrJ.
In both cases, do not create hundreds of executors, to avoid overloading Solr.


> Am 07.08.2020 um 18:39 schrieb Kevin Van Lieshout :
> 
> Hi,
> 
> Is there any assistance around writing parquets from Spark to Solr shards,
> or is it possible to customize a DIH to import a parquet into a Solr shard?
> Let me know if this is possible, or the best workaround for this. Much
> appreciated, thanks.
> 
> 
> Kevin VL


Re: Solrj client 8.6.0 issue special characters in query

2020-08-07 Thread Jörn Franke
Hmm, setting -Dfile.encoding=UTF-8 solves the problem. I have to now check
which component of the application screws it up, but at the moment I do NOT
believe it is related to Solrj.

On Fri, Aug 7, 2020 at 11:53 AM Jörn Franke  wrote:

> Dear all,
>
> I have the following issues. I have a Solrj Client 8.6 (but it happens
> also in previous versions), where I execute, for example, the following
> query:
> Jörn
>
> If I look into Solr Admin UI it finds all the right results.
>
> If I use Solrj client then it does not find anything.
> Further, investigating in debug mode it seems that the URI to server gets
> wrongly encoded.
> Jörn becomes J%C3%83%C2%B6rn
> It should become only J%C3%B6rn
> any idea why this happens and why it adds %83%C2 in between? Those do not
> even seem to be valid UTF-8 characters.
>
> I verified with various statements that I give to Solrj the correct
> encoded String "Jörn"
>
> Can anyone help me here?
>
> Thank you.
>
> best regards
>


Solrj client 8.6.0 issue special characters in query

2020-08-07 Thread Jörn Franke
Dear all,

I have the following issues. I have a Solrj Client 8.6 (but it happens also
in previous versions), where I execute, for example, the following query:
Jörn

If I look into Solr Admin UI it finds all the right results.

If I use Solrj client then it does not find anything.
Further, investigating in debug mode it seems that the URI to server gets
wrongly encoded.
Jörn becomes J%C3%83%C2%B6rn
It should become only J%C3%B6rn
any idea why this happens and why it adds %83%C2 in between? Those do not
even seem to be valid UTF-8 characters.

I verified with various statements that I give to Solrj the correct encoded
String "Jörn"

Can anyone help me here?

Thank you.

best regards


Re: Searching for credit card numbers

2020-07-28 Thread Jörn Franke
A regex search at query time would leave room for attacks (e.g. a regex can 
easily be designed to block the Solr server forever).

If the field is stored, you can also use a cursor to go through all 
entries and reindex the doc based on that field:

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html

This also implies that you have the other fields stored; otherwise, reindex.
You can do this in parallel to the existing index and, once finished, simply 
change the alias for the collection (that would be without any downtime for the 
users, but of course you need the corresponding space).
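
A minimal sketch of the cursor loop with SolrJ (collection and field names are placeholders, and the rewrite step is left as a comment):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorReindex {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.setSort("id", SolrQuery.ORDER.asc);   // cursors require a sort on the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = solr.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          // inspect the stored field here, e.g. flag or mask credit card numbers,
          // and re-add the document to the new index
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) break;       // no more pages
        cursor = next;
      }
    }
  }
}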

> Am 28.07.2020 um 21:06 schrieb lstusr 5u93n4 :
> 
> Possible... yes. Agreed that this is the right approach. But if we already
> have a big index that we're searching through? Any way to "hack it"?
> 
>> On Tue, 28 Jul 2020 at 14:55, Walter Underwood 
>> wrote:
>> 
>> I’d do that at index time. Add an update request processor script that
>> does the regex and adds a field has_credit_card_number:true.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
 On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4  wrote:
>>> 
>>> Let's say I have a text field that's been indexed with the standard
>>> tokenizer, and I want to match the docs that have credit card numbers in
>>> them (this is for altruistic purposes, not nefarious ones!). What's the
>>> best way to build a search that will do this?
>>> 
>>> Searching for "   " seems to return inconsistent results.
>>> 
>>> Maybe a regex search? "[0-9]{4}?[0-9]{4}?[0-9]{4}?[0-9]{4}" seems like it
>>> should work, but that's not matching the docs I think it should either...
>>> 
>>> Any suggestions?
>>> 
>>> Thanks In Advance!
>> 
>> 


Re: Meow attacks

2020-07-28 Thread Jörn Franke
In addition to what has been said before (use private networks/firewall rules): 
activate Kerberos authentication so that only Solr hosts can write to ZK (the Solr 
client needs no write access), and use encryption where possible. 
Upgrade Solr to the latest version, use SSL, enable Kerberos, do not give clients 
any admin access on Solr (minimum privileges only!), use Solr whitelists to allow 
only the clients that should access Solr, and enable the Java security manager (* to 
make it work with Kerberos auth you need to wait for a newer Solr 
version).

> Am 28.07.2020 um 22:41 schrieb Odysci :
> 
> Folks,
> 
> I suspect one of our Zookeeper installations on AWS was subject to a Meow
> attack (
> https://arstechnica.com/information-technology/2020/07/more-than-1000-databases-have-been-nuked-by-mystery-meow-attack/
> )
> 
> Basically, the configuration for one of our collections disappeared from
> the Zookeeper tree (when looking at the Solr interface), and it left
> several files ending in "-meow"
> Before I realized it, I stopped and restarted the ZK and Solr machines (as
> part of ubuntu updates), and when ZK didn't find the configuration for a
> collection, it deleted the collection from Solr. At least that's what I
> suspect happened.
> 
> Fortunately it affected a very small index and we had backups. But it is
> very worrisome.
> Has anyone had any problems with this?
> Is there any type of log that I can check to sort out how this happened?
> The ZK log complained that the configs for the collection were not there,
> but that's about it.
> 
> and, is there a better way to protect against such attacks?
> Thanks
> 
> Reinaldo


Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-21 Thread Jörn Franke
Jira created 

> Am 21.07.2020 um 10:28 schrieb Ishan Chattopadhyaya 
> :
> 
> I think this warrants a JIRA. To work around this issue for now, you can
> use an environment variable SOLR_SECURITY_MANAGER_ENABLED=false before
> starting Solr.
> 
>> On Thu, Jul 16, 2020 at 11:58 PM Jörn Franke  wrote:
>> 
>> The solution would be probably a policy file shipped with Solr that allows
>> the ZK jar to create a logincontext. I suggest that Solr ships it otherwise
>> one would need to adapt it for every Solr update manually to include the
>> version of the ZK jar.
>> 
>>> On Thu, Jul 16, 2020 at 8:15 PM Jörn Franke  wrote:
>>> 
>>> I believe it is a bug in Solr because we need to create a policy to allow
>>> creating a login context:
>>> See here chapter "Running the Code with a Security Manager"
>>> 
>>> 
>> http://www.informatik.hs-furtwangen.de/doku/java/j2sdk-1_4_1-doc/guide/security/jaas/tutorials/LoginConfigFile.html
>>> 
>>> Please confirm and I will create a JIRA issue for Solr
>>> 
>>> On Thu, Jul 16, 2020 at 8:06 PM Jörn Franke 
>> wrote:
>>> 
>>>> Hallo,
>>>> 
>>>> I am using Solr 8.6.0.
>>>> When activating the Java security manager then Solr cannot use anymore
>>>> the jaas-client conf specified via java.security.auth.login.conf with
>>>> Zookeeper. We have configured Kerberos authentication for Zookeeper.
>>>> When disabling java security manager it works perfectly fine.
>>>> 
>>>> The exact error message is : „No JAAS configuration section named
>>>> 'Client' was found“. Somehow it seems that the Java security manager
>> blocks
>>>> access to that file .
>>>> The directory for the file is in the -Dsolr.allowPaths
>>>> Could this be a bug or is it a misconfiguration?
>>>> 
>>>> 
>>>> Thank you.
>>>> 
>>>> Best regards
>>> 
>>> 
>> 


Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR you may simply duplicate the pipeline across both data centers. 
Then there is no need at each step of the pipeline to replicate (storage to 
storage, index to index etc.).
Instead both pipelines run in different data centers in parallel.

> Am 24.06.2020 um 15:46 schrieb Oakley, Craig (NIH/NLM/NCBI) [C] 
> :
> 
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
> 
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seem to cause any of these tlog files to be 
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
> 
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> 5
> 2
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
> 
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
> 
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
> 
> Are  there any suggestions?
> 
> Thanks


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Jörn Franke
What does "does not work correctly" mean?

Have you checked that all fields are stored or docValues?

> Am 17.07.2020 um 11:26 schrieb yo tomi :
> 
> Hi All
> 
> Sorry, the above settings contradict each other.
> Actually, the following setting does not work properly.
> ---
> 
> 
> 
> 
> 
> 
> ---
> And the following is working as expected.
> ---
> 
> 
> 
> 
> 
> 
> 
> ---
> 
> Thanks,
> Yoshiaki
> 
> 
> 2020年7月17日(金) 16:32 yo tomi :
> 
>> Hi, All
>> When I did an atomic update on SolrCloud with the following setting, it did not 
>> work properly.
>> 
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> When I changed it as follows and ran it, it worked as expected.
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> The latter setting and the way of using a post-processor could produce the same 
>> result, I thought,
>> but with a post-processor the bug SOLR-8030 makes me not feel like using it.
>> Even with the latter setting, is there any possibility of SOLR-8030 occurring? 
>> Looking at the source code, the tlog which comes from the leader to the Replica seems to 
>> be processed correctly with the UpdateRequestProcessor,
>> so the latter setting would not be the one affected by the bug, I thought. Does anyone 
>> know the most appropriate way to configure atomic updates on SolrCloud?
>> 
>> Thanks,
>> Yoshiaki
>> 
>> 


Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
The solution would probably be a policy file shipped with Solr that allows
the ZK jar to create a login context. I suggest that Solr ship it, otherwise
one would need to adapt it manually for every Solr update to include the
version of the ZK jar.
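
For reference, the grant described in that tutorial would look roughly like this in a policy file (a minimal, permissive sketch; 'Client' is the JAAS section name ZooKeeper looks for, and in practice you would scope the grant to the ZooKeeper jar with a codeBase clause):

// allow creating the JAAS login context named "Client"
grant {
  permission javax.security.auth.AuthPermission "createLoginContext.Client";
};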

On Thu, Jul 16, 2020 at 8:15 PM Jörn Franke  wrote:

> I believe it is a bug in Solr because we need to create a policy to allow
> creating a login context:
> See here chapter "Running the Code with a Security Manager"
>
> http://www.informatik.hs-furtwangen.de/doku/java/j2sdk-1_4_1-doc/guide/security/jaas/tutorials/LoginConfigFile.html
>
> Please confirm and I will create a JIRA issue for Solr
>
> On Thu, Jul 16, 2020 at 8:06 PM Jörn Franke  wrote:
>
>> Hallo,
>>
>> I am using Solr 8.6.0.
>> When activating the Java security manager then Solr cannot use anymore
>> the jaas-client conf specified via java.security.auth.login.conf with
>> Zookeeper. We have configured Kerberos authentication for Zookeeper.
>> When disabling java security manager it works perfectly fine.
>>
>> The exact error message is : „No JAAS configuration section named
>> 'Client' was found“. Somehow it seems that the Java security manager blocks
>> access to that file .
>> The directory for the file is in the -Dsolr.allowPaths
>>  Could this be a bug or is it a misconfiguration?
>>
>>
>> Thank you.
>>
>> Best regards
>
>


Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
I believe it is a bug in Solr because we need to create a policy to allow
creating a login context:
See here chapter "Running the Code with a Security Manager"
http://www.informatik.hs-furtwangen.de/doku/java/j2sdk-1_4_1-doc/guide/security/jaas/tutorials/LoginConfigFile.html

Please confirm and I will create a JIRA issue for Solr

On Thu, Jul 16, 2020 at 8:06 PM Jörn Franke  wrote:

> Hallo,
>
> I am using Solr 8.6.0.
> When activating the Java security manager then Solr cannot use anymore the
> jaas-client conf specified via java.security.auth.login.conf with
> Zookeeper. We have configured Kerberos authentication for Zookeeper.
> When disabling java security manager it works perfectly fine.
>
> The exact error message is : „No JAAS configuration section named 'Client'
> was found“. Somehow it seems that the Java security manager blocks access
> to that file .
> The directory for the file is in the -Dsolr.allowPaths
>  Could this be a bug or is it a misconfiguration?
>
>
> Thank you.
>
> Best regards


Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
Hello,

I am using Solr 8.6.0.
When activating the Java security manager, Solr can no longer use the JAAS client 
conf specified via java.security.auth.login.conf with Zookeeper. We have configured 
Kerberos authentication for Zookeeper. 
When disabling the Java security manager it works perfectly fine.

The exact error message is: "No JAAS configuration section named 'Client' was 
found". Somehow it seems that the Java security manager blocks access to that 
file.
The directory for the file is included in -Dsolr.allowPaths.
Could this be a bug or is it a misconfiguration?


Thank you.

Best regards 

Re: Supporting multiple indexes in one collection

2020-07-01 Thread Jörn Franke
How many documents? 
The real difference was only a couple of ms?

> Am 01.07.2020 um 07:34 schrieb Raji N :
> 
> We had 2 indexes in 2 separate shards in one collection and had the exact same
> data published with the composite router with a prefix. Disabled all caches.
> Issued the same query, which is a small query with a q parameter and an fq
> parameter. The number of queries which got executed (with the same threads and
> run for the same time) was higher in the case of 2 indexes in 2 separate shards.
> The 90th percentile response time was also a few ms better.
> 
> Thanks,
> Raji
> 
>> On Tue, Jun 30, 2020 at 10:06 PM Jörn Franke  wrote:
>> 
>> What did you test? Which queries? What were the exact results in terms of
>> time ?
>> 
>>>> Am 30.06.2020 um 22:47 schrieb Raji N :
>>> 
>>> Hi ,
>>> 
>>> 
>>> Trying to place multiple smaller indexes in one collection (as we read
>>> solrcloud performance degrades as number of collections increase). We are
>>> exploring two ways
>>> 
>>> 
>>> 1) Placing each index on a single shard of a collection
>>> 
>>>  In this case placing documents for a single index is manual and
>>> automatic rebalancing not done by solr
>>> 
>>> 
>>> 2) Solr routing composite router with a prefix .
>>> 
>>> In this case solr doesn’t place all the docs with same prefix in one
>>> shard , so searches becomes distributed. But shard rebalancing is taken
>>> care by solr.
>>> 
>>> 
>>> We did a small perf test with both these set up. We saw the performance
>> for
>>> the first case (placing an index explicitly on a shard ) is better.
>>> 
>>> 
>>> Has anyone done anything similar. Can you please share your experience.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Raji
>> 


Re: Supporting multiple indexes in one collection

2020-06-30 Thread Jörn Franke
What did you test? Which queries? What were the exact results in terms of time ?

> Am 30.06.2020 um 22:47 schrieb Raji N :
> 
> Hi ,
> 
> 
> Trying to place multiple smaller indexes in one collection (as we read
> solrcloud performance degrades as number of collections increase). We are
> exploring two ways
> 
> 
> 1) Placing each index on a single shard of a collection
> 
>   In this case placing documents for a single index is manual, and
> automatic rebalancing is not done by Solr.
> 
> 
> 2) Solr routing: composite router with a prefix.
> 
>  In this case Solr doesn't place all the docs with the same prefix in one
> shard, so searches become distributed. But shard rebalancing is taken
> care of by Solr.
> 
> 
> We did a small perf test with both of these set up. We saw that the performance for
> the first case (placing an index explicitly on a shard) is better.
> 
> 
> Has anyone done anything similar. Can you please share your experience.
> 
> 
> Thanks,
> 
> Raji


Re: How to determine why solr stops running?

2020-06-29 Thread Jörn Franke
Maybe you can identify some critical queries in the log files?

What is the total size of the index?

What client are you using on the web app side? Are you reusing clients or 
creating a new one for every query?

> Am 29.06.2020 um 21:14 schrieb Ryan W :
> 
> On Mon, Jun 29, 2020 at 1:49 PM David Hastings 
> wrote:
> 
>> little nit picky note here, use 31gb, never 32.
> 
> 
> Good to know.
> 
> Just now I got this output from bin/solr status:
> 
>  "solr_home":"/opt/solr/server/solr",
>  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy -
> 2019-05-28 23:37:48",
>  "startTime":"2020-06-29T17:35:13.966Z",
>  "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
>  "memory":"9.3 GB (%57.9) of 16 GB"}
> 
> That's the highest memory use I've seen.  Not sure if this indicates 16GB
> isn't enough.  Then I ran it again a couple minutes later and it was down
> to 598.3 MB.  I wonder what accounts for these wide swings.  I can't
> imagine if a few users are doing searches, suddenly it uses 9 GB of RAM.
> 
> 
>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:
>> 
>>> It figures it would happen again a couple hours after I suggested the
>> issue
>>> might be resolved.  Just now, Solr stopped running.  I cleared the cache
>> in
>>> my app a couple times around the time that it happened, so perhaps that
>> was
>>> somehow too taxing for the server.  However, I've never allocated so much
>>> RAM to a website before, so it's odd that I'm getting these failures.  My
>>> colleagues were astonished when I said people on the solr-user list were
>>> telling me I might need 32GB just for solr.
>>> 
>>> I manage another project that uses Drupal + Solr, and we have a total of
>>> 8GB of RAM on that server and Solr never, ever stops.  I've been managing
>>> that site for years and never seen a Solr outage.  On that project,
>>> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB
>> or
>>> more?
>>> 
>>> "The thing that’s unsettling about this is that assuming you were hitting
>>> OOMs, and were running the OOM-killer script, you _should_ have had very
>>> clear evidence that that was the cause."
>>> 
>>> How do I know if I'm running the OOM-killer script?
>>> 
>>> Thank you.
>>> 
>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson >> 
>>> wrote:
>>> 
 The thing that’s unsettling about this is that assuming you were
>> hitting
 OOMs,
 and were running the OOM-killer script, you _should_ have had very
>> clear
 evidence that that was the cause.
 
 If you were not running the killer script, the apologies for not asking
 about that
 in the first place. Java’s performance is unpredictable when OOMs
>> happen,
 which is the point of the killer script: at least Solr stops rather
>> than
>>> do
 something inexplicable.
 
 Best,
 Erick
 
> On Jun 29, 2020, at 11:52 AM, David Hastings <
 hastings.recurs...@gmail.com> wrote:
> 
> sometimes just throwing money/ram/ssd at the problem is just the best
> answer.
> 
> On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> 
>> Thanks everyone. Just to give an update on this issue, I bumped the
>>> RAM
>> available to Solr up to 16GB a couple weeks ago, and haven’t had any
>> problem since.
>> 
>> 
>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
>> hastings.recurs...@gmail.com>
>> wrote:
>> 
>>> me personally, around 290gb.  as much as we could shove into them
>>> 
>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
 erickerick...@gmail.com
>>> 
>>> wrote:
>>> 
 How much physical RAM? A rule of thumb is that you should allocate
>>> no
>>> more
 than 25-50 percent of the total physical RAM to Solr. That's
>> cumulative,
 i.e. the sum of the heap allocations across all your JVMs should
>> be
>> below
 that percentage. See Uwe Schindler's mmapdirectiry blog...
 
 Shot in the dark...
 
 On Tue, Jun 16, 2020, 11:51 David Hastings <
>> hastings.recurs...@gmail.com
 
 wrote:
 
> To add to this, I generally have Solr start with this:
> -Xms31000m -Xmx31000m
> 
> and the only other thing that runs on them are maria db gallera
>> cluster
> nodes that are not in use (aside from replication)
> 
> the 31gb is not an accident either, you dont want 32gb.
> 
> 
> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <
>> apa...@elyograg.org
 
 wrote:
> 
>> On 6/11/2020 11:52 AM, Ryan W wrote:
 I will check "dmesg" first, to find out any hardware error
>>> message.
>> 
>> 
>> 
>>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
>> score 9
 or
>>> sacrifice child
>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>> 

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Jörn Franke
I agree with Bernd. I also believe that change is natural, so eventually one 
needs to evolve the terminology or create a completely new product. To evolve the 
terminology, one can write a page in the ref guide for translating it and over 
time adapt it in Solr etc.


> Am 24.06.2020 um 13:30 schrieb Bernd Fehling :
> 
> I'm following this thread now for a while and I can understand
> the wish to change some naming/wording/speech in one or the other
> programs but I always get back to the one question:
> "Is it the weapon which kills people or the hand controlled by
> the mind which fires the weapon?"
> 
> The thread started with slave - slavery, then turned over to master
> and followed by leader (for me as a german... you know).
> What will come next?
> 
> And more over, we now discuss about changes in the source code and
> due to this there need to be changes to the documentation.
> What about the books people wrote about this programs and source code,
> should we force this authors to rewrite their books?
> May be we should file a request to all web search engines to reject
> all stored content about these "banned" words?
> And contact all web hosters about providing bad content.
> 
> To sum things up, within my 40 years of computer science and writing
> programs I have never had a nanosecond any thoughts about words
> like master, slave, leader, ... other than thinking about computers
> and programming.
> 
> Just my 2 cents.
> 
> For what it is worth, I tend to guide/follower if there "must be" any changes.
> 
> Bernd


Re: solr fq with contains not returning any results

2020-06-24 Thread Jörn Franke
I don’t know your data, but could it be that you tokenize differently?

Why do you do the wildcard search at all? Maybe a different tokenizing strategy can bring you results more efficiently? It depends on what you need to achieve, of course ...

> Am 24.06.2020 um 05:37 schrieb yaswanth kumar :
> 
> I am using solr 8.2
> 
> And when trying to do fq=auto_nsallschools:*bostonschool*, the data is not
> being returned. But if I do the same in solr 5.5 (which I already have and
> we are in process of migrating to 8.2 ) its returning results.
> 
> if I do fq=auto_nsallschools:bostonschool
> or
> fq=auto_nsallschools:bostonschool* its returning results but when I try
> with contains like described above or fq=auto_nsallschools:*bostonschool
> (ends with) it's not returning any results.
> 
> The field which we are already using is a copy field and multi valued, am I
> doing something wrong? or does 8.2 need some adjustment in the configs?
> 
> Here is the schema
> 
>  multiValued="true"/>
>  stored="false" multiValued="true"/>
> 
> 
> 
>  positionIncrementGap="100">
>  
>
>
>
>  
>
> 
> 
>  positionIncrementGap="100">
>  
> pattern="(\)" replacement="_and_" />
> pattern="(\$)" replacement="_dollar_" />
> pattern="(\*)" replacement="_star_" />
> pattern="(\+)" replacement="_plus_" />
> pattern="(\-)" replacement="_minus_" />
> pattern="(\#)" replacement="_sharp_" />
> pattern="(\%)" replacement="_percent_" />
> pattern="(\=)" replacement="_equal_" />
> pattern="(\)" replacement="_lessthan_" />
> pattern="(\)" replacement="_greaterthan_" />
> pattern="(\€)" replacement="_euro_" />
> pattern="(\¢)" replacement="_cent_" />
> pattern="(\£)" replacement="_pound_" />
> pattern="(\¥)" replacement="_yuan_" />
> pattern="(\©)" replacement="_copyright_" />
> pattern="(\®)" replacement="_registered_" />
> pattern="(\|)" replacement="_pipe_" />
> pattern="(\^)" replacement="_caret_" />
> pattern="(\~)" replacement="_tilt_" />
> pattern="(\™)" replacement="_treadmark_" />
> pattern="(\@)" replacement="_at_" />
> pattern="(\)" replacement=" _doublequote_ " />
> pattern="(\()" replacement=" _leftparentheses_ " />
> pattern="(\))" replacement=" _rightparentheses_ " />
> pattern="(\{)" replacement="_leftcurlybracket_" />
> pattern="(\})" replacement="_rightcurlybracket_" />
> pattern="(\[)" replacement="_leftsquarebracket_" />
> pattern="(\])" replacement="_rightsquarebracket_" />
> synonyms="punctuation-whitelist.txt" ignoreCase="true" expand="false"/>
>
>
>
>  
>
> 
> Thanks,
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread Jörn Franke
Might be confusing with the nested doc terminology 

> Am 19.06.2020 um 20:14 schrieb Atita Arora :
> 
> I see so many topics being discussed in this thread and I literary got lost
> somewhere , but was just thinking can we call it Parent -Child
> architecture, m sure no one will raise an objection there.
> 
> Although, looking at comments above I still feel it would be a bigger
> effort to convince everyone than making a change. ;)
> 
>> On Fri, 19 Jun 2020, 17:21 Mark H. Wood,  wrote:
>> 
>>> On Fri, Jun 19, 2020 at 09:22:49AM -0400, j.s. wrote:
>>> On 6/18/20 9:50 PM, Rahul Goswami wrote:
 So +1 on "slave" being the problematic term IMO, not "master".
>>> 
>>> but you cannot have a master without a slave, n'est-ce pas?
>> 
>> Well, yes.  In education:  Master of Science, Arts, etc.  In law:
>> Special Master (basically a judge's delegate).  See also "magistrate."
>> None of these has any connotation of the ownership of one person by
>> another.
>> 
>> (It's a one-way relationship:  there is no slavery without mastery,
>> but there are other kinds of mastery.)
>> 
>> But this is an emotional issue, not a logical one.  If doing X makes
>> people angry, and we don't want to make those people angry, then
>> perhaps we should not do X.
>> 
>>> i think it is better to use the metaphor of copying rather than one of
>>> hierarchy. language has so many (unintended) consequences ...
>> 
>> Sensible.
>> 
>> --
>> Mark H. Wood
>> Lead Technology Analyst
>> 
>> University Library
>> Indiana University - Purdue University Indianapolis
>> 755 W. Michigan Street
>> Indianapolis, IN 46202
>> 317-274-0749
>> www.ulib.iupui.edu
>> 


Re: Proxy Error when cluster went down

2020-06-16 Thread Jörn Franke
Do you have another host with replica alive or are all replicas on the host 
that is down?

Are all SolrCloud hosts in the same ZooKeeper?

> Am 16.06.2020 um 19:29 schrieb Vishal Vaibhav :
> 
> Hi thanks . My solr is running in kubernetes. So host name goes away with
> the pod going search-rules-solr-v1-2.search-rules-solr-v1.search-digital.s
> vc.cluster.local
>  So in my case the pod with this host has gone and also the hostname
> search-rules-solr-v1-2.search-rules-solr-v1.search-digital.svc.cluster.local
> is no more there.. should not the solr cloud be aware of the fact that all
> the replicas in that solr host is down and should not proxy the request to
> that node..
> 
>> On Tue, 16 Jun 2020 at 5:06 PM, Shawn Heisey  wrote:
>> 
>>> On 6/15/2020 9:04 PM, Vishal Vaibhav wrote:
>>> I am running on solr 8.5. For some reason entire cluster went down. When
>> i
>>> am trying to bring up the nodes,its not coming up. My health check is
>>> on "/solr/rules/admin/system". I tried forcing a leader election but it
>>> dint help.
>>> so when i run the following commands. Why is it trying to proxy when
>> those
>>> nodes are down. Am i missing something?
>> 
>> 
>> 
>>> java.net.UnknownHostException:
>>> 
>> search-rules-solr-v1-2.search-rules-solr-v1.search-digital.svc.cluster.local:
>> 
>> It is trying to proxy because it's SolrCloud.  SolrCloud has an internal
>> load balancer that spreads queries across multiple replicas when
>> possible.  Your cluster must be aware of multiple servers where the
>> "rules" collection can be queried.
>> 
>> The underlying problem behind this error message is that the following
>> hostname is being looked up, and it doesn't exist:
>> 
>> 
>> search-rules-solr-v1-2.search-rules-solr-v1.search-digital.svc.cluster.local
>> 
>> This hostname is most likely coming from /etc/hosts on one of your
>> systems when that system starts Solr and it registers with the cluster,
>> and that /etc/hosts file is the ONLY place that the hostname exists, so
>> when SolrCloud tries to forward the request to that server, it is failing.
>> 
>> Thanks,
>> Shawn
>> 


Re: Solr cloud backup/restore not working

2020-06-16 Thread Jörn Franke
Have you looked in the Solr logfiles?

> Am 16.06.2020 um 05:46 schrieb yaswanth kumar :
> 
> Can anyone here help on the posted question pls??
> 
>> On Fri, Jun 12, 2020 at 10:38 AM yaswanth kumar 
>> wrote:
>> 
>> Using solr 8.2.0 and setup a cloud with 2 nodes. (2 replica's for each
>> collection)
>> Enabled basic authentication and gave all access to the admin user
>> 
>> Now trying to use solr cloud backup/restore API, backup is working great,
>> but when trying to invoke restore API its throwing the below error
>> 
>> {
>>  "responseHeader":{
>>"status":500,
>>"QTime":349},
>>  "Operation restore caused
>> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> ADDREPLICA failed to create replica",
>>  "exception":{
>>"msg":"ADDREPLICA failed to create replica",
>>"rspCode":500},
>>  "error":{
>>"metadata":[
>>  "error-class","org.apache.solr.common.SolrException",
>>  "root-error-class","org.apache.solr.common.SolrException"],
>>"msg":"ADDREPLICA failed to create replica",
>>"trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to
>> create replica\n\tat
>> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:280)\n\tat
>> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:252)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:820)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:786)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:546)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:505)\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
>> org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)\n\tat
>> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)\n\tat
>> org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
>> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>> 

Re: How to determine why solr stops running?

2020-06-15 Thread Jörn Franke
What is the service definition of Solr in Red Hat?


> Am 15.06.2020 um 19:46 schrieb Ryan W :
> 
> It happened again today.  Again, no other apparent problems on the server.
> Nothing else is stopping.  Nothing in the logs that strikes me as useful.
> I'm using Red Hat Linux 7.8 and Solr 7.7.2.
> 
> Solr is stopping a couple times per week and I don't know how to determine
> why.
> 
>> On Sun, Jun 14, 2020 at 9:41 AM Ryan W  wrote:
>> 
>> Thank you.  I pasted those settings at the end of my /etc/default/
>> solr.in.sh just now and restarted solr.  I will see if that fixes it.
>> Previously, I had no settings at all in solr.in.sh except for SOLR_PORT.
>> 
>> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood 
>> wrote:
>> 
>>> 1. You have a tiny heap. 536 Megabytes is not enough.
>>> 2. I stopped using the CMS GC years ago.
>>> 
>>> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
>>> still on Java 8, but will be upgrading soon.
>>> 
>>> SOLR_HEAP=8g
>>> # Use G1 GC  -- wunder 2017-01-23
>>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>>> GC_TUNE=" \
>>> -XX:+UseG1GC \
>>> -XX:+ParallelRefProcEnabled \
>>> -XX:G1HeapRegionSize=8m \
>>> -XX:MaxGCPauseMillis=200 \
>>> -XX:+UseLargePages \
>>> -XX:+AggressiveOpts \
>>> "
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On Jun 11, 2020, at 10:52 AM, Ryan W  wrote:
 
 On Wed, Jun 10, 2020 at 8:35 PM Hup Chen  wrote:
 
> I will check "dmesg" first, to find out any hardware error message.
> 
 
 Here is what I see toward the end of the output from dmesg:
 
 [1521232.781785] [118857]48 118857   108785  677 201
 901 0 httpd
 [1521232.781787] [118860]48 118860   108785  710 201
 881 0 httpd
 [1521232.781788] [118862]48 118862   113063 5256 210
 725 0 httpd
 [1521232.781790] [118864]48 118864   114085 6634 212
 703 0 httpd
 [1521232.781791] [118871]48 118871   13968732323 262
 620 0 httpd
 [1521232.781793] [118873]48 118873   108785  821 201
 792 0 httpd
 [1521232.781795] [118879]48 118879   14026332719 263
 621 0 httpd
 [1521232.781796] [118903]48 118903   108785  812 201
 771 0 httpd
 [1521232.781798] [118905]48 118905   113575 5606 211
 660 0 httpd
 [1521232.781800] [118906]48 118906   113563 5694 211
 626 0 httpd
 [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or
 sacrifice child
 [1521232.782908] Killed process 117529 (httpd), UID 48,
>>> total-vm:675824kB,
 anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
 
 Is this a relevant "Out of memory" message?  Does this suggest an OOM
 situation is the culprit?
 
 When I grep in the solr logs for oom, I see some entries like this...
 
 ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
 -XX:CMSInitiatingOccupancyFraction=50
>>> -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
 -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
 -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
 -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
 -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
 -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
 -XX:-OmitStackTraceInFastThrow
 -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
>>> /opt/solr/server/logs
 -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
 -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
 -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
 -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
 -XX:+UseParNewGC
 
 Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh".
>>> But I
 think this is just a setting that indicates what to do in case of an
>>> OOM.
 And if I look in that oom_solr.sh file, I see it would write an entry
>>> to a
 solr_oom_kill log. And there is no such log in the logs directory.
 
 Many thanks.
 
 
 
 
> Then use some system admin tools to monitor that server,
> for instance, top, vmstat, lsof, iostat ... or simply install some nice
> free monitoring tool into this system, like monit, monitorix, nagios.
> Good luck!
> 
> 
> From: Ryan W 
> Sent: Thursday, June 11, 2020 2:13 AM
> To: 

Re: Script to check if solr is running

2020-06-08 Thread Jörn Franke
Use the solution described by Walter. This allows you to restart automatically in case of failure and is also cleaner than defining a cron job. Otherwise this would be another dependency one needs to keep in mind - meaning that if there is an issue and someone does not know the system, that person has to look in different places, which is never good.
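As an illustration, assuming the suggestion is to run Solr as a systemd service with automatic restart (unit name, user, ports and install paths below are placeholders, not part of any specific setup):

    [Unit]
    Description=Apache Solr
    After=network.target

    [Service]
    Type=forking
    User=solr
    ExecStart=/opt/solr/bin/solr start
    ExecStop=/opt/solr/bin/solr stop
    # the bin/solr script writes solr-<port>.pid into SOLR_PID_DIR (bin/ by default)
    PIDFile=/opt/solr/bin/solr-8983.pid
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target

With Restart=on-failure, systemd restarts Solr whenever the process dies, so no separate cron-based check is needed.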

> Am 04.06.2020 um 18:36 schrieb Ryan W :
> 
> Does anyone have a script that checks if solr is running and then starts it
> if it isn't running?  Occasionally my solr stops running even if there has
> been no Apache restart.  I haven't been able to determine the root cause,
> so the next best thing might be to check every 15 minutes or so if it's
> running and run it if it has stopped.
> 
> Thanks.


Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Jörn Franke
You have to write an external application that creates multiple threads, parses the PDFs and indexes them in Solr. Ideally you parse the PDFs once, store the resulting text on some file system and then index it. The reason is that if you upgrade across two major versions of Solr you might need to reindex again; then you save time because you don’t need to parse the PDFs again.
It can also be useful in case you are not yet sure about the final schema and need to index several times with different schemas.

You can also use Apache ManifoldCF.
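A rough sketch of such an external loader, assuming Apache Tika and SolrJ are on the classpath and Java 11 is used; the Solr URL, folder and field names are placeholders:

    import java.nio.file.*;
    import java.util.concurrent.*;
    import java.util.stream.Stream;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class PdfLoader {
      public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // ConcurrentUpdateSolrClient batches documents and sends them in background threads
        try (ConcurrentUpdateSolrClient solr = new ConcurrentUpdateSolrClient.Builder(
                "http://localhost:8983/solr/docs")
                .withQueueSize(100).withThreadCount(4).build()) {
          ExecutorService pool = Executors.newFixedThreadPool(8);   // parsing threads
          try (Stream<Path> files = Files.walk(Paths.get("/data/pdfs"))) {
            files.filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                 .forEach(p -> pool.submit(() -> {
                   try {
                     String text = tika.parseToString(p.toFile());    // parse the PDF once
                     Files.writeString(Paths.get(p + ".txt"), text);  // keep the text for later re-indexing
                     SolrInputDocument doc = new SolrInputDocument();
                     doc.addField("id", p.toString());
                     doc.addField("content_txt", text);
                     solr.add(doc);
                   } catch (Exception e) {
                     e.printStackTrace();                             // real code needs proper error handling
                   }
                 }));
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
          solr.commit();
        }
      }
    }

Storing the extracted text next to the PDFs (or on an object store) means a later reindex only has to re-read the text files, not re-parse the PDFs.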



> Am 07.06.2020 um 19:19 schrieb Fiz N :
> 
> Hello SOLR Experts,
> 
> I am working on a POC to Index millions of PDF documents present in
> Multiple Folder in fileshare.
> 
> Could you please let me the best practices and step to implement it.
> 
> Thanks
> Fiz Nadiyal.


Re: Solr takes time to warm up core with huge data

2020-06-05 Thread Jörn Franke
I think DIH is the wrong solution for this. If you do an external custom load you will probably be much faster.

You have too much JVM memory from my point of view. Reduce it to eight GB or similar.

It seems you are just exporting data, so you are better off working with the export handler.
Add docValues to the fields for this. It looks like you have no text field to be searched but only simple fields (string, date etc.).

You should not use the normal handler to return many results at once. If you cannot use the export handler then use cursors:

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html#using-cursors

Both work to sort large result sets without consuming the whole memory
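A minimal SolrJ sketch of cursor-based deep paging (the Solr URL, collection, filter and field names are placeholders; the sort must include the uniqueKey field):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorDump {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
          SolrQuery q = new SolrQuery("*:*");
          q.addFilterQuery("PARENT_DOC_ID:100");        // example filter, adapt as needed
          q.setRows(1000);
          q.setSort("id", SolrQuery.ORDER.asc);         // uniqueKey must be part of the sort
          String cursor = CursorMarkParams.CURSOR_MARK_START;
          while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);
            rsp.getResults().forEach(doc -> {
              // process the document, e.g. write it to a file
            });
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) {
              break;                                    // cursor did not move: all results consumed
            }
            cursor = next;
          }
        }
      }
    }

Because only one page is held in memory at a time, this works for large result sets without the heap pressure caused by huge rows values.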

> Am 05.06.2020 um 08:18 schrieb Srinivas Kashyap 
> :
> 
> Thanks Shawn,
> 
> The filter queries are not complex. Below are the filter queries I’m running 
> for the corresponding schema entry:
> 
> q=*:*=PARENT_DOC_ID:100=MODIFY_TS:[1970-01-01T00:00:00Z TO 
> *]=PHY_KEY2:"HQ012206"=PHY_KEY1:"JACK"=1000=MODIFY_TS 
> desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 
> asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 
> asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
> 
> This was the original query. Since there were lot of sorting fields, we 
> decided to not do on the solr side, instead fetch the query response and do 
> the sorting outside solr. This eliminated the need of more JVM memory which 
> was allocated. Every time we ran this query, solr would crash exceeding the 
> JVM memory. Now we are only running filter queries.
> 
> And regarding the filter cache, it is in default setup: (we are using default 
> solrconfig.xml, and we have only added the request handler for DIH)
> 
>  size="512"
> initialSize="512"
> autowarmCount="0"/>
> 
> Now that you’re aware of the size and numbers, can you please let me know 
> what values/size that I need to increase? Is there an advantage of moving 
> this single core to solr cloud? If yes, can you let us know, how many 
> shards/replica do we require for this core considering we allow it to grow as 
> users transact. The updates to this core is not thru DIH delta import rather, 
> we are using SolrJ to push the changes.
> 
> 
>   type="string"  indexed="true"  stored="true"
> omitTermFreqAndPositions="true" />
> type="date"indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>   type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
>  type="string"  indexed="true"  
> stored="true"omitTermFreqAndPositions="true" />
> 
> 
> Thanks,
> Srinivas
> 
> 
> 
>> On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
>> We are on solr 8.4.1 and In standalone server mode. We have a core with 
>> 497,767,038 Records indexed. It took around 32Hours to load data through DIH.
>> 
>> The disk occupancy is shown below:
>> 
>> 82G /var/solr/data//data/index
>> 
>> When I restarted solr instance and went to this core to query on solr admin 
>> GUI, it is hanging and is showing "Connection to Solr lost. Please check the 
>> Solr instance". But when I go back to dashboard, instance is up and I'm able 
>> to query other cores.
>> 
>> Also, querying on this core is eating up JVM memory allocated(24GB)/(32GB 
>> RAM). A query(*:*) with filterqueries is overshooting the memory with OOM.
> 
> You're going to want to have a lot more than 8GB available memory for
> disk caching with an 82GB index. That's a performance thing... with so
> little caching memory, Solr will be slow, but functional. That aspect
> of your setup will NOT lead to out of memory.
> 
> If you are experiencing Java "OutOfMemoryError" exceptions, you will
> need to figure out what resource is running out. It might be heap
> memory, but it also might 

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
You need to separate keystore and truststore. 

I would leave the stores in their original format and provide the type in 
solr.in.sh

There is no need to convert them to JKS, PKCS12 is perfectly supported 
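A minimal solr.in.sh sketch along those lines (paths, passwords and store names are placeholders; the variable names are the ones documented on the enabling-ssl page of the ref guide):

    SOLR_SSL_ENABLED=true
    SOLR_SSL_KEY_STORE=/opt/solr/etc/solr-keystore.p12
    SOLR_SSL_KEY_STORE_PASSWORD=changeit
    SOLR_SSL_KEY_STORE_TYPE=PKCS12
    SOLR_SSL_TRUST_STORE=/opt/solr/etc/solr-truststore.p12
    SOLR_SSL_TRUST_STORE_PASSWORD=changeit
    SOLR_SSL_TRUST_STORE_TYPE=PKCS12
    SOLR_SSL_NEED_CLIENT_AUTH=false
    SOLR_SSL_WANT_CLIENT_AUTH=false

Keeping the stores in PKCS12 only requires setting the *_TYPE variables accordingly; no conversion to JKS is necessary.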

> Am 04.06.2020 um 06:48 schrieb yaswanth kumar :
> 
> Hi Franke,
> 
> I suspect its because of the certificate encryption ?? But will wait for
> you to confirm the same. We are trying to generate a certs with RSA 2048
> and finally combining them to a single JKS and that's what we are referring
> as a keystore and truststore, let me know if it doesn't work or if there is
> a standard procedure to do this certs.
> 
> Thanks,
> 
>> On Wed, Jun 3, 2020 at 8:25 AM yaswanth kumar  wrote:
>> 
>> thanks Franke,
>> 
>> I now made the use of the default jetty-ssl.xml that comes with the solr
>> package, but the issue is still happening when I try to push data to a
>> non-leader node.
>> 
>> Do you still think if its something to do with the configurations ??
>> 
>> Thanks,
>> 
>>> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke  wrote:
>>> 
>>> Why in the jetty-ssl.xml?
>>> 
>>> Should this not be configured in the solr.in.sh?
>>> 
>>>> Am 03.06.2020 um 00:38 schrieb yaswanth kumar :
>>>> 
>>>> Thanks Franke, but yes for all these questions I did configured it
>>>> properly, I made sure to include
>>>> 
>>>> >>> default="JKS"/>
>>>> >>> default="JKS"/>
>>>> in the jetty-ssl.xml along with the path keystore and truststore.
>>>> 
>>>> Also I have made sure that trusstore exists on all nodes and also I am
>>>> using the same file for both keystore and truststore as below
>>>> >>> default="./etc/solr-keystore.jks"/>
>>>> >>> name="solr.jetty.keystore.password" default=""/>
>>>> >>> default="./etc/solr-keystore.jks"/>
>>>> >>> name="solr.jetty.truststore.password" default=""/>
>>>> 
>>>> also urlScheme for ZK is set to https
>>>> 
>>>> 
>>>> Also the main error that I posted is the one that I am seeing as a
>>> return
>>>> response where as the below one is what I see from solr logs
>>>> 
>>>> 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
>>>> r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
>>>> null:org.apache.solr.update.processor.Distr$
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>>>>   at
>>>> 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>>>>   at
>>>> or

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
If the keystore and/or truststore is encrypted you need to provide the password in solr.in.sh.

> Am 04.06.2020 um 18:38 schrieb yaswanth kumar :
> 
> I haven't done any changes on jetty xml , I am just using what it comes
> with the solr package. just doing it in solr.in.sh but I am still seeing
> the same issue.
> 
> Thanks,
> 
>> On Thu, Jun 4, 2020 at 12:23 PM Jörn Franke  wrote:
>> 
>> I think you should not do it in the Jetty xml
>> Follow the official reference guide.
>> It should be in solr.in.sh
>> 
>> https://lucene.apache.org/solr/guide/8_4/enabling-ssl.html
>> 
>> 
>> 
>> 
>>>> Am 04.06.2020 um 06:48 schrieb yaswanth kumar :
>>> 
>>> Hi Franke,
>>> 
>>> I suspect its because of the certificate encryption ?? But will wait for
>>> you to confirm the same. We are trying to generate a certs with RSA 2048
>>> and finally combining them to a single JKS and that's what we are
>> referring
>>> as a keystore and truststore, let me know if it doesn't work or if there
>> is
>>> a standard procedure to do this certs.
>>> 
>>> Thanks,
>>> 
>>>> On Wed, Jun 3, 2020 at 8:25 AM yaswanth kumar 
>> wrote:
>>>> 
>>>> thanks Franke,
>>>> 
>>>> I now made the use of the default jetty-ssl.xml that comes with the solr
>>>> package, but the issue is still happening when I try to push data to a
>>>> non-leader node.
>>>> 
>>>> Do you still think if its something to do with the configurations ??
>>>> 
>>>> Thanks,
>>>> 
>>>>> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke 
>> wrote:
>>>>> 
>>>>> Why in the jetty-ssl.xml?
>>>>> 
>>>>> Should this not be configured in the solr.in.sh?
>>>>> 
>>>>>> Am 03.06.2020 um 00:38 schrieb yaswanth kumar >> :
>>>>>> 
>>>>>> Thanks Franke, but yes for all these questions I did configured it
>>>>>> properly, I made sure to include
>>>>>> 
>>>>>> >>>>> default="JKS"/>
>>>>>> >>>>> default="JKS"/>
>>>>>> in the jetty-ssl.xml along with the path keystore and truststore.
>>>>>> 
>>>>>> Also I have made sure that trusstore exists on all nodes and also I am
>>>>>> using the same file for both keystore and truststore as below
>>>>>> >>>>> default="./etc/solr-keystore.jks"/>
>>>>>> >>>>> name="solr.jetty.keystore.password" default=""/>
>>>>>> >>>>> default="./etc/solr-keystore.jks"/>
>>>>>> >>>>> name="solr.jetty.truststore.password" default=""/>
>>>>>> 
>>>>>> also urlScheme for ZK is set to https
>>>>>> 
>>>>>> 
>>>>>> Also the main error that I posted is the one that I am seeing as a
>>>>> return
>>>>>> response where as the below one is what I see from solr logs
>>>>>> 
>>>>>> 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
>>>>>> r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
>>>>>> null:org.apache.solr.update.processor.Distr$
>>>>>>  at
>>>>>> 
>>>>> 
>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
>>>>>>  at
>>>>>> 
>>>>> 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
>>>>>>  at
>>>>>> 
>>>>> 
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>>>>>>  at
>>>>>> 
>>>>> 
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>>>  at
>>>>>> 
>>>>> 
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>>>  at
>>>>>> 
>>>>> 
>> org

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
I think you should not do it in the Jetty xml
Follow the official reference guide.
It should be in solr.in.sh

https://lucene.apache.org/solr/guide/8_4/enabling-ssl.html




> Am 04.06.2020 um 06:48 schrieb yaswanth kumar :
> 
> Hi Franke,
> 
> I suspect its because of the certificate encryption ?? But will wait for
> you to confirm the same. We are trying to generate a certs with RSA 2048
> and finally combining them to a single JKS and that's what we are referring
> as a keystore and truststore, let me know if it doesn't work or if there is
> a standard procedure to do this certs.
> 
> Thanks,
> 
>> On Wed, Jun 3, 2020 at 8:25 AM yaswanth kumar  wrote:
>> 
>> thanks Franke,
>> 
>> I now made the use of the default jetty-ssl.xml that comes with the solr
>> package, but the issue is still happening when I try to push data to a
>> non-leader node.
>> 
>> Do you still think if its something to do with the configurations ??
>> 
>> Thanks,
>> 
>>> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke  wrote:
>>> 
>>> Why in the jetty-ssl.xml?
>>> 
>>> Should this not be configured in the solr.in.sh?
>>> 
>>>> Am 03.06.2020 um 00:38 schrieb yaswanth kumar :
>>>> 
>>>> Thanks Franke, but yes for all these questions I did configured it
>>>> properly, I made sure to include
>>>> 
>>>> >>> default="JKS"/>
>>>> >>> default="JKS"/>
>>>> in the jetty-ssl.xml along with the path keystore and truststore.
>>>> 
>>>> Also I have made sure that trusstore exists on all nodes and also I am
>>>> using the same file for both keystore and truststore as below
>>>> >>> default="./etc/solr-keystore.jks"/>
>>>> >>> name="solr.jetty.keystore.password" default=""/>
>>>> >>> default="./etc/solr-keystore.jks"/>
>>>> >>> name="solr.jetty.truststore.password" default=""/>
>>>> 
>>>> also urlScheme for ZK is set to https
>>>> 
>>>> 
>>>> Also the main error that I posted is the one that I am seeing as a
>>> return
>>>> response where as the below one is what I see from solr logs
>>>> 
>>>> 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
>>>> r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
>>>> null:org.apache.solr.update.processor.Distr$
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>>>>   at
>>>> 
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>>>>   at
>>>> 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>>>>   at
>>>> org.apache.solr.servlet.HttpS

Re: Insert documents to a particular shard

2020-06-02 Thread Jörn Franke
Hint: you can easily try out streaming expressions in the admin UI

> Am 03.06.2020 um 07:32 schrieb Jörn Franke :
> 
> 
> You are trying to achieve data locality by having parents and children in the 
> same shard?
> Does document routing address it?
> 
> https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing
> 
> 
> On a side node, I don’t know your complete use case, but have you explored 
> streaming expressions for graph traversal? 
> 
> https://lucene.apache.org/solr/guide/8_5/graph-traversal.html
> 
> 
>>> Am 03.06.2020 um 00:37 schrieb sambasivarao giddaluri 
>>> :
>>> 
>> Hi All,
>> I am running solr in cloud mode in local with 2 shards and 2 replica on
>> port 8983 and 7574 and figuring out how to insert document in to a
>> particular shard , I read about implicit and composite route but i don't
>> think it will work for my usecase.
>> 
>> shard1 :  http://192.168.0.112:8983/family_shard1_replica_n1
>> http://192.168.0.112:7574/family_shard1_replica_n2
>> 
>> shard2:   http://192.168.0.112:8983/family_shard2_replica_n3
>> http://192.168.0.112:7574/family_shard2_replica_n4
>> 
>> we have documents with parent child relationship but flatten out with 2
>> levels down and reference to each other.
>> family schema documents:
>> {
>> "Id":"1"
>> "document_type":"parent"
>> "name":"John"
>> }
>> {
>> "Id":"2"
>> "document_type":"child"
>> "parentId":"1"
>> "name":"Rodney"
>> }
>> {
>> "Id":"3"
>> "document_type":"child"
>> "parentId":"1"
>> "name":"George"
>> }
>> {
>> "Id":"4"
>> "document_type":"grandchild"
>> "parentId":"1",
>> "childIdId":"2"
>> "name":"David"
>> }
>> we have complex queries to get data based on graph query parser and  as
>> graph query parser does not work on solr cloud with multiple shards. I was
>> trying to develop a logic like whenever a document gets inserted or updated
>> make sure it gets saved in the same shard where the parent doc is stored ,
>> in that way graph query works because all the family information will be in
>> the same shard.
>> Approach :
>> 1) If a new child/grandchild is getting inserted then get the parent doc
>> shard details and add the shard details to the document in a field
>> ex:parentshard and save the doc in the shard.
>> 2) If document is getting updated check if the parentshard field exists if
>> so update the doc to same shard.
>> But all these check conditions will increase response time , currently our
>> development is done in cloud mode with single shard and  using solrj to
>> save the data.
>> Also i an unable to figure out the query to update  doc to a particular
>> shard.
>> 
>> Any suggestions will help .
>> 
>> Thanks in Advance
>> sam


Re: Insert documents to a particular shard

2020-06-02 Thread Jörn Franke
Are you trying to achieve data locality by having parents and children in the same shard?
Does document routing address it?

https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing
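
For illustration, with the default compositeId router you can prefix the document id with a shared routing key so that a whole family lands on the same shard (the ids below are only an example based on the documents from the question):

    { "id": "1!1", "document_type": "parent",     "name": "John" }
    { "id": "1!2", "document_type": "child",      "parentId": "1", "name": "Rodney" }
    { "id": "1!4", "document_type": "grandchild", "parentId": "1", "name": "David" }

All ids share the prefix "1!", so they hash to the same shard; queries can optionally pass _route_=1! to target only that shard.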


On a side note, I don’t know your complete use case, but have you explored streaming expressions for graph traversal?

https://lucene.apache.org/solr/guide/8_5/graph-traversal.html


> Am 03.06.2020 um 00:37 schrieb sambasivarao giddaluri 
> :
> 
> Hi All,
> I am running solr in cloud mode in local with 2 shards and 2 replica on
> port 8983 and 7574 and figuring out how to insert document in to a
> particular shard , I read about implicit and composite route but i don't
> think it will work for my usecase.
> 
> shard1 :  http://192.168.0.112:8983/family_shard1_replica_n1
> http://192.168.0.112:7574/family_shard1_replica_n2
> 
> shard2:   http://192.168.0.112:8983/family_shard2_replica_n3
> http://192.168.0.112:7574/family_shard2_replica_n4
> 
> we have documents with parent child relationship but flatten out with 2
> levels down and reference to each other.
> family schema documents:
> {
> "Id":"1"
> "document_type":"parent"
> "name":"John"
> }
> {
> "Id":"2"
> "document_type":"child"
> "parentId":"1"
> "name":"Rodney"
> }
> {
> "Id":"3"
> "document_type":"child"
> "parentId":"1"
> "name":"George"
> }
> {
> "Id":"4"
> "document_type":"grandchild"
> "parentId":"1",
> "childIdId":"2"
> "name":"David"
> }
> we have complex queries to get data based on graph query parser and  as
> graph query parser does not work on solr cloud with multiple shards. I was
> trying to develop a logic like whenever a document gets inserted or updated
> make sure it gets saved in the same shard where the parent doc is stored ,
> in that way graph query works because all the family information will be in
> the same shard.
> Approach :
> 1) If a new child/grandchild is getting inserted then get the parent doc
> shard details and add the shard details to the document in a field
> ex:parentshard and save the doc in the shard.
> 2) If document is getting updated check if the parentshard field exists if
> so update the doc to same shard.
> But all these check conditions will increase response time , currently our
> development is done in cloud mode with single shard and  using solrj to
> save the data.
> Also i an unable to figure out the query to update  doc to a particular
> shard.
> 
> Any suggestions will help .
> 
> Thanks in Advance
> sam


Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-02 Thread Jörn Franke
Why in the jetty-ssl.xml?

Should this not be configured in the solr.in.sh?

> Am 03.06.2020 um 00:38 schrieb yaswanth kumar :
> 
> Thanks Franke, but yes for all these questions I did configured it
> properly, I made sure to include
> 
>  default="JKS"/>
>   default="JKS"/>
> in the jetty-ssl.xml along with the path keystore and truststore.
> 
> Also I have made sure that trusstore exists on all nodes and also I am
> using the same file for both keystore and truststore as below
>  default="./etc/solr-keystore.jks"/>
>   name="solr.jetty.keystore.password" default=""/>
>   default="./etc/solr-keystore.jks"/>
>   name="solr.jetty.truststore.password" default=""/>
> 
> also urlScheme for ZK is set to https
> 
> 
> Also the main error that I posted is the one that I am seeing as a return
> response where as the below one is what I see from solr logs
> 
> 2020-06-02 22:32:04.472 ERROR (qtp984876512-93) [c:default s:shard1
> r:core_node3 x:default_shard1_replica_n1] o.a.s.s.HttpSolrCall
> null:org.apache.solr.update.processor.Distr$
>at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)
>at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> 
> 
> One strange observation is that when I hit update api on the leader node
> its working without any error, and now immediately if I hit non-leader its
> working fine (only once or twice), but if I keep on trying to hit this node
> again and again its then throwing the above error and once the error
> started happening , its consistent again.
> 
> Please let me know if you need more information or if I am missing
> something else
> 
> Thanks,
> 
>> On Tue, Jun 2, 2020 at 4:59 PM Jörn Franke  wrote:
>> 
>> Have you looked in the logfiles?
>> 
>> Keystore Type correctly defined  on all nodes?
>> 
>> Have you configured the truststore on all nodes correctly?
>> 
>> Have you set clusterprop urlScheme to htttps in ZK?
>> 
>> 
>> https://lucene.apache.org/solr/guide/7_5/enabling-ssl.html#configure-zookeeper
>> 
>> 
>> 
>>>> Am 02.06.2020 um 18:57 schrieb yaswanth kumar :
>>> 
>>> team, can someone help me on the above topic?
>>> 
>>>> On Mon, Jun 1, 2020 at 10:00 PM yaswanth kumar 

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-02 Thread Jörn Franke
Have you looked in the logfiles?

Keystore type correctly defined on all nodes?

Have you configured the truststore on all nodes correctly?

Have you set the clusterprop urlScheme to https in ZK?

https://lucene.apache.org/solr/guide/7_5/enabling-ssl.html#configure-zookeeper
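For reference, the cluster property can be set with the zkcli script shipped with Solr (the ZooKeeper host/port below is a placeholder):

    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https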



> Am 02.06.2020 um 18:57 schrieb yaswanth kumar :
> 
> team, can someone help me on the above topic?
> 
>> On Mon, Jun 1, 2020 at 10:00 PM yaswanth kumar 
>> wrote:
>> 
>> Trying to setup solr 8.4.1 + open jdk 11 on centos , enabled the ssl
>> configurations with all the certs in place, but the issue what I am seeing
>> is when trying to hit /update api on non-leader solr node , its throwing an
>> error
>> 
>> configured 2 solr nodes with 1 zookeeper.
>> 
>> metadata":[
>> 
>> "error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
>> 
>> "root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
>> "msg":"Async exception during distributed update:
>> javax.crypto.BadPaddingException: RSA private key operation failed",
>> "trace":"org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>> Async exception during distributed update:
>> javax.crypto.BadPaddingException: RSA private key operation failed\n\tat
>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)\n\tat
>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)\n\tat
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)\n\tat
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)\n\tat
>> org.apache.solr.update.processor.UpdateRequestProcessor.finish
>> 
>> Strangely this is happening when we try to hit a non-leader node, hitting
>> leader node its working fine without any issue and getting the data indexed.
>> 
>> Not able to track down where the exact issue is happening.
>> 
>> Thanks,
>> 
>> --
>> Thanks & Regards,
>> Yaswanth Kumar Konathala.
>> yaswanth...@gmail.com
>> 
> 
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com


Re: SOLR cache tuning

2020-06-01 Thread Jörn Franke
You should not have other processes/containers running on the same node. They potentially screw up your OS cache, making things slow: e.g. if the other processes also read files, they can evict Solr's data from the OS cache, and then the OS cache needs to be filled again.

What performance do you have now and what performance do you expect?

For queries that fetch all documents I would try to export all the data daily and offer it as a simple HTTPS download or on an object store. Maybe when you process the documents for indexing you can already put them on an object store or similar - so you don’t need Solr at all to export all of the documents.


See also Walter's message.

> Am 01.06.2020 um 17:29 schrieb Tarun Jain :
> 
> Hi,I have a SOLR installation in master-slave configuration. The slave is 
> used only for reads and master for writes.
> I wanted to know if there is anything I can do to improve the performance of 
> the readonly Slave instance?
> I am running SOLR 8.5 and Java 14. The JVM has 24GB of ram allocated. Server 
> has 256 GB of RAM with about 50gb free (rest being used by other services on 
> the server)The index is 15gb in size with about 2 million documents.
> We do a lot of queries where documents are fetched using filter queries and a 
> few times all 2 million documents are read.My initial idea to speed up SOLR 
> is that given the amount of memory available, SOLR should be able to keep the 
> entire index on the heap (I know OS will also cache the disk blocks) 
> My solrconfig has the following:
>  20  class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0" /> 
>  autowarmCount="0" />  initialSize="8192" autowarmCount="0" />  class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" 
> regenerator="solr.NoOpRegenerator" /> 
> true 
> 20 
> 200 
> false 
> 2 
> I have modified the documentCache size to 8192 from 512 but it has not helped 
> much. 
> I know this question has probably been asked a few times and I have read 
> everything I could find out about SOLR cache tuning. I am looking for some 
> more ideas.
> 
> Any ideas?
> Tarun Jain-=-


Re: Solr Admin UI with restricted authorization

2020-05-29 Thread Jörn Franke
You can restrict the admin UI by limiting access using the authorization plugin. I would, though, not give end users access to the admin UI. A good practice is to create your own web application running on a dedicated server that manages all the authentication/authorization and provides a UI for the end users. Then you can manage access in a dedicated enterprise directory and also implement single sign-on etc.


You cannot manage users in the admin UI and you will see it gets very 
cumbersome with many different users.

Also from a usability point of view the admin UI does not make sense for end 
users.
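
A minimal security.json sketch of the authorization plugin mentioned above, restricting updates to an admin role while leaving reads open to any authenticated user; the credentials line is the stock example hash for the user "solr" with password "SolrRocks" from the ref guide and must be replaced, and the roles/permissions are only an example:

    {
      "authentication": {
        "blockUnknown": true,
        "class": "solr.BasicAuthPlugin",
        "credentials": {
          "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
        }
      },
      "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [
          { "name": "update",        "role": "admin" },
          { "name": "security-edit", "role": "admin" },
          { "name": "read",          "role": "*" }
        ],
        "user-role": { "solr": "admin" }
      }
    }

With such a setup, users without the admin role can still open the admin UI and query, but insert/delete/edit requests are rejected.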



> Am 29.05.2020 um 22:08 schrieb Yang, Ya Lan :
> 
> 
> Dear Solr support team,
> 
> Hope you are doing well in this difficult time! I have 2 quick questions:
> Are we able to restrict Admin UI's functions? For example, blocking the 
> update (insert/delete/edit) functions on the Admin UI. My colleagues would 
> like to open this Admin UI to users, but I think this is too risky because 
> accessing Admin UI can edit the data and even bring down the Solr instance. 
> Please correct me if I am wrong.
> I implemented the Authentication and Authorization Plugins to our solr 
> instance using your super helpful guide. I wonder is there any UI we can 
> manage/configure the users' accounts and permissions instead of using the 
> commands? 
> Thank you!
> 
> Ya-Lan Yang
> Software Engineer
> Cline Center for Advanced Social Research
> University of Illinois Urbana-Champaign
> 217.244.6641
> yly...@illinois.edu
> 
> Cline Center for Advanced Social Research
> 2001 S. First St. Suite 207, MC-689
> Champaign, IL  61820-7478
> www.clinecenter.illinois.edu
> 
> 


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
Hmm maybe more insights on the use case would be useful. It looks like what 
David says about metadata could make sense in your scenario depending on the 
requirements...



> Am 24.05.2020 um 13:20 schrieb Serkan KAZANCI :
> 
> Thanks Jörn for the answer,
> 
> I use post tool to index html documents, so the html tags are stripped when 
> indexed and stored. The remaining text is mapped to the field content by 
> default. 
> 
> hl.fragsize=0 works perfect for the indexed document, but I can only display 
> highlighted text-only version of html document because the html tags are 
> stripped.
> 
> So is it possible to index and store the html document without stripping the 
> html tags, so that when the document is displayed with hl.fragsize=0 
> parameter, it is displayed as original html document?
> 
> Or
> 
> Is it possible to give a whole html document as a parameter to the Unified 
> highlighter so that output is also a highlighted html document?
> 
> Or 
> 
> Do you have a better idea to highlight the keywords of the whole html 
> document? 
> 
> Thanks,
> 
> Serkan
> 
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com] 
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
> 
> hl.fragsize=0
> 
> https://lucene.apache.org/solr/guide/8_5/highlighting.html
> 
> 
> 
>> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
>> 
>> Hi,
>> 
>> 
>> 
>> I use solr to search over a million html documents, when a document is
>> searched and displayed, I want to highlight the keywords that are used to
>> find and access the document.
>> 
>> 
>> 
>> Unified highlighter is fast, accurate and supports different languages but
>> only highlights passages with given parameters.
>> 
>> 
>> 
>> How can I highlight a whole html document using Unified highlighter? I have
>> written a php code but it cannot do the complex word stemming functions.
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> 
>> Serkan
>> 
> 


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
hl.fragsize=0

https://lucene.apache.org/solr/guide/8_5/highlighting.html
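
For example (collection and field names are placeholders), a request combining the unified highlighter with fragsize 0 returns the whole stored field value with the hits highlighted:

    http://localhost:8983/solr/mycollection/select?q=content:keyword&hl=true&hl.method=unified&hl.fl=content&hl.fragsize=0

With the unified highlighter, hl.bs.type=WHOLE can be used as an alternative way to treat the whole field value as a single passage.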



> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> 
> Hi,
> 
> 
> 
> I use solr to search over a million html documents, when a document is
> searched and displayed, I want to highlight the keywords that are used to
> find and access the document.
> 
> 
> 
> Unified highlighter is fast, accurate and supports different languages but
> only highlights passages with given parameters.
> 
> 
> 
> How can I highlight a whole html document using Unified highlighter? I have
> written a php code but it cannot do the complex word stemming functions.
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> Serkan
> 


Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version

2020-05-21 Thread Jörn Franke
Did you create the solrconfig.xml for the collection from scratch after upgrading and reindexing? Was it based on the latest template?
If not, please try this. Maybe you also need to increase the corresponding caches in the config.
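
For example, the filter and query result caches in solrconfig.xml could be sized up along these lines (the sizes are only placeholders; Solr 8.x uses CaffeineCache by default):

    <filterCache class="solr.CaffeineCache" size="1024" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.CaffeineCache" size="1024" initialSize="512" autowarmCount="128"/>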

What happens if you re-execute the query?

Are there other processes/containers running on the same VM?

How much heap and how much total memory do you have? You should only have a minor fraction of the memory as heap and most of it „free“ (this means it is used for file caches).



> Am 21.05.2020 um 15:24 schrieb vishal patel :
> 
> Any one is looking this issue?
> I got same issue.
> 
> Regards,
> Vishal Patel
> 
> 
> 
> 
> From: jay harkhani 
> Sent: Wednesday, May 20, 2020 7:39 PM
> To: solr-user@lucene.apache.org 
> Subject: Query takes more time in Solr 8.5.1 compare to 6.1.0 version
> 
> Hello,
> 
> Currently I upgrade Solr version from 6.1.0 to 8.5.1 and come across one 
> issue. Query which have more ids (around 3000) and grouping is applied takes 
> more time to execute. In Solr 6.1.0 it takes 677ms and in Solr 8.5.1 it takes 
> 26090ms. While take reading we have same solr schema and same no. of records 
> in both solr version.
> 
> Please refer below details for query, logs and thead dump (generate from Solr 
> Admin while execute query).
> 
> Query : https://drive.google.com/file/d/1bavCqwHfJxoKHFzdOEt-mSG8N0fCHE-w/view
> 
> Logs and Thread dump stack trace
> Solr 8.5.1 : 
> https://drive.google.com/file/d/149IgaMdLomTjkngKHrwd80OSEa1eJbBF/view
> Solr 6.1.0 : 
> https://drive.google.com/file/d/13v1u__fM8nHfyvA0Mnj30IhdffW6xhwQ/view
> 
> To analyse further more we found that if we remove grouping field or we 
> reduce no. of ids from query it execute fast. Is anything change in 8.5.1 
> version compare to 6.1.0 as in 6.1.0 even for large no. Ids along with 
> grouping it works faster?
> 
> Can someone please help to isolate this issue.
> 
> Regards,
> Jay Harkhani.


Re: Index using CSV file

2020-04-18 Thread Jörn Franke
Please also do not forget to create a schema in the Solr collection so that the data is correctly indexed and you get fast and correct query results.
I usually recommend reading one of the many Solr books out there to get started. This will save you a lot of time.

> Am 18.04.2020 um 17:43 schrieb Jörn Franke :
> 
> 
> This you don’t do via the Solr UI. You have many choices amongst others 
> 1) write a client yourself that parses the csv and post it to the standard 
> Update handler 
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
> 2) use the Solr post tool 
> https://lucene.apache.org/solr/guide/8_4/post-tool.html
> 3) use a http client command line tool (eg curl) and post the data to the CSV 
> update handler: 
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
> 
> However, it would be useful to know what you exactly trying to achieve and 
> give more background on the project, what programming languages and 
> frameworks you (plan to) use etc to give you a more guided answer 
> 
>>> Am 18.04.2020 um 17:13 schrieb Shravan Kumar Bolla 
>>> :
>>> 
>> Hi,
>> 
>> I'm trying to import data from CSV file from Solr UI and I am completely new 
>> to Solr. Please provide the necessary configurations to achieve this.
>> 
>> 


Re: Index using CSV file

2020-04-18 Thread Jörn Franke
This is not something you do via the Solr UI. You have many choices, among others:
1) write a client yourself that parses the CSV and posts it to the standard update handler: 
https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
2) use the Solr post tool 
https://lucene.apache.org/solr/guide/8_4/post-tool.html
3) use an HTTP client command line tool (e.g. curl) and post the data to the CSV update handler: 
https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
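
For option 3, a minimal sketch could look like this (host, collection name and file are placeholders; adjust the separator parameter if your CSV is not comma-separated):

curl 'http://localhost:8983/solr/yourcollection/update?commit=true' -H 'Content-type: text/csv' --data-binary @yourfile.csv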

However, it would be useful to know what exactly you are trying to achieve and to give more background on the project (what programming languages and frameworks you plan to use, etc.) so you can get a more guided answer. 

> Am 18.04.2020 um 17:13 schrieb Shravan Kumar Bolla 
> :
> 
> Hi,
> 
> I'm trying to import data from CSV file from Solr UI and I am completely new 
> to Solr. Please provide the necessary configurations to achieve this.
> 
> 


Re: Indexing data from multiple data sources

2020-04-17 Thread Jörn Franke
What does your solr.log say? Any errors?

> Am 17.04.2020 um 20:22 schrieb RaviKiran Moola 
> :
> 
> 
> Hi,
> 
> Greetings!!!
> 
> We are working on indexing data from multiple data sources (MySQL & MSSQL) in 
> a single collection. We specified data source details like connection details 
> along with the required fields for both data sources in a single data config 
> file, along with specified required fields details in the managed schema and 
> here fetching the same columns from both data sources by specifying the 
> common “unique key”.
> 
> Unable to index the data from the data sources using solr.
> 
> Here I’m attaching the data config file and screenshot.
> 
> Data config file:
>  
>   url="jdbc:mysql://182.74.133.92:3306/ra_dev" user="devuser" 
> password="Welcome_009" batchSize="1" />  
>   driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
> url="jdbc:sqlserver://182.74.133.92;databasename=BB_SOLR" user="matuser" 
> password="MatDev:07"/>   
>   
>   
>
> 
>   
>  
>   
>   
>   
>  
> 
> 
> 
> Thanks & Regards,
> Ravikiran Moola
> +91-9494924492
> 


Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Jörn Franke
The problem with Solr related to using TLS with ZK is the following:
* ZK 3.5.5 seems to only support TLS certificate authentication together with TLS. Solr supports only digest and Kerberos authentication. However, I have to check in the ZK JIRAs whether this has changed in higher ZK versions.
* Quorum TLS will work, but again only with TLS authentication (not Kerberos etc.). Again, one needs to check in the ZK JIRAs which versions are affected and whether this has been confirmed.

If you don't use TLS, then ZK in any version (potentially also ZK 3.6 - to be tested) should work. If you need TLS, check whether your authentication methods are supported.

> Am 15.04.2020 um 10:19 schrieb Bram Van Dam :
> 
> On 09/04/2020 16:03, Bram Van Dam wrote:
>> Thanks, Erick. I'll give it a go this weekend and see how it behaves.
>> I'll report back so there's a record of my attempts in case anyone else
>> ends up asking the same question.
> 
> Here's a quick update after non-exhaustive testing: Running SolrCloud
> 7.7.2 against ZK 3.5.7 seems to work. This is using the same Ensemble
> configuration as in 3.4, but with 4-letter-words now explicitly enabled.
> 
> ZK 3.5 allegedly makes it easier to use TLS throughout the ensemble, but
> I haven't tried that in conjunction with Solr yet. I'll give it a go if
> I can find the time.
> 
> - Bram


Re: solrQuery exception : org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://solrIP:8983/solr

2020-04-11 Thread Jörn Franke
Is it a self-signed certificate? An enterprise CA? Then you need to add the certificates to Tomcat's truststore, because Java needs to validate them (to avoid man-in-the-middle attacks).

Curl may simply not be validating the certificate.
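
As a sketch (alias, paths and passwords are placeholders): import the Solr certificate into the truststore used by the Tomcat JVM, e.g.

keytool -importcert -alias solr-ssl -file solr-cert.pem -keystore /path/to/truststore.jks -storepass changeit

and then point the Tomcat JVM at that truststore, e.g. with -Djavax.net.ssl.trustStore=/path/to/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit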

> Am 11.04.2020 um 13:42 schrieb six23 <23sixconsult...@gmail.com>:
> 
> Hi,
> I have enabled Solr to use SSL communication in my environment. I could get
> https://localhost:8983/solr and curl it from the tomcat8 server with no
> issue. But
> I'm getting this is error in the log tomcat log " solrQuery exception :
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: https://solrIP:8983/solr;
> Am i missing a configuration or step somewhere to let tomcat to communicate
> with solr ?
> Can tomcat access SSL Solr directly ? or do i have to set up more
> configuration on tomcat ?
> 
> 
> 
> Any advice would be appreciated.
> Thanks
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: entity in DIH for partial update?

2020-04-10 Thread Jörn Franke
You could use atomic updates in DIH. However, there is a bug in current (and potentially also older) Solr versions where this leaks a searcher (which means the index data grows indefinitely until you restart the server). 
You can also export from the database to JSON Lines and post it to the JSON update handler together with the atomic update processor.
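
A rough, untested sketch of that approach (the processor and field parameters are just examples, adjust to your schema; export.jsonl contains one JSON document per line):

curl 'http://localhost:8983/solr/yourcollection/update/json/docs?processor=atomic&atomic.my_field=set&commit=true' -H 'Content-type: application/json' --data-binary @export.jsonl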

> Am 10.04.2020 um 16:02 schrieb matthew sporleder :
> 
> I have an field I would like to add to my schema which is stored in a
> different database from my primary data.  Can I use a separate entity
> in my DIH to update a single field of my documents?
> 
> Thanks,
> Matt


Re: Proper way to manage managed-schema file

2020-04-06 Thread Jörn Franke
You can use the Solr REST services to do all those operations. 

https://lucene.apache.org/solr/guide/8_3/schema-api.html

Normally in a production environment you don't use the UI but do all changes in a controlled, automated fashion using the REST APIs.
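
For example, adding a field can be scripted like this (core name, field name and type are just placeholders):

curl -X POST -H 'Content-type: application/json' --data-binary '{"add-field":{"name":"my_new_field","type":"text_general","stored":true}}' http://localhost:8983/solr/yourcore/schema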



> Am 06.04.2020 um 20:11 schrieb TK Solr :
> 
> I am using Solr 8.3.1 in non-SolrCloud mode (what should I call this mode?) 
> and modifying managed-schema.
> 
> I noticed that Solr does override this file wiping out all my comments and 
> rearranging the order. I noticed there is a "DO NOT EDIT" comment. Then, what 
> is the proper/expected way to manage this file? Admin UI can add fields but 
> cannot edit existing one or add new field types. Do I keep a script of many 
> schema calls? (Then how do I reset the default to the initial one, which 
> would be needed before re-re-playing the schema calls.)
> 
> TK
> 
> 


Re: Unable to delete zookeeper queue

2020-04-01 Thread Jörn Franke
Maybe you need to increase jute.maxbuffer on both the ZK server and the ZK client to execute this. You may get better help on the ZK mailing list. 
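
As a sketch (the value is just an example and must be set consistently on both sides): jute.maxbuffer is a JVM system property, e.g.

# ZooKeeper side (zookeeper-env.sh, picked up by zkServer.sh / zkCli.sh)
export JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=50000000"
# Solr side (solr.in.sh)
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=50000000"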

> Am 01.04.2020 um 14:53 schrieb Kommu, Vinodh K. :
> 
> Hi,
> 
> Does anyone know a working solution to delete zookeeper queue data? Please 
> help!!
> 
> 
> Regards,
> Vinodh
> 
> From: Kommu, Vinodh K.
> Sent: Tuesday, March 31, 2020 12:55 PM
> To: solr-user@lucene.apache.org
> Subject: Unable to delete zookeeper queue
> 
> All,
> 
> For some reason one of our zookeeper queue was filled with way bigger number 
> so when I tried to delete queues with "rmr /overseer/queue" command, it's 
> throwing - Packet len19029055 is out of range! exception. Later I have 
> increased maxbuffer size to 50M and tried the same rmr command but still 
> getting following error. Since the queues are not getting deleted, solr 
> cluster status is not healthy which apparently marks all replicas as down 
> even nodes are up & running. Looks like it is a known bug with zookeeper. Is 
> there a way to delete zookeeper queues forcefully?
> 
> Error snippet:
> 
> Exception in thread "main" 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /overseer/queue
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>at org.apache.zookeeper.ZKUtil.listSubTreeBFS(ZKUtil.java:114)
>at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:49)
>at 
> org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
>at 
> org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588)
>at 
> org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360)
>at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
>at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
> 
> 
> Queue output:
> 
> get /overseer/queue
> null
> cZxid = 0x30017
> ctime = Wed Feb 06 22:29:14 EST 2019
> mZxid = 0x30017
> mtime = Wed Feb 06 22:29:14 EST 2019
> pZxid = 0x341869
> cversion = 1420613
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 1119355
> 
> 
> Regards,
> Vinodh
> 
> DTCC DISCLAIMER: This email and any files transmitted with it are 
> confidential and intended solely for the use of the individual or entity to 
> whom they are addressed. If you have received this email in error, please 
> notify us immediately and delete the email and any attachments from your 
> system. The recipient should check this email and any attachments for the 
> presence of viruses. The company accepts no liability for any damage caused 
> by any virus transmitted by this email.


Re: How do *you* restrict access to Solr?

2020-03-16 Thread Jörn Franke
Solr should not be accessible to end users directly - only through a dedicated application in between.

Then, in an enterprise setting, it is mostly Kerberos authentication and HTTPS (do not forget about ZooKeeper when using SolrCloud; there you can also have Kerberos authentication and, in recent versions, also SSL). It is not that difficult to configure if you work with people in your enterprise who know a bit about those topics.

In a cloud-based scenario JWT tokens can make sense. 

Do not do security by obscurity. You owe it to the users who potentially also have private data in Solr.

> Am 16.03.2020 um 15:44 schrieb Ryan W :
> 
> How do you, personally, do it?  Do you use IPTables?  Basic Authentication
> Plugin? Something else?
> 
> I'm asking in part so I'l have something to search for.  I don't know where
> I should begin, so I figured I would ask how others do it.
> 
> I haven't been able to find anything that works, so if you can tell me what
> works for you, I can at least narrow it down a bit and do some Google
> searches.  Do I need to learn Solr's plugin system?  Am I starting in the
> right place if I follow this document:
> https://lucene.apache.org/solr/guide/7_0/rule-based-authorization-plugin.html#rule-based-authorization-plugin
> 
> Initially, the above document seems far too comprehensive for my needs.  I
> just want to block access to the Solr admin UI, and the list of predefined
> permissions in that document don't seem to be relevant.  Also, it seems
> unlikely this plugin system is necessary just to control access to the
> admin UI... or maybe it necessary?
> 
> In any case, what is your approach?
> 
> I'm using version 7.7.2 of Solr.
> 
> Thanks!


Re: JSP support not configured in Solr 8.3.0

2020-03-11 Thread Jörn Franke
It is better to have a dedicated frontend for Solr on a dedicated server. For security reasons, Solr is becoming more and more locked down, and it is also discouraged to put your own web applications on it. 

> Am 11.03.2020 um 10:03 schrieb vishal patel :
> 
> 
> I put the JSP in \server\solr-webapp\webapp\test.jsp. When I hit in browser 
> using http://172.178.170.175:7999/solr/test.jsp, I got HTTP ERROR 500 Problem 
> accessing /solr/mem.jsp. Reason:JSP support not configured.
> 
> Is any jar required in \server\lib\? OR any configuration for that?
> 
> Regards,
> Vishal
> 
> Sent from Outlook


Re: Atomic Update and Optimization and segments

2020-03-10 Thread Jörn Franke
How do you do the atomic updates? I discovered a bug when doing them via DIH or the ScriptUpdateProcessor (only that one! The atomic update processor itself is fine) that leads to infinite index growth when doing atomic updates.

> Am 10.03.2020 um 13:28 schrieb Kayak28 :
> 
> Hello, Community:
> 
> Currently, my index grows up to almost 1T, and I would like to minimize my
> index.
> 
> I know I have a few fields that are not used or rarely used, so I want to
> delete them.
> I have tried to delete these fields by the atomic update, sending the
> following JSON for example.
> {
> "id":"1",
> "text":{"set": null }
> }
> As a result, it generated a new segment, so segment count increased +1,
> index size became bigger, and mac doc is increased +1.
> I have expected this result, but my goal is to minimize my index, so I sent
> an expungeDeleted request and optimize request, expecting to reduce the
> index size and segment count.
> But, the segment did not reduce, the index size did not change, and max doc
> did not change.
> 
> As of Solr 8.4.1, is there any way to minimize segment count, index size
> and max doc after atomic-updating?
> 
> Sincerely,
> Kaya Ota


Re: multivalue faceting term optimization

2020-03-09 Thread Jörn Franke
hll stands for HyperLogLog: https://en.wikipedia.org/wiki/HyperLogLog

You will not get the exact distinct count, but a distinct count very close to the real number. It is very fast and memory-efficient for large numbers of distinct values.
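
For example, with the JSON Facet API you can get an approximate distinct count per facet bucket roughly like this (collection and field names are placeholders):

curl 'http://localhost:8983/solr/yourcollection/query' -d 'q=*:*&json.facet={cats:{type:terms,field:cat_s,facet:{distinct_authors:"hll(author_s)"}}}'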

> Am 10.03.2020 um 00:25 schrieb Nicolas Paris :
> 
> 
> Erick Erickson  writes:
>> Have you looked at the HyperLogLog stuff? Here’s at least a mention of
>> it: https://lucene.apache.org/solr/guide/8_4/the-stats-component.html
> 
> I am used to hll in the context of count distinct values -- cardinality.
> 
> I have to admit that section 
> https://lucene.apache.org/solr/guide/8_4/the-stats-component.html#local-parameters-with-the-stats-component
> is about hll and facets, but I am not sure that really meet the use
> case. I also have to admit that part is quite cryptic to me.
> 
> 
> -- 
> nicolas paris


Re: Problem with Solr 7.7.2 after OOM

2020-03-05 Thread Jörn Franke
Just keep in mind that the total memory should be much more than the heap to leverage Solr's file caches. If you have an 8 GB heap, probably at least 16 GB of total memory should be available on the machine.

> Am 05.03.2020 um 16:58 schrieb Walter Underwood :
> 
> 
>> 
>> On Mar 5, 2020, at 4:29 AM, Bunde Torsten  wrote:
>> 
>> -Xms512m -Xmx512m 
> 
> Your heap is too small. Set this to -Xms8g -Xmx8g
> 
> In solr.in.sh, that looks like this:
> 
> SOLR_HEAP=8g
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 


Re: Upgrading from 6.5.0 to 8.4.1

2020-02-27 Thread Jörn Franke
You did a reload and not a reindex? 
Probably the best approach is to delete the collection completely, create it anew, and then index.

> Am 27.02.2020 um 14:02 schrieb Pavel Polivka :
> 
> Hello,
> 
> I am doing upgrade of SolrCloud cluster from 6.5.0 to 8.4.1.
> 
> My process is:
> 
> Upgrade to 7.7.2.
> Reconfigure my solrconfig to have  
> 7.7.2, reload collections.
> Delete all docs in all collections and index them again.
> 
> Upgrade to 8.4.1.
> This step always fails for me with message in logs saying that the index 
> format is probably created by 6.x version.
> My thinking is that the reindex I do in 7.7.2 is not enough.
> 
> Is there a way to check in what format my index is?
> 
> Any ideas if I am doing anything wrong?
> 
> 
> Thanks.
> 
> 
> Pavel Polivka


Re: Solr 8.2.0 - Schema issue

2020-02-26 Thread Jörn Franke
Not sure I understood the whole scenario. However, did you try to reload (not reindex) the collection? 
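
If not, something along these lines should do it (collection name is a placeholder):

curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=yourcollection'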

> Am 26.02.2020 um 15:02 schrieb Joe Obernberger :
> 
> Hi All - I have several solr collections all with the same schema.  If I add 
> a field to the schema and index it into the collection on which I added the 
> field, it works fine.  However, if I try to add a document to a different 
> solr collection that contains the new field (and is using the same schema), I 
> get an error that the field doesn't exist.
> 
> If I restart the cluster, this problem goes away and I can add a document 
> with the new field to any solr collection that has the schema.  Any 
> work-arounds that don't involve a restart?
> 
> Thank you!
> 
> -Joe Obernberger
> 


Re: Solr Upgrade socketTimeout issue in 8.2

2020-02-19 Thread Jörn Franke
Yes, you need to reindex.
Update solrconfig.xml and the schema to leverage the latest features of the new version (some data types are now more optimal, others are deprecated).

Create a new collection based on the newest config.
Use your regular indexing process to move documents to the new collection.

Check if the new collection works and has the expected performance.

Delete the old collection.

Test this in a test environment first, not in production!
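
For the "create new collection" step, the Collections API call can be scripted, e.g. (names and sizing are placeholders only):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=yourcollection_v2&numShards=2&replicationFactor=2&collection.configName=yourconfig_v2'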

> Am 19.02.2020 um 09:46 schrieb Yogesh Chaudhari 
> :
> 
> Hi,
> 
> Could you please share me the steps to upgrade SOlr?
> 
> Now I am using Solr cloud 5.2.1 on production and wanted to upgrade to 
> SOlr7.7.2. I am doing this in 2 spteps SOlr 5.2.1 to SOlr 6.6.6 then SOlr 
> 7.7.2.
> 
> I have upgraded to Solr but getting issue for indexing of old documents.  I 
> am badly stuck get get old document in migrated solr version.
> 
> Should I do the re-indexing? If yes can you please share the way to 
> re-indexing?
> 
> Can you please provide your inputs on this? 
> 
> Thanks,
> 
> Yogesh Chaudhari
> 
> -Original Message-
> From: kshitij tyagi  
> Sent: Wednesday, February 19, 2020 12:52 PM
> To: solr-user@lucene.apache.org
> Subject: Solr Upgrade socketTimeout issue in 8.2
> 
> Hi,
> 
> We have upgraded our solrCloud from version 6.6.0 to 8.2.0
> 
> At the time of indexing intermittently we are observing socketTimeout 
> exception when using Collection apis. example when we try reloading one of 
> the collection using CloudSolrClient class.
> 
> Is there any performance degradation in Solrcloud collection apis?
> 
> logs:
> 
> IOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
> stream exception
> 
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x2663e756d775747, likely client has closed socket
> 
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> 
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> 
> at java.lang.Thread.run(Unknown Source)
> 
> 
> logs:
> 
> 
> Exception has occured in job switch: Timeout occurred while waiting response 
> from server at: http://prod-t-8.net:8983/solr
> 
> 
> Is anyone facing same type of issue? in Solrcloud? Any suggestions to solve??
> 
> 
> 
> Regards,
> 
> kshitij


Re: Solr Relevancy problem

2020-02-19 Thread Jörn Franke
The best way to address this problem is to collect queries and examples of why they are wrong and to document this. This is especially important when working with another vendor. Otherwise no one can give you proper help.

> Am 19.02.2020 um 09:17 schrieb Pradeep Tambade 
> :
> 
> Hello,
> 
> We have configured solr site search engine into our website(www.croma.com). 
> We are facing various issues like not showing relevant results, free text 
> search not showing  result, phrase keywords shows irrelevant results etc
> 
> Please help us resolve these issues also help to connect with solr tech 
> support team or any other company who is expert in managing solr search.
> 
> 
> Thanks & Regards,
> Pradeep Tambade |  Assistant Manager - Business Analyst
> Infiniti Retail Ltd. | A Tata Enterprise
> Mobile: +91 9664536737
> Email: pradeep.tamb...@croma.com | Shop at: www.croma.com
> 
> 
>  Have e-waste but don't know what to do about it?
> 
>  *   Call us at 7207-666-000 & we pick up your junk at your doorstep
>  *   We ensure responsible disposal
>  *   And also plant an actual tree in your name for the e-waste you dispose
> 
> [https://www.croma.com/_ui/responsive/common/images/Greatplacetowork_.jpg]
> 
> Registered Office: Unit No. 701 & 702, 7th Floor, Kaledonia, Sahar Road, 
> Andheri East, Mumbai - 400069, India


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Jörn Franke
You are focusing too much on the solution. If you described the business case in more detail, without including the solution itself, more people could help.

E.g. it is not clear why you have a scoring model and why this can address the business needs. 

> Am 18.02.2020 um 01:50 schrieb Ashwin Ramesh :
> 
> Hi,
> 
> We are in the process of applying a scoring model to our search results. In
> particular, we would like to add scores for documents per query and user
> context.
> 
> For example, we want to have a score from 500 to 1 for the top 500
> documents for the query “dog” for users who speak US English.
> 
> We believe it becomes infeasible to store these scores in Solr because we
> want to update the scores regularly, and the number of scores increases
> rapidly with increased user attributes.
> 
> One solution we explored was to store these scores in a secondary data
> store, and use this at Solr query time with a boost function such as:
> 
> `bf=mul(termfreq(id,’ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> mul(termfreq(id,'ID-500'),1)`
> 
> We have over a hundred thousand documents in one Solr collection, and about
> fifty million in another Solr collection. We have some queries for which
> roughly 80% of the results match, although this is an edge case. We wanted
> to know the worst case performance, so we tested with such a query. For
> both of these collections we found the a message similar to the following
> in the Solr cloud logs (tested on a laptop):
> 
> Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> 
> We then tried using the following boost, which seemed simpler:
> 
> `boost=if(query($qq), 10, 1)=id:(ID-1 OR ID-2 OR … OR ID-500)`
> 
> We then saw the following in the Solr cloud logs:
> 
> `The request took too long to iterate over terms.`
> 
> All responses above took over 5000 milliseconds to return.
> 
> We are considering Solr’s re-ranker, but I don’t know how we would use this
> without pushing all the query-context-document scores to Solr.
> 
> 
> The alternative solution that we are currently considering involves
> invoking multiple solr queries.
> 
> This means we would make a request to solr to fetch the top N results (id,
> score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB=bar, limit=N.
> 
> Another request would be made using a filter query with a set of doc ids
> that we know are high value for the user’s query. E.g. q=*:*,
> fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> 
> We would then do a reranking phase in our service layer.
> 
> Do you have any suggestions for known patterns of how we can store and
> retrieve scores per user context and query?
> 
> Regards,
> Ash & Spirit.
> 
> -- 


Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-13 Thread Jörn Franke
I also had issues with this factory when creating atomic updates inside it. 
They worked, but searchers were never closed; new ones were opened and stayed open, with all the issues related to that. Maybe one needs to look into this in more detail. However - it is a script in the end, so it could also always be a bug in your script.

> Am 13.02.2020 um 19:21 schrieb Haschart, Robert J (rh9ec) 
> :
> 
> Erick,
> 
> Sorry I didn't see this response, for some reason solr-users has stopped 
> being delivered to my mail box.
> 
> The script that adds a field based on the value(s) in some other field 
> doesn't add a large number of different fields to the index.
> The pool_f field only has a total of 11 different values, and except for some 
> rare cases, any given record only has a single value in that field, and those 
> rare cases will have two values.
> 
> I had previously implemented the same functionality by making a small jar 
> file containing a customized version of TemplateUpdateProcessorFactory  that 
> could generate different field names, but since I needed another bit of 
> functionality in the Update Chain I decided to port the original 
> functionality to a script  since the "development overhead" of adding a 
> script is less than adding in multiple additional custom 
> UpdateProcessorFactory objects.
> 
> I had been running solr with the the memory flag  "-m 8G" and it had been 
> running fine with that setting for a least a year, even recently when the 
> customized java version of TemplateUpdateProcessorFactory was being invoked 
> to perform essentially the same processing step.
> 
> However when I tried to accomplish the same thing via javascript through 
> StatelessScriptUpdateProcessorFactory  and start a re-index it would die 
> after about 1 million records being indexed.And since it is merely my 
> (massive) development machine, during the re-index there are close to zero 
> searches coming through while the re-index is happening.
> 
> I've managed to work around the issue on my dev box by upping the the memory 
> for solr to 16G, and haven't had an OOM since doing that, but I'm hesitant to 
> push these changes to our AWS-hosted production instances since running out 
> of memory and terminating there would be more of an issue.
> 
> -Bob
> 
> 
> 
> 
>From: Erick Erickson 
>Subject: Re: StatelessScriptUpdateProcessorFactory causing OOM errors?
>Date: Thu, 6 Feb 2020 09:18:41 -0500
> 
>How many fields do you wind up having? It looks on a quick glance like
>it depends on the values of fields. While I’ve seen Solr/Lucene handle
>indexes with over 1M different fields, it’s unsatisfactory.
> 
>What I’m wondering is if you are adding a zillion different fields to your
>docs as time passes and eventually the structures that are needed to
>maintain your field mappings are blowing up memory.
> 
>If that’s that case, you need an alternative design because your
>performance will be unacceptable.
> 
>May be off base, if so we can dig further.
> 
>Best,
>Erick
> 
>> On Feb 5, 2020, at 3:41 PM, Haschart, Robert J (rh9ec)  
>> wrote:
>> 
>> StatelessScriptUpdateProcessorFactory
> 
> 
> 
> 


Re: Support Tesseract in Apache Solr

2020-02-11 Thread Jörn Franke
Honestly, I would not run Tesseract on the same server as Solr. It takes a lot of resources and may negatively impact Solr. Just write a small program using Tika + Tesseract that runs on a different server / container and posts the results to Solr.

About your question: probably Tika (a dependency of Solr) figured it out, or, depending on your format, PDFBox (used by Tika).
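
A very rough sketch of such a pipeline on a separate extraction host (jar version, ids and field names are placeholders; jq is used here only for JSON escaping):

# extract plain text; Tika calls Tesseract for standalone images if tesseract is on the PATH
java -jar tika-app-1.24.jar --text scan.png > scan.txt
# post the extracted text to Solr as a simple JSON document
curl 'http://solrhost:8983/solr/yourcollection/update?commit=true' -H 'Content-type: application/json' --data-binary "[{\"id\":\"scan-1\",\"content_txt\":$(jq -Rs . < scan.txt)}]"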

> Am 11.02.2020 um 19:15 schrieb Karan Jain :
> 
> Hi All,
> 
> The Solr version 7.6.0 is running on my local machine. I have installed
> Tesseract through following steps:-
> yum install tesseract echo export PATH=$PATH:/usr/share/tesseract
>>> ~/.bash_profile
> echo export TESSDATA_PREFIX=/usr/share/tesseract >>~/.bash_profile
> 
> Now the deployed Solr is supporting tesseract. I searched TESSDATA_PREFIX
> in https://github.com/apache/lucene-solr and found no reference there. I
> could not understand How Solr came to know about the deployed tesseract.
> Please tell the specific java class in Solr if possible.
> 
> Thanks for your time,
> Best,
> Karan


Re: solr-injection

2020-02-11 Thread Jörn Franke
Do not have users accessing Solr directly.

Have your own secure web frontend / own APIs for it. This way you can control secure access.

Secure Solr with HTTPS and Kerberos. Give your web frontend only the access rights it needs and your admins only the access rights they need. Automate the deployment of configurations through the APIs. Secure ZooKeeper (if in cloud mode) with SSL and authentication (e.g. Kerberos).

Make sure that connections to those two are only allowed from the web frontend and admins (for the latter, have a dedicated jumphost from which connections are allowed). 


> Am 11.02.2020 um 10:55 schrieb Martin Frank Hansen (MHQ) :
> 
> Hi,
> 
> I was wondering how others are handling solr – injection in their solutions?
> 
> After reading this post: 
> https://www.waratek.com/apache-solr-injection-vulnerability-customer-alert/ I 
> can see how important it is to update to Solr-8.2 or higher.
> 
> Has anyone been successful in injecting unintended queries to Solr? I have 
> tried to delete the database from the front-end, using basic search strings 
> and Solr commands, but has yet not been successful (which is good). I think 
> there are many who knows much more about this than me, so would be nice to 
> hear from someone with more experience.
> 
> Which considerations do I need to look at in order to secure my Solr core? 
> Currently we have a security layer on top on Solr, but at the same time we do 
> not want to restrict the flexibility of the searches too much.
> 
> Best regards
> 
> Martin
> 
> 


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
OK, I created the collection from scratch based on the config.

Unfortunately, it does not improve. The index is just growing and growing, except when I stop Solr; then, during startup, the unnecessary index files are purged. Even with the previous config this did not happen in older Solr versions (for sure not in 8.2, maybe in 8.3, but for sure it happens in 8.4).

Reproduction is simple: just load documents into the index (even during the first load I observe a significant index size increase (4-fold) that is then reduced after restart).

I observe, though, that during metadata updates (= atomic updates) the index doubles in size (not anywhere near what is expected from the update) and then shrinks only slightly (a few megabytes, nothing compared to the full size that the index now has).

At the moment, it looks to me like it is due to the Solr version, because the config did not change (we have them all versioned, I checked). However, maybe I am overlooking something.

Furthermore, it seems that during segment merges old segments are not deleted until restart (but again, this is speculation).
I suspect not many have observed this, because the only ways it would be noticed are 1) they index a collection completely from scratch and see huge index file consumption, or 2) they update their collection a lot and hit a disk space limit (which in some cases may not happen soon).

I created a JIRA: https://issues.apache.org/jira/browse/SOLR-14202

Please let me know if I can test anything else.

On Tue, Jan 21, 2020 at 10:58 PM Jörn Franke  wrote:

> After testing the update?commit=true i now face an error: "Maximum lock
> count exceeded". strange this is the first time i see this in the lockfiles
> and when doing commit=true
> ava.lang.Error: Maximum lock count exceeded
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:535)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:494)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1368)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:882)
> at
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
> at
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:124)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:102)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1079)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:220)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
r.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
at
org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
at
org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
at java.base/java.lang.Thread.run(Thread.java:834)

On Tue, Jan 21, 2020 at 10:51 PM Jörn Franke  wrote:

> The only weird thing is I see that for instance I have
> ${solr.autoCommit.maxTime:15000}  and similar entries.
> It looks like a template gone wrong, but this was not caused due to an
> internal development. It must have been come from a Solr version.
>
> On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke  wrote:
>
>> It is btw. a Linux system and autosoftcommit is set to -1. However,
>> indeed openSearcher is set to false. A commit is set to true after doing
>> all the updates, but the index is not shrinking. The files are not
>> disappearing during shutdown, but they disappear after starting up again.
>>
>> On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:
>>
>>> thanks for the answer I will look into it - it is a possible
>>> explanation.
>>>
>>> > Am 20.01.2020 um 14:30 schrieb Erick Erickson >> >:
>>> >
>>> > Jörn:
>>> >
>>> > The only thing I can think of that _might_ cause this (I’m not all
>>> that familiar with the code) is if your solrconfig settings never open a
>>> searcher. Either you need to be sure openSearcher is set to true in the
>>> autocommit section in solrconfig.xml or your autoSoftCommit is set to
>>> something other than -1. Real Time Get requires access to all segments and
>>> it takes a new searcher being opened to release them. Actually, a very
>>> quick test would be to submit 
>>> “http://host:port/solr/collection/update?commit=true”
>>> and see if the index shrinks as a result. You don’t need to change
>>> solrconfig.xml for that test.
>>> >
>>> > If you are opening a new searcher, this is very concerning. There
>>> shouldn’t be anything else you have to set to prevent the index from
>>> growing. Could you check one thing? Compare the directory listing of the
>>> data/index directory just before you shut down Solr and then just after.
>>> What I’m  interested in is whether some subset of files disappears when you
>>> shut down Solr. This assumes you’re running on a *nix system, if Windows
>>> you may have to start Solr again to see the difference.
>>> >
>>> > So if you open a searcher and still see the problem, I can try to
>>> reproduce it. Can you share your solrconfig file or at least the autocommit
>>> and cache portions?
>>> >
>>> > Best,
>>> > Erick
>>> >
>>> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke 
>>> wrote:
>>> >>
>>> >> From what is see it basically duplicates the index files, but does
>>> not delete the old ones.
>>> >> It uses caffeine cache.
>>> >>
>>> >> What I observe is that there is an exception when shutting down for
>>> the collection that is updated - timeout waiting for all directory ref
>>> counts to be released - gave up waiting on CacheDir.
>>> >>
>>> >>>> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>>> >>>
>>> >>> Sorry I missed a line - not tlog is growing but the /data/index
>>> folder is growing - until restart when it seems to be purged.
>>> >&

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
The only weird thing is that I see, for instance, entries like
${solr.autoCommit.maxTime:15000} and similar.
It looks like a template gone wrong, but this was not caused by an internal development. It must have come from a Solr version.

On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke  wrote:

> It is btw. a Linux system and autosoftcommit is set to -1. However, indeed
> openSearcher is set to false. A commit is set to true after doing all the
> updates, but the index is not shrinking. The files are not disappearing
> during shutdown, but they disappear after starting up again.
>
> On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:
>
>> thanks for the answer I will look into it - it is a possible explanation.
>>
>> > Am 20.01.2020 um 14:30 schrieb Erick Erickson > >:
>> >
>> > Jörn:
>> >
>> > The only thing I can think of that _might_ cause this (I’m not all that
>> familiar with the code) is if your solrconfig settings never open a
>> searcher. Either you need to be sure openSearcher is set to true in the
>> autocommit section in solrconfig.xml or your autoSoftCommit is set to
>> something other than -1. Real Time Get requires access to all segments and
>> it takes a new searcher being opened to release them. Actually, a very
>> quick test would be to submit 
>> “http://host:port/solr/collection/update?commit=true”
>> and see if the index shrinks as a result. You don’t need to change
>> solrconfig.xml for that test.
>> >
>> > If you are opening a new searcher, this is very concerning. There
>> shouldn’t be anything else you have to set to prevent the index from
>> growing. Could you check one thing? Compare the directory listing of the
>> data/index directory just before you shut down Solr and then just after.
>> What I’m  interested in is whether some subset of files disappears when you
>> shut down Solr. This assumes you’re running on a *nix system, if Windows
>> you may have to start Solr again to see the difference.
>> >
>> > So if you open a searcher and still see the problem, I can try to
>> reproduce it. Can you share your solrconfig file or at least the autocommit
>> and cache portions?
>> >
>> > Best,
>> > Erick
>> >
>> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> >>
>> >> From what is see it basically duplicates the index files, but does not
>> delete the old ones.
>> >> It uses caffeine cache.
>> >>
>> >> What I observe is that there is an exception when shutting down for
>> the collection that is updated - timeout waiting for all directory ref
>> counts to be released - gave up waiting on CacheDir.
>> >>
>> >>>> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>> >>>
>> >>> Sorry I missed a line - not tlog is growing but the /data/index
>> folder is growing - until restart when it seems to be purged.
>> >>>
>> >>>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I have a test system here with Solr 8.4 (but this is also
>> reproducible in older Solr versions), which has an index which is growing
>> and growing - until the SolrCloud instance is restarted - then it is
>> reduced tot the expected normal size.
>> >>>> The collection is configured to do auto commit after 15000 ms. I
>> expect the index grows comes due to the usage of atomic updates, but I
>> would expect that due to the auto commit this does not grow all the time.
>> >>>> After the atomic updates a commit is done in any case.
>> >>>>
>> >>>> I don’t see any error message in the log files, but the growth is
>> quiet significant and frequent restarts are not a solution of course.
>> >>>>
>> >>>> Maybe I am overlooking here a tiny configuration issue?
>> >>>>
>> >>>> Thank you.
>> >>>>
>> >>>>
>> >>>> Best regards
>> >
>>
>


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
It is, btw, a Linux system and autoSoftCommit is set to -1. However, indeed openSearcher is set to false. A commit (commit=true) is issued after doing all the updates, but the index is not shrinking. The files are not disappearing during shutdown, but they disappear after starting up again.

On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:

> thanks for the answer I will look into it - it is a possible explanation.
>
> > Am 20.01.2020 um 14:30 schrieb Erick Erickson :
> >
> > Jörn:
> >
> > The only thing I can think of that _might_ cause this (I’m not all that
> familiar with the code) is if your solrconfig settings never open a
> searcher. Either you need to be sure openSearcher is set to true in the
> autocommit section in solrconfig.xml or your autoSoftCommit is set to
> something other than -1. Real Time Get requires access to all segments and
> it takes a new searcher being opened to release them. Actually, a very
> quick test would be to submit 
> “http://host:port/solr/collection/update?commit=true”
> and see if the index shrinks as a result. You don’t need to change
> solrconfig.xml for that test.
> >
> > If you are opening a new searcher, this is very concerning. There
> shouldn’t be anything else you have to set to prevent the index from
> growing. Could you check one thing? Compare the directory listing of the
> data/index directory just before you shut down Solr and then just after.
> What I’m  interested in is whether some subset of files disappears when you
> shut down Solr. This assumes you’re running on a *nix system, if Windows
> you may have to start Solr again to see the difference.
> >
> > So if you open a searcher and still see the problem, I can try to
> reproduce it. Can you share your solrconfig file or at least the autocommit
> and cache portions?
> >
> > Best,
> > Erick
> >
> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
> >>
> >> From what is see it basically duplicates the index files, but does not
> delete the old ones.
> >> It uses caffeine cache.
> >>
> >> What I observe is that there is an exception when shutting down for the
> collection that is updated - timeout waiting for all directory ref counts
> to be released - gave up waiting on CacheDir.
> >>
> >>>> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
> >>>
> >>> Sorry I missed a line - not tlog is growing but the /data/index
> folder is growing - until restart when it seems to be purged.
> >>>
> >>>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
> >>>>
> >>>> Hi,
> >>>>
> >>>> I have a test system here with Solr 8.4 (but this is also
> reproducible in older Solr versions), which has an index which is growing
> and growing - until the SolrCloud instance is restarted - then it is
> reduced tot the expected normal size.
> >>>> The collection is configured to do auto commit after 15000 ms. I
> expect the index grows comes due to the usage of atomic updates, but I
> would expect that due to the auto commit this does not grow all the time.
> >>>> After the atomic updates a commit is done in any case.
> >>>>
> >>>> I don’t see any error message in the log files, but the growth is
> quiet significant and frequent restarts are not a solution of course.
> >>>>
> >>>> Maybe I am overlooking here a tiny configuration issue?
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> Best regards
> >
>


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
Thanks for the answer, I will look into it - it is a possible explanation. 

> Am 20.01.2020 um 14:30 schrieb Erick Erickson :
> 
> Jörn:
> 
> The only thing I can think of that _might_ cause this (I’m not all that 
> familiar with the code) is if your solrconfig settings never open a searcher. 
> Either you need to be sure openSearcher is set to true in the autocommit 
> section in solrconfig.xml or your autoSoftCommit is set to something other 
> than -1. Real Time Get requires access to all segments and it takes a new 
> searcher being opened to release them. Actually, a very quick test would be 
> to submit “http://host:port/solr/collection/update?commit=true” and see if 
> the index shrinks as a result. You don’t need to change solrconfig.xml for 
> that test.
> 
> If you are opening a new searcher, this is very concerning. There shouldn’t 
> be anything else you have to set to prevent the index from growing. Could you 
> check one thing? Compare the directory listing of the data/index directory 
> just before you shut down Solr and then just after. What I’m  interested in 
> is whether some subset of files disappears when you shut down Solr. This 
> assumes you’re running on a *nix system, if Windows you may have to start 
> Solr again to see the difference.
> 
> So if you open a searcher and still see the problem, I can try to reproduce 
> it. Can you share your solrconfig file or at least the autocommit and cache 
> portions? 
> 
> Best,
> Erick
> 
>> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> 
>> From what is see it basically duplicates the index files, but does not 
>> delete the old ones.
>> It uses caffeine cache.
>> 
>> What I observe is that there is an exception when shutting down for the 
>> collection that is updated - timeout waiting for all directory ref counts to 
>> be released - gave up waiting on CacheDir.
>> 
>>>> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>>> 
>>> Sorry I missed a line - not tlog is growing but the /data/index folder is 
>>> growing - until restart when it seems to be purged.
>>> 
>>>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>>>> 
>>>> Hi,
>>>> 
>>>> I have a test system here with Solr 8.4 (but this is also reproducible in 
>>>> older Solr versions), which has an index which is growing and growing - 
>>>> until the SolrCloud instance is restarted - then it is reduced tot the 
>>>> expected normal size. 
>>>> The collection is configured to do auto commit after 15000 ms. I expect 
>>>> the index grows comes due to the usage of atomic updates, but I would 
>>>> expect that due to the auto commit this does not grow all the time.
>>>> After the atomic updates a commit is done in any case.
>>>> 
>>>> I don’t see any error message in the log files, but the growth is quiet 
>>>> significant and frequent restarts are not a solution of course.
>>>> 
>>>> Maybe I am overlooking here a tiny configuration issue? 
>>>> 
>>>> Thank you.
>>>> 
>>>> 
>>>> Best regards
> 


Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
From what I see, it basically duplicates the index files but does not delete the old ones.
It uses the Caffeine cache.

What I observe is that there is an exception when shutting down, for the collection that is updated: timeout waiting for all directory ref counts to be released - gave up waiting on CacheDir.

> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
> 
> Sorry I missed a line - not tlog is growing but the /data/index folder is 
> growing - until restart when it seems to be purged.
> 
>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>> 
>> Hi,
>> 
>> I have a test system here with Solr 8.4 (but this is also reproducible in 
>> older Solr versions), which has an index which is growing and growing - 
>> until the SolrCloud instance is restarted - then it is reduced tot the 
>> expected normal size. 
>> The collection is configured to do auto commit after 15000 ms. I expect the 
>> index grows comes due to the usage of atomic updates, but I would expect 
>> that due to the auto commit this does not grow all the time.
>> After the atomic updates a commit is done in any case.
>> 
>> I don’t see any error message in the log files, but the growth is quiet 
>> significant and frequent restarts are not a solution of course.
>> 
>> Maybe I am overlooking here a tiny configuration issue? 
>> 
>> Thank you.
>> 
>> 
>> Best regards


Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Sorry, I missed a line - it is not the tlog that is growing but the /data/index folder - until restart, when it seems to be purged.

> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
> 
> Hi,
> 
> I have a test system here with Solr 8.4 (but this is also reproducible in 
> older Solr versions), which has an index which is growing and growing - until 
> the SolrCloud instance is restarted - then it is reduced tot the expected 
> normal size. 
> The collection is configured to do auto commit after 15000 ms. I expect the 
> index grows comes due to the usage of atomic updates, but I would expect that 
> due to the auto commit this does not grow all the time.
> After the atomic updates a commit is done in any case.
> 
> I don’t see any error message in the log files, but the growth is quiet 
> significant and frequent restarts are not a solution of course.
> 
> Maybe I am overlooking here a tiny configuration issue? 
> 
> Thank you.
> 
> 
> Best regards


Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Hi,

I have a test system here with Solr 8.4 (but this is also reproducible in older Solr versions), which has an index that is growing and growing - until the SolrCloud instance is restarted - then it is reduced to the expected normal size. 
The collection is configured to do an auto commit after 15000 ms. I expect the index growth comes from the usage of atomic updates, but I would expect that due to the auto commit it does not grow all the time. 
After the atomic updates a commit is done in any case.

I don't see any error message in the log files, but the growth is quite significant and frequent restarts are not a solution, of course.

Maybe I am overlooking here a tiny configuration issue? 

Thank you.


Best regards 

Re: Solr cloud production set up

2020-01-18 Thread Jörn Franke
I think you should do your own measurements. This is very document- and processing-specific.
You can run a test with a simple setup for, let's say, 1 million documents and interpolate from this. It could also be that your ETL is the bottleneck and not Solr.
At the same time you can simulate user queries using JMeter or similar.

> Am 18.01.2020 um 09:05 schrieb Rajdeep Sahoo :
> 
> Our Index size is huge and in master slave the full indexing time is almost
> 24 hrs.
>   In future the no of documents will increase.
> So,please some one recommend about the no of nodes and configuration like
> ram and cpu core for solr cloud.
> 
>> On Sat, 18 Jan, 2020, 8:05 AM Walter Underwood, 
>> wrote:
>> 
>> Why do you want to change to Solr Cloud? Master/slave is a great, stable
>> cluster architecture.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo 
>> wrote:
>>> 
>>> Please reply anyone
>>> 
>>> On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <
>> rajdeepsahoo2...@gmail.com>
>>> wrote:
>>> 
 Hi all,
 We are using solr cloud 7.7.1
 In a live production environment how many solr cloud server do we need,
 Currently ,we are using master slave set up with 16 slave server with
 solr 4.6.
 In solr cloud do we need to scale it up or 16 server will suffice the
 purpose.
 
 
>> 
>> 


Re: regarding Extracting text from Images

2020-01-17 Thread Jörn Franke
Have you checked this?

https://cwiki.apache.org/confluence/display/TIKA/TikaOCR

> Am 17.01.2020 um 10:54 schrieb Retro :
> 
> Hello, can you please advise me, how to configure Solr so that embedded Tika
> is able to use Tesseract to do the  ocr of images? I have installed the
> following software -
> SOLR  - 7.4.0
> Tesseract - 4.1.1-rc2-20-g01fb
> TIKA   - TIKA 1.18 
> Tesseract is installed in to the following directory:
> /usr/share/tesseract/4/tessdata/
> echo $TESSDATA_PREFIX - > /usr/share/tesseract/4/tessdata/
> tesseract -v
> tesseract 4.1.1-rc2-20-g01fb
> leptonica-1.76.0
> 
> Command “tesseract test.jpg  test.txt”  produces accurate txt file with
> OCRed content from test.jpg
> Current setup allows us to index attachments such like structured text files
> (txt, word, pdf, etc), but does not react in any way for attachments like
> png, jpg. Nor it works if uploaded directly to SOLR using its web interface.
> 
> Necessary modifications were made to the following files:
> solrconfig.xml; TesseractOCRConfig.properties; parsecontent.xml;
> PDFparser.properties.
> 
> Would appreciate if someone helped me with this configuration. 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-08 Thread Jörn Franke
I have to admit it was the cache. Sorry, I thought I had deleted it. Thanks for 
the efforts and testing! I will update the Jira.

> Am 07.01.2020 um 22:14 schrieb Jörn Franke :
> 
> 
> here you go:
> https://issues.apache.org/jira/browse/SOLR-14176
> 
> a detailed screenshot of the message can be made tomorrow, but it looks 
> pretty much to CSP to me - i have seen others with other applications using 
> CSP.
> 
>> On Tue, Jan 7, 2020 at 6:45 PM Kevin Risden  wrote:
>> So this is caused by SOLR-13982 [1] and specifically SOLR-13987 [2]. Can
>> you open a new Jira specifically for this? It would be great if you could
>> capture from Chrome Dev Tools (or Firefox) the error message around what
>> specifically CSP is complaining about.
>> 
>> The other thing to ensure is that you force refresh the UI to make sure
>> nothing is cached. Idk if that is in play here but doesn't hurt.
>> 
>> [1] https://issues.apache.org/jira/browse/SOLR-13982
>> [2] https://issues.apache.org/jira/browse/SOLR-13987
>> 
>> Kevin Risden
>> 
>> On Tue, Jan 7, 2020, 11:15 Jörn Franke  wrote:
>> 
>> > Dear all,
>> >
>> > I noted that in Solr Cloud 8.4.0 the graph is not shown due to
>> > Content-Security-Policy. Apparently it violates unsafe-eval.
>> > It is a minor UI thing, but should I create an issue to that one? Maybe it
>> > is rather easy to avoid in the source code of the admin page?
>> >
>> > Thank you.
>> >
>> > Best regards
>> >
>> >
>> >


Re: Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-07 Thread Jörn Franke
here you go:
https://issues.apache.org/jira/browse/SOLR-14176

A detailed screenshot of the message can be made tomorrow, but it looks very
much like CSP to me - I have seen similar behaviour with other applications
using CSP.

On Tue, Jan 7, 2020 at 6:45 PM Kevin Risden  wrote:

> So this is caused by SOLR-13982 [1] and specifically SOLR-13987 [2]. Can
> you open a new Jira specifically for this? It would be great if you could
> capture from Chrome Dev Tools (or Firefox) the error message around what
> specifically CSP is complaining about.
>
> The other thing to ensure is that you force refresh the UI to make sure
> nothing is cached. Idk if that is in play here but doesn't hurt.
>
> [1] https://issues.apache.org/jira/browse/SOLR-13982
> [2] https://issues.apache.org/jira/browse/SOLR-13987
>
> Kevin Risden
>
> On Tue, Jan 7, 2020, 11:15 Jörn Franke  wrote:
>
> > Dear all,
> >
> > I noted that in Solr Cloud 8.4.0 the graph is not shown due to
> > Content-Security-Policy. Apparently it violates unsafe-eval.
> > It is a minor UI thing, but should I create an issue to that one? Maybe
> it
> > is rather easy to avoid in the source code of the admin page?
> >
> > Thank you.
> >
> > Best regards
> >
> >
> >
>


Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-07 Thread Jörn Franke
Dear all,

I noted that in Solr Cloud 8.4.0 the graph is not shown due to 
Content-Security-Policy. Apparently it violates unsafe-eval.
It is a minor UI thing, but should I create an issue to that one? Maybe it is 
rather easy to avoid in the source code of the admin page?

Thank you.

Best regards 




Re: Question about the max num of solr node

2020-01-03 Thread Jörn Franke
Why do you want to set up so many? What is your design in terms of volumes / 
number of documents etc.? 


> Am 03.01.2020 um 10:32 schrieb Hongxu Ma :
> 
> Hi community
> I plan to set up a 128 host cluster: 2 solr nodes on each host.
> But I have a little concern about whether solr can support so many nodes.
> 
> I searched on wiki and found:
> https://cwiki.apache.org/confluence/display/SOLR/2019-11+Meeting+on+SolrCloud+and+project+health
> "If you create thousands of collections, it’ll lock up and become inoperable. 
>  Scott reported that If you boot up a 100+ node cluster, SolrCloud won’t get 
> to a happy state; currently you need to start them gradually."
> 
> I wonder to know:
> Beside the quoted items, does solr have known issues in a big cluster?
> And does solr have a hard limit number of max node?
> 
> Thanks.


Re: Solr 7.5 seed up, accuracy details

2019-12-28 Thread Jörn Franke
This highly depends on how you designed your collections etc. - there is no 
general answer. You have to do a performance test based on your configuration 
and documents.

I also recommend checking the Solr documentation on how to design a collection 
for 7.x, and maybe even starting from scratch with a fresh schema (using the 
Schema API instead of hand-editing schema.xml and solrconfig.xml etc.). You will 
have to reindex everything anyway, so it is also a good opportunity to look at 
your existing processes and optimize them.
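
A minimal example of defining a field via the Schema API (collection name and 
field name are only placeholders for illustration):

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field": { "name":"title", "type":"text_general", "indexed":true, "stored":true }
  }' http://localhost:8983/solr/yourcollection/schema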

> Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo :
> 
> Hi all,
> Is there any way I can get the speed up,accuracy details i.e. performance
> improvements of solr 7.5 in comparison with solr 4.6
>  Currently,we are using solr 4.6 and we are in a process to upgrade to
> solr 7.5. Need these details.
> 
> Thanks in advance


Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Jörn Franke
It seems that you got this handed over with little documentation. You have to 
explore what the import handler does; it is a custom configuration, so you need 
to find out how it works.

Then, as already said, you can simply install another version of Solr as long 
as you stay within the same major version: an 8.x installation on Linux is 
typically just a symbolic link pointing at one Solr version or the other, so you 
can easily switch back as well.
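
A rough sketch, assuming the standard layout created by the 
install_solr_service.sh script (paths and version numbers may differ on your 
machine):

  ls -l /opt/solr                          # usually a symlink, e.g. /opt/solr -> /opt/solr-8.2.0
  sudo service solr stop
  sudo ln -sfn /opt/solr-8.3.1 /opt/solr   # point the symlink at the new version
  sudo service solr start                  # switching back is just repointing the link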

Finally, check your memory consumption. Normally the heap is significantly 
smaller than the total available memory, as the memory outside the heap is used 
for caching.

If you have 8 GB of heap I would expect the total amount of memory available to 
be more than 32 GB.
As always it depends, but maybe you can give more details on the number of 
cores, heap memory, total memory, and whether other processes than Solr run on 
the machine.

> Am 24.12.2019 um 05:59 schrieb Ken Walker :
> 
> Hello,
> 
> We are using solr version 8.2.0 in our production server.
> 
> We are upgrading solr version from solr 8.2.0 version to solr 8.3.1
> version but we have faced out of memory error while importing data and
> then we have extended memory in our server and then again start
> importing process but it has work too slowy for 8GB data ( it has
> taken more than 2 days for importing data from solr 8.2.0 version to
> solr 8.3.1 version).
> 
> Could you please help me how we can do it fast for importing 8GB data
> from old solr version to new solr version?
> 
> We are using below command for importing data from one solr version to
> another solr version
> $ curl 
> 'http://IP-ADDRESS:8983/solr/items/dataimport?command=full-import=true=false=json=true=false=false=false'
> 
> Thanks in advance!
> - Ken


Re: MoreLikeThis does not work

2019-12-22 Thread Jörn Franke
It looks like you are trying to do a more like this of all documents in the 
collection. I am not sure if this makes sense. Maybe you should put a query 
that results in less results, eg one that returns a specific document. 
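
For example, something along these lines (untested against your core; I lowered 
mlt.mintf and mlt.mindf because the defaults of 2 and 5 easily suppress all 
terms in a 10-document sample index, and the name field is stored, which the 
MLT component needs when term vectors are not enabled):

  http://localhost:8983/solr/playground/select?q=id:0553573403&mlt=true&mlt.fl=name&mlt.mintf=1&mlt.mindf=1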

> Am 22.12.2019 um 13:40 schrieb Nehemia Litterat :
> 
> Hi,
> Any help will be appreciated
> 
> Using solr 8.2
> 
> I created a core using
> 
> ./solr create -c
> 
> 
> imported example docs using
> 
> * curl 'http://localhost:32576/solr/playground/update?commit=true
> ' --data-binary
> @/home/vagrant/books.csv -H 'Content-type:application/csv'*
> 
> the data is the sample data coming with solr
> 
> I did not change anything else in the configuration
> 
> trying to add *mlt=true&mlt.fl=name* to my query did not yield any result
> 
> *http://localhost:8983/solr/playground/select?mlt.fl=name&mlt=true&q=*%3A*
> *
> 
> 
> I always get
> 
> "moreLikeThis":{ "0553573403":{"numFound":0,"start":0,"docs":[] }}
> 
> in the end of my results
> 
> this is the full result
> 
> {
> "responseHeader":{ "status":0, "QTime":3, "params":{ "q":"*:*", "mlt":"true",
> "mlt.fl":"name", "_":"1577018250160"}}, "response":{"numFound":10,"start":0
> ,"docs":[ { "id":"0553573403", "cat":["book"], "name":["A Game of Thrones"],
> "price":[7.99], "inStock":[true], "author":["George R.R. Martin"], "series_t
> ":"A Song of Ice and Fire", "sequence_i":1, "genre_s":"fantasy", "_version_
> ":1653437692236529664}, { "id":"0553579908", "cat":["book"], "name":["A
> Clash of Kings"], "price":[7.99], "inStock":[true], "author":["George R.R.
> Martin"], "series_t":"A Song of Ice and Fire", "sequence_i":2, "genre_s":
> "fantasy", "_version_":1653437692242821120}, { "id":"055357342X", "cat":[
> "book"], "name":["A Storm of Swords"], "price":[7.99], "inStock":[true], "
> author":["George R.R. Martin"], "series_t":"A Song of Ice and Fire", "
> sequence_i":3, "genre_s":"fantasy", "_version_":1653437692243869696}, { "id
> ":"0553293354", "cat":["book"], "name":["Foundation"], "price":[7.99], "
> inStock":[true], "author":["Isaac Asimov"], "series_t":"Foundation Novels",
> "sequence_i":1, "genre_s":"scifi", "_version_":1653437692244918272}, { "id":
> "0812521390", "cat":["book"], "name":["The Black Company"], "price":[6.99],
> "inStock":[false], "author":["Glen Cook"], "series_t":"The Chronicles of
> The Black Company", "sequence_i":1, "genre_s":"fantasy", "_version_":
> 1653437692244918273}, { "id":"0812550706", "cat":["book"], "name":["Ender's
> Game"], "price":[6.99], "inStock":[true], "author":["Orson Scott Card"], "
> series_t":"Ender", "sequence_i":1, "genre_s":"scifi", "_version_":
> 1653437692245966848}, { "id":"0441385532", "cat":["book"], "name":["Jhereg"],
> "price":[7.95], "inStock":[false], "author":["Steven Brust"], "series_t":"Vlad
> Taltos", "sequence_i":1, "genre_s":"fantasy", "_version_":
> 1653437692245966849}, { "id":"0380014300", "cat":["book"], "name":["Nine
> Princes In Amber"], "price":[6.99], "inStock":[true], "author":["Roger
> Zelazny"], "series_t":"the Chronicles of Amber", "sequence_i":1, "genre_s":
> "fantasy", "_version_":1653437692247015424}, { "id":"0805080481", "cat":[
> "book"], "name":["The Book of Three"], "price":[5.99], "inStock":[true], "
> author":["Lloyd Alexander"], "series_t":"The Chronicles of Prydain", "
> sequence_i":1, "genre_s":"fantasy", "_version_":1653437692248064000}, { "id
> ":"080508049X", "cat":["book"], "name":["The Black Cauldron"], "price":[5.99
> ], "inStock":[true], "author":["Lloyd Alexander"], "series_t":"The
> Chronicles of Prydain", "sequence_i":2, "genre_s":"fantasy", "_version_":
> 1653437692248064001}] }, "moreLikeThis":{ "0553573403":{"numFound":0,"start
> ":0,"docs":[] }, "0553579908":{"numFound":0,"start":0,"docs":[] }, "
> 055357342X":{"numFound":0,"start":0,"docs":[] }, "0553293354":{"numFound":0
> ,"start":0,"docs":[] }, "0812521390":{"numFound":0,"start":0,"docs":[] }, "
> 0812550706":{"numFound":0,"start":0,"docs":[] }, "0441385532":{"numFound":0
> ,"start":0,"docs":[] }, "0380014300":{"numFound":0,"start":0,"docs":[] }, "
> 0805080481":{"numFound":0,"start":0,"docs":[] }, "080508049X":{"numFound":0
> ,"start":0,"docs":[] }}}
> 
> 
> 
> *Nehemia Litterat*
> 
> +972-54-6609351 | nlitte...@gmail.com
> 
> Skype: nlitterat
> 


Re: number of files indexed (re-formatted)

2019-12-18 Thread Jörn Franke
This depends on your ingestion process. The unique ids that are not filenames 
usually belong to documents that did not come directly from a file, or for which 
your ingestion process did not pass the file name along. In that case the 
collection seems to be configured to generate a unique identifier.

Maybe you can describe in more detail how you process the files.

A wild speculation could be that they come from inside a zip file. In this case 
Tika metadata could be used as an id, where you concatenate the zip file name + 
the name of the file inside the zip.
However, we don’t know how your ingestion process is defined, so this is pure 
speculation on my side.
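
If it is indeed the collection generating the ids, one common way this is 
configured is a UUID update processor chain in solrconfig.xml, roughly like the 
following (again only an assumption on my side, I have not seen your config; 
this factory only fills in the id when a document arrives without one):

  <updateRequestProcessorChain name="add-unknown-ids" default="true">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>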

> Am 18.12.2019 um 16:40 schrieb Nan Yu :
> 
> Sorry that I just found out that the mailing list takes plain text and my 
> previous post looks really messy. So I reformatted it.
> 
> 
> Hi,
> I did a simple indexing of a directory that contains a lot of pdf, text, 
> doc, zip etc. There are no structures for the content of the files and I 
> would like to index them and later on search "key words" within the files.
> 
> 
> After creating the core, I indexed the files in the directory using the 
> following command: 
> 
> 
> bin/post -p 8983 -m 10g -c myCore /DATA_FOLDER > solr_indexing.log
> 
> 
> The log file shows something like below (the first and last few lines in 
> the log file):
> 
> 
> java -classpath /solr/solr-8.3.0/dist/solr-core-8.3.0.jar -Dauto=yes 
> -Dport=8983 -Dm=15g -Dc=myCore -Ddata=files -Drecursive=yes 
> org.apache.solr.util.SimplePostTool /DATA_FOLDER
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/myCore/update...
> Entering auto mode. File endings considered are 
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> ...
> ...
> ...
> POSTing file Report.pdf (application/pdf) to [base]/extract
> 47256 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/myCore/update...
> Time spent: 1:03:59.587
> 
> 
> 
> 
> But when using browser to try to look at the result, the "overview" 
> (http://localhost:8983/solr/#/myCore/core-overview) shows:
> Num Docs: 47648
> 
> 
> Most of the files indexed has an metadata id has the value of the full path 
> of the file indexed, such as /DATA_FOLDER/20180321/Report.pdf 
> 
> 
> But there are about 400 of them, the id looks like: 
> 232d7bd6-c586-4726-8d2b-bc9b1febcff4.
> 
> 
> So my questions are:
> (1)why the two numbers are different (in log file vs. in the overview).
> (2)for those ids that are not a full path of a file, how do I know where they 
> comes from (the original file)?
> 
> 
> 
> 
> Thanks for your help!
> Nan
> 
> 
> 
> 
> PS: a few examples of query result for those strange ids:
> 
> 
> {
> "bolt-small-online":["Test strip-north"],
> "3696714.008":[3702848.584],
> "380614.564":[376900.143],
> "100.038":[111.074],
> "gpo-bolt":["teststrip"],
> "id":"232d7bd6-c586-4726-8d2b-bc9b1febcff4",
> "_version_":1652839231413813252
> }
> 
> 
> 
> 
> {
> "Date":["8/24/2001"],
> "EXT31":[0],
> "EXT32":[0.12],
> "Aggregate":[0.12],
> "Pounds_Vap":[37],
> "Gallons_Vap":[5.8],
> "Gallons_Liq":[0],
> "Gallons_Tot":[5.8],
> "Avg_Rate":[1.8],
> "Gallons_Rec":[577],
> "Water":[577],
> "id":"840c05af-caf0-4407-8753-dcc6957abcc5",
> "Well_s_":["EXT31;EXT32"],
> "Time__hrs_":[3.25],
> "_version_":1652898731969740800}]
>   }
> 
> 
> {
> "2":[4],
> "SFS1":["PLM1"],
> "1.00":[1.0],
> "69":[79],
> "id":"e675a6f5-0a3e-41b1-b1fe-b3098d0be725",
> "_version_":1652825435791163395
> }


Re: Atomic solrj update

2019-12-12 Thread Jörn Franke
One would need to see the code or get more insight into your design. Do you 
reuse the HttpClient or do you create a new one for every request?
How often do you commit?
Do you run parallel updates from the client (multiple threads)?
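
A rough sketch of the usual pattern - one shared client, batched atomic 
updates, a single commit at the end (collection name, batch size, field names 
and the CSV reader are placeholders, not your actual code):

  import java.util.*;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkAtomicUpdate {
    public static void main(String[] args) throws Exception {
      try (HttpSolrClient client = new HttpSolrClient.Builder(
          "http://localhost:8983/solr/mycollection").build()) {
        List<SolrInputDocument> batch = new ArrayList<>();
        for (String[] row : readCsvSomehow()) {            // placeholder for your CSV reader
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", row[0]);
          doc.addField("price", Collections.singletonMap("set", row[1])); // atomic "set"
          batch.add(doc);
          if (batch.size() == 1000) { client.add(batch); batch.clear(); } // one request per 1000 docs
        }
        if (!batch.isEmpty()) client.add(batch);
        client.commit();                                    // commit once at the end
      }
    }
    static List<String[]> readCsvSomehow() { return Collections.emptyList(); } // stub
  }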

> Am 13.12.2019 um 06:56 schrieb Prem :
> 
> I am trying to partially update of 50M data in a collection from CSV using
> Atomic script(solrj).But it is taking 2 hrs for 1M records.is there anyway i
> can speed up my update.
> Using HTTPClient to establish connection and also i am validating whether
> the particular document is available in collection or not and after that
> updating the document.
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: user solr created by install not working with default password

2019-12-11 Thread Jörn Franke
Even for in-house setups with no outside access you should have authentication 
and HTTPS. A tiny misconfiguration somewhere else, outside your control, can 
suddenly leave you with a big open leak.

Never run without them - not even in development environments (another 
important aspect there is that you become aware of possible authentication 
issues early).

> Am 11.12.2019 um 17:18 schrieb rhys J :
> 
> I installed Solr following the directions on this site:
> 
> https://lucene.apache.org/solr/guide/6_6/installing-solr.html
> 
> I am running standalone Solr with no authentication added because it is all
> in-house with no access to outside requests.
> 
> When I try to su solr, using the password mentioned here:
> https://lucidworks.com/post/securing-solr-basic-auth-permission-rules/, i
> get an authentication failure.
> 
> I'm trying to chase down a bug, and I need to be able to see the results of
> some commands from the user solr's perspective.
> 
> What am I doing wrong?
> 
> Thanks,
> 
> Rhys


Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Jörn Franke
You can have different fields by country. I am not sure about your stop words 
but if they are not occurring in the other languages then you have not a 
problem. 
On the other hand: it you need more than stop words (eg lemmatizing, 
specialized way of tokenization etc) then you need a different field per 
language. You don’t describe your full use case, but if you have different 
fields for different language then your client application needs to handle this 
(not difficult, but you have to be aware).
Not sure if you need to search a given address in all languages or if you use 
the language of the user etc.
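
A minimal sketch of per-language fields in the schema (field names are only 
examples; text_en, text_fr etc. are the language-specific types shipped with the 
default configset, each wired to its own stop word list):

  <field name="address_en" type="text_en" indexed="true" stored="true"/>
  <field name="address_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="address_de" type="text_de" indexed="true" stored="true"/>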

> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
> 
> Hi,
> 
> 
> I have an index that stores addresses from different countries.
> 
> 
> As every country has different stop words, I was wondering if it is possible 
> to apply a different set of stop words depending on the value of a field. 
> 
> 
> Or do I need different indexes/do itnat the ETL step to accomplish this?
> 
> 


Re: problem using Http2SolrClient with solr 8.3.0

2019-11-27 Thread Jörn Franke
Which JDK version? In this setting I would recommend JDK 11.

> Am 27.11.2019 um 22:00 schrieb Odysci :
> 
> Hi,
> I have a solr cloud setup using solr 8.3 and SolrJj, which works fine using
> the HttpSolrClient as well as the CloudSolrClient. I use 2 solr nodes with
> 3 Zookeeper nodes.
> Recently I configured my machines to handle ssl, http/2 and then I tried
> using in my java code the Http2SolrClient supported by SolrJ 8.3.0, but I
> got the following error at run time upon instantiating the Http2SolrClient
> object:
> 
> Has anyone seen this problem?
> Thanks
> Reinaldo
> ===
> 
> Oops: NoClassDefFoundError
> Unexpected error : Unexpected Error, caused by exception
> NoClassDefFoundError: org/eclipse/jetty/client/api/Request
> 
> play.exceptions.UnexpectedException: Unexpected Error
> at play.jobs.Job.onException(Job.java:180)
> at play.jobs.Job.call(Job.java:250)
> at Invocation.Job(Play!)
> Caused by: java.lang.NoClassDefFoundError:
> org/eclipse/jetty/client/api/Request
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient$AsyncTracker.(Http2SolrClient.java:789)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.(Http2SolrClient.java:131)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:833)
> ... more
> Caused by: java.lang.ClassNotFoundException:
> org.eclipse.jetty.client.api.Request
> at
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
> at
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> ... 16 more
> ==

