Need Help Using Jaeger to Trace Solr

2020-06-01 Thread Yihao Huang
Hi,

I am new to Solr and Jaeger, and I am currently working on using Jaeger to trace Solr. The problem I have is that Jaeger seems unable to capture traces from Solr.

I am using the techproducts example data on Solr, and I start the Solr service by running ./bin/solr start -e cloud. But the Jaeger UI (which I have verified works using the HotROD demo application) does not show anything related to Solr. To solve the problem, I have attempted to:

1. Change the sampling rate by setting the samplePercentage cluster property (/admin/collections?action=CLUSTERPROP&name=samplePercentage&val=...), but nothing changed in the Jaeger UI.

2. Set up the tracer configurator in solr.xml under solr/example/cloud/node1/solr and solr/example/cloud/node2/solr, as shown in the attached file, but Solr fails to start: "ERROR: Did not see Solr at http://localhost:8983/solr come online within 30"

I am not sure whether what I did is correct. I also noticed that https://lucene.apache.org/solr/guide/8_2/solr-tracing.html#jaeger-tracer-configurator says "Note that all library of jaegertracer-configurator must be included in the classpath of all nodes...", so I also attempted to:

3. Run ./bin/solr start -e cloud -a "-classpath org.apache.solr.jaeger.JaegerTracerConfigurator" to include the classpath. But Solr reports: ERROR: Unbalanced quotes in "bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr" org.apache.solr.jaeger.JaegerTracerConfigurator" -a "-classpath". From searching online, this appears to be an unfixed Solr bug (https://issues.apache.org/jira/browse/SOLR-8552), so this doesn't work either.

As I am new to Solr and Jaeger, I am not sure whether these attempts even make sense :-(. I do hope I can get some help with making Jaeger and Solr work together. I would really appreciate your reply!

Best regards,
Yihao

Attachment: solr.xml (XML document)
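For reference, the tracer setup the linked guide describes lives in solr.xml; a minimal sketch along the lines of the 8.2 Ref Guide example (agentHost/agentPort shown with the Jaeger agent defaults, adjust for your environment):

  <solr>
    <tracerConfig name="tracerConfig" class="org.apache.solr.jaeger.JaegerTracerConfigurator">
      <str name="agentHost">localhost</str>
      <int name="agentPort">5775</int>
      <bool name="logSpans">true</bool>
      <int name="flushInterval">1000</int>
      <int name="maxQueueSize">10000</int>
    </tracerConfig>
  </solr>

Note that the classpath requirement in the guide refers to the jaegertracer-configurator contrib jars (e.g., copied into each node's lib directory), not to the configurator class name, so passing the class via -a "-classpath ..." would not work even without the quoting bug.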


Solr 8.4.1 with SSL/TLS 1.2 creating an issue with non-leader node

2020-06-01 Thread yaswanth kumar
Trying to set up Solr 8.4.1 + OpenJDK 11 on CentOS. I enabled the SSL
configuration with all the certs in place, but the issue I am seeing is
that when I hit the /update API on a non-leader Solr node, it throws an
error.

I configured 2 Solr nodes with 1 ZooKeeper.

metadata":[
"error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
"root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
"msg":"Async exception during distributed update:
javax.crypto.BadPaddingException: RSA private key operation failed",
"trace":"org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update:
javax.crypto.BadPaddingException: RSA private key operation failed\n\tat
org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)\n\tat
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.finish

Strangely, this only happens when we hit a non-leader node; hitting the
leader node works fine without any issue and the data gets indexed.

I am not able to track down where exactly the issue is happening.

Thanks,

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com
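A general way to get more detail on failures like this (standard JSSE debugging, not a confirmed fix for this case) is to enable TLS handshake logging on the non-leader node in solr.in.sh and re-send the update:

  SOLR_OPTS="$SOLR_OPTS -Djavax.net.debug=ssl:handshake"

The stack trace points at DistributedZkUpdateProcessor.doDistribFinish, suggesting the exception occurs on the internal node-to-node SSL connection used to forward the update; that would explain why hitting the leader directly works.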


Re: Lucene query to Solr query

2020-06-01 Thread gnandre
Is it an odd use case to need to convert a Lucene query to a Solr query?
Isn't this a normal use case when somebody is porting their Lucene code
to Solr?
I mean, is this an XY problem where I should not even run into this
issue in the first place?


On Sun, May 31, 2020 at 9:40 AM Mikhail Khludnev  wrote:

> There's nothing like this now. Presumably one might visit queries and
> generate Query DSL json, but it might be a challenging problem.
>
> On Sun, May 31, 2020 at 3:42 AM gnandre  wrote:
>
> > I think this question here in this thread is similar to my question.
> >
> >
> https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html
> >
> >
> > As suggested in that thread, I do not want to use the toString method on
> > the Lucene query to pass it to the q param in SolrQuery.
> >
> > I am looking for a function that accepts org.apache.lucene.search.Query
> and
> > returns org.apache.solr.client.solrj.SolrQuery. Is that possible?
> >
> > On Sat, May 30, 2020 at 8:08 AM Erick Erickson 
> > wrote:
> >
> > > edismax is quite different from straight Lucene.
> > >
> > > Try attaching &debug=query to the input and
> > > you'll see the difference.
> > >
> > > Best,
> > > Erick
> > >
> > > > On May 30, 2020, at 12:32 AM, gnandre 
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have the following query, which works fine as a Lucene query:
> > > > +(topics:132)^0.02607211 (topics:146)^0.008187325
> > > > -asset_id:doc:en:index.html
> > > >
> > > > But it does not work if I use it as a Solr query with lucene as the
> > > > defType.
> > > >
> > > > For it to work, I need to convert it to the following:
> > > > q=+((topics:132)^0.02607211 (topics:146)^0.008187325
> > > > +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR
> > > >
> > > > Why does it not work as is? AFAIK the syntax given in the first query
> > > > is supported by edismax.
> > >
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
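For what it's worth, here is a rough illustration of the traversal Mikhail describes, handling only the query types in gnandre's example (BoostQuery, TermQuery, BooleanQuery) and emitting a Solr lucene-syntax q string rather than Query DSL JSON; a sketch, not a general converter:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.BoostQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;

  public class LuceneToSolrQ {
      // Recursively renders a Lucene Query as a q string for Solr's lucene parser.
      public static String render(Query q) {
          if (q instanceof BoostQuery) {
              BoostQuery bq = (BoostQuery) q;
              return "(" + render(bq.getQuery()) + ")^" + bq.getBoost();
          }
          if (q instanceof TermQuery) {
              Term t = ((TermQuery) q).getTerm();
              // Escape ':' in the term text so Solr's parser doesn't treat it
              // as a field separator (the asset_id:doc\:en\:index.html case).
              return t.field() + ":" + t.text().replace(":", "\\:");
          }
          if (q instanceof BooleanQuery) {
              StringBuilder sb = new StringBuilder();
              for (BooleanClause c : ((BooleanQuery) q).clauses()) {
                  if (sb.length() > 0) sb.append(' ');
                  if (c.getOccur() == BooleanClause.Occur.MUST) sb.append('+');
                  else if (c.getOccur() == BooleanClause.Occur.MUST_NOT) sb.append('-');
                  sb.append(render(c.getQuery()));
              }
              return sb.toString();
          }
          throw new IllegalArgumentException("unhandled query type: " + q.getClass().getName());
      }
  }

The result can then be passed as new SolrQuery(render(query)). Anything beyond these three types (phrases, ranges, boosts on nested clauses, etc.) needs its own case, which is why a complete converter is the challenging problem described above.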


Re: SOLR cache tuning

2020-06-01 Thread Tarun Jain
Hi,

Thanks for the replies so far.

Walter: We have a few more Solr cores, so the JVM is sized accordingly. I know
we can separate the cores, but for easier maintainability we have kept one
instance. Also, only one core is being used the majority of the time.

Jörn: I don't have a particular performance number in mind. I am exploring what
kind of tuning can be done on a read-only slave on a server with tons of RAM.

Earlier today, while reading the Solr documentation, I saw that CaffeineCache
is the preferred caching implementation. So I switched my Solr core to use
CaffeineCache, and the benchmarking results are very good: the reading time for
1.8 million documents has gone down from 210+ secs to ~130 secs just by using
CaffeineCache, a 40% gain. I would recommend switching to CaffeineCache ASAP,
as it seems to be a simple change for a very good speedup.

I tried various sizes; the default 512 looks right for filterCache and
queryResultCache, while documentCache in my case gives slightly better results
with size=8192.

If anyone else has any other tips on improving performance by changing
parameters, please let me know.

Tarun Jain
-=-

On Monday, June 1, 2020, 01:55:56 PM EDT, Jörn Franke wrote:
 
You should not have other processes/containers running on the same node. They
can pollute the OS cache and make things slow: if the other processes also
read files, they can evict Solr's data from the OS cache, and then the cache
has to be filled again.

What performance do you have now and what performance do you expect?

For full queries I would try to export all the data daily and offer it as a
simple HTTPS download or on an object store. Maybe when you process the
documents for indexing you can already put them on an object store or similar,
so you don't need Solr at all to export all of the documents.


See also Walter's message.

> On 01.06.2020 at 17:29, Tarun Jain wrote:
> 
> Hi,
>
> I have a SOLR installation in master-slave configuration. The slave is
> used only for reads and the master for writes.
> I wanted to know if there is anything I can do to improve the performance
> of the read-only slave instance?
> I am running SOLR 8.5 and Java 14. The JVM has 24 GB of RAM allocated. The
> server has 256 GB of RAM with about 50 GB free (the rest being used by
> other services on the server). The index is 15 GB in size with about
> 2 million documents.
> We do a lot of queries where documents are fetched using filter queries,
> and a few times all 2 million documents are read. My initial idea to speed
> up SOLR is that, given the amount of memory available, SOLR should be able
> to keep the entire index on the heap (I know the OS will also cache the
> disk blocks).
> My solrconfig has the following:
> <…>20</…>
> <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <queryResultCache … autowarmCount="0"/>
> <documentCache … size="8192" initialSize="8192" autowarmCount="0"/>
> <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
> <queryResultWindowSize>20</queryResultWindowSize>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> <useColdSearcher>false</useColdSearcher>
> <maxWarmingSearchers>2</maxWarmingSearchers>
> I have modified the documentCache size to 8192 from 512, but it has not
> helped much.
> I know this question has probably been asked a few times, and I have read
> everything I could find about SOLR cache tuning. I am looking for some
> more ideas.
>
> Any ideas?
> Tarun Jain
> -=-
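For reference, the switch described above amounts to changing the cache class in the <query> section of solrconfig.xml; a minimal sketch using the sizes from the message (not Tarun's exact config):

  <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache" size="8192" initialSize="8192" autowarmCount="0"/>

In later 8.x releases the older LRUCache/FastLRUCache implementations are deprecated in favor of CaffeineCache, which is consistent with the documentation comment above.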

Hardware related issue

2020-06-01 Thread Rudenko, Artur
Hi Guys,

We were planning on using 7 physical servers for our Solr nodes, each with
64 vCPUs at 2 GHz and 128 GB RAM, but due to some constraints we have to use a
virtual environment that does not offer the same number of CPUs. We were
advised to use fewer CPUs with a higher clock speed.
Should we be looking at L1-L3 cache metrics? Do we need to increase RAM/IOPS
proportionally to fill the "gap" left by the smaller number of CPU cores?
Any particular suggestions?

Artur Rudenko




Solr Terms browsing in descending order

2020-06-01 Thread Jigar Gajjar
Hello,

Is it possible to retrieve index terms in descending order using the terms
handler? Right now we get all terms in ascending order.

Thanks,
Jigar Gajjar
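For context, a terms request looks like the line below (collection and field names are placeholders). Per the Ref Guide, terms.sort accepts only count and index, and index order is always ascending, so a descending term listing is not directly available from the handler:

  http://localhost:8983/solr/<collection>/terms?terms.fl=<field>&terms.limit=10&terms.sort=index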


Re: SOLR cache tuning

2020-06-01 Thread Jörn Franke
You should not have other processes/containers running on the same node. They
can pollute the OS cache and make things slow: if the other processes also
read files, they can evict Solr's data from the OS cache, and then the cache
has to be filled again.

What performance do you have now and what performance do you expect?

For full queries I would try to export all the data daily and offer it as a
simple HTTPS download or on an object store. Maybe when you process the
documents for indexing you can already put them on an object store or similar,
so you don't need Solr at all to export all of the documents.


See also Walter's message.

> On 01.06.2020 at 17:29, Tarun Jain wrote:
> 
> Hi,
>
> I have a SOLR installation in master-slave configuration. The slave is
> used only for reads and the master for writes.
> I wanted to know if there is anything I can do to improve the performance
> of the read-only slave instance?
> I am running SOLR 8.5 and Java 14. The JVM has 24 GB of RAM allocated. The
> server has 256 GB of RAM with about 50 GB free (the rest being used by
> other services on the server). The index is 15 GB in size with about
> 2 million documents.
> We do a lot of queries where documents are fetched using filter queries,
> and a few times all 2 million documents are read. My initial idea to speed
> up SOLR is that, given the amount of memory available, SOLR should be able
> to keep the entire index on the heap (I know the OS will also cache the
> disk blocks).
> My solrconfig has the following:
> <…>20</…>
> <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <queryResultCache … autowarmCount="0"/>
> <documentCache … size="8192" initialSize="8192" autowarmCount="0"/>
> <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
> <queryResultWindowSize>20</queryResultWindowSize>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> <useColdSearcher>false</useColdSearcher>
> <maxWarmingSearchers>2</maxWarmingSearchers>
> I have modified the documentCache size to 8192 from 512, but it has not
> helped much.
> I know this question has probably been asked a few times, and I have read
> everything I could find about SOLR cache tuning. I am looking for some
> more ideas.
>
> Any ideas?
> Tarun Jain
> -=-


Re: question about setup for maximizing solr performance

2020-06-01 Thread Shawn Heisey

On 6/1/2020 9:29 AM, Odysci wrote:
> Hi,
> I'm looking for some advice on improving performance of our solr setup.
>
> Does anyone have any insights on what would be better for maximizing
> throughput on multiple searches being done at the same time?
> thanks!


In almost all cases, adding memory will provide the best performance 
boost.  This is because memory is faster than disks, even SSD.  I have 
put relevant information on a wiki page so that it is easy for people to 
find and digest:


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

Thanks,
Shawn


Re: SOLR cache tuning

2020-06-01 Thread Walter Underwood
Reading all the documents is going to be slow. If you want to do that, use a 
database.

You do NOT keep all of the index in heap. Solr doesn’t work like that.

Your JVM heap is probably way too big for 2 million documents, but I doubt that 
is the performance issue. We use an 8 GB heap for all of our Solr instances, 
including one with about 5 million docs per shard.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 1, 2020, at 8:28 AM, Tarun Jain  wrote:
> 
> Hi,
>
> I have a SOLR installation in master-slave configuration. The slave is
> used only for reads and the master for writes.
> I wanted to know if there is anything I can do to improve the performance
> of the read-only slave instance?
> I am running SOLR 8.5 and Java 14. The JVM has 24 GB of RAM allocated. The
> server has 256 GB of RAM with about 50 GB free (the rest being used by
> other services on the server). The index is 15 GB in size with about
> 2 million documents.
> We do a lot of queries where documents are fetched using filter queries,
> and a few times all 2 million documents are read. My initial idea to speed
> up SOLR is that, given the amount of memory available, SOLR should be able
> to keep the entire index on the heap (I know the OS will also cache the
> disk blocks).
> My solrconfig has the following:
> <…>20</…>
> <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <queryResultCache … autowarmCount="0"/>
> <documentCache … size="8192" initialSize="8192" autowarmCount="0"/>
> <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
> <queryResultWindowSize>20</queryResultWindowSize>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> <useColdSearcher>false</useColdSearcher>
> <maxWarmingSearchers>2</maxWarmingSearchers>
> I have modified the documentCache size to 8192 from 512, but it has not
> helped much.
> I know this question has probably been asked a few times, and I have read
> everything I could find about SOLR cache tuning. I am looking for some
> more ideas.
>
> Any ideas?
> Tarun Jain
> -=-
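For reference, a fixed heap like the 8 GB Walter mentions is typically set in solr.in.sh (or via the -m flag to bin/solr start); e.g.:

  SOLR_HEAP="8g"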



question about setup for maximizing solr performance

2020-06-01 Thread Odysci
Hi,
I'm looking for some advice on improving performance of our solr setup. In
particular, about the trade-offs between applying larger machines, vs more
smaller machines. Our full index has just over 100 million docs, and we do
almost all searches using fq's (with q=*:*) and facets. We are using solr
8.3.

Currently, I have a solrcloud setup with 2 physical machines (let's call
them A and B), and my index is divided into 2 shards, and 2 replicas, such
that each machine has a full copy of the index.
The nodes and replicas are as follows:
Machine A:
  core_node3 / shard1_replica_n1
  core_node7 / shard2_replica_n4
Machine B:
  core_node5 / shard1_replica_n2
  core_node8 / shard2_replica_n6

My ZooKeeper setup uses 3 instances. It's also the case that most of our
searches return results from both shards.

My experiments indicate that our setup is CPU-bound.
Due to cost constraints, I could either double the CPUs in each of the 2
machines, or make it a 4-machine setup (using the current machine size) with
2 shards and 4 replicas (or 4 shards with 4 replicas). I assume that keeping
the full index on all machines will allow all searches to be evenly
distributed.

Does anyone have any insights on what would be better for maximizing
throughput on multiple searches being done at the same time?
thanks!

Reinaldo
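For reference, the layout described above, where each of the two nodes holds one replica of every shard and therefore a full copy of the index, is what a two-shard, two-replica collection produces on a two-node cluster; e.g., with a hypothetical collection name:

  bin/solr create -c mycollection -shards 2 -replicationFactor 2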


SOLR cache tuning

2020-06-01 Thread Tarun Jain
Hi,

I have a SOLR installation in master-slave configuration. The slave is used
only for reads and the master for writes.
I wanted to know if there is anything I can do to improve the performance of
the read-only slave instance?
I am running SOLR 8.5 and Java 14. The JVM has 24 GB of RAM allocated. The
server has 256 GB of RAM with about 50 GB free (the rest being used by other
services on the server). The index is 15 GB in size with about 2 million
documents.
We do a lot of queries where documents are fetched using filter queries, and
a few times all 2 million documents are read. My initial idea to speed up
SOLR is that, given the amount of memory available, SOLR should be able to
keep the entire index on the heap (I know the OS will also cache the disk
blocks).
My solrconfig has the following:
<…>20</…>
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache … autowarmCount="0"/>
<documentCache … size="8192" initialSize="8192" autowarmCount="0"/>
<cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
I have modified the documentCache size to 8192 from 512, but it has not
helped much.
I know this question has probably been asked a few times, and I have read
everything I could find about SOLR cache tuning. I am looking for some more
ideas.

Any ideas?
Tarun Jain
-=-

Re: Solr 6.6.2 build from source is failing

2020-06-01 Thread sevanthi
Hi,

I am also facing an issue with the ant build for the Lucene 6.0 version, even
after changing the URLs from http to https.

Can you please point me to the exact files that need changing to make the ant
build work?





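For anyone hitting this: Maven Central stopped serving plain http in early 2020, so older branches need their dependency-resolver URLs switched to https before ant/ivy can resolve anything. A sketch of the kind of edit involved, assuming the 6.x source layout (the exact file and attributes should be verified against your checkout):

  <!-- lucene/ivy-settings.xml: make the Maven Central resolver use https -->
  <ibiblio name="default" root="https://repo1.maven.org/maven2/" m2compatible="true"/>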


Re: Not all EML files are indexing during indexing

2020-06-01 Thread Charlie Hull

Hi Edwin,

What code is actually doing the indexing? AFAIK Solr doesn't include any 
code for actually walking a folder, extracting the content from .eml 
files and pushing this data into its index, so I'm guessing you've built 
something external?


Charlie


On 01/06/2020 02:13, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> I am running this on Solr 7.6.0.
>
> Currently I have a situation where there are more than 2 million EML files
> in a folder, and the folder is constantly updating the EML files with the
> latest information and adding new EML files.
>
> When I do the indexing, it is supposed to index the new EML files and
> update the index where the EML file content has changed. However, I have
> found that not all new EML files are picked up with each run of the
> indexing.
>
> Could it be caused by the large number of files in the folder? Or by some
> other reason?
>
> Regards,
> Edwin



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com