Re: regarding Extracting text from Images

2019-10-23 Thread suresh pendap
so with extra libraries (tesseract?) > But Solr does not bundle those extras. > > In any case, you may want to run Tika externally to avoid the > conversion/extraction process be a burden to Solr itself. > > Regards, > Alex > > On Wed, Oct 23, 2019, 1:58 PM suresh pendap,

regarding Extracting text from Images

2019-10-23 Thread suresh pendap
Hello, I am reading the Solr documentation about integration with Tika and the Solr Cell framework over here https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html I would like to know if the Solr Cell framework can also be used to extract text from the image

configuring zookeeper authentication and authorization

2018-01-24 Thread suresh pendap
Hi, I am following the Solr documentation to configure ZK authentication and ACLs from here https://lucene.apache.org/solr/guide/6_6/zookeeper-access-control.html I am planning to go with the MD5 Digest authentication mechanism. I am assuming that you still have to enable authentication on the

Re: regarding exposing merge metrics

2018-01-11 Thread suresh pendap
Hi Shawn, Thanks for replying to my questions. So is it correct to assume that exposing merge metrics is not known to cause any performance degradation? -suresh On Wed, Jan 10, 2018 at 5:40 PM, Shawn Heisey wrote: > On 1/10/2018 11:08 AM, S G wrote: > >> Last comment by

Re: regarding exposing merge metrics

2018-01-09 Thread suresh pendap
le most of > the > > others are reported directly? > > > > Thanks > > SG > > > > > > > > On Mon, Jan 8, 2018 at 2:02 PM, suresh pendap <sureshfors...@gmail.com> > > wrote: > > > >> Hi, > >> I am following the instruc

regarding exposing merge metrics

2018-01-08 Thread suresh pendap
Hi, I am following the instructions from https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html in order to expose the Index merge related metrics. The document says that we have to add the below snippet in order to expose the merge metrics ... 524288 true

Re: Scaling issue with Solr

2017-12-27 Thread Suresh Pendap
What is the downside of configuring ramBufferSizeMB to be equal to 5GB ? Is it only that the window of time for flush is larger, so recovery time will be higher in case of a crash? Thanks Suresh On 12/27/17, 1:34 PM, "Erick Erickson" wrote: You are probably

Re: merge metrics not showing up in Jconsole

2017-12-04 Thread suresh pendap
Hi, I wanted to check if it is a known issue that the merge metrics are not exposed as JMX beans. Has anyone else in the community run into this issue? Thanks Suresh On Sun, Dec 3, 2017 at 4:24 PM, suresh pendap <sureshfors...@gmail.com> wrote: > I see only these metrics in my Jconso

Re: merge metrics not showing up in Jconsole

2017-12-03 Thread suresh pendap
I see only these metrics in my Jconsole window [image: Inline image 1] On Sun, Dec 3, 2017 at 4:19 PM, suresh pendap <sureshfors...@gmail.com> wrote: > Hi, > I am using Solr version 6.6.0 and am following the document > https://lucene.apache.org/solr/guide/6_6/metrics-repor

merge metrics not showing up in Jconsole

2017-12-03 Thread suresh pendap
Hi, I am using Solr version 6.6.0 and am following the document https://lucene.apache.org/solr/guide/6_6/metrics-reporting.html#index-merge-metrics to enable Index merge metrics. I have added the below config to my solrconfig.xml file 524288 true true ...

Solr merge related metrics not showing up in jconsole

2017-12-03 Thread suresh pendap
Hello, I am using Solr version 6.6 and I am following the document to get the Segment merge related metrics https://lucene.apache.org/solr/guide/6_6/metrics-reporting.html#index-merge-metrics I added the configuration to expose the merge related metrics to my solrconfig.xml file as below

Re: query with wild card with AND taking lot of time

2017-08-31 Thread suresh pendap
hat clause entirely > > > 2) Because all your clauses are more like filters and are ANDed > together, > > > you'll likely get better performance by putting them _each_ in an fq > > > E.g. > > > fq=product_identifier_type:DOTCOM_OFFER > > > fq=abstract_or_primar
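For illustration only, the suggestion quoted above (putting each ANDed clause in its own fq) might look roughly like this from a Python client. Only the clause product_identifier_type:DOTCOM_OFFER comes from the thread; the second clause, the function name and the q=*:* main query are assumptions made up for this sketch.

```python
from urllib.parse import urlencode


def build_select_params(filters, main_query="*:*"):
    """Return a URL query string with one fq parameter per filter clause."""
    params = [("q", main_query)]
    # One fq per clause, so each filter can be cached separately
    # in Solr's filterCache instead of being folded into q.
    params.extend(("fq", clause) for clause in filters)
    return urlencode(params)


query_string = build_select_params([
    "product_identifier_type:DOTCOM_OFFER",  # clause taken from the thread
    "example_field:EXAMPLE_VALUE",           # hypothetical second clause
])
```

The resulting string would be appended to the /select path of the collection being queried.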

query with wild card with AND taking lot of time

2017-08-31 Thread suresh pendap
Hello everybody, We are seeing that the below query is running very slowly and taking almost 4 seconds to finish [] webapp=/solr path=/select

Re: identifying source of queries

2017-08-09 Thread suresh pendap
ery request should be > propagated to all query nodes as well. > > On Wed, Aug 9, 2017 at 8:58 AM, suresh pendap <sureshfors...@gmail.com> > wrote: > > Hi, > > We have found that application teams often fire ad-hoc queries, some of > > these are very expensive queries

identifying source of queries

2017-08-08 Thread suresh pendap
Hi, We have found that application teams often fire ad-hoc queries. Some of these are very expensive queries and can bring the Solr cluster down. Sometimes they just build custom scripts which do some offline analytics by firing expensive queries; the Solr cluster was originally not sized for

precedence for configurations in solrconfig.xml file

2017-07-26 Thread suresh pendap
Hi, If I have a configoverlay.json file with the below content {"props":{"updateHandler":{"autoCommit":{ "maxTime":5, "maxDocs":1 and I also have JVM properties set on the Solr JVM instance as -Dsolr.autoCommit.maxtime=2 -Dsolr.autoCommit.maxDocs=10 I

Re: regarding cursorMark feature for deep pagination

2017-07-19 Thread suresh pendap
on-of-large-result-sets/ > > Best, > Erick > > On Tue, Jul 18, 2017 at 10:00 PM, suresh pendap <sureshfors...@gmail.com> > wrote: > > Hi, > > > > This question is more about the Implementation detail of the cursorMark > > feature. > > > > I

regarding cursorMark feature for deep pagination

2017-07-18 Thread suresh pendap
Hi, This question is more about the Implementation detail of the cursorMark feature. I was reading about using the cursorMark feature for deep pagination in Solr mentioned in this blog http://yonik.com/solr/paging-and-deep-paging/ It is not clear to me as to how it is more efficient as compared
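As a rough sketch of how a client uses the feature being asked about: the first request sends cursorMark=*, each response carries a nextCursorMark, and paging stops when the mark comes back unchanged. The fetch_page callable below is hypothetical and stands in for whatever HTTP client issues the query (which must sort on the uniqueKey field); it is not part of Solr or SolrJ.

```python
def iterate_with_cursor(fetch_page, rows=100):
    """Yield every matching document by following cursorMark paging.

    fetch_page(cursor_mark, rows) is a hypothetical callable that runs the
    query with the given cursorMark and returns the parsed JSON response.
    """
    cursor_mark = "*"  # "*" asks Solr to start from the beginning
    while True:
        response = fetch_page(cursor_mark, rows)
        for doc in response["response"]["docs"]:
            yield doc
        next_mark = response["nextCursorMark"]
        if next_mark == cursor_mark:
            # An unchanged mark means there are no more results.
            break
        cursor_mark = next_mark
```

Because each request resumes from the position encoded in the mark, the client never asks for a large start offset.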

default values for numRecordsToKeep and maxNumLogsToKeep

2017-07-18 Thread suresh pendap
Hi, After looking at the source code I see that the default value for numRecordsToKeep is 100 and for maxNumLogsToKeep is 10. So it seems that by default a replica can lag by only 1000 document updates before it goes for a full recovery from the leader. I would like to know the rationale

Re: Why do Solr nodes go into Recovery status

2017-06-06 Thread suresh pendap
ocs by default, but can be set in solrconfig.xml if necessary. If > that limit is exceeded, then indeed the entire index is copied from > the leader. > > Best, > Erick > > > > On Mon, Jun 5, 2017 at 5:18 PM, suresh pendap <sureshfors...@gmail.com> > wrote: > > Hi,

Re: Why do Solr nodes go into Recovery status

2017-06-06 Thread suresh pendap
But if the follower isn't "too far" behind it can be > > brought back into sync via "peer sync" where it gets the missed > > docs sent to it from the tlog of a healthy replica. "Too far" is 100 > > docs by default, but can be set in solrconfig.xml

Why do Solr nodes go into Recovery status

2017-06-05 Thread suresh pendap
Hi, Why and in what scenarios do Solr nodes go into recovery status? Given that Solr is a CP system it means that the writes for a Document index are acknowledged only after they are propagated and acknowledged by all the replicas of the Shard. This means that technically the replica nodes

Re: EXT: Re: Query regarding Solr Caches

2017-05-12 Thread Suresh Pendap
eisey" <apa...@elyograg.org> wrote: >On 5/11/2017 4:58 PM, Suresh Pendap wrote: >> This question might have been asked on the solr user mailing list >>earlier. Solr has four different types of Cache DocumentCache, >>QueryResultCache, FieldValueCache and FilterQue

regarding Solr Caches

2017-05-11 Thread suresh pendap
Hi, This question might have been asked on the solr user mailing list earlier. Solr has four different types of cache: DocumentCache, QueryResultCache, FieldValueCache and FilterQueryCache. Are these caches memory mapped or do they reside in the JVM heap? Which caches have the maximum impact on the

Query regarding Solr Caches

2017-05-11 Thread Suresh Pendap
Hi, This question might have been asked on the solr user mailing list earlier. Solr has four different types of cache: DocumentCache, QueryResultCache, FieldValueCache and FilterQueryCache. I would like to know which of these caches are off-heap caches. Which caches have the maximum impact on the

Re: EXT: Re: Solr Query Performance benchmarking

2017-05-05 Thread Suresh Pendap
Thanks everyone for taking the time to respond to my email. I think you are correct in that the query results might be coming from main memory as I only had around 7k queries. However, it is still not clear to me, given that everything was being served from main memory, why it is that I am not able to

Regarding rule based replica placement in Solr

2017-05-05 Thread Suresh Pendap
Hi, I read the documentation about this feature over here. I did not get a very clear understanding of how this feature works and how I can configure it in Solr. This is what I would like to achieve and am

Solr Query Performance benchmarking

2017-04-27 Thread Suresh Pendap
Hi, I am trying to perform Solr query performance benchmarking and trying to measure the maximum throughput and latency that I can get from a given Solr cluster. Following are my configurations: Number of Solr Nodes: 4 Number of shards: 2 replication-factor: 2 Index size: 55 GB Shard/Core

Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
to a larger value and see if that gives me some benefit. Thanks Suresh On 3/24/17 6:05 AM, "Shawn Heisey" <apa...@elyograg.org> wrote: >On 3/23/2017 6:10 PM, Suresh Pendap wrote: >> I performed the test with 1 thread, 10 client threads and 50 client >> threads

Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
> >>> On 24 March 2017 at 09:21, Matt Magnusson <magnuss...@gmail.com> wrote: >>> >>> > Out of curiosity, what is your index size? I'm trying to do something >>> > similar with maximizing output, I'm currently looking at streaming >>> > expre

Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
that about 10 threads seems to be an >> optimum number. >> >> On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap <spen...@walmartlabs.com> >> wrote: >> > Hi, >> > I am new to SOLR search engine technology and I am trying to get some >> performance

Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
I am using version 6.3 of Solr On 3/23/17 7:56 PM, "Aman Deep Singh" wrote: >system

unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
Hi, I am new to SOLR search engine technology and I am trying to get some performance numbers to get maximum throughput from the SOLR cluster of a given size. I am currently doing only query load testing in which I randomly fire a bunch of queries to the SOLR cluster to generate the query load.
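A minimal sketch of this kind of query load test, not the harness actually used in this thread: a fixed pool of client threads fires the queries while per-query latency and overall throughput are recorded. The names measure_throughput and run_query are assumptions; run_query stands in for whatever function sends one query to the cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, queries, num_threads):
    """Run all queries on num_threads workers; return (queries/sec, latencies)."""
    def timed(query):
        start = time.perf_counter()
        run_query(query)  # hypothetical callable that sends one query
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - wall_start
    return len(queries) / elapsed, latencies
```

Repeating the call with num_threads set to 1, 10 and 50 would give one throughput figure per client thread count to compare.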

backward compatibility of Solr 6.3 version with old SolrJ clients

2017-02-03 Thread Suresh Pendap
Hi, Will SolrJ client version 4.10.3 work with version 6.3 of the Solr server? I was trying to look up the documentation, but nowhere is a compatibility matrix between server and client provided. Has someone already used this combination? Regards Suresh