Re: org.apache.solr.common.SolrException: this IndexWriter is closed

2021-03-05 Thread Dominique Bejean
Hi,
Are you using RAMDirectoryFactory without enough RAM?
Regards
Dominique
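
If that turns out to be the case, a minimal solrconfig.xml sketch (assuming the
index can simply go back to the default on-disk factory) would be:

<!-- Hypothetical solrconfig.xml fragment: RAMDirectoryFactory keeps the whole
     index in heap memory, so running out of RAM can close the IndexWriter.
     The default NRTCachingDirectoryFactory keeps the index on disk instead. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>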

On Fri, Mar 5, 2021 at 4:18 PM, 李世明 wrote:

> Hello:
>
> Have you encountered the following exception? It causes the index to
> stop accepting writes, although queries still work.
> Version:8.7.0
>
> org.apache.solr.common.SolrException: this IndexWriter is closed
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:234)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
> at
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:500)
> at
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:335)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
> at
> org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
> at
> org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
> at
> org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: 

Re: Collection Creation across DC

2021-02-11 Thread Dominique Bejean
Hi,

Sorry, it is in French, but here is my suggestion in order to replace the
deprecated CDCR and achieve HA
https://www.eolya.fr/2020/11/16/solrcloud-disaster-recovery-alternative-a-cdcr/

In short, each shard has one PULL replica in the remote datacenter, and these
PULL replicas are excluded from searches by using the shards.preference query
parameter.
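
As an illustration only (the collection name is a placeholder, not from the
original thread), a query that prefers NRT/TLOG replicas, so that the remote
PULL replicas are normally left out of searches, could look like:

# Hypothetical example: prefer non-PULL replicas; the DR-site PULL replicas
# would then only be queried if no other replica is available.
curl "http://localhost:8983/solr/mycollection/select?q=*:*&shards.preference=replica.type:NRT,replica.type:TLOG"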

Regards

Dominique



On Wed, Feb 10, 2021 at 10:05 PM, Revas wrote:

> Hello,
>
> Can we create a collection across data centers (with shard replicas in
> different data centers)
> for HA ?
>
> Thanks
> Revas
>


Re: NRT - Indexing

2021-02-02 Thread Dominique Bejean
Hi,

The issue was buildOnCommit=true on a SuggestComponent.
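
For context, the relevant solrconfig.xml setting looks roughly like the sketch
below (component and field names are made up); with buildOnCommit=false the
suggester is rebuilt on demand, e.g. with suggest.build=true, instead of on
every soft commit:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- Rebuilding on every commit is expensive with frequent soft commits -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>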

Dominique

On Tue, Feb 2, 2021 at 12:54 AM, Shawn Heisey wrote:

> On 2/1/2021 12:08 AM, haris.k...@vnc.biz wrote:
> > Hope you're doing good. I am trying to configure NRT - Indexing in my
> > project. For this reason, I have configured *autoSoftCommit* to execute
> > every second and *autoCommit* to execute every 5 minutes. Everything
> > works as expected on the dev and test server. But on the production
> > server, there are more than 6 million documents indexed in Solr, so
> > whenever a new document is indexed it takes 2-3 minutes before appearing
> > in the search despite the setting I have described above. Since the
> > target is to develop a real-time system, this delay of 2-3 minutes is
> > not acceptable. How can I reduce this time window?
>
> Setting autoSoftCommit with a max time of 1000 (one second) does not
> mean you will see changes within one second.  It means that one second
> after indexing begins, Solr will start a soft commit operation.  That
> commit operation must fully complete and the new searcher must come
> online before changes are visible.  Those steps may take much longer
> than one second, which seems to be happening on your system.
>
> With the information available, I cannot tell you why your commits are
> taking so long.  One of the most common reasons for poor Solr
> performance is a lack of free memory on the system for caching purposes.
>
> Thanks,
> Shawn
>
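
For reference, the settings discussed above live in solrconfig.xml and, with
the values described in this thread, would look roughly like this (a sketch,
not the poster's actual config):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 5 minutes; does not open a new searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit requested after 1 second; visibility still waits for the
       commit to finish and the new searcher to come online -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>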


Re: NRT - Indexing

2021-02-01 Thread Dominique Bejean
Hi,

It is not the cause of your issue, but your Solr version is 8.6.0, and
solrconfig.xml includes
<luceneMatchVersion>7.5.0</luceneMatchVersion>

By "I am using a service that fetches data from the Postgres database and
indexes it to solr. The service runs with a delay of 5 seconds.". You man,
you are using DIH and launch a delta-import each 5 seconds ?

Solr logs may help.

Dominique



On Mon, Feb 1, 2021 at 1:00 PM,  wrote:

> Hello,
>
>
> I am attaching the solrconfig.xml along with this email, also I am
> attaching a text document that has JSON object regarding the system
> information I am using a service that fetches data from the Postgres
> database and indexes it to solr. The service runs with a delay of 5 seconds.
>
>
> Regards
>
>
> Mit freundlichen Grüssen / Kind regards
>
>
> Muhammad Haris Khan
>
>
> *VNC - Virtual Network Consult*
>
>
> *-- Solr Ingenieur --*
>
>
> - On 1 February, 2021 3:50 PM, Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
>
>
>
> Hi,
>
>
> What is your Solr version ?
>
> Can you share your solrconfig.xml ?
>
> How is your sharding ?
>
> Did you grep your Solr logs with the "commit" pattern in order to see
>
> hard and soft commit occurrences ?
>
> How are you pushing new docs or updates in the collection ?
>
>
> Regards.
>
>
> Dominique
>
>
>
>
>
> On Mon, Feb 1, 2021 at 8:08 AM,  wrote:
>
>
> > Hello,
>
> >
>
> > Hope you're doing good. I am trying to configure NRT - Indexing in my
>
> > project. For this reason, I have configured *autoSoftCommit* to execute
>
> > every second and *autoCommit* to execute every 5 minutes. Everything
>
> > works as expected on the dev and test server. But on the production
> server,
>
> > there are more than 6 million documents indexed in Solr, so whenever a
> new
>
> > document is indexed it takes 2-3 minutes before appearing in the search
>
> > despite the setting I have described above. Since the target is to
> develop
>
> > a real-time system, this delay of 2-3 minutes is not acceptable. How can
> I
>
> > reduce this time window?
>
> >
>
> > Plus any advice on better scaling the Solr considering more than 6
> million
>
> > records would be very helpful. Thank you in advance.
>
> >
>
> >
>
> >
>
> > Mit freundlichen Grüssen / Kind regards
>
> >
>
> > Muhammad Haris Khan
>
> >
>
> > *VNC - Virtual Network Consult*
>
> >
>
> > *-- Solr Ingenieur --*
>
> >
>


Re: Tweaking Shards and Replicas for high volume queries and updates

2021-02-01 Thread Dominique Bejean
Hi,

Some suggestions.

* 64GB JVM Heap
Are you sure you really need this heap size? Did you check your GC logs
(with gceasy.io)?
A best practice is to keep the heap size as small as possible, and never more
than 31 GB.

* OS Caching
Did you set swappiness to 1 ?

* Put two instances of Solr on each node
You need to check resource usage in order to evaluate whether it would be
worthwhile (CPU usage, CPU load average, CPU iowait, heap usage, disk I/O
read and write, MMAP caching, ...).
A high load average with low CPU usage suggests that disk I/O may be the
bottleneck. I would consider increasing the number of physical servers with
less CPU, RAM and disk space on each (but globally the same total amount
of CPU, RAM and disk space). This will increase the disk I/O capacity.

* Collection 4 is the trouble collection
Try to have smaller cores (more shards if you increase the number of Solr
instances).
Investigate time routed or category routed aliases, if they can match your
update strategy and/or your query profiles.
Work again on the schema:
- For docValues=true fields, check if you really need indexed=true and
stored=true (there are a lot of considerations to take into account; see the
sketch below), ...
- Over-indexing with copyField ?
Work on queries : facets, group, collapse, fl=, rows=, ...
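
As an illustration of the docValues point (field and type names below are
invented), a field used only for faceting or sorting can often be declared
like this; whether indexed/stored can really be dropped depends on how the
field is queried and returned:

<!-- Hypothetical schema.xml fragment: a facet/sort-only field -->
<field name="brand_facet" type="string"
       docValues="true" indexed="false" stored="false" multiValued="false"/>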

Regards

Dominique


On Wed, Jan 27, 2021 at 2:53 PM, Hollowell, Skip wrote:

> 30 Dedicated physical Nodes in the Solr Cloud Cluster, all of identical
> configuration
> Server01   RHEL 7.x
> 256GB RAM
> 10 2TB Spinning Disk in a RAID 10 Configuration (Leaving us 9.8TB usable
> per node)
> 64GB JVM Heap. Tried as high as 100GB, but it appeared that 64GB was
> faster.  If we set a higher heap, do we starve the OS for caching?
> Huge Pages is off on the system, and thus UseLargePages is off on Solr
> Startup
> G1GC, Java 11  (ZGC with Java 15 and HugePages turned on was a disaster.
> We suspect it was due to the Huge Pages configuration)
> At one time we discussed putting two instances of Solr on each node,
> giving us a cloud of 60 instances instead of 30.  Load Average is high on
> these nodes during certain types of queries or updates, but CPU Load is
> relatively low and should be able to accommodate a second instance, but all
> the data would still be on the same RAID10 group of disks.
> Collection 4 is the trouble collection.  It has nearly a billion
> documents, and there are between 200 and 400 million updates every day.
> How do we get that kind of update performance, and still serve 10 million
> queries a day?  Schemas have been reviewed and re-reviewed to ensure we are
> only indexing and storing what is absolutely necessary.  What are we
> missing?  Do we need to revisit our replica policy?  Number of replicas or
> types of replicas (to ensure some are only used for reading, etc?)
> [Grabbed from the Admin UI]
> 755.6Gb Index Size according to Solr Cloud UI
> Total #docs: 371.8mn
> Avg size/doc: 2.1Kb
> 90 Shards, 2 NRT Replicas per Shard, 1,750,612,476 documents, avg
> size/doc: 1.7Kb, uses nested documents
> collection-1_s69r317   31.1Gb
> collection-1_s49r96 30.7Gb
> collection-1_s78r154   30.2Gb
> collection-1_s40r259   30.1Gb
> collection-1_s9r197 29.1Gb
> collection-1_s18r34 28.9Gb
> 120 Shards, 2 TLOG Replicas per Shard, 2,230,207,046 documents, avg
> size/doc: 1.3Kb
> collection-2_s78r154   22.8Gb
> collection-2_s49r96 22.8Gb
> collection-2_s46r331   22.8Gb
> collection-2_s18r34 22.7Gb
> collection-2_s109r216   22.7Gb
> collection-2_s104r447   22.7Gb
> collection-2_s15r269   22.7Gb
> collection-2_s73r385   22.7Gb
> 120 Shards, 2 TLOG Replicas per Shard, 733,588,503 documents, avg
> size/doc: 1.9Kb
> collection-3_s19r277   10.6Gb
> collection-3_s108r214   10.6Gb
> collection-3_s48r94 10.6Gb
> collection-3_s109r457   10.6Gb
> collection-3_s47r333   10.5Gb
> collection-3_s78r154   10.5Gb
> collection-3_s18r34 10.5Gb
> collection-3_s77r393   10.5Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 864,372,654 documents, avg
> size/doc: 5.6Kb
> collection-4_s109r216   38.7Gb
> collection-4_s100r439   38.7Gb
> collection-4_s49r96 38.7Gb
> collection-4_s35r309   38.6Gb
> collection-4_s18r34 38.6Gb
> collection-4_s78r154   38.6Gb
> collection-4_s7r253 38.6Gb
> collection-4_s69r377   38.6Gb
>


Re: NRT - Indexing

2021-02-01 Thread Dominique Bejean
Hi,

What is your Solr version ?
Can you share your solrconfig.xml ?
How is your sharding ?
Did you grep your Solr logs with the "commit" pattern in order to see
hard and soft commit occurrences (see the example after these questions)?
How are you pushing new docs or updates in the collection ?
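
For example (the log path is an assumption, adjust it to the actual install):

# Rough sketch: list recent hard/soft commit activity in the Solr log
grep "start commit" /var/solr/logs/solr.log | tail -n 50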

Regards.

Dominique




On Mon, Feb 1, 2021 at 8:08 AM,  wrote:

> Hello,
>
> Hope you're doing good. I am trying to configure NRT - Indexing in my
> project. For this reason, I have configured *autoSoftCommit* to execute
> every second and *autoCommit* to execute every 5 minutes. Everything
> works as expected on the dev and test server. But on the production server,
> there are more than 6 million documents indexed in Solr, so whenever a new
> document is indexed it takes 2-3 minutes before appearing in the search
> despite the setting I have described above. Since the target is to develop
> a real-time system, this delay of 2-3 minutes is not acceptable. How can I
> reduce this time window?
>
> Plus any advice on better scaling the Solr considering more than 6 million
> records would be very helpful. Thank you in advance.
>
>
>
> Mit freundlichen Grüssen / Kind regards
>
> Muhammad Haris Khan
>
> *VNC - Virtual Network Consult*
>
> *-- Solr Ingenieur --*
>


Re: Solrcloud load balancing / failover

2020-12-26 Thread Dominique Bejean
Hi,
Thank you for your response.
Dominique

On Tue, Dec 15, 2020 at 8:06 AM, Shalin Shekhar Mangar wrote:

> No, the load balancing is based on random selection of replicas and
> CPU is not consulted. There are limited ways to influence the replica
> selection, see
> https://lucene.apache.org/solr/guide/8_4/distributed-requests.html#shards-preference-parameter
>
> If a replica fails then the query fails and an error is returned. I
> think (but I am not sure) that SolrJ retries the request on some
> specific errors in which case a different replica may be selected and
> the request may succeed.
>
> IMO, these are two weak areas of Solr right now. Suggestions/patches
> are welcome :-)
>
> On 12/11/20, Dominique Bejean  wrote:
> > Hi,
> >
> > Is there any load balancing in SolrCloud based on the CPU load of the Solr
> nodes ?
> >
> > If a replica of a shard fails to handle a query, is the query sent to
> > another replica in order to be completed ?
> >
> > Regards
> >
> > Dominique
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Solrcloud load balancing / failover

2020-12-11 Thread Dominique Bejean
Hi,

Is there any load balancing in SolrCloud based on the CPU load of the Solr nodes ?

If a replica of a shard fails to handle a query, is the query sent to
another replica in order to be completed ?

Regards

Dominique


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Shawn,

According to the log4j bug description (
https://bz.apache.org/bugzilla/show_bug.cgi?id=57714), the issue is related
to a lock taken during the appenders collection process.

In addition to the CONSOLE and file appenders in the default log4j.properties,
my customer added 2 extra FileAppenders dedicated to all requests and slow
requests. I suggested removing these two extra appenders.

Regards

Dominique



On Mon, Oct 19, 2020 at 3:48 PM, Dominique Bejean wrote:

> Hi Shawn,
>
> Thank you for your response.
>
> You are confirming my diagnosis.
>
> This is in fact an 8-node cluster with one single collection with 4 shards
> and 1 replica (8 cores).
>
> 4 GB heap and 90 GB RAM
>
>
> When no issue occurs nearly 50% of the heap is used.
>
> Num Docs in collection : 10.000.000
>
> Num Docs per core is more or less 2.500.000
>
> Max Doc per core is more or less 3.000.000
>
> Core Data size is more or less 70 Gb
>
> Here are the JVM settings
>
> -DSTOP.KEY=solrrocks
>
> -DSTOP.PORT=7983
>
> -Dcom.sun.management.jmxremote
>
> -Dcom.sun.management.jmxremote.authenticate=false
>
> -Dcom.sun.management.jmxremote.local.only=false
>
> -Dcom.sun.management.jmxremote.port=18983
>
> -Dcom.sun.management.jmxremote.rmi.port=18983
>
> -Dcom.sun.management.jmxremote.ssl=false
>
> -Dhost=
>
> -Djava.rmi.server.hostname=XXX
>
> -Djetty.home=/x/server
>
> -Djetty.port=8983
>
> -Dlog4j.configuration=file:/xx/log4j.properties
>
> -Dsolr.install.dir=/xx/solr
>
> -Dsolr.jetty.request.header.size=32768
>
> -Dsolr.log.dir=/xxx/Logs
>
> -Dsolr.log.muteconsole
>
> -Dsolr.solr.home=//data
>
> -Duser.timezone=Europe/Paris
>
> -DzkClientTimeout=3
>
> -DzkHost=xxx
>
> -XX:+CMSParallelRemarkEnabled
>
> -XX:+CMSScavengeBeforeRemark
>
> -XX:+ParallelRefProcEnabled
>
> -XX:+PrintGCApplicationStoppedTime
>
> -XX:+PrintGCDateStamps
>
> -XX:+PrintGCDetails
>
> -XX:+PrintGCTimeStamps
>
> -XX:+PrintHeapAtGC
>
> -XX:+PrintTenuringDistribution
>
> -XX:+UseCMSInitiatingOccupancyOnly
>
> -XX:+UseConcMarkSweepGC
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseParNewGC
>
> -XX:-OmitStackTraceInFastThrow
>
> -XX:CMSInitiatingOccupancyFraction=50
>
> -XX:CMSMaxAbortablePrecleanTime=6000
>
> -XX:ConcGCThreads=4
>
> -XX:GCLogFileSize=20M
>
> -XX:MaxTenuringThreshold=8
>
> -XX:NewRatio=3
>
> -XX:NumberOfGCLogFiles=9
>
> -XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh
>
> 8983
>
> /xx/Logs
>
> -XX:ParallelGCThreads=4
>
> -XX:PretenureSizeThreshold=64m
>
> -XX:SurvivorRatio=4
>
> -XX:TargetSurvivorRatio=90
>
> -Xloggc:/xx/solr_gc.log
>
> -Xloggc:/xx/solr_gc.log
>
> -Xms4g
>
> -Xmx4g
>
> -Xss256k
>
> -verbose:gc
>
>
>
> Here is one screenshot of top command for the node that failed last week.
>
> [image: 2020-10-19 15_48_06-Photos.png]
>
> Regards
>
> Dominique
>
>
>
> On Sun, Oct 18, 2020 at 10:03 PM, Shawn Heisey wrote:
>
>> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
>> > A few months ago, I reported an issue with Solr nodes crashing due to
>> the
>> > old generation heap growing suddenly and generating OOM. This problem
>> > occurred again this week. I have threads dumps for each minute during
>> the 3
>> > minutes the problem occured. I am using fastthread.io in order to
>> analyse
>> > these dumps.
>>
>> 
>>
>> > * The Log4j issue starts (
>> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>>
>> If the log4j bug is the root cause here, then the only way you can fix
>> this is to upgrade to at least Solr 7.4.  That is the Solr version where
>> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
>> in Solr 6.6.2 without changing Solr code.  The code changes required
>> were extensive.  Note that I did not do anything to confirm whether the
>> log4j bug is responsible here.  You seem pretty confident that this is
>> the case.
>>
>> Note that if you upgrade to 8.x, you will need to reindex from scratch.
>> Upgrading an existing index is possible with one major version bump, but
>> if your index has ever been touched by a release that's two major
>> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
>> even try to read an old index touched by 6.x or earlier.
>>
>> In the following wiki page, I provided instructions for getting a
>> screenshot of the process listing.
>>
>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>>
>> In addition to that screenshot, I would like to know the on-disk size of
>> all the cores running on the problem node, along with a document count
>> from those cores.  It might be possible to work around the OOM just by
>> increasing the size of the heap.  That won't do anything about problems
>> with log4j.
>>
>> Thanks,
>> Shawn
>>
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

You are confirming my diagnosis.

This is in fact an 8-node cluster with one single collection with 4 shards
and 1 replica (8 cores).

4 GB heap and 90 GB RAM


When no issue occurs nearly 50% of the heap is used.

Num Docs in collection : 10.000.000

Num Docs per core is more or less 2.500.000

Max Doc per core is more or less 3.000.000

Core Data size is more or less 70 Gb

Here are the JVM settings

-DSTOP.KEY=solrrocks

-DSTOP.PORT=7983

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.local.only=false

-Dcom.sun.management.jmxremote.port=18983

-Dcom.sun.management.jmxremote.rmi.port=18983

-Dcom.sun.management.jmxremote.ssl=false

-Dhost=

-Djava.rmi.server.hostname=XXX

-Djetty.home=/x/server

-Djetty.port=8983

-Dlog4j.configuration=file:/xx/log4j.properties

-Dsolr.install.dir=/xx/solr

-Dsolr.jetty.request.header.size=32768

-Dsolr.log.dir=/xxx/Logs

-Dsolr.log.muteconsole

-Dsolr.solr.home=//data

-Duser.timezone=Europe/Paris

-DzkClientTimeout=3

-DzkHost=xxx

-XX:+CMSParallelRemarkEnabled

-XX:+CMSScavengeBeforeRemark

-XX:+ParallelRefProcEnabled

-XX:+PrintGCApplicationStoppedTime

-XX:+PrintGCDateStamps

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-XX:+PrintHeapAtGC

-XX:+PrintTenuringDistribution

-XX:+UseCMSInitiatingOccupancyOnly

-XX:+UseConcMarkSweepGC

-XX:+UseGCLogFileRotation

-XX:+UseGCLogFileRotation

-XX:+UseParNewGC

-XX:-OmitStackTraceInFastThrow

-XX:CMSInitiatingOccupancyFraction=50

-XX:CMSMaxAbortablePrecleanTime=6000

-XX:ConcGCThreads=4

-XX:GCLogFileSize=20M

-XX:MaxTenuringThreshold=8

-XX:NewRatio=3

-XX:NumberOfGCLogFiles=9

-XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh

8983

/xx/Logs

-XX:ParallelGCThreads=4

-XX:PretenureSizeThreshold=64m

-XX:SurvivorRatio=4

-XX:TargetSurvivorRatio=90

-Xloggc:/xx/solr_gc.log

-Xloggc:/xx/solr_gc.log

-Xms4g

-Xmx4g

-Xss256k

-verbose:gc



Here is one screenshot of top command for the node that failed last week.

[image: 2020-10-19 15_48_06-Photos.png]

Regards

Dominique



On Sun, Oct 18, 2020 at 10:03 PM, Shawn Heisey wrote:

> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
> > A few months ago, I reported an issue with Solr nodes crashing due to the
> > old generation heap growing suddenly and generating OOM. This problem
> > occurred again this week. I have threads dumps for each minute during
> the 3
> > minutes the problem occured. I am using fastthread.io in order to
> analyse
> > these dumps.
>
> 
>
> > * The Log4j issue starts (
> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>
> If the log4j bug is the root cause here, then the only way you can fix
> this is to upgrade to at least Solr 7.4.  That is the Solr version where
> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
> in Solr 6.6.2 without changing Solr code.  The code changes required
> were extensive.  Note that I did not do anything to confirm whether the
> log4j bug is responsible here.  You seem pretty confident that this is
> the case.
>
> Note that if you upgrade to 8.x, you will need to reindex from scratch.
> Upgrading an existing index is possible with one major version bump, but
> if your index has ever been touched by a release that's two major
> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
> even try to read an old index touched by 6.x or earlier.
>
> In the following wiki page, I provided instructions for getting a
> screenshot of the process listing.
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>
> In addition to that screenshot, I would like to know the on-disk size of
> all the cores running on the problem node, along with a document count
> from those cores.  It might be possible to work around the OOM just by
> increasing the size of the heap.  That won't do anything about problems
> with log4j.
>
> Thanks,
> Shawn
>


SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-18 Thread Dominique Bejean
Hi,

A few months ago, I reported an issue with Solr nodes crashing due to the
old generation heap growing suddenly and generating OOM. This problem
occurred again this week. I have thread dumps for each minute during the 3
minutes the problem occurred. I am using fastthread.io in order to analyse
these dumps.

The thread scenario on the failing node is:

=== 15h54 -> it looks fine
Old gen heap: 0.5 GB (3 GB max)
67 threads TIMED_WAITING
26 threads RUNNABLE
7 threads WAITING

=== 15h55 -> fastthread reports a few suspects
Old gen heap starts growing: from 0.5 GB to 2 GB (3 GB max)
42 threads TIMED_WAITING
41 threads RUNNABLE
10 threads WAITING

The first symptom is that 8 runnable threads are stuck (same stack trace),
waiting for responses from some other nodes:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.SocketInputBuffer.isDataAvailable(SocketInputBuffer.java:95)
at
org.apache.http.impl.AbstractHttpClientConnection.isStale(AbstractHttpClientConnection.java:310)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.isStale(ManagedClientConnectionImpl.java:158)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:433)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:447)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:388)
at
org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:302)
at
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:166)
at
org.apache.solr.handler.component.HttpShardHandler$$Lambda$192/1788637481.call(Unknown
Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$15/986729174.run(Unknown
Source)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


=== 15h56 -> fastthread reports an issue
Old gen heap is full: 3 GB (3 GB max)
57 threads TIMED_WAITING
126 threads RUNNABLE
18 threads WAITING
14 threads BLOCKED

7 runnable threads are still stuck (same stack trace), waiting for responses
from some other nodes.

1 BLOCKED thread obtained org.apache.log4j.Logger's lock and did not release
it; because of that, 13 threads are BLOCKED (same stack trace) on
org.apache.log4j.Category.callAppenders:

java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x0007005a6f08> (a org.apache.log4j.Logger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2482)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at

Re: Returning fields a specific order

2020-09-29 Thread Dominique Bejean
Hi,

If the data is in JSON format, you can use jq -S:
https://stackoverflow.com/a/38210345/5998915
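
For instance (file names are placeholders), sorting the keys first makes the
diff independent of field order:

# Sort JSON keys in both responses, then compare
jq -S . response_from_instance1.json > a.json
jq -S . response_from_instance2.json > b.json
diff a.json b.json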

Regards

Dominique


On Mon, Sep 28, 2020 at 6:30 PM, gnandre wrote:

> Hi,
>
> I have a use-case where I want to compare stored fields values of Solr
> documents from two different Solr instances. I can use a diff tool to
> compare them but only if they returned the fields in specific order in the
> response. I tried setting fl param with all the fields specified in
> particular order. However, the results that are returned do not follow
> specific order given in fl param. Is there any way to achieve this behavior
> in Solr?
>


Re: Delete from Solr console fails

2020-09-25 Thread Dominique Bejean
Hi Goutham,

I agree with Rahul: avoid large delete-by-query requests.
If you can, prefer running one query to get all the ids first, then use those
ids with delete-by-id.
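
A rough sketch of that approach (collection, field and ids below are
placeholders):

# 1) Fetch only the ids of the documents matching the criteria
curl "http://localhost:8983/solr/mycollection/select?q=field:value&fl=id&rows=10000&wt=json"

# 2) Delete them by id, in reasonably sized batches
curl -X POST -H 'Content-Type: application/json' \
  "http://localhost:8983/solr/mycollection/update?commit=true" \
  -d '{"delete": ["id1", "id2", "id3"]}'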

Regards

Dominique


On Fri, Sep 25, 2020 at 6:50 AM, Goutham Tholpadi wrote:

> I spoke too soon. I am getting the "Connection lost" error again.
>
> I have never faced this problem when there are a small number of docs in
> the index. I was wondering if the size of the index (30M docs) has anything
> to do with this.
>
> Thanks
> Goutham
>
> On Fri, Sep 25, 2020 at 9:55 AM Goutham Tholpadi 
> wrote:
>
> > Thanks for your response Rahul!
> >
> > Yes, all the fields I tried with were indexed=true, but it did not work.
> >
> > Btw, when I try to today, I am no longer getting the "Connection lost"
> > error. The delete command returns with status=success, however the
> document
> > is not actually deleted when I check in the search console again.
> >
> > I tried using Document Type as XML just now and I see the same behaviour
> > as above.
> >
> > Thanks
> > Goutham
> >
> > On Fri, Sep 25, 2020 at 7:17 AM Rahul Goswami 
> > wrote:
> >
> >> Goutham,
> >> Is the field you are trying to delete by indexed=true in the schema ?
> >> If the uniqueKey is indexed=true, does delete by id work for you?
> >> ( uniqueKey:value)
> >> Also, instead of  "Solr Command" if you choose the Document type as
> "XML"
> >> does it make any difference?
> >>
> >> Rahul
> >>
> >> On Thu, Sep 24, 2020 at 1:04 PM Goutham Tholpadi 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Setup:
> >> > We have a stand-alone Solr (v7.2) with around 30 million documents and
> >> with
> >> > 4 cores, 38G of RAM, and a 1TB disk. The documents were not directly
> >> > indexed but came from a restore of a back from another Solr instance.
> >> >
> >> > Problem:
> >> > Search queries seem to be working fine. However, when I try to delete
> >> > documents from the Solr console, I get a "Connection to Solr lost"
> >> error. I
> >> > am trying by navigating to the "Documents" section of the chosen core,
> >> > using "Solr Command" as the "Document Type", and entering something
> >> this in
> >> > the box below:
> >> > 
> >> > 
> >> > field:value
> >> > 
> >> > 
> >> >
> >> > I tried with the field being the unique key, and otherwise. I also
> tried
> >> > with values containing wild cards. I got the error in all cases.
> >> >
> >> > Any pointers on this?
> >> >
> >> > Thanks
> >> > Goutham
> >> >
> >>
> >
>


Re: solr performance with >1 NUMAs

2020-09-25 Thread Dominique Bejean
Hi,

This would be a Java VM option, not something Solr itself can know about.
Take a look at the comments on this article; maybe it will help.
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?showComment=1347033706559#c229885263664926125
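
One JVM-level option that is sometimes tested on multi-NUMA hosts (just an
example of the kind of VM flag meant here, not a guaranteed fix, and its
effect depends on the garbage collector in use) is the NUMA-aware allocator,
e.g. in solr.in.sh:

# Hypothetical solr.in.sh addition: ask the JVM to allocate the heap NUMA-locally
SOLR_OPTS="$SOLR_OPTS -XX:+UseNUMA"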

Regards

Dominique



On Thu, Sep 24, 2020 at 3:42 AM, Wei wrote:

> Hi,
>
> Recently we deployed solr 8.4.1 on a batch of new servers with 2 NUMAs. I
> noticed that query latency almost doubled compared to deployment on single
> NUMA machines. Not sure what's causing the huge difference. Is there any
> tuning to boost the performance on multiple NUMA machines? Any pointer is
> appreciated.
>
> Best,
> Wei
>


Re: Autoscaling Rule for replica distribution across zones

2020-09-21 Thread Dominique Bejean
Hi,

I also tried these 2 rules and I still have all replicas of all shards of
the collection created in one single zone:

curl 'http://localhost:8983/api/cluster/autoscaling' -H
'Content-type:application/json' -d '{ "set-policy": { "policyzone": [
{"replica": "#EQUAL", "shard": "#EACH", "nodeset":[{"sysprop.zone":
"dc1"},{"sysprop.zone":  "dc2"}]} ] } }'

curl 'http://localhost:8983/api/cluster/autoscaling' -H
'Content-type:application/json' -d '{ "set-policy": { "policyzone":
[{"replica": "50%", "shard": "#EACH", "nodeset":{ "sysprop.zone": "dc1"}},

{"replica": "50%", "shard": "#EACH", "nodeset":{"sysprop.zone": "dc2"}}] }
}'

Dominique


On Fri, Sep 18, 2020 at 12:13 PM, Dominique Bejean wrote:

> Hi,
>
> I have a 4-node SolrCloud cluster. 2 nodes (solr1 and solr3) are started
> with the parameter -Dzone=dc1 and the 2 other nodes (solr2 and solr4)
> are started with the parameter -Dzone=dc2
>
> I want to create Autoscaling placement Rule in order to equally distribute
> replicas of a shard over zone (never 2 replicas of a shard in the same
> zone). According documentation, I created this rule
>
> { "set-policy": { "policyzone": [ {"replica": "#EQUAL", "shard": "#EACH",
> "sysprop.zone": ["dc1", "dc2"]} ] } }
>
> I create a collection with 2 shards and 2 replicas, and the 4 cores are
> created on solr2 and solr4 nodes so only in zone=dc2
>
> What is wrong in my rule ?
>
> Regards.
>
> Dominique Béjean
>
>
>
>
>
>
>
>


Autoscaling Rule for replica distribution across zones

2020-09-18 Thread Dominique Bejean
Hi,

I have a 4-node SolrCloud cluster. 2 nodes (solr1 and solr3) are started
with the parameter -Dzone=dc1 and the 2 other nodes (solr2 and solr4)
are started with the parameter -Dzone=dc2

I want to create an Autoscaling placement rule in order to equally distribute
the replicas of a shard across zones (never 2 replicas of a shard in the same
zone). According to the documentation, I created this rule:

{ "set-policy": { "policyzone": [ {"replica": "#EQUAL", "shard": "#EACH",
"sysprop.zone": ["dc1", "dc2"]} ] } }

I created a collection with 2 shards and 2 replicas, and the 4 cores were
created on the solr2 and solr4 nodes, so only in zone=dc2.

What is wrong in my rule ?

Regards.

Dominique Béjean


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-15 Thread Dominique Bejean
Hi,

1. Yes, your analysis is correct

2. Yes, it can occurs too with very slow query.

Regards

Dominique

On Tue, Sep 15, 2020 at 3:14 PM, Mark Robinson wrote:

> Hi,
>
> When in a sample query I used "timeAllowed" as low as 10mS, I got value for
>
> "numFound" as say 2000, but no docs were returned. But when I increased the
>
> value for timeAllowed to be in seconds, never got this scenario.
>
>
>
> I have 2 qns:-
>
> 1. Why does numFound have a value like say 2000 or even 6000 but no
>
> documents actually returned. During document collection is calculation of
>
> numFound done first and doc collection later?. Is doc list empty because,by
>
> the time doc collection started the timeAllowed cut off took effect?
>
>
>
> 2. If I give timeAllowed a value say, 10s or above do you think the above
>
> scenario of valid count displayed in numFound, but doc list empty can ever
>
> occur still, as there is more time before cut-off to retrieve at least one
>
> doc ?
>
>
>
> Thanks!
>
> Mark
>
>


Re: Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Dominique Bejean
Hi,

Can you try to remove the RemoveDuplicatesTokenFilter ?

Dominique

On Tue, Sep 8, 2020 at 1:52 PM, Manish Bafna wrote:

> Hi,
>
> We are using the following configuration:
>
>
>
> --
>
> *Schema: *
>
> 
> positionIncrementGap="100"  autoGeneratePhraseQueries="true"
>
> omitNorms="true">
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
> 
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
>
> 
>
> 
>
> 
>
> *Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz
> transport"
>
> -
>
> *Query*: bike
>
> *parser Type:* edismax
>
> -
>
> *Parsed query (from debug)* : +DisjunctionMaxQuery((field1:"abc
>
> implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
>
> -
>
>
>
> If you notice, there are 2 multi-word keywords starting with xyz, but only
>
> 1 of them is getting added to the query. If we change xyz transport to xy
>
> transport, then it works properly. The issue is only when the 2 multi-word
>
> keywords start with the same word. Though we are using graph synonyms, it
>
> is not working properly.
>
>
>
> Are we doing anything wrong here?
>
>
>
> Thanks,
>
> Manish.
>
>


Re: schema.xml version attribute

2020-09-06 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

I can see the default value set to 1.0
version = schemaConf.getFloat(expression, 1.0f)

I can't see where a value outside the limits is raised to the minimum (1.0) or
lowered to the maximum (1.6).

Regards.

Dominique


On Sun, Sep 6, 2020 at 12:25 AM, Shawn Heisey wrote:

> On 9/5/2020 3:30 AM, Dominique Bejean wrote:
> > Hi, I often see a bad usage of the version attribute in shema.xml. For
> > instance  The version attribute is to
> > specify the schema syntax and semantics version to be used by Solr.
> > The current value is 1.6 It is clearly specified in schema.xml
> > comments "It should not normally be changed by applications". However,
> > what happens if this attribute is not correctly set ? I tried to find
> > the answer in the code but without success. If the value is not 1.0,
> > 1.1, ... or 1.6, does Solr default it to the last correct value so 1.6 ?
>
> I've checked the code.
>
> If the version is not specified in the schema, then it defaults to 1.0.
> The code that handles this can be found in IndexSchema.java.
>
> Currently the minimum value is 1.0 and the maximum value is 1.6. If the
> actual configured version is outside of these limits, then the effective
> value is raised to the minimum or lowered to the maximum.
>
> Thanks,
> Shawn
>
>


schema.xml version attribute

2020-09-05 Thread Dominique Bejean
Hi,

I often see a bad usage of the version attribute in schema.xml. For instance



The version attribute specifies the schema syntax and semantics version
to be used by Solr. The current value is 1.6.

It is clearly specified in schema.xml comments "It should not normally be
changed by applications".

However, what happens if this attribute is not correctly set? I tried to
find the answer in the code but without success. If the value is not 1.0,
1.1, ... or 1.6, does Solr default it to the latest correct value, i.e. 1.6?

Regards

Dominique


Re: Understanding Solr heap %

2020-09-01 Thread Dominique Bejean
Hi,

As in all Java applications, the heap memory is regularly cleaned by the
garbage collector (some young objects are moved to the old generation heap zone
and unused old objects are removed from the old generation heap zone). This
causes heap usage to continuously grow and shrink.

Regards

Dominique




On Tue, Sep 1, 2020 at 1:50 PM, yaswanth kumar wrote:

> Can someone help me understand how the % value in the Heap column is
> calculated.
>
> I created a new SolrCloud cluster with 3 Solr nodes and one ZooKeeper; it is
> not yet live in terms of indexing or searching, but I do see some
> spikes in the HEAP column against nodes when I refresh the page multiple
> times. It goes almost up to 95% (sometimes) and then comes down to 50%.
>
> Solr version: 8.2
> Zookeeper: 3.4
>
> JVM size configured in solr.in.sh is min of 1GB to max of 10GB (actually
> RAM size on the node is 16GB)
>
> Basically, I need to understand whether I need to worry about this heap %,
> which was fluctuating quite a bit, before making it live, or whether that is
> normal. This UI component in SolrCloud is new to us, as we used to have
> Solr 5 before and this UI component didn't exist then.
>
> --
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com
>
> Sent from my iPhone


Re: Rule-Based permissions for cores

2020-08-31 Thread Dominique Bejean
Hi,

It looks like this issue I opened a long time ago.
https://issues.apache.org/jira/browse/SOLR-13097

Regards

Dominique


On Mon, Aug 31, 2020 at 11:02 PM, Thomas Corthals wrote:

> Hi,
>
> I'm trying to configure the Rule-Based Authorization Plugin in Solr 8.4.0
> in standalone mode. My goal is to limit a user's access to one or more
> designated cores. My security.json looks like this:
>
> {
>   "authentication":{
> "blockUnknown":true,
> "class":"solr.BasicAuthPlugin",
> "credentials":{
>   "solr":"...",
>   "user1":"...",
>   "user2":"..."},
> "realm":"Solr",
> "forwardCredentials":false,
> "":{"v":0}},
>   "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[
>   {
> "name":"security-edit",
> "role":"admin",
> "index":1},
>   {
> "name":"read",
> "collection":"core1",
> "role":"role1",
> "index":2},
>   {
> "name":"read",
> "collection":"core2",
> "role":"role2",
> "index":3},
>   {
> "name":"all",
> "role":"admin",
> "index":4}],
> "user-role":{
>   "solr":"admin",
>   "user1":"role1",
>   "user2":"role2"},
> "":{"v":0}}}
>
> With this setup, I'm unable to read from any of the cores with either user.
> If I "delete-permission":4 both users can read from either core, not just
> "their" core.
>
> I have tried custom permissions like this to no avail:
> {"name": "access-core1", "collection": "core1", "role": "role1"},
> {"name": "access-core2", "collection": "core2", "role": "role2"},
> {"name": "all", "role": "admin"}
>
> Is it possible to do this for cores? Or am I out of luck because I'm not
> using collections?
>
> Regards
>
> Thomas
>


Re: How to Prevent Recovery?

2020-08-31 Thread Dominique Bejean
Hi,

Even if it is not the root cause, I suggest trying to respect some basic
best practices, and so not having "2 ZK running on the
same nodes where Solr is running". Maybe you can achieve this by just
stopping these 2 ZK (and moving them later). Did you increase
ZK_CLIENT_TIMEOUT to 3 ?

Did you check your GC logs ? Any consecutive full GC ? How big is your Solr
heap size ? Not too big ?

The last time I saw such long commits, it was due to slow segment merges
related to docValues and dynamic fields. Are you intensively using dynamic
fields with docValues ?

Can you enable detailed Lucene debug information
(<infoStream>true</infoStream> in the indexConfig section) ?
https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings
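
A minimal sketch of that setting inside the indexConfig section of solrconfig.xml:

<indexConfig>
  <!-- Writes detailed IndexWriter / merge debugging information to the Solr log -->
  <infoStream>true</infoStream>
</indexConfig>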

With these Lucene debug information, are there any lines like this in your
logs ?

2020-05-03 16:22:38.139 INFO  (qtp1837543557-787) [   x:###]
o.a.s.u.LoggingInfoStream [MS][qtp1837543557-787]: too many merges;
stalling...
2020-05-03 16:24:58.318 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2020-05-03 16:24:59.005 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many
merges; stalling...
2020-05-03 16:31:31.402 INFO  (Lucene Merge Thread #55) [   x:###]
o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to
merge doc values [464265 docs]


Regards

Dominique





On Sun, Aug 30, 2020 at 8:44 PM, Anshuman Singh wrote:

> Hi,
>
> I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas
> using the ADDREPLICA API and then deleting the NRT replicas.
> But now, these replicas are going into recovery even more frequently during
> indexing. Same errors are observed.
> Also, commit is taking a lot of time compared to NRT replicas.
> Can this be due to the fact that most of the indexes are on disk and not in
> RAM, and therefore copying index from leader is causing high disk
> utilisation and causing poor performance?
> Do I need to tweak the auto commit settings? Right now it is 30 seconds max
> time and 100k max docs.
>
> Regards,
> Anshuman
>
> On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson 
> wrote:
>
> > Commits should absolutely not be taking that much time, that’s where I’d
> > focus first.
> >
> > Some sneaky places things go wonky:
> > 1> you have  suggester configured that builds whenever there’s a commit.
> > 2> you send commits from the client
> > 3> you’re optimizing on commit
> > 4> you have too much data for your hardware
> >
> > My guess though is that the root cause of your recovery is that the
> > followers
> > get backed up. If there are enough merge threads running, the
> > next update can block until at least one is done. Then the scenario
> > goes something like this:
> >
> > leader sends doc to follower
> > follower does not index the document in time
> > leader puts follower into “leader initiated recovery”.
> >
> > So one thing to look for if that scenario is correct is whether there are
> > messages
> > in your logs with "leader-initiated recovery” I’d personally grep my logs
> > for
> >
> > grep logfile initated | grep recovery | grep leader
> >
> > ‘cause I never remember whether that’s the exact form. If it is this, you
> > can
> > lengthen the timeouts, look particularly for:
> > • distribUpdateConnTimeout
> > • distribUpdateSoTimeout
> >
> > All that said, your symptoms are consistent with a lot of merging going
> > on. With NRT
> > nodes, all replicas do all indexing and thus merging. Have you considered
> > using TLOG/PULL replicas? In your case they could even all be TLOG
> > replicas. In that
> > case, only the leader does the indexing, the other TLOG replicas of a
> > shard just stuff
> > the documents into their local tlogs without indexing at all.
> >
> > Speaking of which, you could reduce some of the disk pressure if you can
> > put your
> > tlogs on another drive, don’t know if that’s possible. Ditto the Solr
> logs.
> >
> > Beyond that, it may be a matter of increasing the hardware. You’re really
> > indexing
> > 120K records second ((1 leader + 2 followers) * 40K)/sec.
> >
> > Best,
> > Erick
> >
> > > On Aug 25, 2020, at 12:02 PM, Anshuman Singh <
> singhanshuma...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster
> > with
> > > 50 shards, rf 2 (NRT replicas), 7B docs, We have 5 Zk with 2 running on
> > the
> > > same nodes where Solr is running. Our use case requires continuous
> > > ingestions (updates mostly). If we ingest at 40k records per sec, after
> > > 10-15mins some replicas go into recovery with the errors observed given
> > in
> > > the end. We also observed high CPU during these ingestions (60-70%) and
> > > disks frequently reach 100% utilization.
> > >
> > > We know our hardware is limited but this system will 

Re: Solr TLOG Replicas going in recovery

2020-08-29 Thread Dominique Bejean
Hi,

Can you provide more information : Solr version, how are you indexing (DIH,
threading, ...), more details in Solr logs ?

Did you analyse JVM Gc logs ?

Regards

Dominique

On Fri, Aug 28, 2020 at 10:53 PM, amit3281 wrote:

> Hi,
>
>
>
> I am using Solr on EXT4 partition and have TLOG replicas in my collection.
> I
>
> am using 2 Solr nodes to utilize 2 disk (for getting IOPS) for same
>
> collection. My collection has 150 shards. Each shard size is ~9GB and
>
> 48Million docs per shard.
>
>
>
> My shards frequently goes into recovery with error
>
> *o.a.s.h.RequestHandlerBase java.io.IOException:
>
> java.util.concurrent.TimeoutException: Idle timeout expired*.
>
>
>
> I am doing 90% times overwrites as getting few statistics update for same
>
> data.
>
>
>
> 1. Is this because of frequent updates?
>
> 2. Is this because of Huge number of shards?
>
> 3. Is this because of TLOG and big segments are getting transferred?
>
>
>
> Commit time is 15sec and I am trying to add 10k docs per second.
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Odd Solr zkcli script behavior

2020-08-27 Thread Dominique Bejean
Hi,

You can also connect to one of the ZooKeeper nodes and use the zkCli.sh tool:

http://www.mtitek.com/tutorials/zookeeper/zkCli.php
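
For example (the ZooKeeper host and chroot below are assumptions based on the
paths in this thread), the ZooKeeper CLI can list and dump the uploaded config
files directly:

# Connect to one of the ZooKeeper nodes
./zkCli.sh -server zk-host:2181

# Then, inside the zkCli shell, inspect what is really stored for the config set:
#   ls /solr/configs/sial-catalog-product-20200711
#   get /solr/configs/sial-catalog-product-20200711/<file-name>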

Regards

Dominique


On Thu, Aug 27, 2020 at 5:28 PM, Webster Homer <webster.ho...@milliporesigma.com> wrote:

> I am using solr 7.7.2 solr cloud
>
> We version our collection and config set names with dates. I have two
> collections sial-catalog-product-20200711 and
> sial-catalog-product-20200808. A developer uploaded a configuration file to
> the 20200711 version that was not checked into our source control, and I
> wanted to retrieve it from ZooKeeper as we cannot find the version anywhere
> else. So I tried the zkcli.sh shell script.
>
> It always throws an exception when trying to access
> sial-catalog-product-20200711 but not when trying to access
> sial-catalog-product-20200808
> INFO  - 2020-08-27 10:26:36.283;
> org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
> Exception in thread "main"
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /solr/configs/sial-catalog-product-20200711/_schema_model-store.json
>   at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
>   at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>   at
> org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1221)
>   at
> org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:358)
>   at
> org.apache.solr.common.cloud.SolrZkClient$$Lambda$6/1384010761.execute(Unknown
> Source)
>   at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
>   at
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:358)
>   at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:331)
>
> I can see both collections and configsets in the SolrAdmin console. I can
> download the file from sial-catalog-product-20200808 with no problem. As
> far as I can tell the two config sets are accessible in the cloud, both
> config sets and collections are available the only difference is that we
> have an alias set to point to the newer one which is current, but the zkcli
> script does not use the alias.
>
> I tried both the getfile and downconfig commands and the behavior is
> consistent I can always get to the later one but the 20200711 version gives
> the NoNodeException
> What is going on here?
>
> A general comment, we use Zookeeper chroot, but the zkcli command doesn't
> seem to care if I pass the root on the zkhost argument or not. I also
> noticed that the zkcli command is poorly documented.
>
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
>
>
> Click http://www.merckgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.
>


Re: Error from server at http://localhost:8983/solr/search: Expected mime type application/octet-stream but got text/html

2020-08-27 Thread Dominique Bejean
Hi,

There were a few discussions about similar issues recently. A JIRA issue
was created:
https://issues.apache.org/jira/browse/SOLR-14768

Regards

Dominique


On Thu, Aug 27, 2020 at 3:00 PM, Divino I. Ribeiro Jr. <divinoirj.ib...@gmail.com> wrote:

> Hello everyone!
> When I run an query to Solr Server, it returns the following message:
>
> 2020-08-27 03:24:03,338 ERROR org.dspace.app.rest.utils.DiscoverQueryBuilder 
> @ divinoirj.ib...@gmail.com::Error in Discovery while setting up date facet 
> range:date facet\colon; 
> org.dspace.discovery.configuration.DiscoverySearchFilterFacet@1350bf85
> org.dspace.discovery.SearchServiceException: Error from server at 
> http://localhost:8983/solr/search: Expected mime type 
> application/octet-stream but got text/html.  
> 
> 
> Error 500 java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
> HTTP ERROR 500 java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
> URI:/solr/search/select
> STATUS:500
>
> Solr instalation cores:
>
> CWD: /opt/solr-8.6.0/server
> Instance: /var/solr/data/search
> Data: /var/solr/data/search/data
> Index: /var/solr/data/search/data/index
> Impl: org.apache.solr.core.NRTCachingDirectoryFactory
>
> Thanks!
>


Re: Solr collections gets wiped on restart

2020-08-27 Thread Dominique Bejean
Hi,

Which Solr version ?

Restart which node ? Solr ? ZK ? Only one node ?

Are the collections missing in the Solr console (lost in ZooKeeper) while the
cores are still present ?

Why put the ZK data and datalog in a "temp" directory
(dataDir=/applis/24374-iplsp-00/IPLS/apache-zookeeper-3.5.5-bin/temp) ?
Is this directory never purged ?

Why launch Solr with ZKHOST and other settings in the command line instead
of in solr.in.sh ?

There is a similar discussion thread these days "All cores gone along with
all solr configuration upon reboot".

Regards

Dominique


On Thu, Aug 27, 2020 at 10:49 AM, antonio.di...@bnpparibasfortis.com <antonio.di...@bnpparibasfortis.com> wrote:

> Good morning,
>
>
> I would like to get some help if possible.
>
>
>
> We have a 3 node Solr cluster (ensemble) with apache-zookeeper 3.5.5.
>
> It works fine until we need to restart one of the nodes. Then all the
> content of the collection gets deleted.
>
> This is a production environment, and every time there is a restart or a
> crash in one of the services/servers we lose a lot of time restoring the
> collection and work.
>
> This is the way we start the nodes:
>
> su - ipls004p -c "/applis/24374-iplsp-00/IPLS/solr-8.3.0/bin/solr start
> -cloud -p 8987 -h s01vl9918254 -s
> /applis/24374-iplsp-00/IPLS/solr-8.3.0/cloud/node1/solr -z
> s01vl9918254:2181,s01vl9918256:2181,s01vl9918258:2181 -force"
>
> This is the zoo.cfg:
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/applis/24374-iplsp-00/IPLS/apache-zookeeper-3.5.5-bin/temp
> # the port at which the clients will connect
> clientPort=2181
> # the maximum number of client connections.
> # increase this if you need to handle more clients
> #maxClientCnxns=60
> #
> # Be sure to read the maintenance section of the
> # administrator guide before turning on autopurge.
> #
> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
> #
> # The number of snapshots to retain in dataDir
> #autopurge.snapRetainCount=3
> # Purge task interval in hours
> # Set to "0" to disable auto purge feature
> #autopurge.purgeInterval=1
> 4lw.commands.whitelist=mntr,conf,ruok
>
> server.1=s01vl9918256:3889:3888
> server.2=s01vl9918258:3889:3888
> server.3=s01vl9918254:3889:3888
> #server.4=s01vl9918255:3889:3888
>
>
>
>
>
> Thanks in advance
>
>
> Regards, Cordialement,
> Antonio Dinis
> TCC Web Portals Ops Engineers  |   BNP Paribas Fortis SA/NV
>
>
>
>
>
>
> T
>
> +32 (0)2 231 20994// Brussels Marais +1
>
>
>
>
> 
>
> TCC Web Portals
>
>
>
> ==
> BNP Paribas Fortis disclaimer:
> http://www.bnpparibasfortis.com/e-mail-disclaimer.html
>
> BNP Paribas Fortis privacy policy:
> http://www.bnpparibasfortis.com/privacy-policy.html
>
> ==
>


Re: Simple query

2020-08-24 Thread Dominique Bejean
Hi,

We need to know how your catch_all field is analyzed at index and search
time.

I think you are using a stemming filter and "apache" is stemmed to "apach".
So "apache" and "apach" match the document, but "apac" does not.
You can use the Analysis screen of the admin console to see how terms are
removed or transformed by each filter of the analysis chain for a field or a
field type.
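
You can also run the same check over HTTP with the field analysis handler;
for example (a sketch, reusing the collection and field names from your
query):

curl "http://localhost:8983/solr/search_twitter/analysis/field?analysis.fieldname=catch_all&analysis.fieldvalue=apache&analysis.query=apac"

The response shows the index-time and query-time token streams side by side,
so you can see at which filter "apache" becomes "apach".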

Regards

Dominique


Le lun. 24 août 2020 à 12:01, Jayadevan Maymala 
a écrit :

> Hi all,
> I am learning the basics of Solr querying and am not able to figure out
> something. The first query which searches for 'apac' fetches no documents.
> The second one which searches for 'apach' , i.e. add h - one more
> character, fetches a document.
>
> curl -X GET "
>
> http://localhost:8983/solr/search_twitter/select?q=apac&df=catch_all&fl=catch_all,score
> "
> {
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"apac",
>   "df":"catch_all",
>   "fl":"catch_all,score"}},
>
>
> "response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]
>   }}
>
>
> curl -X GET "
>
> http://localhost:8983/solr/search_twitter/select?q=apach&df=catch_all&fl=catch_all,score
> "
> {
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"apach",
>   "df":"catch_all",
>   "fl":"catch_all,score"}},
>
>
> "response":{"numFound":1,"start":0,"maxScore":0.13076457,"numFoundExact":true,"docs":[
>   {
> "catch_all":["apache",
>   "Happy searching!",
>   "https://lucene.apache.org/solr;,
>   "https://lucene.apache.org;],
> "score":0.13076457}]
>   }}
>
> Field definition -
> "name":"catch_all",
> "type":"text_en",
> "multiValued":true
>
>
> Neither apac nor apach is present in the data.
>
> Regards,
> Jayadevan
>


Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
These links do not provide solutions, but they may give some ideas for
the investigation.
I suggest trying the -Djavax.net.debug=all JVM parameter for your client
application.
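
For example (the jar name is only a placeholder for your own client
application):

java -Djavax.net.debug=all -jar your-client-app.jar 2> javax-net-debug.log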

Good luck.

Dominique

Le lun. 17 août 2020 à 19:11, Odysci  a écrit :

> Dominique,
> thanks, but I'm not sure the links you sent point to an actual solution.
> The Nginx logs, sometimes give a 499 return code which is:
> (499 Client Closed Request
> Used when the client has closed the request before the server could send a
> response.
>
> but the timestamps of these log msgs do not coincide with the IOException,
> so I'm not sure they are related.
> Reinaldo
>
> On Mon, Aug 17, 2020 at 12:59 PM Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
>
>> Hi,
>>
>> It looks like this issues
>> https://github.com/eclipse/jetty.project/issues/4883
>> https://github.com/eclipse/jetty.project/issues/2571
>>
>> The Nginx server closed the connection. Any info in nginx log ?
>>
>> Dominique
>>
>> Le lun. 17 août 2020 à 17:33, Odysci  a écrit :
>>
>>> Hi,
>>> thanks for the reply.
>>> We're using solr 8.3.1, ZK 3.5.6
>>> The stacktrace is below.
>>> The address on the first line "
>>> http://192.168.15.10:888/solr/mycollection; is the "server" address in
>>> my nginx configuration, which points to 2 upstream solr nodes. There were
>>> no other solr or ZK messages in the logs.
>>>
>>> StackTrace:
>>> (Msg = IOException occured when talking to server at:
>>> http://192.168.15.10:888/solr/mycollection)
>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>> when talking to server at: http://192.168.15.10:888/solr/mycollection
>>> at
>>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:418)
>>> at
>>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754)
>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
>>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1051)
>>> ... calls from our code
>>> ... calls from our code
>>> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>> at
>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>>> at
>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> at
>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>> Caused by: java.nio.channels.AsynchronousCloseException
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpConnectionOverHTTP2.close(HttpConnectionOverHTTP2.java:144)
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2.onClose(HttpClientTransportOverHTTP2.java:170)
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2$SessionListenerPromise.onClose(HttpClientTransportOverHTTP2.java:232)
>>> at org.eclipse.jetty.http2.api.Session$Listener.onClose(Session.java:206)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Session.notifyClose(HTTP2Session.java:1153)
>>> at org.eclipse.jetty.http2.HTTP2Session.onGoAway(HTTP2Session.java:438)
>>> at
>>> org.eclipse.jetty.http2.parser.Parser$Listener$Wrapper.onGoAway(Parser.java:392)
>>> at
>>> org.eclipse.jetty.http2.parser.BodyParser.notifyGoAway(BodyParser.java:187)
>>> at
>>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.onGoAway(GoAwayBodyParser.java:169)
>>> at
>>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.parse(GoAwayBodyParser.java:108)
>>> at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:194)
>>> at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:123)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:248)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
>>> at
>

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
I mean add this parameter to your client application's JVM :)


Le lun. 17 août 2020 à 18:36, Dominique Bejean 
a écrit :

> If you want a more detailed debug information from your client
> application, you can add this parameter while starting Solr JVM.
> -Djavax.net.debug=all
>
> It is very verbose !
>
> Dominique
>
>
> Le lun. 17 août 2020 à 17:59, Dominique Bejean 
> a écrit :
>
>> Hi,
>>
>> It looks like this issues
>> https://github.com/eclipse/jetty.project/issues/4883
>> https://github.com/eclipse/jetty.project/issues/2571
>>
>> The Nginx server closed the connection. Any info in nginx log ?
>>
>> Dominique
>>
>> Le lun. 17 août 2020 à 17:33, Odysci  a écrit :
>>
>>> Hi,
>>> thanks for the reply.
>>> We're using solr 8.3.1, ZK 3.5.6
>>> The stacktrace is below.
>>> The address on the first line "
>>> http://192.168.15.10:888/solr/mycollection; is the "server" address in
>>> my nginx configuration, which points to 2 upstream solr nodes. There were
>>> no other solr or ZK messages in the logs.
>>>
>>> StackTrace:
>>> (Msg = IOException occured when talking to server at:
>>> http://192.168.15.10:888/solr/mycollection)
>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>> when talking to server at: http://192.168.15.10:888/solr/mycollection
>>> at
>>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:418)
>>> at
>>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754)
>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
>>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1051)
>>> ... calls from our code
>>> ... calls from our code
>>> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>> at
>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>>> at
>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> at
>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>> Caused by: java.nio.channels.AsynchronousCloseException
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpConnectionOverHTTP2.close(HttpConnectionOverHTTP2.java:144)
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2.onClose(HttpClientTransportOverHTTP2.java:170)
>>> at
>>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2$SessionListenerPromise.onClose(HttpClientTransportOverHTTP2.java:232)
>>> at org.eclipse.jetty.http2.api.Session$Listener.onClose(Session.java:206)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Session.notifyClose(HTTP2Session.java:1153)
>>> at org.eclipse.jetty.http2.HTTP2Session.onGoAway(HTTP2Session.java:438)
>>> at
>>> org.eclipse.jetty.http2.parser.Parser$Listener$Wrapper.onGoAway(Parser.java:392)
>>> at
>>> org.eclipse.jetty.http2.parser.BodyParser.notifyGoAway(BodyParser.java:187)
>>> at
>>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.onGoAway(GoAwayBodyParser.java:169)
>>> at
>>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.parse(GoAwayBodyParser.java:108)
>>> at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:194)
>>> at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:123)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:248)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>>> at
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
>>> at
>>> org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
>>> at org.eclipse.jetty.i

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
If you want a more detailed debug information from your client application,
you can add this parameter while starting Solr JVM.
-Djavax.net.debug=all

It is very verbose !

Dominique


Le lun. 17 août 2020 à 17:59, Dominique Bejean 
a écrit :

> Hi,
>
> It looks like this issues
> https://github.com/eclipse/jetty.project/issues/4883
> https://github.com/eclipse/jetty.project/issues/2571
>
> The Nginx server closed the connection. Any info in nginx log ?
>
> Dominique
>
> Le lun. 17 août 2020 à 17:33, Odysci  a écrit :
>
>> Hi,
>> thanks for the reply.
>> We're using solr 8.3.1, ZK 3.5.6
>> The stacktrace is below.
>> The address on the first line "http://192.168.15.10:888/solr/mycollection;
>> is the "server" address in my nginx configuration, which points to 2
>> upstream solr nodes. There were no other solr or ZK messages in the logs.
>>
>> StackTrace:
>> (Msg = IOException occured when talking to server at:
>> http://192.168.15.10:888/solr/mycollection)
>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>> when talking to server at: http://192.168.15.10:888/solr/mycollection
>> at
>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:418)
>> at
>> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754)
>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
>> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1051)
>> ... calls from our code
>> ... calls from our code
>> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>> at
>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>> at
>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>> at
>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>> at java.base/java.lang.Thread.run(Thread.java:834)
>> Caused by: java.nio.channels.AsynchronousCloseException
>> at
>> org.eclipse.jetty.http2.client.http.HttpConnectionOverHTTP2.close(HttpConnectionOverHTTP2.java:144)
>> at
>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2.onClose(HttpClientTransportOverHTTP2.java:170)
>> at
>> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2$SessionListenerPromise.onClose(HttpClientTransportOverHTTP2.java:232)
>> at org.eclipse.jetty.http2.api.Session$Listener.onClose(Session.java:206)
>> at
>> org.eclipse.jetty.http2.HTTP2Session.notifyClose(HTTP2Session.java:1153)
>> at org.eclipse.jetty.http2.HTTP2Session.onGoAway(HTTP2Session.java:438)
>> at
>> org.eclipse.jetty.http2.parser.Parser$Listener$Wrapper.onGoAway(Parser.java:392)
>> at
>> org.eclipse.jetty.http2.parser.BodyParser.notifyGoAway(BodyParser.java:187)
>> at
>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.onGoAway(GoAwayBodyParser.java:169)
>> at
>> org.eclipse.jetty.http2.parser.GoAwayBodyParser.parse(GoAwayBodyParser.java:108)
>> at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:194)
>> at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:123)
>> at
>> org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:248)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
>> at
>> org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
>> at
>> org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
>> at
>> org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>> at
>> org.eclipse.jetty.util.thread.Invocable.invokeNonBlocking(Invocable.java:68)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.invokeTask(EatWhatYouKill.java:345)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:300)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
Hi,

It looks like this issues
https://github.com/eclipse/jetty.project/issues/4883
https://github.com/eclipse/jetty.project/issues/2571

The Nginx server closed the connection. Any info in nginx log ?

Dominique

Le lun. 17 août 2020 à 17:33, Odysci  a écrit :

> Hi,
> thanks for the reply.
> We're using solr 8.3.1, ZK 3.5.6
> The stacktrace is below.
> The address on the first line "http://192.168.15.10:888/solr/mycollection;
> is the "server" address in my nginx configuration, which points to 2
> upstream solr nodes. There were no other solr or ZK messages in the logs.
>
> StackTrace:
> (Msg = IOException occured when talking to server at:
> http://192.168.15.10:888/solr/mycollection)
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://192.168.15.10:888/solr/mycollection
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:418)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1035)
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1051)
> ... calls from our code
> ... calls from our code
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.nio.channels.AsynchronousCloseException
> at
> org.eclipse.jetty.http2.client.http.HttpConnectionOverHTTP2.close(HttpConnectionOverHTTP2.java:144)
> at
> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2.onClose(HttpClientTransportOverHTTP2.java:170)
> at
> org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2$SessionListenerPromise.onClose(HttpClientTransportOverHTTP2.java:232)
> at org.eclipse.jetty.http2.api.Session$Listener.onClose(Session.java:206)
> at org.eclipse.jetty.http2.HTTP2Session.notifyClose(HTTP2Session.java:1153)
> at org.eclipse.jetty.http2.HTTP2Session.onGoAway(HTTP2Session.java:438)
> at
> org.eclipse.jetty.http2.parser.Parser$Listener$Wrapper.onGoAway(Parser.java:392)
> at
> org.eclipse.jetty.http2.parser.BodyParser.notifyGoAway(BodyParser.java:187)
> at
> org.eclipse.jetty.http2.parser.GoAwayBodyParser.onGoAway(GoAwayBodyParser.java:169)
> at
> org.eclipse.jetty.http2.parser.GoAwayBodyParser.parse(GoAwayBodyParser.java:108)
> at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:194)
> at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:123)
> at
> org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:248)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
> at
> org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
> at
> org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
> at
> org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at
> org.eclipse.jetty.util.thread.Invocable.invokeNonBlocking(Invocable.java:68)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.invokeTask(EatWhatYouKill.java:345)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:300)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> ... 3 more
>
> -
>
> I did consider using the solrJ cloud or lb clients, but nginx gives me
> more flexibility in controlling how the load balancing is done. I'm still
> running experiments to see which one works best for me.
> In the meantime, if you have any clues for why I'm getting this
> IOException, I'd appreci

Re: IOException occured when talking to server

2020-08-17 Thread Dominique Bejean
Hi,

Can you provide more information ?
- Solr and ZK version
- full error stacktrace generated by SolrJ
- any concomitant and relevant information in solr nodes logs or zk logs

Just to know, why not use a load balanced LBHttp... Solr Client ?

Regards.

Dominique


Le lun. 17 août 2020 à 00:41, Odysci  a écrit :

> Hi,
>
> We have a solrcloud setup with 2 solr nodes and 3 ZK instances. Until
> recently I had my application server always call one of the solr nodes (via
> solrJ), and it worked just fine.
>
> In order to improve reliability I put an Nginx reverse-proxy load balancer
> between my application server and the solr nodes. The response time
> remained almost the same but we started getting the msg:
>
> IOException occured when talking to server http://myserver
>
> every minute or so (very randomly but consistently). Since our code will
> just try again after a few milliseconds, the overall system continues to
> work fine, despite the delay. I tried increasing all nginx related
> timeout's to no avail.
> I've searched for this msg a lot and most replies seem to be related to
> ssl.
> We are using http2solrclient but no ssl to solr.
> Can anyone shed any light on this?
>
> Thanks!
> Reinaldo
>


Re: Solr ping taking 600 seconds

2020-08-15 Thread Dominique Bejean
Hi,

How long does it take to display the Solr admin console?
What about CPU and iowait with top?

You should start by ruling out a network issue between your Solr nodes by
testing the network with netcat, as described here:
http://deice.daug.net/netcat_speed.html
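
A rough sketch of such a test (host name, port and the 1 GB size are only
illustrative; some netcat flavors omit the -p flag when listening):

# on the receiving Solr node, listen on a free port
nc -l -p 12345 > /dev/null

# on the sending Solr node, push ~1 GB and time the transfer
time dd if=/dev/zero bs=1M count=1024 | nc receiving-node 12345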

Dominique

Le ven. 14 août 2020 à 23:40, Susheel Kumar  a
écrit :

> Hello,
>
>
>
> One of our Solr 6.6.2 DR cluster (target CDCR) which even doesn't have any
>
> live search load seems to be taking 600 seconds many times for the ping /
>
> health check calls. Anyone has seen this before/suggestion what could be
>
> wrong. The collection has 8 shards/3 replicas and 64GB memory and index
>
> seems to fit in memory. Below solr log entries.
>
>
>
>
>
> solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
>
> hits=62569458 status=0 QTime=600113
>
> solr.log.26:2020-08-13 14:03:20.827 WARN  (qtp1775120226-46486) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
>
> hits=62569458 status=0 QTime=600113
>
> solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
>
> QTime=600113
>
> solr.log.26:2020-08-13 14:03:20.827 WARN  (qtp1775120226-46486) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
>
> QTime=600113
>
> solr.log.38:2020-08-08 15:01:45.640 INFO  (qtp1775120226-46254) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
>
> hits=62221186 status=0 QTime=600092
>
> solr.log.38:2020-08-08 15:01:45.640 WARN  (qtp1775120226-46254) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
>
> hits=62221186 status=0 QTime=600092
>
> solr.log.38:2020-08-08 15:01:45.640 INFO  (qtp1775120226-46254) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
>
> QTime=600092
>
> solr.log.38:2020-08-08 15:01:45.640 WARN  (qtp1775120226-46254) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
>
> QTime=600092
>
> solr.log.39:2020-08-08 13:20:12.117 INFO  (qtp1775120226-46254) [c:COLL
>
> s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
>
> [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
>
> params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
>
> hits=63094900 status=0 QTime=600095
>
>
>
>
>
>
>
> server1:/home/kumar # curl --location --request GET '
>
> http://server1:8080/solr/COLL/admin/ping?distrib=true'
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <bool name="zkConnected">true</bool>
>     <int name="status">0</int>
>     <int name="QTime">600095</int>
>     <lst name="params">
>       <str name="q">{!lucene}*:*</str>
>       <str name="distrib">true</str>
>       <str name="df">wordTokens</str>
>       <str name="...">false</str>
>       <str name="rows">10</str>
>       <str name="echoParams">all</str>
>     </lst>
>   </lst>
>   <str name="status">OK</str>
> </response>
>
>


Re: Adding additional zookeeper on top

2020-08-14 Thread Dominique Bejean
Hi,

About the number of ZooKeeper nodes in an ensemble, you can find good
information on this page. It applies to Solr:
https://www.cloudkarafka.com/blog/2018-07-04-cloudkarafka-how-many-zookeepers-in-a-cluster.html

   - 1 Node: no fault tolerance, no maintenance possibilities
   - 3 Nodes: An ensemble with 3 nodes will support one failure without
   loss of service, which is probably fine for most users, and also the most
   popular setup
   - 5 Nodes (recommended for real fault tolerance): A five-node cluster
   allows to take one server out for maintenance or upgrade and still be able
   to take a second unexpected failure, without interrupting your service
   - 7 Nodes: The same as for 5-node cluster but with the ability to bear
   the failure of three nodes


The Zookeeper ensemble size does not depend on the number of Solr nodes.
Zookeeper activity is not related to update or query volume. It is
related to:

   - solr node stop / start and so recovery
   - solr and alias collection creation / destruction ...
   - solr configset management

If your solr cluster is stable in terms of functional solr nodes and
collections, even with huge data updates and/or queries, the Zookeeper
ensemble won't be stressed.

About the upgrade method, I am not sure a hot operation is possible. In
order to minimise downtime :

Step 1 - Set up 2 new zookeeper with the 3 servers declaration
server.1=xxx:2881:3881
server.2=yyy:2882:3882
server.3=zzz:2883:3883

Step 2 - Add the 2 new servers declaration in the running zookeeper
server.2=yyy:2882:3882
server.3=zzz:2883:3883

Step 3 - Restart the running zookeeper and start the 2 new zookeeper

Step 4 - Wait for data synchronization

Step 5 - Stop, move, start the first zookeeper

Warning: do not use IP addresses; use server names in your Zookeeper
configuration.
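
As an illustration of steps 1 and 2 (a sketch; xxx/yyy/zzz are the server
names used above, the data directory is only an example):

# zoo.cfg, identical on the three servers
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=xxx:2881:3881
server.2=yyy:2882:3882
server.3=zzz:2883:3883

# on each server, the myid file must match its server.N line
echo 1 > /var/lib/zookeeper/myid    # 2 on yyy, 3 on zzz

# solr.in.sh on every Solr node, once the three ZooKeepers are up
ZK_HOST="xxx:2181,yyy:2181,zzz:2181"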


Regards

Dominique



Le ven. 14 août 2020 à 05:48, yaswanth kumar  a
écrit :

> Hi Team
>
> Can someone let me know if we can do an upgrade to  zookeeper ensemble
> from standalone ??
>
> I have 3 solr nodes with one zookeeper running on one of the node .. and
> it’s a solr cloud .. so now can I install zookeeper on another node just to
> make sure it’s not a single point of failure when the solr node that got
> zookeeper is down??
>
> Also want to understand what’s the best formula of choosing no of
> zookeeper that’s needed for solr cloud like for how many solr nodes .. how
> many zookeeper do we need to maintain for best fault tolerance
>
> Sent from my iPhone


Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Dominique Bejean
Hi Norbert,

The field name in Collection2 is "reporting_to", not "reporting".
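
Based on that, a sketch of the corrected subquery parameters (only the field
name in subordinate.q changes):

fl: "*,subordinate:[subquery fromIndex=Collection2]",
subordinate.fl: "*",
subordinate.q: "{!field f=reporting_to v=$row.id}",
subordinate.fq: "*",
subordinate.rows: "5"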

Dominique



Le mer. 12 août 2020 à 11:59, Norbert Kutasi  a
écrit :

> Hello,
>
> We have been using [subquery] to come up with arbitrary complex hierarchies
> in our document responses.
>
> It works well as long as the documents are in the same collection however
> based on the reference guide I infer it can bring in documents from
> different collections except it throws an error.
>
> https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html#subquery
>
>
> We are on SOLR 8.2 and in this sandbox we have a 2 node SOLRCloud cluster,
> where both collections have 1 shard and 2 NRT replicas to ensure nodes have
> a core from each collection.
> Basic Authorization enabled.
>
> Simple steps to reproduce this issue in this 2 node environment:
> ./solr create -c Collection1 -s 1 -rf 2
> ./solr create -c Collection2 -s 1 -rf 2
>
> Note: these collections are schemaless, however we observed the ones with
> schemas.
>
> Collection 1:
> <add>
>   <doc>
>     <field name="id">1</field>
>     <field name="first_name">John</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="first_name">Peter</field>
>   </doc>
> </add>
>
> Collection 2:
> <add>
>   <doc>
>     <field name="id">3</field>
>     <field name="first_name">Thomas</field>
>     <field name="reporting_to">2</field>
>   </doc>
>   <doc>
>     <field name="id">4</field>
>     <field name="first_name">Charles</field>
>     <field name="reporting_to">1</field>
>   </doc>
>   <doc>
>     <field name="id">5</field>
>     <field name="first_name">Susan</field>
>     <field name="reporting_to">3</field>
>   </doc>
> </add>
>
>
> http://localhost:8983/solr/Collection1/query
> {
>   params: {
> q: "*",
> fq: "*",
> rows: 5,
> fl:"*,subordinate:[subquery fromIndex=Collection2]",
> subordinate.fl:"*",
> subordinate.q:"{!field f=reporting v=$row.id}",
> subordinate.fq:"*",
> subordinate.rows:"5"
>   }
> }
>
> {
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"while invoking subordinate:[subqueryfromIndex=Collection2] on
>
> doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
> first_name=[stored,index",
> "code":400}}
>
>
> Where do we make a mistake?
>
> Thank you in advance,
> Norbert
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
An idea could be to use the autoscaling API in order to add a PULL replica
for each shard, located on one or more low-resource, backup-dedicated nodes
on separate hardware.
However, we would need to exclude these "PULL backup replicas" from searches,
and unfortunately I am not aware of a way to do this.
For a better RPO, a TLOG replica would be preferable, but it could become an
NRT replica.
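
The first part (adding the PULL replica itself) is already possible with the
plain Collections API; a sketch, with placeholder collection, shard and node
names:

curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=PULL&node=backupnode1:8983_solr"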

So maybe one solution could be to create a new BACKUP replica type with
these characteristics:

   - According to RPO, options at creation time : based on PULL or TLOG
   sync mode
   - Search disabled


Dominique



Le mar. 11 août 2020 à 14:07, Erick Erickson  a
écrit :

> Dominique:
>
> Alternatives are under discussion, there isn’t a recommendation yet.
>
> Erick
>
> > On Aug 11, 2020, at 7:49 AM, Dominique Bejean 
> wrote:
> >
> > I missed that !
> > Are you aware about an alternative ?
> >
> > Regards
> >
> > Dominique
> >
> >
> > Le mar. 11 août 2020 à 13:15, Erick Erickson  a
> > écrit :
> >
> >> CDCR is being deprecated. so I wouldn’t suggest it for the long term.
> >>
> >>> On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh 
> >> wrote:
> >>>
> >>> I would love an answer to this too!
> >>>
> >>> On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam 
> >> wrote:
> >>>
> >>>> Hey folks,
> >>>>
> >>>> Been reading up about the various ways of creating backups. The whole
> >>>> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> >>>> our environment, so I've been looking for ways around that, and here's
> >>>> what I've come up with so far:
> >>>>
> >>>> 1. Stop applications from writing to solr
> >>>>
> >>>> 2. Commit everything
> >>>>
> >>>> 3. Identify a single core for each shard in each collection
> >>>>
> >>>> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
> >>>>
> >>>> 5. Once complete, re-enable application write access to Solr
> >>>>
> >>>> 6. Create a backup from these snapshots using the replication
> handler's
> >>>> backup function (replication?command=backup=mySnapshot)
> >>>>
> >>>> 7. Put the backups somewhere safe
> >>>>
> >>>> 8. Clean up snapshots
> >>>>
> >>>>
> >>>> This seems ... too good to be true? I've seen so many threads about
> how
> >>>> hard it is to create backups in SolrCloud on this mailing list over
> the
> >>>> years, but this seems pretty straightforward? Am I missing some
> >>>> glaringly obvious reason why this will fail catastrophically?
> >>>>
> >>>> Using Solr 7.7 in this case.
> >>>>
> >>>> Feedback much appreciated!
> >>>>
> >>>> Thanks,
> >>>>
> >>>> - Bram
> >>>>
> >>>
> >>> --
> >>> **
> >>> ** <https://www.canva.com/>Empowering the world to design
> >>> Share accurate
> >>> information on COVID-19 and spread messages of support to your
> community.
> >>>
> >>> Here are some resources
> >>> <
> >>
> https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates
> >
> >>
> >>> that can help.
> >>> <https://twitter.com/canva> <https://facebook.com/canva>
> >>> <https://au.linkedin.com/company/canva> <https://twitter.com/canva>
> >>> <https://facebook.com/canva>  <https://au.linkedin.com/company/canva>
> >>> <https://instagram.com/canva>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
I missed that !
Are you aware about an alternative ?

Regards

Dominique


Le mar. 11 août 2020 à 13:15, Erick Erickson  a
écrit :

> CDCR is being deprecated. so I wouldn’t suggest it for the long term.
>
> > On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh 
> wrote:
> >
> > I would love an answer to this too!
> >
> > On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam 
> wrote:
> >
> >> Hey folks,
> >>
> >> Been reading up about the various ways of creating backups. The whole
> >> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> >> our environment, so I've been looking for ways around that, and here's
> >> what I've come up with so far:
> >>
> >> 1. Stop applications from writing to solr
> >>
> >> 2. Commit everything
> >>
> >> 3. Identify a single core for each shard in each collection
> >>
> >> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
> >>
> >> 5. Once complete, re-enable application write access to Solr
> >>
> >> 6. Create a backup from these snapshots using the replication handler's
> >> backup function (replication?command=backup=mySnapshot)
> >>
> >> 7. Put the backups somewhere safe
> >>
> >> 8. Clean up snapshots
> >>
> >>
> >> This seems ... too good to be true? I've seen so many threads about how
> >> hard it is to create backups in SolrCloud on this mailing list over the
> >> years, but this seems pretty straightforward? Am I missing some
> >> glaringly obvious reason why this will fail catastrophically?
> >>
> >> Using Solr 7.7 in this case.
> >>
> >> Feedback much appreciated!
> >>
> >> Thanks,
> >>
> >> - Bram
> >>
> >
> > --
> > **
> > ** Empowering the world to design
> > Share accurate
> > information on COVID-19 and spread messages of support to your community.
> >
> > Here are some resources
> > <
> https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates>
>
> > that can help.
> >  
> >  
> >   
> > 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>


Re: Solrcloud tlog are not deleted

2020-08-11 Thread Dominique Bejean
Hi,

Did you disable the CDCR buffer?
/solr/<collection>/cdcr?action=DISABLEBUFFER

You can check with "cdcr?action=STATUS".
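
For example (replace <target_collection> with your target collection name):

curl "http://localhost:8983/solr/<target_collection>/cdcr?action=DISABLEBUFFER"
curl "http://localhost:8983/solr/<target_collection>/cdcr?action=STATUS"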

Regards

Dominique


Le mar. 11 août 2020 à 10:57, Michel Bamouni  a
écrit :

> Hello,
>
>
> We had setup a synchronization between our solr instances on 2 datacenters
> by using  the CDCR.
> Until now, everything worked fine, but after an upgrade from Solr 7.3 to
> Solr 7.7 we are facing an issue.
> Indeed, our tlog files are not deleted, even though we see the new values on
> both Solr instances.
> It is as if the hard commit doesn't occur.
> In our solrconfig.xml file, we had configured autoCommit as below:
>
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
>
> and the autoSoftCommit looks like this:
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
>
> If someone has already met this issue, I would appreciate your feedback.
>
>
> Best regards,
>
>
> Michel
>
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
  Hi,

This procedure looks fine, but it is a little complex to automate.

Why not consider a backup based on CDCR for SolrCloud, or on replication for
standalone Solr?

For SolrCloud, CDCR can be configured with source and target collections in
the same SolrCloud cluster. The target collection can have its shards
located on dedicated nodes and a replication factor set to 1.
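
As a rough sketch of the source-collection side in solrconfig.xml (the
collection names and zkHost are placeholders; the target side and the other
parameters are described in the CDCR documentation):

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">zk1:2181,zk2:2181,zk3:2181</str>
    <str name="source">collection_live</str>
    <str name="target">collection_backup</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
</requestHandler>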

You need to be careful to locate the target nodes on separate hardware (VM
and storage) and ideally in a separate geographical location.

You will be able to achieve a very good RPO and RTO.
If the RTO requirement is loose, the dedicated backup destination nodes can
have little CPU and RAM.
If the RTO requirement is tight, we can imagine the backup becoming the live
collection very quickly instead of being restored, or serving in a degraded
search-only mode during the restore.

Regards.

Dominique



Le jeu. 6 août 2020 à 16:18, Bram Van Dam  a écrit :

> Hey folks,
>
> Been reading up about the various ways of creating backups. The whole
> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> our environment, so I've been looking for ways around that, and here's
> what I've come up with so far:
>
> 1. Stop applications from writing to solr
>
> 2. Commit everything
>
> 3. Identify a single core for each shard in each collection
>
> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
>
> 5. Once complete, re-enable application write access to Solr
>
> 6. Create a backup from these snapshots using the replication handler's
> backup function (replication?command=backup&commitName=mySnapshot)
>
> 7. Put the backups somewhere safe
>
> 8. Clean up snapshots
>
>
> This seems ... too good to be true? I've seen so many threads about how
> hard it is to create backups in SolrCloud on this mailing list over the
> years, but this seems pretty straightforward? Am I missing some
> glaringly obvious reason why this will fail catastrophically?
>
> Using Solr 7.7 in this case.
>
> Feedback much appreciated!
>
> Thanks,
>
>  - Bram
>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Doss,

See below.

Dominique


Le lun. 10 août 2020 à 17:41, Doss  a écrit :

> Hi Dominique,
>
> Thanks for your response. Find below the details, please do let me know if
> anything I missed.
>
>
> *- hardware architecture and sizing*
> >> Centos 7, VMs,4CPUs, 66GB RAM, 16GB Heap, 250GB SSD
>
>
> *- JVM version / settings*
> >> Red Hat, Inc. OpenJDK 64-Bit Server VM, version:"14.0.1 14.0.1+7" -
> Default Settings including GC
>

I don't think I would use JVM version 14. OpenJDK 11 is, in my opinion, the
best choice as an LTS version.


>
> *- Solr settings*
> >> softCommit: 15000 (15 sec), autoCommit: 30 (5 mins)
>  class="org.apache.solr.index.TieredMergePolicyFactory"> name="maxMergeAtOnce">30 100
> 30.0 
>
>class="org.apache.lucene.index.ConcurrentMergeScheduler"> name="maxMergeCount">18 name="maxThreadCount">6
>

You changed a lot of default values. Any specific reasons? It seems very
aggressive!


>
>
> *- collections and queries information   *
> >> One Collection, with 4 shards , 3 replicas , 3.5 Million Records, 150
> columns, mostly integer fields, Average doc size is 350kb. Insert / Updates
> 0.5 Million Span across the whole day (peak time being 6PM to 10PM) ,
> selects not yet started. Daily once we do delta import of cetrain fields of
> type multivalued with some good amount of data.
>
> *- gc logs or gceasy results*
>
> Easy GC Report says GC health is good, one server's gc report:
> https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
> CPU Load Pattern:
> https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing
>
>
You have to analyze GC on all nodes!
Your heap is very big. Given the full GC frequency, I don't think you
really need such a big heap for indexing only. Maybe you will when you start
running queries.

Did you check your network performances ?
Did you check Zookeeper logs ?


>
> Thanks,
> Doss.
>
>
>
> On Mon, Aug 10, 2020 at 7:39 PM Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
>
>> Hi Doss,
>>
>> See a lot of TIMED_WATING connection occurs with high tcp traffic
>> infrastructure as in a LAMP solution when the Apache server can't
>> anymore connect to the MySQL/MariaDB database.
>> In this case, tweak net.ipv4.tcp_tw_reuse is a possible solution (but
>> never net.ipv4.tcp_tw_recycle as you suggested in your previous post).
>> This
>> is well explained in this great article
>> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>>
>> However, in general and more specifically in your case, I would
>> investigate
>> the root cause of your issue and do not try to find a workaround.
>>
>> Can you provide more information about your use case (we know : 3 node
>> SOLR
>> (8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?
>>
>>- hardware architecture and sizing
>>- JVM version / settings
>>- Solr settings
>>- collections and queries information
>>- gc logs or gceasy results
>>
>> Regards
>>
>> Dominique
>>
>>
>>
>> Le lun. 10 août 2020 à 15:43, Doss  a écrit :
>>
>> > Hi,
>> >
>> > In solr 8.3.1 source, I see the following , which I assume could be the
>> > reason for the issue "Max requests queued per destination 3000 exceeded
>> for
>> > HttpDestination",
>> >
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>> >
>> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >  return MAX_OUTSTANDING_REQUESTS * 3;
>> >
>> > how can I increase this?
>> >
>> > On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>> >
>> > > Hi,
>> > >
>> > > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now
>> and
>> > > then we are facing "Max requests queued per destination 3000 exceeded
>> for
>> > > HttpDestination"
>> > >
>> > > After restart evering thing starts working fine until another problem.
>> > > Once a problem occurred we are seeing soo many TIMED_WAITING threads
>> > >
>> > > Server 1:
>> > >*7722*  Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
>> > > ")
>> > > Server 2:
>> > >*4046*   Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
>> > > ")
>> > > Server 3:
>> > >*4210*   Threads are in TIMED_WATING
>> > >
>> >
>> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
>> > > ")
>> > >
>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can
>> we
>> > > increase the 3000 limit?
>> > >
>> > > Sorry, since I haven't got any response to my previous query,  I am
>> > > creating this as new,
>> > >
>> > > Thanks,
>> > > Mohandoss.
>> > >
>> >
>>
>


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Hi Doss,

Seeing a lot of connections in TIMED_WAITING is common in high-TCP-traffic
infrastructures, for example in a LAMP stack when the Apache server can no
longer connect to the MySQL/MariaDB database.
In this case, tweaking net.ipv4.tcp_tw_reuse is a possible solution (but
never net.ipv4.tcp_tw_recycle, as you suggested in your previous post). This
is well explained in this great article:
https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
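
If you do go that route, it is a one-line sysctl change (example file name):

sysctl -w net.ipv4.tcp_tw_reuse=1                                     # immediate, not persistent
echo "net.ipv4.tcp_tw_reuse = 1" > /etc/sysctl.d/90-tcp-tw-reuse.conf # persistent across reboots
sysctl --system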

However, in general and more specifically in your case, I would investigate
the root cause of your issue rather than try to find a workaround.

Can you provide more information about your use case (we know : 3 node SOLR
(8.3.1 NRT) + 3 Node Zookeeper Ensemble) ?

   - hardware architecture and sizing
   - JVM version / settings
   - Solr settings
   - collections and queries information
   - gc logs or gceasy results

Regards

Dominique



Le lun. 10 août 2020 à 15:43, Doss  a écrit :

> Hi,
>
> In solr 8.3.1 source, I see the following , which I assume could be the
> reason for the issue "Max requests queued per destination 3000 exceeded for
> HttpDestination",
>
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>private static final int MAX_OUTSTANDING_REQUESTS = 1000;
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>  available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>  return MAX_OUTSTANDING_REQUESTS * 3;
>
> how can I increase this?
>
> On Mon, Aug 10, 2020 at 12:01 AM Doss  wrote:
>
> > Hi,
> >
> > We are having 3 node SOLR (8.3.1 NRT) + 3 Node Zookeeper Ensemble now and
> > then we are facing "Max requests queued per destination 3000 exceeded for
> > HttpDestination"
> >
> > After restart evering thing starts working fine until another problem.
> > Once a problem occurred we are seeing soo many TIMED_WAITING threads
> >
> > Server 1:
> >*7722*  Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f
> > ")
> > Server 2:
> >*4046*   Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3
> > ")
> > Server 3:
> >*4210*   Threads are in TIMED_WATING
> >
> ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0
> > ")
> >
> > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help ? or how can we
> > increase the 3000 limit?
> >
> > Sorry, since I haven't got any response to my previous query,  I am
> > creating this as new,
> >
> > Thanks,
> > Mohandoss.
> >
>


Re: Solr Down Issue

2020-08-10 Thread Dominique Bejean
Hi,

Did you analyze your GC logs?
If not, that is the first thing to do. Enable GC logging and use a tool like
https://gceasy.io/
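
If GC logging is not already enabled, here is a sketch for Java 10 (the log
path is only an example; on Windows use "set GC_LOG_OPTS=..." in solr.in.cmd
instead of solr.in.sh):

GC_LOG_OPTS="-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"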

Please provide more details about your configuration (JVM settings, ...)
and use case (QPS, queries, ...).
We just know you have 28 million indexed books (just metadata or also the
content?) and 32 GB of RAM.

Is it a dedicated Solr server?

Regards

Dominique


Le lun. 10 août 2020 à 09:32, Rashmi Jain  a
écrit :

> Hello Team,
>
> Can you please check this on high priority.
>
> Regards,
> Rashmi
>
> From: Rashmi Jain
> Sent: Sunday, August 9, 2020 7:21 PM
> To: solr-user@lucene.apache.org
> Cc: Ritesh Sinha 
> Subject: Solr Down Issue
>
> Hello Team,
>
> I am Rashmi jain implemented solr on one of our site
> bookswagon.com. last 2-3 month we are facing
> strange issue, solr down suddenly without interrupting.   We check solr
> login and also check application logs but no clue found there regarding
> this.
> We have implemented solr 7.4 on Java SE 10 and have index
> data of books around 28 million.
> Also we are running solr on Windows server 2012 standard
> with 32 RAM.
> Please help us on this.
>
> Regards,
> Rashmi
>
>
>


Re: java.lang.StackOverflowError if pass long string in q parameter

2020-08-10 Thread Dominique Bejean
Hi,

It looks like the uid field is a text field with a graph filter. Do you really
need this for this specific large "OR" query? Can't you use a string field
instead?
Do you need to compute the score for this query? Maybe you can use fq
instead of q? You will get performance improvements by not computing the
score and by using the filter cache.
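
For example, with a string-typed uid field, the whole list fits in one filter
query using the terms query parser (a sketch, only the first ids shown):

q=*:*&fq={!terms f=uid}TCYY1EGPR38SX7EZ,TCYY1EGPR6M1ARAZ,TCYY1EGPR3NTTO3Z,...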

Regards

Dominique

Le jeu. 6 août 2020 à 07:45, kumar gaurav  a écrit :

> HI
>
> I am getting the following exception if passing a long String in q
> parameter .
>
>
> q=uid:TCYY1EGPR38SX7EZ+OR+uid:TCYY1EGPR6M1ARAZ+OR+uid:TCYY1EGPR3NTTO3Z+OR+uid:TCYY1EGPR8L7XDZZ+OR+uid:TSHO3J0AGFUI9J3Z+OR+uid:TSHO3J0AI1CJJ2AZ+OR+uid:TSHO3J0AI4FZTBWZ+OR+uid:TDRE3J13G97WNCLZ+OR+uid:TCYY1EGPRA72BGHZ+OR+uid:TCYY1EGPR9EQUJYZ+OR+uid:TCYY1EGPRCTJXQPZ+OR+uid:TCYY1EGPR6RXPP0Z+OR+uid:TDRE3J13GBUSFV4Z+OR+uid:TTSH3FLDI7NJA8WZ+OR+uid:TERG3LIS70URWI5Z+OR+uid:TERG3LIS70QKOJAZ+OR+uid:TCYY1EGPR9EVMD5Z+OR+uid:TCYY1EGPRC8CRJ2Z+OR+uid:TCYY1EGPRGMD8MYZ+OR+uid:TCYY1EGPRM5OP68Z+OR+uid:TERG3LIS71AU8ZAZ+OR+uid:TERG3LIS719WRJWZ+OR+uid:THAQ3LIZCJ7TSEUZ+OR+uid:TERG3LIS70Q2O8IZ+OR+uid:TCYY1EGPRGXN2ZIZ+OR+uid:TCYY1EGPRGYTH3FZ+OR+uid:TCYY1EGPRK1JFUQZ+OR+uid:TCYY1EGPRM3JNN0Z+OR+uid:TERG3LIS70QPC4FZ+OR+uid:TBBA3LKKUOLVK89Z+OR+uid:TSOC1HULKNGBDUEZ+OR+uid:TSOC1HULKMTEOGTZ+OR+uid:TCYY1EGPRF93SE8Z+OR+uid:TCYY1EGPREUHNVMZ+OR+uid:TCYY1EGPRESMC0MZ+OR+uid:TCYY1EGPRDZE49OZ+OR+uid:THMB1OMS16B3OCPZ+OR+uid:TSOC1NS0MMMNAXOZ+OR+uid:TSOC1NS0GVJHP82Z+OR+uid:TSOC1NS0H3QAQQ7Z+OR+uid:TCYY2BESMSQWQBFZ+OR+uid:TCYY2BESMTJMA60Z+OR+uid:TCYY2BESN9EK5GFZ+OR+uid:TCYY2BESN9ER8PYZ+OR+uid:TSOC1NS0LBFBEAUZ+OR+uid:THAT2AOL6U500A1Z+OR+uid:THAT2AON5W2HVY9Z+OR+uid:THAT2AOL86LNHYTZ+OR+uid:TCYY2BESMO42C3GZ+OR+uid:TCYY1EGPSZSFLLTZ+OR+uid:TCYY1EGPT0X5B3DZ+OR+uid:THAT2AOL8GMD7O4Z+OR+uid:TSHT3FL6STFG1DEZ+OR+uid:TTSH3J0X6W92MPYZ+OR+uid:TTSH3J0X6SKNCECZ+OR+uid:TCYY1EGPS2J2UF4Z+OR+uid:TCYY1EGPT4HFILGZ+OR+uid:TCYY1EGPRQQQH7QZ+OR+uid:TCYY1EGPRZ72UA6Z+OR+uid:TSHT3FL6SWUTR9OZ+OR+uid:TTSH3J0X759RPQRZ+OR+uid:TTSH3J0X7ES5BR8Z+OR+uid:TTSH3J0X7CSXHAYZ+OR+uid:TCYY1EGPT74CXJMZ+OR+uid:TCYY1EGPS00631RZ+OR+uid:TCYY1EGPS0YU45YZ+OR+uid:TCYY1EGPS4BXXEFZ+OR+uid:TTSH3J0X7HFX0XMZ+OR+uid:TTSH3J0X1AY49RBZ+OR+uid:TTSH3J0X1B36WWWZ+OR+uid:TTSH3J0X1IOH3I8Z+OR+uid:TCYY1EGPSFA5BV2Z+OR+uid:TCYY1EGPSJ43BQNZ+OR+uid:TDAASAPEOHUVZZ+OR+uid:TCYY1EGPSPUZD2PZ+OR+uid:TTSH3J0X3B4S8E9Z+OR+uid:TTSH3J0X6O6TKRQZ+OR+uid:TBRF3LJHIFUI9G6Z+OR+uid:TTSH3J0X4O4S6AUZ+OR+uid:TCYY1EGPSPJHP2NZ+OR+uid:TCYY1EGPSQ95JCCZ+OR+uid:TCYY1EGPSSFR7Z0Z+OR+uid:TCYY1EGPSUYSCNKZ+OR+uid:TTSH3J0X65JG54CZ+OR+uid:TTSH3J0X6CS2ZAXZ+OR+uid:TTSH3J0X6HX537OZ+OR+uid:TTSH3J0X6PP1YGSZ+OR+uid:TCYY1EGPSWN05FGZ+OR+uid:TCYY1EGPSYB513WZ+OR+uid:TCYY1EGPSZR3X2SZ+OR+uid:TCYY1EGPT21MLB5Z+OR+uid:TBRF3LJHIFUOGPPZ+OR+uid:TTSH3J0X1TT376ZZ+OR+uid:TTSH3J0X4HE2ERLZ+OR+uid:TTSH3J0X39NEGZYZ+OR+uid:TCYY1EGPT4ZMPX4Z+OR+uid:TCCHSB60XT4YLZ+OR+uid:TCCHSB61WL7AZZ+OR+uid:TCYYSAUS1XIV3Z+OR+uid:TTSH3J0X6KMH7M2Z+OR+uid:TTSH3J0X1I5FYDGZ+OR+uid:TTSH3J0X4MISXH4Z+OR+uid:TCCHSB60XMUV1Z+OR+uid:TCCHSB61HK0B7Z+OR+uid:TCCHSB61VT84HZ+OR+uid:TCCHSB61ECHWDZ+OR+uid:TTSH3J0X1DU668XZ+OR+uid:TTSH3J0X1QGEU28Z+OR+uid:TTSH3J0X4BCEM0UZ+OR+uid:TTSH3J0X4MLHNIMZ+OR+uid:TCCHSB61E6Y87Z+OR+uid:TCYDSA2IT31VEZ+OR+uid:TCYDSA2IVH6HBZ+OR+uid:TDAASAPG0ADD5Z+OR+uid:TTSH3J0X4SZZWY7Z+OR+uid:TTSH3J0X36NM6Y7Z+OR+uid:TDAASAPFOY8EKZ+OR+uid:TDAASAPMOVIV5Z+OR+uid:TDAASAPI7JPNUZ+OR+uid:TDAASAPHV0UKXZ+OR+uid:TDAASAPFNE1HLZ+OR+uid:TDAASAPLVL68OZ+OR+uid:TDAASAPMLS2YXZ+OR+uid:TMOHS9OKD987QZ+OR+uid:TKKT1AL3XKSUWK4Z+OR+uid:TDAASAPEK2QUWZ+OR+uid:TDAASAPEL75NWZ+OR+uid:TDAASAPF8SZJSZ+OR+uid:TDAASAPBBDB7LZ+OR+uid:TKKT1AL3YGBBT6WZ+OR+uid:TKKT1AL3ZC37N63Z+OR+uid:TERG1F6W6ULALO6Z+OR+uid:TERG1F6W6V16EJOZ+OR+uid:TDAASAPF2MO5CZ+OR+uid:TCYY2BG5PF8FQEYZ+OR+uid:TCYY2BG5QA8QLLMZ+OR+uid:TCYY2BG5R1YBCSAZ+OR+uid:TERG1F6W6V1P63TZ+OR+uid:TERG1F6W6VDJOHPZ+OR+uid:TERG1F6W6VP70CYZ+OR+uid:TERG1F6W6WX5D8KZ+OR+uid:TCYY2BG5REQ3IJHZ+OR+uid:TCYY2BG5RRFVUGDZ+OR+uid:TDAASAPDZ31GKZ+OR+uid:TDAASAPH1HNF1Z+OR+uid:TERG1F6W6XQ5UWYZ+OR+uid:TERG1F6W71MQDF5Z+OR+uid:TERG1F6W736SVVJZ+OR+uid:TRNG1F6W9NO8IW7Z+OR+uid:TDAASAPCC56WXZ+OR+uid:TDA
ASAPE9IZZ0Z+OR+uid:TDAASAPHVBD96Z+OR+uid:TDAASAPIDRAJ6Z+OR+uid:TRNG1F6W9NR1AJNZ+OR+uid:TRNG1F6W9O4U8GRZ+OR+uid:TRNG1F6W9OJE1CJZ+OR+uid:TRNG1F6W9PQJQGUZ+OR+uid:TDPYS9W9CZH9OZ+OR+uid:TSBNSB3TFCCEJZ+OR+uid:TMOUSB8BSI8VZZ+OR+uid:THDPVN9E24NKWZ+OR+uid:TRNG1F6W9PW0F9IZ+OR+uid:TRNGTB9IGWYVVZ+OR+uid:TWAT2FNTMY5NML5Z+OR+uid:TWAT2FNTMY5S3JAZ+OR+uid:THDPVN9DMRA5PZ+OR+uid:TKEISA5KVQB3SZ+OR+uid:TLTB2HENA46KWL8Z+OR+uid:TLTBSALTEXOSKZ+OR+uid:TWAT2FNTONX41SZZ+OR+uid:TWAT2FNTOODV5F8Z+OR+uid:TWAT2FNTOOLD7WKZ
>
> {
>   "error":{
> "msg":"java.lang.StackOverflowError",
> "trace":"java.lang.RuntimeException:
> java.lang.StackOverflowError\n\tat
> org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:662)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>
> 

Re: Lucene-Solr project split

2020-06-09 Thread Dominique Bejean
Thank you.

Dominique

Le mar. 9 juin 2020 à 15:18, Ilan Ginzburg  a écrit :

> See also https://issues.apache.org/jira/browse/SOLR-14521
>
> On Tue, Jun 9, 2020 at 3:17 PM Ilan Ginzburg  wrote:
>
>> Yes.
>>
>>
>> https://lists.apache.org/thread.html/raab13cabe321d12b6cda7dc6e529176f51ece31d30f00997dd36570a%40%3Cdev.lucene.apache.org%3E
>>
>> Ilan
>>
>> On Tue, Jun 9, 2020 at 3:10 PM Dominique Bejean <
>> dominique.bej...@eolya.fr> wrote:
>>
>>> Hi,
>>>
>>> One of my clients claims that the Lucene-Solr project will split into two
>>> separate projects after a vote of the community. I cannot find any trace
>>> of
>>> discussions on this subject. Is it true ?
>>>
>>> Regards.
>>>
>>> Dominique
>>>
>>


Re: Solrcloud 6.6 becomes nuts

2020-06-09 Thread Dominique Bejean
Hi,

We had the problem again a few days ago.

I have noticed that each time the problem occurs, the old generation of the
heap suddenly grows. Its size is generally between 0.5 and 1.5 GB with a 3 GB
limit. In 4 minutes the old generation grows to 3 GB and never goes down, as
consecutive GCs reclaim 0 bytes.

I have a heap dump from a previous occurrence. Nearly 1.7 GB of memory is an
array of 6 BoostQuery objects, which themselves include a huge array of
PhraseQuery$PhraseWeight objects.

I also noticed that a few queries (nearly 10 per minute) use the parameter
"debug=track=timing". I have asked the customer not to enable debug in
production. Can query debugging cause such high memory usage?

Regards.

Dominique




Le lun. 18 mai 2020 à 09:42, Dominique Bejean  a
écrit :

> Hi Shawn,
>
> In fact, I was using logs from a core at WARN log level so with only slow
> queries (>500ms).
>
> I just checked in a core at INFO log level with all queries (we set the
> log level top INFO for one core after the previous crash) and there is no
> more queries with these two facets when the problem starts. There are
> nearly 150 queries per minute faceting with the 750K unique terms fields
> during the 3 hours before the problem occurs and no increase during the few
> minutes before and when the problem starts.
>
> I can't see anything specific in logs at the time the problem start.
>
> Regards
>
> Dominique
>
>
>
>
> Le lun. 18 mai 2020 à 03:28, Shawn Heisey  a écrit :
>
>> On 5/17/2020 4:18 PM, Dominique Bejean wrote:
>> > I was not thinking that queries using facet with fields with high number
>> > of unique value but with low hits count can be the origin of this
>> problem.
>>
>> Performance for most things does not depend on numFound (hit count) or
>> the rows parameter.  The number of terms in the field and the total
>> number of documents in the index matters a lot more.
>>
>> If you do facets or grouping on a field with 750K unique terms, it's
>> going to be very slow and require a LOT of memory.  I would not be
>> surprised to see it require more than 4GB.  These features are designed
>> to work best with fields that have a relatively small number of possible
>> values.
>>
>> Thanks,
>> Shawn
>>
>


Lucene-Solr project split

2020-06-09 Thread Dominique Bejean
Hi,

One of my clients claims that the Lucene-Solr project will split into two
separate projects after a vote of the community. I cannot find any trace of
discussions on this subject. Is it true ?

Regards.

Dominique


Re: Solrcloud 6.6 becomes nuts

2020-05-18 Thread Dominique Bejean
Hi Shawn,

In fact, I was using logs from a core at WARN log level so with only slow
queries (>500ms).

I just checked in a core at INFO log level with all queries (we set the log
level to INFO for one core after the previous crash) and there are no more
queries with these two facets when the problem starts. There are nearly 150
queries per minute faceting on the 750K unique terms field during the 3
hours before the problem occurs, and no increase during the few minutes
before and when the problem starts.

I can't see anything specific in the logs at the time the problem starts.

Regards

Dominique




Le lun. 18 mai 2020 à 03:28, Shawn Heisey  a écrit :

> On 5/17/2020 4:18 PM, Dominique Bejean wrote:
> > I was not thinking that queries using facet with fields with high number
> > of unique value but with low hits count can be the origin of this
> problem.
>
> Performance for most things does not depend on numFound (hit count) or
> the rows parameter.  The number of terms in the field and the total
> number of documents in the index matters a lot more.
>
> If you do facets or grouping on a field with 750K unique terms, it's
> going to be very slow and require a LOT of memory.  I would not be
> surprised to see it require more than 4GB.  These features are designed
> to work best with fields that have a relatively small number of possible
> values.
>
> Thanks,
> Shawn
>


Re: Solrcloud 6.6 becomes nuts

2020-05-17 Thread Dominique Bejean
Mikhail,


Thank you for your response.


--- For the logs

On the non-leader replicas, there are no errors in the logs, only WARNs due
to slow queries.

On the leader replica, there are these errors:

* Twice per minute during the whole day before the problem starts, and also
after the problem starts:
RequestHandlerBase org.apache.solr.common.SolrException: Collection: xx
not found
where xx is the alias name pointing to the collection

* Just after the problem starts:
2020-05-13 15:24:41.450 ERROR (qtp1682092198-315202) [c:xx_2 s:shard3
r:core_node1 x:xx_2_shard3_replica0] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:[
http://XX127:8983/solr/xx_2_shard1_replica1,
http://XX132:8983/solr/xx_2_shard2_replica0]
2020-05-13 15:24:41.451 ERROR (qtp1682092198-315202) [c:xx_2 s:shard3
r:core_node1 x:xx_2_shard3_replica0] o.a.s.s.HttpSolrCall
null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:[
http://XX127:8983/solr/xx_2_shard1_replica1,
http://XX132:8983/solr/xx_2_shard2_replica0]

2020-05-13 15:25:49.642 ERROR (qtp1682092198-315193) [c:xx_2 s:shard3
r:core_node1 x:xx_2_shard3_replica0] o.a.s.s.HttpSolrCall
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 51815/5 ms

and later until the JVM hangs
2020-05-13 15:58:54.397 ERROR (qtp1682092198-316314) [c:xx_2 s:shard3
r:core_node1 x:xx_2_shard3_replica0] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: no servers hosting shard:
xx_2_shard2

No OOM errors in the Solr logs, just the OOM killer script's log:
Running OOM killer script for process 4488 for Solr on port 8983
Killed process 4488


--- For heap dump

I have a dump for one shard leader, taken just before the OOM script killed
the JVM but more than one hour after the problem started. I will take a look.

Regards.

Dominique










Le dim. 17 mai 2020 à 20:22, Mikhail Khludnev  a écrit :

> Hello, Dominique.
> What did it log? Which exception?
> Did you have a chance to review a heap dump? What consumed the whole heap?
>
> On Sun, May 17, 2020 at 11:05 AM Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
>
>> Hi,
>>
>> I have a six-node SolrCloud that suddenly had all six nodes fail with OOM
>> at the same time.
>> This can happen even when the SolrCloud is not under heavy load and there
>> is no indexing.
>>
>> I do not see any reason for this to happen. Here is a description of the
>> issue. Thank you for your suggestions and advice.
>>
>>
>> One or two hours before the nodes stop with OOM, we see this scenario on
>> all six nodes during the same five minutes time frame :
>> * a little bit more young gc : from one each second (duration<0.05secs) to
>> one each two or three seconds (duration <0.15 sec)
>> * full gc start occurs each 5sec with 0 bytes reclaimed
>> * young gc start reclaim less bytes
>> * long full gc start reclaim bytes but with less and less reclaimed bytes
>> * then no more young GC
>> Here are GC graphs : https://www.eolya.fr/solr_issue_gc.png
>>
>>
>> Just before the problem occurs :
>> * there is no more requests per seconds
>> * no update/commit/merge
>> * CPU usage and load are low
>> * disk I/O are low
>> After the problem starts, requests become longer and longer but still no
>> increase of CPU usage or disk I/O
>>
>>
>> During last issue, we dumped the threads on one node just before OOM but
>> unfortunately, more than one hour after the problem starts.
>> 85% of threads (more than 3000) are BLOCKED and related to log4j
>> Solr either try to log slow query or try to log problems in requesthandler
>> at org.apache.solr.common.SolrException.log(SolrException.java:148)
>> at
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:204)
>>
>> This high count of BLOCKED threads is more a consequence than a cause. We
>> will dump threads each minute until the next issue.
>>
>>
>> About Solr environment :
>> * Solr 6.6
>> * Java Oracle 1.8.0_112 25.112-b15
>>
>> * 1 collection with 10 millions small documents
>> * 3 shards x 2 replicas
>> * 3.5 millions docs per core
>> * 90 Gb index size per core
>>
>> * Server with 6 processors and 90 Gb of RAM
>> * Swappiness set to 1, nearly no swap used
>> * 4Gb Heap used nearly between 25 to 60% before young GC and one full GC
>> (3
>> seconds) each 15 to 30 minutes when all is fine.
>>
>> * Default JVM settin

Re: Solrcloud 6.6 becomes nuts

2020-05-17 Thread Dominique Bejean
Hi Shawn,

There is no OOM error in the logs. I gave more details in my response to Mikhail.

The problem starts with full GCs near 15:20, but young GC behavior changed a
little starting at 15:10.
Here is the heap usage before and after GC during this period:
https://www.eolya.fr/solr_issue_heap_before_after.png

There is no grouping but there is faceting.
The collection contains 10,000,000 documents.

2 fields contain 60,000 and 750,000 unique values respectively.

These two fields were used in queries for faceting 1 to 10 times per hour
before the problem started.
They are used a lot during the 20 minutes after the problem starts:
* 50 times for the field with 750,000 unique values
* 250 times for the field with 60,000 unique values

The hit counts for these queries are mainly under 10, and a couple of times
between 100 and 1000.
Once, the hit count is 2000 for the field with 60,000 unique values.

On the other hand, these queries are very long.

We will investigate this!

I was not expecting that queries faceting on fields with a high number of
unique values, but with low hit counts, could be the origin of this problem.
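To make this concrete, the kind of request I mean looks roughly like this with
SolrJ (just a sketch; the collection and field names are placeholders). The
facet cost comes from the ~750,000 terms in the field, even when numFound is
tiny:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetProbe {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
          SolrQuery q = new SolrQuery("some rare query");  // matches only a few documents
          q.setRows(0);
          q.setFacet(true);
          q.addFacetField("high_cardinality_field");       // ~750K unique terms
          q.setFacetLimit(10);
          QueryResponse rsp = client.query(q);
          // The returned counts are small, but computing them depends on the
          // field's term count, not on how many documents matched.
          System.out.println(rsp.getFacetField("high_cardinality_field").getValues());
        }
      }
    }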


Regards

Dominique







Le dim. 17 mai 2020 à 21:45, Shawn Heisey  a écrit :

> On 5/17/2020 2:05 AM, Dominique Bejean wrote:
> > One or two hours before the nodes stop with OOM, we see this scenario on
> > all six nodes during the same five minutes time frame :
> > * a little bit more young gc : from one each second (duration<0.05secs)
> to
> > one each two or three seconds (duration <0.15 sec)
> > * full gc start occurs each 5sec with 0 bytes reclaimed
> > * young gc start reclaim less bytes
> > * long full gc start reclaim bytes but with less and less reclaimed bytes
> > * then no more young GC
> > Here are GC graphs : https://www.eolya.fr/solr_issue_gc.png
>
> Do you have the OutOfMemoryException in the solr log?  From the graph
> you provided, it does look likely that it was heap memory on the OOME,
> I'd just like to be sure, by seeing the logged exception.
>
> Between 15:00 and 15:30, something happened which suddenly required
> additional heap memory.  Do you have any idea what that was?  If you can
> zoom in on the graph, you could get a more accurate time for this.  I am
> looking specifically at the "heap usage before GC" graph.  The "heap
> usage after GC" graph that gceasy makes, which has not been included
> here, is potentially more useful.
>
> I found that I most frequently ran into memory problems when I executed
> a data mining query -- doing facets or grouping on a high cardinality
> field, for example.  Those kinds of queries required a LOT of extra memory.
>
> If the servers have any memory left, you might need to increase the max
> heap beyond where it currently sits.  To handle your indexes and
> queries, Solr may simply require more memory than you have allowed.
>
> Thanks,
> Shawn
>


Solrcloud 6.6 becomes nuts

2020-05-17 Thread Dominique Bejean
Hi,

I have a six-node SolrCloud that suddenly had all six nodes fail with OOM
at the same time.
This can happen even when the SolrCloud is not under heavy load and there
is no indexing.

I do not see any reason for this to happen. Here is a description of the
issue. Thank you for your suggestions and advice.


One or two hours before the nodes stop with OOM, we see this scenario on
all six nodes during the same five-minute time frame:
* slightly more frequent young GCs: from one each second (duration < 0.05 sec)
to one every two or three seconds (duration < 0.15 sec)
* full GCs start occurring every 5 sec with 0 bytes reclaimed
* young GCs start reclaiming fewer bytes
* long full GCs reclaim bytes, but less and less each time
* then no more young GCs
Here are the GC graphs: https://www.eolya.fr/solr_issue_gc.png


Just before the problem occurs:
* there is no increase in requests per second
* no update/commit/merge
* CPU usage and load are low
* disk I/O is low
After the problem starts, requests become longer and longer, but still with no
increase in CPU usage or disk I/O.


During the last issue, we dumped the threads on one node just before the OOM,
but unfortunately more than one hour after the problem started.
85% of the threads (more than 3000) are BLOCKED and related to log4j.
Solr is either trying to log a slow query or trying to log problems in a
request handler:
at org.apache.solr.common.SolrException.log(SolrException.java:148)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:204)

This high count of BLOCKED threads is more a consequence than a cause. We
will dump threads every minute until the next issue.
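(Most likely with something as simple as the loop below on each node; the pid
is of course a placeholder:)

    SOLR_PID=4488   # placeholder
    while true; do jstack $SOLR_PID > /var/tmp/threads-$(date +%s).txt; sleep 60; done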


About Solr environment :
* Solr 6.6
* Java Oracle 1.8.0_112 25.112-b15

* 1 collection with 10 million small documents
* 3 shards x 2 replicas
* 3.5 million docs per core
* 90 GB index size per core

* Server with 6 processors and 90 GB of RAM
* Swappiness set to 1, nearly no swap used
* 4 GB heap, usage roughly between 25% and 60% before young GC, and one full GC
(3 seconds) every 15 to 30 minutes when all is fine.

* Default JVM settings with CMS GC
* JMX enabled
* Average requests per second at peak on one core: 170, but during the last
issue the average requests per second was 30!!!
* Average time per request: < 30 ms

About updates :
* Very few add/updates in general
* Some deleteByQuery (nearly 2000 per day) but not before the problem occurs
* autocommit maxTime:15000ms

About queries :
* Queries are standard queries or suggesters
* Queries generate facets, but there are no fields with a very high number of
unique values
* No grouping
* High usage of function queries for relevance computation


Thank you.

Dominique


Re: Solr Cloud on Docker?

2020-02-05 Thread Dominique Bejean
Thank you Dwane. Great info :)


Le mer. 5 févr. 2020 à 11:49, Dwane Hall  a écrit :

> Hey Dominique,
>
> From a memory management perspective I don't do any container resource
> limiting specifically in Docker (although as you mention you certainly
> can).  In our circumstances these hosts are used specifically for Solr so I
> planned and tested my capacity beforehand. We have ~768G of RAM on each of
> these 5 hosts so with 20x16G heaps we had ~320G of heap being used by Solr,
> some overhead for Docker and the other OS services leaving ~400G for the OS
> cache and whatever wants to grab it on each host. Not everyone will have
> servers this large which is why we really had to take advantage of multiple
> Solr instances/host and Docker became important for our cluster operation
> management.  Our disk's are not SSD's either and all instances write to the
> same raid 5 spinner which is bind mounted to the containers.  With this
> configuration we've been able to achieve consistent median response times
> of under 500ms across the largest collection but obviously query type
> varies this (no terms, leading wildcards etc.).  Our QPS is not huge
> ranging from 2-20/sec but if we need to scale further or speed up response
> times there's certainly wins that can be made at a disk level.  For our
> current circumstances we're very content with the deployment.
>
> I'm not sure if you've read Toke's blog on his experiences at the Royal
> Danish Library but I found it really useful when capacity planning and
> recommend reading it (
> https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
> ).
>
> As always it's recommended to test for your own conditions, and best of luck
> with your deployment!
>
> Dwane
>
> --
> *From:* Scott Stults 
> *Sent:* Thursday, 30 January 2020 1:45 AM
> *To:* solr-user@lucene.apache.org 
> *Subject:* Re: Solr Cloud on Docker?
>
> One of our clients has been running a big Solr Cloud (100-ish nodes, TB
> index, billions of docs) in kubernetes for over a year and it's been
> wonderful. I think during that time the biggest scrapes we got were when we
> ran out of disk space. Performance and reliability has been solid
> otherwise. Like Dwane alluded to, a lot of operations pitfalls can be
> avoided if you do your Docker orchestration through kubernetes.
>
>
> k/r,
> Scott
>
> On Tue, Jan 28, 2020 at 3:34 AM Dominique Bejean <
> dominique.bej...@eolya.fr>
> wrote:
>
> > Hi  Dwane,
> >
> > Thank you for sharing this great solr/docker user story.
> >
> > According to your Solr/JVM memory requirements (Heap size + MetaSpace +
> > OffHeap size) are you specifying specific settings in docker-compose
> files
> > (mem_limit, mem_reservation, mem_swappiness, ...) ?
> > I suppose you are limiting total memory used by all dockerised Solr in
> > order to keep free memory on host for MMAPDirectory ?
> >
> > In short can you explain the memory management ?
> >
> > Regards
> >
> > Dominique
> >
> >
> >
> >
> > Le lun. 23 déc. 2019 à 00:17, Dwane Hall  a
> écrit :
> >
> > > Hey Walter,
> > >
> > > I recently migrated our Solr cluster to Docker and am very pleased I
> did
> > > so. We run relatively large servers and run multiple Solr instances per
> > > physical host and having managed Solr upgrades on bare metal installs
> > since
> > > Solr 5, containerisation has been a blessing (currently Solr 7.7.2). In
> > our
> > > case we run 20 Solr nodes per host over 5 hosts totalling 100 Solr
> > > instances. Here I host 3 collections of varying size. The first
> contains
> > > 60m docs (8 shards), the second 360m (12 shards) , and the third 1.3b
> (30
> > > shards) all with 2 NRT replicas. The docs are primarily database
> sourced
> > > but are not tiny by any means.
> > >
> > > Here are some of my comments from our migration journey:
> > > - Running Solr on Docker should be no different to bare metal. You
> still
> > > need to test for your environment and conditions and follow the guides
> > and
> > > best practices outlined in the excellent Lucidworks blog post
> > >
> >
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > > .
> > > - The recent Solr Docker images are built with Java 11 so if you store
> > > your indexes in hdfs you'll have to build your own Docker image as
> Hadoop
> > > is not yet certified with Java 11 (or use an older Solr version image
> > built
> > >

Re: Solr Cloud on Docker?

2020-01-28 Thread Dominique Bejean
Hi  Dwane,

Thank you for sharing this great solr/docker user story.

Given your Solr/JVM memory requirements (heap size + Metaspace + off-heap
size), are you specifying specific settings in the docker-compose files
(mem_limit, mem_reservation, mem_swappiness, ...)?
I suppose you are limiting the total memory used by all dockerised Solr
instances in order to keep free memory on the host for MMapDirectory?

In short, can you explain the memory management?
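(To be explicit, I mean settings along these lines in a version 2 compose file;
the values are only an illustration, not a recommendation:)

    solr1:
      image: solr:7.7.2
      environment:
        - SOLR_HEAP=16g
      mem_limit: 20g
      mem_reservation: 20g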

Regards

Dominique




Le lun. 23 déc. 2019 à 00:17, Dwane Hall  a écrit :

> Hey Walter,
>
> I recently migrated our Solr cluster to Docker and am very pleased I did
> so. We run relatively large servers and run multiple Solr instances per
> physical host and having managed Solr upgrades on bare metal installs since
> Solr 5, containerisation has been a blessing (currently Solr 7.7.2). In our
> case we run 20 Solr nodes per host over 5 hosts totalling 100 Solr
> instances. Here I host 3 collections of varying size. The first contains
> 60m docs (8 shards), the second 360m (12 shards) , and the third 1.3b (30
> shards) all with 2 NRT replicas. The docs are primarily database sourced
> but are not tiny by any means.
>
> Here are some of my comments from our migration journey:
> - Running Solr on Docker should be no different to bare metal. You still
> need to test for your environment and conditions and follow the guides and
> best practices outlined in the excellent Lucidworks blog post
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> .
> - The recent Solr Docker images are built with Java 11 so if you store
> your indexes in hdfs you'll have to build your own Docker image as Hadoop
> is not yet certified with Java 11 (or use an older Solr version image built
> with Java 8)
> - As Docker will be responsible for quite a few Solr nodes it becomes
> important to make sure the Docker daemon is configured in systemctl to
> restart after failure or reboot of the host. Additionally the Docker
> restart=always setting is useful for restarting failed containers
> automatically if a single container dies (i.e. JVM explosions). I've
> deliberately blown up the JVM in test conditions and found the
> containers/Solr recover really well under Docker.
> - I use Docker Compose to spin up our environment and it has been
> excellent for maintaining consistent settings across Solr nodes and hosts.
> Additionally using a .env file makes most of the Solr environment variables
> per node configurable in an external file.
> - I'd recommend Docker Swarm if you plan on running Solr over multiple
> physical hosts. Unfortunately we had an incompatible OS so I was unable to
> utilise this approach. The same incompatibility existed for K8s but
> Lucidworks has another great article on this approach if you're more
> fortunate with your environment than us
> https://lucidworks.com/post/running-solr-on-kubernetes-part-1/.
> - Our Solr instances are TLS secured and use the basic auth plugin and
> rules based authentication provider. There's nothing I have not been able
> to configure with the default Docker images using environment variables
> passed into the container. This makes upgrades to Solr versions really easy
> as you just need to grab the image and pass in your environment details to
> the container for any new Solr version.
> - If possible I'd start with the Solr 8 Docker image. The project
> underwent a large refactor to align it with the install script based on
> community feedback. If you start with an earlier version you'll need to
> refactor when you eventually move to Solr version 8. The Solr Docker page
> has more details on this.
> - Matijn Koster (the project lead) is excellent and very responsive to
> questions on the project page. Read through the Q&A page before reaching
> out; I found a lot of my questions already answered there.  Additionally, he
> provides a number of example Docker configurations from command line
> parameters to docker-compose files running multiple instances and zookeeper
> quorums.
> - The Docker extra hosts parameter is useful for adding extra hosts to
> your containers hosts file particularly if you have multiple nic cards with
> internal and external interfaces and you want to force communication over a
> specific one.
> - We use the Solr Prometheus exporter to collect node metrics. I've found
> I've needed to reduce the metrics to collect, as having this many nodes
> overwhelmed it occasionally. From memory it had something to do with
> concurrent modification of the Future objects the collector uses, and it
> sometimes misses collection cycles. This is not Docker related but Solr
> size related and the exporter's ability to handle it.
> - We use the zkCli script a lot for updating configsets. As I did not want
> to have to copy them into a container to update them I just download a copy
> of the Solr binaries and use it entirely for this zookeeper script. It's
> not elegant but a number of our Dev's are 

Re: Convert TLOG collection to NRT

2019-12-10 Thread Dominique Bejean
Thank you Shawn.
You're right!
It is better to read the correct version of the Collection API documentation.
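For the record, the replica-by-replica swap described below maps to Collections
API calls like these (a sketch only; collection, shard and core_node names are
placeholders, and the ADDREPLICA is run async so a long index copy does not
time out):

    /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&type=nrt&async=add-s1
    /admin/collections?action=REQUESTSTATUS&requestid=add-s1
    /admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3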


Le mar. 10 déc. 2019 à 19:49, Shawn Heisey  a écrit :

> On 12/10/2019 11:25 AM, Dominique Bejean wrote:
> > I would like to convert a collection (3 shards x 3 replicas) from TLOG to
> > NRT.
> >
> > The only solution I imagine is something like :
> > * with collection API, remove replicas in order to keep only 1 replica
> per
> > 3 shard
> > * update the collection state.json in zookeeper
> > * with collection API, reload the collection
> > * with collection API, add 2 replicas per shard
>
> I have not actually done this, but based on my understanding of how the
> collections API functions, you could just run ADDREPLICA to create a new
> NRT replica on the desired host, then DELETEREPLICA to remove the TLOG
> replica that it replaces.  Repeat those two steps for each one you want
> to convert.  I don't think reloading would be required, but might be
> something you want to do after you're all done with those operations.
> If you expect copying the shard to the new replica to take longer than 3
> minutes, you should do the ADDREPLICA operations as async requests.
>
> Modifying data in zookeeper directly is an expert option, not something
> you would want to do unless you've got a very deep understanding of
> SolrCloud code.  It could leave your setup in a state that's difficult
> to fix.
>
> Thanks,
> Shawn
>


Convert TLOG collection to NRT

2019-12-10 Thread Dominique Bejean
Hi,

I would like to convert a collection (3 shards x 3 replicas) from TLOG to
NRT.

The only solution I imagine is something like :
* with the Collections API, remove replicas in order to keep only 1 replica
for each of the 3 shards
* update the collection's state.json in ZooKeeper
* with the Collections API, reload the collection
* with the Collections API, add 2 replicas per shard

Is there a simpler solution ?

Regards

Dominique


Re: Zk upconfig command is appending local directory to default confdir

2019-11-18 Thread Dominique Bejean
Hi Michael,

It seems Solr really doesn't find any solrconfig.xml file or a
conf/solrconfig.xml file in the local path you specified. The last try is
to look in "/opt/solr-6.5.1/server/solr/configsets/", but obviously it doesn't work as you didn't specify a
configset name.

The code is here -
https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/solrj/src/java/org/apache/solr/common/cloud/ZkConfigManager.java#L181


Any error in the read access rights on your config directory?
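For reference, a working invocation usually looks like this (the paths and
names are only examples):

    bin/solr zk upconfig -z localhost:2181 -n myconfig -d /full/path/to/myconfig/conf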

Regards

Dominique



Le lun. 18 nov. 2019 à 15:48, Michael Becker  a écrit :

> I’ve run into an issue when attempting to configure Zookeeper. When
> running the zk upconfig -d command specifying a local directory where the
> solrconfig.xml files are located, I get the following error:
> “Could not complete upconfig operation for reason: Could not find
> solrconfig.xml at /opt/solr-6.5.1/server/solr/configsets/solrconfig.xml,
> /opt/solr-6.5.1/server/solr/configsets/conf/solrconfig.xml or
> /opt/solr-6.5.1/server/solr/configsets/ 
> /solrconfig.xml”
>
> I’m trying to determine why the solr zk upconfig command is appending my
> local directory to the default confdir, rather than looking for the XML
> files in that directory,
> I have two other environments with Solr where this does not occur. It’s
> just this one environment that is having this issue.
> I am using Solr version 6.5.1.
> Any suggestions on how to troubleshoot this would be appreciated.
>
> Mike
>


Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Dominique Bejean
Thank you, Shawn


Le lun. 18 nov. 2019 à 19:28, Shawn Heisey  a écrit :

> On 11/18/2019 8:39 AM, Dominique Bejean wrote:
> > How Solr nodes know that something was changed in Zookeeper by an other
> > node ? Is there any notification from ZK or do Solr nodes read
> > systematically in ZK (without local caching) ?
>
> This is built-in functionality of ZooKeeper.  The client allows setting
> what's called watches, which trigger when the watched node changes.
>
>
> https://zookeeper.apache.org/doc/r3.5.6/zookeeperProgrammers.html#sc_zkDataMode_watches
>
> This functionality is used extensively in SolrCloud.
>
> Thanks,
> Shawn
>
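As an illustration, a minimal watch with the plain ZooKeeper client looks like
this (a sketch only, not how Solr registers its watches internally; the znode
path is just an example):

    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class WatchExample {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});
        // Read a znode and ask to be notified on its next change.
        Watcher watcher = event ->
            System.out.println("changed: " + event.getPath() + " " + event.getType());
        byte[] data = zk.getData("/collections/mycollection/state.json", watcher, null);
        System.out.println("read " + data.length + " bytes");
        // When another node updates the znode, the watcher fires once;
        // the client then re-reads the data and re-registers the watch.
        Thread.sleep(60000);
        zk.close();
      }
    }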


Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Dominique Bejean
How do Solr nodes know that something was changed in ZooKeeper by another
node? Is there any notification from ZK, or do Solr nodes systematically read
from ZK (without local caching)?

Dominique



Le ven. 15 nov. 2019 à 18:36, Erick Erickson  a
écrit :

> Dominique:
>
> In a word, “yes”. You’ve got it. A common misunderstanding is that ZK is
> actively involved in queries/updates/whatever. Basically, what ZK is
> responsible for is maintaining collection-wide resources, i.e. the current
> state of all the replicas, config files, etc., your “global configuration"
> and "collection configuration”, which should change very rarely thus rarely
> generate writes.
>
> The “collection state” (including your “nodes state”) information changes
> more frequently and generates more writes as nodes come up and down, go
> into recovery, etc. That said, for a cluster where all the replicas are
> “active” and don’t go away or go into recovery etc, ZK won’t do any writes.
>
> So the consequence is that when you power up a cluster, there will be a
> flurry of write operations managed by the Overseer, but after all the
> replicas are up, write activity should pretty much cease.
>
> As long as the state is steady, i.e. no replicas changing state, each
> individual Solr node has a copy of the relevant collection’s “state.json”
> znode and has all the information it needs to query or index without asking
> Zookeeper without _either_ reading or writing to ZK.
>
> One rather obscure cause for ZK writes is when using “schemaless” mode.
> When a new field is detected, the schema (and thus the collection’s
> configuration) is changed, which generates writes..
>
> Best,
> Erick
>
>
> > On Nov 15, 2019, at 12:06 PM, Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
> >
> > Hi,
> >
> > I would like to be certain to understand how Solr use Zookeeper and more
> > precisely when Solr write into Zookeeper.
> >
> > Solr stores various informations in ZK
> >
> >   - globale configuration (autoscaling, security.json)
> >   - collection configuration (configs)
> >   - collections state (state.json, leaders, ...)
> >   - nodes state (live_nodes, overseer)
> >
> >
> > Writes in Zk occur when
> >
> >   - a zookeeper member start or stop
> >   - a solr node start or stop
> >   - a configuration is loaded
> >   - a collection is created, deleted or updated (nearly all call to
> >   collection, core or config API)
> >
> >
> > Write do not occur during
> >
> >   - SolrJ client creation
> >   - indexing data (Solrj, HTTP, DIH, ...)
> >   - searching (Solrj, HTTP)
> >
> >
> > In conclusion, if Solr nodes are stable (no failure, no maintenance), no
> > calls to  collection, core or config API are done, so there is nearly no
> > writes to ZK.
> >
> > Is it correct ?
> >
> >
> > Regards
> >
> > Dominique
>
>


Re: $deleteDocByQuery and $deleteDocByID

2019-11-15 Thread Dominique Bejean
Hi Paresh,

Due to the impact of deleteByQuery on commits and searcher reopening, if a
lot of deletions are done it is preferable, when possible, to use deleteById.
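For example with SolrJ, assuming the values coming from the RDBMS are the
documents' uniqueKey (otherwise you would first query Solr to resolve the
matching ids), a sketch would be:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class DeleteExample {
      public static void main(String[] args) throws Exception {
        // Placeholder values, e.g. from: select columnName from Table where state = 'deleted'
        List<String> idsFromDatabase = Arrays.asList("doc-1", "doc-2", "doc-3");
        try (HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
          client.deleteById(idsFromDatabase);  // cheap per-document deletes
          // client.deleteByQuery("SolrColumnName:(doc-1 OR doc-2 OR doc-3)"); // heavier alternative
          client.commit();
        }
      }
    }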

Regards

Dominique

Le mar. 12 nov. 2019 à 07:03, Paresh  a écrit :

> Hi Erik,
>
> I am also looking for some example of deleteDocByQuery. Here is my
> requirement -
>
> I want to do the database query and get the list of values of which
> matching
> documents should be deleted from Solr.
>
> I want to delete the docs which matches following query
> SolrColumnName:
>
> This  will come from query executed on RDBMS -
> select columnName from Table where state = 'deleted'
>
> This columnName is the value populated in Solr for SolrColulmnName.
>
> Regards,
> Paresh
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
Thank you Erick for this fast answer.
Why is it a best practice to set the ZooKeeper connection timeout to 30000
instead of the default 15000 value?
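(For context, the value I am referring to is zkClientTimeout; as far as I know
it can be set through solr.in.sh, for example:)

    ZK_CLIENT_TIMEOUT=30000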

Regards

Dominique

Le ven. 15 nov. 2019 à 18:36, Erick Erickson  a
écrit :

> Dominique:
>
> In a word, “yes”. You’ve got it. A common misunderstanding is that ZK is
> actively involved in queries/updates/whatever. Basically, what ZK is
> responsible for is maintaining collection-wide resources, i.e. the current
> state of all the replicas, config files, etc., your “global configuration"
> and "collection configuration”, which should change very rarely thus rarely
> generate writes.
>
> The “collection state” (including your “nodes state”) information changes
> more frequently and generates more writes as nodes come up and down, go
> into recovery, etc. That said, for a cluster where all the replicas are
> “active” and don’t go away or go into recovery etc, ZK won’t do any writes.
>
> So the consequence is that when you power up a cluster, there will be a
> flurry of write operations managed by the Overseer, but after all the
> replicas are up, write activity should pretty much cease.
>
> As long as the state is steady, i.e. no replicas changing state, each
> individual Solr node has a copy of the relevant collection’s “state.json”
> znode and has all the information it needs to query or index without asking
> Zookeeper without _either_ reading or writing to ZK.
>
> One rather obscure cause for ZK writes is when using “schemaless” mode.
> When a new field is detected, the schema (and thus the collection’s
> configuration) is changed, which generates writes..
>
> Best,
> Erick
>
>
> > On Nov 15, 2019, at 12:06 PM, Dominique Bejean <
> dominique.bej...@eolya.fr> wrote:
> >
> > Hi,
> >
> > I would like to be certain to understand how Solr use Zookeeper and more
> > precisely when Solr write into Zookeeper.
> >
> > Solr stores various informations in ZK
> >
> >   - globale configuration (autoscaling, security.json)
> >   - collection configuration (configs)
> >   - collections state (state.json, leaders, ...)
> >   - nodes state (live_nodes, overseer)
> >
> >
> > Writes in Zk occur when
> >
> >   - a zookeeper member start or stop
> >   - a solr node start or stop
> >   - a configuration is loaded
> >   - a collection is created, deleted or updated (nearly all call to
> >   collection, core or config API)
> >
> >
> > Write do not occur during
> >
> >   - SolrJ client creation
> >   - indexing data (Solrj, HTTP, DIH, ...)
> >   - searching (Solrj, HTTP)
> >
> >
> > In conclusion, if Solr nodes are stable (no failure, no maintenance), no
> > calls to  collection, core or config API are done, so there is nearly no
> > writes to ZK.
> >
> > Is it correct ?
> >
> >
> > Regards
> >
> > Dominique
>
>


When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
Hi,

I would like to be certain that I understand how Solr uses ZooKeeper and,
more precisely, when Solr writes into ZooKeeper.

Solr stores various kinds of information in ZK (these znodes can be listed
with the command shown after this list):

   - global configuration (autoscaling, security.json)
   - collection configuration (configs)
   - collections state (state.json, leaders, ...)
   - nodes state (live_nodes, overseer)
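(A sketch of how to look at these znodes with the zk tooling shipped with
Solr; host and port are examples:)

    bin/solr zk ls -r /collections -z localhost:2181
    bin/solr zk ls /live_nodes -z localhost:2181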


Writes in ZK occur when:

   - a ZooKeeper member starts or stops
   - a Solr node starts or stops
   - a configuration is loaded
   - a collection is created, deleted or updated (nearly all calls to the
   collection, core or config API)


Writes do not occur during:

   - SolrJ client creation
   - indexing data (Solrj, HTTP, DIH, ...)
   - searching (Solrj, HTTP)


In conclusion, if the Solr nodes are stable (no failures, no maintenance) and
no calls to the collection, core or config APIs are made, then there are
nearly no writes to ZK.

Is it correct ?


Regards

Dominique


Re: NRT vs TLOG bulk indexing performances

2019-10-30 Thread Dominique Bejean
Hi,

Thank you Erick for your response.

My documents are small. Here is a sample csv file
http://gofile.me/2dlpH/66hv2NPhJ

In the TLOG case, the CPU is not hot and not idling

on leaders :

   - 1m load average between 1.5 and 2.5 (4 cpu cores)
   - CPU % between 20% and 50% with average at 30%
   - CPU I/O wait % average : 2.5


on followers :

   - 1m load average between 0.5 and 2.0 (4 cpu cores)
   - CPU % between 5% and 35% with average at 15%
   - CPU I/O wait % average : 2.0


I ran more tests. The difference is not always as big as in my first tests:

   - One shard, leader only, NRT or TLOG: 36 minutes
   - All NRT timing is between 23 and 27 minutes
   - All TLOG timing is between 28 and 34 minutes


I also changed the autoCommit maxTime from 15000 to 30000 in order to get
the 28 minutes in TLOG mode.

With one shard and no replicas, creating the collection as NRT or as TLOG
gives the same indexing time and the same CPU usage.

My impression is that using TLOG replicas produces a 10% to 20% indexing time
increase, depending on the autoCommit maxTime setting.

Regards

Dominique


Le ven. 25 oct. 2019 à 15:46, Erick Erickson  a
écrit :

> I’m also surprised that you see a slowdown, it’s worth investigating.
>
> Let’s take the NRT case with only a leader. I’ve seen the NRT indexing
> time increase when even a single follower was added (30-40% in this case).
> We believed that the issue was the time the leader sat waiting around for
> the follower to acknowledge receipt of the documents. Also note that these
> were very short documents.
>
> You’d still pay that price with more than one TLOG replica. But again, I’d
> expect the two times to be roughly equivalent.
>
> Indexing does not stop during index replication. That said, if you commit
> very frequently, you’ll be pushing lots of info around the network. Was
> your CPU running hot in the TLOG case or idling? If idling, then Solr isn’t
> getting fed fast enough. Perhaps there’s increased network traffic with the
> TLOG replicas replicating changed segments and that’s slowing down
> ingestion?
>
> It’d be interesting to index to NRT, leader-only and also a single TLOG
> collection.
>
>
> Best,
> Erick
>
> > On Oct 25, 2019, at 8:28 AM, Dominique Bejean 
> wrote:
> >
> > Shawn,
> >
> > So, I understand that while non leader TLOG is copying the index from
> > leader, the leader stop indexing.
> > One shot large heavy bulk indexing should be very much more impacted than
> > continus ligth indexing.
> >
> > Regards.
> >
> > Dominique
> >
> >
> > Le ven. 25 oct. 2019 à 13:54, Shawn Heisey  a
> écrit :
> >
> >> On 10/25/2019 1:16 AM, Dominique Bejean wrote:
> >>> For collection created with all replicas as NRT
> >>>
> >>> * Indexing time : 22 minutes
> >>
> >> 
> >>
> >>> For collection created with all replicas as TLOG
> >>>
> >>> * Indexing time : 34 minutes
> >>
> >> NRT indexes simultaneously on all replicas.  So when indexing is done on
> >> one, it is also done on all the others.
> >>
> >> PULL and non-leader TLOG replicas must copy the index from the leader.
> >> The leader will do the indexing and the other replicas will copy the
> >> completed index from the leader.  This takes time.  If the index is
> >> large, it can take a LOT of time, especially if the disks or network are
> >> slow.  TLOG replicas can become leader and PULL replicas cannot.
> >>
> >> What I would do personally is set two replicas for each shard to TLOG
> >> and all the rest to PULL.  When a TLOG replica is acting as leader, it
> >> will function exactly like an NRT replica.
> >>
> >>> The conclusion seems to be that by using TLOG :
> >>>
> >>> * You save CPU resources on non leaders nodes at index time
> >>> * The JVM Heap and GC are the same
> >>> * Indexing performance ares really less with TLOG
> >>
> >> Java works in such a way that it will always eventually allocate and use
> >> the entire max heap that it is allowed.  It is not always possible to
> >> determine how much heap is truly needed, though analyzing large GC logs
> >> will sometimes reveal that info.
> >>
> >> Non-leader replicas will probably require less heap if they are TLOG or
> >> PULL.  I cannot say how much less, that will be something that has to be
> >> determined.  Those replicas will also use less CPU.
> >>
> >> With newer Solr versions, you can ask SolrCloud to prefer PULL replicas
> >> for querying, so queries will be targeted to those replicas, unless they
> >> all go down, in which case it will go to non-preferred replica types.  I
> >> do not know how to do this, I only know that it is possible.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>
>


Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Shawn,

So, I understand that while a non-leader TLOG replica is copying the index
from the leader, the leader stops indexing.
One-shot large heavy bulk indexing should be much more impacted than
continuous light indexing.

Regards.

Dominique


Le ven. 25 oct. 2019 à 13:54, Shawn Heisey  a écrit :

> On 10/25/2019 1:16 AM, Dominique Bejean wrote:
> > For collection created with all replicas as NRT
> >
> > * Indexing time : 22 minutes
>
> 
>
> > For collection created with all replicas as TLOG
> >
> > * Indexing time : 34 minutes
>
> NRT indexes simultaneously on all replicas.  So when indexing is done on
> one, it is also done on all the others.
>
> PULL and non-leader TLOG replicas must copy the index from the leader.
> The leader will do the indexing and the other replicas will copy the
> completed index from the leader.  This takes time.  If the index is
> large, it can take a LOT of time, especially if the disks or network are
> slow.  TLOG replicas can become leader and PULL replicas cannot.
>
> What I would do personally is set two replicas for each shard to TLOG
> and all the rest to PULL.  When a TLOG replica is acting as leader, it
> will function exactly like an NRT replica.
>
> > The conclusion seems to be that by using TLOG :
> >
> > * You save CPU resources on non leaders nodes at index time
> > * The JVM Heap and GC are the same
> > * Indexing performance ares really less with TLOG
>
> Java works in such a way that it will always eventually allocate and use
> the entire max heap that it is allowed.  It is not always possible to
> determine how much heap is truly needed, though analyzing large GC logs
> will sometimes reveal that info.
>
> Non-leader replicas will probably require less heap if they are TLOG or
> PULL.  I cannot say how much less, that will be something that has to be
> determined.  Those replicas will also use less CPU.
>
> With newer Solr versions, you can ask SolrCloud to prefer PULL replicas
> for querying, so queries will be targeted to those replicas, unless they
> all go down, in which case it will go to non-preferred replica types.  I
> do not know how to do this, I only know that it is possible.
>
> Thanks,
> Shawn
>
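(For reference: the feature Shawn mentions at the end is, as far as I know, the
shards.preference query parameter introduced in Solr 7.4, e.g.:)

    /select?q=*:*&shards.preference=replica.type:PULL,replica.type:TLOG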


Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi Jörn,

I am using version 8.2.
I repeated the test twice for each mode.
I restarted the Solr nodes and deleted/recreated an empty collection each time.

Regards.

Dominique


Le ven. 25 oct. 2019 à 09:20, Jörn Franke  a écrit :

> Which Solr version are you using, and how many times did you repeat the test?
>
> > Am 25.10.2019 um 09:16 schrieb Dominique Bejean <
> dominique.bej...@eolya.fr>:
> >
> > Hi,
> >
> > I made some benchmarks for bulk indexing in order to compare performances
> > and ressources usage for NRT versus TLOG replica.
> >
> > Environnent :
> > * Solrcloud with 4 Solr nodes (8 Gb RAM, 4 Gb Heap)
> > * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG)
> > * 1 core per Solr Server
> >
> > Indexing of a 10.000.000 documents in one json file with bin/post script
> >
> > If I compare NRT vs TLOG indexing, I see :
> >
> > For collection created with all replicas as NRT
> >
> > * Indexing time : 22 minutes
> > * GC times : identical on all nodes
> > * GC count : identical on all nodes
> > * Heap size : identical on all nodes
> > * CPU Load / CPU usage : identical on all nodes
> >
> > For collection created with all replicas as TLOG
> >
> > * Indexing time : 34 minutes
> > * GC times : identical on all nodes
> > * GC count : identical on all nodes
> > * Heap size : identical on all nodes
> > * CPU Load / CPU usage : identical on NRT leaders, divide by 4 on TLOG
> not
> > leaders
> >
> >
> > The conclusion seems to be that by using TLOG :
> >
> > * You save CPU resources on non leaders nodes at index time
> > * The JVM Heap and GC are the same
> > * Indexing performance ares really less with TLOG
> >
> > I am disappointed in TLOG mode by very slower indexing time and by JVM
> Heap
> > / GC.
> >
> > Are these results conform to what we could expect ?
> > What can explain bad batch indexing performances in TLOG mode ?
> >
> > I have Grafana graph for all these metrics during tests.
> >
> > Regards.
> >
> > Dominique
>


NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi,

I made some benchmarks for bulk indexing in order to compare performance
and resource usage for NRT versus TLOG replicas.

Environment:
* SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap)
* 1 collection with 2 shards x 2 replicas (all NRT or all TLOG)
* 1 core per Solr server

Indexing of 10,000,000 documents in one JSON file with the bin/post script.
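For reference, the two collection layouts were created along these lines with
the Collections API (a sketch; names are placeholders):

    /admin/collections?action=CREATE&name=test_nrt&numShards=2&nrtReplicas=2&collection.configName=myconf
    /admin/collections?action=CREATE&name=test_tlog&numShards=2&nrtReplicas=0&tlogReplicas=2&collection.configName=myconf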

If I compare NRT vs TLOG indexing, I see :

For collection created with all replicas as NRT

* Indexing time : 22 minutes
* GC times : identical on all nodes
* GC count : identical on all nodes
* Heap size : identical on all nodes
* CPU Load / CPU usage : identical on all nodes

For collection created with all replicas as TLOG

* Indexing time : 34 minutes
* GC times : identical on all nodes
* GC count : identical on all nodes
* Heap size : identical on all nodes
* CPU load / CPU usage: identical on the leaders (same as NRT), divided by 4
on the TLOG non-leaders


The conclusion seems to be that by using TLOG:

* You save CPU resources on non-leader nodes at index time
* The JVM heap and GC are the same
* Indexing performance is really lower with TLOG

I am disappointed in TLOG mode by the much slower indexing time and by the
unchanged JVM heap / GC.

Do these results conform to what we could expect?
What can explain the poor batch indexing performance in TLOG mode?

I have Grafana graphs for all these metrics during the tests.

Regards.

Dominique


Re: Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread Dominique Bejean
Hi,

Solr has not been tested with Tomcat since version 4.
Why not use the embedded Jetty server?

Regards

Dominique

Le mar. 15 oct. 2019 à 10:44, vikas shinde  a écrit :

> Dear Solr team,
>
> Which is the latest Tomcat version that supports the latest Solr version
> 8.2.0?
>
> Also provide details about previous Solr versions & their compatible Tomcat
> versions.
>
>
> Thanks & Regards.
> Vikas Shinde.
>


Re: solr.log explanations for update handler

2019-10-03 Thread Dominique Bejean
Thank you

Le mer. 2 oct. 2019 à 14:30, Mikhail Khludnev  a écrit :

> Bonjour, Dominique.
>
> Turns out it's zero and elapsed millis
>
> https://github.com/apache/lucene-solr/blob/302cd09b4ce7bd3049f8480287b3dd03bb838b0d/solr/core/src/java/org/apache/solr/update/processor/LogUpdateProcessorFactory.java#L212
>
>
> On Wed, Oct 2, 2019 at 12:56 PM Dominique Bejean <
> dominique.bej...@eolya.fr>
> wrote:
>
> > Hi,
> >
> > I don't find explanations on what are the 2 numeric values mean at the
> end
> > of these log lines.
> >
> > Regards.
> >
> > Dominique
> >
> >
> > 2019-09-30 09:19:17.474 INFO  (qtp2051853139-9577) [c:maCollection3s3r
> > s:shard1 r:core_node11 x:maCollection3s3r_shard1_replica_t2]
> > o.a.s.u.p.LogUpdateProcessorFactory [maCollection3s3r_shard1_replica_t2]
> >  webapp=/solr path=/update
> >
> >
> params={update.distrib=FROMLEADER=true=true=true=false=
> >
> >
> http://xx.xx.xx.xx:8983/solr/maCollection3s3r_shard3_replica_t9/_end_point=true=javabin=2=false}{commit=
> > }
> > 0 0
> >
> > 2019-09-30 09:19:17.500 INFO  (qtp2051853139-9581) [c:maCollection3s3r
> > s:shard1 r:core_node11 x:maCollection3s3r_shard1_replica_t2]
> > o.a.s.u.p.LogUpdateProcessorFactory [maCollection3s3r_shard1_replica_t2]
> >  webapp=/solr path=/update
> params={update.distrib=FROMLEADER=
> >
> >
> http://10.143.17.57:8983/solr/maCollection3s3r_shard1_replica_t3/=javabin=2}{add=[84156918900019
> > (1646087695166865409), 84157668900019 (1646087695167913985),
> 84146538800012
> > (1646087695167913986), 84146558600011 (1646087695168962560),
> 84111248500013
> > (1646087695168962561), 84111318600016 (1646087695168962562),
> 84155698800019
> > (1646087695168962563), 84116508700013 (1646087695168962564),
> 84138868900017
> > (1646087695170011136), 84144178500018 (1646087695170011137), ... (164
> > adds)]} 0 33
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


solr.log explanations for update handler

2019-10-02 Thread Dominique Bejean
Hi,

I can't find an explanation of what the 2 numeric values at the end of these
log lines mean.

Regards.

Dominique


2019-09-30 09:19:17.474 INFO  (qtp2051853139-9577) [c:maCollection3s3r
s:shard1 r:core_node11 x:maCollection3s3r_shard1_replica_t2]
o.a.s.u.p.LogUpdateProcessorFactory [maCollection3s3r_shard1_replica_t2]
 webapp=/solr path=/update
params={update.distrib=FROMLEADER=true=true=true=false=
http://xx.xx.xx.xx:8983/solr/maCollection3s3r_shard3_replica_t9/_end_point=true=javabin=2=false}{commit=}
0 0

2019-09-30 09:19:17.500 INFO  (qtp2051853139-9581) [c:maCollection3s3r
s:shard1 r:core_node11 x:maCollection3s3r_shard1_replica_t2]
o.a.s.u.p.LogUpdateProcessorFactory [maCollection3s3r_shard1_replica_t2]
 webapp=/solr path=/update params={update.distrib=FROMLEADER=
http://10.143.17.57:8983/solr/maCollection3s3r_shard1_replica_t3/=javabin=2}{add=[84156918900019
(1646087695166865409), 84157668900019 (1646087695167913985), 84146538800012
(1646087695167913986), 84146558600011 (1646087695168962560), 84111248500013
(1646087695168962561), 84111318600016 (1646087695168962562), 84155698800019
(1646087695168962563), 84116508700013 (1646087695168962564), 84138868900017
(1646087695170011136), 84144178500018 (1646087695170011137), ... (164
adds)]} 0 33


Re: Synonym filters memory usage

2019-10-02 Thread Dominique Bejean
Thank you for all your responses.
Dominique

Le lun. 30 sept. 2019 à 13:38, Erick Erickson  a
écrit :

> Solr/Lucene _better_ not have a copy of the synonym map for every segment,
> if so it’s a JIRA for sure. I’ve seen indexes with 100s of segments. With a
> large synonym file it’d be terrible.
>
> I would be really, really, really surprised if this is the case. The
> Lucene people are very careful with memory usage and would hop on this in
> an instant if true I’d guess.
>
> Best,
> Erick
>
> > On Sep 30, 2019, at 5:27 AM, Andrea Gazzarini 
> wrote:
> >
> > That sounds really strange to me.
> > Segments are created gradually depending on changes applied to the
> index, while the Schema should have a completely different lifecycle,
> independent from that.
> > If that is true, that would mean each time a new segment is created Solr
> would instantiate a new Schema instance (or at least, assuming this is
> valid only for synonyms, one SynonymFilterFactory, one SynonymFilter, one
> SynonymMap), which again, sounds really strange.
> >
> > Thanks for the point, I'll check and I'll let you know
> >
> > Cheers,
> > Andrea
> >
> > On 30/09/2019 09:58, Bernd Fehling wrote:
> >> Yes, I think so.
> >> While integrating a Thesaurus as synonyms.txt I saw massive memory
> usage.
> >> A heap dump and analysis with MemoryAnalyzer pointed out that the
> >> SynonymMap took 3 times a huge amount of memory, together with each
> >> opened index segment.
> >> Just try it and check that by yourself with heap dump and
> MemoryAnalyzer.
> >>
> >> Regards
> >> Bernd
> >>
> >>
> >> Am 30.09.19 um 09:44 schrieb Andrea Gazzarini:
> >>> mmm, ok for the core but are you sure things in this case are working
> per-segment? I would expect a FilterFactory instance per index, initialized
> at schema loading time.
> >>>
> >>> On 30/09/2019 09:04, Bernd Fehling wrote:
> >>>> And I think this is per core per index segment.
> >>>>
> >>>> 2 cores per instance, each core with 3 index segments, sums up to 6
> times
> >>>> the 2 SynonymMaps. Results in 12 times SynonymMaps.
> >>>>
> >>>> Regards
> >>>> Bernd
> >>>>
> >>>>
> >>>> Am 30.09.19 um 08:41 schrieb Andrea Gazzarini:
> >>>>>   Hi,
> >>>>> looking at the stateful nature of SynonymGraphFilter/FilterFactory
> classes,
> >>>>> the answer should be 2 times (one time per type instance).
> >>>>> The SynonymMap, which internally holds the synonyms table, is a
> private
> >>>>> member of the filter factory and it is loaded each time the factory
> needs
> >>>>> to create a type.
> >>>>>
> >>>>> Best,
> >>>>> Andrea
> >>>>>
> >>>>> On 29/09/2019 23:49, Dominique Bejean wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> My concern is about memory used by synonym filter, especially if
> synonyms
> >>>>> resources files are large.
> >>>>>
> >>>>> If in my schema, there are two field types "TypeSyno1" and
> "TypeSyno2"
> >>>>> using synonym filter with the same synonyms files.
> >>>>> For each of these two field types there are two fields
> >>>>>
> >>>>> Field1 type is TypeSyno1
> >>>>> Field2 type is TypeSyno1
> >>>>> Field3 type is TypeSyno2
> >>>>> Field4 type is TypeSyno2
> >>>>>
> >>>>> How many times is the synonym file loaded in memory ?
> >>>>> 4 times, so one time per field ?
> >>>>> 2 times, so one time per instanciated type ?
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Dominique
> >>>
> >
> > --
> > Andrea Gazzarini
> > Search Consultant, R&D Software Engineer
> >
> >
> >
> > mobile: +39 349 513 86 25
> > email: a.gazzar...@sease.io
> >
>
>


Synonym filters memory usage

2019-09-29 Thread Dominique Bejean
Hi,

My concern is about the memory used by the synonym filter, especially if the
synonyms resource files are large.

In my schema, there are two field types, "TypeSyno1" and "TypeSyno2", both
using the synonym filter with the same synonyms file.
For each of these two field types there are two fields:

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded in memory?
4 times, i.e. one time per field?
2 times, i.e. one time per instantiated type?
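(In schema.xml terms, the layout I mean is roughly the following; the names are
the placeholders used above:)

    <fieldType name="TypeSyno1" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
      </analyzer>
    </fieldType>
    <fieldType name="TypeSyno2" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
      </analyzer>
    </fieldType>

    <field name="Field1" type="TypeSyno1" indexed="true" stored="false"/>
    <field name="Field2" type="TypeSyno1" indexed="true" stored="false"/>
    <field name="Field3" type="TypeSyno2" indexed="true" stored="false"/>
    <field name="Field4" type="TypeSyno2" indexed="true" stored="false"/>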

Regards

Dominique


Re: [ZOOKEEPER] - Error - HEAP MEMORY

2019-07-30 Thread Dominique Bejean
Hi,

I can't find any documentation about the parameter zookeeper_server_java_heaps
in zoo.cfg.
The way to control the Java heap size is either the java.env file or the
zookeeper-env.sh file. In zookeeper-env.sh:
SERVER_JVMFLAGS="-Xmx512m"

How much RAM is there on your server?

Regards

Dominique




Le lun. 29 juil. 2019 à 20:35, Rodrigo Oliveira <
adamantina.rodr...@gmail.com> a écrit :

> Hi,
>
> After 3 days running, my zookeeper is showing this error.
>
> 2019-07-29 15:10:41,906 [myid:1] - WARN
>  [RecvWorker:4332550065071534382:QuorumCnxManager$RecvWorker@1176] -
> Connection broken for id 4332550065071534382, my id = 1, error =
> java.io.IOException: Received packet with invalid packet: 824196618
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1163)
> 2019-07-29 15:10:41,906 [myid:1] - WARN
>  [RecvWorker:4332550065071534382:QuorumCnxManager$RecvWorker@1179] -
> Interrupting SendWorker
> 2019-07-29 15:10:41,907 [myid:1] - WARN
>  [SendWorker:4332550065071534382:QuorumCnxManager$SendWorker@1092] -
> Interrupted while waiting for message on queue
> java.lang.InterruptedException
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
> 2019-07-29 15:10:41,907 [myid:1] - WARN
>  [SendWorker:4332550065071534382:QuorumCnxManager$SendWorker@1102] - Send
> worker leaving thread  id 4332550065071534382 my id = 1
> 2019-07-29 15:10:41,917 [myid:1] - INFO  [/177.55.55.152:3888
> :QuorumCnxManager$Listener@888] - Received connection request /
> 177.55.55.63:53972
> 2019-07-29 15:10:41,920 [myid:1] - WARN
>  [RecvWorker:4332550065071534382:QuorumCnxManager$RecvWorker@1176] -
> Connection broken for id 4332550065071534382, my id = 1, error =
> java.io.IOException: Received packet with invalid packet: 840973834
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1163)
> 2019-07-29 15:10:41,921 [myid:1] - WARN
>  [RecvWorker:4332550065071534382:QuorumCnxManager$RecvWorker@1179] -
> Interrupting SendWorker
> 2019-07-29 15:10:41,922 [myid:1] - WARN
>  [SendWorker:4332550065071534382:QuorumCnxManager$SendWorker@1092] -
> Interrupted while waiting for message on queue
> java.lang.InterruptedException
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
> 2019-07-29 15:10:41,922 [myid:1] - WARN
>  [SendWorker:4332550065071534382:QuorumCnxManager$SendWorker@1102] - Send
> worker leaving thread  id 4332550065071534382 my id = 1
> 2019-07-29 15:10:41,932 [myid:1] - INFO  [/177.55.55.152:3888
> :QuorumCnxManager$Listener@888] - Received connection request /
> 177.55.55.63:38633
> 2019-07-29 15:10:41,933 [myid:1] - WARN
>  [RecvWorker:4332550065071534638:QuorumCnxManager$RecvWorker@1176] -
> Connection broken for id 4332550065071534638, my id = 1, error =
> java.io.IOException: Received packet with invalid packet: 807419402
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1163)
> 2019-07-29 15:10:41,933 [myid:1] - WARN
>  [RecvWorker:4332550065071534638:QuorumCnxManager$RecvWorker@1179] -
> Interrupting SendWorker
> 2019-07-29 15:10:41,934 [myid:1] - WARN
>  [SendWorker:4332550065071534638:QuorumCnxManager$SendWorker@1092] -
> Interrupted while waiting for message on queue
> java.lang.InterruptedException
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
> at
>
> 

Field value different over replicas

2019-07-26 Thread Dominique Bejean
Hi,

We have a date field with its default set to “now”. For this field, some
documents of the collection don’t have the same value in all replicas. The
difference can be 3 or 4 minutes!
The collection has 1 shard and 2 NRT replicas. Solr version is 7.5.
Collection is populated with DIH.

Any ideas about this issue?

NRT replica type and GC issues could be an explanation, but only for a
difference of a few ms.
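
One workaround I am thinking about: assign the timestamp once, before
distribution, with an update request processor instead of the default="NOW"
on the schema field, so that every replica stores exactly the same value.
A minimal solrconfig.xml sketch (the chain name and field name are only
examples):

<updateRequestProcessorChain name="assign-timestamp" default="true">
  <!-- runs only on the node that first receives the update, so the concrete
       value is what gets forwarded to the leader and the replicas -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">indexed_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>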

Regards

Dominique


Re: RuleBasedAuthorizationPlugin configuration

2019-01-01 Thread Dominique Bejean
Hi,

I created a Jira issue
https://issues.apache.org/jira/browse/SOLR-13097

Regards.

Dominique


Le lun. 31 déc. 2018 à 11:26, Dominique Bejean 
a écrit :

> Hi,
>
> In debugging mode, I discovered that only in SolrCloud mode the collection
> name is extract from the request path in the init() method of
> HttpSolrCall.java
>
>if (cores.isZooKeeperAware()) {
>   // init collectionList (usually one name but not when there are
> aliases)
>   ...
> }
>
> So in Solr standalone mode, only authentication is fully fonctionnal, not
> authorization !
>
> Regards.
>
> Dominique
>
>
>
>
>
> Le dim. 30 déc. 2018 à 13:40, Dominique Bejean 
> a écrit :
>
>> Hi,
>>
>> After reading more carefully the log file, here is my understanding.
>>
>> The request
>>
>> http://2:xx@localhost:8983/solr/biblio/select?indent=on=*:*=json
>>
>>
>> report this in log
>>
>> 2018-12-30 12:24:52.102 INFO  (qtp1731656333-20) [   x:biblio]
>> o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context :
>> userPrincipal: [[principal: 2]] type: [READ], collections: [], Path:
>> [/select] path : /select params :q=*:*=on=json
>>
>> collections is empty, so it looks like "/select" is not collection
>> specific and so it is not possible to define read access by collection.
>>
>> Can someone confirm ?
>>
>> Regards
>>
>> Dominique
>>
>>
>>
>>
>>
>> Le ven. 21 déc. 2018 à 10:46, Dominique Bejean 
>> a écrit :
>>
>>> Hi,
>>>
>>> I am trying to configure security.json file, in order to define the
>>> following users and permissions :
>>>
>>>- user "admin" with all permissions on all collections
>>>- user "read" with read  permissions  on all collections
>>>- user "1" with only read  permissions  on biblio collection
>>>- user "2" with only read  permissions  on personnes collection
>>>
>>> Here is my security.json file
>>>
>>> {
>>>   "authentication":{
>>> "blockUnknown":true,
>>> "class":"solr.BasicAuthPlugin",
>>> "credentials":{
>>>   "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
>>> 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>>>   "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>>   "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>>   "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
>>> "":{"v":0}},
>>>   "authorization":{
>>> "class":"solr.RuleBasedAuthorizationPlugin",
>>> "permissions":[
>>>   {
>>> "name":"all",
>>> "role":"admin",
>>> "index":1},
>>>   {
>>> "name":"read-biblio",
>>> "path":"/select",
>>> "role":["admin","read","r1"],
>>> "collection":"biblio",
>>> "index":2},
>>>   {
>>> "name":"read-personnes",
>>> "path":"/select",
>>> "role":["admin","read","r2"],
>>> "collection":"personnes",
>>> "index":3},
>>>  {
>>> "name":"read",
>>> "collection":"*",
>>> "role":["admin","read"],
>>> "index":4}],
>>> "user-role":{
>>>   "admin":"admin",
>>>   "read":"read",
>>>   "1":"r1",
>>>   "2":"r2"}
>>>   }
>>> }
>>>
>>>
>>> I have a 403 errors for user 1 on biblio and user 2 on personnes while
>>> using the "/select" requestHandler. However according to r1 and r2 roles
>>> and premissions order, the access should be allowed.
>>>
>>> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in
>>> order to test these exact same permissions and roles. checkRules reports
>>> access is allowed !!!
>>>
>>> I don't understand where is the problem. Any ideas ?
>>>
>>> Regards
>>>
>>> Dominique
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>


Re: RuleBasedAuthorizationPlugin configuration

2018-12-31 Thread Dominique Bejean
Hi,

In debugging mode, I discovered that only in SolrCloud mode is the collection
name extracted from the request path in the init() method of
HttpSolrCall.java

   if (cores.isZooKeeperAware()) {
  // init collectionList (usually one name but not when there are
aliases)
  ...
}

So in Solr standalone mode, only authentication is fully functional, not
authorization !

Regards.

Dominique





Le dim. 30 déc. 2018 à 13:40, Dominique Bejean 
a écrit :

> Hi,
>
> After reading more carefully the log file, here is my understanding.
>
> The request
>
> http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json
>
> report this in log
>
> 2018-12-30 12:24:52.102 INFO  (qtp1731656333-20) [   x:biblio]
> o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context :
> userPrincipal: [[principal: 2]] type: [READ], collections: [], Path:
> [/select] path : /select params :q=*:*&indent=on&wt=json
>
> collections is empty, so it looks like "/select" is not collection
> specific and so it is not possible to define read access by collection.
>
> Can someone confirm ?
>
> Regards
>
> Dominique
>
>
>
>
>
> Le ven. 21 déc. 2018 à 10:46, Dominique Bejean 
> a écrit :
>
>> Hi,
>>
>> I am trying to configure security.json file, in order to define the
>> following users and permissions :
>>
>>- user "admin" with all permissions on all collections
>>- user "read" with read  permissions  on all collections
>>- user "1" with only read  permissions  on biblio collection
>>- user "2" with only read  permissions  on personnes collection
>>
>> Here is my security.json file
>>
>> {
>>   "authentication":{
>> "blockUnknown":true,
>> "class":"solr.BasicAuthPlugin",
>> "credentials":{
>>   "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
>> 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>>   "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>   "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>   "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
>> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
>> "":{"v":0}},
>>   "authorization":{
>> "class":"solr.RuleBasedAuthorizationPlugin",
>> "permissions":[
>>   {
>> "name":"all",
>> "role":"admin",
>> "index":1},
>>   {
>> "name":"read-biblio",
>> "path":"/select",
>> "role":["admin","read","r1"],
>> "collection":"biblio",
>> "index":2},
>>   {
>> "name":"read-personnes",
>> "path":"/select",
>> "role":["admin","read","r2"],
>> "collection":"personnes",
>> "index":3},
>>  {
>> "name":"read",
>> "collection":"*",
>> "role":["admin","read"],
>> "index":4}],
>> "user-role":{
>>   "admin":"admin",
>>   "read":"read",
>>   "1":"r1",
>>   "2":"r2"}
>>   }
>> }
>>
>>
>> I have a 403 errors for user 1 on biblio and user 2 on personnes while
>> using the "/select" requestHandler. However according to r1 and r2 roles
>> and premissions order, the access should be allowed.
>>
>> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in
>> order to test these exact same permissions and roles. checkRules reports
>> access is allowed !!!
>>
>> I don't understand where is the problem. Any ideas ?
>>
>> Regards
>>
>> Dominique
>>
>>
>>
>>
>>
>>
>>
>>


Re: RuleBasedAuthorizationPlugin configuration

2018-12-30 Thread Dominique Bejean
Hi,

After reading more carefully the log file, here is my understanding.

The request

http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json

reports this in the log

2018-12-30 12:24:52.102 INFO  (qtp1731656333-20) [   x:biblio]
o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context :
userPrincipal: [[principal: 2]] type: [READ], collections: [], Path:
[/select] path : /select params :q=*:*&indent=on&wt=json

collections is empty, so it looks like "/select" is not collection specific
and so it is not possible to define read access by collection.

Can someone confirm ?

Regards

Dominique





Le ven. 21 déc. 2018 à 10:46, Dominique Bejean 
a écrit :

> Hi,
>
> I am trying to configure security.json file, in order to define the
> following users and permissions :
>
>- user "admin" with all permissions on all collections
>- user "read" with read  permissions  on all collections
>- user "1" with only read  permissions  on biblio collection
>- user "2" with only read  permissions  on personnes collection
>
> Here is my security.json file
>
> {
>   "authentication":{
> "blockUnknown":true,
> "class":"solr.BasicAuthPlugin",
> "credentials":{
>   "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
> 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>   "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>   "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>   "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
> gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
> "":{"v":0}},
>   "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[
>   {
> "name":"all",
> "role":"admin",
> "index":1},
>   {
> "name":"read-biblio",
> "path":"/select",
> "role":["admin","read","r1"],
> "collection":"biblio",
> "index":2},
>   {
> "name":"read-personnes",
> "path":"/select",
> "role":["admin","read","r2"],
> "collection":"personnes",
> "index":3},
>  {
> "name":"read",
> "collection":"*",
> "role":["admin","read"],
> "index":4}],
> "user-role":{
>   "admin":"admin",
>   "read":"read",
>   "1":"r1",
>   "2":"r2"}
>   }
> }
>
>
> I have a 403 errors for user 1 on biblio and user 2 on personnes while
> using the "/select" requestHandler. However according to r1 and r2 roles
> and premissions order, the access should be allowed.
>
> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in order
> to test these exact same permissions and roles. checkRules reports access
> is allowed !!!
>
> I don't understand where is the problem. Any ideas ?
>
> Regards
>
> Dominique
>
>
>
>
>
>
>
>


Re: Zookeeper timeout issue -

2018-12-21 Thread Dominique Bejean
Hi,

What is the scenario ? High query activity ? High update activity ?

Regards.

Dominique


Le mer. 19 déc. 2018 à 13:44, AshB  a écrit :

> Hi,
>
> We are facing issue with solr/zookeeper where zookeeper timeouts after
> 10000 ms. Error below.
>
> *SolrException: java.util.concurrent.TimeoutException: Could not connect to
> ZooKeeper :9181,:9182,:9183 within 10000 ms.
> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:184)
> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:111)
> at
> org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:295)*
>
> We are not getting any error in zookeeper logs.Except below logs
> 2018-12-19 04:35:22,305 [myid:2] - INFO
> [SessionTracker:ZooKeeperServer@354] - Expiring session 0x200830234de3127,
> timeout of 1ms exceeded
> 2018-12-19 05:35:38,304 [myid:2] - INFO
> [SessionTracker:ZooKeeperServer@354] - Expiring session 0x200b4f912730086,
> timeout of 1ms exceeded
> 2018-12-19 05:56:58,302 [myid:2] - INFO
> [SessionTracker:ZooKeeperServer@354] - Expiring session 0x100b4f9125e00bf,
> timeout of 1ms exceeded
>
>
> During the issue threads go high and we could notice below in weblogic
> server.
>
> Name: Connection evictor
> State: TIMED_WAITING
> Total blocked: 0  Total waited: 1
>
> Stack trace:
> java.lang.Thread.sleep(Native Method)
>
> org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
> java.lang.Thread.run(Thread.java:748)
>
> What could be going wrong here?
>
> Regards
> Ashish
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Is there a common tool for SOLR benchmark?

2018-12-21 Thread Dominique Bejean
Hi,

There is the powerful JMeter, obviously, and also SolrMeter
(https://github.com/tflobbe/solrmeter).

Regards

Dominique


Le jeu. 20 déc. 2018 à 03:17, zhenyuan wei  a écrit :

> Hi all,
>Is there a common tool for SOLR benckmark? YCSB is not very
> suitable for SOLR.  Currently,  Is there a good benchmark tool for SOLR?
>
>
> Best, TinsWzy
>


Re: ZooKeeper for Solr 7.6

2018-12-21 Thread Dominique Bejean
Hi,

This is a Solr-side issue, not a Zookeeper-side issue.
Zookeeper 3.4.13 is a 5 month old version, so you can use it on the server
side with the zookeeper client 3.4.11 provided by Solr.

Dominique


Le jeu. 20 déc. 2018 à 01:53, Yasufumi Mizoguchi  a
écrit :

> Hi,
>
> I searched JIRA and found SOLR-12727 .
> ( https://issues.apache.org/jira/browse/SOLR-12727 )
> That is why I avoided using ZooKeeper 3.4.13 .
>
> Thanks,
> Yasufumi
>
> 2018年12月19日(水) 16:02 S G :
>
> > Why don't you try 3.4.13 instead? That's a version newer than 3.4.12
> >
> > On Tue, Dec 18, 2018 at 12:37 AM Yasufumi Mizoguchi <
> > yasufumi0...@gmail.com>
> > wrote:
> >
> > > Thank you Jan.
> > >
> > > I will try it.
> > >
> > > Thanks,
> > > Yasufumi.
> > >
> > > 2018年12月18日(火) 17:21 Jan Høydahl :
> > >
> > > > That is no problem, doing it myself.
> > > >
> > > > --
> > > > Jan Høydahl, search solution architect
> > > > Cominvent AS - www.cominvent.com
> > > >
> > > > > 18. des. 2018 kl. 04:34 skrev Yasufumi Mizoguchi <
> > > yasufumi0...@gmail.com
> > > > >:
> > > > >
> > > > > Hi
> > > > >
> > > > > I am trying Solr 7.6 in SolrCloud mode.
> > > > > But I found that ZooKeeper 3.4.11 has a critical issue about
> handling
> > > > > data/log directories.
> > > > > (https://issues.apache.org/jira/browse/ZOOKEEPER-2960)
> > > > >
> > > > > So, I want to know if using ZooKeeper 3.4.12 with Solr 7.6 is safe.
> > > > >
> > > > > Does anyone know this?
> > > > >
> > > > > Thanks,
> > > > > Yasufumi.
> > > >
> > > >
> > >
> >
>


RuleBasedAuthorizationPlugin configuration

2018-12-21 Thread Dominique Bejean
Hi,

I am trying to configure security.json file, in order to define the
following users and permissions :

   - user "admin" with all permissions on all collections
   - user "read" with read  permissions  on all collections
   - user "1" with only read  permissions  on biblio collection
   - user "2" with only read  permissions  on personnes collection

Here is my security.json file

{
  "authentication":{
"blockUnknown":true,
"class":"solr.BasicAuthPlugin",
"credentials":{
  "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0=
7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
  "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
  "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
  "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk=
gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
"":{"v":0}},
  "authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
  {
"name":"all",
"role":"admin",
"index":1},
  {
"name":"read-biblio",
"path":"/select",
"role":["admin","read","r1"],
"collection":"biblio",
"index":2},
  {
"name":"read-personnes",
"path":"/select",
"role":["admin","read","r2"],
"collection":"personnes",
"index":3},
 {
"name":"read",
"collection":"*",
"role":["admin","read"],
"index":4}],
"user-role":{
  "admin":"admin",
  "read":"read",
  "1":"r1",
  "2":"r2"}
  }
}


I get 403 errors for user 1 on biblio and user 2 on personnes while
using the "/select" requestHandler. However, according to the r1 and r2 roles
and permissions order, the access should be allowed.

I have duplicated the TestRuleBasedAuthorizationPlugin.java class in order
to test these exact same permissions and roles. checkRules reports access
is allowed !!!

I don't understand where is the problem. Any ideas ?

Regards

Dominique


Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-12 Thread Dominique Bejean
Hi,

1/
As previously said by others, my first action would be to understand
why you need so much heap.

The first step is to cap your heap size at 31 GB (or obviously less if
possible).
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/

Can you provide some typical Solr requests covering most of your use cases?
Take them from the Solr logs so they also include the hit counts and QTime.

   - take care with the rows and fl parameters
   - if you are using facets, use the JSON Facet API (a small sketch follows)
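
For example, a terms facet with the JSON Facet API looks like this
(collection and field names are only examples):

curl "http://localhost:8983/solr/mycollection/select" -d '
q=*:*&rows=0&json.facet={
  categories: { type: terms, field: category, limit: 20 }
}'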


Did you optimise your schema ?

   - remove unnecessary fields from your indices
   - optimize the indexed, stored and docValues attributes (do not index or
   store what is unnecessary)


Did you increase the Solr caches too much ?

I didn't see the java version you are using.


2/
With a huge heap, I would try the G1 GC.
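
For example, something like this in solr.in.sh (only a starting point, to be
validated against your own GC logs):

GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m \
  -XX:MaxGCPauseMillis=250"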


3/
I would stop optimizing the indexes.


4/
It looks like you have enough RAM for your heap and the system cache (80
Gb + 20 Gb < 120 Gb), but did you disable swapping on your server
(vm.swappiness = 1) ?
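
For reference, on most Linux distributions:

# check the current value
sysctl vm.swappiness

# change it for the running system
sudo sysctl -w vm.swappiness=1

# make it persistent across reboots
echo "vm.swappiness = 1" | sudo tee -a /etc/sysctl.conf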


5/
How often are you updating your indexes on master (continuously, once an
hour, ... once a day) ?


Regards

Dominique



Le mer. 3 oct. 2018 à 23:11, yasoobhaider  a
écrit :

> Hi
>
> I'm working with a Solr cluster with master-slave architecture.
>
> Master and slave config:
> ram: 120GB
> cores: 16
>
> At any point there are between 10-20 slaves in the cluster, each serving
> ~2k
> requests per minute. Each slave houses two collections of approx 10G
> (~2.5mil docs) and 2G(10mil docs) when optimized.
>
> I am working with Solr 6.2.1
>
> Solr configuration:
>
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:-OmitStackTraceInFastThrow
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:MaxTenuringThreshold=8
> -XX:ParallelGCThreads=4
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=15
> -XX:TargetSurvivorRatio=90
> -Xmn10G
> -Xms80G
> -Xmx80G
>
> Some of these configurations have been reached by multiple trial and errors
> over time, including the huge heap size.
>
> This cluster usually runs without any error.
>
> In the usual scenario, old gen gc is triggered according to the
> configuration at 50% old gen occupancy, and the collector clears out the
> memory over the next minute or so. This happens every 10-15 minutes.
>
> However, I have noticed that sometimes the GC pattern of the slaves
> completely changes and old gen gc is not able to clear the memory.
>
> After observing the gc logs closely for multiple old gen gc collections, I
> noticed that the old gen gc is triggered at 50% occupancy, but if there is
> a
> GC Allocation Failure before the collection completes (after CMS Initial
> Remark but before CMS reset), the old gen collection is not able to clear
> much memory. And as soon as this collection completes, another old gen gc
> is
> triggered.
>
> And in worst case scenarios, this cycle of old gen gc triggering, GC
> allocation failure keeps happening, and the old gen memory keeps
> increasing,
> leading to a single threaded STW GC, which is not able to do much, and I
> have to restart the solr server.
>
> The last time this happened after the following sequence of events:
>
> 1. We optimized the bigger collection bringing it to its optimized size of
> ~10G.
> 2. For an unrelated reason, we had stopped indexing to the master. We
> usually index at a low-ish throughput of ~1mil docs/day. This is relevant
> as
> when we are indexing, the size of the collection increases, and this
> effects
> the heap size used by collection.
> 3. The slaves started behaving erratically, with old gc collection not
> being
> able to free up the required memory and finally being stuck in a STW GC.
>
> As unlikely as this sounds, this is the only thing that changed on the
> cluster. There was no change in query throughput or type of queries.
>
> I restarted the slaves multiple times but the gc behaved in the same way
> for
> over three days. Then when we fixed the indexing and made it live, the
> slaves resumed their original gc pattern and are running without any issues
> for over 24 hours now.
>
> I would really be grateful for any advice on the following:
>
> 1. What could be the reason behind CMS not being able to free up the
> memory?
> What are some experiments I can run to solve this problem?
> 2. Can stopping/starting indexing be a reason for such drastic changes to
> GC
> pattern?
> 3. I have read at multiple places on this mailing list that the heap size
> should be much lower (2x-3x the size of collection), but the last time I
> tried CMS was not able to run smoothly and GC STW would occur which was
> only
> solved by a restart. My reasoning 

Re: Index size issue in SOLR-6.5.1

2018-10-08 Thread Dominique Bejean
Hi,

In the Solr Admin console, you can access the "Segments info" page for each
core. There you can see whether there are more deleted documents in the
segments on server X.
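
You can also get the same numbers with curl, for example (host and core name
to be adapted):

# numDocs vs maxDoc for the core; the difference is the number of deleted docs
curl "http://serverX:8983/solr/Xxx_shardX_replica1/admin/luke?numTerms=0&wt=json"

# per-segment details, the same data as the "Segments info" page
curl "http://serverX:8983/solr/Xxx_shardX_replica1/admin/segments?wt=json"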

Dominique

Le lun. 8 oct. 2018 à 07:29, SOLR4189  a écrit :

> About which details do you ask? Yesterday we restarted all our solr
> services
> and index size in serverX decreased from 82Gb to 60Gb, and in serverY
> index
> size didn't change (49Gb).
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread Dominique Bejean
Hi,

What about the cores' segment details in the admin UI ? More deleted
documents ?

Regards

Dominique

Le dim. 7 oct. 2018 à 08:22, SOLR4189  a écrit :

> Hi all,
>
> We use SOLR-6.5.1 and we have very strange issue. In our collection index
> size is very different from server to server (33gb difference):
> 1. We have index size 82Gb in serverX and 49Gb in serverY
> 2. ServerX displays 82gb used place if we run "df -h
> /opt/solr/Xxx_shardX_replica1/data/index"
> and through web admin ui it displays 60gb used place.
>
> What can it be? Why do we have difference between server? Between server
> and
> web admin ui?
>
> Thank you.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Docker and Solr Indexing

2018-09-12 Thread Dominique Bejean
Hi,

Are you aware of the issues with Java applications in Docker when the Java
version is older than 10 ?
https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/
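
If upgrading to Java 10 is not an option, this is roughly what it implies in
solr.in.sh (a sketch to be validated against your exact Java version, not a
description of your current setup):

# Java 8u131+ : make the JVM respect the cgroup memory limit
SOLR_OPTS="$SOLR_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"

# or simply pin the heap explicitly so it cannot outgrow the container
SOLR_JAVA_MEM="-Xms2g -Xmx2g"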

Regards.

Dominique


Le mer. 12 sept. 2018 à 05:42, Shawn Heisey  a écrit :

> On 9/11/2018 9:20 PM, solrnoobie wrote:
> > So what we did is we upgraded the instances to 16 gigs and we rarely
> > encounter this now.
> >
> > So what we did was to increase the batch size to 500 instead of 50 and it
> > worked for our test data. But when we tried 1000 batch size, the invalid
> > content type error returned. Can you guys shed some light on why this is
> > happening? I don't think that a thousand per batch is too much (although
> we
> > have documents with many fields and child documents) so I am not really
> sure
> > what's causing this aside from a docker containter restart.
>
> At no point in this thread have you shared the actual error messages.
> Without those and the exact version of Solr, it's difficult to help
> you.  Saying that you got a "content type error" doesn't mean anything.
> We need to see the actual error, complete with all stacktrace data.  The
> best information will be found in the logfile -- solr.log.
>
> Solr (as packaged by this project) is not designed to restart itself
> automatically.  If the JVM encounters an OutOfMemoryError exception and
> the platform is NOT Windows, then Solr is designed to kill itself ...
> but it will NOT automatically restart without outside intervention or a
> change to its startup scripts.  This is done because program operation
> is completely unpredictable when OOME hits, so the best course of action
> is to self-terminate and let the admin fix the problem that cause the OOME.
>
> The publicly available Solr docker container is NOT an official product
> of this project.  It is third-party, so problems specific to the docker
> container may need to be handled by the project that created it.  If the
> docker container is set up to automatically restart Solr when it dies, I
> would consider that to be a bug. About the only reason that Solr will
> ever die is the OOME self-termination that I already described ... and
> since the OOME is likely to occur again after restart, it's usually
> better for the software to stay offline until the admin fixes the problem.
>
> Thanks,
> Shawn
>
>


Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

2018-08-27 Thread Dominique Bejean
Hi,

We are also experiencing time-out issues from time to time.

I sent this message one month ago, by mistake in the dev list.

Why are values hardcoded in the ZkClientClusterStateProvider.java file
while there are existing parameters for these time-outs ?

Regards

Dominique



We are experiencing an issue related to Zk timeouts

Stacktrace is :

ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:67   -
Erreur dans l'attente de la fin de l'exécution d'un thread
ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:68   -
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
ERROR 19 juin 2018 06:24:07,152 -   api.batch.Lanceur:98   -
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
java.util.concurrent.ExecutionException:
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 ...
Caused by: org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:182)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:116)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:106)
 at
org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:226)
 at
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:121)
...


In solr.xml, we have :
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>

In solr.in.sh, we have :
#ZK_CLIENT_TIMEOUT="15000"
or
ZK_CLIENT_TIMEOUT="30000"

So zkClientTimeout should be 30000.

In the source code of ZkClientClusterStateProvider.java, I see zkClientTimeout
is hardcoded to 10000 ! Is it normal that the configuration is not used ?

lucene-solr/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java

int zkConnectTimeout = 10000;
int zkClientTimeout = 10000;

...

zk = new ZkStateReader(zkHost, zkClientTimeout, zkConnectTimeout);


Regards.

Le ven. 24 août 2018 à 20:15, dshih  a écrit :

> Sorry, yes 10,000 ms.
>
> We have a single test cluster (out of probably hundreds) where one node
> hits
> this consistently.  I'm not sure what kind of issues (network?) that node
> is
> having.
>
> Generally though, we ship SOLR as part of our product, and we cannot
> control
> our customers' hardware and setup besides listing minimum requirements.
> While I think this issue will probably be extremely rare, we would
> definitely prefer to be able to say: "well, if you can't fix your hardware
> issue, try increasing this timeout setting".
>
> Thanks,
> Danny
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Solr and ZK timeout issues

2018-07-17 Thread Dominique Bejean
Hi,

We are experiencing an issue related to Zk timeouts

Stacktrace is :

ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:67   -
Erreur dans l'attente de la fin de l'exécution d'un thread
ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:68   -
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
ERROR 19 juin 2018 06:24:07,152 -  api.batch.Lanceur:98   -
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
java.util.concurrent.ExecutionException:
org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 ...
Caused by: org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
xxx.xxx.xxx.xxx:2181 within 10000 ms
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:182)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:116)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:106)
 at
org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:226)
 at
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:121)
...


In solr.xml, we have :
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>

In solr.in.sh, we have :
#ZK_CLIENT_TIMEOUT="15000"
or
ZK_CLIENT_TIMEOUT="30000"

So zkClientTimeout should be 30000.

In the source code of ZkClientClusterStateProvider.java, I see zkClientTimeout
is hardcoded to 10000 ! Is it normal that the configuration is not used ?

lucene-solr/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java

int zkConnectTimeout = 10000;
int zkClientTimeout = 10000;

...

zk = new ZkStateReader(zkHost, zkClientTimeout, zkConnectTimeout);


Regards.

Dominique


Re: Silk from LucidWorks

2018-07-16 Thread Dominique Bejean
Hi,

Using Grafana with Solr starting from version 7 is very easy and well documented.
https://lucene.apache.org/solr/guide/7_3/monitoring-solr-with-prometheus-and-grafana.html
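
For example, with the exporter shipped in contrib/prometheus-exporter (flags
and file names may differ slightly between 7.x versions):

cd contrib/prometheus-exporter
# SolrCloud mode: point the exporter at ZooKeeper
./bin/solr-exporter -p 9854 -z localhost:2181 -f ./conf/solr-exporter-config.xml -n 8
# then declare Prometheus as a data source in Grafana and import the bundled
# dashboard from ./conf/grafana-solr-dashboard.json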

Dominique
Le lun. 16 juil. 2018 à 06:56, Aroop Ganguly
 a écrit :

> How do you use Grafana with Solr ? Did you build a http communication
> interface or is there some open source project that you leveraged ?
>
>
> > On Jul 15, 2018, at 2:54 PM, Rahul Singh 
> wrote:
> >
> > Their commercial offering still has something like it. You can always
> try Grafana
> >
> > Rahul
> > On Jul 13, 2018, 9:59 AM -0400, rgummadi ,
> wrote:
> >> Is SiLK from LucidWorks still an active project. I looked at their
> github and
> >> it does not seem to be active. If so are there any alternative
> solutions.
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Removed nodes still visible as gone in Solrcloud graph

2018-05-29 Thread Dominique Bejean
Hi,

I reply to myself.

The solution is to edit the state.json file for all impacted collections.


   - Stop all Solr nodes


   - Download state.json file from ZK for collection "xx"

# server/scripts/cloud-scripts/zkcli.sh -z "xxx.xxx.xxx.xxx:2181" -cmd
getfile /collections/xx/state.json /tmp/-state-local.json


   - Edit the downloaded state.json and save it


   - Remove collection state.json from ZK

# server/scripts/cloud-scripts/zkcli.sh -z "xxx.xxx.xxx.xxx:2181" -cmd
clear /collections/xx/state.json


   - Upload modified state.json to ZK

# server/scripts/cloud-scripts/zkcli.sh -z "xxx.xxx.xxx.xxx:2181" -cmd
putfile /collections/xx/state.json /tmp/-state-local.json


   - Start all Solr nodes


Dominique


Le mar. 29 mai 2018 à 14:19, Dominique Bejean  a
écrit :

> Hi,
>
> On a node, I accidentally changed the SOLR_HOST value from uppercase to
> lowercase and I restarted the node. After I fixed the error, I restarted
> again the node but the node name in lowercase is still visible as "gone".
> How to definitively remove a gone node from the Solrcloud graph ?
>
> Regards.
>
> Dominique
>
>
> --
> Dominique Béjean
> 06 08 46 12 43
>
-- 
Dominique Béjean
06 08 46 12 43


Removed nodes still visible as gone in Solrcloud graph

2018-05-29 Thread Dominique Bejean
Hi,

On a node, I accidentally changed the SOLR_HOST value from uppercase to
lowercase and I restarted the node. After I fixed the error, I restarted
again the node but the node name in lowercase is still visible as "gone".
How to definitively remove a gone node from the Solrcloud graph ?

Regards.

Dominique


-- 
Dominique Béjean
06 08 46 12 43


Re: Howto disable PrintGCTimeStamps in Solr

2018-05-07 Thread Dominique Bejean
Hi,

Which version of Solr are you using ?

Regards

Dominique


Le ven. 4 mai 2018 à 09:13, Bernd Fehling 
a écrit :

> Hi list,
>
> this sounds simple but I can't disable PrintGCTimeStamps in solr_gc
> logging.
> I tried with GC_LOG_OPTS in start scripts and --verbose reporting during
> start to make sure it is not in Solr start scripts.
> But if Solr is up and running there are always TimeStamps in solr_gc.log
> and
> the file reports at the top with "CommandLine flags:" that the option
> -XX:+PrintGCTimeStamps has been set.
> But where?
>
> Is it something passed down from Jetty?
>
> Regards,
> Bernd
>
>
>
> --
Dominique Béjean
06 08 46 12 43


Re: What are decent disk I/O for Solr and Zookeeper ?

2018-03-11 Thread Dominique Bejean
Hi Shawn,

I agree on disk I/O versus available memory regarding Solr performance.
However, in a heavy indexing and heavy searching context, even with a lot of
RAM, disk I/O can still be critical.

My concern is also about write I/O for the Zookeeper transaction log. My
understanding is that it is critical not so much for SolrCloud performance
but mainly for SolrCloud stability.

Sometimes, even when best practices are respected and all possible
configuration tuning is done, SolrCloud is not stable or not performant due
to a lack of hardware resources. Monitoring CPU, CPU load, iowait, JVM GC, …
should highlight this lack of resources. If the hardware is undersized, we
need metrics in order to explain and demonstrate this to the customer
(especially if the infrastructure provider does not want to admit there are
issues with hardware or virtualization). That was the meaning of my question
about “decent disk I/O”.
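
For the ZooKeeper transaction log in particular, a sequential write test that
syncs every write is probably more representative than plain dd. A sketch
with fio (the directory is only an example, it should be the actual
dataLogDir):

fio --name=zk-txnlog --directory=/var/lib/zookeeper/datalog \
    --rw=write --bs=8k --size=256m --numjobs=1 --fdatasync=1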

Regards

Dominique


Le ven. 9 mars 2018 à 00:40, Shawn Heisey <apa...@elyograg.org> a écrit :

> On 3/8/2018 2:55 PM, Dominique Bejean wrote:
> > Disk I/O are critical for high performance Solrcloud.
>
> This statement has truth to it, but if your system is correctly sized,
> disk performance will not have much of an impact on Solr performance.
> If upgrading to faster disks does improves long-term query performance,
> the system probably doesn't have enough memory installed.  There can be
> other causes, but that is the most common.
>
> When there is enough memory available to allow the operating system to
> effectively cache the index data, Solr will not need to access the disk
> much at all for queries -- all that data will be already in memory.
> Indexing will still be dependent on disk performance even when there is
> plenty of memory, because that will require writing new data to the disk.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> This is my hammer.  To me, your question looks like a nail.  :)
>
> Thanks,
> Shawn
>
> --
Dominique Béjean
06 08 46 12 43


What are decent disk I/O for Solr and Zookeeper ?

2018-03-08 Thread Dominique Bejean
Hi,

Disk I/O is critical for a high-performance SolrCloud.
I am looking for relevant disk I/O tests for both a Solr node and a Zookeeper
node, and, with these tests, what bad, correct or good results look like.

For instance, how do I know whether these results from the basic dd utility
report correct disk performance ? And are these tests relevant ?

Write small files
# dd if=/dev/zero of=test bs=4k count=1024k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 4.14932 s, 1.0 GB/s

Write medium files
# dd if=/dev/zero of=test bs=64k count=64k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 3.07326 s, 1.4 GB/s

Write large files
# dd if=/dev/zero of=test bs=1024k count=4k conv=fdatasync
4294967296 bytes (4.3 GB) copied, 2.97767 s, 1.4 GB/s

Read small files
# dd if=test of=/dev/zero bs=4k
4294967296 bytes (4.3 GB) copied, 0.707424 s, 6.1 GB/s

Read medium files
# dd if=test of=/dev/zero bs=64k
4294967296 bytes (4.3 GB) copied, 0.545915 s, 7.9 GB/s

Read large files
# dd if=test of=/dev/zero bs=1024k
4294967296 bytes (4.3 GB) copied, 0.578093 s, 7.4 GB/s


Regards

Dominique



-- 
Dominique Béjean
06 08 46 12 43


Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean
Steve,

According to your comment, I made this test :

1/ put the SynonymGraphFilterFactory after the StopFilterFactory in the
query-time analysis chain
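
A hedged reconstruction of this new query-time chain, since the original XML
did not survive the list archive (the tokenizer, lower-casing, stemmer and
stop-word file are guesses; the other attributes come from the thread):

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="lang/stopwords_fr.txt" ignoreCase="true"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.FrenchLightStemFilterFactory"/>
</analyzer>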


  
  
  
  
  
  
  


2/ remove the stop word in the synonyms file

om, olympique marseille


The parsed query string are :

for "om maillot"
"parsedquery_toString":"+(+name_text_gp:olympiqu +name_text_gp:marseil)
name_text_gp:om)) (name_text_gp:maillot))~1)",

for "olympique de marseille maillot"
"parsedquery_toString":"+name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil))) (name_text_gp:maillot))~1)",

for "maillot om"
parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu
+name_text_gp:marseil) name_text_gp:om)))~1)",

for "maillot olympique de marseille"
 "parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om
(+name_text_gp:olympiqu +name_text_gp:marseil~1)",


The query results are the same for all queries.

It looks like this could be an acceptable workaround.

Thank you

Dominique



Le dim. 11 févr. 2018 à 10:31, Dominique Bejean <dominique.bej...@eolya.fr>
a écrit :

> Hi Steve,
>
> Thank you for your response.
> The Jira was created : SOLR-11968
>
> I let you add your comments.
>
> Regards.
>
> Dominique
>
>
> Le sam. 10 févr. 2018 à 20:30, Steve Rowe <sar...@gmail.com> a écrit :
>
>> Hi Dominique,
>>
>> Looks like it’s a bug, not sure where exactly though.  Can you please
>> create a JIRA?
>>
>> I can see the same behavior on master too, not just on the
>> releases/lucene-solr/6.6.2 tag.
>>
>> One interesting thing I found is that if I remove the stop filter from
>> the query analyzer, I get the following for qq=“maillot om”:
>>
>> +((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de
>> +name_text_gp:marseil) name_text_gp:om)))
>>
>> (btw my stop list only has “de” on it)
>>
>> Thanks,
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Feb 10, 2018, at 2:12 AM, Dominique Bejean <
>> dominique.bej...@eolya.fr> wrote:
>> >
>> > Hi,
>> >
>> > More info.
>> >
>> > When I test the analisys for the field type the synonyms are correctly
>> > expanded for both expressions
>> >
>> > om maillot
>> > maillot om
>> > olympique de marseille maillot
>> > maillot olympique de marseille
>> >
>> > resulting outputs always include the following terms (obvioulsly not
>> always
>> > in the same order)
>> >
>> > olympiqu om marseil maillot
>> >
>> >
>> > So, i suspect an issue with edismax query parser.
>> >
>> > Regards.
>> >
>> > Dominique
>> >
>> >
>> > Le ven. 9 févr. 2018 à 18:25, Dominique Bejean <
>> dominique.bej...@eolya.fr>
>> > a écrit :
>> >
>> >> Hi,
>> >>
>> >> I am trying multi words query time synonyms with Solr 6.6.2and
>> >> SynonymGraphFilterFactory filter as explain in this article
>> >>
>> >>
>> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>> >>
>> >> My field type is :
>> >>
>> >> > >> positionIncrementGap="100">
>> >>
>> >>  
>> >>  > >>articles="lang/contractions_fr.txt"/>
>> >>  
>> >>  
>> >>  > >> ignoreCase="true"/>
>> >>  
>> >>
>> >>
>> >>  
>> >>  > >>articles="lang/contractions_fr.txt"/>
>> >>  
>> >>  > >> synonyms="synonyms.txt"
>> >>ignoreCase="true" expand="true"/>
>> >>  
>> >>  > >> ignoreCase="true"/>
>> >>  
>> >>
>> >>  
>> >>
>> >>
>> >> synonyms.txt contains the line
>> >>
>> >> om, olympique de marseille
>> >>
>> >>
>> >> The order of words in my query has an impact on the generated query in
>> >> edismax
>> >>
>> >> q={!edismax qf='name_text_gp' v=$qq}
>> >> =false
>> >> =...
>> >>
>> >> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see
>> the
>> >> synonyms expansion. It is working as expected.
>> >>
>> >> "parsedquery_toString":"+(((+name_text_gp:olympiqu
>> +name_text_gp:marseil
>> >> +name_text_gp:maillot) name_text_gp:om))",
>> >> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
>> >> +name_text_gp:marseil +name_text_gp:maillot)))",
>> >>
>> >>
>> >> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see
>> the
>> >> same generated query
>> >>
>> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>> >>
>> >> I don't understand these generated queries. The first one looks like
>> the
>> >> synonym expansion is ignored, but the second one shows it is not
>> ignored
>> >> and only the synonym term is used.
>> >>
>> >>
>> >> What is wrong in the way I am doing this ?
>> >>
>> >> Regards
>> >>
>> >> Dominique
>> >>
>> >> --
>> >> Dominique Béjean
>> >> 06 08 46 12 43
>> >>
>> > --
>> > Dominique Béjean
>> > 06 08 46 12 43
>>
>> --
> Dominique Béjean
> 06 08 46 12 43
>
-- 
Dominique Béjean
06 08 46 12 43


Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean
Hi Steve,

Thank you for your response.
The Jira was created : SOLR-11968

I let you add your comments.

Regards.

Dominique


Le sam. 10 févr. 2018 à 20:30, Steve Rowe <sar...@gmail.com> a écrit :

> Hi Dominique,
>
> Looks like it’s a bug, not sure where exactly though.  Can you please
> create a JIRA?
>
> I can see the same behavior on master too, not just on the
> releases/lucene-solr/6.6.2 tag.
>
> One interesting thing I found is that if I remove the stop filter from the
> query analyzer, I get the following for qq=“maillot om”:
>
> +((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de
> +name_text_gp:marseil) name_text_gp:om)))
>
> (btw my stop list only has “de” on it)
>
> Thanks,
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 10, 2018, at 2:12 AM, Dominique Bejean <dominique.bej...@eolya.fr>
> wrote:
> >
> > Hi,
> >
> > More info.
> >
> > When I test the analisys for the field type the synonyms are correctly
> > expanded for both expressions
> >
> > om maillot
> > maillot om
> > olympique de marseille maillot
> > maillot olympique de marseille
> >
> > resulting outputs always include the following terms (obvioulsly not
> always
> > in the same order)
> >
> > olympiqu om marseil maillot
> >
> >
> > So, i suspect an issue with edismax query parser.
> >
> > Regards.
> >
> > Dominique
> >
> >
> > Le ven. 9 févr. 2018 à 18:25, Dominique Bejean <
> dominique.bej...@eolya.fr>
> > a écrit :
> >
> >> Hi,
> >>
> >> I am trying multi words query time synonyms with Solr 6.6.2and
> >> SynonymGraphFilterFactory filter as explain in this article
> >>
> >>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
> >>
> >> My field type is :
> >>
> >>  >> positionIncrementGap="100">
> >>
> >>  
> >>   >>articles="lang/contractions_fr.txt"/>
> >>  
> >>  
> >>   >> ignoreCase="true"/>
> >>  
> >>
> >>
> >>  
> >>   >>articles="lang/contractions_fr.txt"/>
> >>  
> >>   >> synonyms="synonyms.txt"
> >>ignoreCase="true" expand="true"/>
> >>  
> >>   >> ignoreCase="true"/>
> >>  
> >>
> >>  
> >>
> >>
> >> synonyms.txt contains the line
> >>
> >> om, olympique de marseille
> >>
> >>
> >> The order of words in my query has an impact on the generated query in
> >> edismax
> >>
> >> q={!edismax qf='name_text_gp' v=$qq}
> >> =false
> >> =...
> >>
> >> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see
> the
> >> synonyms expansion. It is working as expected.
> >>
> >> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
> >> +name_text_gp:maillot) name_text_gp:om))",
> >> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
> >> +name_text_gp:marseil +name_text_gp:maillot)))",
> >>
> >>
> >> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see
> the
> >> same generated query
> >>
> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> >>
> >> I don't understand these generated queries. The first one looks like the
> >> synonym expansion is ignored, but the second one shows it is not ignored
> >> and only the synonym term is used.
> >>
> >>
> >> What is wrong in the way I am doing this ?
> >>
> >> Regards
> >>
> >> Dominique
> >>
> >> --
> >> Dominique Béjean
> >> 06 08 46 12 43
> >>
> > --
> > Dominique Béjean
> > 06 08 46 12 43
>
> --
Dominique Béjean
06 08 46 12 43


Re: Multi words query time synonyms

2018-02-10 Thread Dominique Bejean
Hi,

More info.

When I test the analysis for the field type, the synonyms are correctly
expanded for all of these expressions

om maillot
maillot om
olympique de marseille maillot
maillot olympique de marseille

The resulting outputs always include the following terms (obviously not
always in the same order)

olympiqu om marseil maillot


So, I suspect an issue with the edismax query parser.

Regards.

Dominique


Le ven. 9 févr. 2018 à 18:25, Dominique Bejean <dominique.bej...@eolya.fr>
a écrit :

> Hi,
>
> I am trying multi words query time synonyms with Solr 6.6.2and
> SynonymGraphFilterFactory filter as explain in this article
>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>
> My field type is :
>
>  positionIncrementGap="100">
> 
>   
>articles="lang/contractions_fr.txt"/>
>   
>   
>ignoreCase="true"/>
>   
> 
> 
>   
>articles="lang/contractions_fr.txt"/>
>   
>synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>   
>ignoreCase="true"/>
>   
> 
>   
>
>
> synonyms.txt contains the line
>
> om, olympique de marseille
>
>
> The order of words in my query has an impact on the generated query in
> edismax
>
> q={!edismax qf='name_text_gp' v=$qq}
> =false
> =...
>
> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonyms expansion. It is working as expected.
>
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
> +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
> +name_text_gp:marseil +name_text_gp:maillot)))",
>
>
> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see the
> same generated query
>
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored
> and only the synonym term is used.
>
>
> What is wrong in the way I am doing this ?
>
> Regards
>
> Dominique
>
> --
> Dominique Béjean
> 06 08 46 12 43
>
-- 
Dominique Béjean
06 08 46 12 43


Multi words query time synonyms

2018-02-09 Thread Dominique Bejean
Hi,

I am trying multi-word query-time synonyms with Solr 6.6.2 and the
SynonymGraphFilterFactory filter, as explained in this article
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

My field type is : (the XML markup was stripped by the list archive)



  
  
  
  
  
  


  
  
  
  
  
  
  

  


synonyms.txt contains the line

om, olympique de marseille


The order of words in my query has an impact on the generated query in
edismax

q={!edismax qf='name_text_gp' v=$qq}
&sow=false
&qq=...

with "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
synonyms expansion. It is working as expected.

"parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
+name_text_gp:maillot) name_text_gp:om))",
"parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil +name_text_gp:maillot)))",


with "qq=maillot om" or "qq=maillot olympique de marseille", I can see the
same generated query

"parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
"parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",

I don't understand these generated queries. The first one looks like the
synonym expansion is ignored, but the second one shows it is not ignored
and only the synonym term is used.


What is wrong in the way I am doing this ?

Regards

Dominique

-- 
Dominique Béjean
06 08 46 12 43


Re: Solr JVM best practices

2017-12-04 Thread Dominique Bejean
Thank you Shawn for replying to each item.

I am starting to figure out all this tricky JVM stuff better.

Dominique

Le dim. 3 déc. 2017 à 01:30, Shawn Heisey <apa...@elyograg.org> a écrit :

> On 12/2/2017 8:43 AM, Dominique Bejean wrote:
> > I would like to have some advices on best practices related to Heap Size,
> > MMap, direct memory, GC algorithm and OS Swap.
>
> For the most part, there is no generic advice we can give you for these
> things.  What you need is going to be highly dependent on exactly what
> you are doing with Solr and how much index data you have.  There are no
> formulas for calculating these values based on information about your
> setup.
>
> Experienced Solr users can make *guesses* if you provide some
> information, but those guesses might turn out the be completely wrong.
>
>
> https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> > About JVM heap size setting
> >
> > JVM heap size setting is related to use case so there is no other advice
> > than reduce it at the minimum possible size in order to avoid GC issue.
> > Reduce Heap size at is minimum will be achieved mainly by :
>
> The max heap size should be as large as you need, and no larger.
> Figuring out what you need may require trial and error on an
> installation that has all the index data and is receiving production
> queries.
>
> On this wiki page, I wrote a small section about one way you MIGHT be
> able to figure out what heap size you need:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F
>
> > Optimize schema by remove unused fields and not index / store fields
> if
> > it is not necessary
> > -
> >
> > Enable docValues on fields used for facetting, sorting and grouping
> > -
> >
> > Not oversize Solr cache
> > -
> >
> > Be careful with rows and fl query parameters
>
> These are good ideas.  But sometimes you find that you need a lot of
> fields, and you need a lot of them to be stored.  The schema and config
> should be designed around what you need Solr to do.  Designing them for
> the lowest possible memory usage might result in a config that doesn't
> do what you want.
>
> > About MMap setting
> >
> > According to the great article “
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
> > from Uwe Schindler, the only tasks that have to be done at OS settings
> > level is check that “ulimit -v” and “ulimit -m” both report “unlimited”
> and
> > increase vm.max_map_count setting from is default 65536.
>
> The default directory implementation that recent Solr versions use is
> NRTCachingDirectoryFactory.  This wraps another implementation with a
> small memory cache.  The implementation that is wrapped by default DOES
> use MMAP.
>
> The amount of memory used for caching MMAP access cannot be configured
> in the application.  The OS handles that caching completely
> automatically, without any configuration at all.  All modern operating
> systems are designed so that the disk cache can use *all* available
> memory in the system.  This is because the cache will instantly give up
> memory if a program requests it.  The cache never keeps memory that
> programs want.
>
> > I suppose the best value is related to available off heap memory. I
> > generally set it to 262144. Is it a good value or is there a better way
> to
> > determine this value ?
>
> Solr doesn't use any off heap memory as far as I'm aware.  There was a
> fork of Solr for a short time named heliosearch, which DID use off-heap
> memory.  Java itself will use some off-heap memory for its own
> operation.  I do not know whether that is configurable, and if so, how
> it's done.
>
> > About Direct Memory
> >
> > According to a response in Solr Maillig list from Uwe Schindler (again),
> I
> > understand that the MmapDirectory is not Direct Memory.
> >
> > The only place where I read that MaxDirectMemorySize JVM setting have to
> be
> > set for Solr is in Cloudera blog post and in Solr mailing list when using
> > Solr with HDFS.
> >
> > Is it necessary to change the default MaxDirectMemorySize JVM setting ?
> If
> > yes, how to determine the appropriate value ?
>
> I have never heard of this "direct memory."  Solr probably doesn't use
> it.  I really have no idea what happens when the index is in HDFS.
> You'd have to ask somebody who knows Hadoop.
>
> > About OS Swap setting
> >
> > Linux generally starts swapping when less than 30% of the memory is free.
> > In 

Re: Solr JVM best practices

2017-12-02 Thread Dominique Bejean
Hi Walter,

Thank you for this response. Did you use CMS before G1 ? Were there any GC
issues fixed by G1 ?

Dominique


Le sam. 2 déc. 2017 à 17:13, Walter Underwood <wun...@wunderwood.org> a
écrit :

> We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131.
>
> This has been solid in production with a 32 node Solr Cloud cluster. We do
> not do faceting.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Dec 2, 2017, at 7:43 AM, Dominique Bejean <dominique.bej...@eolya.fr>
> wrote:
> >
> > Hi,
> >
> > I would like to have some advices on best practices related to Heap Size,
> > MMap, direct memory, GC algorithm and OS Swap.
> >
> > This is a waste subject and sorry for this long question but all these
> > items are linked in order to have a stable Solr environment.
> >
> > My understanding and questions.
> >
> > About JVM heap size setting
> >
> > JVM heap size setting is related to use case so there is no other advice
> > than reduce it at the minimum possible size in order to avoid GC issue.
> > Reduce Heap size at is minimum will be achieved mainly by :
> >
> >   -
> >
> >   Optimize schema by remove unused fields and not index / store fields if
> >   it is not necessary
> >   -
> >
> >   Enable docValues on fields used for facetting, sorting and grouping
> >   -
> >
> >   Not oversize Solr cache
> >   -
> >
> >   Be careful with rows and fl query parameters
> >
> >
> > Any other advice is welcome :)
> >
> >
> > About MMap setting
> >
> > According to the great article “
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
> > from Uwe Schindler, the only tasks that have to be done at OS settings
> > level is check that “ulimit -v” and “ulimit -m” both report “unlimited”
> and
> > increase vm.max_map_count setting from is default 65536.
> >
> > I suppose the best value is related to available off heap memory. I
> > generally set it to 262144. Is it a good value or is there a better way
> to
> > determine this value ?
> >
> >
> > About Direct Memory
> >
> > According to a response in Solr Maillig list from Uwe Schindler (again),
> I
> > understand that the MmapDirectory is not Direct Memory.
> >
> > The only place where I read that MaxDirectMemorySize JVM setting have to
> be
> > set for Solr is in Cloudera blog post and in Solr mailing list when using
> > Solr with HDFS.
> >
> > Is it necessary to change the default MaxDirectMemorySize JVM setting ?
> If
> > yes, how to determine the appropriate value ?
> >
> >
> > About OS Swap setting
> >
> > Linux generally starts swapping when less than 30% of the memory is free.
> > In order to avoid OS goes against Solr for off heap memory management,  I
> > use to change OS swappiness value to 0. Can you confirm it is a good
> thing ?
> >
> >
> > About CMS GC vs G1 GC
> >
> > Default Solr setting use CMS GC.
> >
> > According to the post from Shawn Heisey in the old Solr wiki (
> > https://wiki.apache.org/solr/ShawnHeisey), can we consider that G1 GC
> can
> > definitely be used with Solr for heap size over nearly 4Gb ?
> >
> >
> > Regards
> >
> > Dominique
> >
> > --
> > Dominique Béjean
> > 06 08 46 12 43
>
> --
Dominique Béjean
06 08 46 12 43


Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
Hi Toke,

Nearly 30% of the requests are setting facet.limit=200

Over 42000 requests, the number of times each field is used for faceting is:

$ grep  'facet=true' select.log | grep -oP 'facet.field=([^&])*' | sort |
uniq -c | sort -r

 23119 facet.field=category_path

  8643 facet.field=EUR_0_price_decimal

  5560 facet.field=type_pratique_facet

  5560 facet.field=size_facet_facet

  5560 facet.field=marque_facet

  5560 facet.field=is_marketplace_origin_facet

  5560 facet.field=gender_facet

  5560 facet.field=color_facet_facet

  5560 facet.field=club_facet

  5560 facet.field=age_facet

  3290 facet.field=durete_facet

  3290 facet.field=diametre_roues_facet

   169 facet.field=EUR_1_price_decimal

38 facet.field=EUR_4_price_decimal


The largest counts of unique values for these fields are:

category_path 3025

marque_facet 2100

size_facet_facet 1400

type_pratique_facet 166

color_facet_facet 165

Here are 2 typical queries :

2017-11-20 10:13:27.585 INFO  (qtp592179046-15153) [   x:french]
o.a.s.c.S.Request [french]  webapp=/solr path=/select
params={mm=100%25=category_path=age_facet=is_marketplace_origin_facet=type_pratique_facet=gender_facet=color_facet_facet=size_facet_facet=EUR_0_price_decimal=EUR_0_price_decimal=marque_facet=club_facet=diametre_roues_facet=durete_facet=*:*&
json.nl=map=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal=0=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(category_id:"3389")+AND+(filter_visibility_int:"2"+OR+filter_visibility_int:"4")=48=is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+=*:*=200==edismax=textSearch=true=1=true=json=EUR_0_price_decimal=sort_EUR_0_special_price_decimal=true=1511172807}
hits=953 status=0 QTime=26

2017-11-20 10:17:28.193 INFO  (qtp592179046-17115) [   x:french]
o.a.s.c.S.Request [french]  webapp=/solr path=/select
params={mm=100%25=category_path=age_facet=is_marketplace_origin_facet=type_pratique_facet=gender_facet=color_facet_facet=size_facet_facet=EUR_0_price_decimal=marque_facet=club_facet&
json.nl=map=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal=0=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(filter_visibility_int:"3"+OR+filter_visibility_int:"4")8=name_boost_exact:"velo"^120+name_boost:"velo"^100+name_relative_boost:velo^80+category_boost:velo^60+is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+size_facet_boost_exact:"velo"^299+size_facet_boost:velo^296+size_facet_relative_boost:velo^292+marque_boost_exact:"velo"^359+marque_boost:velo^356+marque_relative_boost:velo^352+=velo=200=velo=edismax=textSearch=true=1=true=json=EUR_0_price_decimal=sort_EUR_0_special_price_decimal=true=1511173047}
hits=6761 status=0 QTime=38




Dominique


Le sam. 2 déc. 2017 à 16:23, Toke Eskildsen <t...@kb.dk> a écrit :

> Dominique Bejean <dominique.bej...@eolya.fr> wrote:
> > Hi, Thank you for the explanations about faceting. I was thinking the hit
> > count had a bigger impact on the facet memory lifecycle.
>
> Only if you have a very high facet.limit. Could you provide us with a
> typical query, including all the parameters?
>
> - Toke Eskildsen
>
-- 
Dominique Béjean
06 08 46 12 43


Solr JVM best practices

2017-12-02 Thread Dominique Bejean
Hi,

I would like some advice on best practices related to heap size, MMap, direct
memory, GC algorithm and OS swap.

This is a vast subject, and sorry for this long question, but all these items
are linked when it comes to having a stable Solr environment.

My understanding and questions.

About JVM heap size setting

The JVM heap size setting depends on the use case, so there is no better advice
than to reduce it to the minimum possible size in order to avoid GC issues.
Reducing the heap size to its minimum is achieved mainly by (see the docValues
sketch after the list):

   - Optimizing the schema by removing unused fields and not indexing /
     storing fields when it is not necessary
   - Enabling docValues on fields used for faceting, sorting and grouping
   - Not oversizing the Solr caches
   - Being careful with the rows and fl query parameters
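As an example of the docValues point, here is a minimal sketch of a Schema API
call that switches docValues on for an existing facet field (the collection and
field names are placeholders, and such a change requires a full reindex):

# replace-field redefines an existing field; here we simply enable docValues
curl -X POST -H 'Content-type: application/json' \
  'http://localhost:8983/solr/mycollection/schema' \
  -d '{
    "replace-field": {
      "name": "category_facet",
      "type": "string",
      "indexed": true,
      "stored": false,
      "docValues": true
    }
  }'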


Any other advice is welcome :)


About MMap setting

According to the great article
“http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
from Uwe Schindler, the only tasks that have to be done at the OS level are to
check that “ulimit -v” and “ulimit -m” both report “unlimited” and to increase
the vm.max_map_count setting from its default of 65536.

I suppose the best value is related to the available off-heap memory. I
generally set it to 262144. Is this a good value, or is there a better way to
determine it?
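For reference, here is how I check and change these settings (a sketch; the
sysctl.d file name is just an example):

# limits mentioned in Uwe's article; both should report "unlimited"
ulimit -v
ulimit -m

# check and raise the maximum number of memory map areas per process
sysctl vm.max_map_count
sudo sysctl -w vm.max_map_count=262144

# make the change permanent across reboots
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/90-solr.conf
sudo sysctl --system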


About Direct Memory

According to a response on the Solr mailing list from Uwe Schindler (again), I
understand that MMapDirectory does not use direct memory.

The only places where I have read that the MaxDirectMemorySize JVM setting has
to be set for Solr are a Cloudera blog post and the Solr mailing list, both
about using Solr with HDFS.

Is it necessary to change the default MaxDirectMemorySize JVM setting? If yes,
how do I determine the appropriate value?
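If it does have to be set (for instance when the HDFS block cache allocates
direct memory), my understanding is that it is passed like any other JVM
option, e.g. in solr.in.sh; the value below is only a placeholder, not a
recommendation:

# solr.in.sh — only relevant if something actually allocates direct memory
SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=2g"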


About OS Swap setting

Linux generally starts swapping when less than 30% of the memory is free. In
order to prevent the OS from working against Solr's off-heap memory management,
I usually change the OS swappiness value to 0. Can you confirm this is a good
practice?
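Concretely, this is what I do (a sketch; on recent kernels some people prefer
vm.swappiness=1 so that swap stays available as a last resort instead of being
effectively disabled):

sysctl vm.swappiness                  # the default is usually 60
sudo sysctl -w vm.swappiness=0        # or 1, see the note above
echo 'vm.swappiness=0' | sudo tee /etc/sysctl.d/91-swappiness.conf
sudo sysctl --system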


About CMS GC vs G1 GC

The default Solr settings use the CMS GC.

According to the post from Shawn Heisey in the old Solr wiki
(https://wiki.apache.org/solr/ShawnHeisey), can we consider that G1 GC can
definitely be used with Solr for heap sizes over roughly 4 GB?
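If G1 is indeed safe to use, I suppose switching would just be a matter of
overriding GC_TUNE in solr.in.sh, along the lines of the G1 settings discussed
on that wiki page (the exact flags below are only an illustration, not a
recommendation):

# solr.in.sh — replaces the default CMS flags with G1
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m \
  -XX:MaxGCPauseMillis=250"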


Regards

Dominique

-- 
Dominique Béjean
06 08 46 12 43


Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
Hi,

Thank you for the explanations about faceting. I was thinking the hit count had
a bigger impact on the facet memory lifecycle. Regardless of the hit count,
there is a query peak at the time the issue occurs. It is modest compared to
what Solr is supposed to be able to handle, but it should be sufficient to
explain the growing GC activity (queries per minute):

198 10:07
208 10:08
267 10:09
285 10:10
244 10:11
286 10:12
277 10:13
252 10:14
183 10:15
302 10:16
299 10:17
273 10:18
348 10:19
468 10:20
496 10:21
673 10:22
496 10:23
101 10:24

At the time the issue occurs, we see the CPU activity grow very high; maybe
there is a lack of CPU. So, I will suggest all the actions that remove pressure
on the heap memory:


   - enable docValues
   - divide the cache sizes by 2 in order to go back to the Solr defaults
   - refine the fl parameter, as I know it can be optimized

Concerning the phonetic filter, it will be removed anyway, as a large number of
results are really irrelevant.

Regards.

Dominique


Le sam. 2 déc. 2017 à 04:25, Erick Erickson <erickerick...@gmail.com> a
écrit :

> Dominique:
>
> Actually, the memory requirements shouldn't really go up as the number
> of hits increases. The general algorithm is (say rows=10)
> Calculate the score of each doc
> If the score is zero, ignore
> If the score is > the minimum in my current top 10, replace the lowest
> scoring doc in my current top 10 with the new doc (a PriorityQueue
> last I knew).
> else discard the doc.
>
> When all docs have been scored, assemble the return from the top 10
> (or whatever rows is set to).
>
> The key here is that most of the Solr index is kept in
> MMapDirectory/OS space, see Uwe's excellent blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
> In terms of _searching_, very little of the Lucene index structures
> are kept in memory.
>
> That said, faceting plays a bit loose with the rules. If you have
> docValues set to true, most of the memory structures are in the OS
> memory space, not the JVM. If you have docValues set to false, then
> the "uninverted" structure is built in the JVM heap space.
>
> Additionally, the JVM requirements are sensitive to the number of
> unique values in field being faceted on. For instance, let's say you
> faceted by a date field with just facet.field=some_date_field. A
> bucket would have to be allocated to hold the counts for each and
> every unique date field, i.e. one for each millisecond in your search,
> which might be something you're seeing. Conceptually this is just an
> array[uniqueValues] of ints (longs? I'm not sure). This should be
> relatively easily testable by omitting the facets while measuring.
>
> Where the number of rows _does_ make a difference is in the return
> packet. Say I have rows=10. In that case I create a single return
> packet with the "fl" fields of all 10 docs. If rows = 10,000 then that return
> packet is obviously 1,000 times as large and must be assembled in
> memory.
>
> I rather doubt the phonetic filter is to blame. But you can test this
> by just omitting the field containing the phonetic filter in the
> search query. I've certainly been wrong before.
>
> Best,
> Erick
>
> On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
> <dominique.bej...@eolya.fr> wrote:
> > Hi,
> >
> >
> > Thank you both for your responses.
> >
> >
> > I just have solr log for the very last period of the CG log.
> >
> >
> > A grep command allows me to count the queries per minute with hits > 1000
> > or > 1, i.e. those with the biggest impact on memory and CPU during faceting:
> >
> >
> >> 1000
> >
> >  59 11:13
> >
> >  45 11:14
> >
> >  36 11:15
> >
> >  45 11:16
> >
> >  59 11:17
> >
> >  40 11:18
> >
> >  95 11:19
> >
> > 123 11:20
> >
> > 137 11:21
> >
> > 123 11:22
> >
> >  86 11:23
> >
> >  26 11:24
> >
> >  19 11:25
> >
> >  17 11:26
> >
> >
> >> 1
> >
> >  55 11:19
> >
> >  78 11:20
> >
> >  48 11:21
> >
> > 134 11:22
> >
> >  93 11:23
> >
> >  10 11:24
> >
> >
> > So we see that at the time the GC starts to go wild, the count of large
> > result sets increases.
> >
> >
> > The query field includes a phonetic filter and the results are really not
> > relevant due to this. I will suggest to:
> >
> >
> > 1/ remove the phonetic filter in order to have fewer irrelevant results and
> > so get smaller result sets
> >
&
