Re: using S3 as the Directory for Solr

2020-04-23 Thread Walter Underwood
It will be a lot more than 2X or 3X slower. Years ago, I accidentally put Solr 
indexes on an NFS mounted filesystem and it was 100X slower. S3 would be a lot 
slower than that.

Are you doing relevance-ranked searches on all that data? That is the only 
reason to use Solr instead of some other solution.

I’d use Apache Hive, or whatever has replaced it. That is what Facebook wrote 
to do searches on their multi-petabyte logs.

https://hive.apache.org

More options.

https://jethro.io/hadoop-hive
https://mapr.com/why-hadoop/sql-hadoop/sql-hadoop-details/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 23, 2020, at 7:29 PM, Christopher Schultz 
>  wrote:
> 
> 
> Rahul,
> 
> On 4/23/20 21:49, dhurandar S wrote:
>> Thank you for your reply. The reason we are looking for S3 is since
>> the volume is close to 10 Petabytes. We are okay to have higher
>> latency of say twice or thrice that of placing data on the local
>> disk. But we have a requirement to have long-range data and
>> providing Seach capability on that.  Every other storage apart from
>> S3 turned out to be very expensive at that scale.
>> 
>> Basically I want to replace
>> 
>> -Dsolr.directoryFactory=HdfsDirectoryFactory \
>> 
>> with S3 based implementation.
> 
> Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of
> /index data/?
> 
> You can theoretically store your source data anywhere, of course. 10
> PiB sounds like a truly enormous index.
> 
> - -chris
> 
>> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl 
>> wrote:
>> 
>>> Hi,
>>> 
>>> Is your data so partitioned that it makes sense to consider
>>> splitting up in multiple collections and make some arrangement
>>> that will keep only a few collections live at a time, loading
>>> index files from S3 on demand?
>>> 
>>> I cannot see how an S3 directory would be able to effectively
>>> cache files in S3 and what units the index files would be stored
>>> as?
>>> 
>>> Have you investigated EFS as an alternative? That would look like
>>> a normal filesystem to Solr but might be cheaper storage wise,
>>> but much slower.
>>> 
>>> Jan
>>> 
 23. apr. 2020 kl. 06:57 skrev dhurandar S
 :
 
 Hi,
 
 I am looking to use S3 as the place to store indexes. Just how
 Solr uses HdfsDirectory to store the index and all the other
 documents.
 
 We want to provide a search capability that is okay to be a
 little slow
>>> but
 cheaper in terms of the cost. We have close to 2 petabytes of
 data on
>>> which
 we want to provide the Search using Solr.
 
 Are there any open-source implementations around using S3 as
 the
>>> Directory
 for Solr ??
 
 Any recommendations on this approach?
 
 regards, Rahul
>>> 
>>> 
>> 



Re: using S3 as the Directory for Solr

2020-04-23 Thread Christopher Schultz

Rahul,

On 4/23/20 21:49, dhurandar S wrote:
> Thank you for your reply. The reason we are looking for S3 is since
> the volume is close to 10 Petabytes. We are okay to have higher
> latency of say twice or thrice that of placing data on the local
> disk. But we have a requirement to have long-range data and
> providing Seach capability on that.  Every other storage apart from
> S3 turned out to be very expensive at that scale.
>
> Basically I want to replace
>
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
>
> with S3 based implementation.

Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of
/index data/?

You can theoretically store your source data anywhere, of course. 10
PiB sounds like a truly enormous index.

- -chris

> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl 
> wrote:
>
>> Hi,
>>
>> Is your data so partitioned that it makes sense to consider
>> splitting up in multiple collections and make some arrangement
>> that will keep only a few collections live at a time, loading
>> index files from S3 on demand?
>>
>> I cannot see how an S3 directory would be able to effectively
>> cache files in S3 and what units the index files would be stored
>> as?
>>
>> Have you investigated EFS as an alternative? That would look like
>> a normal filesystem to Solr but might be cheaper storage wise,
>> but much slower.
>>
>> Jan
>>
>>> 23. apr. 2020 kl. 06:57 skrev dhurandar S
>>> :
>>>
>>> Hi,
>>>
>>> I am looking to use S3 as the place to store indexes. Just how
>>> Solr uses HdfsDirectory to store the index and all the other
>>> documents.
>>>
>>> We want to provide a search capability that is okay to be a
>>> little slow
>> but
>>> cheaper in terms of the cost. We have close to 2 petabytes of
>>> data on
>> which
>>> we want to provide the Search using Solr.
>>>
>>> Are there any open-source implementations around using S3 as
>>> the
>> Directory
>>> for Solr ??
>>>
>>> Any recommendations on this approach?
>>>
>>> regards, Rahul
>>
>>
>


Re: using S3 as the Directory for Solr

2020-04-23 Thread dhurandar S
Hi Jan,

Thank you for your reply. The reason we are looking at S3 is that the
volume is close to 10 petabytes.
We are okay with higher latency, say twice or three times that of placing
data on the local disk. But we have a requirement to keep long-range data
and provide search capability on it. Every other storage option apart from S3
turned out to be very expensive at that scale.

Basically I want to replace

-Dsolr.directoryFactory=HdfsDirectoryFactory \

 with an S3-based implementation.
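
For context, stock Solr ships no S3-backed DirectoryFactory, so an S3-based
implementation means writing one against the Lucene Directory API. A rough sketch
of the shape such a plug-in could take; S3DirectoryFactory and S3Directory are
hypothetical names, and the exact DirectoryFactory method signatures may differ
slightly between Solr versions:

// Hypothetical sketch: an S3-backed DirectoryFactory for Solr.
// Assumes a custom S3Directory implementing org.apache.lucene.store.Directory
// on top of the AWS SDK; Solr does not provide one out of the box.
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.solr.core.CachingDirectoryFactory;

public class S3DirectoryFactory extends CachingDirectoryFactory {

    @Override
    protected LockFactory createLockFactory(String rawLockType) throws IOException {
        // Index locking would need its own strategy on S3; a single writer is assumed here.
        return NoLockFactory.INSTANCE;
    }

    @Override
    protected Directory create(String path, LockFactory lockFactory, DirContext dirContext)
            throws IOException {
        // Map the core's data path to a bucket/prefix and return a Directory whose
        // read and write operations go to S3 (S3Directory is hypothetical).
        return new S3Directory("my-index-bucket", path, lockFactory);
    }
}

It would then be selected with -Dsolr.directoryFactory=S3DirectoryFactory, or a
directoryFactory element in solrconfig.xml, exactly as HdfsDirectoryFactory is today.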


regards,
Rahul





On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl  wrote:

> Hi,
>
> Is your data so partitioned that it makes sense to consider splitting up
> in multiple collections and make some arrangement that will keep only
> a few collections live at a time, loading index files from S3 on demand?
>
> I cannot see how an S3 directory would be able to effectively cache files
> in S3 and what units the index files would be stored as?
>
> Have you investigated EFS as an alternative? That would look like a
> normal filesystem to Solr but might be cheaper storage wise, but much
> slower.
>
> Jan
>
> > 23. apr. 2020 kl. 06:57 skrev dhurandar S :
> >
> > Hi,
> >
> > I am looking to use S3 as the place to store indexes. Just how Solr uses
> > HdfsDirectory to store the index and all the other documents.
> >
> > We want to provide a search capability that is okay to be a little slow
> but
> > cheaper in terms of the cost. We have close to 2 petabytes of data on
> which
> > we want to provide the Search using Solr.
> >
> > Are there any open-source implementations around using S3 as the
> Directory
> > for Solr ??
> >
> > Any recommendations on this approach?
> >
> > regards,
> > Rahul
>
>


Solr 8.2 Cloud Replication Locked

2020-04-23 Thread Justin Sweeney
Hi all,

We are running Solr 8.2 Cloud in a cluster where we have a single TLOG
replica per shard and multiple PULL replicas for each shard. We have
noticed an issue recently where some of the PULL replicas stop replicating
from the masters. They will log a replication message which outputs:

o.a.s.h.IndexFetcher Number of files in latest index in master:

Then nothing else for IndexFetcher after that. I went onto a few instances
and took a thread dump and we see the following where it seems to be locked
getting the index write lock. I don’t see anything else in the thread dump
indicating deadlock. Any ideas here?

"indexFetcher-19-thread-1" #468 prio=5 os_prio=0 cpu=285847.01ms
> elapsed=62993.13s tid=0x7fa8fc004800 nid=0x254 waiting on condition
> [0x7ef584ede000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> - parking to wait for <0x0003aa5e4ad8> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> /LockSupport.java:234)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:980)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:1288)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(java.base@11.0.6
> /ReentrantReadWriteLock.java:1131)
> at
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
> at
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)
> at
> org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1193)
> at
> org.apache.solr.handler.ReplicationHandler$$Lambda$668/0x000800d0f440.run(Unknown
> Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.6
> /Executors.java:515)
> at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.6
> /FutureTask.java:305)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.6
> /ScheduledThreadPoolExecutor.java:305)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6
> /ThreadPoolExecutor.java:1128)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6
> /ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)


Dynamic reload of TLS configuration

2020-04-23 Thread Christopher Schultz

All,

Does anyone know if it is possible to reconfigure Solr's TLS
configuration (specifically, the server key and certificate) without a
restart?

I'm looking for a zero-downtime situation with a single-server and an
updated TLS certificate.

Thanks,
- -chris


Failure to distribute update after 25 retries

2020-04-23 Thread Beikov Christian
Hello,

I have a few very strange problems and hope someone can help me with them. I'm 
trying to index something with Solr 8.4.1, but after a few documents I get the 
following exceptions:

2020-04-23 13:00:43.484 INFO  (qtp1635378213-21) [c:cc5363_dm_documentversion 
s:shard1 r:core_node3 x:cc5363_dm_documentversion_shard1_replica_n1] 
o.a.s.u.SolrCmdDistributor SolrCmdDistributor found 1 errors
2020-04-23 13:00:45.484 ERROR (qtp1635378213-21) [c:cc5363_dm_documentversion 
s:shard1 r:core_node3 x:cc5363_dm_documentversion_shard1_replica_n1] 
o.a.s.u.SolrCmdDistributor forwarding update to 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/
  failed - retrying ... retries: 25/25. add{,id=14691!14706} 
params:update.distrib=TOLEADER=http://solr-1.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard1_replica_n1/
  rsp:404:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/:
  null



request: 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:274)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:181)
 at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
 at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)

2020-04-23 13:00:45.489 WARN  
(updateExecutor-5-thread-1-processing-x:cc5363_dm_documentversion_shard1_replica_n1
 r:core_node3 null n:solr-1.solr.cc-demo:8983_solr c:cc5363_dm_documentversion 
s:shard1) [c:cc5363_dm_documentversion s:shard1 r:core_node3 
x:cc5363_dm_documentversion_shard1_replica_n1] 
o.a.s.c.s.i.ConcurrentUpdateHttp2SolrClient Failed to parse error response from 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/
  due to: java.lang.RuntimeException: Invalid version (expected 2, but 60) or 
the data in not in 'javabin' format
2020-04-23 13:00:45.489 ERROR 
(updateExecutor-5-thread-1-processing-x:cc5363_dm_documentversion_shard1_replica_n1
 r:core_node3 null n:solr-1.solr.cc-demo:8983_solr c:cc5363_dm_documentversion 
s:shard1) [c:cc5363_dm_documentversion s:shard1 r:core_node3 
x:cc5363_dm_documentversion_shard1_replica_n1] 
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling 
SolrCmdDistributor$Req: cmd=add{,id=14691!14706}; node=ForwardNode: 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/
  to 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/
  => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://solr-0.solr.cc-demo:8983/solr/cc5363_dm_documentversion_shard2_replica_n4/:
  null

I have two nodes (solr-0 and solr-1) running in a stateful set in OpenShift with 
a single ZooKeeper instance. The collection cc5363_dm_documentversion is 
configured with shardCount 2, replicationFactor 2, maxShardsPerNode 2, router 
compositeId and autoAddReplicas false. I create the collection on demand while 
indexing, when I find that a collection does not yet exist, and I index around 
10 documents per "transaction", i.e. commit after 10 documents.
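
For reference, a sketch of what that on-demand creation could look like with
SolrJ; the ZooKeeper address and the configset name "cc_config" are assumptions,
not taken from the original mail:

// Sketch: create the collection with 2 shards x 2 replicas if it does not exist yet.
// org.apache.solr.client.solrj.impl.CloudSolrClient,
// org.apache.solr.client.solrj.request.CollectionAdminRequest
CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zookeeper:2181"), Optional.empty()).build();
if (!CollectionAdminRequest.listCollections(client).contains("cc5363_dm_documentversion")) {
    CollectionAdminRequest.createCollection("cc5363_dm_documentversion", "cc_config", 2, 2)
            .setMaxShardsPerNode(2)
            .process(client);
}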

The first thing that is strange is that for some automatically created shards, the 
replica is placed on the same node as the leader. In this case shard1 has two 
replicas, core_node3 (replica_n1, the leader) and core_node5 (replica_n2, the 
replica), which are both on solr-1. Shard2 has core_node7 (replica_n4, the leader) 
on solr-0 and core_node8 (replica_n6, the replica) on solr-1. That's what the 
web interface tells me:

Replica: core_node3
core:cc5363_dm_documentversion_shard1_replica_n1
base URL:http://solr-1.solr.cc-demo:8983/solr
node name:solr-1.solr.cc-demo:8983_solr
state:active
leader: yes
Replica: core_node5
core:cc5363_dm_documentversion_shard1_replica_n2
base URL:http://solr-1.solr.cc-demo:8983/solr
node name:solr-1.solr.cc-demo:8983_solr
state:active
leader: no

Replica: core_node7
core:cc5363_dm_documentversion_shard2_replica_n4
base URL:http://solr-0.solr.cc-demo:8983/solr
node name:solr-0.solr.cc-demo:8983_solr
state:active
leader: yes
Replica: core_node8
core:cc5363_dm_documentversion_shard2_replica_n6
base URL:http://solr-1.solr.cc-demo:8983/solr
node name:solr-1.solr.cc-demo:8983_solr
state:active
leader: no

I thought the replica 

failed collection's metadata remains in ZK

2020-04-23 Thread YangLiu
Hello everyone,
I am using Solr 7.7.2. I created a collection with shards=10, which is more than my 
number of nodes, and the service returned the following error:


"Cannot create collection solrdemo. Value of maxShardsPerNode is 1, and the 
number of nodes currently live or live and part of your createNodeSet is 1. 
This allows a maximum of 1 to be created. Value of numShards is 10, value of 
nrtReplicas is 1, value of tlogReplicas is 0 and value of pullReplicas is 0. 
This requires 10 shards to be created (higher than the allowed number)"


The error is in line with expectations, but I see the collection, without any 
shards, in the Web UI. Although the creation of the collection failed, some of its 
metadata remains in ZK.
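
When this happens, the leftover entry can usually be removed with a normal
collection delete. A hedged SolrJ sketch (the collection name is taken from the
error above; the client setup is an assumption):

// Sketch: remove the partially created collection so its metadata is cleaned
// out of ZooKeeper.
// org.apache.solr.client.solrj.request.CollectionAdminRequest
CollectionAdminRequest.deleteCollection("solrdemo").process(client);

The same can be done over HTTP with the Collections API DELETE action.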





Automatic reset of zxid on client for CloudSolrClient

2020-04-23 Thread Beikov Christian
Hey there!

I am curious if there is an option to automatically reset the zxid on a 
CloudSolrClient if a mismatch is detected. After a reset of my cluster I 
currently have to also restart the application because of the wrong zxid, 
although it doesn't participate in the cluster.
I have a feeling that this could also be a problem when zookeeper 
crashes/restarts if some TXs haven't been flushed properly.

I'm using Zookeeper 3.5.7 and SolrJ 8.4.1. Thanks in advance!

Kind regards

---
Christian Beikov
Software-Architect, R

curecomp Software Services GmbH
Neue Werft
Industriezeile 35
4020 Linz

web: www.curecomp.com
E-Mail: c.bei...@curecomp.com
mobile: +43 660 5566055






Re: Defaults Merge Policy

2020-04-23 Thread Erick Erickson
Glad those articles helped, I remember them well ;)

Do note that 30 (well, actually 33%) is usually the ceiling.
But as I mentioned, it’s soft, not absolute. So your index
might have a higher percentage temporarily.

Best,
Erick
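
For anyone who wants to pin that ceiling down explicitly: it maps to
TieredMergePolicy's deletes-percentage setting at the Lucene level. A hedged
sketch; the 33 figure is the commonly cited default for Lucene 8-era Solr, so
check your version:

// Lucene-level sketch of the knob behind the ~33% ceiling mentioned above.
// org.apache.lucene.index.TieredMergePolicy
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setDeletesPctAllowed(33.0);  // allow at most ~33% deleted docs across the index

In Solr this would be set through the mergePolicyFactory section of
solrconfig.xml rather than in code.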

> On Apr 23, 2020, at 4:01 AM, Kayak28  wrote:
> 
> Hello, Erick Erickson:
> 
> Thank you for answering my questions.
> 
> Deleted docs in Solr 8.3.0 has not reached to 30% of the entire index,
> so I will monitor it for now.
> Again thank you for your response.
> 
> Actually, the articles below helped me a lot.
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> 
> 
> Sincerely,
> Kaya Ota
> 
> On Thu, Apr 16, 2020 at 2:41, Erick Erickson :
> 
>> The number of deleted documents will bounce around.
>> The default TieredMergePolicy has a rather complex
>> algorithm that decides which segments to
>> merge, and the percentage of deleted docs in any
>> given segment is a factor, but not the sole determinant.
>> 
>> Merging is not really based on the raw number of segments,
>> rather on the number of segments of similar size.
>> 
>> But the short answer is “no, you don’t have to configure
>> anything explicitly”. The percentage of deleted docs
>> should max out at around 30% or so, although that’s a
>> soft number, it’s usually lower.
>> 
>> Unless you have some provable performance problem,
>> I wouldn’t worry about it. And don’t infer anything
>> until you’ve indexed a _lot_ of docs.
>> 
>> Oh, and I kind of dislike numDocs as the trigger and
>> tend to use time on the theory that it’s easier to explain,
>> whereas when commits happen when using maxDocs
>> varies depending on the throughput rate.
>> 
>> Best,
>> Erick
>> 
>>> On Apr 15, 2020, at 1:28 PM, Kayak28  wrote:
>>> 
>>> Hello, Solr Community:
>>> 
>>> I would like to ask about Default's Merge Policy for Solr 8.3.0.
>>> My client (SolrJ) makes a commit every 10'000 doc.
>>> I have not explicitly configured Merge Policy via solrconfig.xml
>>> For each indexing time, some documents are updated or deleted.
>>> I think the Default Merge Policy will merge segments automatically
>>> if there are too many segments.
>>> But, the number of deleted documents is increasing.
>>> 
>>> Is there a Default Merge Policy Configuration?
>>> Or, do I have to configure it?
>>> 
>>> Sincerely,
>>> Kaya Ota
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Sincerely,
>>> Kaya
>>> github: https://github.com/28kayak
>> 
>> 
> 
> -- 
> 
> Sincerely,
> Kaya
> github: https://github.com/28kayak



Re: Cause of java.io.IOException: No space left on device Error

2020-04-23 Thread Erick Erickson
In addition to what Dario mentioned, background merges
happen all the time, optimize is just a special case (and
very expensive). 

You say “one of my Solr cores has 47G”, but segment merging
can easily occur on multiple cores at once, so that’s not
definitive.

We usually recommend that people have at least as much free
space on disk as the aggregate of _all_ the cores/replicas on
a physical machine.

More than you may want to know about segment merging:
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

> On Apr 23, 2020, at 4:38 AM, Dario Rigolin  wrote:
> 
> When solr starts an optimization of the index you have to have free at
> least same size (I don't know if 3 times is correct) of the core you are
> optimizing.
> Maybe your free space isn't enough to handle the optimization process.
> Sometimes you have to restart the Solr process to have released more space
> on the filesystem, couple of time solr didn't release all space.
> 
> 
> 
> Il giorno gio 23 apr 2020 alle ore 10:23 Kayak28 
> ha scritto:
> 
>> Hello, Community:
>> 
>> I am currently using Solr 5.3.1. on CentOS.
>> The other day, I faced an error message that shows
>> " java.io.IOException: No space left on device"
>> 
>> My disk for Solr has empty space about 35GB
>> and the total amount of the disk is 581GB.
>> 
>> I doubted there was no enough space for Linux inode,
>> but inode still has spaces. (IUse was 1% )
>> 
>> One of my Solr cores has 47GB of indexes.
>> 
>> Is there a possibility that the error happens when I do forceMerge on the
>> big core
>> (which I believe optimize needs temporarily 3 times spaces of index-size)?
>> Or is there any other possibility to cause the error?
>> 
>> 
>> Any Clues are very helpful.
>> 
>> --
>> 
>> Sincerely,
>> Kaya
>> github: https://github.com/28kayak
>> 
> 
> 
> -- 
> 
> Dario Rigolin
> Comperio srl - CTO
> Mobile: +39 347 7232652 - Office: +39 0425 471482
> Skype: dario.rigolin



Re: SegmentsInfoRequestHandler does not release IndexWriter

2020-04-23 Thread Andrzej Białecki
Hi Tiziano,

Indeed, this looks like a bug - good catch! Please file a Jira issue, I’ll get 
to it soon.

> On 23 Apr 2020, at 00:19, Tiziano Degaetano  
> wrote:
> 
> Hello,
> 
> I’m digging into an issue where I get timeouts when doing a managed schema change 
> using the Schema API.
> The call hangs while reloading the cores (and does not recover until the node is 
> restarted):
> 
> sun.misc.Unsafe.park​(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos​(Unknown Source)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos​(Unknown 
> Source)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos​(Unknown
>  Source)
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock​(Unknown 
> Source)
> org.apache.solr.update.DefaultSolrCoreState.lock​(DefaultSolrCoreState.java:179)
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter​(DefaultSolrCoreState.java:230)
> org.apache.solr.core.SolrCore.reload​(SolrCore.java:696)
> org.apache.solr.core.CoreContainer.reload​(CoreContainer.java:1558)
> org.apache.solr.schema.SchemaManager.doOperations​(SchemaManager.java:133)
> org.apache.solr.schema.SchemaManager.performOperations​(SchemaManager.java:92)
> org.apache.solr.handler.SchemaHandler.handleRequestBody​(SchemaHandler.java:90)
> org.apache.solr.handler.RequestHandlerBase.handleRequest​(RequestHandlerBase.java:211)
> org.apache.solr.core.SolrCore.execute​(SolrCore.java:2596)
> org.apache.solr.servlet.HttpSolrCall.execute​(HttpSolrCall.java:802)
> org.apache.solr.servlet.HttpSolrCall.call​(HttpSolrCall.java:579)
> 
> After a while I realized it was simply deadlocked: the hang appeared after I had 
> used the Admin UI to view the segments info of the core.
> 
> So my question: is this line correct? If withCoreInfo is false iwRef.decref() 
> will not be called to release the reader lock, preventing any further writer 
> locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> 
> Regards,
> Tiziano
> 
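
For context, the usual pattern around Solr's ref-counted IndexWriter is that every
caller that acquires the reference must decrement it, typically in a finally block;
the line referenced above skips that when withCoreInfo is false. A minimal sketch of
the expected pattern, assuming a SolrCore named core is in scope:

// Sketch of the acquire/release pattern for the ref-counted IndexWriter.
// org.apache.solr.util.RefCounted, org.apache.lucene.index.IndexWriter
RefCounted<IndexWriter> iwRef = core.getSolrCoreState().getIndexWriter(core);
try {
    IndexWriter iw = iwRef.get();
    // ... read whatever segment/writer information is needed ...
} finally {
    iwRef.decref();  // always release, or later writer locks (e.g. core reload) will hang
}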



Re: using S3 as the Directory for Solr

2020-04-23 Thread Jan Høydahl
Hi,

Is your data so partitioned that it makes sense to consider splitting up
in multiple collections and make some arrangement that will keep only
a few collections live at a time, loading index files from S3 on demand?

I cannot see how an S3 directory would be able to effectively cache files
in S3 and what units the index files would be stored as?

Have you investigated EFS as an alternative? That would look like a
normal filesystem to Solr but might be cheaper storage wise, but much
slower.

Jan

> 23. apr. 2020 kl. 06:57 skrev dhurandar S :
> 
> Hi,
> 
> I am looking to use S3 as the place to store indexes. Just how Solr uses
> HdfsDirectory to store the index and all the other documents.
> 
> We want to provide a search capability that is okay to be a little slow but
> cheaper in terms of the cost. We have close to 2 petabytes of data on which
> we want to provide the Search using Solr.
> 
> Are there any open-source implementations around using S3 as the Directory
> for Solr ??
> 
> Any recommendations on this approach?
> 
> regards,
> Rahul



Re: Cause of java.io.IOException: No space left on device Error

2020-04-23 Thread Dario Rigolin
When Solr starts an optimization of the index, you need free space of at least
the same size as the core you are optimizing (I don't know if 3 times is
correct).
Maybe your free space isn't enough to handle the optimization process.
Sometimes you have to restart the Solr process to release more space on the
filesystem; a couple of times Solr didn't release all the space.



Il giorno gio 23 apr 2020 alle ore 10:23 Kayak28 
ha scritto:

> Hello, Community:
>
> I am currently using Solr 5.3.1. on CentOS.
> The other day, I faced an error message that shows
> " java.io.IOException: No space left on device"
>
> My disk for Solr has empty space about 35GB
> and the total amount of the disk is 581GB.
>
> I doubted there was no enough space for Linux inode,
> but inode still has spaces. (IUse was 1% )
>
> One of my Solr cores has 47GB of indexes.
>
> Is there a possibility that the error happens when I do forceMerge on the
> big core
> (which I believe optimize needs temporarily 3 times spaces of index-size)?
> Or is there any other possibility to cause the error?
>
>
> Any Clues are very helpful.
>
> --
>
> Sincerely,
> Kaya
> github: https://github.com/28kayak
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: FuzzyQuery causing Out of Memory Errors in 8.5.x

2020-04-23 Thread Colvin Cowie
https://issues.apache.org/jira/browse/SOLR-14428



On Thu, 23 Apr 2020 at 08:45, Colvin Cowie 
wrote:

> I created a little test that fires off fuzzy queries from random UUID
> strings for 5 minutes
> *FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"*
>
> The change in heap usage is really severe.
>
> On 8.5.1 Solr went OOM almost immediately on a 512mb heap, and with a 4GB
> heap it only just stayed alive.
> On 8.3.1 it was completely happy.
>
> I'm guessing that the memory might be being leaked if the FuzzyQuery
> objects are referenced from the cache, while the FuzzyTermsEnum would not
> have been.
>
> I'm going to raise an issue
>
>
> On Wed, 22 Apr 2020 at 19:44, Colvin Cowie 
> wrote:
>
>> Hello,
>>
>> I'm moving our product from 8.3.1 to 8.5.1 in dev and we've got tests
>> failing because Solr is getting OOMEs with a 512mb heap where it was
>> previously fine.
>>
>> I ran our tests on both versions with jconsole to track the heap usage.
>> Here's a little comparison. 8.5.1 dies part way through
>> https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO
>>
>> We have our own query parser as an extension to Solr, and we do various
>> things with user queries, including generating FuzzyQuery-s. Our
>> implementation of org.apache.solr.search.QParser.parse() isn't stateful and
>> parses the qstr and returns new Query objects each time it's called.
>> With JProfiler on I can see that the majority of the heap is being
>> allocated through FuzzyQuery's constructor.
>> https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of
>> the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
>>
>> When profiling on 8.3.1 we still have a fairly large number of
>> FuzzyTermEnums created at times, but that's accounting for about ~40mb of
>> the heap for a few seconds rather than the 100mb to 300mb of continual
>> allocation for FuzzyQuery I'm seeing in 8.5.
>>
>> It's definitely possible that we're doing something wrong in our
>> extension (which I can't share the source of) but it seems like the memory
>> cost of FuzzyQuery now is totally disproportionate to what it was before.
>> We've not had issues like this with our extension before (which doesn't
>> mean that our parser is flawless, but it's not been causing noticeable
>> problems for the last 4 years).
>>
>>
>> So I suppose the question is, are we misusing FuzzyQuery in some way
>> (hard for you to say without seeing the source), or are the recent changes
>> using more memory than they should?
>>
>> I will investigate further into what we're doing. But I could maybe use
>> some help to create a stress test for Lucene itself that compares the
>> memory consumption of the old FuzzyQuery vs the new, to see whether it's
>> fundamentally bad for memory or if it's just how we're using it.
>>
>> Regards,
>> Colvin
>>
>>
>>
>>


Re: Solr indexing with Tika DIH - ZeroByteFileException

2020-04-23 Thread Charlie Hull
If users can upload any PDF, including broken or huge ones, and some 
cause a Tika error, you should decouple Tika from Solr and run it as a 
separate process to extract text before indexing with Solr. Otherwise 
some of what is uploaded *will* break Solr.

https://lucidworks.com/post/indexing-with-solrj/ has some good hints.
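
A minimal sketch of that decoupled approach: run Tika in your own indexing process
(or a separate service), skip empty files before parsing, and send the extracted
text to Solr with SolrJ. The file path, field names and collection URL below are
assumptions for illustration only:

// Sketch: extract text with Tika outside Solr, then index with SolrJ.
// org.apache.tika.parser.AutoDetectParser, org.apache.tika.parser.ParseContext,
// org.apache.tika.sax.BodyContentHandler, org.apache.tika.metadata.Metadata,
// org.apache.solr.client.solrj.impl.HttpSolrClient
File file = new File("/share/docs/example.pdf");
if (file.length() == 0) {
    return;  // avoids Tika's ZeroByteFileException for empty uploads
}
BodyContentHandler handler = new BodyContentHandler(-1);  // -1 = no write limit
Metadata metadata = new Metadata();
try (InputStream in = new FileInputStream(file)) {
    new AutoDetectParser().parse(in, handler, metadata, new ParseContext());
}
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", file.getPath());
doc.addField("content", handler.toString());
try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
    solr.add(doc);
    solr.commit();
}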

Cheers

Charlie

On 11/06/2019 15:27, neilb wrote:

Hi, while going through the Solr logs, I found a data import error for certain
documents. Here are the details of the error.

Exception while processing: file document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 7866
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.ZeroByteFileException: InputStream must
have > 0 bytes
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)


How do I know which document (document name with path) is #7866? And how do I
ignore ZeroByteFileException, given that the document network share is not in my
control and users can upload PDFs of any size to it?

Thanks!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Cause of java.io.IOException: No space left on device Error

2020-04-23 Thread Kayak28
Hello, Community:

I am currently using Solr 5.3.1. on CentOS.
The other day, I faced an error message that shows
" java.io.IOException: No space left on device"

My disk for Solr has empty space about 35GB
and the total amount of the disk is 581GB.

I doubted there was no enough space for Linux inode,
but inode still has spaces. (IUse was 1% )

One of my Solr cores has 47GB of indexes.

Is there a possibility that the error happens when I do forceMerge on the
big core
(which I believe optimize needs temporarily 3 times spaces of index-size)?
Or is there any other possibility to cause the error?


Any Clues are very helpful.

-- 

Sincerely,
Kaya
github: https://github.com/28kayak


Re: Defaults Merge Policy

2020-04-23 Thread Kayak28
Hello, Erick Erickson:

Thank you for answering my questions.

Deleted docs in Solr 8.3.0 have not reached 30% of the entire index,
so I will monitor it for now.
Again thank you for your response.

Actually, the articles below helped me a lot.
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/


 Sincerely,
Kaya Ota

On Thu, Apr 16, 2020 at 2:41, Erick Erickson :

> The number of deleted documents will bounce around.
> The default TieredMergePolicy has a rather complex
> algorithm that decides which segments to
> merge, and the percentage of deleted docs in any
> given segment is a factor, but not the sole determinant.
>
> Merging is not really based on the raw number of segments,
> rather on the number of segments of similar size.
>
> But the short answer is “no, you don’t have to configure
> anything explicitly”. The percentage of deleted docs
> should max out at around 30% or so, although that’s a
> soft number, it’s usually lower.
>
> Unless you have some provable performance problem,
> I wouldn’t worry about it. And don’t infer anything
> until you’ve indexed a _lot_ of docs.
>
> Oh, and I kind of dislike numDocs as the trigger and
> tend to use time on the theory that it’s easier to explain,
> whereas when commits happen when using maxDocs
> varies depending on the throughput rate.
>
> Best,
> Erick
>
> > On Apr 15, 2020, at 1:28 PM, Kayak28  wrote:
> >
> > Hello, Solr Community:
> >
> > I would like to ask about Default's Merge Policy for Solr 8.3.0.
> > My client (SolrJ) makes a commit every 10'000 doc.
> > I have not explicitly configured Merge Policy via solrconfig.xml
> > For each indexing time, some documents are updated or deleted.
> > I think the Default Merge Policy will merge segments automatically
> > if there are too many segments.
> > But, the number of deleted documents is increasing.
> >
> > Is there a Default Merge Policy Configuration?
> > Or, do I have to configure it?
> >
> > Sincerely,
> > Kaya Ota
> >
> >
> >
> > --
> >
> > Sincerely,
> > Kaya
> > github: https://github.com/28kayak
>
>

-- 

Sincerely,
Kaya
github: https://github.com/28kayak


gzip compression solr 8.4.1

2020-04-23 Thread Johannes Siegert
Hi,

we want to use gzip-compression between our application and the solr server.

We use a standalone solr server version 8.4.1 and the prepackaged jetty as
application server.

We have enabled the jetty gzip module by adding these two files:

{path_to_solr}/server/modules/gzip.mod (see below the question)
{path_to_solr}/server/etc/jetty-gzip.xml (see below the question)

Within the application we use a HttpSolrServer that is configured with
allowCompression=true.
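
For reference, this is roughly how that client is wired up with current SolrJ
(HttpSolrClient is the newer name for HttpSolrServer; the URL is a placeholder).
allowCompression(true) is what asks the server for gzip-compressed responses:

// Sketch: SolrJ client with response compression enabled.
// org.apache.solr.client.solrj.impl.HttpSolrClient
HttpSolrClient solr = new HttpSolrClient.Builder("http://solr-host:8983/solr/mycollection")
        .allowCompression(true)   // request gzip-compressed responses from the server
        .build();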

After we released our application, we saw the number of connections in the
TCP state CLOSE_WAIT rise until the application was no longer able to open new
connections.


After a long debugging session we think the problem is that the "Content-Length"
header returned by Jetty is sometimes wrong when gzip compression is enabled.

The SolrJ client uses a ContentLengthInputStream, which uses the "Content-Length"
header to detect whether all data was received. But the InputStream cannot be
fully consumed because the value of the "Content-Length" header is higher than
the actual content length.

Usually the method PoolingHttpClientConnectionManager.releaseConnection is
called after the InputStream has been fully consumed. This frees the connection
to be reused or closed by the application.

Due to the incorrect "Content-Length" header, the
PoolingHttpClientConnectionManager.releaseConnection method is never called and
the connection stays active. After Jetty's connection timeout is reached, it
closes the connection from the server side and the TCP state switches to
CLOSE_WAIT. The client never closes the connection, so the number of connections
in use keeps rising.


Currently we are trying to configure the Jetty gzip module to omit the
"Content-Length" header when gzip compression is used. We hope that in this case
another InputStream implementation is used, one that detects the end of the
stream to determine when the InputStream has been fully consumed.

Do you have any experiences with this problem or any suggestions for us?

Thanks,

Johannes


gzip.mod

-

DO NOT EDIT - See:
https://www.eclipse.org/jetty/documentation/current/startup-modules.html

[description]
Enable GzipHandler for dynamic gzip compression
for the entire server.

[tags]
handler

[depend]
server

[xml]
etc/jetty-gzip.xml

[ini-template]
## Minimum content length after which gzip is enabled
jetty.gzip.minGzipSize=2048

## Check whether a file with *.gz extension exists
jetty.gzip.checkGzExists=false

## Gzip compression level (-1 for default)
jetty.gzip.compressionLevel=-1

## User agents for which gzip is disabled
jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*

-

jetty-gzip.xml

-


(The XML body of jetty-gzip.xml was stripped by the mailing-list archive; only a
fragment of its DOCTYPE reference to http://www.eclipse.org/jetty/configure_9_3.dtd
remains.)
-


Re: FuzzyQuery causing Out of Memory Errors in 8.5.x

2020-04-23 Thread Colvin Cowie
I created a little test that fires off fuzzy queries from random UUID
strings for 5 minutes
*FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"*

The change in heap usage is really severe.

On 8.5.1 Solr went OOM almost immediately on a 512mb heap, and with a 4GB
heap it only just stayed alive.
On 8.3.1 it was completely happy.

I'm guessing that the memory might be being leaked if the FuzzyQuery
objects are referenced from the cache, while the FuzzyTermsEnum would not
have been.

I'm going to raise an issue


On Wed, 22 Apr 2020 at 19:44, Colvin Cowie 
wrote:

> Hello,
>
> I'm moving our product from 8.3.1 to 8.5.1 in dev and we've got tests
> failing because Solr is getting OOMEs with a 512mb heap where it was
> previously fine.
>
> I ran our tests on both versions with jconsole to track the heap usage.
> Here's a little comparison. 8.5.1 dies part way through
> https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO
>
> We have our own query parser as an extension to Solr, and we do various
> things with user queries, including generating FuzzyQuery-s. Our
> implementation of org.apache.solr.search.QParser.parse() isn't stateful and
> parses the qstr and returns new Query objects each time it's called.
> With JProfiler on I can see that the majority of the heap is being
> allocated through FuzzyQuery's constructor.
> https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of
> the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
>
> When profiling on 8.3.1 we still have a fairly large number of
> FuzzyTermEnums created at times, but that's accounting for about ~40mb of
> the heap for a few seconds rather than the 100mb to 300mb of continual
> allocation for FuzzyQuery I'm seeing in 8.5.
>
> It's definitely possible that we're doing something wrong in our extension
> (which I can't share the source of) but it seems like the memory cost of
> FuzzyQuery now is totally disproportionate to what it was before. We've not
> had issues like this with our extension before (which doesn't mean that our
> parser is flawless, but it's not been causing noticeable problems for the
> last 4 years).
>
>
> So I suppose the question is, are we misusing FuzzyQuery in some way (hard
> for you to say without seeing the source), or are the recent changes using
> more memory than they should?
>
> I will investigate further into what we're doing. But I could maybe use
> some help to create a stress test for Lucene itself that compares the
> memory consumption of the old FuzzyQuery vs the new, to see whether it's
> fundamentally bad for memory or if it's just how we're using it.
>
> Regards,
> Colvin
>
>
>
>


RE: How upgrade to Solr 8 impact performance

2020-04-23 Thread Srinivas Kashyap
Can you share more details about what performance degraded?

Thanks,
srinivas
From: Natarajan, Rajeswari 
Sent: 23 April 2020 12:41
To: solr-user@lucene.apache.org
Subject: Re: How upgrade to Solr 8 impact performance

With the same hardware and configuration we also saw performance degradation 
from 7.6 to 8.4.1 as this is why we are checking here to see if anyone else saw 
this behavior.

-Rajeswari

On 4/22/20, 7:16 AM, "Paras Lehana" (paras.leh...@indiamart.com) wrote:

Hi Rajeswari,

I can only share my experience of moving from Solr 6 to Solr 8. I suggest
you to move and then reevaluate your performance metrics. To recall another
experience, we moved from Java 8 to 11 for Solr 8.

Please note experiences can differ! :)

On Wed, 22 Apr 2020 at 00:50, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Any other experience from solr 7 to sol8 upgrade performance .Please
> share.
>
> Thanks,
> Rajeswari
>
> On 4/15/20, 4:00 PM, "Paras Lehana" (paras.leh...@indiamart.com) wrote:
>
> In January, we upgraded Solr from version 6 to 8 skipping all versions
> in
> between.
>
> The hardware and Solr configurations were kept the same but we still
> faced
> degradation in response time by 30-50%. We had exceptional Query times
> around 25 ms with Solr 6 and now we are hovering around 36 ms.
>
> Since response times under 50 ms are very good even for Auto-Suggest,
> we
> have not tried any changes regarding this. Nevertheless, you can try
> using
> Caffeine Cache. Looking forward to read community inputs as well.
>
>
>
> > On Thu, 16 Apr 2020 at 01:34, ChienHuaWang (chien-hua.w...@sap.com) wrote:
>
> > Do anyone have experience to upgrade the application with Solr 7.X
> to 8.X?
> > How's the query performance?
> > Found out a little slower response time from application with Solr8
> based
> > on
> > current measurement, still looking into more detail it.
> > But wondering is any one have similar experience? is that something
> we
> > should expect for Solr 8.X?
> >
> > Please kindly share, thanks.
> >
> > Regards,
> > ChienHua
> >
> >
> >
> > --
> > Sent from:
> https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *1196*
>
> --
> *
> *
>
> >
>
>

--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*

--

>



Re: How upgrade to Solr 8 impact performance

2020-04-23 Thread Natarajan, Rajeswari
With the same hardware and configuration we also saw performance degradation 
from 7.6 to 8.4.1, which is why we are asking here whether anyone else has seen 
this behavior.

-Rajeswari

On 4/22/20, 7:16 AM, "Paras Lehana"  wrote:

Hi Rajeswari,

I can only share my experience of moving from Solr 6 to Solr 8. I suggest
you to move and then reevaluate your performance metrics. To recall another
experience, we moved from Java 8 to 11 for Solr 8.

Please note experiences can differ! :)

On Wed, 22 Apr 2020 at 00:50, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Any other experience from solr 7 to sol8 upgrade performance  .Please
> share.
>
> Thanks,
> Rajeswari
>
> On 4/15/20, 4:00 PM, "Paras Lehana"  wrote:
>
> In January, we upgraded Solr from version 6 to 8 skipping all versions
> in
> between.
>
> The hardware and Solr configurations were kept the same but we still
> faced
> degradation in response time by 30-50%. We had exceptional Query times
> around 25 ms with Solr 6 and now we are hovering around 36 ms.
>
> Since response times under 50 ms are very good even for Auto-Suggest,
> we
> have not tried any changes regarding this. Nevertheless, you can try
> using
> Caffeine Cache. Looking forward to read community inputs as well.
>
>
>
> On Thu, 16 Apr 2020 at 01:34, ChienHuaWang 
> wrote:
>
> > Do anyone have experience to upgrade the application with Solr 7.X
> to 8.X?
> > How's the query performance?
> > Found out a little slower response time from application with Solr8
> based
> > on
> > current measurement, still looking into more detail it.
> > But wondering is any one have similar experience? is that something
> we
> > should expect for Solr 8.X?
> >
> > Please kindly share, thanks.
> >
> > Regards,
> > ChienHua
> >
> >
> >
> > --
> > Sent from:
> https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *1196*
>
> --
> *
> *
>
>  
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*

-- 

 



Re: Potential bug with optimistic concurrency

2020-04-23 Thread Sachin Divekar
Missed an important detail. It works correctly for single shard
collections.

--
Sachin
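
For reference, the same conditional update expressed with SolrJ (a sketch; the
collection and field names follow the curl example quoted below, and cloudClient
is assumed to be a CloudSolrClient pointed at the cluster):

// Sketch: optimistic-concurrency update via SolrJ, mirroring the curl request below.
// _version_ = -1 means "only add if the document does not already exist".
// org.apache.solr.common.SolrInputDocument
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("attr", "val");
doc.addField("_version_", -1L);
cloudClient.add("test", doc);   // throws on a version conflict (HTTP 409)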

On Wed, Apr 22, 2020 at 10:03 PM Sachin Divekar  wrote:

> Hi all,
>
> I am facing the exact same issue reported
> https://issues.apache.org/jira/browse/SOLR-8733 and
> https://issues.apache.org/jira/browse/SOLR-7404
>
> I have tried it with Solr v8.4.1 and v8.5.1. In both cases, the cluster
> consisted of three nodes and a collection with 3 shards and 2 replicas.
>
> Following simple test case fails.
>
> Collection "test" contains only two documents with ids "1" and "2"
>
> Update operation:
>
> curl -X POST -H 'Content-Type: application/json' '
> http://localhost:8983/solr/test/update?versions=true=false'
> --data-binary '
> [ { "id" : "2", "attr": "val", },
>   { "id" : "1", "attr": "val", "_version_": -1 } ]'
>
> Consistent response:
>
> {
>   "adds":[
> "2",0,
> "1",0],
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException",
>
> "error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
>
> "root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
> "msg":"Async exception during distributed update: Error from server at
> http://10.0.5.237:8983/solr/test_shard1_replica_n1/: null\n\n\n\nrequest:
> http://10.0.5.237:8983/solr/test_shard1_replica_n1/\nRemote error
> message: version conflict for 1 expected=-1 actual=1664690075695316992",
> "code":409}}
>
> I tried different updates using combinations of _version_ and document
> values to generate conflicts. Every time the result is the same. There is
> no problem with system resources. These servers are running only these Solr
> nodes and Solr has been given a few GB of heap.
>
> Are those issues SOLR-7404 and SOLR-8733 still unfixed? Unlike these
> issues, I am not using the schema and config from example templates. These
> nodes are set up by following Solr's production deployment document.
>
> What are your thoughts/suggestions?
>
> thanks
> Sachin
>
>