[Solr8.5.2] Sudden increase in cpu usage.

2021-01-18 Thread raj.yadav
Hi Everyone,

We are using Solr 8.5.2 (SolrCloud mode) with an external ZooKeeper ensemble
(hosted on separate nodes).
All of a sudden we are seeing a spike in CPU, but at the same time neither is
any heavy indexing being performed nor is there any sudden increase in the
request rate.

Collection info:
The collection has 6 shards, each shard has 5 replicas (NRT type), and each
replica is hosted on a separate VM. In total we have 30 VMs running.

Each shard has 14 million docs, avg size/doc: 909.5 bytes, and the size of each
shard is around 12 GB.

This spike first started on one VM, and almost immediately (within 1 minute)
similar CPU spikes occurred on 2-3 more VMs. At the same time the remaining
VMs were running fine. We let these high-CPU VMs run for some time (more than
8 hours), but the CPU still did not come down.



VM detail:
OS: centos 7.7.1908
Java: openjdk version "1.8.0_262" 
CPU/RAM: 8 vcpus, 64 GiB memory
OS disk size: 256 GB (SSD)

JVM memory allocated to each machine => 26GB

GC Parameter
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=150 \
-XX:InitiatingHeapOccupancyPercent=70 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"


Caching layer parameter








Here are details of different metrics during the CPU spike.

1. At the 21:06 timestamp you can see that there is a sudden spike in CPU while
the request rate stays constant.
https://drive.google.com/file/d/1cJhFFIkfEdBJouw0A6PRz-HAHhIpudba/view?usp=sharing

2. Processes running at the 21:06 timestamp
https://drive.google.com/file/d/1Qsfv-ivy664ShFihcb--EgapMcpgOij8/view?usp=sharing
https://drive.google.com/file/d/1Nak4bI7PqroNmImpsunUcMNm6pZ41TuG/view?usp=sharing
https://drive.google.com/file/d/1q3iuSZtK4rlzrM7vIIXdSTrNPOYd_XQ6/view?usp=sharing

3. 26 GB is allocated to the JVM, and used JVM memory hardly crosses 20 GB.
https://drive.google.com/file/d/1zSZFcqscXmWZbj-aMWhql28kbxbyW_Qa/view?usp=sharing

4. GC metrics are also normal.
https://drive.google.com/file/d/1zBTjL6tbzM_xQeMcqbVyCIt4qAaBDprP/view?usp=sharing


We decided to replace these VMs. On running the DELETEREPLICA command it
throws a timeout error. We observed that the replica was deregistered from
state.json (in ZooKeeper) but its replica folder was still present on the
physical VM.
In fact, after the DELETEREPLICA command, even though no replica was hosted on
the VM and its request rate was 0 req/sec, its CPU was still high (see the
image below for reference). CPU came down to zero only after stopping the Solr
process.
https://drive.google.com/file/d/1HXe5jjs5kJCUWXBfl2FZ0Z3BtiCOaW3V/view?usp=sharing
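
For reference, DELETEREPLICA can also be submitted asynchronously so the HTTP
call itself does not hit the timeout (the shard/replica names and the async id
below are placeholders, not our actual values):

# Submit DELETEREPLICA as an async request; the HTTP call returns immediately
curl "http://solr_host:8389/solr/admin/collections?action=DELETEREPLICA&collection=my_collection&shard=shard2_0&replica=core_node13&async=del-replica-1"

# Poll the async request until it reports completed or failed
curl "http://solr_host:8389/solr/admin/collections?action=REQUESTSTATUS&requestid=del-replica-1"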

I'm not able to figure out what is wrong with the configuration. I have read a
few blogs and most of them point to GC, but on going through the GC metrics I
don't see anything unusual. Also, why is it happening on only a few VMs? In the
last week this issue has occurred three times.
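
A minimal sketch of mapping the busy OS threads to Java stack traces the next
time a spike happens (the process lookup below is a placeholder; adjust it to
however the Solr process is started):

# Show the busiest native threads of the Solr JVM (note the TID column)
top -H -p $(pgrep -f start.jar | head -1)

# Dump the Java stacks; match "nid=0x..." against the TIDs from top (in hex)
jstack $(pgrep -f start.jar | head -1) > /tmp/solr_threads.txt
printf '%x\n' 12345   # convert a decimal TID from top to hex, e.g. 12345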




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [SOLR-8.5.2] Commit through curl command is causing delay in issuing commit

2020-12-15 Thread raj.yadav
Hi All,

For further investigation, I have raised a JIRA ticket.
https://issues.apache.org/jira/browse/SOLR-15045

In case anyone has any information to share, feel free to mention it here.

Regards,
Raj




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [SOLR-8.5.2] Commit through curl command is causing delay in issuing commit

2020-12-14 Thread raj.yadav
Hi All,


As I mentioned in my previous post, reloading/refreshing of the external file
consumes most of the time during a commit operation.
To rule out the impact of the external files, I deleted the external files from
all the shards and issued a commit through the curl command. The commit
operation completed in 3 seconds. Individual shards took 1.5 seconds to
complete the commit, but there was a delay of around 1.5 seconds on the shard
whose hostname was used to issue the commit, hence the overall commit time of
3 seconds.

During this operation there was no timeout or any other kind of error (except
an `external file not found` error, which is expected). I'm not able to figure
out what might be causing the delay on the shard behind the hostname used in
the curl command. Is there any setting that impacts the curl operation that we
might have accidentally changed?

I have been trying to solve this issue for the last 15 days; can someone
please help in resolving it?
Let me know in case any information/logs are missing.

Regards,
Raj 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [SOLR-8.5.2] Commit through curl command is causing delay in issuing commit

2020-12-12 Thread raj.yadav
Hi All,

While we investigate this issue further, can anyone please share what other
ways there are to issue a commit, or point me to existing documentation that
has a relevant example?
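
For later readers, two other ways to send a commit to the update handler,
besides passing commit=true as a URL parameter (host and collection names are
placeholders):

# Commit by POSTing an XML command to the update handler
curl -X POST -H 'Content-Type: text/xml' --data-binary '<commit waitSearcher="true"/>' "http://solr_host:8389/solr/my_collection/update"

# Commit by POSTing a JSON command to the update handler
curl -X POST -H 'Content-Type: application/json' -d '{"commit": {}}' "http://solr_host:8389/solr/my_collection/update"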


Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Timeout occured while waiting response from server

2020-12-11 Thread raj.yadav
elivis wrote
> See:
> https://lucene.472066.n3.nabble.com/SolrServerException-Timeout-occured-while-waiting-response-from-server-tc4464632.html
> 
> Maybe this will help somebody. I was dealing with exact same problem. We
> are
> running on VMs, and all of our timeout problems went away after we
> switched
> from a 5yo VmWare version to the latest Hyper-V VMs. We also made sure
> that
> all VMs have a dedicated spindle. It appears the underlying physical disk
> drive (which all VMs use) was getting overloaded with reads/writes. 

Hey Elivis,
Thanks for the quick response.
We are not getting an `IOException`, nor is there any iowait during the commit.
We use a curl command to issue the commit, and it is not starting the commit on
all the shards at the same time.
You can find details about it here:
https://lucene.472066.n3.nabble.com/SOLR-8-5-2-Commit-through-curl-command-is-causing-delay-in-issuing-commit-tp4466276.html

Do let me know in case you have any suggestions about this.

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


[SOLR-8.5.2] Commit through curl command is causing delay in issuing commit

2020-12-11 Thread raj.yadav
Solr Setup: (running in SolrCloud mode)
It has 6 shards, and each shard has only one replica (which is also the
leader); the replica type is NRT.
Each shard is hosted on a separate physical host.

Zookeeper => We are using an external ZooKeeper ensemble (3 separate node
cluster)

Shard and Host name
shard1_0=>solr_199
shard1_1=>solr_200
shard2_0=> solr_254 
shard2_1=> solr_132 
shard3_0=>solr_133
shard3_1=>solr_198

*Request rate on the system is currently zero and only hourly indexing is
running on it.*

We are using the following curl command to issue the commit:
/curl
"http://solr_254:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/
(Here we are using the solr_254 host to issue the commit)

*On using the above command, all the shards start processing the commit (i.e.
getting the `start commit` request) except the one used in the curl command
(i.e. shard2_0, which is hosted on solr_254). Individually, each shard takes
around 10 to 12 minutes to process the hard commit (most of this time is spent
reloading external files).
As per the logs, shard2_0 gets the `start commit` request only after about 10
minutes. This leads to the following timeout error.*


2020-12-06 18:47:47.013 ERROR
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr_132:9744/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER=http%3A%2F%2Fsolr_254%3A9744%2Fsolr%2Fmy_collection_shard2_0_replica_n11%2F
  at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
  at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
  at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
  at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
  at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
  at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
  at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
  at
org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
  at
org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
  ... 13 more


The above timeout error is between solr_254 and solr_132. Similar errors occur
between solr_254 and the other 4 shards.


Since the query load is zero, CPU utilization is mostly around 3%.
After issuing the curl commit command, CPU goes up to 14% on all shards except
shard2_0 (host: solr_254, the one used in the curl command).
And after 10 minutes (i.e. after getting the `start commit` request), CPU on
shard2_0 also goes up to 14%.

As I mentioned earlier, each shard takes around 10-12 minutes to process the
commit, and due to the delay in starting the commit process on one shard
(shard2_0) our overall commit time has now doubled (22-24 minutes approx).

In our solr-5.4.0 system (which has a similar setup), we use a similar curl
command to issue the commit, and there all the shards get the `start commit`
request at the same time, including the one used in the curl command.

I'm not able to figure out why this is happening. Has something changed in the
internal behaviour when a commit is issued through curl? Does it have something
to do with HTTP/2? We do not use the autoCommit feature available in
solrconfig.xml; it's not suitable for our system. Apart from the curl command,
is there any other way to issue a commit?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-09 Thread raj.yadav
Hi All,

I tried debugging but am unable to find any solution. Do let me know in case
the details/logs shared by me are not sufficient/clear.

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-08 Thread raj.yadav
matthew sporleder wrote
> I would stick to soft commits and schedule hard-commits as
> spaced-out-as-possible in regular maintenance windows until you can
> find the culprit of the timeout.
> 
> This way you will have very focused windows for intense monitoring
> during the hard-commit runs.

*A small correction:*
In my last post I mentioned that softCommit is working fine and there is no
delay or error message.
Here is what is actually happening:

1. Hard commit with openSearcher=true
curl
"http://:solr_port/solr/my_collection/update?openSearcher=true&commit=true&wt=json"

All the cores start processing the commit except the one hosted on the node
used in the curl command. We are also getting a timeout error on this.

2. softCommit
curl
"http://:solr_port/solr/my_collection/update?softCommit=true&wt=json"
Same behaviour as 1.

3. Hard commit with openSearcher=false
curl
"http://:solr_port/solr/my_collection/update?openSearcher=false&commit=true&wt=json"
All the cores start processing the commit immediately and there is no error.


Solr commands used to set up system

Solr start command
#/var/solr-8.5.2/bin/solr start -c  -p solr_port  -z
zk_host1:zk_port,zk_host1:zk_port,zk_host1:zk_port -s
/var/node_my_collection_1/solr-8.5.2/server/solr -h  -m 26g
-DzkClientTimeout=3 -force



Create collection
1. Upload the config to ZooKeeper
#var/solr-8.5.2/server/scripts/cloud-scripts/./zkcli.sh -z
zk_host1:zk_port,zk_host1:zk_port,zk_host1:zk_port  -cmd upconfig -confname
my_collection  -confdir /

2. Created the collection with 3 shards (shard1, shard2, shard3)
#curl
"http://:solr_port/solr/admin/collections?action=CREATE&name=my_collection&numShards=3&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_collection&createNodeSet=solr_node1:solr_port,solr_node2:solr_port,solr_node3:solr_port"

3. Used the SPLITSHARD command to split each shard into two halves
(shard1_1,shard1_0,shard2_0,...)
e.g.
 #curl
"http://:solr_port/solr/admin/collections?action=SPLITSHARD&collection=my_collection&shard=shard1"

4. Used the DELETESHARD command to delete the old shards (shard1,shard2,shard3).
e.g.
 #curl
"http://:solr_port/solr/admin/collections?action=DELETESHARD&collection=my_collection&shard=shard1"









--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Getting Reset cancel_stream_error on solr-8.5.2

2020-12-08 Thread raj.yadav
Hey All,
We have upgraded our system from Solr 5.4 to Solr 8.5.2 and we are suddenly
seeing a lot of the below errors in our logs.

HttpChannelState org.eclipse.jetty.io.EofException: Reset
cancel_stream_error

Is this related to some system level or solr level config?

How do I find the cause of this?
How do I solve this?

*Solr Setup Details:*
Solr version => solr-8.5.2

GC setting: GC_TUNE=" -XX:+UseG1GC -XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=150
-XX:InitiatingHeapOccupancyPercent=60 -XX:+UseLargePages -XX:+AggressiveOpts
"

Solr Collection details: (running in SolrCloud mode) It has 6 shards, and
each shard has only one replica (which is also the leader); the replica type is
NRT. Total docs in the collection: 77 million; each shard index size: 11 GB;
avg size/doc: 1.0 KB

ZooKeeper => We are using an external ZooKeeper ensemble (3-node cluster)

System Details:
CentOS (7.7); disk size: 250 GB; CPU/RAM: 8 vcpus, 64 GiB memory


Solr OPs

Solr start command
#/var/solr-8.5.2/bin/solr start -c  -p solr_port  -z
zk_host1:zk_port,zk_host1:zk_port,zk_host1:zk_port -s
/var/node_my_collection_1/solr-8.5.2/server/solr -h  -m 26g 
-DzkClientTimeout=3 -force



Create collection
1. Upload the config to ZooKeeper
#var/solr-8.5.2/server/scripts/cloud-scripts/./zkcli.sh -z
zk_host1:zk_port,zk_host1:zk_port,zk_host1:zk_port  -cmd upconfig -confname
my_collection  -confdir /

2. Created the collection with 3 shards (shard1, shard2, shard3)
#curl
"http://:solr_port/solr/admin/collections?action=CREATE&name=my_collection&numShards=3&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_collection&createNodeSet=solr_node1:solr_port,solr_node2:solr_port,solr_node3:solr_port"

3. Used the SPLITSHARD command to split each shard into two halves
(shard1_1,shard1_0,shard2_0,...)
e.g.
 #curl
"http://:solr_port/solr/admin/collections?action=SPLITSHARD&collection=my_collection&shard=shard1"

4. Used the DELETESHARD command to delete the old shards (shard1,shard2,shard3).
e.g.
 #curl
"http://:solr_port/solr/admin/collections?action=DELETESHARD&collection=my_collection&shard=shard1"




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-07 Thread raj.yadav
Hi Folks,

Do let me know if any more information is required to debug this.


Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-06 Thread raj.yadav
matthew sporleder wrote
> Is zookeeper on the solr hosts or on its own?  Have you tried
> opensearcher=false (soft commit?)

1. We are using ZooKeeper in ensemble mode. It is hosted on 3 separate nodes.
2. Soft commit (openSearcher=false) is working fine. All the shards get the
commit request immediately and it is processed within a second.





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-06 Thread raj.yadav
Hi Everyone,


matthew sporleder wrote
> Are you stuck in iowait during that commit?

During the commit operation there is no iowait.
In fact, most of the time the CPU utilization percentage is very low.

/*As I mentioned in my previous post, we are getting a `SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server` and a `DistributedZkUpdateProcessor` ERROR on
one of the shards. This error always occurs on the shard that is used (in the
curl command) to issue the commit. (See the example below for a better
understanding.)*/

Here is shard and corresponding node details:
shard1_0=>solr_199
shard1_1=>solr_200
shard2_0=> solr_254
shard2_1=> solr_132
shard3_0=>solr_133
shard3_1=>solr_198

We are using the following command to issue the commit:
/curl
"http://solr_node:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"/

For example, in the above command, if we replace solr_node with solr_254, it
throws the SolrCmdDistributor and DistributedZkUpdateProcessor errors on
shard2_0. Similarly, if we replace solr_node with solr_200, it throws the
errors on shard1_1.

*I'm not able to figure out why this is happening. Is there any connection
timeout setting that is affecting this? Is there a limit such that only N
shards can run commit operations simultaneously at a time, or is it some
network-related issue?*


For a better understanding of what's happening in the Solr logs, I will walk
through one commit operation here.

I used the command below to issue a commit at `2020-12-06 18:37:40` (approx):
curl
"http://solr_200:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"


/*shard2_0 (node: solr_254) Logs:*/


*The commit is received at `2020-12-06 18:37:47` and is over by `2020-12-06
18:37:47`, since there were no changes to commit. CPU utilization during the
whole period is around 2%.*


2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
o.a.s.u.DirectUpdateHandler2 start
commit{_version_=1685355093842460672,optimize=false,ope
nSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-12-06 18:37:47.023 INFO No uncommitted changes. Skipping IW.commit.
2020-12-06 18:37:47.023 INFO end_commit_flush
2020-12-06 18:37:47.023 INFO  (qtp2034610694-31355) [c:my_collection
s:shard2_0 r:core_node13 x:my_collection_shard2_0_replica_n11]
o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_0_replica_n11] 
webapp=/solr path=/update

params={update.distrib=TOLEADER=true=true=true=false=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/_end_point=leaders=javabi
n=2=false}{commit=} 0 3

/*shard2_1 (node: solr_132) Logs:*/

*The commit is received at `2020-12-06 18:37:47` and is over by `2020-12-06
18:50:46`; in between there were some external file reloading operations (our
solr-5.4.2 system also takes a similar time to reload external files, so right
now this is not a major concern for us).
CPU utilization before the commit (i.e. at the `2020-12-06 18:37:47` timestamp)
is 2%, during the commit (i.e. from `2020-12-06 18:37:47` to `2020-12-06
18:50:46`) it is 14%, and after the commit operation is done it falls back to
2%.*


2020-12-06 18:37:47.024 INFO  (qtp2034610694-30058) [c:my_collection
s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
o.a.s.u.DirectUpdateHandler2 start
commit{_version_=1685355093844557824,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

2020-12-06 18:50:46.218 INFO  (qtp2034610694-30058) [c:my_collection
s:shard2_1 r:core_node22 x:my_collection_shard2_1_replica_n21]
o.a.s.u.p.LogUpdateProcessorFactory [my_collection_shard2_1_replica_n21] 
webapp=/solr path=/update
params={update.distrib=TOLEADER=true=true=true=false=http://solr_200:8389/solr/my_collection_shard1_1_replica_n19/_end_point=leaders=javabin=2=false}{commit=}
0 779196


/*shard3_0 (node: solr_133) logs*/

Same as shard2_1: the commit is received at `2020-12-06 18:37:47` and is over
by `2020-12-06 18:49:24`.
The CPU utilization pattern is the same as shard2_1.

/*shard3_1 (node: solr_198) logs.*/

Same as shard2_1: the commit is received at `2020-12-06 18:37:47` and is over
by `2020-12-06 18:53:57`.
The CPU utilization pattern is the same as shard2_1.

/*shard1_0 (node: solr_199) logs.*/

Same as shard2_1: the commit is received at `2020-12-06 18:37:47` and is over
by `2020-12-06 18:54:51`.
The CPU utilization pattern is the same as shard2_1.

/*shard1_1 (node: solr_200) logs.*/

/This is the same solr_node that we used in the curl command to issue the
commit. As expected, we got the SolrCmdDistributor and
DistributedZkUpdateProcessor errors on it./

/Till the `2020-12-06 18:46:50` timestamp there is no `start commit` request
received. CPU utilization is also 2% till this time./
/*We received the following error at the `2020-12-06 18:47:47` timestamp:*/

2020-12-06 18:47:47.013 ERROR
(updateExecutor-5-thread-6-processing-n:solr_200:8389_solr

Re: Timeout occured while waiting response from server

2020-12-06 Thread raj.yadav
Hey Karl,

Can you elaborate more on your system? How many shards does your collection
have, and what is the replica type? Are you using an external ZooKeeper?
It looks like (from the logs) that you are running Solr in SolrCloud mode?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-06 Thread raj.yadav
matthew sporleder wrote
> On unix the top command will tell you.  On windows you need to find
> the disk latency stuff.

Will check this and report here



matthew sporleder wrote
> Are you on a spinning disk or on a (good) SSD?

We are using SSDs.


matthew sporleder wrote
> Anyway, my theory is that trying to do too many commits in parallel
> (too many or not enough shards) is causing iowait = high latency to
> work through.

Can you please elaborate more on this?




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-06 Thread raj.yadav
matthew sporleder wrote
> Are you stuck in iowait during that commit?

I am not sure how to determine that; could you help me here?
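
(For later readers, a minimal sketch of how iowait can be checked on Linux;
these are standard tools and the sampling intervals are arbitrary:)

top -b -n 1 | head -5    # the "wa" value in the %Cpu(s) line is iowait
iostat -x 1 5            # per-device stats; watch %util and await (needs the sysstat package)
vmstat 1 5               # the "wa" column is CPU time spent waiting on I/O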




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-02 Thread raj.yadav
Hi everyone,

As per the suggestions in the previous post (by Erick and Shawn), we made the
following changes.

OLD




 

NEW







*Reduced JVM heap size from 30GB to 26GB*

GC setting:
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=150 \
-XX:InitiatingHeapOccupancyPercent=60 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Solr Collection details: (running in solrCloud mode)
It has 6 shards, and each shard has only one replica (which is also the
leader); the replica type is NRT.
Each shard index size: 11 GB
avg size/doc: 1.0 KB

We are running indexing on this collection:
*Indexing rate: 2.4 million per hour*

*The query rate is zero. Still, a commit with openSearcher=true is taking 25
to 28 minutes.*
Is this because of the heavy indexing? Also, the commit time increases as the
number of documents in the collection grows.
This is not our production system. In the prod system our indexing rate is
generally 5k/hour.

Is such a high commit time expected with the above indexing rate?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Unable to finish sending updates - Solr 8.5.0

2020-11-09 Thread raj.yadav
Hey Scott,
We have also recently migrated to Solr 8.5.2 and are facing a similar issue.
Were you able to resolve this?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


How to reflect changes of solrconfig.xml to all the cores without causing any conflict

2020-11-09 Thread raj.yadav
Recently we modified the `noCFSRatio` parameter of the merge policy.

 
8
5
50.0
4000
0.0
  

This is our current merge policy. Earlier `noCFSRatio` was set to `0.1`.

Generally, to reflect any changes to solrconfig we reload the collection. But
we stopped doing this because we observed that some of the replicas go into
recovery during the reload operation.
So instead of a reload, we restart each replica one by one.

Our restart procedure (a rough sketch follows this list):
1. Indexing is stopped on the collection and a hard commit is issued.
2. The non-leader replicas are restarted first, and the leader replica is
restarted last.
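
A minimal sketch of the procedure above, assuming the same start options we
already use (host names, ports, and paths are placeholders):

# 1. stop indexing, then issue a hard commit against the collection
curl "http://solr_host:solr_port/solr/my_collection/update?commit=true&waitSearcher=true"

# 2. on each non-leader VM first, and on the leader's VM last:
/var/solr-8.5.2/bin/solr stop -p solr_port
/var/solr-8.5.2/bin/solr start -c -p solr_port -z zk_host1:zk_port,zk_host2:zk_port,zk_host3:zk_port -s /var/node_my_collection_1/solr-8.5.2/server/solr -m 26g -force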

*Question:*
Since a reload is not done, none of the replicas (including the leader) will
have the updated solrconfig. If we restart a replica and it tries to sync up
with the leader, will it reflect the latest solrconfig changes, or will it be
the same as the leader?

Also, after this exercise we have seen a sudden spike in CPU utilization on a
few replicas, though there is not much increase in our system load.
 

System config of VM:
disk size: 250 GB
cpu: (8 vcpus, 64 GiB memory)

Solr Collection detail:
A single collection having 6 shards; each VM hosts a single replica.
Collection size: 60 GB (each shard size is 10 GB)
Average doc size: 1.0 KB




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Commits (with openSearcher = true) are too slow in solr 8

2020-11-09 Thread raj.yadav
Thanks, Shawn and  Erick.
We are step by step trying out the changes suggested in your post.
Will get back once we have some numbers.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Commits (with openSearcher = true) are too slow in solr 8

2020-11-03 Thread raj.yadav
Hi everyone,
We have two parallel systems: one is Solr 8.5.2 and the other one is Solr 5.4.
In solr_5.4 the commit time with openSearcher=true is 10 to 12 minutes, while
in solr_8 it's around 25 minutes.

This is our current caching policy of solr_8







In solr 5 we are using FastLRUCache (instead of CaffeineCache) and the other
parameters are the same.
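
Since the list archive strips XML, here is a rough illustration of what such a
CaffeineCache declaration in solrconfig.xml looks like (the sizes below are
placeholders, not our actual values):

# illustrative snippet only; writes an example file, not our real solrconfig.xml
cat > /tmp/solrconfig-cache-example.xml <<'EOF'
<filterCache      class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache    class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
EOF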

While debugging this, we came across this page:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Slowcommits

Here, one of the reasons given for slow commits is:
*/`Heap size issues. Problems from the heap being too big will tend to be
infrequent, while problems from the heap being too small will tend to happen
consistently.`/*

Can anyone please help me understand the above point?

System config:
disk size: 250 GB
cpu: (8 vcpus, 64 GiB memory)
Index size: 11 GB
JVM heap size: 30 GB



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 7.6 query performace question

2020-10-01 Thread raj.yadav
harjags wrote
> Below errors are very common in 7.6 and we have solr nodes failing with
> tanking memory.
> 
> The request took too long to iterate over terms. Timeout: timeoutAt:
> 162874656583645 (System.nanoTime(): 162874701942020),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@74507f4a
> 
> or 
> 
> #*BitSetDocTopFilter*]; The request took too long to iterate over terms.
> Timeout: timeoutAt: 33288640223586 (System.nanoTime(): 33288700895778),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@5e458644
> 
> 
> or 
> 
> #SortedIntDocSetTopFilter]; The request took too long to iterate over
> terms.
> Timeout: timeoutAt: 552497919389297 (System.nanoTime(): 552508251053558),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@60b7186e
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



We are also seeing such errors in our logs, but our nodes are not failing, and
the frequency of such warnings is less than 5% of overall traffic.
What does this error mean?
Can someone elaborate on the following:
1. What does `The request took too long to iterate over terms` mean?
2. What are `BitSetDocTopFilter` and `SortedIntDocSetTopFilter`?

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-30 Thread raj.yadav
raj.yadav wrote
> In cases for which we are getting this warning, I'm not able to extract
> the
> `exact solr query`. Instead logger is logging `parsedquery ` for such
> cases.
> Here is one example:
> 
> 
> 2020-09-29 13:09:41.279 WARN  (qtp926837661-82461) [c:mycollection
> s:shard1_0 r:core_node5 x:mycollection_shard1_0_replica_n3]
> o.a.s.s.SolrIndexSearcher Query: [+FunctionScoreQuery(+*:*, scored by
> boost(product(if(max(const(0),
> sub(float(my_doc_value_field1),const(500))),const(0.01),
>
> if(max(const(0),sub(float(my_doc_value_field2),const(290))),const(0.2),const(1))),
>
> sqrt(product(sum(const(1),float(my_doc_value_field3),float(my_doc_value_field4)),
> sqrt(sum(const(1),float(my_doc_value_field5
> #BitSetDocTopFilter]; The request took too long to iterate over doc
> values.
> Timeout: timeoutAt: 1635297585120522 (System.nanoTime():
> 1635297690311384),
> DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@7df12bf1
> 
> 


Hi Community members,

In my previous mail I mentioned that Solr is not logging the actual
`solr_query` and is instead only logging the parsed query. Actually, Solr logs
the solr_query just after logging the above warning message.

Coming back to the query for which we are getting the above warning:
QUERY => retrieve all docs (i.e. q=*:*) and order them using a multiplicative
boost function (i.e. a boost function query). So this clearly rules out the
possibility mentioned by Erick (i.e. that the query might be searching against
a field which has indexed=false and docValues=true).

Is this expected when using docValues for the function query? This only
happens when the query retrieves a large number of documents (in millions).
Has anyone else faced this issue before? We are experiencing this issue even
when there is no load on the system.

Regards,
Raj






--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-30 Thread raj.yadav
Hi,

I went through the other queries for which we are getting the `The request
took too long to iterate over doc values` warning. As suggested by Erick, I
have cross-checked all the fields that are used in the query, and there is no
field we are searching against that has indexed=false and docValues=true.

A few observations I would like to share here:

- We are performing a load test on our system, and the above timeout warning
occurs only for those queries which fetch a large number of documents.

- I stopped all the load on the system and fired the same queries (for which
we were getting the timeout warning). Here is the Solr response:

Solr Response:
response: {
numFound: 6082251,
start: 0,
maxScore: 4709.594,
docs: [ ]
}

The response was quite weird (the header says there are `6082251` docs found
but the `docs` array is empty); also, there was no timeout warning in the logs.
Then I increased `timeAllowed` to 5000 ms (our default is 1000 ms). This time
the `docs` array was not empty, and in fact there was an increase in the
numFound count. This clearly points to the query not being able to complete in
1000 ms (the default timeAllowed).
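
For reference, this is roughly how the query was re-run with a larger
timeAllowed (the host and query are placeholders); when the limit is hit, Solr
sets partialResults=true in the response header:

curl "http://solr_host:8389/solr/mycollection/select?q=*:*&rows=10&timeAllowed=5000&wt=json"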

I have the following questions:
1. Are docValues as efficient as ExternalFileField for function queries?
2. Why did I get the warning message when the system was under load, but no
warning was thrown when it was not under load?

When we were performing the load test (at the same load scale) with the
ExternalFileField type, we were not getting any warning messages in our logs.

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-30 Thread raj.yadav
I went through the other queries for which we are getting the `The request
took too long to iterate over doc values` warning. As suggested by Erick, I
have cross-checked all the fields that are used in the query, and there is no
field we are searching against that has indexed=false and docValues=true.

A few observations I would like to share here:

- We are performing a load test on our system, and the above timeout warning
occurs only for those queries which fetch a large number of documents.

- I stopped all the load on the system and fired the same queries (for which
we were getting the timeout warning). Here is the Solr response:

Solr Response:
response: {
numFound: 6082251,
start: 0,
maxScore: 4709.594,
docs: [ ]
}

The response was quite weird (the header says there are `6082251` docs found
but the `docs` array is empty); also, there was no timeout warning in the logs.
Then I increased `timeAllowed` to 5000 ms (our default is 1000 ms). This time
the `docs` array was not empty, and in fact there was an increase in the
numFound count. This clearly points to the query not being able to complete in
1000 ms (the default timeAllowed).

I have the following questions:
1. Are docValues as efficient as ExternalFileField for function queries?
2. Why did I get the warning message when the system was under load, but no
warning was thrown when it was not under load?

When we were performing the load test (at the same load scale) with the
ExternalFileField type, we were not getting any warning messages in our logs.



raj.yadav wrote
> Hey Erick,
> 
> In cases for which we are getting this warning, I'm not able to extract
> the
> `exact solr query`. Instead logger is logging `parsedquery ` for such
> cases.
> Here is one example:
> 
> 
> 2020-09-29 13:09:41.279 WARN  (qtp926837661-82461) [c:mycollection
> s:shard1_0 r:core_node5 x:mycollection_shard1_0_replica_n3]
> o.a.s.s.SolrIndexSearcher Query: [+FunctionScoreQuery(+*:*, scored by
> boost(product(if(max(const(0),
> sub(float(my_doc_value_field1),const(500))),const(0.01),
>
> if(max(const(0),sub(float(my_doc_value_field2),const(290))),const(0.2),const(1))),
>
> sqrt(product(sum(const(1),float(my_doc_value_field3),float(my_doc_value_field4)),
> sqrt(sum(const(1),float(my_doc_value_field5
> #BitSetDocTopFilter]; The request took too long to iterate over doc
> values.
> Timeout: timeoutAt: 1635297585120522 (System.nanoTime():
> 1635297690311384),
> DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@7df12bf1
> 
> 
> 
> As per my understanding query in the above case is `q=*:*`. And then there
> is boost function which uses functional query on my_doc_value_field*
> (fieldtype doc_value_field i.e having index=false and docValue = true) to
> reorder matched docs. If docValue works efficiently for _function queries_
> then why this warning are coming?
> 
> 
> Also, we do use frange queries on doc_value_field (having index=false and
> docValue = true).
> example:
> {!frange l=1.0}my_doc_value_field1
> 
> 
> Erick Erickson wrote
>> Let’s see the query. My bet is that you are _searching_ against the field
>> and have indexed=false.
>> 
>> Searching against a docValues=true indexed=false field results in the
>> equivalent of a “table scan” in the RDBMS world. You may use
>> the docValues efficiently for _function queries_ to mimic some
>> search behavior.
>> 
>> Best,
>> Erick
> 
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-29 Thread raj.yadav
Hey Erick,

In the cases for which we are getting this warning, I'm not able to extract
the exact Solr query; instead, the logger logs the `parsedquery` for such
cases. Here is one example:


2020-09-29 13:09:41.279 WARN  (qtp926837661-82461) [c:mycollection
s:shard1_0 r:core_node5 x:mycollection_shard1_0_replica_n3]
o.a.s.s.SolrIndexSearcher Query: [+FunctionScoreQuery(+*:*, scored by
boost(product(if(max(const(0),sub(float(my_doc_value_field1),const(50))),const(0.01),if(max(const(0),sub(float(my_doc_value_field2),const(29))),const(0.2),const(1))),sqrt(product(sum(const(1),float(my_doc_value_field3),float(my_doc_value_field4)),sqrt(sum(const(1),float(my_doc_value_field5
#BitSetDocTopFilter]; The request took too long to iterate over doc values.
Timeout: timeoutAt: 1635297585120522 (System.nanoTime(): 1635297690311384),
DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@7df12bf1



As per my understanding, the query in the above case is `q=*:*`. Then there is
a boost function which uses a function query on my_doc_value_field* (field type
doc_value_field, i.e. having indexed=false and docValues=true) to reorder the
matched docs. If docValues work efficiently for _function queries_, then why
are these warnings coming?


Also, we do use frange queries on doc_value_field (having indexed=false and
docValues=true), for example:
{!frange l=1.0}my_doc_value_field1


Erick Erickson wrote
> Let’s see the query. My bet is that you are _searching_ against the field
> and have indexed=false.
> 
> Searching against a docValues=true indexed=false field results in the
> equivalent of a “table scan” in the RDBMS world. You may use
> the docValues efficiently for _function queries_ to mimic some
> search behavior.
> 
> Best,
> Erick





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


How to Resolve : "The request took too long to iterate over doc values"?

2020-09-29 Thread raj.yadav
In our index we have a few fields defined with the `ExternalFileField` field
type. We decided to use docValues for such fields. Here is the field type
definition:

OLD => (ExternalFileField)


NEW => (docValues)
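
(The actual definitions were stripped by the list archive; the sketch below is
only an illustration of the two styles being compared. Apart from the
ExternalFileField class and valType, every name and attribute here is an
assumption, not our real schema.)

# illustrative snippet only; writes an example file, not our real schema.xml
cat > /tmp/schema-fieldtype-example.xml <<'EOF'
<!-- OLD: external file field -->
<fieldType name="ext_file_field" class="solr.ExternalFileField" valType="float"/>
<!-- NEW: docValues-backed float field -->
<fieldType name="doc_value_field" class="solr.FloatPointField" docValues="true" indexed="false" stored="false"/>
EOF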


After this modification we started getting the following `timeout warning`
messages:

```The request took too long to iterate over doc values. Timeout: timeoutAt:
1626463774823735 (System.nanoTime(): 1626463774836490),
DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@4efddff
```

Our system configuration:
Each Solr Instance: 8 vcpus, 64 GiB memory
JAVA Memory: 30GB
Collection: 4 shards (each shard has approximately 12 million docs and index
size of 12 GB) and each Solr instance has one replica of the shard. 

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:PermSize=64m \
-XX:MaxPermSize=64m \
-XX:TargetSurvivorRatio=80 \
-XX:MaxTenuringThreshold=9 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:+CMSClassUnloadingEnabled \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"
 
1. What does this warning message mean?
2. How do we resolve it?




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to perform keyword (exact_title) match in solr with sow=true

2020-08-24 Thread raj.yadav
Hi Community members,

I tried the following approaches, but none of them worked for my use case.

1. To achieve an exact match in Solr we have to keep sow='false' (Solr will
use field-centric matching mode) and group multiple similar fields into one
copy field. This does solve the problem of recall, but we use different boost
values per field, so it hurts our precision. This approach is mentioned in this
search-relevance book
(https://livebook.manning.com/book/relevant-search/chapter-6?origin=product-toc).

2. I tried to perform exact matching using a function query but couldn't find
any supporting function.

Is there any existing/patch solution for performing an exact match with
sow='true'? Please let me know.

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Force term-centric matching mode in solr and also perform keyword (exact_title) match

2020-08-20 Thread raj.yadav
We are using Solr 5.4.0 in the production environment. We are planning to
migrate to Solr 8.5.

We have observed that in Solr 8.5, if we keep the `sow` (split on whitespace)
parameter as false (the default), the query is parsed field-centrically, and if
`sow` is set to true, the query is parsed term-centrically.

Our search application is better suited to term-centric matching (with minimum
should match set to 100%) and we want to continue using that.

Currently we are using the edismax query parser, and documents are
ranked/matched/boosted using qf (Query Fields) for term matching and pf
(Phrase Fields) for phrase matching.

Along with term/phrase matching, we want to add exact matching functionality.
For this, we decided to define a field (say 'exact_match') using
KeywordTokenizerFactory. But the problem here is the `sow` parameter: if we
keep `sow` as true (which is required for a term-centric match in Solr 8.5),
query terms are tokenized on whitespace before being matched against the
exact_match field.

Is there any way we can keep using a term-centric match and also support exact
matching?
We have found one workaround for this:
We append start and end marker tokens to the field value while indexing, and at
query time we append the same marker tokens to the query term(s) and use phrase
matching.

Since phrase matching is expensive compared to a keyword (exact) match, we are
looking for a way to support exact_match and also have term-centric matching.


I also found one jira ticket which is loosely related to this. 
https://issues.apache.org/jira/browse/SOLR-12779



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Force open a searcher in solr.

2020-08-10 Thread raj.yadav
Erick Erickson wrote
> Ah, ok. That makes sense. I wonder if your use-case would be better
> served, though, by “in place updates”, see:
> https://lucene.apache.org/solr/guide/8_1/updating-parts-of-documents.html
> This has been around in since Solr 6.5…

As per the documentation, `in-place update` is only available for numeric
docValues fields (along with a few more conditions), and here it is an external
field type.

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Why External File Field is marked as indexed in solr admin SCHEMA page?

2020-08-10 Thread raj.yadav
Hi Chris,


Chris Hostetter-3 wrote
> ...ExternalFileField is "special" and as noted in it's docs it is not 
> searchable -- it doesn't actaully care what the indexed (or "stored") 
> properties are ... but the default values of those properties as assigend 
> by the schema defaults are still there in the metadata of the field -- 
> which is what the schema API/browser are showing you.

As you mentioned above, the `stored` parameter will also be ignored (i.e. it
doesn't matter whether it's marked as false or true). So when we retrieve the
external fields using `fl=field(external_field_name)`, Solr will always
retrieve the field value from the external file.


Regards,
Raj





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


How to forcefully open new searcher, in case when there is no change in Solr index

2020-08-10 Thread raj.yadav
I have a use case where none of the documents in my Solr index are changing,
but I still want to open a new searcher through the curl API.

On executing the below curl command
curl
"XXX.XX.XX.XXX:9744/solr/mycollection/update?openSearcher=true&commit=true"
it doesn't open a new searcher.

Below is what I get in logs
2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush

I don't want to do a complete reload of my collection.
Is there any parameter that can be used to forcefully open a new searcher every
time I do a commit with openSearcher=true?

In our collection there are a few ExternalFileField fields, and changes in the
external file are not reflected on issuing commits (using the curl command
mentioned above). A workaround sketch is included below.

Thanks in advance for the help
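
One possible workaround (a sketch, assuming that indexing any real change lets
the commit open a new searcher, as the "No uncommitted changes" log lines above
suggest; the dummy document id and field are placeholders, not from our schema):

# Re-index a single dummy/sentinel document so there is an uncommitted change...
curl -X POST -H 'Content-Type: application/json' -d '[{"id":"searcher_refresh_dummy","dummy_field":"2020-08-10T09:32:00Z"}]' "XXX.XX.XX.XXX:9744/solr/mycollection/update"

# ...then commit; with a real change present, a new searcher is opened
curl "XXX.XX.XX.XXX:9744/solr/mycollection/update?commit=true&openSearcher=true"

Separately, Solr ships org.apache.solr.schema.ExternalFileFieldReloader, which
can be registered as a firstSearcher/newSearcher listener in solrconfig.xml so
external files are reloaded whenever a new searcher does open.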



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Why External File Field is marked as indexed in solr admin SCHEMA page?

2020-07-22 Thread raj.yadav
Chris Hostetter-3 wrote
> : *
>  : class="solr.ExternalFileField" valType="float"/>
> *
> : 
> : *
> 
> *
>   ...
> : I was expecting that for field "fieldA" indexed will be marked as false
> and
> : it will not be part of the index. But Solr admin "SCHEMA page" (we get
> this
> : option after selecting collection name in the drop-down menu)  is
> showing
> : it as an indexed field (green tick mark under Indexed flag).
> 
> Because, per the docs, the IndexSchema uses a default assumption of "true" 
> for the "indexed" property (if not specified at a field/fieldtype level) 
> ...
> 
> https://lucene.apache.org/solr/guide/8_4/field-type-definitions-and-properties.html#field-default-properties
> 
> Property: indexed
> Descrption: If true, the value of the field can be used in queries to
> retrieve matching documents.
> Values: true or false 
> Implicit Default: true
> 
> ...ExternalFileField is "special" and as noted in it's docs it is not 
> searchable -- it doesn't actaully care what the indexed (or "stored") 
> properties are ... but the default values of those properties as assigend 
> by the schema defaults are still there in the metadata of the field -- 
> which is what the schema API/browser are showing you.
> 
> 
> Imagine you had a a 
> 
>  that was a TextField -- implicitly 
> indexed="true" -- but it was impossible for you to ever put any values 
> in that field (say for hte sake of argument you used an analyzier that 
> threw away all terms).  The schema browser would say: "It's (implicitly) 
> marked indexed=true, therefore it's searchable" even though searching on
> that 
> field would never return anything ... equivilent situation with 
> ExternalFileField.
> 
> (ExternalFileField could be modified to override the implicit default for 
> these properties, but that's not something anyone has ever really worried 
> about because it wouldn't functionally change any of it's behavior)
> 
> 
> -Hoss
> http://www.lucidworks.com/

Thanks Chris.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Why External File Field is marked as indexed in solr admin SCHEMA page?

2020-07-22 Thread raj.yadav
Vadim Ivanov wrote
> Hello, Raj
> 
> I've just checked my Schema page for external file field
> 
> Solr version 8.3.1 gives only such parameters for externalFileField:
> 
> 
> Field: fff
> 
> Field-Type:
> 
> org.apache.solr.schema.ExternalFileField
> 
> 
> Flags:
> 
> UnInvertible
> 
> Omit Term Frequencies & Positions
> 
> 
> Properties
> 
> √
> 
> √
> 
> 
> Are u sure you don’t have (or had)  fieldA  in main collection schema ?
> 
>  
> 
> externalFileField is not part of the index. It resides in separate file in
> Solr index directory and goes into memory every commit.

Hi Vadim Ivanov,

Earlier, the fieldType and field I shared were from solr_5.4.

I have cross-checked the same thing in solr_8.5.2. I created the following two
fieldTypes and fields.
 








In fieldType `ext_file_fieldA`, since I explicitly specified the indexed and
stored parameters, I get the expected result on the Solr SCHEMA page. (PFA
image file: fieldA_schema)

In fieldType `ext_file_fieldB` I did not mention anything about the indexed and
stored parameters. I was expecting that the indexed parameter would be false by
default, but on the Solr SCHEMA page the indexed flag is marked with a green √.
(PFA image file: fieldB_schema)


Please find attached files.

Regards,
Raj  
 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Best field definition which is only use for filter query.

2020-07-22 Thread raj.yadav
Erik Hatcher-4 wrote
> Wouldn’t a “string” field be as good, if not better, for this use case?

What is the rationale behind this type change to 'string'? How will it speed
up search/filtering? Will it not increase the index size, since in general a
string type takes more storage space than an int (not sure what the case is in
Lucene)?

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Best field definition which is only use for filter query.

2020-07-22 Thread raj.yadav
Erick Erickson wrote
> Also, the default pint type is not as efficient for single-value searches
> like this, the trie fields are better. Trie support will be kept until
> there’s a good alternative for the single-value lookup with pint.
> 
> So for what you’re doing, I’d change to TrieInt, docValues=false,
> index=true.

 
So we should use the TrieInt type for single-value searches (on both
single-valued and multi-valued fields). Please correct me if I'm wrong.

Also, in what scenarios should we prefer pint over TrieInt, from both a
document search (indexed) and retrieval (stored) latency point of view (not
looking at it from a sorting or faceting point of view)? Is there any
documentation that compares these two field types?

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: In-place update vs Atomic updates

2020-07-14 Thread raj.yadav
Shawn Heisey-2 wrote
> Atomic updates are nearly identical to simple indexing, except that the 
> existing document is read from the index to populate a new document 
> along with whatever updates were requested, then the new document is 
> indexed and the old one is deleted.

As per the above statement, an atomic update reindexes the entire document and
deletes the old one.
But I was going through the Solr documentation regarding the Solr document
update policy and found these two seemingly contradicting statements:

1. /The first is atomic updates. This approach allows changing only one or more
fields of a document without having to reindex the entire document./

2. /In regular atomic updates, the entire document is reindexed internally
during the application of the update./

Is there something I'm missing here? (A small example of the update syntax in
question is included below for context.)
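
For context, this is the kind of atomic update request being discussed (the
field and document id are placeholders); whether it runs as a regular atomic
update (full internal reindex) or as an in-place update depends on whether the
field meets the in-place conditions:

curl -X POST -H 'Content-Type: application/json' -d '[{"id":"doc1","my_counter_field":{"inc":1}}]' "http://solr_host:8389/solr/my_collection/update?commit=true"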

Regards,
Raj



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html