Re: Modify data-conf.xml on the runtime

2018-04-11 Thread Shawn Heisey
On 4/11/2018 9:21 AM, rameshkjes wrote:
> I am doing configuration of solr with the xml and pdf dataset, it works
> perfect. But, I want to modify few things:
> Such as, we can see below, "baseDir" and "filePrefix" is being defined
> manually. I want this to be defined on the runtime.
The way I

Re: Atomic update with condition

2018-04-11 Thread Shawn Heisey
On 4/11/2018 10:52 AM, SOLR4189 wrote:
> How can I change field value by specific condition in indexing?
>
> Indexed Doc in SOLR: { id:1, foo:A }
> Indexing Doc into SOLR: { id:1, foo: B }
>
> foo is single value field.
>
> Let's say I want to replace value of foo from A to B, if A > B, else do
>

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Shawn Heisey
On 4/11/2018 9:23 AM, Adam Harrison-Fuller wrote:
> In addition, here is the GC log leading up to the crash.
>
> https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_20180410_1009.zip?dl=0
I pulled that log into the http://gceasy.io website. This is a REALLY nice way to look at GC logs. I do

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Kevin Risden
I'm going to share how I've debugged a similar OOM crash and solving it had nothing to do with increasing heap. https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html This is specifically for Apache Ranger and how to fix it but you can treat it just like any application using Solr.

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Deepak Goel
A few observations:
1. The Old Gen Heap on 9th April is about 6GB occupied, which then runs up to 9+GB on 10th April (it steadily increases throughout the day).
2. The Old Gen GC is never able to reclaim any free memory.
Deepak

Re: Using Solr to search website and external Oracle ServiceCloud

2018-04-11 Thread Emir Arnautović
Hi, You have several options:
1. Keep data as is and implement federated search logic with two queries: one to Solr and the other to Oracle. PRO: no need to change data flow. CON: results are not comparable when it comes to scores, and you will probably need to present them as separate groups.
2.

Nested streaming expression with differing "on" fields

2018-04-11 Thread Sarvothaman Madhavan
Hello solr users, Is it possible to perform a streaming expression of the following type: intersect( search (collection_3, fl="field_1,field_2",sort="field_2 asc",qt="/export",q=*:* ), intersect( search (collection_1, fl="field_1,field_2",sort="field_1

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Joe Obernberger
Just as a side note, when Solr goes OOM and kills itself, and if you're running HDFS, you are guaranteed to have write.lock files left over. If you're running lots of shards/replicas, you may have many files that you need to go into HDFS and delete before restarting.
-Joe
On 4/11/2018

Re: Modify data-conf.xml on the runtime

2018-04-11 Thread Alexandre Rafalovitch
I believe you can use variable substitution like ${dih.request.paramname} and then pass 'paramname=value' in your HTTP request.
Regards, Alex.
On 11 April 2018 at 11:25, rameshkjes wrote:
> Hi,
>
> I am doing configuration of solr with the xml and pdf dataset, it works
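
To illustrate the suggestion above, here is a minimal sketch of building such a request. It assumes the data-config attributes are written as baseDir="${dih.request.baseDir}" and filePrefix="${dih.request.filePrefix}", that the DIH handler is registered at /dataimport, and that the core is called "mycore"; the host, core name, and values are hypothetical and not taken from the original message.

```python
from urllib.parse import urlencode


def build_dih_url(solr_base, core, base_dir, file_prefix):
    """Build a DIH full-import URL that passes runtime values as request parameters.

    Assumption: data-config.xml refers to these values as
    ${dih.request.baseDir} and ${dih.request.filePrefix}.
    """
    params = {
        "command": "full-import",   # standard DIH command
        "baseDir": base_dir,        # picked up as ${dih.request.baseDir}
        "filePrefix": file_prefix,  # picked up as ${dih.request.filePrefix}
    }
    # "/dataimport" is an assumed handler path; it depends on solrconfig.xml.
    return f"{solr_base}/{core}/dataimport?{urlencode(params)}"


# Hypothetical values, e.g. as chosen by a user in a GUI.
url = build_dih_url("http://localhost:8983/solr", "mycore", "/data/docs", "report")
print(url)
```

The GUI would then only need to issue an HTTP GET against the printed URL; the config file itself stays unchanged between imports.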

Atomic update with condition

2018-04-11 Thread SOLR4189
Hi all, How can I change field value by specific condition in indexing? Indexed Doc in SOLR: { id:1, foo:A } Indexing Doc into SOLR: { id:1, foo: B } foo is single value field. Let's say I want to replace value of foo from A to B, if A > B, else do nothing. Thank you. -- Sent from:
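
One way to picture the condition being asked about is a client-side check before sending an atomic update. This is only a sketch under stated assumptions: the caller has already fetched the currently indexed value (for example via real-time get), the values are comparable, and nothing else updates the document in between. The function name and numeric values are hypothetical; only the {"set": ...} atomic-update modifier is standard Solr syntax.

```python
import json


def conditional_set(doc_id, field, indexed_value, incoming_value):
    """Return an atomic-update document if indexed_value > incoming_value, else None.

    Assumption: indexed_value was read from Solr by the caller beforehand;
    this check is done on the client and is not atomic on the server side.
    """
    if indexed_value > incoming_value:
        # "set" replaces the stored value of a single-valued field.
        return {"id": doc_id, field: {"set": incoming_value}}
    return None  # condition not met: do nothing


# Hypothetical numeric values standing in for A and B.
update = conditional_set(1, "foo", indexed_value=5, incoming_value=3)
if update is not None:
    print(json.dumps([update]))  # body for a POST to the core's /update handler
```

A server-side alternative would be a custom update processor, which is what the RealTimeGetComponent question elsewhere in this listing is about.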

Re: Decision on Number of shards and collection

2018-04-11 Thread SOLR4189
I advise you to read the book Solr in Action. To answer your question you need to take into account the server resources that you have (CPU, RAM and disk), the index size, and the average size of a single doc. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Modify data-conf.xml on the runtime

2018-04-11 Thread rameshkjes
Hi, I am configuring Solr with an XML and PDF dataset, and it works perfectly. But I want to modify a few things. For example, as we can see below, "baseDir" and "filePrefix" are defined manually. I want them to be defined at runtime. Consider that I have a GUI and the user is specifying the

Re: Decision on Number of shards and collection

2018-04-11 Thread Emir Arnautović
Hi, Only you can tell what query latency is acceptable (I can tell you the ideal: it is 0 :) Usually you start testing with a single shard, add documents to it, and measure query latency. When you get close to the max allowed latency, you have your shard size. Then you try to estimate
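
The procedure described above can be sketched as a small calculation. The measurements below are invented for illustration, and because the original message is cut off, the last step (dividing the expected total by the measured shard size) is an assumption about what is being estimated, not a quote.

```python
import math


def estimate_shards(measurements, max_latency_ms, total_docs):
    """Pick the largest tested shard size within the latency budget,
    then estimate how many shards the expected total would need.

    measurements: list of (doc_count, observed_latency_ms) pairs from
    single-shard tests. All numbers used below are hypothetical.
    """
    within_budget = [docs for docs, latency in measurements if latency <= max_latency_ms]
    if not within_budget:
        raise ValueError("no tested shard size meets the latency budget")
    shard_size = max(within_budget)
    return shard_size, math.ceil(total_docs / shard_size)


# Hypothetical single-shard test results: (documents, query latency in ms).
tests = [(10_000_000, 40), (20_000_000, 85), (30_000_000, 160)]
shard_size, shard_count = estimate_shards(tests, max_latency_ms=100, total_docs=90_000_000)
print(shard_size, shard_count)
```

The result is only as good as the test queries and documents behind the measurements, which is the point made elsewhere in this thread about ballpark numbers.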

Re: replication

2018-04-11 Thread Erick Erickson
bq: are you simply flagging the fact that we wouldn't direct the queries to A v. B v. C since SolrCloud will make the decisions itself as to which part of the distro gets hit for the operation
Yep. SolrCloud takes care of it all itself. I should also add that there are about a zillion metrics now

Re: Decision on Number of shards and collection

2018-04-11 Thread Erick Erickson
50M is a ballpark number I use as a place to _start_ getting a handle on capacity. It's useful solely to answer the "is it bigger than a breadbox and smaller than a house" question. It's totally meaningless without testing. Say I'm talking to a client and we have no data. Some are scared that

Re: Decision on Number of shards and collection

2018-04-11 Thread Abhi Basu
The BKM I have read so far (trying to find source) says 50 million docs/shard performs well. I have found this in my recent tests as well. But of course it depends on index structure, etc.
On Wed, Apr 11, 2018 at 10:37 AM, Shawn Heisey wrote:
> On 4/11/2018 4:15 AM,

Re: in-place updates

2018-04-11 Thread Emir Arnautović
Hi Hendrik, Documentation clearly states conditions when in-place updates are possible: https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates

Re: Decision on Number of shards and collection

2018-04-11 Thread Shawn Heisey
On 4/11/2018 4:15 AM, neotorand wrote:
> I believe heterogeneous data can be indexed to same collection and i can
> have multiple shards for the index to be partitioned.So whats the need of a
> second collection?. yes when collection size grows i should look for more
> collection.what exactly that

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Walter Underwood
One other note on the JVM options, even though those aren’t the cause of the problem. Don’t run four GC threads when you have four processors. That can use 100% of CPU just doing GC. With four processors, I’d run one thread. wunder Walter Underwood wun...@wunderwood.org

Re: Indexing fails with partially done

2018-04-11 Thread Emir Arnautović
Hi Neo, My DIH knowledge is a bit rusty, but I think that in the best case, depending on your queries, you might be able to use delta update to “resume” indexing, but it is likely that you cannot do that. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Adam Harrison-Fuller
In addition, here is the GC log leading up to the crash.
https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_20180410_1009.zip?dl=0
Thanks!
Adam
On 11 April 2018 at 16:18, Adam Harrison-Fuller wrote:
> Thanks for the advice so far.
>
> The directoryFactory is

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Adam Harrison-Fuller
Thanks for the advice so far. The directoryFactory is set to ${solr.directoryFactory:solr.NRTCachingDirectoryFactory}. The servers' workload is predominantly queries, with updates taking place once a day. It seems the servers are more likely to go down whilst they are indexing but not

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-11 Thread Shawn Heisey
On 4/11/2018 8:29 AM, Christopher Schultz wrote:
>> Unless you run Solr in cloud mode (which means using zookeeper), the
>> server cannot create the core directories itself. When running in
>> standalone mode, the core directory is created by the bin/solr program
>> doing the "create" -- which was

Re: Indexing fails with partially done

2018-04-11 Thread Shawn Heisey
On 4/11/2018 6:46 AM, neotorand wrote:
> with Solrcloud What happens if indexing is partially completed and ensemble
> goes down.What are the ways to Resume.In one of the scenario i am using 3 ZK
> Node in ensemble.Lets say i am indexing 5 million data and i have partially
> indexed the data

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Walter Underwood
For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. Also, I always use a start size the same as the max size, since servers will eventually grow to the max size. So:
-Xmx12G -Xms12G
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
>
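
As a quick check of the equivalence mentioned above, the byte value in -XX:MaxHeapSize is exactly 12 GiB. The sketch below shows the arithmetic and builds the two flags with matching start and max sizes; the helper names are made up for illustration.

```python
def gib_to_bytes(gib):
    """Convert GiB to bytes (1 GiB = 1024**3 bytes)."""
    return gib * 1024 ** 3


def heap_flags(gib):
    """Return JVM flags that set the initial and maximum heap to the same size."""
    return [f"-Xms{gib}G", f"-Xmx{gib}G"]


# 12 GiB expressed in bytes matches the value in the original option.
print(gib_to_bytes(12))
print(" ".join(heap_flags(12)))
```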

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Shawn Heisey
On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote:
> I was wondering if I could get some JVM/GC tuning advice to resolve an
> issue that we are experiencing.
>
> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> render would be greatly appreciated.
>
> Our Solr cloud nodes

RealTimeGetComponent performance

2018-04-11 Thread Blackknight
Hello guys! I want to write an update processor which will use an old doc to compare some fields with a new doc. I saw that the Solr code uses RealTimeGetComponent to get the old doc by index ID. How much will it affect Solr performance? In my case I always add docs with atomic updates

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-11 Thread Christopher Schultz
Shawn,
On 4/10/18 10:16 AM, Shawn Heisey wrote:
> On 4/10/2018 7:32 AM, Christopher Schultz wrote:
>>> What happened is that the new core directory was created as root,
>>> owned by root.
>> Was it? If my server is running as solr, how can it create directories
>> as root?
>
> Unless you run

Re: Indexing fails with partially done

2018-04-11 Thread neotorand
Thanks Emir. In the context of DIH, do we have any resume mechanism? Regards, Neo -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Emir, Thanks a lot for your reply. So when I design a Solr ecosystem, I should start with a rough guess at the number of shards and increase it to improve performance. What is the accepted/ideal response time? There should be a trade-off between response time and the number of shards

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Sujay Bawaskar
Which directory factory is defined in solrconfig.xml? Your JVM heap should be tuned with respect to that. How is Solr being used: more updates and fewer queries, or fewer updates and more queries? What is the OOM error? Is it frequent GC or Error 12? On Wed, Apr 11, 2018 at 6:05 PM, Adam

Re: Decision on Number of shards and collection

2018-04-11 Thread Emir Arnautović
Hi Neo, Shard size determines query latency, so you split your index when queries become too slow. Distributed search comes with some overhead, so oversharding is not the way to go either. There is no hard rule for the best numbers, but here are some thoughts on how to approach this:

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Emir Arnautović
Hi Adam, From Solr’s point of view, you should probably check your caches, mostly filterCache, fieldCache and fieldValueCache. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 11 Apr 2018, at

Re: Indexing fails with partially done

2018-04-11 Thread Emir Arnautović
Hi, First of all, the purpose of having 3 ZK in the ensemble is to minimise the chances of losing quorum. With 3 ZK, you can lose one and still have an operable ZK ensemble. You should monitor it, alert if you lose one ZK, and react to the alert before you lose the second one. If you are wondering
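
The quorum reasoning above follows from ZooKeeper needing a strict majority of ensemble members. A minimal sketch of that arithmetic (the function names are illustrative only):

```python
def quorum_size(ensemble_size):
    """Smallest number of ZK nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many ZK nodes can be lost while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that a 4-node ensemble tolerates no more failures than a 3-node one, which is why odd ensemble sizes are the usual choice.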

Re: replication

2018-04-11 Thread John Blythe
thanks, erick. great info.
> although you can't (yet) direct queries to one or the other. So just making
> them all NRT and forgetting about it is reasonable.
are you simply flagging the fact that we wouldn't direct the queries to A v. B v. C since SolrCloud will make the decisions itself as to

Indexing fails with partially done

2018-04-11 Thread neotorand
With SolrCloud, what happens if indexing is partially completed and the ensemble goes down? What are the ways to resume? In one scenario I am using 3 ZK nodes in the ensemble. Let's say I am indexing 5 million documents, I have partially indexed the data, and the ZK ensemble goes down. What should be the

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Adam Harrison-Fuller
Hey Jesus, Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to them. Cheers! Adam
On 11 April 2018 at 11:22, Jesus Olivan wrote:
> Hi Adam,
>
> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical RAM,
> your JVM can afford more RAM

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Jesus Olivan
Hi Adam, IMHO you could try increasing the heap to 20 GB (with 46 GB of physical RAM, your JVM can afford more RAM without threading penalties caused by a lack of RAM outside the heap). Another good change would be to increase -XX:CMSInitiatingOccupancyFraction from 50 to 75. I think that the CMS collector works better

Re: Default Index config

2018-04-11 Thread mganeshs
Hi Shawn, We found the following link, where it is mentioned that in 6.2.1 it's

Decision on Number of shards and collection

2018-04-11 Thread neotorand
Hi Team, First of all, I take this opportunity to thank you all for creating a beautiful place where people can explore, learn and debate. I have been on my knees for a couple of days trying to decide on this. When I am creating a SolrCloud ecosystem I need to decide on the number of shards and collections.

Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Adam Harrison-Fuller
Hey all, I was wondering if I could get some JVM/GC tuning advice to resolve an issue that we are experiencing. Full disclaimer, I am in no way a JVM/Solr expert so any advice you can render would be greatly appreciated. Our Solr cloud nodes are having issues throwing OOM exceptions under load.

Re: Ignore Field from indexing

2018-04-11 Thread Emir Arnautović
Hi, You have two options when it comes to updating:
1. Send a complete document with the same id, which will replace the existing document.
2. Use atomic updates to send changes, but note that fields need to be stored: