Re: Managed Schemas and Version Control

2018-06-29 Thread Walter Underwood
I wrote a Python program that: 1. Gets a cluster status. 2. Extracts the Zookeeper location from that. 3. Uploads solr.xml and config to Zookeeper (using kazoo library). 4. Sends an async reload command. 5. Polls for success until all the nodes have finished the reload. 6. Optionally rebuilds the
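Steps 4 and 5 above can be sketched roughly as follows. This is not Walter's program, only a minimal illustration of an async reload followed by polling. RELOAD, async, REQUESTSTATUS and requestid are Collections API parameters; the base URL, collection name, request id and the fetch_json helper are assumptions made for the example.

```python
import time
from urllib.parse import urlencode


def collections_url(base_url, **params):
    """Build a Collections API URL; wt=json asks Solr for a JSON response."""
    params["wt"] = "json"
    return f"{base_url}/admin/collections?{urlencode(params)}"


def reload_and_wait(base_url, collection, request_id, fetch_json,
                    interval=1.0, max_polls=60, sleep=time.sleep):
    """Submit an async RELOAD, then poll REQUESTSTATUS until it finishes.

    fetch_json is a caller-supplied (hypothetical) function that GETs a URL
    and returns the decoded JSON body as a dict.
    """
    # "async" is a Python keyword, so it is passed through a dict.
    fetch_json(collections_url(base_url, action="RELOAD",
                               name=collection, **{"async": request_id}))
    status_url = collections_url(base_url, action="REQUESTSTATUS",
                                 requestid=request_id)
    for _ in range(max_polls):
        state = fetch_json(status_url)["status"]["state"]
        if state == "completed":
            return True
        if state == "failed":
            return False
        sleep(interval)  # still submitted or running: wait and ask again
    raise TimeoutError(f"reload {request_id} still running after {max_polls} polls")
```

Passing fetch_json in keeps the polling logic separate from the HTTP client, so the loop can be exercised without a running cluster.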

Re: Querying in Solrcloud

2018-06-29 Thread Walter Underwood
We use an AWS ALB for all of our Solr clusters. One is 40 instances. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2018, at 8:33 PM, Sushant Vengurlekar > wrote: > > What are some of the suggested loadbalancers for solrcloud? Can AWS ELB

Re: Querying in Solrcloud

2018-06-29 Thread Sushant Vengurlekar
What are some of the suggested loadbalancers for solrcloud? Can AWS ELB be used for load balancing? On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson wrote: > In your setup, the load balancer prevents single points of failure. > > Since you're pinging a URL, what happens if that node dies or is

Re: Managed Schemas and Version Control

2018-06-29 Thread Erick Erickson
Adding to Shawn's comments. You've pretty much nailed all the possibilities, it depends on what you're most comfortable with I suppose. The only thing I'd add is that you probably have dev and prod environments and work out the correct schemas on dev then migrate to prod (at least that's what

Re: Querying in Solrcloud

2018-06-29 Thread Sushant Vengurlekar
Thanks for the detailed explanation, Erick. It really helped clear up my understanding. On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson wrote: > In your setup, the load balancer prevents single points of failure. > > Since you're pinging a URL, what happens if that node dies or is turned > off? >

Re: Querying in Solrcloud

2018-06-29 Thread Erick Erickson
In your setup, the load balancer prevents single points of failure. Since you're pinging a URL, what happens if that node dies or is turned off? Your PHP program has no way of knowing what to do, but the load balancer does. Your understanding of Zookeeper's role shows a common misconception.

Re: Querying in Solrcloud

2018-06-29 Thread Sushant Vengurlekar
Thanks for your reply. I have a follow up question. Why is a load balancer needed? Isn't that the job of zookeeper to loadbalance queries across solr nodes? I was under the impression that you send query to zookeeper and it handles the rest and sends the response back. Can you please enlighten

Re: Querying in Solrcloud

2018-06-29 Thread Shalin Shekhar Mangar
You send your queries and updates directly to Solr's collection e.g. http://host:port/solr/. You can use any Solr node for this request. If the node does not have the collection being queried then the request will be forwarded internally to a Solr instance which has that collection. ZooKeeper is
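As a rough illustration of sending a query straight to a Solr node over HTTP, here is a minimal Python sketch (the original poster is using PHP; the idea is the same). The node address, the collection name and the function names are assumptions for the example, and the collection is placed in the path before /select.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(node, collection, query, rows=10):
    """Build a /select URL against one Solr node for the given collection."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{node}/solr/{collection}/select?{params}"


def search(node, collection, query, rows=10):
    """Run the query over HTTP and return the matching documents."""
    with urlopen(select_url(node, collection, query, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```

For example, select_url("http://solr1:8983", "products", "title:solr") builds a request for a hypothetical node solr1; any other node of the cluster could be substituted.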

Querying in Solrcloud

2018-06-29 Thread Sushant Vengurlekar
I have a question regarding querying in SolrCloud. I am working on PHP code to query SolrCloud for search results. Do I send the query to ZooKeeper or to a particular Solr node? How does the querying process work in general? Thank you

Re: Managed Schemas and Version Control

2018-06-29 Thread Shawn Heisey
On 6/29/2018 3:26 PM, Zimmermann, Thomas wrote: > We're transitioning from Solr 4.10 to 7.x and working through our options > around managing our schemas. Currently we manage our schema files in a git > repository, make changes to the xml files, Hopefully you've got the entire config in version

Managed Schemas and Version Control

2018-06-29 Thread Zimmermann, Thomas
Hi, We're transitioning from Solr 4.10 to 7.x and working through our options around managing our schemas. Currently we manage our schema files in a git repository, make changes to the xml files, and then push them out to our zookeeper cluster via the zkcli and the upconfig command like:

Re: Solr 7.4 and Zookeeper 3.4.12

2018-06-29 Thread Walter Underwood
The documentation does not say that Solr uses the zk client 3.4.11. It says, "Solr currently uses Apache ZooKeeper v3.4.11." That is on the page titled "Setting Up an External ZooKeeper Ensemble" in the section "Download Apache ZooKeeper". Maybe that is supposed to mean "The Solr code uses the

Re: Solr 7.4 and Zookeeper 3.4.12

2018-06-29 Thread Zimmermann, Thomas
Thanks Shawn - I misspoke when I said recommendation, should have said "packaged with". I appreciate the feedback and the quick updates to the Jira issue. We'll plan to proceed with 3.4.12 when we go live. -TZ On 6/29/18, 11:38 AM, "Shawn Heisey" wrote: >On 6/28/2018 8:39 PM, Zimmermann,

Re: 7.3 appears to leak

2018-06-29 Thread Erick Erickson
This is truly puzzling then, I'm clueless. It's hard to imagine this is lurking out there and nobody else notices, but you've eliminated the custom code. And this is also very peculiar: * it occurs only in our main text search collection, all other collections are unaffected; * despite what i

Re: Solr - zoo with more than 1000 collections

2018-06-29 Thread Yago Riveiro
Solr doesn’t scale very well with ~2K collections, and yes, the bottleneck is Zookeeper itself. Zookeeper doesn’t perform operations as quickly as expected with folders with a lot of children. In a scenario where you are in a recovery state (a node crash), this limitation will hurt a lot, the

RE: 7.3 appears to leak

2018-06-29 Thread Markus Jelsma
Hello Erick, The custom search handler doesn't interact with SolrIndexSearcher, this is really all it does: public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception { super.handleRequestBody(req, rsp); if (rsp.getToLog().get("hits") instanceof

Re: 7.3 appears to leak

2018-06-29 Thread Erick Erickson
bq. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers. Well, one more to go ;). It's incredibly easy to overlook innocent-seeming calls that increment the underlying reference count of some objects but don't decrement them, usually

Solr - zoo with more than 1000 collections

2018-06-29 Thread Bertrand Mahé
Hi, In order to store timeseries data and perform deletion easily, we create several collections per day and then use aliases. We are using SOLR 7.3 and we have 2 questions: Q1 : In order to quickly access the latest data, would it be possible to load cores in descending chronological

RE: 7.3 appears to leak

2018-06-29 Thread Markus Jelsma
Hello Yonik, I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries, this way i could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes continue on with the full

Re: Graph, GraphML, Gephi and Edge Labels

2018-06-29 Thread Heidi McClure
Ok. Will do. I saw the place in the code, but haven’t managed to get the code to build, yet. > On Jun 29, 2018, at 9:03 AM, Joel Bernstein wrote: > > Hi, > > Currently the nodes expression doesn't have this capability. Feel free to > make a feature request on jira. This sounds like a fairly

Re: Sorting issue while using collection parameter

2018-06-29 Thread Erick Erickson
What _is_ your expectation? You haven't provided any examples of what your input and expectations _are_. You might review: https://wiki.apache.org/solr/UsingMailingLists string types are case-sensitive for instance, so that's one thing that could be happening. You can also specify
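To see what a case-sensitive sort looks like, here is a small Python sketch. It only illustrates the ordering effect with made-up titles; Python compares strings by code point here, which is used as a stand-in for how an untokenized string value sorts, and that analogy is an assumption of the example.

```python
# Made-up titles, mixed case on purpose.
titles = ["apple", "Zebra", "aardvark", "Banana"]

# Code-point order: every uppercase ASCII letter sorts before any lowercase one.
case_sensitive = sorted(titles)

# Case-insensitive order, obtained by comparing lowercased keys.
case_insensitive = sorted(titles, key=str.lower)

print(case_sensitive)    # ['Banana', 'Zebra', 'aardvark', 'apple']
print(case_insensitive)  # ['aardvark', 'apple', 'Banana', 'Zebra']
```

If the results look "wrong" in the first way, sorting on a lowercased copy of the field is one option to consider.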

Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Shawn Heisey
On 6/29/2018 8:47 AM, Arturas Mazeika wrote: Out of curiosity: some cores give infos for both shards (through replication query) and some only for one (if you still be able to see the prev post). I wonder why.. Adding to what Erick said: If SolrCloud has initiated a replication on that core

Re: CursorMarks and 'end of results'

2018-06-29 Thread Erick Erickson
bq. It basically cuts down the search time in half in the usual case for us, so it's an important 'feature'. Wait. You mean that the "extra" call to get back 0 rows doubles your query time? That's surprising, tell us more. How many times does your "usual" use case call using CursorMark? My

Re: Solr 7.4 and Zookeeper 3.4.12

2018-06-29 Thread Shawn Heisey
On 6/28/2018 8:39 PM, Zimmermann, Thomas wrote: I was wondering if there was a reason Solr 7.4 is still recommending ZK 3.4.11 as the major version in the official changelog vs shipping with 3.4.12 despite the known regression in 3.4.11. Are there any known issues with running 7.4 alongside

Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Erick Erickson
Arturas: Please make yourself a promise, "Only use the collections commands" ;) At least for a while. Trying to mix collection-level commands and core-level commands is extremely confusing at the start. Under the covers, the Collections API _uses_ the Core API, but in a very precise manner. Any

Re: Graph, GraphML, Gephi and Edge Labels

2018-06-29 Thread Joel Bernstein
Hi, Currently the nodes expression doesn't have this capability. Feel free to make a feature request on jira. This sounds like a fairly easy feature to add. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jun 27, 2018 at 5:21 PM, Heidi McClure < heidi.mccl...@polarisalpha.com> wrote: >

Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Arturas Mazeika
Hi Shawn et al, Thanks a lot for the clarification. It makes a lot of sense and explains which functionality needs to be used to get the infos :-). Out of curiosity: some cores give infos for both shards (through replication query) and some only for one (if you still be able to see the prev

Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Shawn Heisey
On 6/29/2018 7:53 AM, Arturas Mazeika wrote: but the query reports infos on only one shard: F:\solr_server\solr-7.2.1>curl -s http://localhost:9996/solr/de_wiki_man/replication?command=details | grep "indexPath\|indexSize" "indexSize":"15.04 GB",

/replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Arturas Mazeika
Hi Solr-Team, I am benchmarking solr with the German Wikipedia pages on 4 nodes (Running on ports , 9998, 9997 and 9996), 4 shards, replication factor 2): "F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p -s "F:\solr_server\solr-7.2.1\example\cloud\node1\solr"

Re: Importance of having the lsof utility on our solr server VMs

2018-06-29 Thread THADC
Thanks. I think that's a good point that it helps recognize port conflict at start up. Although that scenario is unlikely in my case, I am going to try to get it installed. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SolrJ Kerberos Client API

2018-06-29 Thread Jason Gerlowski
Hi Tushar, You're right; the docs are a little out of date there. Krb5HttpClientConfigurer underwent some refactoring recently and came out with a different name: Krb5HttpClientBuilder. The ref-guide should update the snippet you were referencing to something more like:

Re: Retrieving json.facet from a search

2018-06-29 Thread Jason Gerlowski
You might also have luck using the "NoOpResponseParser" https://opensourceconnections.com/blog/2015/01/08/using-solr-cloud-for-robustness-but-returning-json-format/ https://lucene.apache.org/solr/7_0_0/solr-solrj/org/apache/solr/client/solrj/impl/NoOpResponseParser.html (Disclaimer: Didn't try

Re: Solr 7 MoreLikeThis boost calculation

2018-06-29 Thread Alessandro Benedetti
Hi Jesse, you are correct, the variable 'bestScore' used in the createQuery(PriorityQueue q) should be "minScore". it is used to normalise the terms score : tq = new BoostQuery(tq, boostFactor * myScore / bestScore); e.g. Queue -> Term1:100 , Term2:50, Term3:20, Term4:10 The minScore will be 10
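The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the formula boostFactor * myScore / bestScore quoted above, not the Lucene code; the boost factor of 1.0 and the function name are assumptions.

```python
def term_boosts(scores, boost_factor, divisor):
    """Apply boost_factor * score / divisor to every term score."""
    return {term: boost_factor * score / divisor
            for term, score in scores.items()}


scores = {"Term1": 100, "Term2": 50, "Term3": 20, "Term4": 10}

# Dividing by the smallest score (10), as described in the example above.
by_min = term_boosts(scores, 1.0, min(scores.values()))
# For comparison only: dividing by the largest score (100).
by_max = term_boosts(scores, 1.0, max(scores.values()))

print(by_min)  # {'Term1': 10.0, 'Term2': 5.0, 'Term3': 2.0, 'Term4': 1.0}
print(by_max)  # {'Term1': 1.0, 'Term2': 0.5, 'Term3': 0.2, 'Term4': 0.1}
```

Either way the ratios between terms are the same; only the absolute size of the boosts changes.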

Re: CursorMarks and 'end of results'

2018-06-29 Thread David Frese
On 22.06.18 at 02:37, Chris Hostetter wrote: : the documentation of 'cursorMarks' recommends to fetch until a query returns : the cursorMark that was passed in to a request. : : But that always requires an additional request at the end, so I wonder if I : can stop already, if a request returns
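For reference, the documented loop (fetch until the returned cursorMark equals the one that was sent) looks roughly like this in Python. The fetch_page callback is a hypothetical stand-in for the actual HTTP request; only the starting value "*" and the stop condition come from the cursorMark documentation.

```python
def fetch_all(fetch_page, rows=100):
    """Walk a result set with cursorMark until the cursor stops advancing.

    fetch_page(cursor_mark, rows) is a caller-supplied (hypothetical)
    function returning (docs, next_cursor_mark) for one request.
    Returns all documents and the number of requests that were made.
    """
    cursor = "*"  # documented starting value for the first request
    docs = []
    requests = 0
    while True:
        page, next_cursor = fetch_page(cursor, rows)
        requests += 1
        docs.extend(page)
        if next_cursor == cursor:  # same mark came back: nothing left
            return docs, requests
        cursor = next_cursor
```

With 5 matching documents and rows=2 this makes 4 requests, the last of which returns no documents. That final request is the "additional request" the question is about.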

Re: Sorting issue while using collection parameter

2018-06-29 Thread Vijay Tiwary
Hello Erick, title is a string field. On Wed, 27 Jun 2018, 9:21 pm Erick Erickson, wrote: > what kind of field is title? text_general or something? Sorting on a > tokenized field is usually something you don't want to do. If a field > has aardvark and zebra, how would it sort? > > There's

Re: Maximum number of SolrCloud collections in limited hardware resource

2018-06-29 Thread Emir Arnautović
Hi, It is probably best if you merge some of your collections (or all) and have a discriminator field that will be used to filter out the tenant’s documents only. In case you go with multiple collections serving multiple tenants, you would have to have logic on top of it to resolve tenant to
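A discriminator field of that kind would typically be applied as a filter query on every request. The sketch below shows one way to build such parameters in Python; the field name tenant_id, the tenant value and the function name are all assumptions, and real code would also need to escape the tenant value.

```python
from urllib.parse import urlencode


def tenant_params(tenant_id, user_query, field="tenant_id"):
    """Build query parameters that restrict results to one tenant via fq."""
    return urlencode({"q": user_query, "fq": f'{field}:"{tenant_id}"'})


print(tenant_params("acme", "laptop"))
# q=laptop&fq=tenant_id%3A%22acme%22
```

Keeping the tenant restriction in fq rather than in q keeps it separate from the user's query text.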