Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Erick Erickson
This has been an occasional problem with clusters with lots of replicas in aggregate. There was a major improvement in how large Overseer queues are handled in SOLR-10619, which was released with Solr 6.6, that you might want to look at. If you can't go to 6.6 (or apply the patch yourself to your

solr jetty based auth and distributed solr requests

2017-08-22 Thread radha krishnan
Hi, I enabled Jetty basic auth for Solr by making changes to jetty.xml and adding a 'realm.properties' file. While basic queries are working, queries involving more than one shard are not working. I went through the code and figured out that in HttpShardHandler, there is no provision to specify a

Re: SOLR Learning to Rank Questions

2017-08-22 Thread Michael Nilsson
Hey Jaoa! To also address your second question, the purpose of the normalizers is to ensure that whatever manipulation you did to your feature values offline at training time (say to minimize floating point precision roundoff) also gets reflected online at query rerank time, since you will be
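To illustrate the point about normalizers, here is a minimal Python sketch of min-max scaling in which the same bounds are applied offline and online. The function name, the bounds, and the sample value are hypothetical; in Solr the normalizer is declared in the model definition, which is not shown here.

```python
def min_max_normalize(value, min_value, max_value):
    """Scale a raw feature value using fixed, pre-agreed bounds."""
    if max_value == min_value:
        raise ValueError("min_value and max_value must differ")
    return (value - min_value) / (max_value - min_value)


# Hypothetical bounds chosen at training time for one feature.
TRAIN_MIN, TRAIN_MAX = 0.0, 50.0

# The same raw value must map to the same normalized value in both phases.
training_value = min_max_normalize(12.5, TRAIN_MIN, TRAIN_MAX)  # offline
rerank_value = min_max_normalize(12.5, TRAIN_MIN, TRAIN_MAX)    # online
```

If the online bounds drifted from the training bounds, the model would see feature values on a different scale than the one it was trained on.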

Re: Per Text Field Similarity Measures for Learning to Rank

2017-08-22 Thread Michael Nilsson
Hi Michael, Using your example, if you have 5 different fields, you could create 5 individual SolrFeatures against those fields. The one tricky thing here is that you want to use different similarity scoring mechanisms against your fields. By default, Solr uses a single Similarity class
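The "one SolrFeature per field" idea can be sketched as follows. This is only an illustration in Python that generates the JSON feature definitions; the field names, feature names, and the `user_query` external feature parameter are hypothetical, and the sketch does not address the per-field similarity question.

```python
import json

# Hypothetical field names; replace with the fields in your own schema.
FIELDS = ["title", "description", "keywords", "author", "body"]


def build_field_features(fields):
    """Build one SolrFeature definition per field."""
    return [
        {
            "name": field + "_match",
            "class": "org.apache.solr.ltr.feature.SolrFeature",
            # ${user_query} is assumed to be supplied at query time as efi.user_query
            "params": {"q": "{!field f=" + field + "}${user_query}"},
        }
        for field in fields
    ]


features = build_field_features(FIELDS)
print(json.dumps(features, indent=2))
```

The printed JSON would then be uploaded to the collection's feature store.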

Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)

2017-08-22 Thread Daniel Ortega
*Main Problems* We are involved in a migration from a Solr Master/Slave infrastructure to a SolrCloud infrastructure. The main problems that we have now are: - Excessive resource consumption: Currently we have 5 instances with 80 processors/768 GB RAM each instance using SSD Hard Disk

Re: QueryParser changes query by itself [solved]

2017-08-22 Thread Steve Rowe
Hi Bernd, > On Aug 22, 2017, at 4:31 AM, Bernd Fehling > wrote: > > But the QueryBuilder only calls "stream.reset()", it never calls > "stream.end()" so that Filters > in the Analyzer chain can't do any cleanup (like my Filter wanted to do). > I moved my

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
It is a known problem: https://cwiki.apache.org/confluence/display/CURATOR/TN4 There are multiple JIRAs around this, like the one I pointed to earlier: https://issues.apache.org/jira/browse/SOLR-10524 There it states: This JIRA is to break out that part of the discussion as it might be an

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
righto, thanks very much for your help clarifying this. I am not alone :) I have been looking at this for a few days now. I am seeing people who have experienced this issue going back to solr version 4.x. I am wondering if it is an underlying issue with the way the q is managed. I would think

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
- stop all solr nodes
- start zk with the new jute.maxbuffer setting
- start a zk client, like zkCli, with the changed jute.maxbuffer setting and check that you can read out the overseer queue
- clear the queue
- restart zk with the normal settings
- slowly start solr
On 22.08.2017 15:27, Jeff

group.truncate for json facets not working?

2017-08-22 Thread Stefan Matheis
Hi all, just a quick sanity check - group.truncate does apply for old-school facets, but it seems to be ignored for json.facets. I’m trying to strip my configuration down to verify that I’m not mistaken. For others that are using json facets together with grouping, did you experience

Re: Few document replication not happen in solr cloud

2017-08-22 Thread Sanjay Lokhande
Some more details on the issue. I have a 5-node SolrCloud setup with a single shard. The Solr version is 5.2.1. server1 (http://146.XXX.com:4001/solr/contracts_shard1_replica4) is the leader. A document with id '43e14a86cbdd422880cac22d9a15d3c0' was not replicated 3 nodes. Log shows

Machine Learning for search

2017-08-22 Thread Joe Obernberger
Hi All - One of the really neat features of solr 6 is the ability to create machine learning models (information gain) and then use those models as a query.  If I want a user to be able to execute a query for the text Hawaii and use a machine learning model related to weather data, how can I

Re: Solr uses lots of shared memory!

2017-08-22 Thread Shawn Heisey
On 8/22/2017 7:24 AM, Markus Jelsma wrote: > I have never seen this before, one of our collections, all nodes eating tons > of shared memory! > > Here's one of the nodes: > 10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java > > RSS is roughly equal to heap size + usual

Few document replication not happen in solr cloud

2017-08-22 Thread Sanjay Lokhande
Hi guys, I have a 5-node SolrCloud setup. The logs indicate that the leader and 2 Solr nodes received the document add request. The logs of the other 2 nodes did not show an entry for adding the document, and these nodes are also missing the document. - How can such an issue be troubleshot? any

Re: Solrcloud setup

2017-08-22 Thread Susheel Kumar
Stop the optimize call and see if that resolves the problem. Also how are you indexing? (point 3 above). Are you using CloudSolrClient or manually sending requests to any node? Thanks, Susheel On Tue, Aug 22, 2017 at 9:27 AM, Shreya Kampli wrote: > Hi, > > I have setup a

Solrcloud setup

2017-08-22 Thread Shreya Kampli
Hi, I have set up a SolrCloud with 1 shard and 3 replicas on Solr 6.6. Every index operation makes an explicit commit so that the documents are available immediately. The indexing happens every 5 minutes for a few hundred documents. My problem is that I see that the replica nodes are frequently

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
I set jute.maxbuffer on the zk hosts. Should this be done for Solr as well? Mine is happening in a severely memory-constrained env as well. Jeff Courtade M: 240.507.6116 On Aug 22, 2017 8:53 AM, "Hendrik Haddorp" wrote: > We have Solr and ZK running in Docker

Solr uses lots of shared memory!

2017-08-22 Thread Markus Jelsma
Hi, I have never seen this before, one of our collections, all nodes eating tons of shared memory! Here's one of the nodes: 10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java RSS is roughly equal to heap size + usual off-heap space + shared memory. Virtual is equal to
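As a small worked example, the memory columns of the `top` line quoted above can be read out in Python. The column order assumes the default `top` layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); the helper name is hypothetical.

```python
TOP_LINE = "10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java"


def memory_columns(top_line):
    """Return VIRT, RES and SHR in GiB from a default-layout top line."""
    fields = top_line.split()
    virt, res, shr = fields[4], fields[5], fields[6]
    # Only values with a 'g' suffix are handled in this sketch.
    return {
        "VIRT": float(virt.rstrip("g")),
        "RES": float(res.rstrip("g")),
        "SHR": float(shr.rstrip("g")),
    }


mem = memory_columns(TOP_LINE)
# Resident memory that is not shared: heap plus the usual off-heap space.
non_shared = round(mem["RES"] - mem["SHR"], 3)
print(mem, non_shared)
```

For this line, roughly 3.1 of the 4.5 GiB resident set is shared memory.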

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
We have Solr and ZK running in Docker containers. There is no more than one Solr/ZK node per host, but a Solr and a ZK node can run on the same host. So Solr and ZK are spread out separately. I have not seen this problem during normal processing, just when we recycle nodes or when we have nodes

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
Thanks very much. I will follow up when we try this. I'm curious: in the env where this is happening to you, are the ZooKeeper servers residing on Solr nodes? Are the Solr nodes underpowered in RAM and/or CPU? Jeff Courtade M: 240.507.6116 On Aug 22, 2017 8:30 AM, "Hendrik Haddorp"

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
I'm always using a small Java program to delete the nodes directly. I assume you can also delete the whole node but that is nothing I have tried myself. On 22.08.2017 14:27, Jeff Courtade wrote: So ... Using the zkCli.sh i have the jute.maxbuffer setup so I can list it now. Can I rmr
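The poster's Java program is not shown in the thread. As a rough illustration of "deleting the nodes directly", here is a Python sketch that removes the children of a queue znode one by one. It assumes a client object exposing `get_children(path)` and `delete(path)`, as the third-party kazoo library's `KazooClient` does, and it assumes the child list can actually be read. The in-memory stand-in and the entry names are hypothetical and exist only so the sketch runs without a ZooKeeper server.

```python
def clear_queue(client, queue_path="/overseer/queue", limit=None):
    """Delete child znodes of queue_path one at a time; return how many were deleted."""
    deleted = 0
    for child in client.get_children(queue_path):
        if limit is not None and deleted >= limit:
            break
        client.delete(queue_path + "/" + child)
        deleted += 1
    return deleted


class InMemoryClient:
    """Hypothetical stand-in for a ZooKeeper client, for demonstration only."""

    def __init__(self, children):
        self.children = {path: list(names) for path, names in children.items()}

    def get_children(self, path):
        return list(self.children.get(path, []))

    def delete(self, path):
        parent, _, name = path.rpartition("/")
        self.children[parent].remove(name)


demo = InMemoryClient({"/overseer/queue": ["qn-0000000001", "qn-0000000002", "qn-0000000003"]})
removed = clear_queue(demo)
print(removed, demo.get_children("/overseer/queue"))

# Against a real ensemble (assumption: kazoo is installed, ZooKeeper is on localhost:2181):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="localhost:2181")
#   zk.start()
#   clear_queue(zk)
#   zk.stop()
```

Deleting entries individually leaves the queue znode itself in place, which avoids the open question in the thread of whether removing the whole node is safe.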

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
So ... Using zkCli.sh, I have jute.maxbuffer set up so I can list it now. Can I rmr /overseer/queue, or do I need to delete individual entries? Will rmr /overseer/queue/* work? Jeff Courtade M: 240.507.6116 On Aug 22, 2017 8:20 AM, "Hendrik Haddorp"

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
When Solr is stopped it did not cause a problem so far. I cleared the queue also a few times while Solr was still running. That also didn't result in a real problem but some replicas might not come up again. In those cases it helps to either restart the node with the replicas that are in state

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
How does the cluster react to the overseer q entries disappearing? Jeff Courtade M: 240.507.6116 On Aug 22, 2017 8:01 AM, "Hendrik Haddorp" wrote: > Hi Jeff, > > we ran into that a few times already. We have lots of collections and when > nodes get started too fast

Re: 700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Hendrik Haddorp
Hi Jeff, we ran into that a few times already. We have lots of collections and when nodes get started too fast the overseer queue grows faster than Solr can process it. At some point Solr tries to redo things like leader votes and adds new tasks to the list, which then gets longer and

700k entries in overseer q cannot addreplica or deletereplica

2017-08-22 Thread Jeff Courtade
Hi, I have an issue with what seems to be a blocked up /overseer/queue. There are 700k+ entries. Solr Cloud 6.x. You cannot addreplica or deletereplica; the commands time out. Full stop and start of Solr and ZooKeeper does not clear it. Is it safe to use the ZooKeeper-supplied zkCli.sh to

Re: Get results in multiple orders (multiple boosts)

2017-08-22 Thread Rick Leir
Luca, Did you say _slower_ mySQL? It is blazing fast, I used it with over 10m records and no appreciable latency. The underlying InnoDB is excellent. Design your schema using mySQLworkbench. Cheers -- Rick On August 22, 2017 2:16:07 AM EDT, Luca Dall'Osto wrote:

Re: QueryParser changes query by itself [solved]

2017-08-22 Thread Bernd Fehling
Finally I solved the problem :-) I don't know if it's a bug or a feature in org.apache.lucene.util.QueryBuilder but I solved it in my Filter code which feels like a dirty hack. The TokenStream API says in the docs:

Re: Get results in multiple orders (multiple boosts)

2017-08-22 Thread Luca Dall'Osto
Hello, thank you for your responses. Ok, therefore I have to archive this problem with no appropriate solution in Solr, and try to do it with a relation-based DB such as mySQL or Postgres. Building the custom sort function could be a valid solution instead of using the slower mySQL or trying Postgres (I

RE: FastVector does not highlight for phrase query when it contains stop word/s

2017-08-22 Thread Jagdish Vasani
Hi Rick, Thanks for the response. I understood that if I do not use the StopFilter factory or do not exclude stop words, then it will solve the problem. But here stop words are excluded and search is working well with a stop word in the phrase query, but the fast vector highlighter is not highlighting. I debugged the