Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-27 Thread Webster Homer
Emir, Using tlog replica types addresses my immediate problem. The secondary issue is that all of our searches show inconsistent results. These are all normal paging use cases. We regularly test our relevancy, and these differences create confusion among the testers. Moreover, we are migrating

Re: Solr crashing StandardWrapperValve

2018-02-27 Thread Erick Erickson
You'd really have to talk to Cloudera for support, the version of Solr shipped with CDH isn't a standard distro. Best, Erick On Tue, Feb 27, 2018 at 8:25 AM, Wael Kader wrote: > Hello, > > SOLR kept crashing today over and over again . > I am running a single node solr

Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Cassandra Targett
There is not enough information here for anyone to answer. You mention a "below message", but there is no message that we can see. If it was in an attachment to the mail, it got stripped by the mail server. If you want a response, please provide in the body of the mail details such as: the error

Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Shawn Heisey
On 2/27/2018 7:08 AM, YELESWARAPU, VENKATA BHAN wrote: While indexing job is running we are seeing the below message for all the objects. Object not fetched because its identifier appears to be already in processing This time, I am going to include you as a CC on the message.  This is not

Configuration of SOLR Cluster

2018-02-27 Thread James Keeney
I'm setting up a Solr cluster in the AWS cloud and I need help with the configuration of ZooKeeper. The cluster has 3 ZK nodes and 3 Solr nodes. There are two behaviors that are of concern: *1 - ZK ensemble not accepting return of node* Currently, when a ZK node in the ensemble goes down the ensemble

Searching for a phrase in proximity to another token in SOLR

2018-02-27 Thread Deyan Yotsov
Hello, Is there a way to achieve something along these lines: "("john smith") josh"~12 Thank you, Deyan

Re: Searching for a phrase in proximity to another token in SOLR

2018-02-27 Thread Erick Erickson
Did you try the ComplexPhraseQueryParser? See: https://lucene.apache.org/solr/guide/6_6/other-parsers.html Best, Erick On Tue, Feb 27, 2018 at 7:23 AM, Deyan Yotsov wrote: > Hello, > > Is there a way to achieve something along these lines: > > "("john smith") josh"~12 > >

New payload handling 7.2

2018-02-27 Thread Markus Jelsma
Hello, Our payload handling has been broken since Lucene/Solr 7.2; we sometimes get 0.0 = AveragePayloadFunction.docScore() for some but not all query clauses. We only have payloads on some terms, to signal to the similarity that it needs to 'punish' the term, e.g. for being an article or adjective. I

Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread YELESWARAPU, VENKATA BHAN
Information Classification: ** Limited Access If any of you experts could help, we would greatly appreciate it. Thank you. From: YELESWARAPU, VENKATA BHAN Sent: Friday, February 23, 2018 8:30 AM To: 'd...@lucene.apache.org' ; 'solr-user@lucene.apache.org'

Solr crashing StandardWrapperValve

2018-02-27 Thread Wael Kader
Hello, SOLR kept crashing today, over and over again. I am running a single-node Solr instance on Cloudera with 140 GB of data. Things were working fine until today. I have a replication server that I am replicating data to, but it wasn't working before and was fixed today, so I thought maybe its

Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
We do quite complex data pulls from a Solr index for subsequent analytics, currently using a home-grown Python API. Queries might include a handful of pseudofields which this API rewrites to an aliased field invoking a Document Transformer in the 'fl' parameter list. For example 'numcites' is

Re:SOLR Similarity Difference

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Rick, I don't think the issue is BM25 vs TFIDF (the old similarity); it seems to be due more to the "matching" logic. You are asking to match: "(Action AND Technical AND Temporaries AND t/a AND CTR AND Corporation)" This (in theory) means that you want to retrieve **only** the documents that

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shawn Heisey
On 2/27/2018 1:36 AM, zahra121 wrote: Suppose I have a node which is a leader in SolrCloud. When I block this leader's SolrCloud and Zookeeper ports by the command "firewall-cmd --remove-port=/tcp --permanent", the leader does not change automatically and this leader status remains active in

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread TG Servers
Ok, thank you. Sounds like a bit more reading into the whole thing. It's just a tool for me so I didn't want to go too deep into it, but sometimes a must is a must. :) Default schema.xml? I just get this managed_schema file when installing. Do you mean that one? On 27 February 2018 11:12:39

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Amin Raeiszadeh
I don't understand your problem clearly, but the Solr admin UI has some bugs. To check the state of your cloud nodes, use the CLUSTERSTATUS command: /admin/collections?action=CLUSTERSTATUS In some cases your command has completed but you can't see it in the admin UI. On Tue, Feb 27, 2018 at 12:49 PM, Shawn Heisey
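For readers who want to script the check described above, here is a minimal sketch in Python. It assumes the Collections API returns the usual `cluster` / `collections` / `shards` / `replicas` nesting in JSON; the base URL, collection name, and node names are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode


def clusterstatus_url(base_url, collection=None):
    """Build a CLUSTERSTATUS URL; base_url is e.g. http://host:8983/solr."""
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection:
        params["collection"] = collection
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def find_leaders(status):
    """Map (collection, shard) to the node name of the replica flagged as leader."""
    leaders = {}
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                # Solr reports the flag as the string "true" on the leader replica.
                if str(replica.get("leader", "false")).lower() == "true":
                    leaders[(coll_name, shard_name)] = replica.get("node_name")
    return leaders


# Hypothetical response fragment, for illustration only.
sample = {
    "cluster": {
        "collections": {
            "demo": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active", "leader": "true",
                                           "node_name": "host1:8983_solr"},
                            "core_node2": {"state": "active",
                                           "node_name": "host2:8983_solr"},
                        }
                    }
                }
            }
        }
    }
}

print(clusterstatus_url("http://localhost:8983/solr", "demo"))
print(find_leaders(sample))
```

Comparing the leader reported here with what the admin UI shows is one way to tell a stale UI from a leader that really has not changed.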

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
Maybe check the example directory, it has lots of languages configured: https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/managed-schema And be sure to check out the manual on the subject: https://lucene.apache.org/solr/guide/7_2/language-analysis.html -Original

Re: Rename solrconfig.xml

2018-02-27 Thread Shawn Heisey
On 2/27/2018 12:59 AM, Zheng Lin Edwin Yeo wrote: Regarding the core.properties, I understand from the Solr guide that we need to define the "config" properties first. However, my core.properties will only be created when I create the collection from the command

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
Hello, Mixing language specific filters in the same analyzer is not going to give predictable or desirable results. Instead, create separate text_en and text_de fieldTypes and fields. See Solr's default schema.xml, it has many examples of various languages. Depending on what query parser you
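As a rough illustration of the separate-fields advice, the sketch below routes a document's text to a per-language field on the client side. It assumes the schema already defines fields named `text_en` and `text_de` as suggested in the message; the helper name and the document IDs are hypothetical.

```python
# Assumed mapping from language code to the Solr field analyzed for that language.
LANGUAGE_FIELDS = {"en": "text_en", "de": "text_de"}


def build_doc(doc_id, text, lang):
    """Return a Solr document dict that stores text in the language-specific field."""
    try:
        field = LANGUAGE_FIELDS[lang]
    except KeyError:
        raise ValueError(f"no field configured for language {lang!r}") from None
    return {"id": doc_id, field: text}


print(build_doc("1", "running shoes", "en"))
print(build_doc("2", "Laufschuhe", "de"))
```

Each field then gets only the stemmer and filters for its own language, which avoids the unpredictable results of mixing them in one analyzer.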

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
Thanks Shawn for the reply. When I try to add a document to Solr I get the "no route to host" exception. This means that SolrCloud is aware of the blocked ports; however, ZooKeeper does not automatically change the leader! -- Sent from:

Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-27 Thread Emir Arnautović
Hi Webster, Since you are returning all hits, returning the last page is almost as heavy for Solr as returning all documents. Maybe you should consider just returning one large page and completely avoid this issue. I agree with you that this should be handled by Solr. ES solved this issue with

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
The leader status is active. My main question is how I can change the leader in SolrCloud. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Changing Leadership in SolrCloud

2018-02-27 Thread zahra121
Suppose I have a node which is a leader in SolrCloud. When I block this leader's SolrCloud and Zookeeper ports by the command "firewall-cmd --remove-port=/tcp --permanent", the leader does not change automatically and this leader status remains active in solr admin UI. Thus, I decided to change

Re: Solr Phrase Count : How to get count of a phrase in a text field solr

2018-02-27 Thread aneeshkappu
Found the solution: put `debug=results` at the end of the Solr URL and it will also give you the phrase frequency. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
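A small sketch of building such a request URL in Python follows. The select URL, field name, and phrase are hypothetical; only the `debug=results` parameter comes from the message above, and the URL is printed rather than requested.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def phrase_debug_url(select_url, field, phrase):
    """Build a select URL for a quoted phrase query with debug=results appended."""
    params = {"q": f'{field}:"{phrase}"', "debug": "results"}
    return f"{select_url}?{urlencode(params)}"


url = phrase_debug_url("http://localhost:8983/solr/demo/select", "text", "john smith")
print(url)
# Decode the query string again to confirm the phrase survived URL encoding.
print(parse_qs(urlparse(url).query))
```

The per-document scoring explanation then comes back in the debug section of the response, which is where the phrase frequency should be reported.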

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Ah, so there are ~560 shards per node and not all nodes are indexing at the same time. Why is that? You can have better throughput if indexing on all nodes. If happy with shard size, you can create new collection with 49 shards every 2h and have everything the same and index on all nodes. Back
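The per-node shard estimates in this thread can be reproduced with simple arithmetic. The sketch below assumes roughly 1,100 online collections (the thread only says "more than a thousand"), with the 25 shards per collection and 49 nodes reported by the original poster.

```python
def shards_per_node(collections, shards_per_collection, copies_per_shard, nodes):
    """Average number of shard cores each node hosts if cores are spread evenly."""
    return collections * shards_per_collection * copies_per_shard / nodes


# 1100 collections is an assumed figure, chosen only to illustrate the estimate.
print(round(shards_per_node(1100, 25, 1, 49)))  # one copy of each shard
print(round(shards_per_node(1100, 25, 2, 49)))  # two copies of each shard
```

With one copy per shard this lands near the ~560 figure; with two copies it lands near the ~1120 mentioned elsewhere in the thread.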

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thanks for your reply again. I think there may be some misunderstanding: we have 49 Solr nodes, each collection has 25 shards, and each shard has only one replica of the data (there is no copy), and I have reduced part of the cache. If you need the metric data, I can look it up and tell you,

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. We originally had 49 shards on 49 nodes, but later found that in this case Solr and ZooKeeper often disconnected; too many nodes in ZooKeeper made Solr unstable, so we reduced it to 25. If performance cannot keep up later, we will need to increase it again. Very slow when solr and zookeeper not found

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi, To get a more complete picture, can you tell us how many shards/replicas you have per collection? Also, what is the index size on disk? Did you check GC? BTW, using a 32GB heap prevents you from using compressed oops, resulting in less memory available than with 31GB. Thanks, Emir -- Monitoring - Log
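The heap-size remark can be turned into a quick sanity check on a JVM options string. This is a sketch only: the 32 GiB cut-off is an approximation of where HotSpot stops using compressed object pointers (the exact boundary depends on the JVM), and the helper names are hypothetical.

```python
import re

# Assumption: compressed oops are only available for heaps below roughly 32 GiB.
COMPRESSED_OOPS_LIMIT_MB = 32 * 1024


def parse_xmx_mb(java_mem):
    """Extract the -Xmx value in megabytes from a JVM options string, or None."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_mem)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


def likely_compressed_oops(heap_mb):
    """True if the heap is small enough that compressed oops are probably in use."""
    return heap_mb < COMPRESSED_OOPS_LIMIT_MB


for opts in ("-Xms32768m -Xmx32768m", "-Xms31g -Xmx31g"):
    heap = parse_xmx_mb(opts)
    print(opts, heap, likely_compressed_oops(heap))
```

On a real JVM, `java -XX:+PrintFlagsFinal -version` reports the actual `UseCompressedOops` value for a given `-Xmx`, which is more reliable than this estimate.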

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
In addition, we found that the rate was normal when the number of collections was kept below 936, and became slower and slower at 984. Therefore, we could only temporarily delete the older collections, but now we need more collections online, and the lack of a good solution has puzzled us for a long

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread TG Servers
Ok thanks! Thomas On 27 February 2018 at 11:36:52 AM, Markus Jelsma wrote: Maybe check the example directory, it has lots of languages configured: https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/managed-schema And be sure to check out

When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
I have encountered a rather serious problem while using Solr. We use Solr 6.0; our daily data volume is about 500 billion documents, we create a collection every hour, there are more than a thousand collections online, and we have 49 Solr nodes. If there are fewer than 800 collections, the speed is

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shalin Shekhar Mangar
When you block communication between Zookeeper and the leader, the ZK client inside Solr will disconnect and its session will expire after the session timeout. At this point a new leader should be elected automatically. The default timeout is 30 seconds. You should be able to see the value in

Re: is it appropriate to use external cache for whole shards

2018-02-27 Thread Emir Arnautović
Hi, Assuming you have some web interface, it is not uncommon to apply caching in web browser/middle layer/Solr. The question is if you can live with stale data or if you have some nice mechanism to invalidate data when needed. Solr does that “blindly” - on every commit that includes opening

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you for the reply. Each collection has 25 shards with one replica, and each Solr node has about 5 TB on disk. GC has been checked and modified as follows: SOLR_JAVA_MEM="-Xms32768m -Xmx32768m " GC_TUNE=" \ -XX:+UseG1GC \ -XX:+PerfDisableSharedMem \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi, It is hard to tell without looking more into your metrics. It seems to me that you are reaching the limits of your cluster. I would double-check whether memory is the issue. If I got it right, you have ~1120 shards per node. It takes some heap just to keep them open. If you have some caches enabled

Re: Rename solrconfig.xml

2018-02-27 Thread Zheng Lin Edwin Yeo
Hi Shawn, Yes, I'm running SolrCloud. Meaning we have to create all the cores in the collection with the default solrconfig.xml first? Then we have to modify the core.properties, and rename the solrconfig.xml. After which, we have to reload the renamed config to ZooKeeper, then reload the

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
This does not show much: only that your heap is around 75% (24-25GB). I was thinking that you should compare metrics (heap/GC as well) when running without issues and when running with issues, and see if something can be concluded. About instability: Do you run ZK on dedicated nodes? Emir --

solr src 6.0 ant error

2018-02-27 Thread 苗海泉
I encountered an error while compiling the Solr 6.0 source. I have installed Ant and Ivy, and then ran "ant eclipse" in the Solr 6 source directory to generate an Eclipse project. The error is as follows: " Buildfile: D:\solr-6.0.0-src\solr-6.0.0\build.xml

Re: Configuration of SOLR Cluster

2018-02-27 Thread Shawn Heisey
On 2/27/2018 10:57 AM, James Keeney wrote: > *1 - ZK ensemble not accepting return of node* > Currently, when a ZK node in the ensemble goes down the ensemble is able to > do what it should do and keeps working. However when I bring the 3rd node > back online the other two nodes reject connection

Re: Configuration of SOLR Cluster

2018-02-27 Thread James Keeney
Shawn - First, it's good to know that this is unusual behavior. That actually helps as it lets me know that I should keep digging. Here are a couple of things that might help. In the configuration I am calling out all three ZK nodes. Here is the configuration of Solr: -DSTOP.KEY=solrrocks

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. I looked at the memory footprint: I set recovery at 75%, and memory occupancy is at about 76%. Also, our ZooKeeper is not on a dedicated server, which may be the cause of the instability. What else do you recommend I check? 2018-02-27 22:37 GMT+08:00 Emir Arnautović

Re: Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
On Tue, Feb 27, 2018 at 5:34 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) < dceccarel...@bloomberg.net> wrote: > I don't think you can define docTransformer in the SolrConfig at the > moment, I agree it would be a cool feature. > > Maybe one possibility could be to use the update request processors

Re: Configuration of SOLR Cluster

2018-02-27 Thread Shawn Heisey
On 2/27/2018 6:42 PM, James Keeney wrote: -DzkHost=:2181,:2181,:2181 This looks correct, except that with AWS, I have no idea whether you need the internal IP addressing or the external IP addressing.  If all of the machines involved (both servers and clients) are able to communicate on the

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shalin Shekhar Mangar
When you say it is active, I presume you mean the "state" as returned by the Cluster Status API or as shown on the UI. But is it still the leader? Are you sure the firewall rules are correct? Do you see disconnected or session expiry exceptions in the leader logs? On Wed, Feb 28, 2018 at 12:21

Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Shawn Heisey
On 2/28/2018 12:06 AM, YELESWARAPU, VENKATA BHAN wrote: Thank you for your reply Shawn. I'm not part of that user list so I never received any emails so far. Could you please subscribe me (vyeleswar...@statestreet.com) or let me know the process? Also I would greatly appreciate if you could

Re:Defining Document Transformers in Solr Configuration

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I don't think you can define docTransformer in the SolrConfig at the moment, I agree it would be a cool feature. Maybe one possibility could be to use the update request processors [1], and precompute the fields at index time; it would be more expensive in disk space and indexing time, but then it

Re: Defining Document Transformers in Solr Configuration

2018-02-27 Thread Mikhail Khludnev
Hello, Simon. You can define a search handler where have numcites:[subquery]=pmid={!terms f=md_c_pmid v=$row.pmid}=10=q or something like that. On Tue, Feb 27, 2018 at 11:20 PM, simon wrote: > We do quite complex data pulls from a Solr index for subsequent analytics, >

Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
Thanks Shalin. Our "zkClientTimeout" is 3, so the leader should have changed by now; however, the previous leader is still active. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SOLR Similarity Difference

2018-02-27 Thread Rick Leir
Rick Did you experiment in the SolrAdmin analysis page? It would possibly tell you whether your chain is doing what you expect. Then you need to consider that boolean logic is not strictly boolean in Solr. There is a Lucidworks blog which explains this nicely; every now and then someone posts