Re: Use TopicStream as percolator

2020-05-01 Thread SOLR4189
Hi everyone, I wrote a Solr update processor that wraps the Luwak library and implements Saved Searches à la the Elasticsearch percolator: https://github.com/SOLR4189/solcolator for anyone who wants to use it. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Weak Leader & Weak Replica VS Strong Leader

2020-03-21 Thread SOLR4189
Hi all, Maybe a slightly tricky question, but I need to ask. Let's say I have infinite RAM and infinite SSDs, but a shortage of CPU (let's say 4 CPUs per shard). So my question is, which is preferable: 1. One leader with 4 CPUs, OR 2. One leader with 2 CPUs and one replica with

Solution for long-running highlighting

2019-08-28 Thread SOLR4189
Hi all. Our team came up with a somewhat tricky solution for queries with long-running highlighting, for example highlighting that takes more than 25 seconds. We created our own component that wraps Solr's highlighting component this way: public void inform(SolrCore core) { . . . .

Re: Distributed IDF in Alias

2019-05-18 Thread SOLR4189
I'm asking because I want to use TRA (Time Routed Aliases). Let's say Solr opens a new collection every month. At the beginning of the month the new collection will be almost empty. So will IDF differ between the new collection and the previous month's collection?

Distributed IDF in Alias

2019-05-17 Thread SOLR4189
Hi all, Can somebody explain this Solr tip to me: /"Any alias (standard or routed) that references multiple collections may complicate relevancy. By default, SolrCloud scores documents on a per
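For context, the per-shard scoring mentioned in that tip is controlled by the statsCache element in solrconfig.xml: LocalStatsCache (the default) scores each shard with its own term statistics, while ExactStatsCache fetches globally consistent statistics at query time. A minimal sketch:

```xml
<!-- solrconfig.xml: with the default LocalStatsCache, a nearly-empty new TRA
     collection computes very different IDF from last month's full collection.
     ExactStatsCache retrieves global term statistics per request instead. -->
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
```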

Re: The easiest way to get array of matched terms

2019-05-06 Thread SOLR4189
Nice feature, but it isn't what I'm looking for.

The easiest way to get array of matched terms

2019-05-06 Thread SOLR4189
Hi all, What is the easiest way to get the array of matched terms per doc? I don't need positions or offsets, matched terms only. I found one way, debug=results, but it requires parsing the result (for example, extracting the term from weight(field_name: term)). Does somebody know another way? Without

Re: Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
All my queries are from production environments, from real customers. I built a query player that replays queries with the same time intervals as in production (all customers' queries, with the intervals between them, are saved in Splunk). So all queries are distinct.

Re: Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
No, I don't load the index into RAM, but I run queries for 8 hours, so the OS must load the necessary files (segments) into RAM during my tests. So in the case where I set 25 GB of RAM, not all files can be loaded into RAM, and I thought I'd see degradation in query times, but I didn't.

Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
Hi all, I have a collection with many shards. Each shard is on a separate Solr node (VM) with a 40 GB index, 4 CPUs, and an SSD. When I run performance tests with 50 GB RAM (10 GB for the JVM and 40 GB for the index) per node and with 25 GB RAM (10 GB for the JVM and 15 GB for the index), I get the same query times

Strange disk size behavior

2019-03-21 Thread SOLR4189
Hi all. We use SOLR-6.5.1, and in our cluster each Solr core is placed on a different virtual machine (one core per node). Each virtual machine has a 104 GB disk. Yesterday we noticed that several Solr cores were using disk space in an abnormal manner. Running the command *"df -h

Re-read from CloudSolrStream

2019-02-18 Thread SOLR4189
Hi all, Let's say I have the following code (from http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html): public class StreamingClient { public static void main(String args[]) throws IOException {

Re: Createsnapshot null pointer exception

2019-02-18 Thread SOLR4189
I think you misunderstood me. 1) I create a collection with X shards, each shard with a hash range (via the CREATE collection command). 2) I add Y new shards to the same collection, each without a hash range; I call them gateways (via the CREATE core command). 3) I add a load balancer over the Y

Re: Createsnapshot null pointer exception

2019-02-12 Thread SOLR4189
OK, I understood my problem. Usually I create a collection with X shards and then add some Y cores. These Y cores I use as gateways or federators (my web application sends queries to a load balancer connected only to the Y cores). When I created the Y cores, I used this command

Createsnapshot null pointer exception

2019-02-11 Thread SOLR4189
Hi all, I use SOLR-6.5.1. When I run this command: *http://my_server_name:8983/solr/admin/collections?action=CREATESNAPSHOT&collection=collection_name&commitName=MYCommit* I get this exception: Collection: collection_name operation: createsnapshot failed: java.lang.NullPointerException at

Curator in SOLR

2019-01-13 Thread SOLR4189
Hi all, I want to use a TimeRoutedAlias collection. But first I have a question: does Solr have something like Elasticsearch's Curator? How can I manage/move old read-only collections to "weaker hardware"?

Backup in SOLR 6.5.1

2018-12-25 Thread SOLR4189
Hi all, I use SOLR-6.5.1 and I want to understand how to work with backups in Solr. I did some checks in SOLR-6.5.1 and I have some problems: 1. If I back up a dynamic collection (while there's constant indexing in the background), I get a NoSuchFileException, but with a static collection (with no
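For reference, the SolrCloud backup in 6.x goes through the Collections API; the names and the location path below are placeholders:

```text
# Back up a whole collection (Solr 6.x Collections API); the location must be
# a path visible to every node, e.g. a shared mount.
http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackup&collection=myCollection&location=/mnt/backups

# Restore into a new collection:
http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackup&collection=restoredCollection&location=/mnt/backups
```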

Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
Which details do you mean? Yesterday we restarted all our Solr services and the index size on serverX decreased from 82 GB to 60 GB, while on serverY the index size didn't change (49 GB).

Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
Hi all, We use SOLR-6.5.1 and we have a very strange issue. In our collection the index size differs greatly from server to server (a 33 GB difference): 1. The index size is 82 GB on serverX and 49 GB on serverY. 2. ServerX shows 82 GB of used space if we run "df -h

Replication error in SOLR-6.5.1

2018-09-25 Thread SOLR4189
Hi all, I use SOLR-6.5.1. A couple of weeks ago I started using the replication feature in cloud mode without overriding the default behavior of ReplicationHandler. After deploying replication to production, I hit these errors almost every day: SolrException: Unable to download completely.

SOLR in Openshift with indexing from Hadoop

2018-07-24 Thread SOLR4189
Hi all, We are trying to use SolrCloud in OpenShift. We manage our Solr with a StatefulSet. All Solr functionality works well except indexing. We index our docs from Hadoop via the SolrJ jar, which tries to index to a specific pod, but OpenShift blocks access to internal pods. In my case, a separate service for

Re: Using LUWAK in SOLR

2018-06-25 Thread SOLR4189
OK, in case somebody needs it, I found a solution: https://github.com/flaxsearch/luwak/issues/173

Using LUWAK in SOLR

2018-06-22 Thread SOLR4189
Does somebody use Luwak for percolator functionality in an UpdateProcessor? I noticed that when I passed my docs in batches (3000 docs per batch) through the Monitor, I didn't get all matching pairs. When I passed my docs one doc per batch, I got all results. What can it be? Does Luwak have a batch

Indexing to replica instead of leader

2018-06-08 Thread SOLR4189
I'm using SOLR 6.5.1 in cloud mode with replicas. I read here : /When a document is sent to a Solr node for indexing, the system first determines which Shard that document belongs to, and then which node is

Re: Different docs order in different replicas of the same shard

2018-06-08 Thread SOLR4189
I think I found a very simple solution: set my updateProcessorsChain to default="true". It solves all my problems without moving all post-update processors to be pre-update processors. What do you think about it?
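For anyone following along, the idea is a solrconfig.xml sketch like the one below (chain and processor names are placeholders). A chain marked default="true" is applied to every update request that does not name an update.chain explicitly; note that processors placed before DistributedUpdateProcessorFactory run only on the node that first receives the update, while processors placed after it run on every replica:

```xml
<updateRequestProcessorChain name="myChain" default="true">
  <processor class="com.example.MyPreProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```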

Re: Different docs order in different replicas of the same shard

2018-06-01 Thread SOLR4189
I thought about the following solution to my problem: Atomic Update + Replicas. I can set my *UpdateProcessorsChain* in the following order: .. . MergerUpdateProcessor will use the getUpdatedDocument function of DistributedUpdateProcessor

Re: Different docs order in different replicas of the same shard

2018-05-25 Thread SOLR4189
You are right, BUT I have two indexers (one in a WCF service and one in Hadoop), and both of my indexers use atomic updates for each document. According to the Atomic Update Processor Factory and according to your solution

Different docs order in different replicas of the same shard

2018-05-25 Thread SOLR4189
I use SOLR-6.5.1 and I want to start using replicas. For that I want to understand something: 1) Can asynchronous forwarding of documents from the leader to all replicas, or some other reason, cause replica A to see update X then Y, while replica B sees update Y then X? If yes, then a particular

Atomic update with condition

2018-04-11 Thread SOLR4189
Hi all, How can I change a field value on a specific condition during indexing? Indexed doc in Solr: { id:1, foo:A }. Doc being indexed into Solr: { id:1, foo:B }. foo is a single-valued field. Let's say I want to replace the value of foo from A to B if A > B, else do nothing. Thank you.
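For context, stock atomic updates only support unconditional operations such as set, add, and inc; the JSON below is the unconditional form. A compare-before-set like "replace A with B only if A > B" would need a custom update processor that loads the current value (e.g. via real-time get) and decides whether to apply the set:

```json
[ { "id": "1", "foo": { "set": "B" } } ]
```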

Re: Decision on Number of shards and collection

2018-04-11 Thread SOLR4189
I advise you to read the book Solr in Action. To answer your question you need to take into account the server resources you have (CPU, RAM, and disk), the index size, and the average size of a single doc.

Use TopicStream as percolator

2018-04-08 Thread SOLR4189
Hi all, I need to implement percolator functionality in Solr (i.e., get all indexed docs that match a monitored query). How can I do this? I found Solr's TopicStream class. If I understand correctly, using TopicStream with DaemonStream will give me percolator functionality, won't it?
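As a sketch of that idea (collection names and the query are placeholders), a topic() stream wrapped in daemon() re-runs the monitored query on an interval and emits only documents it has not seen since the last checkpoint:

```text
daemon(id="percolatorDaemon", runInterval="1000",
  topic(checkpointCollection, monitoredCollection,
        id="topic1", q="fieldX:value", fl="id"))
```

One caveat: topic() matches newly indexed documents against one stored query, whereas a percolator in the Luwak/Elasticsearch sense matches one document against many stored queries, so one daemon per monitored query would be needed.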

Re: Replicas: sending query to leader and replica simultaneously

2018-03-04 Thread SOLR4189
Today I found something interesting that exists in Elasticsearch. It's called Adaptive Replica Selection: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search.html Did you hear about it? Does something similar exist in Solr? I think it would be very useful for my case.

Re: Replicas: sending query to leader and replica simultaneously

2018-02-14 Thread SOLR4189
Thank you, Emir, for your answer. *But it will not send the request to multiple replicas - that would be a waste of resources.* What if a server is overloaded but still responsive? Then it would not be a waste of resources, because the second replica would respond faster than the overloaded one. *and flag

Replicas: sending query to leader and replica simultaneously

2018-02-13 Thread SOLR4189
Hi all, I use SOLR-6.5.1 and I want to start using replicas in SolrCloud mode. I read the ref guide and Solr in Action, and I want to make sure of only one thing about replicas: Solr can't send a query both to the leader and to a replica simultaneously and return the fastest response of the two? (in the case

Re: Using replicas in SOLR-6.5.1

2018-01-27 Thread SOLR4189
1. You are right; due to memory and garbage collection issues I put each shard on a different VM. So my VM has 50 GB RAM (10 GB for the JVM and 40 GB for the index), and it works well for my use case. Maybe I don't understand Solr terms, but if you say to set one VM for 20 shards, what does that mean? 20

Using replicas in SOLR-6.5.1

2018-01-27 Thread SOLR4189
I use SOLR-6.5.1. I would like to use SolrCloud replicas, and I have some questions: 1) What is the best architecture if my collection contains 20 shards and each shard is on a different VM? 40 VMs, where 20 are for leaders and 20 for replicas? Or stay with 20 VMs, where the leader and

Using TimeAllowed parameter in SOLR-6.5.1

2018-01-14 Thread SOLR4189
I started using the timeAllowed parameter in SOLR-6.5.1 and got too many (every second) exceptions, null:java.lang.NullPointerException at org.apache.lucene.search.TimeLimitingCollector.needScores(TimeLimitingCollector.java:166), which caused performance problems. To reproduce the exception you need

SOLR 6.5.1: timeAllowed parameter with grouping

2017-12-21 Thread SOLR4189
A month ago we upgraded our Solr from 4.10.1 to 6.5.1. Now we want to use the timeAllowed parameter, which was fixed in Solr 5. We checked this parameter on test servers, and we don't understand whether it works with group=true or not. * If we set group=false and timeAllowed=1 and a query with too many

Re: Some problems in SOLR-6.5.1

2017-10-25 Thread SOLR4189
Of course I did. I made all the changes in solrconfig.xml and used IndexUpgrader from 4 to 5 and then from 5 to 6.

Some problems in SOLR-6.5.1

2017-10-24 Thread SOLR4189
Two days ago we upgraded our Solr servers from version 4.10.1 to 6.5.1. We explored the logs and saw too many errors like: 1) org.apache.solr.common.SolrException; null:java.lang.NullPointerException at

Number of threads in SOLR grew up without blacklight

2017-09-01 Thread SOLR4189
I'm not sure if this forum is a good place for my question, but I want to try; maybe somebody can help me. I have a web application based on Blacklight for working with Solr (I also use a Ruby gem, rsolr, for the Solr connection). My task is to remove Blacklight from my application. In the last two

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-13 Thread SOLR4189
> If you are changing things like WordDelimiterFilterFactory to the graph > version, you'll definitely want to reindex What does "*want to reindex*" mean? If I change WordDelimiterFilterFactory to the graph version and use IndexUpgrader, is that a mistake? Or will the changes simply not take effect?

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-11 Thread SOLR4189
Yes, only because I'm seeing different results. For example, can changing *WordDelimiterFilterFactory* to *WordDelimiterGraphFilterFactory* change the order of docs? ( http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html

Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-04 Thread SOLR4189
Hey all, I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment. When I checked it in the test environment, I noticed that the order of returned docs for each query is different. The score has changed as well. I use the same similarity algorithm, Okapi BM25, as in the previous version.

Using ASCIIFoldingFilterFactory

2017-07-03 Thread SOLR4189
Hey all, I need to convert alphabetic, numeric, and symbolic Unicode characters to their ASCII equivalents. solr.ASCIIFoldingFilterFactory is the solution for my request. I'm wondering whether my usage of the filter is correct and whether anyone has encountered problems using the specified filter (I'm
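A typical field type using the filter looks like the sketch below (the field type name is a placeholder); applying the same analyzer at index and query time keeps folded and unfolded input matching:

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- preserveOriginal="true" would additionally keep the unfolded token -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
  </analyzer>
</fieldType>
```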

Re: Solr 6: how to get SortedSetDocValues from index by field name

2017-06-20 Thread SOLR4189
Hi, Tomas. It helped. Thank you.

Solr 6: how to get SortedSetDocValues from index by field name

2017-06-13 Thread SOLR4189
How do I get SortedSetDocValues from the index by field name? I tried it and it works for me, but I didn't understand why to use leaves.get(0). What does it mean? (I saw such usage in TestUninvertedReader.java of SOLR-6.5.1): *Map mapping = new HashMap<>();
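On the leaves.get(0) question: the leaves are the index's segments, so leaves.get(0) reads doc values from the first segment only, which is safe in a test that builds a single-segment index but wrong in general. A hedged Lucene 6.x sketch of the two usual options:

```java
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.index.SortedSetDocValues;

class SortedSetDocValuesSketch {
  // Option 1: walk every segment; doc IDs here are segment-local.
  static void perSegment(IndexReader reader, String field) throws IOException {
    for (LeafReaderContext leaf : reader.leaves()) {
      SortedSetDocValues dv = DocValues.getSortedSet(leaf.reader(), field);
      // ... iterate dv for this segment ...
    }
  }

  // Option 2: one index-wide view; merges ordinals across segments (slower).
  static SortedSetDocValues wholeIndex(IndexReader reader, String field)
      throws IOException {
    return MultiDocValues.getSortedSetValues(reader, field);
  }
}
```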

Re: Different DateTime format in dataimport and index

2017-06-06 Thread SOLR4189
I don't use a DB. I do a dataimport from one Solr collection to another collection with the same configuration.

Different DateTime format in dataimport and index

2017-06-06 Thread SOLR4189
Let's say I have a SolrDoc: *{id: test1, price: 100, name: pizza, pickupTime: 2017-06-06T19:00:00}*, where the type of id is int, price is float, name is string, and pickupTime is tdate/date. And let's say I have my own update processor that logs the indexed item. So, my

DateUtil in SOLR-6

2017-06-01 Thread SOLR4189
In SOLR-4.10.1 I use DateUtil.parse in my UpdateProcessor for different datetime formats. When indexing a document the datetime format is *yyyy-MM-dd'T'HH:mm:ss'Z'*, and when reindexing a document the format is *EEE MMM d hh:mm:ss z*. And it works fine. But what can I do in SOLR-6? I don't
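One stdlib replacement for DateUtil.parse is java.time: try ISO-8601 first and fall back to a pattern. This is a sketch under the assumption that the legacy format also carries a year (an Instant cannot be built without one); the exact pattern DateUtil accepted may differ:

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class FlexibleDateParser {
    // Fallback modeled on "EEE MMM d hh:mm:ss z"; the trailing year is an
    // assumption, not something stated in the original post.
    private static final DateTimeFormatter LEGACY =
        DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss zzz yyyy", Locale.ENGLISH);

    public static Instant parse(String value) {
        try {
            // ISO-8601 input, e.g. "2017-06-06T19:00:00Z"
            return Instant.parse(value);
        } catch (DateTimeParseException e) {
            // Legacy input, e.g. "Tue Jun 6 19:00:00 UTC 2017"
            return ZonedDateTime.parse(value, LEGACY).toInstant();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2017-06-06T19:00:00Z"));
        System.out.println(parse("Tue Jun 6 19:00:00 UTC 2017"));
    }
}
```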

Re: maxwarmingSearchers and memory leak

2017-03-05 Thread SOLR4189
1) We've actually got 60 to 80 GB of index on the machine (in the image below you can see that the index size on the machine is 82 GB, because the whole index is under /opt/solr). 2) Our commits run: autoSoftCommit every 15 minutes, and

Re: maxwarmingSearchers and memory leak

2017-02-26 Thread SOLR4189
Shawn, you are right. * OS vendor and version: CentOS 6.5 * Java vendor and version: OpenJDK 1.8.0_20, 64-bit Server VM (build 25.20-b23) * Servlet container used to start Solr: Catalina (Tomcat 7) * Total amount of memory in the server: 30 GB * Max heap size for Solr:

maxwarmingSearchers and memory leak

2017-02-23 Thread SOLR4189
We have maxWarmingSearchers set to 2 and the field value cache set to an initial size of 64. By taking a heap dump we saw that our caches consume 70% of the heap; looking into the dump we saw that fieldValueCache has 6 occurrences of org.apache.solr.util.ConcurrentLRUCache. When we have
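The relevant knobs live in the query section of solrconfig.xml; a sketch with illustrative sizes. Each newly opened searcher gets its own cache instances, so with maxWarmingSearchers=2 up to two warming searchers' cache sets can coexist with the live searcher's, which is one way multiple cache copies end up on the heap:

```xml
<maxWarmingSearchers>2</maxWarmingSearchers>
<fieldValueCache class="solr.FastLRUCache" size="64"
                 autowarmCount="0" showItems="10"/>
```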

Re: Upgrade SOLR version - facets performance regression

2017-02-13 Thread SOLR4189
I finished writing the FacetConverter, but I have a question: how do I configure the facet.threads parameter in the JSON Facet API? I didn't find the right syntax on the Confluence page.

Re: Upgrade SOLR version - facets performance regression

2017-02-13 Thread SOLR4189
I finished writing the FacetConverter, but I have some questions: 1) How do I configure the facet.threads parameter in the JSON Facet API? 2) How do I add facet.pivot to a query? For example, I need *q=*:*&facet=true&facet.pivot=A,B* and I tried to write something like this:

Re: json facet api and facet.threads

2017-02-11 Thread SOLR4189
Did you get an answer? I'm interested too.

Re: Upgrade SOLR version - facets performance regression

2017-02-01 Thread SOLR4189
And still I have a question: is there a converter from the legacy API to the new API? Or a search component that converts from the legacy API to the JSON Facet API? I explained why I need it in my first post. Thank you.

Re: Upgrade SOLR version - facets performance regression

2017-02-01 Thread SOLR4189
Alessandro, it helped! Thank you. But I asked which changes we need to make in the configuration, and I think these things should be documented in the reference guide. About your question: first of all, I don't override default components. Second, I add my own components, and for many reasons (for example,

Re: Upgrade SOLR version - facets performance regression

2017-02-01 Thread SOLR4189
I noticed that if I don't list the components in the request handler it works fine, but if I add something like the query facet component, facets don't work... How can you explain that?

Re: Upgrade SOLR version - facets performance regression

2017-01-31 Thread SOLR4189
Tom, I already tried this syntax (and many other syntax variants). It still doesn't work. Are you sure there's no need to change the facet name in the request handler to something else?

Re: Upgrade SOLR version - facets performance regression

2017-01-30 Thread SOLR4189
But I can't get the JSON Facet API to run. I checked on SOLR-5.4.1. If I write: localhost:9001/solr/Test1_shard1_replica1/myHandler?q=*:*&rows=5&fl=*&wt=json&facet=true&facet.field=someField it works fine. But if I write: localhost:9001/solr/Test1_shard1_replica1/myHandler?q=*:*&rows=5&fl=*&wt=json&json.facet={field:someField} it doesn't work. Are you sure
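For comparison, a JSON Facet API request can also be sent as a POST body (field and facet names below are placeholders); on older 5.x releases the shorthand json.facet={myFacet:{terms:{field:someField}}} on the URL is the equivalent form:

```json
{
  "query": "*:*",
  "limit": 5,
  "facet": {
    "myFacet": { "type": "terms", "field": "someField", "limit": 11, "mincount": 1 }
  }
}
```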

Re: Upgrade SOLR version - facets performance regression

2017-01-30 Thread SOLR4189
After failing with SOLR-5.4.1, we also checked SOLR-5.5.2. Most of our facet fields are multivalued. I see that the JSON Facet API is an experimental API, and I can't find how to use it (I'm not talking about syntax; I need to know how to use it from the point of view of configuration and jars). I

Re: Upgrade SOLR version - facets performance regression

2017-01-30 Thread SOLR4189
After failing with SOLR-5.4.1, we checked SOLR-5.5.2 also.

Re: Upgrade SOLR version - facets performance regression

2017-01-29 Thread SOLR4189
Method uif: we used it also, but it didn't help. Cardinality: high. Field types: string, tdate. DocValues: yes, for all facet fields. Facet method: fc (but tried fcs and enum). Facet params: 1. mincount=1 2. limit=11 3. threads=-1 4. Query (on a tdate field for each query). My question: if

Upgrade SOLR version - facets performance regression

2017-01-20 Thread SOLR4189
A few months ago we upgraded our production Solr from 4.10.1 to 5.4.1, and we immediately noticed performance regressions. After searching the internet we found issue SOLR-8096. So we had to downgrade to version 4.10.1. We want to be on the

Re: Solr custom document routing

2016-12-05 Thread SOLR4189
First of all, yes, you are right, we're trying to optimize querying, but not "just" that. In our company we've reached the limit of resources we can give our servers (CPU and RAM). Back to our example: fieldX=true marks all the documents indexed in the last week (like "news"); it

Solr custom document routing

2016-12-03 Thread SOLR4189
Let's say I have a collection with 4 shards. I need shard1 to contain all documents with fieldX=true and shards 2-4 to contain all documents with fieldX=false. I need this to work during indexing and during querying. How can I do it in Solr?
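One hedged option is the implicit router, which sends each document to the shard named by a routing field; the indexer (or an update processor) would set that field from fieldX. Names below are placeholders, and the URL is wrapped for readability:

```text
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCollection
    &router.name=implicit
    &shards=shardTrue,shardFalse1,shardFalse2,shardFalse3
    &router.field=routeField
```

Each document then carries routeField set to shardTrue or one of the shardFalseN names; spreading the fieldX=false documents across the three shards is up to the indexer (e.g. a hash of the id modulo 3). At query time, the shards parameter restricts a query to the relevant shards.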