Re: Use TopicStream as percolator

2020-05-01 Thread SOLR4189
Hi everyone,

I wrote a SOLR update processor that wraps the Luwak library and implements
Saved Searches a la the ElasticSearch Percolator.

https://github.com/SOLR4189/solcolator

Feel free to use it.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Weak Leader & Weak Replica VS Strong Leader

2020-03-21 Thread SOLR4189
Hi all,

Maybe a slightly tricky question, but I need to ask. Let's say I have
infinite RAM and infinite SSDs, but a shortage of CPU (let's say 4 CPUs
per shard). So, my question is: which is preferable:

1. One leader with 4 CPU

OR

2. One leader with 2 CPU and one replica with 2 CPU

OR

3. One leader with 1 CPU and 3 replicas with 1 CPU each? 

I understand that the options with replicas are preferable due to fault
tolerance, BUT what about PERFORMANCE, theoretically?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solution for long highlighting times

2019-08-28 Thread SOLR4189
Hi all.

In our team we came up with a somewhat tricky solution for queries whose
highlighting takes a long time - for example, more than 25 seconds. We
created a component that wraps SOLR's highlighting component in this
way:

public void inform(SolrCore core) {
    . . . .
    subSearchComponent = core.getSearchComponent("highlight");
    . . . .
}

public void process(ResponseBuilder rb) throws Exception {
    long timeout = 25000;
    ExecutorService exec = null;
    try {
        exec = Executors.newSingleThreadExecutor();
        Future<Exception> future = exec.submit(() -> {
            try {
                subSearchComponent.process(rb);
            } catch (IOException e) {
                return e;
            }
            return null;
        });
        Exception ex = future.get(timeout, TimeUnit.MILLISECONDS);
        if (ex != null) {
            throw ex;
        }
    } catch (TimeoutException toe) {
        . . . .
    } catch (Exception e) {
        throw new IOException(e);
    } finally {
        if (exec != null) {
            exec.shutdownNow();
        }
    }
}

This solution works, but sometimes we see that searchers stay open, and as
a result our RAM usage is quite high (it looks like a memory leak of
SolrIndexSearcher objects). They disappear only after a SOLR service
restart.

What do you think about this solution?
Does a built-in feature for this exist?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Distributed IDF in Alias

2019-05-18 Thread SOLR4189
I'm asking because I want to use TRAs (Time Routed Aliases). Let's say
SOLR opens a new collection every month. At the beginning of the month the
new collection will be almost empty.
So will the IDF differ between the new collection and the previous month's
collection?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Distributed IDF in Alias

2019-05-17 Thread SOLR4189
Hi all,

Can somebody explain this SOLR tip from the documentation:
/"Any alias (standard or routed) that references multiple collections may
complicate relevancy. By default, SolrCloud scores documents on a per shard
basis. With multiple collections in an alias this is always a problem, so if
you have a use case for which BM25 or TF/IDF relevancy is important you will
want to turn on one of the ExactStatsCache implementations"/

But the ExactStatsCache documentation says: /"This implementation uses
global values (across the collection) for document frequency"/.

So what does "across the collection" mean? Does it mean that distributed
IDF works within the same collection (across its shards)? If so, how will
it help in the alias case?
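For reference, turning on one of the stats cache implementations mentioned
in the quoted tip is a one-line addition to solrconfig.xml (a sketch;
verify the class name against your SOLR version's documentation):

```xml
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
```

Note this makes term statistics global across the shards of one collection
per request; whether it also spans all collections of an alias is exactly
the question above.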



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: The easiest way to get array of matched terms

2019-05-06 Thread SOLR4189
Nice feature, but it isn't what I'm looking for.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


The easiest way to get array of matched terms

2019-05-06 Thread SOLR4189
Hi all,

What is the easiest way to get the array of matched terms per doc? I don't
need positions or offsets, only the matched terms. I found one way -
debug=results - but it requires parsing the result (for example,
extracting the term from weight(field_name:term)). Does somebody know
another way, without parsing?
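Until a better way turns up, the debug=results parsing mentioned above can
be done with a small regex over the explain text; a sketch in plain Java
(the weight(field:term ...) fragment format is assumed from the example
above):

```java
import java.util.*;
import java.util.regex.*;

public class MatchedTerms {
    // Matches fragments like "weight(title:solr in 42)" in an explain string
    private static final Pattern WEIGHT = Pattern.compile("weight\\((\\w+):(\\w+)");

    /** Extracts the matched field:term pairs from one doc's explain text. */
    static List<String> extract(String explain) {
        List<String> terms = new ArrayList<>();
        Matcher m = WEIGHT.matcher(explain);
        while (m.find()) {
            terms.add(m.group(1) + ":" + m.group(2));
        }
        return terms;
    }
}
```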



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
All my queries come from production environments, from real customers. I
built a query player that replays queries with the same time intervals as
in PRODUCTION (all customers' queries, with the intervals between them,
are saved in Splunk). So all queries are distinct.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
No, I don't load the index into RAM, but I run queries for 8 hours, so the
OS must load the necessary files (segments) into RAM during my tests. In
the case where I set 25GB of RAM, not all files can be cached, and I
expected to see degradation in query times, but I didn't.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Optimal RAM to index size ratio

2019-04-15 Thread SOLR4189
Hi all,

I have a collection with many shards. Each shard is on a separate SOLR
node (VM) with a 40GB index, 4 CPUs and an SSD.

When I run a performance check with 50GB RAM per node (10GB for the JVM
and 40GB for the index) versus 25GB RAM (10GB for the JVM and 15GB for the
index), I get the same query times (percentile80, percentile90 and
percentile95). I ran a long test - 8 hours of production queries and
updates.

What does this mean? Is it not necessary for the whole index to fit in
RAM? Maybe it is due to the SSD? How can I check it?

Thank you.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Strange disk size behavior

2019-03-21 Thread SOLR4189
Hi all. 
We use SOLR-6.5.1, and in our cluster each SOLR core is placed on a
different virtual machine (one core per node). Each virtual machine has a
104GB disk.
Yesterday we noticed that several SOLR cores were using disk space
abnormally.
Running *"df -h /opt/solr/CollectionName_shardX_replicaY/data/index"*
showed 92GB of disk occupied, but the index size on this machine is 62GB
according to SolrCloud (and also according to the command *"ls -l
/opt/solr/CollectionName_shardX_replicaY/data/index"*). After restarting
the SOLR service, df -h also reports 62GB of occupied disk space.

Does somebody know what this is?
Could it be connected to our deletes? (We run a delete-by-query command
each night to delete expired documents.)




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re-read from CloudSolrStream

2019-02-18 Thread SOLR4189
Hi all,

Let's say I have the following code (based on
http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html):

public class StreamingClient {

    public static void main(String args[]) throws IOException {
        String zkHost = args[0];
        String collection = args[1];

        Map<String, String> props = new HashMap<>();
        props.put("q", "*:*");
        props.put("qt", "/export");
        props.put("sort", "fieldA asc");
        props.put("fl", "fieldA,fieldB,fieldC");

        CloudSolrStream cstream = new CloudSolrStream(zkHost, collection, props);
        try {
            cstream.open();
            while (true) {
                Tuple tuple = cstream.read();
                if (tuple.EOF) {
                    break;
                }

                String fieldA = tuple.getString("fieldA");
                String fieldB = tuple.getString("fieldB");
                String fieldC = tuple.getString("fieldC");
                System.out.println(fieldA + ", " + fieldB + ", " + fieldC);
            }
        } finally {
            cstream.close();
        }
    }
}

What can I do if I get an exception at the line *Tuple tuple =
cstream.read();*? How can I re-read the same tuple, i.e. continue from the
point of the exception?
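In case it helps others: since the stream can't rewind, my current idea is
to remember the last successfully read value of the sort field and, on an
exception, reopen the stream with a filter that skips everything up to
that checkpoint (e.g. an fq range on fieldA). A minimal sketch of the
checkpoint-resume loop in plain Java (the stream is simulated with a
sorted list; a real version would reopen CloudSolrStream with the fq
instead):

```java
import java.util.*;

public class ResumableRead {
    /** Simulated stream: returns sort values greater than the checkpoint,
     *  in sort order. A real implementation would reopen CloudSolrStream
     *  with an fq on the sort field instead. */
    static Iterator<String> openFrom(List<String> sorted, String checkpoint) {
        List<String> rest = new ArrayList<>();
        for (String v : sorted) {
            if (checkpoint == null || v.compareTo(checkpoint) > 0) {
                rest.add(v);
            }
        }
        return rest.iterator();
    }

    /** Reads everything, resuming from the last checkpoint when a read fails. */
    static List<String> readAll(List<String> sorted, Set<String> failOnce) {
        List<String> result = new ArrayList<>();
        String checkpoint = null;              // last successfully read sort value
        while (true) {
            Iterator<String> stream = openFrom(sorted, checkpoint);
            try {
                while (stream.hasNext()) {
                    String tuple = stream.next();
                    if (failOnce.remove(tuple)) {   // simulate a transient read error
                        throw new RuntimeException("read failed at " + tuple);
                    }
                    result.add(tuple);
                    checkpoint = tuple;        // advance only after a successful read
                }
                return result;                 // reached EOF
            } catch (RuntimeException e) {
                // reopen and resume after the checkpoint on the next iteration
            }
        }
    }
}
```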




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Createsnapshot null pointer exception

2019-02-18 Thread SOLR4189
I think you didn't understand what I meant.

1) I create a collection with X shards; each shard has a hash range (via
the CREATE collection command)
2) I add Y new shards to the same collection; each of these has no hash
range. I call them gateways (created via the CREATE core command)
3) I put a load balancer over the Y gateways, so all client queries pass
through the gateways

In this case, my Y gateways only forward queries and merge results
(WITHOUT searching their own index), and my X shards only search their
index (WITHOUT forwarding queries and merging results). This gives me the
best query performance.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Createsnapshot null pointer exception

2019-02-12 Thread SOLR4189
Ok, I understood my problem. Usually I create a collection with X shards
and then add some Y cores. These Y cores I use as gateways or federators
(my web application sends queries to a load balancer connected only to the
Y cores).

When I create the Y cores, I used this command:
*http://:/solr/admin/cores?action=create==*

So I always got Y cores with the same name (the collection name), and
because of that the CREATESNAPSHOT command doesn't work.

The solution is to use something like this if you need to add new cores
after creating the collection:
*http://:/solr/admin/cores?action=create=_==*





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Createsnapshot null pointer exception

2019-02-11 Thread SOLR4189
Hi all,

I use SOLR-6.5.1. When I run this command:

*http://my_server_name:8983/solr/admin/collections?action=CREATESNAPSHOT=collection_name=MYCommit*

I got this exception:
Collection: collection_name operation: createsnapshot failed:
java.lang.NullPointerException
  at
org.apache.solr.cloud.CreateSnapshotCmd.call(CreateSnapshotCmd.java:128)
  at
org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java)
 ...

Does somebody know what the problem is?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Curator in SOLR

2019-01-13 Thread SOLR4189
Hi all,

I want to use a TimeRoutedAlias collection. But first of all I have a
question: does SOLR have something like Curator in ElasticSearch? How can
I manage/move old read-only collections to "weaker hardware"?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Backup in SOLR 6.5.1

2018-12-25 Thread SOLR4189
Hi all,

I use SOLR-6.5.1 and I want to understand how to work with backups in
SOLR. I did some checks in SOLR-6.5.1 and ran into some problems:

1. If I backup a dynamic collection (while there's constant indexing in the
background), I get a NoSuchFileException, but in a static collection (with
no indexing going on) or a dynamic collection in which I call commit right
before the backup action it works fine. Why does it work this way?

2. How can I restore shards to the same nodes they were on before the backup?

3. How can I restore state.json and not clusterstate.json? I know about
MIGRATESTATEFORMAT action but I ask about built-in action in RESTORE.

4. How can I restore specific shards instead of the entire collection?

Thank you.
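For reference, the commit-before-backup workaround from point 1 looks like
this via the Collections API (a sketch; the host, collection name, backup
name and location are illustrative, and location must be a path visible to
the nodes):

```shell
# flush all in-flight updates to segment files first
curl "http://localhost:8983/solr/collection_name/update?commit=true"

# then take the snapshot
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackup&collection=collection_name&location=/backups"
```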



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
Which details do you ask about? Yesterday we restarted all our SOLR
services; the index size on serverX decreased from 82GB to 60GB, while on
serverY the index size didn't change (49GB).



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
Hi all,

We use SOLR-6.5.1 and we have a very strange issue. In our collection the
index size is very different from server to server (a 33GB difference):
1. The index size is 82GB on serverX and 49GB on serverY
2. ServerX shows 82GB of used space if we run "df -h
/opt/solr/Xxx_shardX_replica1/data/index",
while the web admin UI shows 60GB of used space.

What can it be? Why do we have a difference between servers? And between a
server and the web admin UI?

Thank you.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Replication error in SOLR-6.5.1

2018-09-25 Thread SOLR4189
Hi all,

I use SOLR-6.5.1. A couple of weeks ago I started to use the replication
feature in cloud mode, without overriding the default behavior of
ReplicationHandler.

After deploying the replication feature to production, I hit these errors
almost every day:
SolrException: Unable to download  completely. Downloaded x!=y
OR
SolrException: Unable to download  completely. (Downloaded x of y
bytes) No space left on device
OR
Error deleting file: 
NoSuchFileException: /opt/solr//data/index./

I get all these errors when a replica is in recovery mode, sometimes after
a physical machine failure and sometimes after a simple SOLR restart.
Today my only solution is: after the 5th unsuccessful recovery attempt, I
remove the replica and add it anew.

On all my SOLR servers I have 40% free space; hard/soft commit is 5 minutes.


What's wrong here, and what can be done to correct these errors?
Is it due to free space, the commitReserveDuration parameter, or something
else?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SOLR in Openshift with indexing from Hadoop

2018-07-24 Thread SOLR4189
Hi all,

We are trying to use SOLR cloud in OpenShift. We manage our SOLR with a
StatefulSet. All SOLR functionality works well except indexing.
We index our docs from HADOOP via the SolrJ jar, which tries to index to a
specific pod, but OpenShift blocks access to internal pods.

In my case, a separate service for external traffic to SOLR doesn't help,
because the SolrJ jar looks up pod names in ZooKeeper.

Has somebody encountered this problem? What can I do in this case?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Using LUWAK in SOLR

2018-06-25 Thread SOLR4189
Ok. In case somebody needs it, I found a solution:

https://github.com/flaxsearch/luwak/issues/173



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Using LUWAK in SOLR

2018-06-22 Thread SOLR4189
Does somebody use LUWAK for percolator functionality in an
UpdateProcessor? I noticed that when I passed my docs through the Monitor
in batches (3000 docs per batch), I didn't get all matching pairs. When I
passed docs in batches of one doc each, I got all results. What can it be?
Does LUWAK have a batch size limit? I didn't find one...

I'm using ParallelMatcher with SimpleMatcher inside (score doesn't matter
to me), with 3379 queries loaded in the monitor.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Indexing to a replica instead of the leader

2018-06-08 Thread SOLR4189
I'm using SOLR 6.5.1 in cloud mode with replicas. I read in the
documentation:

/When a document is sent to a Solr node for indexing, the system first
determines which Shard that document belongs to, and then which node is
currently hosting the leader for that shard. The document is then forwarded
to the current leader for indexing, and the leader forwards the update to
all of the other replicas./

So my question is: what happens when I send an index request to a replica
server instead of the leader server?

Does the replica become a leader for this request? Or does the replica act
only as a forwarder that resends the request to the leader, after which
the leader resends it to the replicas?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Different docs order in different replicas of the same shard

2018-06-08 Thread SOLR4189
I think I found a very simple solution: set my updateProcessorsChain to
default="true". This solves all my problems without moving all
post-update processors to be pre-update processors. What do you think
about it?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Different docs order in different replicas of the same shard

2018-06-01 Thread SOLR4189
I thought about the following solution to my problem: Atomic Update + Replicas.

I can set my *UpdateProcessorsChain* in the following order:


  ..  .




MergerUpdateProcessor will use the getUpdatedDocument function of
DistributedUpdateProcessor:

/  public void processAdd(AddUpdateCommand cmd) {
    if (!AtomicUpdateDocumentMerger.isAtomicUpdate(cmd)) {
      *super.processAdd(cmd);
      return;*
    }
    *AtomicUpdateDocumentMerger docMerger = new
AtomicUpdateDocumentMerger(cmd.getReq());*
    Set<String> inPlaceUpdatedFields =
AtomicUpdateDocumentMerger.computeInPlaceUpdatableFields(cmd);
    if (inPlaceUpdatedFields.size() > 0) { // non-empty means this is
                                           // suitable for in-place updates
      if (docMerger.doInPlaceUpdateMerge(cmd, inPlaceUpdatedFields)) {
        *super.processAdd(cmd);
        return;*
      } else {
        // in-place update failed, so fall through and re-try the same with
        // a full atomic update
      }
    }
    // full (non-inplace) atomic update
    SolrInputDocument sdoc = cmd.getSolrInputDocument();
    BytesRef id = cmd.getIndexedId();
    SolrInputDocument oldDoc =
RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id);
    if (oldDoc == null) {
      // create a new doc by default if an old one wasn't found
      if (*cmd.getVersion()* <= 0) {
        oldDoc = new SolrInputDocument();
      } else {
        // could just let the optimistic locking throw the error
        throw new SolrException(ErrorCode.CONFLICT, "Document not found for
update.  id=" + cmd.getPrintableId());
      }
    } else {
      oldDoc.remove(CommonParams.VERSION_FIELD);
    }
    cmd.solrDoc = docMerger.merge(sdoc, oldDoc);
    *super.processAdd(cmd);*
  }/

What do you think about my solution (all changes to the source code are
marked in bold)? I checked it in my test environment, and it worked fine.
Am I missing something? Edge cases?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Different docs order in different replicas of the same shard

2018-05-25 Thread SOLR4189
You are right, BUT I have two indexers (one in a WCF service and one in
HADOOP), and in both indexers I'm using atomic updates for each document.
According to the Atomic Update Processor Factory documentation, and
according to your solution (to put all my processors before
DistributedUpdateProcessor), all my processors will run on partial
documents only, but I need them to run on full documents. So what can I do
in this situation?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Different docs order in different replicas of the same shard

2018-05-25 Thread SOLR4189
I use SOLR-6.5.1 and I want to start using replicas.

For it I want to understand something:

1) Can asynchronous forwarding of documents from the leader to the
replicas (or some other reason) cause replica A to see update X then Y,
while replica B sees update Y then X?
If yes, then a particular document in replica A might sort differently
relative to a document in replica B when they have the same score (docs
with equal scores are returned in the order they were stored in the
index). Is this an edge case?

2) What does "Custom update chain post-processors may never be invoked on
a recovering replica" mean, if all my UpdateProcessors are post-processors
(i.e. come after DistributedUpdateProcessor)? Will all buffered update
requests in recovery be indexed on the replica without my features?
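For question (1), one common mitigation (my suggestion, not something
confirmed in this thread) is to add a deterministic tie-breaker on the
uniqueKey field, so that equal-score docs sort identically on every
replica regardless of index order:

```
sort=score desc, id asc
```

This does not stop replicas from storing docs in different orders; it only
makes the returned order independent of that.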



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Atomic update with condition

2018-04-11 Thread SOLR4189
Hi all,

How can I change a field value based on a specific condition during indexing?

Indexed Doc in SOLR: { id:1, foo:A }
Indexing Doc into SOLR: { id:1, foo: B }

foo is single value field.

Let's say I want to replace the value of foo from A to B if A > B, and
otherwise do nothing.

Thank you.
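A direction I'm considering (a sketch, not a confirmed built-in SOLR
capability): a custom update processor placed before
DistributedUpdateProcessor that fetches the old value (e.g. via
RealTimeGetComponent) and keeps or replaces it. The decision logic itself
is tiny; in plain Java (names illustrative, lexicographic comparison
assumed):

```java
public class ConditionalReplace {
    /** Returns the value that should end up in field foo:
     *  replace old value A with new value B only when A > B. */
    static String mergeFoo(String oldValue, String newValue) {
        if (oldValue == null) {
            return newValue;            // no existing doc: just index B
        }
        // A > B (lexicographically): take B, otherwise keep A
        return oldValue.compareTo(newValue) > 0 ? newValue : oldValue;
    }
}
```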




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Decision on Number of shards and collection

2018-04-11 Thread SOLR4189
I advise you to read the book Solr in Action. To answer your question you
need to take into account the server resources you have (CPU, RAM and
disk), the index size, and the average size of a single doc.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Use TopicStream as percolator

2018-04-08 Thread SOLR4189
Hi all,

I need to implement percolator functionality in SOLR (i.e. get all indexed
docs that match a monitored query). How can I do this?

I found SOLR's TopicStream class. If I understand right, using TopicStream
with DaemonStream will give me percolator functionality, won't it? (Like
the "Continuous Pull Streaming" example.)

Is it a good idea to use *Continuous Pull Streaming* in production? How
many queries can I monitor this way? (I need up to 1000 queries, and I
have up to a million indexed docs per day.)

And one more thing: I debugged the DaemonStream/TopicStream code, and I
don't understand the advantage of this over a simple loop in which I send
queries to SOLR every X seconds/minutes/hours. Will it work faster than a
simple loop? If yes, why? Is it due to the filter query on the checkpoint
version (*solrParams.add("fq", "{!frange cost=100 incl=false
l="+checkpoint+"}_version_")*)? I'd be happy to understand all the
advantages of using DaemonStream/TopicStream.

Thank you.
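For completeness, the TopicStream + DaemonStream combination can be
written as a streaming expression; a sketch based on my reading of the
streaming docs (collection names, query, field list and interval are all
illustrative):

```
daemon(id="percolator-1",
       runInterval="60000",
       terminate="false",
       topic(checkpoints,
             collection1,
             q="monitored query",
             fl="id,fieldA",
             id="topic-1"))
```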
 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Replicas: sending query to leader and replica simultaneously

2018-03-04 Thread SOLR4189
Today I found something interesting that exists in ElasticSearch. It's
called Adaptive Replica Selection:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search.html

Have you heard about it? Does something similar exist in SOLR? I think it
would be very useful for my case.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Replicas: sending query to leader and replica simultaneously

2018-02-14 Thread SOLR4189
Thank you, Emir, for your answer.

*But it will not send request to multiple replicas - that would be a waste
of resources.*
What if a server is overloaded but still responsive? Then it would not be
a waste of resources, because the second replica would respond faster than
the overloaded one.

*and flag unresponsive one*
For how long will it be marked unresponsive? If SOLR checks it on every
request, that is also a waste of resources...




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Replicas: sending query to leader and replica simultaneously

2018-02-13 Thread SOLR4189
Hi all,

I use SOLR-6.5.1 and I want to start using replicas in SolrCloud mode. I
read the ref guide and Solr in Action, and I want to make sure of only one
thing about REPLICAS:

SOLR can't send a query to both the leader and a replica simultaneously
and return the fastest response of the two, can it?

(For the case where both leader and replica are active, but one of them is
overloaded and takes a long time to respond.)

Thank you. 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Using replicas in SOLR-6.5.1

2018-01-27 Thread SOLR4189
1. You are right, due to memory and garbage collection issues I set each
shard to different VM. So in my VM I has 50 GB RAM (10 GB for JVM and 40 GB
for index) and it works good for my using case. Maybe I don't understand
solr terms, but if you say to set one VM for 20 shards what does it mean? 20
nodes or 20 JVMs or 20 solr instances on the same virtual server? Can you
explain what did you mean?

2. I speak about like issues: "facet perfomance regression" or "using ltr
with grouping" or "using timeAllowed with grouping". Something that will
stop me to use replicas feature. Sometimes I don't understand solr issues,
for example, if bug is unresolved and affects version 4.10 and fix version
none, what does it mean? This bug can happen in solr-6.5.1 also?

3. Yes, I'm familiar with the Solr Collection API.

I preferred to set each shard to different small VMs. 

Just make sure with you *one solr node = one JVM = one solr instance = one
or many shards?
*

Thank you.




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Using replicas in SOLR-6.5.1

2018-01-27 Thread SOLR4189
I use SOLR-6.5.1 and I would like to use SolrCloud replicas. I have some
questions:

1) What is the best architecture if my collection contains 20 shards and
each shard is on a different VM? 40 VMs, where 20 are for leaders and 20
for replicas? Or stay with 20 VMs, with a leader and a replica (of another
leader) on the same VM, but add RAM?

2) What open issues about replicas in SOLR-6.5.1 do I need to check?

3) If I use SolrCloud replicas, which configuration parameters should I
change? Which can I change?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Using TimeAllowed parameter in SOLR-6.5.1

2018-01-14 Thread SOLR4189
I started to use the timeAllowed parameter in SOLR-6.5.1 and got too many
(every second) exceptions:
null:java.lang.NullPointerException
	at
org.apache.lucene.search.TimeLimitingCollector.needScores(TimeLimitingCollector.java:166)
which caused performance problems.

To reproduce the exception, the query needs group=true==0.

Can somebody explain this strange behavior?
Is it related to SOLR-6156?
How can I solve it?

I've noticed at least two things that don't work with grouping in
SOLR-6.5.1 (ltr and timeAllowed). Do you know of other such features?

Thank you.




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SOLR 6.5.1: timeAllowed parameter with grouping

2017-12-21 Thread SOLR4189
A month ago we upgraded our SOLR from 4.10.1 to 6.5.1. Now we want to use
the timeAllowed parameter, which was fixed in Solr 5. We checked this
parameter on test servers, and we can't tell whether it works with
group=true or not.

* If we set group=false and timeAllowed=1 with a query with too many
terms: sometimes it stops the query, returns partial results and logs a
timeout; sometimes it does nothing (ignores timeAllowed)

* If we set group=true and timeAllowed=1 with a query with too many terms:
sometimes it logs a timeout without stopping the query and without
returning partial results; sometimes it does nothing (ignores timeAllowed)

1) Can somebody explain this behavior?
2) Does the timeAllowed parameter work with grouping in SOLR-6.5.1? (We
saw that issue SOLR-6156 was last updated 22/Apr/15.)
3) If a query is stopped, are the partial results saved in the cache?




 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Some problems in SOLR-6.5.1

2017-10-25 Thread SOLR4189
Of course I did. I made all the changes in solrconfig.xml and used
IndexUpgrader from 4 to 5 and then from 5 to 6.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Some problems in SOLR-6.5.1

2017-10-24 Thread SOLR4189
Two days ago we upgraded our SOLR servers from version 4.10.1 to 6.5.1. We
explored the logs and saw too many errors like:

1)
org.apache.solr.common.SolrException; null:java.lang.NullPointerException
  at
org.apache.solr.search.grouping.distributed.responseprocessor.StoredFieldsShardResponseProcessor.process(StoredFieldsShardResponseProcessor.java:41)
  at
org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:771)
 . . .

We don't know which queries cause this.

2) The second error, or rather something strange, that we saw in the logs:
sometimes the SOLR service restarts automatically without any error.

Can somebody help us? Does someone have problems like ours?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Number of threads in SOLR grew without Blacklight

2017-09-01 Thread SOLR4189
I'm not sure if this forum is a good place for my question, but I want to
try. Maybe somebody can help me.

I have a web application based on Blacklight for working with SOLR (I also
use a Ruby gem, rsolr, for the SOLR connection). My task is to remove
Blacklight from my application. In the last two weeks I tried twice to
change my application to work without Blacklight and with rsolr only, but
each time the number of threads in SOLR grew and it worked very slowly. I
checked the thread dump (85% of all threads were BLOCKED facetExecutor
threads).

Does somebody know why this happens?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-13 Thread SOLR4189
> If you are changing things like WordDelimiterFilterFactory to the graph 
> version, you'll definitely want to reindex

What does "*want to reindex*" mean? If I change
WordDelimiterFilterFactory to the graph version and use IndexUpgrader, is
that a mistake? Or will the changes simply not take effect?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-order-of-docs-between-SOLR-4-10-4-to-SOLR-6-5-1-tp4349021p4350413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-11 Thread SOLR4189
Yes, only because I'm seeing different results.

For example, can changing *WordDelimiterFilterFactory* to
*WordDelimiterGraphFilterFactory* change the order of docs? (See the
deprecated list:
http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html)

For building the index I tried 2 ways: 1) dataimport from SOLR-4 to
SOLR-6, and 2) the IndexUpgrader tool.
With both ways the order of docs is different.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-order-of-docs-between-SOLR-4-10-4-to-SOLR-6-5-1-tp4349021p4350172.html
Sent from the Solr - User mailing list archive at Nabble.com.


Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

2017-08-04 Thread SOLR4189
Hey all,
I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production
environment.
When I checked it in the test environment, I noticed that the order of
returned docs for each query is different. The scores have changed as
well. I use the same similarity algorithm - Okapi BM25 - as in the
previous version. The number of shards and the number of docs per shard
also haven't changed.

Is this normal?
What might be the causes of such behavior?

Regards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-order-of-docs-between-SOLR-4-10-4-to-SOLR-6-5-1-tp4349021.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using ASCIIFoldingFilterFactory

2017-07-03 Thread SOLR4189
Hey all,
I need to convert alphabetic, numeric and symbolic Unicode characters to their
ASCII equivalents. solr.ASCIIFoldingFilterFactory seems to be the solution for
my requirement. I'm wondering whether my usage of the filter is correct and
whether anyone has encountered problems using it (I'm using Solr-4.10.3).
The attached image (in the original post) shows my usage of the filter. Thanks
in advance!
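For reference, a minimal sketch of the usual way such an analysis chain is declared in schema.xml (the type and field names here are made up for illustration, not taken from the original image); ASCIIFoldingFilterFactory normally goes after tokenization and lowercasing:

```xml
<!-- Hypothetical field type: folds accented/Unicode characters to their
     ASCII equivalents so that e.g. "café" matches "cafe". A single
     <analyzer> element applies at both index and query time. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_folded" type="text_folded" indexed="true" stored="true"/>
```

Keeping the filter in a single shared analyzer ensures indexed and queried terms are folded the same way.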





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-ASCIIFoldingFilterFactory-tp4343999.html


Re: Solr 6: how to get SortedSetDocValues from index by field name

2017-06-20 Thread SOLR4189
Hi, Tomas. It helped. Thank you.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-how-to-get-SortedSetDocValues-from-index-by-field-name-tp4340388p4342002.html


Solr 6: how to get SortedSetDocValues from index by field name

2017-06-13 Thread SOLR4189
How do I get SortedSetDocValues from index by field name?

I tried the following and it works for me, but I didn't understand why
leaves.get(0) is used. What does it mean? (I saw this usage in
TestUninvertedReader.java of SOLR-6.5.1):

Map<String, UninvertingReader.Type> mapping = new HashMap<>();
mapping.put(fieldName, UninvertingReader.Type.SORTED);

SolrIndexSearcher searcher = req.getSearcher();

DirectoryReader dReader = searcher.getIndexReader();
LeafReader reader = null;

if (!dReader.leaves().isEmpty()) {
  reader = dReader.leaves().get(0).reader();
}

SortedSetDocValues sourceIndex = reader.getSortedSetDocValues(fieldName);

Or maybe I need to use SlowAtomicReader, like this:

UninvertingReader reader = new
UninvertingReader(searcher.getSlowAtomicReader(), mapping);

What is the right way to get SortedSetDocValues, and why?
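As an aside, leaves().get(0) only reads the first segment of the index; an index normally has many segments, each with its own doc values. A sketch of the two usual alternatives (Lucene/Solr 6.x API; this is a fragment, not a standalone program, and it assumes the searcher and fieldName from the snippet above):

```
// Option 1: walk every leaf (segment) and read its per-segment doc values.
for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
    SortedSetDocValues dv = ctx.reader().getSortedSetDocValues(fieldName);
    if (dv != null) {
        // doc IDs here are segment-relative; add ctx.docBase for global IDs
    }
}

// Option 2: a merged, index-wide view over all segments (slower, but one object).
SortedSetDocValues merged =
    MultiDocValues.getSortedSetValues(searcher.getIndexReader(), fieldName);
```

Iterating the leaves is the idiomatic per-segment approach; the merged view trades speed for convenience.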



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-how-to-get-SortedSetDocValues-from-index-by-field-name-tp4340388.html


Re: Different DateTime format in dataimport and index

2017-06-06 Thread SOLR4189
I don't use a DB. I do dataimport from one SOLR collection to another
collection with the same configuration.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-DateTime-format-in-dataimport-and-index-tp4339230p4339244.html


Different DateTime format in dataimport and index

2017-06-06 Thread SOLR4189
Let's say I have SolrDoc: 
*{id: test1, price: 100, name: pizza, pickupTime: 2017-06-06T19:00:00}*,
where the type of id is int, price is float, name is string, and pickupTime is
tdate/date. And let's say I have my own update processor that logs each
indexed item.

So my question is: why, when indexing an item, do I see in the log:
*{id: test1, price: 100, name: pizza, pickupTime: 2017-06-06T19:00:00}*
while during reindex or dataimport I see in the log:
*{id: test1, price: 100.0, name: pizza, pickupTime: Tue Jun 6 19:00:00 IDT
2017}*

Why do float and date have a different format in index and dataimport? Is it a
SOLR bug?
How can I change the dataimport format to the index format?
Which other types have different formats, like float and date?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-DateTime-format-in-dataimport-and-index-tp4339230.html


DateUtil in SOLR-6

2017-06-01 Thread SOLR4189
In SOLR-4.10.1 I use DateUtil.parse in my UpdateProcessor for different
datetime formats: when indexing a document the datetime format is
*yyyy-MM-dd'T'HH:mm:ss'Z'*, and when reindexing a document it is
*EEE MMM d hh:mm:ss z yyyy*. And it works fine.

But what can I do in SOLR-6? I don't understand this issue. How will *using
new Date(Instant.parse(d).toEpochMilli()) for parsing and
DateTimeFormatter.ISO_INSTANT.format(d.toInstant()) for formatting* help if I
want the same behavior?
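For comparison, a self-contained java.time sketch of the replacement suggested in that issue; the legacy pattern and the sample timestamps are illustrative (zone abbreviations such as IDT are ambiguous and may fail to parse, so UTC is used here):

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateRoundTrip {
    public static void main(String[] args) {
        // ISO-8601 instant, the format Solr 6 accepts natively.
        Instant iso = Instant.parse("2017-06-06T19:00:00Z");
        System.out.println(DateTimeFormatter.ISO_INSTANT.format(iso));

        // Legacy java.util.Date.toString() style, normalized back to ISO.
        DateTimeFormatter legacy =
            DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss zzz yyyy", Locale.ENGLISH);
        Instant fromLegacy =
            ZonedDateTime.parse("Tue Jun 6 19:00:00 UTC 2017", legacy).toInstant();
        System.out.println(DateTimeFormatter.ISO_INSTANT.format(fromLegacy));
    }
}
```

Both lines print the same ISO instant, which is the normalization DateUtil used to do.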



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DateUtil-in-SOLR-6-tp4338503.html


Re: maxwarmingSearchers and memory leak

2017-03-05 Thread SOLR4189
1) We've actually got 60 to 80 GB of index on the machine (in the screenshot
in the original post the index size is 82 GB, because the whole index is under
/opt/solr).

2) Our commit settings: autoSoftCommit every 15 minutes and autoHardCommit
every 30 minutes; our commits take only 10 seconds.

3) The ConcurrentLFUCaches (which you saw in the image in the previous
message) aren't filterCaches; they are fieldValueCaches.

4) Solr top: (screenshot attached in the original post)

5) We don't know if this is related to the problem, but all our SOLR servers
are virtual machines.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937p4323509.html


Re: maxwarmingSearchers and memory leak

2017-02-26 Thread SOLR4189
Shawn, you are right.
* OS vendor and version 
CentOS 6.5

* Java vendor and version
OpenJDK version 1.8.0_20
OpenJDK 64-bit Server VM (build 25.20-b23)

* Servlet container used to start Solr. 
Catalina(tomcat7)

* Total amount of memory in the server. 
30 GB

* Max heap size for Solr. 
8 GB (JVM heap)

* An idea of exactly what is running on the server. 
Only the Solr service and a Splunk forwarder run on our servers.

* Total index size and document count being handled by Solr (add up all 
indexes). 
60 GB and 2.6 million docs on one shard

* A screen shot of a process list sorted by memory usage: (attached in the
original post)

* A screenshot showing total system memory allocations: (attached in the
original post)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937p4322362.html


maxwarmingSearchers and memory leak

2017-02-23 Thread SOLR4189
We have maxWarmingSearchers set to 2 and fieldValueCache set to an initial
size of 64. By taking a heap dump we saw that our caches consume 70% of the
heap; looking into the dump, we saw that fieldValueCache has 6 occurrences of
org.apache.solr.util.ConcurrentLFUCache.
With maxWarmingSearchers=2 we would expect to have only 3 (maybe 4 before GC
has been launched).
What can it be? We use Solr 4.10.1.
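For context, fieldValueCache holds the UnInvertedField structures used by faceting, and each live searcher carries its own instance (so warming searchers temporarily add more). A sketch of how it is typically declared in solrconfig.xml; the sizes here are illustrative, not a recommendation:

```xml
<!-- Per-searcher cache; one instance per open searcher, including warming ones. -->
<fieldValueCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="64"
                 autowarmCount="0"
                 showItems="32"/>
```

Extra instances beyond (maxWarmingSearchers + 1) usually point at searchers that have not yet been garbage-collected.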



--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937.html


Re: Upgrade SOLR version - facets perfomance regression

2017-02-13 Thread SOLR4189
I finished writing the FacetConverter, but I have a question:
how do I configure the facet.threads parameter in the JSON Facet API?

I didn't find the right syntax in the Confluence page.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4320104.html


Re: Upgrade SOLR version - facets perfomance regression

2017-02-13 Thread SOLR4189
I finished writing the FacetConverter, but I have some questions:
  1) How do I configure the facet.threads parameter in the JSON Facet API?
  2) How do I add facet.pivot to a query? For example, I need
  *q=*:*&facet=true&facet.pivot=A,B*
and I tried to write something like this:
 *q=*:*&json.facet={ A_B : {type:terms, field:A,
facet:{type:terms, field:B} } }*
but I get an error: wrong aggr_B field.

I didn't find the right syntax in the Confluence page.
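For what it's worth, in the JSON Facet API every facet, including a sub-facet, must be a *named* entry in the facet map, which would explain the error above. A hedged sketch of the facet.pivot=A,B equivalent (field names A and B as in the example):

```
json.facet={
  A_B: {
    type: terms,
    field: A,
    facet: {
      B: { type: terms, field: B }
    }
  }
}
```

The inner facet gets its own name ("B" here); an anonymous {type:terms, ...} directly under facet is not a valid spec.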



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4320103.html


Re: json facet api and facet.threads

2017-02-11 Thread SOLR4189
Did you get an answer? I'm interested as well.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/json-facet-api-and-facet-threads-tp4306444p4319929.html


Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
And still I have a question:
Is there some converter from the legacy facet API to the new API?
Or a search component that converts legacy facet requests to the JSON Facet
API?

I explained why I need it in my first post.

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4318241.html


Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
Alessandro, it helped! Thank you.
But I asked which changes do we do in configuration and I think these things
must be documented in the reference guide.
About your question, first of all I don't override default componets. Second
of all, I add my own components and for many reasons (For example, I checked
permissions before each query with my own component).




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4318240.html


Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
I noticed that if I don't list the components in the request handler it works
fine, but if I add something like

<arr name="components">
  <str>query</str>
  <str>facet</str>
</arr>

facets don't work...
How can you explain it?
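A likely explanation (the usual pitfall, though not confirmed in this thread) is that declaring a components list replaces the whole default chain rather than appending to it, so every default component that is still needed has to be listed explicitly — a sketch:

```xml
<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>
```

Alternatively, first-components / last-components can be used to prepend or append custom components while keeping the defaults.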



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4318187.html


Re: Upgrade SOLR version - facets perfomance regression

2017-01-31 Thread SOLR4189
Tom, I already tried this syntax (and many other variants). It still doesn't
work. Are you sure there is no need to change the facet name in the request
handler to something else?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4318174.html


Re: Upgrade SOLR version - facets perfomance regression

2017-01-30 Thread SOLR4189
But I can't run the JSON Facet API. I checked on SOLR-5.4.1.
If I write:
localhost:9001/solr/Test1_shard1_replica1/myHandler?q=*:*&rows=5&fl=*&wt=json&facet=true&facet.field=someField
it works fine. But if I write:
localhost:9001/solr/Test1_shard1_replica1/myHandler?q=*:*&rows=5&fl=*&wt=json&json.facet={field:someField}
it doesn't work.
Are you sure that it is built in? If it is built in, why can't I find an
explanation about it in the reference guide?
Thank you for your help.
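Incidentally, the json.facet value itself must map a facet *name* to a facet spec; a bare {field:someField} is not a valid spec, which may be part of the problem. A hedged sketch of a minimal request (handler, port and field name as in the message above; the facet name "categories" is made up):

```
localhost:9001/solr/Test1_shard1_replica1/myHandler?q=*:*&rows=0&wt=json
    &json.facet={categories:{type:terms, field:someField}}
```

The response then contains a "facets" section keyed by that name.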



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4317931.html


Re: Upgrade SOLR version - facets perfomance regression

2017-01-30 Thread SOLR4189

After failing with SOLR-5.4.1, we also checked SOLR-5.5.2.
Most of our facet fields are multivalued.
I see that the JSON Facet API is experimental and I can't find how to use it
(I am not speaking about syntax; I need to know how to use it from the point
of view of configuration and jars).
I came to the conclusion that in order to upgrade SOLR I need to either give
up facets or stay on SOLR-4.10.1. Is that right?



alessandro.benedetti wrote
> Hi,
> Reading in here : https://issues.apache.org/jira/browse/SOLR-8466
> It seems that uif has been introduced in Solr 5.5 ( you were using 5.4.1 ,
> were't you?)
> Furthermore, I would recommend to check if your field is/isn't
> multi-valued
> ( that could affect as well)
> It is weird you don't get any benefit from docValues though...
> 
> Cheers

Quoted from: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4317781.html




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4317806.html


Re: Upgrade SOLR version - facets perfomance regression

2017-01-30 Thread SOLR4189
After failing with SOLR-5.4.1, we also checked SOLR-5.5.2.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4317804.html


Re: Upgrade SOLR version - facets perfomance regression

2017-01-29 Thread SOLR4189
Method uif: we used it too, but it didn't help.
Cardinality: high
Field types: string, tdate
DocValues: yes, for all facet fields
Facet method: fc (but we also tried fcs and enum)
Facet params:
  1. Mincount = 1
  2. Limit = 11
  3. Threads = -1
  4. Query (on a tdate field for each query)

My question: is the JSON Facet API good enough, and does some converter from
the old facet API to the new facet API exist?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027p4317716.html


Upgrade SOLR version - facets perfomance regression

2017-01-20 Thread SOLR4189
A few months ago we upgraded our SOLR in production from 4.10.1 to 5.4.1, and
we immediately noticed performance regressions. After searching the internet
we found the SOLR-8096 issue, so we had to downgrade SOLR back to 4.10.1.
We want to be on the latest Solr version (5 or 6), but:
  1. Facet performance is very important for us.
  2. We can't use the new Facet API, because our client application uses
blacklight-4.5.0 (with no option to upgrade).

What can we do in this situation? Is it true that the new Facet API restores
the performance of SOLR-4.10.1? Does some converter from the new API to the
old API exist?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-SOLR-version-facets-perfomance-regression-tp4315027.html


Re: Solr custom document routing

2016-12-05 Thread SOLR4189
First of all, yes, you are right, we're trying to optimize querying, but not
"just" that. In our company we have reached the limit of resources we can give
our servers (CPU and RAM). Returning to our example: fieldX=true matches all
documents indexed in the last week (like "news"; it may be
first_indexed_time:[NOW/DAY-7DAY TO *]), and fieldX=false matches all
documents first inserted into the system before the last 7 days (it may be
first_indexed_time:[* TO NOW/DAY-7DAY]). We also thought about two collections
(one for "news" and one for "old" items), but we have a tf/idf problem between
the two collections (the "news" collection is very small relative to the "old"
collection), since we are using Solr 4 and there is no distributed IDF.

Second of all, we have already measured the performance. We did a naive
experiment: we created two collections, a small one (all the new documents)
and a big one (the other documents), and created an alias that unites the two.
We saw that this architecture improved performance by 30% (query time and
throughput) compared to the case where we used only one collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-document-routing-tp4308432p4308481.html


Solr custom document routing

2016-12-03 Thread SOLR4189
Let's say I have a collection with 4 shards. I need shard1 to contain all
documents with fieldX=true and shards 2-4 to contain all documents with
fieldX=false. I need this to work both while indexing and while querying. How
can I do it in SOLR?
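One common approach (a sketch only; it does not by itself decide the fieldX split, and the collection/shard names are illustrative) is the implicit router, where the indexing client picks the shard explicitly via the _route_ parameter and queries can be restricted to named shards:

```
# Create the collection with explicitly named shards
/admin/collections?action=CREATE&name=mycoll&router.name=implicit
    &shards=shard1,shard2,shard3,shard4

# Index: route fieldX=true docs to shard1
/solr/mycoll/update?_route_=shard1

# Query: hit only the relevant shards
/solr/mycoll/select?q=...&_route_=shard1
/solr/mycoll/select?q=...&shards=shard2,shard3,shard4
```

The client (or an update processor) then has to compute the target shard from fieldX for every document.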



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-document-routing-tp4308432.html