Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Shawn Heisey

On 2/28/2018 12:06 AM, YELESWARAPU, VENKATA BHAN wrote:

Thank you for your reply, Shawn. I'm not part of that user list, so I haven't 
received any emails so far.
Could you please subscribe me (vyeleswar...@statestreet.com) or let me know the 
process?
I would also greatly appreciate it if you could forward any responses received 
for this issue.


You can subscribe yourself.  That's not something I can do for you.

http://lucene.apache.org/solr/community.html#mailing-lists-irc


To answer your question, we see these messages in the Solr log file. The Solr search 
option is visible in the UI, but when we search for text, it says "No results found".
The index files are not getting generated/created. We have the index job 
scheduled to run every minute, and the Solr log file is filled with the message below:
"Object not fetched because its identifier appears to be already in processing".


Can you place that logfile (ideally the whole thing) somewhere and 
provide a URL for accessing it?  There are many paste websites and many 
file sharing sites that you can use to do this.  With the actual 
logfile, hopefully the problem can be found.


If I do a google search for that error message, the only thing that 
comes up is messages from you.  It doesn't appear to be something that 
people have encountered before.



These are the Solr & lucene versions.
 solr-spec 4.3.1
 solr-impl 4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33
 lucene-spec 4.3.1
 lucene-impl 4.3.1 1491148 - shalinmangar - 2013-06-09 12:07:58


If we determine that there *is* a bug, it will need to be demonstrated 
in the current version (7.2.1) before it can be fixed.  There will be no 
more 4.x releases.  As you can see, the version you're running is nearly 
five years old.



Solr master and slave configuration is working fine and I'm able to access the 
urls.
All we are trying to do is make the search function work in the UI. Please let me 
know if you need any more details.


What happens if you leave the query as *:* and execute it? This is 
special syntax that matches all documents.
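For example, a minimal sketch (the core name "mycore" and the host/port are
assumptions; adjust to your install):

curl "http://localhost:8983/solr/mycore/select?q=*:*&rows=10&wt=json"

If that returns numFound=0, nothing has been indexed at all; if it returns
documents, the problem is in how the UI builds its queries.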


Thanks,
Shawn



Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shalin Shekhar Mangar
When you say it is active, I presume you mean the "state" as returned by
the Cluster Status API or as shown on the UI. But is it still the leader?
Are you sure the firewall rules are correct? Do you see disconnected or
session expiry exceptions in the leader logs?
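For reference, something like this (a sketch; the collection name is an
assumption) shows both the published replica states and the current leader
for each shard:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection&wt=json"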

On Wed, Feb 28, 2018 at 12:21 PM, Zahra Aminolroaya  wrote:

> Thanks Shalin. Our "zkClientTimeout" is 3, so the leader should have
> changed by now; however, the previous leader is still active.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
Thanks Shalin. Our "zkClientTimeout" is 3, so the leader should have
changed by now; however, the previous leader is still active.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


solr src 6.0 ant error

2018-02-27 Thread 苗海泉
I ran into a problem compiling the Solr 6.0 source. I have Ant and Ivy
installed, but when I run "ant eclipse" in the Solr 6 source directory to
generate an Eclipse project, it fails as follows:
"
Buildfile: D:\solr-6.0.0-src\solr-6.0.0\build.xml

BUILD FAILED
D:\solr-6.0.0-src\solr-6.0.0\build.xml:21: The following error
occurred while executing this line:
D:\solr-6.0.0-src\solr-6.0.0\lucene\common-build.xml:570:
java.lang.NullPointerException
at java.util.Arrays.stream(Arrays.java:5004)
at java.util.stream.Stream.of(Stream.java:1000)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)

"
The same happens with my other ant commands. Line 570 is:
""
This variable is defined earlier as:
"
  
"


The lucene.tgz.unpacked directory does not exist on my system. What am I
doing wrong?
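For reference, the documented first step for a fresh Lucene/Solr source
checkout of that era is to bootstrap Ivy before running other targets.
Whether it cures this particular NullPointerException is a guess, not an
established fix:

ant ivy-bootstrap
ant eclipse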


Re: Defining Document Transformers in Solr Configuration

2018-02-27 Thread Mikhail Khludnev
Hello, Simon.

You can define a search handler whose defaults include 
numcites:[subquery]=pmid={!terms
f=md_c_pmid 
v=$row.pmid}=10=q
or something like that.
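A minimal sketch of what that could look like via the Config API (Solr 6.x+).
The handler name and the subquery parameters are assumptions reconstructed
from the [subquery] transformer docs, since the original parameters were
mangled in transit:

curl http://localhost:8983/solr/mycollection/config \
  -H 'Content-type:application/json' -d '{
  "add-requesthandler": {
    "name": "/citesearch",
    "class": "solr.SearchHandler",
    "defaults": {
      "fl": "*,numcites:[subquery]",
      "numcites.q": "{!terms f=md_c_pmid v=$row.pmid}",
      "numcites.rows": 10
    }
  }
}'

Clients would then query /citesearch and get the numcites pseudofield without
spelling out the transformer parameters themselves.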

On Tue, Feb 27, 2018 at 11:20 PM, simon  wrote:

> We do quite complex data pulls from a Solr index for subsequent analytics,
> currently using a home-grown Python API. Queries might include  a handful
> of pseudofields which this API rewrites to an aliased field invoking a
> Document Transformer in the 'fl' parameter list.
>
> For example 'numcites' is transformed to
>
> 'fl= ,numcites:[subquery]=pmid={!terms
> f=md_c_pmid v=$row.pmid}=10=q',...'
>
> What I'd ideally like to be able to do would be have this transformation
> defined in Solr configuration so that it's not tied  to one particular
> external API -  defining a macro, if you will, so that you could supply
> 'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.
>
> Is there some way to do this that I've overlooked ? if not, I think it
> would be a useful new feature.
>
>
> -Simon
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Configuration of SOLR Cluster

2018-02-27 Thread Shawn Heisey

On 2/27/2018 6:42 PM, James Keeney wrote:

-DzkHost=:2181,:2181,:2181


This looks correct, except that with AWS, I have no idea whether you 
need the internal IP addressing or the external IP addressing.  If all 
of the machines involved (both servers and clients) are able to 
communicate on the internal addresses, then that should be fine.  You 
might want to discuss the IP addressing with Amazon just to make sure.



java.net.ConnectException: Connection refused


All of the logs you included look like they have this message -- 
connection refused.  Normally this happens when the software isn't 
running -- the OS refuses connections when no software is listening on a 
TCP port.  Sometimes firewalls can refuse connections, but more commonly 
they just drop the traffic silently, and the system starting the 
connection has to wait for a timeout and never gets any kind of 
response.  In this case, there IS a response -- the connection is refused.


It looks like you've pasted parts of the log, but I was actually hoping 
for entire logfiles, or at least entire sections of logfiles, to see 
errors in context with non-errors, and to be sure that nothing is lost, 
and that the formatting isn't destroyed by inclusion in an email 
message.  A paste website or a file sharing website is often the best 
way to share that kind of information.  If you need to redact 
information from the files, please do so in a way that preserves the 
ability to decipher the log.  For IP addresses, you could just redact 
the first two octets and leave the last two -- although if they are 
private addresses, you could leave them intact.
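A minimal sketch of that kind of redaction (GNU sed; the file names are
placeholders), masking the first two octets of any IPv4 address:

sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.([0-9]{1,3}\.[0-9]{1,3})/xx.xx.\1/g' solr.log > solr-redacted.log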


My instinct here is to think there's either a fundamental networking 
issue (firewalls, other problems), or that there may be some kind of 
problem with ZK.  What version of ZK are you using on the servers, and 
what version of Solr is it?


My instincts could be wrong because of a limited understanding of how ZK 
functions.


My recommendation would be to run ZK version 3.4.11 on your servers.  
Each new release of ZK has a very impressive list of fixed bugs.  The 
client ZK version will depend on the Solr version, since the ZK jar is 
part of Solr.


I looked at your ZK server config.  Your initLimit value is ten times 
what the default config for the embedded ZK in Solr is. Based on the 
comment in the embedded ZK config, that's probably not a problem, but I 
can't say for sure without more ZK knowledge.  The other parts of the 
config seem normal enough.


Are you configuring the "myid" file in each ZK server's data directory, 
and does the value on each server correspond to the line in the ZK 
config for that server?  I assume you probably have this correct, 
because ZK probably wouldn't work at all if it wasn't right.
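For reference, a sketch of the convention (the dataDir below is the one from
your zoo.cfg; the number must match that host's server.N line):

echo 2 > /var/opt/zookeeper/data/myid   # on the host listed as server.2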


I really don't know what might be going on.  Maybe with more complete 
logs I might spot something, but I don't know.


Thanks,
Shawn



Re: Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
On Tue, Feb 27, 2018 at 5:34 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarel...@bloomberg.net> wrote:

> I don't think you can define docTrasformer in the SolrConfig at the
> moment, I agree it would be a cool feature.
>
> Maybe one possibility could be to use the update request processors [1],
> and precompute the fields at index time, it would be more expensive in disk
> and index time,  but then it would simplify the fl logic and also
> performance at query time.
>

Wouldn't work for my use case - we are taking a field which contains the
IDs of documents which are cited  by the current document and 'inverting'
this to compute the number of documents which cite the current document in
the subquery. Anything precomputed could change as the index is updated.

>
> Cheers,
> Diego
>
> [1] https://lucene.apache.org/solr/guide/6_6/update-request-
> processors.html
>
> From: solr-user@lucene.apache.org At: 02/27/18 20:21:08 To:
> solr-user@lucene.apache.org
> Subject: Defining Document Transformers in Solr Configuration
>
> We do quite complex data pulls from a Solr index for subsequent analytics,
> currently using a home-grown Python API. Queries might include  a handful
> of pseudofields which this API rewrites to an aliased field invoking a
> Document Transformer in the 'fl' parameter list.
>
> For example 'numcites' is transformed to
>
> 'fl= ,numcites:[subquery]=pmid={!terms
> f=md_c_pmid v=$row.pmid}=10=q',...'
>
> What I'd ideally like to be able to do would be have this transformation
> defined in Solr configuration so that it's not tied  to one particular
> external API -  defining a macro, if you will, so that you could supply
> 'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.
>
> Is there some way to do this that I've overlooked ? if not, I think it
> would be a useful new feature.
>
>
> -Simon
>
>
>


Re: Configuration of SOLR Cluster

2018-02-27 Thread James Keeney
Shawn -

First, it's good to know that this is unusual behavior. That actually helps
as it lets me know that I should keep digging.

Here are a couple of things that might help.

In the configuration I am calling out all three ZK nodes. Here is the
configuration of Solr:

-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dhost=solr2
-Djetty.home=/opt/solr/server
-Djetty.port=8983
-Dlog4j.configuration=file:/data/solr/log4j.properties
-Dsolr.install.dir=/opt/solr
-Dsolr.log.dir=/data/solr/logs
-Dsolr.log.muteconsole
-Dsolr.solr.home=/data/solr/data
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=:2181,:2181,:2181
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:GCLogFileSize=20M
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /data/solr/logs
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/data/solr/logs/solr_gc.log
-Xms2G
-Xmx6G
-Xss1024k
-Xss256k
-verbose:gc


Here are the types of Solr errors I receive when this happens. I was able
to determine that it was not a security problem by using telnet to connect
to port 2181 on the ZK nodes.
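Beyond a bare TCP connect, the ZooKeeper four-letter-word commands confirm
the server is actually serving (a sketch; substitute your ZK hostnames):

echo ruok | nc zk-host 2181   # a healthy server answers "imok"
echo stat | nc zk-host 2181   # shows mode (leader/follower) and connected clients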

2018-02-26 19:58:50.964 WARN  (main-SendThread(:2181)) [   ]
o.a.z.ClientCnxn Session 0x361d3ae3f1c for server null, unexpected
error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

2018-02-26 19:58:52.894 WARN  (main-SendThread(:2181)) [   ]
o.a.z.ClientCnxn Session 0x361d3ae3f1c for server null, unexpected
error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

2018-02-26 19:58:53.456 WARN  (main-SendThread(:2181)) [   ]
o.a.z.ClientCnxn Session 0x361d3ae3f1c for server null, unexpected
error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)


And here are the errors when the ZK nodes are not able to connect to each
other.


2018-02-26 19:57:25,554 [myid:2] - WARN
[WorkerSender[myid=2]:QuorumCnxManager@588] - Cannot open channel to 1 at
election address /:3888

java.net.ConnectException: Connection refused (Connection refused)

at java.net.PlainSocketImpl.socketConnect(Native Method)

at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)

at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:589)

at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)

at
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)

at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)

at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)

at java.lang.Thread.run(Thread.java:748)

2018-02-26 19:57:25,554 [myid:2] - INFO
[WorkerSender[myid=2]:QuorumPeer$QuorumServer@167] - Resolved
hostname:  to address: /

2018-02-26 19:57:25,554 [myid:2] - INFO
[WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message
format version), 2 (n.leader), 0xa0013 (n.zxid), 0x4 (n.round), LOOKING
(n.state), 2 (n.sid), 0xa (n.peerEpoch) LOOKING (my state)

2018-02-26 19:57:25,556 [myid:2] - WARN
[WorkerSender[myid=2]:QuorumCnxManager@588] - Cannot open channel to 3 at
election address /:3888

java.net.ConnectException: Connection refused (Connection refused)

at java.net.PlainSocketImpl.socketConnect(Native Method)

at

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. I checked the memory footprint: I have the recovery threshold set
at 75%, and heap occupancy is at about 76%. Also, our ZooKeeper nodes are not
on dedicated servers; perhaps that is causing the instability.

What else do you recommend I check?

2018-02-27 22:37 GMT+08:00 Emir Arnautović :

> This does not show much: only that your heap is around 75% (24-25GB). I
> was thinking that you should compare metrics (heap/GC as well) when running
> without issues and when running with issues and see if something can be
> concluded.
> About instability: Do you run ZK on dedicated nodes?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 27 Feb 2018, at 14:43, 苗海泉  wrote:
> >
> > Thank you. We used to have 49 shards across 49 nodes, but with that setup
> > Solr and ZooKeeper often disconnected (too many nodes made ZooKeeper
> > unstable), so we reduced it to 25. If performance can't keep up, we will
> > need to increase it again.
> >
> > When it is very slow, Solr and ZooKeeper report no errors; only index
> > building is slow. The log shows that automatic commits are slow, but the
> > main cause may not lie in the commit itself.
> >
> > I am sorry, I do not know how to check Java heap utilization. Going by
> > the GC log, GC times are not long. Here is the log:
> >
> >
> > {Heap before GC invocations=1144021 (full 72):
> > garbage-first heap   total 33554432K, used 26982419K [0x7f147800,
> > 0x7f1478808000, 0x7f1c7800)
> >  region size 8192K, 204 young (1671168K), 26 survivors (212992K)
> > Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
> > 67584K
> > 2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation
> Pause)
> > (young)
> > Desired survivor size 109051904 bytes, new threshold 1 (max 15)
> > - age   1:  113878760 bytes,  113878760 total
> > - age   2:   21264744 bytes,  135143504 total
> > - age   3:   17020096 bytes,  152163600 total
> > - age   4:   26870864 bytes,  179034464 total
> > , 0.0579794 secs]
> >   [Parallel Time: 46.9 ms, GC Workers: 18]
> >  [GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
> > 4668016046.4, Diff: 0.3]
> >  [Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
> > Sum: 116.9]
> >  [Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum: 62.0]
> > [Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum:
> 113]
> >  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
> >  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> > Sum: 0.0]
> >  [Object Copy (ms): Min: 0.1, Avg: 23.8, Max: 25.5, Diff: 25.5, Sum:
> > 428.1]
> >  [Termination (ms): Min: 0.0, Avg: 12.7, Max: 13.5, Diff: 13.5, Sum:
> > 228.9]
> > [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum:
> 18]
> >  [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum:
> > 1.2]
> >  [GC Worker Total (ms): Min: 46.4, Avg: 46.6, Max: 46.7, Diff: 0.3,
> > Sum: 838.0]
> >  [GC Worker End (ms): Min: 4668016092.8, Avg: 4668016092.8, Max:
> > 4668016092.8, Diff: 0.0]
> >   [Code Root Fixup: 0.2 ms]
> >   [Code Root Purge: 0.0 ms]
> >   [Clear CT: 0.3 ms]
> >   [Other: 10.7 ms]
> >  [Choose CSet: 0.0 ms]
> >  [Ref Proc: 5.9 ms]
> >  [Ref Enq: 0.2 ms]
> >  [Redirty Cards: 0.2 ms]
> >  [Humongous Register: 2.2 ms]
> >  [Humongous Reclaim: 0.4 ms]
> >  [Free CSet: 0.4 ms]
> >   [Eden: 1424.0M(1424.0M)->0.0B(1552.0M) Survivors: 208.0M->80.0M Heap:
> > 25.7G(32.0G)->24.3G(32.0G)]
> > Heap after GC invocations=1144022 (full 72):
> > garbage-first heap   total 33554432K, used 25489656K [0x7f147800,
> > 0x7f1478808000, 0x7f1c7800)
> >  region size 8192K, 10 young (81920K), 10 survivors (81920K)
> > Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
> > 67584K
> > }
> > [Times: user=0.84 sys=0.01, real=0.05 secs]
> > 2018-02-27T21:43:01.851+0800: 4668016.102: Total time for which
> application
> > threads were stopped: 0.0661383 seconds, Stopping threads took: 0.0004141
> > seconds
> > 2018-02-27T21:43:02.092+0800: 4668016.343: [GC concurrent-mark-end,
> > 2.5757061 secs]
> > 2018-02-27T21:43:02.100+0800: 4668016.351: [GC remark
> > 2018-02-27T21:43:02.100+0800: 4668016.351: [Finalize Marking, 0.0016508
> > secs] 2018-02-27T21:43:02.102+0800: 4668016.352: [GC ref-proc, 0.0277818
> > secs] 2018-02-27T21:43:02.129+0800: 4668016.380: [Unloading, 0.0118102
> > secs], 0.0704296 secs]
> > [Times: user=0.85 sys=0.04, real=0.07 secs]
> > 2018-02-27T21:43:02.171+0800: 4668016.422: Total time for which
> application
> > threads were stopped: 0.0785762 seconds, Stopping threads took: 0.0006159
> > seconds
> > 2018-02-27T21:43:02.178+0800: 4668016.429: [GC cleanup 24G->24G(32G),
> > 0.0391915 secs]
> > [Times: 

Re: Configuration of SOLR Cluster

2018-02-27 Thread Shawn Heisey
On 2/27/2018 10:57 AM, James Keeney wrote:
> *1 - ZK ensemble not accepting return of node*
> Currently, when a ZK node in the ensemble goes down the ensemble is able to
> do what it should do and keeps working. However when I bring the 3rd node
> back online the other two nodes reject connection requests from the 3rd
> node until I restart the nodes. The sequence is:
>
>1. Bring 3rd node back on line
>2. Restart follower in existing ensemble
>3. Restart leader in existing ensemble
>
> When this is done the third node happily becomes part of the ensemble.

From what I understand, restarting the other nodes should not be
required.  If everything is configured properly, I don't think that
should be happening, but I don't have deep ZK knowledge.

> *2 - Solr nodes unable to connect*
> When setting up the cluster for the first time the ensemble rejects the
> solr connection requests until the ZK on the ZK ensemble members is
> restarted.



> However, we have also seen that if we have a problem with one of the Solr
> nodes that requires restarting more than one node we have to restart ZK to
> reconnect the nodes with the ensemble again.

These problems sound very weird too.  I wish I had some idea, but
without logs showing what kind of errors are encountered, I have no idea
what's happening.

None of these problems are in Solr code.  Solr uses the ZooKeeper client
code without modification.  All the ZK communication is done in ZK code,
initialized with the zkHost string and a few other config bits (like
zkClientTimeout) provided to Solr at startup.

If you want to share the Solr log and the ZK server logs covering the
timeframe when the problems happen, maybe we can find something useful
and at least point you towards the problem, but even then, you may have
to talk to the ZooKeeper mailing list for real help, and they'll want
the same logs.

Are you informing Solr about all three of your ZK hosts when you start
it up?  That is a requirement.  If the zkHost string you send to Solr
doesn't list all your servers, then the ZK client inside Solr will not
be able to fail over correctly.  The version of ZK that Solr includes is
not able to dynamically change the servers that it talks to, and the
version of ZK that *does* have dynamic reconfiguration is still in
beta.  Solr is not going to include ZK 3.5.x until they put out a stable
release.  I don't know when they're going to do that.  It could be soon,
or it could be several months out.  The ZK project does NOT make
frequent releases.
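For reference, a sketch of a zkHost string naming all three ensemble members
(the hostnames, and the optional /solr chroot suffix, are assumptions):

-DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr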

Thanks,
Shawn



Re: SOLR Similarity Difference

2018-02-27 Thread Rick Leir
Rick
Did you experiment in the SolrAdmin analysis page? It would possibly tell you 
whether your chain is doing what you expect. Then you need to consider that 
boolean logic is not strictly boolean in Solr. There is a Lucidworks blog which 
explains this nicely; every now and then someone posts the link here.
Cheers -- Rick

On February 26, 2018 5:39:31 PM EST, "Hodder, Rick"  wrote:
>I'm converting SOLR 4.10.2 to SOLR 7.1
>
>I have the following three strings in both SOLR cores
>
>Action Technical Temporaries t/a CTR Corporation
>Action Technical Temporaries
>Action Technical Temporar
>
>If I search
>
>IDX_CompanyName: (Action AND Technical AND Temporaries AND t/a AND CTR
>AND Corporation)
>
>Under 4.10.2 I see all three in the results
>
>Under 7.1, with the default BM25 similarity, I only see the first
>result
>
>Someone on the list suggested that, to make 7.1 go back to the
>similarity factory used in 4.10.2, I add the following to the
>schema.xml:
>
>class="org.apache.solr.search.similarities.ClassicSimilarityFactory">
>
>That brings back all three results.
>
>But my boss would prefer that we don't use the older similarity
>factory.
>
>Is there some setting other than similarity factory that will make 7.1
>include these documents without changing the query?
>
>Thanks,
>
>Rick Hodder
>Information Technology
>Navigators Management Company, Inc.
>83 Wooster Heights Road, 2nd Floor
>Danbury, CT  06810
>(475) 329-6251
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re:Defining Document Transformers in Solr Configuration

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I don't think you can define docTrasformer in the SolrConfig at the moment, I 
agree it would be a cool feature. 

Maybe one possibility could be to use the update request processors [1], and 
precompute the fields at index time, it would be more expensive in disk and 
index time,  but then it would simplify the fl logic and also performance at 
query time. 

Cheers,
Diego

[1] https://lucene.apache.org/solr/guide/6_6/update-request-processors.html

From: solr-user@lucene.apache.org At: 02/27/18 20:21:08 To:
solr-user@lucene.apache.org
Subject: Defining Document Transformers in Solr Configuration

We do quite complex data pulls from a Solr index for subsequent analytics,
currently using a home-grown Python API. Queries might include  a handful
of pseudofields which this API rewrites to an aliased field invoking a
Document Transformer in the 'fl' parameter list.

For example 'numcites' is transformed to

'fl= ,numcites:[subquery]=pmid={!terms
f=md_c_pmid v=$row.pmid}=10=q',...'

What I'd ideally like to be able to do would be have this transformation
defined in Solr configuration so that it's not tied  to one particular
external API -  defining a macro, if you will, so that you could supply
'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.

Is there some way to do this that I've overlooked ? if not, I think it
would be a useful new feature.


-Simon




Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
We do quite complex data pulls from a Solr index for subsequent analytics,
currently using a home-grown Python API. Queries might include  a handful
of pseudofields which this API rewrites to an aliased field invoking a
Document Transformer in the 'fl' parameter list.

For example 'numcites' is transformed to

'fl= ,numcites:[subquery]=pmid={!terms
f=md_c_pmid v=$row.pmid}=10=q',...'

What I'd ideally like to be able to do would be have this transformation
defined in Solr configuration so that it's not tied  to one particular
external API -  defining a macro, if you will, so that you could supply
'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.

Is there some way to do this that I've overlooked ? if not, I think it
would be a useful new feature.


-Simon


Re:SOLR Similarity Difference

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Rick,
I don't think the issue is BM25 vs TFIDF (the old similarity), it seems more 
due to the "matching" logic. 

you are asking to match:

"(Action AND Technical AND Temporaries AND t/a AND CTR AND Corporation)"

This (in theory) means that you want to retrieve **only** the documents that 
contain **all* the terms (so just the first) - so bm25 seems to do the right 
thing. 

Solr allows you to "relax" the requirement, for example using the "mm" 
parameter (see [1] ), could you try to use mm with bm25 and see if it solves 
your problem? 
Are you sure that when you use tfidf you are only changing the similarity and 
not something else? 

Can you try enabling debug mode (debugQuery=true) and checking in the 
response how the query is processed? (Feel free to post it here.)

[1] 
https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter
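A sketch of what that could look like (the field name and query come from
your mail; the core name and the mm value of 5 are assumptions):

curl "http://localhost:8983/solr/mycore/select" \
  --data-urlencode 'defType=edismax' \
  --data-urlencode 'qf=IDX_CompanyName' \
  --data-urlencode 'mm=5' \
  --data-urlencode 'q=Action Technical Temporaries t/a CTR Corporation' \
  --data-urlencode 'debugQuery=true'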


From: solr-user@lucene.apache.org At: 02/26/18 22:39:43 To:
solr-user@lucene.apache.org
Subject: SOLR Similarity Difference

 

I’m converting SOLR 4.10.2 to SOLR 7.1 
  
I have the following three strings in both SOLR cores 
  
Action Technical Temporaries t/a CTR Corporation 
Action Technical Temporaries 
Action Technical Temporar 
  
If I search  
  
IDX_CompanyName: (Action AND Technical AND Temporaries AND t/a AND CTR AND 
Corporation) 
  
Under 4.10.2 I see all three in the results 
  
Under 7.1, with the default BM25 similarity, I only see the first result 
  
Someone on the list suggested that, to make 7.1 go back to the similarity 
factory used in 4.10.2, I add the following to the schema.xml: 
  
<similarity class="org.apache.solr.search.similarities.ClassicSimilarityFactory"/> 
  
That brings back all three results. 
  
But my boss would prefer that we don’t use the older similarity factory. 
  
Is there some setting other than similarity factory that will make 7.1 include 
these documents without changing the query? 
  
Thanks, 
  
Rick Hodder 
Information Technology 
Navigators Management Company, Inc. 
83 Wooster Heights Road, 2nd Floor 
Danbury, CT  06810 
(475) 329-6251 
  
 
   



Configuration of SOLR Cluster

2018-02-27 Thread James Keeney
I'm setting up a Solr cluster in the AWS cloud and I need help with the
configuration of ZooKeeper. The cluster has 3 ZK nodes and 3 Solr nodes.

There are two behaviors that are of concern:

*1 - ZK ensemble not accepting return of node*
Currently, when a ZK node in the ensemble goes down the ensemble is able to
do what it should do and keeps working. However when I bring the 3rd node
back online the other two nodes reject connection requests from the 3rd
node until I restart the nodes. The sequence is:


   1. Bring 3rd node back on line
   2. Restart follower in existing ensemble
   3. Restart leader in existing ensemble

When this is done the third node happily becomes part of the ensemble.

*2 - Solr nodes unable to connect*
When setting up the cluster for the first time the ensemble rejects the
solr connection requests until the ZK on the ZK ensemble members is
restarted.

So the sequence is:


   1. Setup ensemble
   2. Bring up solr nodes
   3. Restart followers on ZK ensemble
   4. Restart leader on ZK ensemble


When I do this everything is fine and the cluster is now stable.

However, we have also seen that if we have a problem with one of the Solr
nodes that requires restarting more than one node we have to restart ZK to
reconnect the nodes with the ensemble again.

We are trying to achieve a self correcting cluster. In other words, we
would like to get to the point where if a node goes down, all that is
necessary is to restart it (after the issue is resolved) and it will add
itself back into the cluster. Obviously this is an issue if ZK has to be
restarted.

Is there a configuration that I am missing? Why is ZK so finicky?

Our ZK config is very simple:

clientPort=2181

dataDir=/var/opt/zookeeper/data

tickTime=2000

autopurge.purgeInterval=24

initLimit=100

syncLimit=5

server.1=:2888:3888

server.2=:2888:3888

server.3=:2888:3888


Any help would be greatly appreciated.


Jim K.

-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *


New payload handling 7.2

2018-02-27 Thread Markus Jelsma
Hello,

Our payload handling has been broken since Lucene/Solr 7.2; we sometimes get 0.0 
from AveragePayloadFunction.docScore() for some but not all query clauses. We only 
have payloads on some terms, to signal to the similarity that it needs to 'punish' 
the term, e.g. for being an article or adjective.

I examined the tickets, and it is not immediately clear how to migrate from 
the old Similarity-style payload scoring to the new method.

Is it really as simple as making a custom PayloadDecoder (which would then 
house the scoring logic we now have in the Similarity) and passing that new 
decoder to PayloadScoreQuery in our custom QParser?

An additional question: why do some clauses receive 0? In the similarity, if 
there is no payload, I always return 1.0f. Also, is/was it known that user 
implementations could break with the 7.2 upgrade? It did not show up in our 
tests (probably because it does not fail for every clause); the problem only 
became visible during testing, and then by accident.

Many thanks,
Markus


Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Shawn Heisey

On 2/27/2018 7:08 AM, YELESWARAPU, VENKATA BHAN wrote:

While indexing job is running we are seeing the below message for all the 
objects.

Object not fetched because its identifier appears to be already in processing


This time, I am going to include you as a CC on the message.  This is 
not normally something that I do, because posting to the list normally 
requires subscribing to the list, so you should be getting all replies 
from the list.


I'm pretty sure that I already replied once asking for information, but 
I never got a response.


Another thing I said on my last reply:  The text of the error message 
you have provided (in the subject and in the text I quoted above) is not 
in the Solr or Lucene codebase.  So right away we know that it wasn't 
generated by Solr.  It may have been generated by the other piece of 
software *inside* Solr, but without the rest of the error information, 
we have no way of knowing what actually happened.  Solr errors tend to 
be dozens of lines long, with most of the output being a Java 
stacktrace.  And in order to make sense of the stacktrace, we must have 
the Solr version.


In addition to the details Cassandra mentioned, there's one bit that 
will be critical:


Where *exactly* did you see this error?  Was it in the Solr admin UI, 
the Solr logfile, the logging output from your indexing program, or 
somewhere else?


Thanks,
Shawn



Re: Solr crashing StandardWrapperValve

2018-02-27 Thread Erick Erickson
You'd really have to talk to Cloudera for support, the version of Solr
shipped with CDH isn't a standard distro.

Best,
Erick

On Tue, Feb 27, 2018 at 8:25 AM, Wael Kader  wrote:
> Hello,
>
> Solr kept crashing today, over and over again.
> I am running a single-node Solr instance on Cloudera with 140 GB of data.
> Things were working fine until today. I have a replication server that I am
> replicating data to; it wasn't working before and was fixed today, so I
> thought maybe it was causing the issue and stopped the replication.
> I am not sure this is the problem, as it crashed once after I stopped the
> replication. I need help identifying the problem.
>
> I tried to find the problem from the log and I found the below error:
>
> Feb 27, 2018 6:23:14 AM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet default threw exception
> java.lang.IllegalStateException
> at
> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:962)
> at
> org.apache.solr.servlet.SolrDispatchFilter.httpSolrCall(SolrDispatchFilter.java:497)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:408)
> at
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:622)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:301)
> at
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:574)
> at
> org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:413)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:612)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:503)
> at java.lang.Thread.run(Thread.java:745)
>
> --
> Regards,
> Wael


Solr crashing StandardWrapperValve

2018-02-27 Thread Wael Kader
Hello,

Solr kept crashing today, over and over again.
I am running a single-node Solr instance on Cloudera with 140 GB of data.
Things were working fine until today. I have a replication server that I am
replicating data to; it wasn't working before and was fixed today, so I
thought maybe it was causing the issue and stopped the replication.
I am not sure this is the problem, as it crashed once after I stopped the
replication. I need help identifying the problem.

I tried to find the problem from the log and I found the below error:

Feb 27, 2018 6:23:14 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
at
org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:962)
at
org.apache.solr.servlet.SolrDispatchFilter.httpSolrCall(SolrDispatchFilter.java:497)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:408)
at
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:622)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:301)
at
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:574)
at
org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:413)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:612)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:503)
at java.lang.Thread.run(Thread.java:745)

-- 
Regards,
Wael


Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Cassandra Targett
There is not enough information here for anyone to answer. You mention a
"below message", but there is no message that we can see. If it was in an
attachment to the mail, it got stripped by the mail server.

If you want a response, please provide in the body of the mail details such
as: the error message you see (with the full stack trace, if possible);
what you are trying to index; the version of Solr you are using; any custom
configurations you may have in place; and any other detail that might help
someone who doesn't have access to your system try to guess what might be
going wrong.

Cassandra

On Tue, Feb 27, 2018 at 8:08 AM, YELESWARAPU, VENKATA BHAN <
vyeleswar...@statestreet.com> wrote:

>
> If any of you experts could help, we would greatly appreciate it. Thank
> you.
>
>
>
> *From:* YELESWARAPU, VENKATA BHAN
> *Sent:* Friday, February 23, 2018 8:30 AM
> *To:* 'd...@lucene.apache.org' ; '
> solr-user@lucene.apache.org' 
> *Subject:* Object not fetched because its identifier appears to be
> already in processing
>
>
>
>
> Dear Users,
>
>
>
> While indexing job is running we are seeing the below message for all the
> objects.
>
> Object not fetched because its identifier appears to be already in
> processing
>
>
>
> What is the issue, and how do we resolve it so that indexing can work? Could
> you please guide us?
>
>
>
> Thank you,
>
> Dutt
>
>
>


Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread YELESWARAPU, VENKATA BHAN

If any of you experts could help, we would greatly appreciate it. Thank you.

From: YELESWARAPU, VENKATA BHAN
Sent: Friday, February 23, 2018 8:30 AM
To: 'd...@lucene.apache.org' ; 
'solr-user@lucene.apache.org' 
Subject: Object not fetched because its identifier appears to be already in 
processing

Dear Users,


While indexing job is running we are seeing the below message for all the 
objects.

Object not fetched because its identifier appears to be already in processing



What is the issue, and how do we resolve it so that indexing can work? Could you 
please guide us?



Thank you,

Dutt



Re: Searching for a phrase in proximity to another token in SOLR

2018-02-27 Thread Erick Erickson
Did you try the ComplexPhraseQueryParser? See:
https://lucene.apache.org/solr/guide/6_6/other-parsers.html
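For reference, the syntax on that page allows parenthesized groups inside a
quoted phrase with slop, e.g. (the core and field names are assumptions):

curl "http://localhost:8983/solr/mycore/select" \
  --data-urlencode 'q={!complexphrase inOrder=false}text:"(john jon) smith"~2'

Whether a nested exact phrase like "john smith" next to another token works
the same way is worth verifying with debugQuery=true.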

Best,
Erick

On Tue, Feb 27, 2018 at 7:23 AM, Deyan Yotsov  wrote:
> Hello,
>
> Is there a way to achieve something along these lines:
>
> "("john smith") josh"~12
>
> Thank you,
>
> Deyan
>


Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-27 Thread Webster Homer
Emir,

Using tlog replica types addresses my immediate problem.

The secondary issue is that all of our searches show inconsistent results.
These are all normal paging use cases. We regularly test our relevancy, and
these differences create confusion for the testers. Moreover, we are
migrating from Endeca which has very consistent results.

I'm hoping that using the global stats cache will make the other searches
more stable. I think we will eventually move to favoring tlog replicas. We
have a couple of collections where NRT makes sense, but those collections
don't need to return data in relevancy order. I think NRT should be
considered a niche use case for a search engine; tlog and pull replicas are
a much better fit (imho).
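For reference, a sketch of creating such a collection in Solr 7.x (the name
and counts are assumptions):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&tlogReplicas=2&nrtReplicas=0"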

On Tue, Feb 27, 2018 at 4:01 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Webster,
> Since you are returning all hits, returning the last page is almost as
> heavy for Solr as returning all documents. Maybe you should consider just
> returning one large page and completely avoid this issue.
> I agree with you that this should be handled by Solr. ES solved this issue
> with “preference” search parameter where you can set session id as
> preference and it will stick to the same shards. I guess you could try
> similar thing on your own but that would require you to send list of shards
> as parameter for your search and balance it for different sessions.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 26 Feb 2018, at 21:03, Webster Homer  wrote:
> >
> > Erick,
> >
> > No we didn't look at that. I will add it to the list. We have  not seen
> > performance issues with solr. We have much slower technologies in our
> > stack. This project was to replace a system that was too slow.
> >
> > Thank you, I will look into it
> >
> > Webster
> >
> > On Mon, Feb 26, 2018 at 1:13 PM, Erick Erickson  >
> > wrote:
> >
> >> Did you try enabling distributed IDF (statsCache)? See:
> >> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
> >>
> >> It's may not totally fix the issue, but it's worth trying. It does
> >> come with a performance penalty of course.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer  >
> >> wrote:
> >>> Thanks Shawn, I had settled on this as a solution.
> >>>
> >>> All our use cases for Solr is to return results in order of relevancy
> to
> >>> the query, so having a deterministic sort would defeat that purpose.
> >> Since
> >>> we wanted to be able to return all the results for a query, I
> originally
> >>> looked at using the Streaming API, but that doesn't support returning
> >>> results sorted by relevancy
> >>>
> >>> I disagree with you about NRT replicas though. They may function as
> >>> designed, but since they cannot guarantee consistent results their
> design
> >>> is buggy, at least it is for a search engine.
> >>>
> >>>
> >>> On Mon, Feb 26, 2018 at 12:20 PM, Shawn Heisey 
> >> wrote:
> >>>
>  On 2/26/2018 10:26 AM, Webster Homer wrote:
> > We need the results by relevancy so the application sorts the results
> >> by
> > score desc, and the unique id ascending as the tie breaker
> 
>  This is the reason for the discrepancy, and why the different replica
>  types don't have the same issue.
> 
>  Each NRT replica can have different deleted documents than the others,
>  just due to the way that NRT replicas work.  Deleted documents affect
>  relevancy scoring.  When one replica has say 5000 deleted documents
> and
>  another has 200, or has 5000 but they're different docs, a relevancy
>  sort can end up different.  So when Solr goes to one replica for page
> 1
>  and another for page 2 (which is expected due to SolrCloud's internal
>  load balancing), you may end up with duplicate documents or documents
>  missing.  Because deleted documents are not counted or returned,
>  numFound will be consistent, as long as the index doesn't change
> between
>  the queries for pages.
> 
>  If you were using a deterministic sort rather than relevancy, this
>  wouldn't be happening, because deleted documents have no influence on
>  that kind of sort.
> 
>  With TLOG or PULL, the replicas are absolutely identical, so there is
> no
>  difference, unless the index is changing as you page through the
> >> results.
> 
>  I think changing replica types is the only solution here.  NRT
> replicas
>  are working as they were designed -- there's no bug, even though
>  problems like this do sometimes turn up.
> 
>  Thanks,
>  Shawn
> 
> 
> >>>
> >>> --
> >>>
> >>>
> >>> This message and any attachment are confidential and may be privileged
> or
> >>> otherwise 

Searching for a phrase in proximity to another token in SOLR

2018-02-27 Thread Deyan Yotsov

Hello,

Is there a way to achieve something along these lines:

"("john smith") josh"~12

Thank you,

Deyan



Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
This does not show much: only that your heap is around 75% (24-25GB). I was 
thinking that you should compare metrics (heap/GC as well) when running 
without issues and when running with issues and see if something can be 
concluded.
About instability: Do you run ZK on dedicated nodes?
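A quick way to sample heap occupancy on a running node (a sketch; jstat ships
with the JDK, and the PID is that of your Solr process):

jstat -gcutil <solr-pid> 5000   # prints eden/survivor/old/metaspace %, every 5s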

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2018, at 14:43, 苗海泉  wrote:
> 
> Thank you. We used to have 49 shards across 49 nodes, but with that setup
> Solr and ZooKeeper often disconnected (too many nodes made ZooKeeper
> unstable), so we reduced it to 25. If performance can't keep up, we will
> need to increase it again.
> 
> When it is very slow, Solr and ZooKeeper report no errors; only index
> building is slow. The log shows that automatic commits are slow, but the
> main cause may not lie in the commit itself.
> 
> I am sorry, I do not know how to check Java heap utilization. Going by the
> GC log, GC times are not long. Here is the log:
> 
> 
> {Heap before GC invocations=1144021 (full 72):
> garbage-first heap   total 33554432K, used 26982419K [0x7f147800,
> 0x7f1478808000, 0x7f1c7800)
>  region size 8192K, 204 young (1671168K), 26 survivors (212992K)
> Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
> 67584K
> 2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation Pause)
> (young)
> Desired survivor size 109051904 bytes, new threshold 1 (max 15)
> - age   1:  113878760 bytes,  113878760 total
> - age   2:   21264744 bytes,  135143504 total
> - age   3:   17020096 bytes,  152163600 total
> - age   4:   26870864 bytes,  179034464 total
> , 0.0579794 secs]
>   [Parallel Time: 46.9 ms, GC Workers: 18]
>  [GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
> 4668016046.4, Diff: 0.3]
>  [Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
> Sum: 116.9]
>  [Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum: 62.0]
> [Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum: 113]
>  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
>  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>  [Object Copy (ms): Min: 0.1, Avg: 23.8, Max: 25.5, Diff: 25.5, Sum:
> 428.1]
>  [Termination (ms): Min: 0.0, Avg: 12.7, Max: 13.5, Diff: 13.5, Sum:
> 228.9]
> [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
>  [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum:
> 1.2]
>  [GC Worker Total (ms): Min: 46.4, Avg: 46.6, Max: 46.7, Diff: 0.3,
> Sum: 838.0]
>  [GC Worker End (ms): Min: 4668016092.8, Avg: 4668016092.8, Max:
> 4668016092.8, Diff: 0.0]
>   [Code Root Fixup: 0.2 ms]
>   [Code Root Purge: 0.0 ms]
>   [Clear CT: 0.3 ms]
>   [Other: 10.7 ms]
>  [Choose CSet: 0.0 ms]
>  [Ref Proc: 5.9 ms]
>  [Ref Enq: 0.2 ms]
>  [Redirty Cards: 0.2 ms]
>  [Humongous Register: 2.2 ms]
>  [Humongous Reclaim: 0.4 ms]
>  [Free CSet: 0.4 ms]
>   [Eden: 1424.0M(1424.0M)->0.0B(1552.0M) Survivors: 208.0M->80.0M Heap:
> 25.7G(32.0G)->24.3G(32.0G)]
> Heap after GC invocations=1144022 (full 72):
> garbage-first heap   total 33554432K, used 25489656K [0x7f147800,
> 0x7f1478808000, 0x7f1c7800)
>  region size 8192K, 10 young (81920K), 10 survivors (81920K)
> Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
> 67584K
> }
> [Times: user=0.84 sys=0.01, real=0.05 secs]
> 2018-02-27T21:43:01.851+0800: 4668016.102: Total time for which application
> threads were stopped: 0.0661383 seconds, Stopping threads took: 0.0004141
> seconds
> 2018-02-27T21:43:02.092+0800: 4668016.343: [GC concurrent-mark-end,
> 2.5757061 secs]
> 2018-02-27T21:43:02.100+0800: 4668016.351: [GC remark
> 2018-02-27T21:43:02.100+0800: 4668016.351: [Finalize Marking, 0.0016508
> secs] 2018-02-27T21:43:02.102+0800: 4668016.352: [GC ref-proc, 0.0277818
> secs] 2018-02-27T21:43:02.129+0800: 4668016.380: [Unloading, 0.0118102
> secs], 0.0704296 secs]
> [Times: user=0.85 sys=0.04, real=0.07 secs]
> 2018-02-27T21:43:02.171+0800: 4668016.422: Total time for which application
> threads were stopped: 0.0785762 seconds, Stopping threads took: 0.0006159
> seconds
> 2018-02-27T21:43:02.178+0800: 4668016.429: [GC cleanup 24G->24G(32G),
> 0.0391915 secs]
> [Times: user=0.64 sys=0.00, real=0.04 secs]
> 2018-02-27T21:43:02.218+0800: 4668016.469: Total time for which application
> threads were stopped: 0.0470020 seconds, Stopping threads took: 0.0001684
> seconds
> 2018-02-27T21:43:02.540+0800: 4668016.791: Total time for which application
> threads were stopped: 0.0074829 seconds, Stopping threads took: 0.0004834
> seconds
> {Heap before GC invocations=1144023 (full 72):
> garbage-first heap   total 33554432K, used 27078904K [0x7f147800,
> 0x7f1478808000, 

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. We used to have 49 shards across 49 nodes, but with that setup
Solr and ZooKeeper often disconnected (too many nodes made ZooKeeper
unstable), so we reduced it to 25. If performance can't keep up, we will
need to increase it again.

When it is very slow, Solr and ZooKeeper report no errors; only index
building is slow. The log shows that automatic commits are slow, but the
main cause may not lie in the commit itself.

I am sorry, I do not know how to check Java heap utilization. Going by the
GC log, GC times are not long. Here is the log:


{Heap before GC invocations=1144021 (full 72):
 garbage-first heap   total 33554432K, used 26982419K [0x7f147800,
0x7f1478808000, 0x7f1c7800)
  region size 8192K, 204 young (1671168K), 26 survivors (212992K)
 Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
67584K
2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 109051904 bytes, new threshold 1 (max 15)
- age   1:  113878760 bytes,  113878760 total
- age   2:   21264744 bytes,  135143504 total
- age   3:   17020096 bytes,  152163600 total
- age   4:   26870864 bytes,  179034464 total
, 0.0579794 secs]
   [Parallel Time: 46.9 ms, GC Workers: 18]
  [GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
4668016046.4, Diff: 0.3]
  [Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
Sum: 116.9]
  [Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum: 62.0]
 [Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum: 113]
  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
  [Object Copy (ms): Min: 0.1, Avg: 23.8, Max: 25.5, Diff: 25.5, Sum:
428.1]
  [Termination (ms): Min: 0.0, Avg: 12.7, Max: 13.5, Diff: 13.5, Sum:
228.9]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum:
1.2]
  [GC Worker Total (ms): Min: 46.4, Avg: 46.6, Max: 46.7, Diff: 0.3,
Sum: 838.0]
  [GC Worker End (ms): Min: 4668016092.8, Avg: 4668016092.8, Max:
4668016092.8, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 10.7 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 5.9 ms]
  [Ref Enq: 0.2 ms]
  [Redirty Cards: 0.2 ms]
  [Humongous Register: 2.2 ms]
  [Humongous Reclaim: 0.4 ms]
  [Free CSet: 0.4 ms]
   [Eden: 1424.0M(1424.0M)->0.0B(1552.0M) Survivors: 208.0M->80.0M Heap:
25.7G(32.0G)->24.3G(32.0G)]
Heap after GC invocations=1144022 (full 72):
 garbage-first heap   total 33554432K, used 25489656K [0x7f147800,
0x7f1478808000, 0x7f1c7800)
  region size 8192K, 10 young (81920K), 10 survivors (81920K)
 Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
67584K
}
 [Times: user=0.84 sys=0.01, real=0.05 secs]
2018-02-27T21:43:01.851+0800: 4668016.102: Total time for which application
threads were stopped: 0.0661383 seconds, Stopping threads took: 0.0004141
seconds
2018-02-27T21:43:02.092+0800: 4668016.343: [GC concurrent-mark-end,
2.5757061 secs]
2018-02-27T21:43:02.100+0800: 4668016.351: [GC remark
2018-02-27T21:43:02.100+0800: 4668016.351: [Finalize Marking, 0.0016508
secs] 2018-02-27T21:43:02.102+0800: 4668016.352: [GC ref-proc, 0.0277818
secs] 2018-02-27T21:43:02.129+0800: 4668016.380: [Unloading, 0.0118102
secs], 0.0704296 secs]
 [Times: user=0.85 sys=0.04, real=0.07 secs]
2018-02-27T21:43:02.171+0800: 4668016.422: Total time for which application
threads were stopped: 0.0785762 seconds, Stopping threads took: 0.0006159
seconds
2018-02-27T21:43:02.178+0800: 4668016.429: [GC cleanup 24G->24G(32G),
0.0391915 secs]
 [Times: user=0.64 sys=0.00, real=0.04 secs]
2018-02-27T21:43:02.218+0800: 4668016.469: Total time for which application
threads were stopped: 0.0470020 seconds, Stopping threads took: 0.0001684
seconds
2018-02-27T21:43:02.540+0800: 4668016.791: Total time for which application
threads were stopped: 0.0074829 seconds, Stopping threads took: 0.0004834
seconds
{Heap before GC invocations=1144023 (full 72):
 garbage-first heap   total 33554432K, used 27078904K [0x7f147800,
0x7f1478808000, 0x7f1c7800)
  region size 8192K, 204 young (1671168K), 10 survivors (81920K)
 Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
67584K
2018-02-27T21:43:04.076+0800: 4668018.326: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 109051904 bytes, new threshold 15 (max 15)
- age   1:   47719032 bytes,   47719032 total
, 0.0554183 secs]
   [Parallel Time: 48.0 ms, GC Workers: 18]
  [GC Worker Start (ms): Min: 4668018329.0, Avg: 4668018329.1, Max:
4668018329.3, Diff: 0.3]
  [Ext Root Scanning (ms): Min: 2.9, Avg: 5.7, Max: 47.4, Diff: 44.6,
Sum: 103.0]
  [Update RS (ms): Min: 0.0, Avg: 14.3, Max: 

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Ah, so there are ~560 shards per node and not all nodes are indexing at the 
same time. Why is that? You can get better throughput if you index on all 
nodes. If you are happy with the shard size, you can create a new collection 
with 49 shards every 2 hours, keep everything else the same, and index on all 
nodes.
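
As a sketch, an hourly job could issue a Collections API call like this 
(collection name and configset name here are just placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=logs_2018022712&numShards=49&replicationFactor=1&collection.configName=myconf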

Back to the main question: what is the heap utilisation? When you restart a 
node, what is the heap utilisation then? Do you see any errors in your logs? Do 
you see any errors in the ZK logs?

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2018, at 13:22, 苗海泉  wrote:
> 
> Thanks  for you reply again.
> I just said that you may have some misunderstanding, we have 49 solr nodes,
> each collection has 25 shards, each shard has only one replica of the data,
> there is no copy, and I reduce the part of the cache. If you need the
> metric data, I can check Come out to tell you, in addition we are only
> additional system, there will not be any change action.
> 
> 2018-02-27 20:05 GMT+08:00 Emir Arnautović :
> 
>> Hi,
>> It is hard to tell without looking more into your metrics. It seems to me
>> that you are reaching limits of your cluster. I would doublecheck if memory
>> is the issue. If I got it right, you have ~1120 shards per node. It takes
>> some heap just to keep them open. If you have some caches enabled and if it
>> is append only system, old shards will keep caches until reloaded.
>> Probably will not make much diff, but with 25x2=50 shards and 49 nodes,
>> one node will need to handle double indexing load.
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 27 Feb 2018, at 12:54, 苗海泉  wrote:
>>> 
>>> In addition, we found that the rate was normal when the number of
>>> collections was kept below 936 and the speed was slower and slower at
>> 984.
>>> Therefore, we could only temporarily delete the older collection, but now
>>> we need more Online collection, there has been no good way to confuse us
>>> for a long time, very much hope to give a solution to the problem of
>> ideas,
>>> greatly appreciated
>>> 
>>> 2018-02-27 19:46 GMT+08:00 苗海泉 :
>>> 
 Thank you for reply.
 One collection has 25 shard one replica, one solr node has about 5T on
 desk.
 GC is checked ,and modify as follow :
 SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
 GC_TUNE=" \
 -XX:+UseG1GC \
 -XX:+PerfDisableSharedMem \
 -XX:+ParallelRefProcEnabled \
 -XX:G1HeapRegionSize=8m \
 -XX:MaxGCPauseMillis=250 \
 -XX:InitiatingHeapOccupancyPercent=75 \
 -XX:+UseLargePages \
 -XX:+AggressiveOpts \
 -XX:+UseLargePages"
 
 2018-02-27 19:27 GMT+08:00 Emir Arnautović <
>> emir.arnauto...@sematext.com>:
 
> Hi,
> To get more complete picture, can you tell us how many shards/replicas
>> do
> you have per collection? Also what is index size on disk? Did you
>> check GC?
> 
> BTW, using 32GB heap prevents you from using compressed oops, resulting
> in less memory available than 31GB.
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
> 
> 
> 
>> On 27 Feb 2018, at 11:36, 苗海泉  wrote:
>> 
>> I encountered a more serious problem in the process of using solr. We
> use
>> the solr version is 6.0, our daily amount of data is about 500 billion
>> documents, create a collection every hour, the online collection of
>> more
>> than a thousand, 49 solr nodes. If the collection in less than 800,
>> the
>> speed is still very fast, if the collection of the number of 1100 or
>> so,
>> the construction of solr index will drop sharply, one of the original
>> program speed of about 2-3 million TPS, Dropped to only a few hundred
>> or
>> even tens of TPS, who have encountered a similar situation, there is
>> no
>> good idea to find this issue. By the way, solr a node memory we
>> assigned
>> 32G,We checked the memory, cpu, disk IO, network IO occupancy is no
>> problem, belong to the normal state. Which friend encountered a
>> similar
>> problem, please inform the solution, thank you very much.
> 
> 
 
 
 --
 ==
 联创科技
 知行如一
 ==
 
>>> 
>>> 
>>> 
>>> --
>>> ==
>>> 联创科技
>>> 知行如一
>>> ==
>> 
>> 
> 
> 
> -- 
> ==
> 联创科技
> 知行如一
> ==



Re: Solr Phrase Count : How to get count of a phrase in a text field solr

2018-02-27 Thread aneeshkappu
Found the solution: put `debug=results` at the end of the Solr URL, and it
will give you the phrase frequency as well.
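
For example, a request along these lines (core and field names are just
placeholders) returns the phrase frequency inside the per-document score
explanations:

http://localhost:8983/solr/mycore/select?q=text:%22apache%20solr%22&debug=results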



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thanks for your reply again. There may be some misunderstanding: we have 49
Solr nodes, each collection has 25 shards, and each shard has only one replica
of the data; there is no extra copy. I have also reduced part of the cache. If
you need the metric data, I can look it up and report it. In addition, ours is
an append-only system; there are no update or delete operations.

2018-02-27 20:05 GMT+08:00 Emir Arnautović :

> Hi,
> It is hard to tell without looking more into your metrics. It seems to me
> that you are reaching limits of your cluster. I would doublecheck if memory
> is the issue. If I got it right, you have ~1120 shards per node. It takes
> some heap just to keep them open. If you have some caches enabled and if it
> is append only system, old shards will keep caches until reloaded.
> Probably will not make much diff, but with 25x2=50 shards and 49 nodes,
> one node will need to handle double indexing load.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 27 Feb 2018, at 12:54, 苗海泉  wrote:
> >
> > In addition, we found that the rate was normal when the number of
> > collections was kept below 936 and the speed was slower and slower at
> 984.
> > Therefore, we could only temporarily delete the older collection, but now
> > we need more Online collection, there has been no good way to confuse us
> > for a long time, very much hope to give a solution to the problem of
> ideas,
> > greatly appreciated
> >
> > 2018-02-27 19:46 GMT+08:00 苗海泉 :
> >
> >> Thank you for reply.
> >> One collection has 25 shard one replica, one solr node has about 5T on
> >> desk.
> >> GC is checked ,and modify as follow :
> >> SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
> >> GC_TUNE=" \
> >> -XX:+UseG1GC \
> >> -XX:+PerfDisableSharedMem \
> >> -XX:+ParallelRefProcEnabled \
> >> -XX:G1HeapRegionSize=8m \
> >> -XX:MaxGCPauseMillis=250 \
> >> -XX:InitiatingHeapOccupancyPercent=75 \
> >> -XX:+UseLargePages \
> >> -XX:+AggressiveOpts \
> >> -XX:+UseLargePages"
> >>
> >> 2018-02-27 19:27 GMT+08:00 Emir Arnautović <
> emir.arnauto...@sematext.com>:
> >>
> >>> Hi,
> >>> To get more complete picture, can you tell us how many shards/replicas
> do
> >>> you have per collection? Also what is index size on disk? Did you
> check GC?
> >>>
> >>> BTW, using 32GB heap prevents you from using compressed oops, resulting
> >>> in less memory available than 31GB.
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> >>>
> >>>
> >>>
>  On 27 Feb 2018, at 11:36, 苗海泉  wrote:
> 
>  I encountered a more serious problem in the process of using solr. We
> >>> use
>  the solr version is 6.0, our daily amount of data is about 500 billion
>  documents, create a collection every hour, the online collection of
> more
>  than a thousand, 49 solr nodes. If the collection in less than 800,
> the
>  speed is still very fast, if the collection of the number of 1100 or
> so,
>  the construction of solr index will drop sharply, one of the original
>  program speed of about 2-3 million TPS, Dropped to only a few hundred
> or
>  even tens of TPS, who have encountered a similar situation, there is
> no
>  good idea to find this issue. By the way, solr a node memory we
> assigned
>  32G,We checked the memory, cpu, disk IO, network IO occupancy is no
>  problem, belong to the normal state. Which friend encountered a
> similar
>  problem, please inform the solution, thank you very much.
> >>>
> >>>
> >>
> >>
> >> --
> >> ==
> >> 联创科技
> >> 知行如一
> >> ==
> >>
> >
> >
> >
> > --
> > ==
> > 联创科技
> > 知行如一
> > ==
>
>


-- 
==
联创科技
知行如一
==


Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi,
It is hard to tell without looking more into your metrics. It seems to me that 
you are reaching the limits of your cluster. I would double-check whether memory 
is the issue. If I got it right, you have ~1120 shards per node. It takes some 
heap just to keep them open. If you have some caches enabled and it is an 
append-only system, old shards will keep their caches until reloaded.
Probably will not make much difference, but with 25x2=50 shards and 49 nodes, 
one node will need to handle double the indexing load.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2018, at 12:54, 苗海泉  wrote:
> 
> In addition, we found that the rate was normal when the number of
> collections was kept below 936 and the speed was slower and slower at 984.
> Therefore, we could only temporarily delete the older collection, but now
> we need more Online collection, there has been no good way to confuse us
> for a long time, very much hope to give a solution to the problem of ideas,
> greatly appreciated
> 
> 2018-02-27 19:46 GMT+08:00 苗海泉 :
> 
>> Thank you for reply.
>> One collection has 25 shard one replica, one solr node has about 5T on
>> desk.
>> GC is checked ,and modify as follow :
>> SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+PerfDisableSharedMem \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=8m \
>> -XX:MaxGCPauseMillis=250 \
>> -XX:InitiatingHeapOccupancyPercent=75 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> -XX:+UseLargePages"
>> 
>> 2018-02-27 19:27 GMT+08:00 Emir Arnautović :
>> 
>>> Hi,
>>> To get more complete picture, can you tell us how many shards/replicas do
>>> you have per collection? Also what is index size on disk? Did you check GC?
>>> 
>>> BTW, using 32GB heap prevents you from using compressed oops, resulting
>>> in less memory available than 31GB.
>>> 
>>> Thanks,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 27 Feb 2018, at 11:36, 苗海泉  wrote:
 
 I encountered a more serious problem in the process of using solr. We
>>> use
 the solr version is 6.0, our daily amount of data is about 500 billion
 documents, create a collection every hour, the online collection of more
 than a thousand, 49 solr nodes. If the collection in less than 800, the
 speed is still very fast, if the collection of the number of 1100 or so,
 the construction of solr index will drop sharply, one of the original
 program speed of about 2-3 million TPS, Dropped to only a few hundred or
 even tens of TPS, who have encountered a similar situation, there is no
 good idea to find this issue. By the way, solr a node memory we assigned
 32G,We checked the memory, cpu, disk IO, network IO occupancy is no
 problem, belong to the normal state. Which friend encountered a similar
 problem, please inform the solution, thank you very much.
>>> 
>>> 
>> 
>> 
>> --
>> ==
>> 联创科技
>> 知行如一
>> ==
>> 
> 
> 
> 
> -- 
> ==
> 联创科技
> 知行如一
> ==



Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
In addition, we found that the rate was normal when the number of
collections was kept below 936, and the speed became slower and slower at
around 984. Therefore, we could only temporarily delete the older collections,
but now we need more collections online. This problem has confused us for a
long time and we have found no good way around it; we would very much
appreciate any ideas for solving it.

2018-02-27 19:46 GMT+08:00 苗海泉 :

> Thank you for reply.
> One collection has 25 shard one replica, one solr node has about 5T on
> desk.
> GC is checked ,and modify as follow :
> SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+PerfDisableSharedMem \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=250 \
> -XX:InitiatingHeapOccupancyPercent=75 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> -XX:+UseLargePages"
>
> 2018-02-27 19:27 GMT+08:00 Emir Arnautović :
>
>> Hi,
>> To get more complete picture, can you tell us how many shards/replicas do
>> you have per collection? Also what is index size on disk? Did you check GC?
>>
>> BTW, using 32GB heap prevents you from using compressed oops, resulting
>> in less memory available than 31GB.
>>
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 27 Feb 2018, at 11:36, 苗海泉  wrote:
>> >
>> > I encountered a more serious problem in the process of using solr. We
>> use
>> > the solr version is 6.0, our daily amount of data is about 500 billion
>> > documents, create a collection every hour, the online collection of more
>> > than a thousand, 49 solr nodes. If the collection in less than 800, the
>> > speed is still very fast, if the collection of the number of 1100 or so,
>> > the construction of solr index will drop sharply, one of the original
>> > program speed of about 2-3 million TPS, Dropped to only a few hundred or
>> > even tens of TPS, who have encountered a similar situation, there is no
>> > good idea to find this issue. By the way, solr a node memory we assigned
>> > 32G,We checked the memory, cpu, disk IO, network IO occupancy is no
>> > problem, belong to the normal state. Which friend encountered a similar
>> > problem, please inform the solution, thank you very much.
>>
>>
>
>
> --
> ==
> 联创科技
> 知行如一
> ==
>



-- 
==
联创科技
知行如一
==


Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you for your reply.
One collection has 25 shards with one replica each; one Solr node holds about
5 TB on disk.
GC was checked and modified as follows:
SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=250 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
-XX:+UseLargePages"

2018-02-27 19:27 GMT+08:00 Emir Arnautović :

> Hi,
> To get more complete picture, can you tell us how many shards/replicas do
> you have per collection? Also what is index size on disk? Did you check GC?
>
> BTW, using 32GB heap prevents you from using compressed oops, resulting in
> less memory available than 31GB.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 27 Feb 2018, at 11:36, 苗海泉  wrote:
> >
> > I encountered a more serious problem in the process of using solr. We use
> > the solr version is 6.0, our daily amount of data is about 500 billion
> > documents, create a collection every hour, the online collection of more
> > than a thousand, 49 solr nodes. If the collection in less than 800, the
> > speed is still very fast, if the collection of the number of 1100 or so,
> > the construction of solr index will drop sharply, one of the original
> > program speed of about 2-3 million TPS, Dropped to only a few hundred or
> > even tens of TPS, who have encountered a similar situation, there is no
> > good idea to find this issue. By the way, solr a node memory we assigned
> > 32G,We checked the memory, cpu, disk IO, network IO occupancy is no
> > problem, belong to the normal state. Which friend encountered a similar
> > problem, please inform the solution, thank you very much.
>
>


-- 
==
联创科技
知行如一
==


Re: is it appropriate to use external cache for whole shards

2018-02-27 Thread Emir Arnautović
Hi,
Assuming you have some web interface, it is not uncommon to apply caching in 
the web browser/middle layer/Solr. The question is whether you can live with 
stale data or whether you have some mechanism to invalidate data when needed. 
Solr does that "blindly": on every commit that opens a new searcher, it will 
invalidate all caches. Increasing the commit interval can result in better 
cache utilisation and better average query latency. You need to monitor your 
caches to see whether cache utilisation justifies having them and whether you 
are structuring queries so that the caches can actually be used.
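
For reference, the queryResultCache is configured in solrconfig.xml along
these lines (the sizes here are illustrative, not recommendations):

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>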
You mentioned that your shard size is 30GB. Shard size is what dictates query 
latency. Maybe you have reached a shard size at which you can no longer achieve 
the targeted latency; caches will help a bit, but any cache miss will be slow. I 
would address this issue rather than hoping that caches will be good enough to 
hide slow queries.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2018, at 06:18, park  wrote:
> 
> I'm indexing and searching documents using solr 6.x.
> It is quite efficient when there are fewer shards and fewer cluster units.
> However, when the number of shards exceeds 30 and the size of each shard is
> 30G, the search performance is significantly reduced.
> Currently, usercache in solr is actively used, so we plan queryResultCache
> for the entire shards.
> Is this the right solution, to try using an external cache? (for
> example, redis, memcached, apache ignite, etc.)
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Rename solrconfig.xml

2018-02-27 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Yes, I'm running SolrCloud.

Meaning we have to create all the cores in the collection with the default
solrconfig.xml first?
Then we have to modify the core.properties, and rename the solrconfig.xml.
After which, we have to upload the renamed config to ZooKeeper, then reload
the collection?

We will need to customize our program for the client, which is why we
wanted to have our own unique config file.

Regards,
Edwin


On 27 February 2018 at 17:21, Shawn Heisey  wrote:

> On 2/27/2018 12:59 AM, Zheng Lin Edwin Yeo wrote:
>
>> Regarding the core.properties, understand from the Solr guide that we need
>> to define the "config" properties first. However, my core.properties will
>> only be created when I create the collection from the command
>> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection
>>
>> The core.properties does not exists, and if I try to create one manually,
>> Solr will not read it, and it will still try to look for solrconfig.xml.
>>
>> What should be the right way to create the core.properties?
>>
>
> If you're running SolrCloud, you'll very likely have to allow it to create
> all the cores in the collection, then go back and modify the
> core.properties files that get created, and reload the collection once
> they're all changed.  If this actually works, keep in mind that the renamed
> config file is going to be loaded from zookeeper, right where
> solrconfig.xml would normally exist.
>
> Specifying options remotely in core.properties can only be done with the
> CoreAdmin API, but this is not used when in Cloud mode. The Collections API
> actually *does* use the CoreAdmin API behind the scenes, but because its
> usage in SolrCloud is very much an expert-level task, you shouldn't use it
> directly.
>
> The big question I have:  Why would you want to cause yourself difficulty
> by doing this?
>
> Thanks,
> Shawn
>
>


Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi,
To get a more complete picture, can you tell us how many shards/replicas you 
have per collection? Also, what is the index size on disk? Did you check GC?

BTW, using a 32GB heap prevents the JVM from using compressed oops, resulting 
in less usable memory than a 31GB heap.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2018, at 11:36, 苗海泉  wrote:
> 
> I encountered a more serious problem in the process of using solr. We use
> the solr version is 6.0, our daily amount of data is about 500 billion
> documents, create a collection every hour, the online collection of more
> than a thousand, 49 solr nodes. If the collection in less than 800, the
> speed is still very fast, if the collection of the number of 1100 or so,
> the construction of solr index will drop sharply, one of the original
> program speed of about 2-3 million TPS, Dropped to only a few hundred or
> even tens of TPS, who have encountered a similar situation, there is no
> good idea to find this issue. By the way, solr a node memory we assigned
> 32G,We checked the memory, cpu, disk IO, network IO occupancy is no
> problem, belong to the normal state. Which friend encountered a similar
> problem, please inform the solution, thank you very much.



Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shalin Shekhar Mangar
When you block communication between Zookeeper and the leader, the ZK
client inside Solr will disconnect and its session will expire after the
session timeout. At this point a new leader should be elected
automatically. The default timeout is 30 seconds. You should be able to see
the value in the solr.xml property named "zkClientTimeout".
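
In a stock solr.xml it looks roughly like this (the 30000 ms default can be
overridden with a system property):

<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>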

On Tue, Feb 27, 2018 at 2:06 PM, zahra121  wrote:

> Suppose I have a node which is a leader in SolrCloud.
>
> When I block this leader's SolrCloud and Zookeeper ports by the command
> "firewall-cmd --remove-port=/tcp --permanent", the leader does
> not
> change automatically and this leader status remains active in solr admin
> UI.
>
> Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
> ADDROLE commands in solrCloud, however the leader did not change!
>
> How can I manually change the leader if the firewall blocks the SolrCloud
> ports from being listened?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,
Shalin Shekhar Mangar.


RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread TG Servers

Ok thanks!
Thomas


On 27 February 2018 at 11:36:52 a.m., Markus Jelsma wrote:



Maybe check the example directory, it has lots of languages configured:
https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/managed-schema

And be sure to check out the manual on the subject:
https://lucene.apache.org/solr/guide/7_2/language-analysis.html



-Original message-

From:TG Servers 
Sent: Tuesday 27th February 2018 11:18
To: solr-user@lucene.apache.org
Subject: RE: Question on other language than english stemmers 
and using both


Ok thank you. Sounds like a bit more reading into the whole thing. It's
just a tool for me so i didn't want to go too deep into it bit sometimes a
must is a must. :) default schema.xml? I just get this managed_schema file
when installing. Do you mean that one?


On 27 February 2018 at 11:12:39 a.m., Markus Jelsma wrote:

> Hello,
>
> Mixing language specific filters in the same analyzer is not going to give
> predictable or desirable results. Instead, create separate text_en and
> text_de fieldTypes and fields.  See Solr's default schema.xml, it has many
> examples of various languages.
>
> Depending on what query parser you use, you need to make sure you search on
> both fields now.
>
> Regards,
> Markus
>
> -Original message-
>> From:TG Servers 
>> Sent: Tuesday 27th February 2018 8:26
>> To: solr-user@lucene.apache.org
>> Subject: Question on other language than english stemmers and
>> using both
>>
>> Hi,
>>
>> I currently adapted this schema.xml for dovecot and Solr 7.2.1.
>> Now this is stemming only english words.
>> What do I have to do to use it for english AND german?
>> Can I just put the according german filterfactorys appended to it or
>> does that not work?
>> E.g.
>> ...
>> 
>> 
>> 
>> ...
>>
>> Thanks,
>> Thomas
>>
>> Original schema :
>>
>> 
>> 
>> 
>> 
>> 
>> 
>>
>> 
>> 
>> 
>> > words="lang/stopwords_en.txt"/>
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> > ignoreCase="true" expand="true"/>
>> > words="lang/stopwords_en.txt"/>
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> > required="true" />
>> > required="true" />
>> > required="true" />
>> > required="true" />
>>
>> 
>> 
>>
>> 
>> 
>> 
>> 
>> 
>>
>> 
>> 
>> 
>>
>> id
>> 
>>








When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
I have encountered a rather serious problem while using Solr. We are on Solr
6.0; our daily data volume is about 500 billion documents, we create a new
collection every hour, we have more than a thousand collections online, and we
run 49 Solr nodes. With fewer than 800 collections the indexing speed is still
very fast, but at around 1100 collections the Solr indexing speed drops
sharply: a program that originally indexed at about 2-3 million TPS drops to
only a few hundred or even a few tens of TPS. Has anyone encountered a similar
situation? We have not found a good lead on this issue. By the way, we assigned
32 GB of memory to each Solr node. We checked memory, CPU, disk IO, and network
IO; utilization is all normal. If anyone has run into a similar problem, please
share your solution. Thank you very much.


RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
Maybe check the example directory, it has lots of languages configured:
https://github.com/apache/lucene-solr/blob/master/solr/example/files/conf/managed-schema

And be sure to check out the manual on the subject:
https://lucene.apache.org/solr/guide/7_2/language-analysis.html

 
 
-Original message-
> From:TG Servers 
> Sent: Tuesday 27th February 2018 11:18
> To: solr-user@lucene.apache.org
> Subject: RE: Question on other language than english stemmers and 
> using both
> 
> Ok thank you. Sounds like a bit more reading into the whole thing. It's 
> just a tool for me so i didn't want to go too deep into it bit sometimes a 
> must is a must. :) default schema.xml? I just get this managed_schema file 
> when installing. Do you mean that one?
> 
> 
> On 27 February 2018 at 11:12:39 a.m., Markus Jelsma wrote:
> 
> > Hello,
> >
> > Mixing language specific filters in the same analyzer is not going to give 
> > predictable or desirable results. Instead, create separate text_en and 
> > text_de fieldTypes and fields.  See Solr's default schema.xml, it has many 
> > examples of various languages.
> >
> > Depending on what query parser you use, you need to make sure you search on 
> > both fields now.
> >
> > Regards,
> > Markus
> >
> > -Original message-
> >> From:TG Servers 
> >> Sent: Tuesday 27th February 2018 8:26
> >> To: solr-user@lucene.apache.org
> >> Subject: Question on other language than english stemmers and 
> >> using both
> >>
> >> Hi,
> >>
> >> I currently adapted this schema.xml for dovecot and Solr 7.2.1.
> >> Now this is stemming only english words.
> >> What do I have to do to use it for english AND german?
> >> Can I just put the according german filterfactorys appended to it or
> >> does that not work?
> >> E.g.
> >> ...
> >> 
> >> 
> >> 
> >> ...
> >>
> >> Thanks,
> >> Thomas
> >>
> >> Original schema :
> >>
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >>
> >> 
> >> 
> >> 
> >>  >> words="lang/stopwords_en.txt"/>
> >>  >> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >>  >> ignoreCase="true" expand="true"/>
> >>  >> words="lang/stopwords_en.txt"/>
> >>  >> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >>  >> required="true" />
> >>  >> required="true" />
> >>  >> required="true" />
> >>  >> required="true" />
> >>
> >> 
> >> 
> >>
> >> 
> >> 
> >> 
> >> 
> >> 
> >>
> >> 
> >> 
> >> 
> >>
> >> id
> >> 
> >>
> 
> 
> 


Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
The leader status is active. My main question is how I can change the
leader in SolrCloud.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread TG Servers
Ok, thank you. Sounds like a bit more reading into the whole thing. It's 
just a tool for me, so I didn't want to go too deep into it, but sometimes a 
must is a must. :) Default schema.xml? I just get this managed-schema file 
when installing. Do you mean that one?



On 27 February 2018 at 11:12:39 a.m., Markus Jelsma wrote:



Hello,

Mixing language specific filters in the same analyzer is not going to give 
predictable or desirable results. Instead, create separate text_en and 
text_de fieldTypes and fields.  See Solr's default schema.xml, it has many 
examples of various languages.


Depending on what query parser you use, you need to make sure you search on 
both fields now.


Regards,
Markus

-Original message-

From:TG Servers 
Sent: Tuesday 27th February 2018 8:26
To: solr-user@lucene.apache.org
Subject: Question on other language than english stemmers and 
using both


Hi,

I currently adapted this schema.xml for dovecot and Solr 7.2.1.
Now this is stemming only english words.
What do I have to do to use it for english AND german?
Can I just put the according german filterfactorys appended to it or
does that not work?
E.g.
...



...

Thanks,
Thomas

Original schema :


















































id







Re: Changing Leadership in SolrCloud

2018-02-27 Thread Amin Raeiszadeh
I don't understand your problem clearly, but the Solr admin UI has some bugs.
To check the state of your cloud nodes, use the CLUSTERSTATUS command:

/admin/collections?action=CLUSTERSTATUS

In some cases your command was executed but you can't see it in the admin UI.

On Tue, Feb 27, 2018 at 12:49 PM, Shawn Heisey  wrote:

> On 2/27/2018 1:36 AM, zahra121 wrote:
>
>> Suppose I have a node which is a leader in SolrCloud.
>>
>> When I block this leader's SolrCloud and Zookeeper ports by the command
>> "firewall-cmd --remove-port=/tcp --permanent", the leader does
>> not
>> change automatically and this leader status remains active in solr admin
>> UI.
>>
>> Thus, I decided to change the leader manually. I tried REBALANCELEADERS
>> and
>> ADDROLE commands in solrCloud, however the leader did not change!
>>
>
> I am not completely familiar with how SolrCloud handles down servers, but
> I don't think it proactively does any kind of "ping" to make sure they're
> still up.  Probably you would need to send a request that SolrCloud tries
> to send to the down server, so that the cluster can notice that Solr is
> down and change the clusterstate.
>
> ZK should be a lot more responsive to changes like that, because it DOES
> use a ping-like mechanism to see if servers are up.  Solr's admin UI does
> not have any visibility into which ZK server is the leader, though -- so
> you can't see the results of blocking a ZK server unless you look at the ZK
> log.
>
> Thanks,
> Shawn
>
>


RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
Hello,

Mixing language-specific filters in the same analyzer is not going to give 
predictable or desirable results. Instead, create separate text_en and text_de 
fieldTypes and fields. See Solr's default schema.xml; it has many examples of 
various languages.

Depending on what query parser you use, you need to make sure you search on 
both fields now.
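
A rough sketch of two such fieldTypes (filter chains trimmed down; the default
schema has fuller examples):

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.GermanLightStemFilterFactory"/>
  </analyzer>
</fieldType>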

Regards,
Markus
 
-Original message-
> From:TG Servers 
> Sent: Tuesday 27th February 2018 8:26
> To: solr-user@lucene.apache.org
> Subject: Question on other language than english stemmers and 
> using both
> 
> Hi,
> 
> I currently adapted this schema.xml for dovecot and Solr 7.2.1.
> Now this is stemming only english words.
> What do I have to do to use it for english AND german?
> Can I just put the according german filterfactorys appended to it or
> does that not work?
> E.g.
> ...
> 
> 
> 
> ...
> 
> Thanks,
> Thomas
> 
> Original schema :
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  words="lang/stopwords_en.txt"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
> 
> 
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>  words="lang/stopwords_en.txt"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
> 
> 
> 
> 
> 
>  required="true" />
>  required="true" />
>  required="true" />
>  required="true" />
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> id
> 
> 


Re: Changing Leadership in SolrCloud

2018-02-27 Thread Zahra Aminolroaya
Thanks Shawn for the reply. When I try to add a document to Solr I get a
"no route to host" exception. This means that SolrCloud is aware of the
blocked ports; however, ZooKeeper does not automatically change the leader!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches

2018-02-27 Thread Emir Arnautović
Hi Webster,
Since you are returning all hits, returning the last page is almost as heavy 
for Solr as returning all documents. Maybe you should consider just returning 
one large page, completely avoiding this issue.
I agree with you that this should be handled by Solr. ES solved this issue with 
a "preference" search parameter, where you can set a session id as the 
preference and it will stick to the same shards. I guess you could try a 
similar thing on your own, but that would require you to send a list of shards 
as a parameter for your search and balance it across different sessions.
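
A minimal sketch of that idea, assuming a two-shard collection (host and core
names are placeholders): pin one concrete replica per shard with the shards
parameter and reuse the same value for every page in a session:

shards=host1:8983/solr/coll_shard1_replica1,host2:8983/solr/coll_shard2_replica1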

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 26 Feb 2018, at 21:03, Webster Homer  wrote:
> 
> Erick,
> 
> No we didn't look at that. I will add it to the list. We have  not seen
> performance issues with solr. We have much slower technologies in our
> stack. This project was to replace a system that was too slow.
> 
> Thank you, I will look into it
> 
> Webster
> 
> On Mon, Feb 26, 2018 at 1:13 PM, Erick Erickson 
> wrote:
> 
>> Did you try enabling distributed IDF (statsCache)? See:
>> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
>> 
>> It's may not totally fix the issue, but it's worth trying. It does
>> come with a performance penalty of course.
>> 
>> Best,
>> Erick
>> 
>> On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer 
>> wrote:
>>> Thanks Shawn, I had settled on this as a solution.
>>> 
>>> All our use cases for Solr is to return results in order of relevancy to
>>> the query, so having a deterministic sort would defeat that purpose.
>> Since
>>> we wanted to be able to return all the results for a query, I originally
>>> looked at using the Streaming API, but that doesn't support returning
>>> results sorted by relevancy
>>> 
>>> I disagree with you about NRT replicas though. They may function as
>>> designed, but since they cannot guarantee consistent results their design
>>> is buggy, at least it is for a search engine.
>>> 
>>> 
>>> On Mon, Feb 26, 2018 at 12:20 PM, Shawn Heisey 
>> wrote:
>>> 
 On 2/26/2018 10:26 AM, Webster Homer wrote:
> We need the results by relevancy so the application sorts the results
>> by
> score desc, and the unique id ascending as the tie breaker
 
 This is the reason for the discrepancy, and why the different replica
 types don't have the same issue.
 
 Each NRT replica can have different deleted documents than the others,
 just due to the way that NRT replicas work.  Deleted documents affect
 relevancy scoring.  When one replica has say 5000 deleted documents and
 another has 200, or has 5000 but they're different docs, a relevancy
 sort can end up different.  So when Solr goes to one replica for page 1
 and another for page 2 (which is expected due to SolrCloud's internal
 load balancing), you may end up with duplicate documents or documents
 missing.  Because deleted documents are not counted or returned,
 numFound will be consistent, as long as the index doesn't change between
 the queries for pages.
 
 If you were using a deterministic sort rather than relevancy, this
 wouldn't be happening, because deleted documents have no influence on
 that kind of sort.
 
 With TLOG or PULL, the replicas are absolutely identical, so there is no
 difference, unless the index is changing as you page through the
>> results.
 
 I think changing replica types is the only solution here.  NRT replicas
 are working as they were designed -- there's no bug, even though
 problems like this do sometimes turn up.
 
 Thanks,
 Shawn
 
 
>>> 
>> 
> 

Re: Rename solrconfig.xml

2018-02-27 Thread Shawn Heisey

On 2/27/2018 12:59 AM, Zheng Lin Edwin Yeo wrote:

Regarding the core.properties, understand from the Solr guide that we need
to define the "config" properties first. However, my core.properties will
only be created when I create the collection from the command
http://localhost:8983/solr/admin/collections?action=CREATE&name=collection

The core.properties does not exists, and if I try to create one manually,
Solr will not read it, and it will still try to look for solrconfig.xml.

What should be the right way to create the core.properties?


If you're running SolrCloud, you'll very likely have to allow it to 
create all the cores in the collection, then go back and modify the 
core.properties files that get created, and reload the collection once 
they're all changed.  If this actually works, keep in mind that the 
renamed config file is going to be loaded from zookeeper, right where 
solrconfig.xml would normally exist.
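
As a sketch, the extra line in each core's core.properties would look
something like this (the file name is just an example):

config=myconfig.xml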


Specifying options remotely in core.properties can only be done with the 
CoreAdmin API, but this is not used when in Cloud mode. The Collections 
API actually *does* use the CoreAdmin API behind the scenes, but because 
its usage in SolrCloud is very much an expert-level task, you shouldn't 
use it directly.


The big question I have:  Why would you want to cause yourself 
difficulty by doing this?


Thanks,
Shawn



Re: Changing Leadership in SolrCloud

2018-02-27 Thread Shawn Heisey

On 2/27/2018 1:36 AM, zahra121 wrote:

Suppose I have a node which is a leader in SolrCloud.

When I block this leader's SolrCloud and Zookeeper ports by the command
"firewall-cmd --remove-port=/tcp --permanent", the leader does not
change automatically and this leader status remains active in solr admin UI.

Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
ADDROLE commands in solrCloud, however the leader did not change!


I am not completely familiar with how SolrCloud handles down servers, 
but I don't think it proactively does any kind of "ping" to make sure 
they're still up.  Probably you would need to send a request that 
SolrCloud tries to send to the down server, so that the cluster can 
notice that Solr is down and change the clusterstate.


ZK should be a lot more responsive to changes like that, because it DOES 
use a ping-like mechanism to see if servers are up.  Solr's admin UI 
does not have any visibility into which ZK server is the leader, though 
-- so you can't see the results of blocking a ZK server unless you look 
at the ZK log.


Thanks,
Shawn



Changing Leadership in SolrCloud

2018-02-27 Thread zahra121
Suppose I have a node which is a leader in SolrCloud.

When I block this leader's SolrCloud and Zookeeper ports by the command
"firewall-cmd --remove-port=/tcp --permanent", the leader does not
change automatically and this leader status remains active in solr admin UI.

Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
ADDROLE commands in solrCloud, however the leader did not change!

How can I manually change the leader if the firewall blocks the SolrCloud
ports from being listened on?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html