Re: Solr ping taking 600 seconds

2020-08-17 Thread Susheel Kumar
Yes, Alex. This is reproducible. Will check if we can run Wireshark.
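
For reference, a capture along these lines is what I have in mind (a sketch only; the interface name eth0 and port 8080 are assumptions based on the URLs in this thread):

tcpdump -i eth0 -w solr-ping.pcap host server1 and port 8080

The pcap can then be opened in Wireshark to see what happens at the packet level around a slow ping.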

Thank you.

On Mon, Aug 17, 2020 at 8:11 PM Alexandre Rafalovitch 
wrote:

> If this is reproducible, I would run Wireshark on the network and see what
> happens at packet level.
>
> Leaning towards firewall timing out and just starting to drop all packets.
>
> Regards,
>Alex
>
> On Mon., Aug. 17, 2020, 6:22 p.m. Susheel Kumar, 
> wrote:
>
> > Thanks for all the responses.
> >
> > Shawn - to your point, both ping and select intermittently take 600+
> > seconds to return; as you can see below, the 1st ping attempt was all
> > good and the 2nd took a long time. Similarly for select: a couple of
> > selects returned fine and then one suddenly took a long time. I'll try
> > to run select with shards.info to see if it is a problem with any
> > particular shard, but solr.log on many of the shards has QTime>600s
> > entries.
> >
> > Heap doesn't seem to be a problem, but I will take a look on all the
> > shards. I'll share top output as well.
> >
> > Thnx
> >
> >
> > Ping
> >
> > server65:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/admin/ping?distrib=true'
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> > <lst name="responseHeader"><bool name="zkConnected">true</bool><int name="status">0</int><int name="QTime">20</int><lst name="params"><str name="q">{!lucene}*:*</str><str name="distrib">true</str><str name="df">wordTokens</str><str name="preferLocalShards">false</str><str name="rows">10</str><str name="echoParams">all</str></lst></lst><str name="status">OK</str>
> > </response>
> > server65:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/admin/ping?distrib=true'
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> > <lst name="responseHeader"><bool name="zkConnected">true</bool><int name="status">0</int><int name="QTime">600123</int><lst name="params"><str name="q">{!lucene}*:*</str><str name="distrib">true</str><str name="df">wordTokens</str><str name="preferLocalShards">false</str><str name="rows">10</str><str name="echoParams">all</str></lst></lst><str name="status">OK</str>
> > </response>
> >
> > select
> >
> >
> > server67:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
> > {
> >   "responseHeader":{
> > "zkConnected":true,
> > "status":0,
> > "QTime":13,
> > "params":{
> >   "q":"*:*",
> >   "indent":"on",
> >   "rows":"0",
> >   "wt":"json"}},
> >   "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
> >   }}
> > server67:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
> > {
> >   "responseHeader":{
> > "zkConnected":true,
> > "status":0,
> > "QTime":10,
> > "params":{
> >   "q":"*:*",
> >   "indent":"on",
> >   "rows":"0",
> >   "wt":"json"}},
> >   "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
> >   }}
> > server67:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
> > {
> >   "responseHeader":{
> > "zkConnected":true,
> > "status":0,
> > "QTime":18,
> > "params":{
> >   "q":"*:*",
> >   "indent":"on",
> >   "rows":"0",
> >   "wt":"json"}},
> >   "response":{"numFound":63094900,"start":0,"maxScore":1.0,"docs":[]
> >   }}
> > server67:/home/kumar # curl --location --request GET '
> > http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
> > {
> >   "responseHeader":{
> > "zkConnected":true,
> > "status":0,
> > "QTime":600093,
> > "params":{
> >   "q":"*:*",
> >   "indent":"on",
> >   "rows":"0",
> >   "wt":"json"}},
> >   "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
> >   }}
> >
> > On Sat, Aug 15, 2020 at 1:41 PM Dominique Bejean <
> > dominique.bej...@eolya.fr>
> > wrote:
> >
> > > Hi,
> > >
> > > How long does it take to display the Solr console ?
> > > What about CPU and iowait with top ?

Re: Solr ping taking 600 seconds

2020-08-17 Thread Susheel Kumar
Thanks for all the responses.

Shawn - to your point, both ping and select intermittently take 600+ seconds
to return; as you can see below, the 1st ping attempt was all good and the
2nd took a long time. Similarly for select: a couple of selects returned
fine and then one suddenly took a long time. I'll try to run select with
shards.info to see if it is a problem with any particular shard, but
solr.log on many of the shards has QTime>600s entries.
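
For example (same host/collection as the queries below; shards.info=true just adds a per-shard timing section to the response):

curl 'http://server1:8080/solr/COLL/select?q=*:*&rows=0&shards.info=true&wt=json'

That should show which shard contributes the 600s QTime.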

Heap doesn't seem to be a problem, but I will take a look on all the shards.
I'll share top output as well.

Thnx


Ping

server65:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/admin/ping?distrib=true'


<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><bool name="zkConnected">true</bool><int name="status">0</int><int name="QTime">20</int><lst name="params"><str name="q">{!lucene}*:*</str><str name="distrib">true</str><str name="df">wordTokens</str><str name="preferLocalShards">false</str><str name="rows">10</str><str name="echoParams">all</str></lst></lst><str name="status">OK</str>
</response>

server65:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/admin/ping?distrib=true'


<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><bool name="zkConnected">true</bool><int name="status">0</int><int name="QTime">600123</int><lst name="params"><str name="q">{!lucene}*:*</str><str name="distrib">true</str><str name="df">wordTokens</str><str name="preferLocalShards">false</str><str name="rows">10</str><str name="echoParams">all</str></lst></lst><str name="status">OK</str>
</response>


select


server67:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":13,
"params":{
  "q":"*:*",
  "indent":"on",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
  }}
server67:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":10,
"params":{
  "q":"*:*",
  "indent":"on",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
  }}
server67:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":18,
"params":{
  "q":"*:*",
  "indent":"on",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":63094900,"start":0,"maxScore":1.0,"docs":[]
  }}
server67:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/select?indent=on&q=*:*&wt=json&rows=0'
{
  "responseHeader":{
"zkConnected":true,
    "status":0,
"QTime":600093,
"params":{
  "q":"*:*",
  "indent":"on",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":62221186,"start":0,"maxScore":1.0,"docs":[]
  }}

On Sat, Aug 15, 2020 at 1:41 PM Dominique Bejean 
wrote:

> Hi,
>
> How long does it take to display the Solr console ?
> What about CPU and iowait with top ?
>
> You should start by eliminating network issues between your Solr nodes, by
> testing with netcat on the Solr port.
> http://deice.daug.net/netcat_speed.html
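>
> For instance, a raw throughput check could look like this (a sketch; host
> and port 4444 are placeholders, and option spelling varies between netcat
> flavors):
>
> # on the receiving Solr node
> nc -l -p 4444 > /dev/null
> # on the sending node
> dd if=/dev/zero bs=1M count=1024 | nc receiver-host 4444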
>
> Dominique
>
> Le ven. 14 août 2020 à 23:40, Susheel Kumar  a
> écrit :
>
> > Hello,
> >
> > One of our Solr 6.6.2 DR clusters (target CDCR), which doesn't even have
> > any live search load, seems to be taking 600+ seconds many times for the
> > ping / health check calls. Has anyone seen this before, or any suggestion
> > as to what could be wrong? The collection has 8 shards / 3 replicas and
> > 64GB memory, and the index seems to fit in memory. Below are the solr log
> > entries.
> >
> > solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
> > s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
> > [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
> > params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
> > hits=62569458 status=0 QTime=600113
> > solr.log.26:2020-08-13 14:03:20.827 WARN  (qtp1775120226-46486) [c:COLL
> > s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
> > [COLL_shard1_replica1]  webapp=/solr path=/admin/ping
> > params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
> > hits=62569458 status=0 QTime=600113
> > solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
> > s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
> > [COLL_shard1_replica1]  webapp=/sol

Solr ping taking 600 seconds

2020-08-14 Thread Susheel Kumar
Hello,

One of our Solr 6.6.2 DR clusters (target CDCR), which doesn't even have any
live search load, seems to be taking 600+ seconds many times for the ping /
health check calls. Has anyone seen this before, or any suggestion as to
what could be wrong? The collection has 8 shards / 3 replicas and 64GB
memory, and the index seems to fit in memory. Below are the solr log entries.


solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
hits=62569458 status=0 QTime=600113
solr.log.26:2020-08-13 14:03:20.827 WARN  (qtp1775120226-46486) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
hits=62569458 status=0 QTime=600113
solr.log.26:2020-08-13 14:03:20.827 INFO  (qtp1775120226-46486) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
QTime=600113
solr.log.26:2020-08-13 14:03:20.827 WARN  (qtp1775120226-46486) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
QTime=600113
solr.log.38:2020-08-08 15:01:45.640 INFO  (qtp1775120226-46254) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
hits=62221186 status=0 QTime=600092
solr.log.38:2020-08-08 15:01:45.640 WARN  (qtp1775120226-46254) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
hits=62221186 status=0 QTime=600092
solr.log.38:2020-08-08 15:01:45.640 INFO  (qtp1775120226-46254) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
QTime=600092
solr.log.38:2020-08-08 15:01:45.640 WARN  (qtp1775120226-46254) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.SolrCore slow:
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2} status=0
QTime=600092
solr.log.39:2020-08-08 13:20:12.117 INFO  (qtp1775120226-46254) [c:COLL
s:shard1 r:core_node19 x:COLL_shard1_replica1] o.a.s.c.S.Request
[COLL_shard1_replica1]  webapp=/solr path=/admin/ping
params={distrib=true&_stateVer_=COLL:3032&wt=javabin&version=2}
hits=63094900 status=0 QTime=600095



server1:/home/kumar # curl --location --request GET '
http://server1:8080/solr/COLL/admin/ping?distrib=true'


<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><bool name="zkConnected">true</bool><int name="status">0</int><int name="QTime">600095</int><lst name="params"><str name="q">{!lucene}*:*</str><str name="distrib">true</str><str name="df">wordTokens</str><str name="preferLocalShards">false</str><str name="rows">10</str><str name="echoParams">all</str></lst></lst><str name="status">OK</str>
</response>



Re: How do *you* restrict access to Solr?

2020-03-16 Thread Susheel Kumar
Basic auth should help you get started:

https://lucene.apache.org/solr/guide/8_1/basic-authentication-plugin.html
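
For a quick start, the stock example from that page can be uploaded as security.json (a sketch; the credentials line is the guide's sample hash for user solr / password SolrRocks, and the ZK address is a placeholder):

cat > security.json <<'EOF'
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
EOF
bin/solr zk cp file:security.json zk:/security.json -z localhost:2181

With blockUnknown=true, all requests (including the admin UI) then require the solr user until you change the password.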

On Mon, Mar 16, 2020 at 10:44 AM Ryan W  wrote:

> How do you, personally, do it?  Do you use IPTables?  Basic Authentication
> Plugin? Something else?
>
> I'm asking in part so I'll have something to search for.  I don't know where
> I should begin, so I figured I would ask how others do it.
>
> I haven't been able to find anything that works, so if you can tell me what
> works for you, I can at least narrow it down a bit and do some Google
> searches.  Do I need to learn Solr's plugin system?  Am I starting in the
> right place if I follow this document:
>
> https://lucene.apache.org/solr/guide/7_0/rule-based-authorization-plugin.html#rule-based-authorization-plugin
>
> Initially, the above document seems far too comprehensive for my needs.  I
> just want to block access to the Solr admin UI, and the list of predefined
> permissions in that document don't seem to be relevant.  Also, it seems
> unlikely this plugin system is necessary just to control access to the
> admin UI... or maybe it is necessary?
>
> In any case, what is your approach?
>
> I'm using version 7.7.2 of Solr.
>
> Thanks!
>


Re: Unable to start solr server on "Ubuntu 18.04 bash shell on Windows 10"

2020-02-20 Thread Susheel Kumar
Check if the directories below have the correct permissions. The solr.log
file not being created implies some issue:

tail: cannot open
'/home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/solr/../logs/solr.log'

Solr home directory
/home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/solr
already exists.
/home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node2
already exists.
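
Something along these lines would confirm and fix it (paths taken from your output below; the chmod is only a sketch):

ls -ld /home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/logs
# if the logs dir is missing or not writable:
mkdir -p /home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/logs
chmod -R u+rwX /home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud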

On Thu, Feb 20, 2020 at 8:14 AM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi
> That seems to be the reason for Solr not starting:
>
> cannot open
>
> '/home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/solr/../logs/solr.log'
> for reading: No such file or directory
>
>
> > -Original Message-
> > From: Prabhat Awasthi [mailto:pawasthi.i...@gmail.com]
> > Sent: Wednesday, February 19, 2020 6:34 PM
> > To: solr-user@lucene.apache.org
> > Subject: Unable to start solr server on "Ubuntu 18.04 bash shell on
> Windows
> > 10"
> >
> > Hello,
> >
> > I am using Linux bash sell (Ubuntu app) on Windows 10 to run Solr on
> Ubuntu
> > 18.04.
> >
> > $ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description:    Ubuntu 18.04.2 LTS
> > Release:        18.04
> > Codename:   bionic
> >
> > I already installed Java8 (Openjdk) on my Ubuntu environment.
> >
> > $ java -version
> > openjdk version "1.8.0_242"
> > OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-
> > b08)
> > OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
> >
> > But I face an error when I try to start SolrCloud on my Ubuntu system.
> > Could you please give me some pointers if I am missing anything here?
> > Please find below the full logs.
> >
> > Thanks in advance.
> > - Prabhat
> >
> >
> --
> > -
> > $ bin/solr start -e cloud
> > *** [WARN] *** Your open file limit is currently 1024.
> >  It should be set to 65000 to avoid operational disruption.
> >  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
> false
> > in your profile or solr.in.sh
> > *** [WARN] ***  Your Max Processes Limit is currently 7823.
> >  It should be set to 65000 to avoid operational disruption.
> >  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to
> false
> > in your profile or solr.in.sh
> >
> > Welcome to the SolrCloud example!
> >
> > This interactive session will help you launch a SolrCloud cluster on
> your local
> > workstation.
> > To begin, how many Solr nodes would you like to run in your local
> cluster?
> > (specify 1-4 nodes) [2]:
> >
> > Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
> > Please enter the port for node1 [8983]:
> >
> > Please enter the port for node2 [7574]:
> >
> > Solr home directory
> > /home/pawasthi/projects/solr_practice/ex1/solr-
> > 8.4.1/example/cloud/node1/solr
> > already exists.
> > /home/pawasthi/projects/solr_practice/ex1/solr-
> > 8.4.1/example/cloud/node2
> > already exists.
> >
> > Starting up Solr on port 8983 using command:
> > "bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr"
> >
> > *** [WARN] ***  Your Max Processes Limit is currently 7823.
> >  It should be set to 65000 to avoid operational disruption.
> >  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
> > in your profile or solr.in.sh
> >
> > Waiting up to 180 seconds to see Solr running on port 8983 [|]  bin/solr:
> > line 664:   293 Aborted (core dumped) nohup "$JAVA"
> > "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -Dsolr.log.muteconsole "-
> > XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT
> > $SOLR_LOGS_DIR" -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
> > $SOLR_JETTY_ADDL_CONFIG > "$SOLR_LOGS_DIR/solr-$SOLR_PORT-
> > console.log" 2>&1
> >
> > [|]  Still not seeing Solr listening on 8983 after 180 seconds!
> > tail: cannot open
> > '/home/pawasthi/projects/solr_practice/ex1/solr-
> > 8.4.1/example/cloud/node1/solr/../logs/solr.log'
> > for reading: No such file or directory
> >
> > ERROR: Did not see Solr at http://localhost:8983/solr come online
> within 30
> >
> --
> > -
>
>


Re: Storage/Volume type for Kubernetes Solr POD?

2020-02-11 Thread Susheel Kumar
Thanks, Karl, for sharing. With local SSDs, would you be able to auto scale?
Is that correct?

On Fri, Feb 7, 2020 at 5:22 AM Nicolas PARIS 
wrote:

> hi all
>
> what about cephfs or lustre distrubuted filesystem for such purpose ?
>
>
> Karl Stoney  writes:
>
> > we personally run solr on google cloud kubernetes engine and each node
> has a 512Gb persistent ssd (network attached) storage which gives roughly
> this performance (read/write):
> >
> > Sustained random IOPS limit 15,360.00 15,360.00
> > Sustained throughput limit (MB/s) 245.76  245.76
> >
> > and we get very good performance.
> >
> > ultimately though it's going to depend on your workload
> > 
> > From: Susheel Kumar 
> > Sent: 06 February 2020 13:43
> > To: solr-user@lucene.apache.org 
> > Subject: Storage/Volume type for Kubernetes Solr POD?
> >
> > Hello,
> >
> > What type of storage/volume is recommended to run Solr in a Kubernetes
> > pod? I know in the past Solr had issues with NFS storing its indexes,
> > and it was not recommended.
> >
> >
> > https://kubernetes.io/docs/concepts/storage/volumes/
> >
> > Thanks,
> > Susheel
>
>
> --
> nicolas paris
>


Storage/Volume type for Kubernetes Solr POD?

2020-02-06 Thread Susheel Kumar
Hello,

What type of storage/volume is recommended to run Solr in a Kubernetes pod?
I know in the past Solr had issues with NFS storing its indexes, and it was
not recommended.

https://kubernetes.io/docs/concepts/storage/volumes/

Thanks,
Susheel


Multiple versions of same documents with different effective dates

2019-11-11 Thread Susheel Kumar
Hello,

I am trying to keep multiple versions of the same document (empId,
empName, deptID, effectiveDt, empTitle, ...) with different effective dates
(composite key: deptID, empID, effectiveDt), but mark / soft delete
(deleted=Y) the older ones and keep deleted=N for the latest one.

This way I can query the latest one (AND deleted=N) and, if required,
show all of them.

I am thinking of doing this in processAdd / ScriptUpdateProcessor: query
Solr first to see if there is an existing record with the same deptID and
empID, update those with deleted=Y, and then process the new one with
deleted=N (sketched below).
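
In plain update-API terms, the flip I have in mind amounts to something like this (hypothetical host, collection and ids; the second call is an atomic update):

# find the current latest version for a given dept/emp
curl 'http://localhost:8983/solr/emp/select?q=deptID:D100+AND+empID:E42+AND+deleted:N&fl=id&wt=json'
# flip it to deleted=Y before adding the new version with deleted=N
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/emp/update?commit=true' \
  -d '[{"id":"D100!E42!2019-11-01","deleted":{"set":"Y"}}]'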

Any suggestions or issues you see with this approach?

Thanks,
Susheel

P.S.  I need to figure out how to update another document at the same time
in processAdd.


Re: How do I index PDF and Word Doc

2019-09-23 Thread Susheel Kumar
Which collection are you trying to index?  Is it localDocs or books?

You can also try to run through the steps in exercise 1 at the above link to
post data to techproducts; in general it should work end to end:

solr-7.7.0:$ bin/post -c techproducts example/exampledocs/*

On Mon, Sep 23, 2019 at 3:31 PM Pasle Choix  wrote:

> Thank you Susheel,
>
> Here is what I see
> from /opt/solr-7.7.2/server/solr/configsets/_default/conf/solrconfig.xml:
>
>  startup="lazy"
>   class="solr.extraction.ExtractingRequestHandler" >
> 
>   true
>   ignored_
>   _text_
> 
>   
>
> Is there anything wrong with it and how to fix it?
>
> Thank you.
>
> Pasle Choix
>
>
>
> On Mon, Sep 23, 2019 at 2:09 PM Susheel Kumar 
> wrote:
>
>> Not sure which configuration you are using, but double-check solrconfig.xml
>> to have entries like below, and have the sr_mv_txt field below defined in
>> schema.xml for storing and indexing.
>>
>> <requestHandler name="/update/extract"
>>                 startup="lazy"
>>                 class="solr.extraction.ExtractingRequestHandler" >
>>   <lst name="defaults">
>>     <str name="lowernames">true</str>
>>     <str name="fmap.meta">ignored_</str>
>>     <str name="fmap.content">sr_mv_txt</str>
>>   </lst>
>> </requestHandler>
>>
>>
>> Thnx
>>
>>
>> On Thu, Sep 19, 2019 at 11:02 PM PasLe Choix 
>> wrote:
>>
>>> I am on Solr 7.7, according to the official document:
>>> https://lucene.apache.org/solr/guide/7_7/solr-tutorial.html
>>> Although it is mentioned Post Tool can index a directory of files, and
>>> can
>>> handle HTML, PDF, Office formats like Word, however no example working
>>> command is given.
>>>
>>> ./bin/post -c localDocs ~/Documents
>>> Error: Problem accessing /solr/books/update. Reason:
>>> Not Found
>>>
>>> or if I directly upload a pdf as Document through Admin GUI, I will get
>>> Unsupported ContentType: application/pdf Not in: [application/xml,
>>> application/csv, application/json, text/json, text/csv, text/xml,
>>> application/javabin]
>>>
>>> Can anyone please share the correct way to index on pdf/doc/docx, etc.?
>>> through both Admin GUI and command line.
>>>
>>> Thank you very much.
>>>
>>>
>>> Pasle Choix
>>>
>>


Re: How do I index PDF and Word Doc

2019-09-23 Thread Susheel Kumar
Not sure which configuration you are using, but double-check solrconfig.xml
to have entries like below, and have the sr_mv_txt field below defined in
schema.xml for storing and indexing.

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">sr_mv_txt</str>
  </lst>
</requestHandler>
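
Once that handler is in place, a single PDF can be posted to it directly, e.g. (collection name, id and file path are placeholders):

curl 'http://localhost:8983/solr/books/update/extract?literal.id=doc1&commit=true' \
  -F 'myfile=@/home/user/Documents/sample.pdf'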


Thnx


On Thu, Sep 19, 2019 at 11:02 PM PasLe Choix  wrote:

> I am on Solr 7.7, according to the official document:
> https://lucene.apache.org/solr/guide/7_7/solr-tutorial.html
> Although it is mentioned Post Tool can index a directory of files, and can
> handle HTML, PDF, Office formats like Word, however no example working
> command is given.
>
> ./bin/post -c localDocs ~/Documents
> Error: Problem accessing /solr/books/update. Reason:
> Not Found
>
> or if I directly upload a pdf as Document through Admin GUI, I will get
> Unsupported ContentType: application/pdf Not in: [application/xml,
> application/csv, application/json, text/json, text/csv, text/xml,
> application/javabin]
>
> Can anyone please share the correct way to index on pdf/doc/docx, etc.?
> through both Admin GUI and command line.
>
> Thank you very much.
>
>
> Pasle Choix
>


bi-directional CDCR

2019-06-11 Thread Susheel Kumar
Hello,

What does the statement below mean? How do we set which cluster will act as
Source or Target at a given time?

Both Cluster 1 and Cluster 2 can act as Source and Target at any given
point of time but a cluster cannot be both Source and Target at the same
time.

Also, following the directions mentioned on this page doesn't make CDCR
work: no data flows from cluster 1 to cluster 2. Solr is 7.7.1. Is there
something missing?
https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
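
For reference, the CDCR API calls involved would be along these lines (hosts are placeholders; START goes to whichever cluster is currently acting as Source, and STATUS/QUEUES help confirm whether updates are queued and flowing):

curl 'http://cluster1:8080/solr/COLL/cdcr?action=START'
curl 'http://cluster1:8080/solr/COLL/cdcr?action=STATUS'
curl 'http://cluster1:8080/solr/COLL/cdcr?action=QUEUES'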


Solr-Docker in Prod?

2019-04-02 Thread Susheel Kumar
Hello,

If you are running Solr in a Docker container in production, can you
please share your experience/tips/issues you came across with
performance/sharding/CDCR etc.?

Also, I assume you might have taken the image from Docker Hub, or do you
compose your own image? https://hub.docker.com/_/solr
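
For context, the plain official image starts like this (the tag is a placeholder):

docker run -d --name solr-test -p 8983:8983 solr:7.7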

Thanks,
Susheel


Re: Solr CDCR updating data in target and not in source

2019-02-21 Thread Susheel Kumar
Do you see CDCR forward messages in the source Solr logs, with some
numbers? That will confirm if data is indeed going through the source and
being forwarded to the target.
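
A quick way to check (log path is a placeholder):

grep -i cdcr /var/solr/logs/solr.log | grep -ic forward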

Also any auto/soft commit settings difference between source & target?

On Wed, Feb 20, 2019 at 8:29 AM ypriverol  wrote:

> Hi:
>
> I'm using the CDCR feature from Solr 7.1. My source SolrCloud cluster is 3
> shards and the target similarly 3 shards. When we create both clusters,
> push to the source, and then enable CDCR, the data is transferred nicely
> to the target. If we start adding records everything is fine.
>
> However, we have deleted ALL records in the source and started adding them
> all again with our pipelines (Spring Solr). Interestingly, all records
> appear in the target but not in the source. We have even stopped CDCR and
> the data continues transferring to the target and does not appear in the
> source, even though we are 100% sure we are inserting into the source.
>
> Any ideas?
>
> Regards
> Yasset
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: SolrCloud Replication Failure

2018-11-01 Thread Susheel Kumar
Are we saying it has something to do with stopping and restarting replicas?
Otherwise I haven't seen/heard of any issues with document updates and
forwarding to replicas...

Thanks,
Susheel

On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
wrote:

> So  this seems like it absolutely needs a JIRA
> On Thu, Nov 1, 2018 at 9:39 AM Kevin Risden  wrote:
> >
> > I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5
> locally
> > without docker. I still see the same behavior where the latest updates
> > aren't on the replicas. I still don't know what is happening but it
> happens
> > without Docker :(
> >
> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> >
> > Kevin Risden
> >
> >
> > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden  wrote:
> >
> > > Erick - Yea thats a fair point. Would be interesting to see if this
> fails
> > > without Docker.
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Kevin:
> > >>
> > >> You're also using Docker, right? Docker is not "officially" supported
> > >> although there's some movement in that direction and if this is only
> > >> reproducible in Docker than it's a clue where to look
> > >>
> > >> Erick
> > >> On Wed, Oct 31, 2018 at 7:24 PM
> > >> Kevin Risden
> > >>  wrote:
> > >> >
> > >> > I haven't dug into why this is happening but it definitely
> reproduces. I
> > >> > removed the local requirements (port mapping and such) from the
> gist you
> > >> > posted (very helpful). I confirmed this fails locally and on Travis
> CI.
> > >> >
> > >> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > >> >
> > >> > I don't even see the first update getting applied from num 10 -> 20.
> > >> After
> > >> > the first update there is no more change.
> > >> >
> > >> > Kevin Risden
> > >> >
> > >> >
> > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith 
> > >> wrote:
> > >> >
> > >> > > Thanks Erick, this is 7.5.0.
> > >> > > 
> > >> > > From: Erick Erickson 
> > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > >> > > To: solr-user
> > >> > > Subject: Re: SolrCloud Replication Failure
> > >> > >
> > >> > > What version of solr? This code was pretty much rewriten in 7.3
> IIRC
> > >> > >
> > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > >  We are currently running a moderately large instance of
> > >> standalone
> > >> > > > solr and are preparing to switch to solr cloud to help us scale
> > >> up.  I
> > >> > > have
> > >> > > > been running a number of tests using docker locally and ran
> into an
> > >> issue
> > >> > > > where replication is consistently failing.  I have pared down
> the
> > >> test
> > >> > > case
> > >> > > > as minimally as I could.  Here's a link for the
> docker-compose.yml
> > >> (I put
> > >> > > > it in a directory called solrcloud_simple) and a script to run
> the
> > >> test:
> > >> > > >
> > >> > > >
> > >> > > >
> https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > >> > > >
> > >> > > >
> > >> > > > Here's the basic idea behind the test:
> > >> > > >
> > >> > > >
> > >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard,
> and 2
> > >> > > > replicas (each node gets a replica).  Just use the default
> schema,
> > >> > > although
> > >> > > > I've also tried our schema and got the same result.
> > >> > > >
> > >> > > >
> > >> > > > 2) Shut down solr-2
> > >> > > >
> > >> > > >
> > >> > > > 3) Add 100 simple docs, just id and a field called num.
> > >> > > >
> > >> > > >
> > >> > > > 4) Start solr-2 and check that it received the documents.  It
> did!
> > >> > > >
> > >> > > >
> > >> > > > 5) Update a document, commit, and check that solr-2 received the
> > >> update.
> > >> > > > It did!
> > >> > > >
> > >> > > >
> > >> > > > 6) Stop solr-2, update the same document, start solr-2, and make
> > >> sure
> > >> > > that
> > >> > > > it received the update.  It did!
> > >> > > >
> > >> > > >
> > >> > > > 7) Repeat step 6 with a new value.  This time solr-2 reverts
> back
> > >> to what
> > >> > > > it had in step 5.
> > >> > > >
> > >> > > >
> > >> > > > I believe the main issue comes from this in the logs:
> > >> > > >
> > >> > > >
> > >> > > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > >> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > >> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
> > >> s:shard1
> > >> > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync
> PeerSync:
> > >> > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our
> > >> versions
> > >> > > are
> > >> > > > newer. ourHighThreshold=1615861330901729280
> > >> > > > otherLowThreshold=1615861314086764545
> ourHighest=1615861330901729280
> > >> > > > otherHighest=1615861335081353216
> > >> > > >
> > >> > > > PeerSync thinks the 

Re: ZookeeperServer not running/Client Session timed out

2018-10-23 Thread Susheel Kumar
Domain Validation skipping
write tests
[Fri Oct 19 12:58:57 2018] scsi target0:0:0: Ending Domain Validation
[Fri Oct 19 12:58:57 2018] scsi target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST
(25 ns, offset 127)
[Fri Oct 19 12:58:58 2018] scsi target0:0:1: Beginning Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:1: Domain Validation skipping
write tests
[Fri Oct 19 12:58:58 2018] scsi target0:0:1: Ending Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:1: FAST-40 WIDE SCSI 80.0 MB/s ST
(25 ns, offset 127)
[Fri Oct 19 12:58:58 2018] scsi target0:0:2: Beginning Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:2: Domain Validation skipping
write tests
[Fri Oct 19 12:58:58 2018] scsi target0:0:2: Ending Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:2: FAST-40 WIDE SCSI 80.0 MB/s ST
(25 ns, offset 127)
[Fri Oct 19 12:58:58 2018] scsi target0:0:3: Beginning Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:3: Domain Validation skipping
write tests
[Fri Oct 19 12:58:58 2018] scsi target0:0:3: Ending Domain Validation
[Fri Oct 19 12:58:58 2018] scsi target0:0:3: FAST-40 WIDE SCSI 80.0 MB/s ST
(25 ns, offset 127)


On Mon, Oct 22, 2018 at 10:15 PM Susheel Kumar 
wrote:

> Hi Shawn,
>
> Here is the link for the Solr GC log, and it doesn't look like a Solr GC
> problem. The total GC is 12 GB.  The GC log is from yesterday and the issue
> happened this morning, i.e. 10/22.
>
>
> https://www.dropbox.com/s/zdlu9sk8kc469ls/Screen%20Shot%202018-10-22%20at%2010.08.37%20PM.png?dl=0
>
>
> It may be a network issue, but just looking at the message "ZooKeeperServer
> not running", and that it later instantiates again, doesn't give any clue.
>
> Thnx
>
> On Mon, Oct 22, 2018 at 9:54 PM Shawn Heisey  wrote:
>
>> On 10/22/2018 7:32 PM, Susheel Kumar wrote:
>> > Hi Shawn, you meant ZK GC log correct?
>>
>> There was another potential cause I was thinking of, but when I got to
>> where I was going to list them in the previous message, I could not for
>> the life of me remember what the other one was.
>>
>> I just remembered:  This problem could be caused by severe network
>> connectivity issues between your servers.  A few dropped packets
>> probably isn't enough ... I think it would have to be a severe problem.
>>
>> Thanks,
>> Shawn
>>
>>


Re: ZookeeperServer not running/Client Session timed out

2018-10-22 Thread Susheel Kumar
Hi Shawn,

Here is the link for the Solr GC log, and it doesn't look like a Solr GC
problem. The total GC is 12 GB.  The GC log is from yesterday and the issue
happened this morning, i.e. 10/22.

https://www.dropbox.com/s/zdlu9sk8kc469ls/Screen%20Shot%202018-10-22%20at%2010.08.37%20PM.png?dl=0


It may be a network issue, but just looking at the message "ZooKeeperServer
not running", and that it later instantiates again, doesn't give any clue.

Thnx

On Mon, Oct 22, 2018 at 9:54 PM Shawn Heisey  wrote:

> On 10/22/2018 7:32 PM, Susheel Kumar wrote:
> > Hi Shawn, you meant ZK GC log correct?
>
> There was another potential cause I was thinking of, but when I got to
> where I was going to list them in the previous message, I could not for
> the life of me remember what the other one was.
>
> I just remembered:  This problem could be caused by severe network
> connectivity issues between your servers.  A few dropped packets
> probably isn't enough ... I think it would have to be a severe problem.
>
> Thanks,
> Shawn
>
>


Re: ZookeeperServer not running/Client Session timed out

2018-10-22 Thread Susheel Kumar
Hi Shawn, you meant the ZK GC log, correct?

Thnx

On Mon, Oct 22, 2018 at 7:03 PM Shawn Heisey  wrote:

> On 10/22/2018 3:31 PM, Susheel Kumar wrote:
> > Hello,
> >
> > I am seeing "ZooKeeperServer not running" WARN messages in zookeeper logs
> > which is causing the Solr client connections to timeout...
> >
> > What could be the problem?
> >
> > ZK: 3.4.10
> >
> > Zookeeper.out
> > ==
>
> For help with the ZK server log, you'll need to consult the ZooKeeper
> project.  The language in their log entries seems plain enough, but they
> will be able to tell you precisely what it means.
>
> > solr.log
> > 2018-10-22 10:02:21.466 WARN  (main-SendThread(srch0118:2182)) [   ]
> > o.a.z.ClientCnxn Client session timed out, have not heard from server in
> > 26675ms for sessionid 0x5665c67cb0d000b
>
> The ZK client in Solr hasn't heard from the ZK server in over 26
> seconds, so it considers that connection to have timed out, and will
> throw the connection away.  It should try again to establish a new
> connection ... but whatever's causing the problem will probably also
> affect the new connection.
>
> It's a particularly bad sign for the ZK connection to time out,
> especially on an interval like 26 seconds.  That's a REALLY long time
> for software like Solr and ZK.
>
> One of the things that can cause problems like this is having a heap
> that's too small, so Java must spend the majority of its time doing
> garbage collection, rather than running the program it's been asked to
> run.  There are sometimes other causes, but that is a very common cause.
>
> Can you share a garbage collection log from a time when these errors
> happen?  Solr should set up Java so that it creates a GC log.  You'll
> need to use a file sharing site (like Dropbox) -- email attachments
> almost never make it to the list.
>
> Thanks,
> Shawn
>
>


ZookeeperServer not running/Client Session timed out

2018-10-22 Thread Susheel Kumar
Hello,

I am seeing "ZooKeeperServer not running" WARN messages in zookeeper logs
which is causing the Solr client connections to timeout...

What could be the problem?

ZK: 3.4.10

Zookeeper.out
==
2018-10-22 06:04:51,071 [myid:2] - INFO
[WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message
format version), 5 (n.leader), 0xf0461 (n.zxid), 0x10 (n.round),
FOLLOWING (n.state), 4 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
2018-10-22 06:04:51,093 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.72.25.177:39514
2018-10-22 06:04:51,094 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,094 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.72.25.177:39514 (no session established for client)
2018-10-22 06:04:51,138 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.3.101.219:56298
2018-10-22 06:04:51,138 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,139 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.3.101.219:56298 (no session established for client)
2018-10-22 06:04:51,250 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.72.27.181:46414
2018-10-22 06:04:51,250 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,250 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.72.27.181:46414 (no session established for client)
2018-10-22 06:04:51,275 [myid:2] - INFO
[WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message
format version), 4 (n.leader), 0xf0461 (n.zxid), 0x192 (n.round),
LOOKING (n.state), 4 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
2018-10-22 06:04:51,275 [myid:2] - INFO
[WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message
format version), 4 (n.leader), 0xf0461 (n.zxid), 0x192 (n.round),
LOOKING (n.state), 2 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
2018-10-22 06:04:51,275 [myid:2] - INFO
[WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message
format version), 4 (n.leader), 0xf0461 (n.zxid), 0x192 (n.round),
LOOKING (n.state), 1 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
2018-10-22 06:04:51,309 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.72.5.212:38944
2018-10-22 06:04:51,309 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,309 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.72.5.212:38944 (no session established for client)
2018-10-22 06:04:51,356 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.72.7.201:59310
2018-10-22 06:04:51,356 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,356 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.72.7.201:59310 (no session established for client)
2018-10-22 06:04:51,402 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.3.101.219:56302
2018-10-22 06:04:51,402 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,402 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1044] - Closed socket connection for
client /192.3.101.219:56302 (no session established for client)
2018-10-22 06:04:51,467 [myid:2] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@192] - Accepted socket connection
from /192.72.7.205:46694
2018-10-22 06:04:51,468 [myid:2] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2182:NIOServerCnxn@373] - Exception causing close of
session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-10-22 06:04:51,468 

Re: checksum failed (hardware problem?)

2018-10-09 Thread Susheel Kumar
Exactly. I have a node with a checksum issue and it is alive, which is good
for us, since if it went down one of the shards would be down, and thus an
outage.

Yes, I agree that we don't get to know if there is a node having a checksum
issue, and that's where we are putting log monitoring in place, which will
alert if the "corrupt" or "checksum" keyword is found in the logs.

Thnx

On Mon, Oct 8, 2018 at 5:41 PM Stephen Bianamara 
wrote:

> Hi Susheel,
>
> Yes, I believe you are correct on fixing a node in place. My org actually
> just cycles instances rather than repair broken ones.
>
> It's too bad that there's nothing conclusive we can look for to help
> investigate the scope. We'd love to pin this down so that we could take
> something concrete to investigate to AWS if it's a hardware failure (e.g.,
> we found a log indicating ). I haven't been able to find anything which
> might clarify that matter outside of SOLR either. Perhaps it's just not
> realistic at this time.
>
> I'm also curious about another aspect, which is that the nodes don't report
> as unhealthy. Currently a node with a bad checksum will just stay in the
> collection forever. Shouldn't the node go to "down" if it has an
> irreparable checksum?
>
> On Fri, Oct 5, 2018 at 5:25 AM Susheel Kumar 
> wrote:
>
> > My understanding is that once the index is corrupt, the only way to fix
> > it is the CheckIndex utility, which will remove some bad segments; only
> > then can we use the index again.
> >
> > It is a bit scary that you see a similar error on 6.6.2, though in our
> > case we know we are going through some hardware problem which likely
> > would have caused the corruption; there is no concrete evidence which
> > can be used to confirm whether it is hardware or Solr/Lucene. Are you
> > able to use another AWS instance, similar to Simon's case?
> >
> > Thanks,
> > Susheel
> >
> > On Thu, Oct 4, 2018 at 7:11 PM Stephen Bianamara  >
> > wrote:
> >
> > > To be more concrete: Is the definitive test of whether or not a core's
> > > index is corrupt to copy it onto a new set of hardware and attempt to
> > write
> > > to it? If this is a definitive test, we can run the experiment and
> update
> > > the report so you have a sense of how often this happens.
> > >
> > > Since this is a SOLR cloud node, which is already removed but whose
> data
> > > dir was preserved, I believe I can just copy the data directory to a
> > fresh
> > > machine and start a regular non-cloud solr node hosting this core. Can
> > you
> > > please confirm that this will be a definitive test, or whether there is
> > > some aspect needed to make it definitive?
> > >
> > > Thanks!
> > >
> > > On Wed, Oct 3, 2018 at 2:10 AM Stephen Bianamara <
> sbianam...@panopto.com
> > >
> > > wrote:
> > >
> > > > Hello All --
> > > >
> > > > As it would happen, we've seen this error on version 6.6.2 very
> > recently.
> > > > This is also on an AWS instance, like Simon's report. The drive
> doesn't
> > > > show any sign of being unhealthy, either from cursory investigation.
> > > FWIW,
> > > > this occurred during a collection backup.
> > > >
> > > > Erick, is there some diagnostic data we can find to help pin this
> down?
> > > >
> > > > Thanks!
> > > > Stephen
> > > >
> > > > On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar <
> susheel2...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thank you, Simon. Which basically points that something related to
> env
> > > and
> > > >> was causing the checksum failures than any lucene/solr issue.
> > > >>
> > > >> Eric - I did check with hardware folks and they are aware of some
> > VMware
> > > >> issue where the VM hosted in HCI environment is coming into some
> halt
> > > >> state
> > > >> for minute or so and may be loosing connections to disk/network.  So
> > > that
> > > >> probably may be the reason of index corruption though they have not
> > been
> > > >> able to find anything specific from logs during the time Solr run
> into
> > > >> issue
> > > >>
> > > >> Also I had again issue where Solr is loosing the connection with
> > > zookeeper
> > > >> (Client session timed out, have not heard from server in 8367ms for
> > > >> sessionid 0x0

Re: checksum failed (hardware problem?)

2018-10-05 Thread Susheel Kumar
My understanding is that once the index is corrupt, the only way to fix it
is the CheckIndex utility, which will remove some bad segments; only then
can we use the index again.
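
For reference, CheckIndex is invoked along these lines (a sketch; the lucene-core jar path/version must match the install, and -exorcise permanently drops unreadable segments, so run it against a copy of the index first):

java -cp /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.6.2.jar \
  org.apache.lucene.index.CheckIndex /app/solr/data/COLL_shard1_replica1/data/index -exorcise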

It is a bit scary that you see a similar error on 6.6.2, though in our case
we know we are going through some hardware problem which likely would have
caused the corruption; there is no concrete evidence which can be used to
confirm whether it is hardware or Solr/Lucene. Are you able to use another
AWS instance, similar to Simon's case?

Thanks,
Susheel

On Thu, Oct 4, 2018 at 7:11 PM Stephen Bianamara 
wrote:

> To be more concrete: Is the definitive test of whether or not a core's
> index is corrupt to copy it onto a new set of hardware and attempt to write
> to it? If this is a definitive test, we can run the experiment and update
> the report so you have a sense of how often this happens.
>
> Since this is a SOLR cloud node, which is already removed but whose data
> dir was preserved, I believe I can just copy the data directory to a fresh
> machine and start a regular non-cloud solr node hosting this core. Can you
> please confirm that this will be a definitive test, or whether there is
> some aspect needed to make it definitive?
>
> Thanks!
>
> On Wed, Oct 3, 2018 at 2:10 AM Stephen Bianamara 
> wrote:
>
> > Hello All --
> >
> > As it would happen, we've seen this error on version 6.6.2 very recently.
> > This is also on an AWS instance, like Simon's report. The drive doesn't
> > show any sign of being unhealthy, either from cursory investigation.
> FWIW,
> > this occurred during a collection backup.
> >
> > Erick, is there some diagnostic data we can find to help pin this down?
> >
> > Thanks!
> > Stephen
> >
> > On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar 
> > wrote:
> >
> >> Thank you, Simon. Which basically points that something related to env
> and
> >> was causing the checksum failures than any lucene/solr issue.
> >>
> >> Eric - I did check with hardware folks and they are aware of some VMware
> >> issue where the VM hosted in HCI environment is coming into some halt
> >> state
> >> for minute or so and may be loosing connections to disk/network.  So
> that
> >> probably may be the reason of index corruption though they have not been
> >> able to find anything specific from logs during the time Solr run into
> >> issue
> >>
> >> Also I had again issue where Solr is loosing the connection with
> zookeeper
> >> (Client session timed out, have not heard from server in 8367ms for
> >> sessionid 0x0)  Does that points to similar hardware issue, Any
> >> suggestions?
> >>
> >> Thanks,
> >> Susheel
> >>
> >> 2018-09-29 17:30:44.070 INFO
> >> (searcherExecutor-7-thread-1-processing-n:server54:8080_solr
> >> x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4
> >> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore
> >> [COLL_shard4_replica2] Registered new searcher
> >> Searcher@7a4465b1[COLL_shard4_replica2]
> >>
> >>
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523)
> >> Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957)
> >> Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962)
> >> Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020)
> >> Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863)
> >> Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151)
> >> Uninverting(_825d(6.6.2):C707731/112410:delGen=3168)
> >> Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624)
> >> Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623)
> >> Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110)
> >> Uninverting(_h33i(6.6.2):c131276/40356:delGen=706)
> >> Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380)
> >> Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104)
> >> Uninverting(_h80h(6.6.2):c11927/3412:delGen=153)
> >> Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205)
> >> Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149)
> >> Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52)
> >> Uninverting(_h9z1(6.6.2):c9428/823:delGen=27)
> >> Uninverting(_h9v2(6.6.2):c933/33:delGen=12)
> >> Uninverting(_ha1c(6.6.2):c1056/1:delGen=1)
> >> Uninverting(_ha6i(6.6.2):c1883/124:delGen=8)
> >> Uninverting(_ha3x(6.6.2):c807/14:delGen=3)
> >> Uninverting(_ha47(6.6.2):c1229/133:delGen=6)
> >> Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279)
> >> Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.

Re: checksum failed (hardware problem?)

2018-09-30 Thread Susheel Kumar
Thank you, Simon. That basically points to something environment-related
causing the checksum failures rather than any Lucene/Solr issue.

Erick - I did check with the hardware folks and they are aware of some
VMware issue where a VM hosted in an HCI environment comes into some halt
state for a minute or so and may be losing connections to disk/network. So
that may be the reason for the index corruption, though they have not been
able to find anything specific in the logs from the time Solr ran into the
issue.

Also, I again had an issue where Solr loses the connection with ZooKeeper
(Client session timed out, have not heard from server in 8367ms for
sessionid 0x0). Does that point to a similar hardware issue? Any suggestions?

Thanks,
Susheel

2018-09-29 17:30:44.070 INFO
(searcherExecutor-7-thread-1-processing-n:server54:8080_solr
x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4
r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore
[COLL_shard4_replica2] Registered new searcher
Searcher@7a4465b1[COLL_shard4_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523)
Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957)
Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962)
Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020)
Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863)
Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151)
Uninverting(_825d(6.6.2):C707731/112410:delGen=3168)
Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624)
Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623)
Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110)
Uninverting(_h33i(6.6.2):c131276/40356:delGen=706)
Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380)
Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104)
Uninverting(_h80h(6.6.2):c11927/3412:delGen=153)
Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205)
Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149)
Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52)
Uninverting(_h9z1(6.6.2):c9428/823:delGen=27)
Uninverting(_h9v2(6.6.2):c933/33:delGen=12)
Uninverting(_ha1c(6.6.2):c1056/1:delGen=1)
Uninverting(_ha6i(6.6.2):c1883/124:delGen=8)
Uninverting(_ha3x(6.6.2):c807/14:delGen=3)
Uninverting(_ha47(6.6.2):c1229/133:delGen=6)
Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279)
Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.2):c338)
Uninverting(_hapu(6.6.2):c275) Uninverting(_hapv(6.6.2):C4/2:delGen=1)
Uninverting(_hapw(6.6.2):C5/2:delGen=1)
Uninverting(_hapx(6.6.2):C2/1:delGen=1)
Uninverting(_hapy(6.6.2):C2/1:delGen=1)
Uninverting(_hapz(6.6.2):C3/1:delGen=1)
Uninverting(_haq0(6.6.2):C6/3:delGen=1)
Uninverting(_haq1(6.6.2):C1)))}
2018-09-29 17:30:52.390 WARN
(zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server117:2182))
[   ] o.a.z.ClientCnxn Client session timed out, have not heard from
server in 8367ms for sessionid 0x0
2018-09-29 17:31:01.302 WARN
(zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server120:2182))
[   ] o.a.z.ClientCnxn Client session timed out, have not heard from
server in 8812ms for sessionid 0x0
2018-09-29 17:31:14.049 INFO
(zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
  ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper
reestablished.
2018-09-29 17:31:14.049 INFO
(zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
  ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing
core states after session expiration.
2018-09-29 17:31:14.051 INFO
(zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
  ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (16)
-> (15)
2018-09-29 17:31:14.144 INFO  (qtp834133664-520378) [c:COLL s:shard4
r:core_node8 x:COLL_shard4_replica2] o.a.s.c.S.Request
[COLL_shard4_replica2]  webapp=/solr path=/admin/ping
params={distrib=false&df=wordTokens&_stateVer_=COLL:1246&preferLocalShards=false&qt=/admin/ping&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://server54:8080/solr/COLL_shard4_replica2/|http://server53:8080/solr/COLL_shard4_replica1/&rows=10&version=2&q={!lucene}*:*&NOW=1538242274139&isShard=true&wt=javabin}
hits=4989979 status=0 QTime=0




On Wed, Sep 26, 2018 at 9:44 AM simon  wrote:

> I saw something like this a year ago which i reported as a possible bug  (
> https://issues.apache.org/jira/browse/SOLR-10840, which has  a full
> description and stack traces)
>
> This occurred very randomly on an AWS instance; moving the index directory
> to a different file system did not fix the problem Eventually I cloned our
> environment to a new AWS instance, which proved to be the solution. Why, I
> have no idea...
>
> -Simon
>
> On Mon, Sep 24, 2018 at 1:13 PM, Susheel Kumar 
> wrote:
>
> > Got it.

Re: SOLR Index Time Running Optimization

2018-09-26 Thread Susheel Kumar
Also, are you using Solr's Data Import Handler? That will be much slower
compared to writing your own little indexer which does indexing in batches
and with multiple threads.
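
As a rough illustration (hypothetical file layout: pre-split JSON batches, posted by 4 parallel workers, with a single commit at the end):

ls /data/batches/*.json | xargs -P4 -I{} \
  curl -s 'http://localhost:8983/solr/COLL/update' \
  -H 'Content-Type: application/json' --data-binary @{}
curl 'http://localhost:8983/solr/COLL/update?commit=true'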

On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:

> Hi, I know this is the shortest way, but have you tried adding more cores
> or CPU to your Solr instances? How big is your collection in terms of GB
> and number of documents?
>
> Ciao,
> Vincenzo
>
>
> > On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
> krizellemae.marti...@sas.com> wrote:
> >
> > Hi.
> >
> > Our SOLR full and delta import currently runs approximately 39 hours. I
> > would like to ask for your assistance on how we can shorten the 39-hour
> > run time, by any possible solution.
> > For SOLR version, we are using solr 5.3.1.
> >
> > Regards,
> > Krizelle Mae M. Hernandez
>


Re: checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Got it. I'll first have the hardware folks check, and if they don't
see/find anything suspicious then I'll return here.

Wondering if anybody has seen a similar error and was able to confirm
whether it was a hardware fault or not.

Thnx

On Mon, Sep 24, 2018 at 1:01 PM Erick Erickson 
wrote:

> Mind you it could _still_ be Solr/Lucene, but let's check the hardware
> first ;)
> On Mon, Sep 24, 2018 at 9:50 AM Susheel Kumar 
> wrote:
> >
> > Hi Erick,
> >
> > Thanks so much for your reply.  I'll now look mostly into possible
> > hardware issues rather than Solr/Lucene.
> >
> > Thanks again.
> >
> > On Mon, Sep 24, 2018 at 12:43 PM Erick Erickson  >
> > wrote:
> >
> > > There are several reasons this would "suddenly" start appearing.
> > > 1> Your disk went bad and some sector is no longer faithfully
> > > recording the bits. In this case the checksum will be wrong
> > > 2> You ran out of disk space sometime and the index was corrupted.
> > > This isn't really a hardware problem.
> > > 3> Your disk controller is going wonky and not reading reliably.
> > >
> > > The "possible hardware issue" message is to alert you that this is
> > > highly unusual and you should at least consider doing integrity
> > > checks on your disk before assuming it's a Solr/Lucene problem
> > >
> > > Best,
> > > Erick
> > > On Mon, Sep 24, 2018 at 9:26 AM Susheel Kumar 
> > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > I am still trying to understand the corrupt index exception we saw
> > > > in our logs. What does the "hardware problem" comment indicate here?
> > > > Does that mean it was most likely caused by a hardware issue?
> > > >
> > > > We never had this problem in last couple of months. The Solr is
> 6.6.2 and
> > > > ZK: 3.4.10.
> > > >
> > > > Please share your thoughts.
> > > >
> > > > Thanks,
> > > > Susheel
> > > >
> > > > Caused by: org.apache.lucene.index.CorruptIndexException: checksum
> > > > failed *(hardware
> > > > problem?)* : expected=db243d1a actual=7a00d3d2
> > > >
> > >
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs")
> > > > [slice=_i27s_Lucene50_0.tim])
> > > >
> > > > It suddenly started in the logs and before which there was no such
> error.
> > > > Searches & ingestions all seems to be working prior to that.
> > > >
> > > > 
> > > >
> > > > 2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL s:shard1
> > > > r:core_node1 x:COLL_shard1_replica1]
> > > > o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> update-script#processAdd:
> > > >
> newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
> > > > 2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL s:shard1
> > > > r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase
> > > > org.apache.solr.common.SolrException: Exception writing document id
> > > > G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
> to
> > > the
> > > > index; possible analysis error.
> > > > at
> > > >
> > >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > > > at
> > > >
> > >
> org.apache.solr.update.processor.StatelessScriptUpdateProcessor

Re: checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Hi Erick,

Thanks so much for your reply.  I'll now look mostly into any possible
hardware issues than Solr/Lucene.

Thanks again.

On Mon, Sep 24, 2018 at 12:43 PM Erick Erickson 
wrote:

> There are several reasons this would "suddenly" start appearing.
> 1> Your disk went bad and some sector is no longer faithfully
> recording the bits. In this case the checksum will be wrong
> 2> You ran out of disk space sometime and the index was corrupted.
> This isn't really a hardware problem.
> 3> Your disk controller is going wonky and not reading reliably.
>
> The "possible hardware issue" message is to alert you that this is
> highly unusual and you should at least consider doing integrity
> checks on your disk before assuming it's a Solr/Lucene problem
>
> Best,
> Erick
> On Mon, Sep 24, 2018 at 9:26 AM Susheel Kumar 
> wrote:
> >
> > Hello,
> >
> > I am still trying to understand the corrupt index exception we saw in our
> > logs. What does the hardware problem comment indicate here?  Does that
> > mean it was most likely caused by a hardware issue?
> >
> > We never had this problem in the last couple of months. The Solr is 6.6.2 and
> > ZK: 3.4.10.
> >
> > Please share your thoughts.
> >
> > Thanks,
> > Susheel
> >
> > Caused by: org.apache.lucene.index.CorruptIndexException: checksum
> > failed *(hardware
> > problem?)* : expected=db243d1a actual=7a00d3d2
> >
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs")
> > [slice=_i27s_Lucene50_0.tim])
> >
> > It suddenly started in the logs; before that there was no such error.
> > Searches & ingestions all seemed to be working prior to that.
> >
> > 
> >
> > 2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL s:shard1
> > r:core_node1 x:COLL_shard1_replica1]
> > o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> > newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
> > 2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL s:shard1
> > r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase
> > org.apache.solr.common.SolrException: Exception writing document id
> > G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US to
> the
> > index; possible analysis error.
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
> > at
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> > at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
> > at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > at
> >
> org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:380)
> > at
> >
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
> > at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> > at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
> > at
> >
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
> > at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
> > at
> >
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
> > at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> > at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
> > at
> >
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
> > at
> org.apache.

checksum failed (hardware problem?)

2018-09-24 Thread Susheel Kumar
Hello,

I am still trying to understand the corrupt index exception we saw in our
logs. What does the hardware problem comment indicate here?  Does that
mean it was most likely caused by a hardware issue?

We never had this problem in the last couple of months. The Solr is 6.6.2 and
ZK: 3.4.10.

Please share your thoughts.

Thanks,
Susheel

Caused by: org.apache.lucene.index.CorruptIndexException: checksum
failed *(hardware
problem?)* : expected=db243d1a actual=7a00d3d2
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs")
[slice=_i27s_Lucene50_0.tim])

It suddenly started in the logs; before that there was no such error.
Searches & ingestions all seemed to be working prior to that.



2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL s:shard1
r:core_node1 x:COLL_shard1_replica1]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL s:shard1
r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Exception writing document id
G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US to the
index; possible analysis error.
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:380)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at

Re: POSSIBLE RESOURCE LEAK- solr 6.6.2

2018-09-22 Thread Susheel Kumar
4(6.6.2):c664/36:delGen=10)
Uninverting(_hbba(6.6.2):c465/72:delGen=3)
Uninverting(_hbd1(6.6.2):c1224/1:delGen=1)
Uninverting(_hba6(6.6.2):c504/31:delGen=6)
Uninverting(_hbcq(6.6.2):c377/1:delGen=1)
Uninverting(_hbe7(6.6.2):c2163/1:delGen=1)
Uninverting(_hbdx(6.6.2):c506/3:delGen=1)
Uninverting(_hbfl(6.6.2):c71/1:delGen=1)
Uninverting(_hbfb(6.6.2):c62/1:delGen=1)
Uninverting(_hbed(6.6.2):C206/3:delGen=1)
Uninverting(_hbfm(6.6.2):C6/2:delGen=1)
Uninverting(_hbfn(6.6.2):C560/281:delGen=2)
Uninverting(_hbfo(6.6.2):C12/7:delGen=1)
Uninverting(_hbfp(6.6.2):C4/2:delGen=1)
Uninverting(_hbfq(6.6.2):C4/2:delGen=1)
Uninverting(_hbfs(6.6.2):C9/5:delGen=1)))}
2018-09-19 23:45:21.084 INFO
(searcherExecutor-7-thread-1-processing-n:server62:8080_solr
x:COLL_shard8_replica2 s:shard8 c:COLL r:core_node16) [c:COLL s:shard8
r:core_node16 x:COLL_shard8_replica2] o.a.s.c.QuerySenderListener
QuerySenderListener done.
2018-09-19 23:45:21.084 INFO
(searcherExecutor-7-thread-1-processing-n:server62:8080_solr
x:COLL_shard8_replica2 s:shard8 c:COLL r:core_node16) [c:COLL s:shard8
r:core_node16 x:COLL_shard8_replica2] o.a.s.c.SolrCore
[COLL_shard8_replica2] Registered new searcher
Searcher@711c5774[COLL_shard8_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_8j4y(6.6.2):C879512/181368:delGen=2714)
Uninverting(_8z2y(6.6.2):C801846/163505:delGen=4343)
Uninverting(_81f9(6.6.2):C774110/378113:delGen=2398)
Uninverting(_82ll(6.6.2):C769263/348530:delGen=2427)
Uninverting(_83ne(6.6.2):C848884/94597:delGen=2133)
Uninverting(_4l73(6.6.2):C814472/317397:delGen=2514)
Uninverting(_7yqo(6.6.2):C724552/169736:delGen=3155)
Uninverting(_852r(6.6.2):C707686/75054:delGen=3038)
Uninverting(_ejht(6.6.2):C725012/321262:delGen=3591)
Uninverting(_gxvh(6.6.2):c432789/82761:delGen=1078)
Uninverting(_gblg(6.6.2):c215311/113006:delGen=1782)
Uninverting(_h8mr(6.6.2):c92526/14080:delGen=277)
Uninverting(_h71f(6.6.2):c51823/8823:delGen=371)
Uninverting(_h1p2(6.6.2):c58771/20719:delGen=618)
Uninverting(_h8uz(6.6.2):c9264/2802:delGen=143)
Uninverting(_hb7e(6.6.2):c35846/429:delGen=27)
Uninverting(_h9m4(6.6.2):c12335/1407:delGen=114)
Uninverting(_haiz(6.6.2):c9691/1789:delGen=68)
Uninverting(_halr(6.6.2):c960/340:delGen=25)
Uninverting(_hb16(6.6.2):c1196/69:delGen=12)
Uninverting(_hb5q(6.6.2):c2112/867:delGen=7)
Uninverting(_hb6u(6.6.2):c648/65:delGen=8)
Uninverting(_hb74(6.6.2):c664/36:delGen=10)
Uninverting(_hbba(6.6.2):c465/72:delGen=3)
Uninverting(_hbd1(6.6.2):c1224/1:delGen=1)
Uninverting(_hba6(6.6.2):c504/31:delGen=6)
Uninverting(_hbcq(6.6.2):c377/1:delGen=1)
Uninverting(_hbe7(6.6.2):c2163/1:delGen=1)
Uninverting(_hbdx(6.6.2):c506/3:delGen=1)
Uninverting(_hbfl(6.6.2):c71/1:delGen=1)
Uninverting(_hbfb(6.6.2):c62/1:delGen=1)
Uninverting(_hbed(6.6.2):C206/3:delGen=1)
Uninverting(_hbfm(6.6.2):C6/2:delGen=1)
Uninverting(_hbfn(6.6.2):C560/281:delGen=2)
Uninverting(_hbfo(6.6.2):C12/7:delGen=1)
Uninverting(_hbfp(6.6.2):C4/2:delGen=1)
Uninverting(_hbfq(6.6.2):C4/2:delGen=1)
Uninverting(_hbfs(6.6.2):C9/5:delGen=1)))}

On Sat, Sep 22, 2018 at 2:20 AM Susheel Kumar  wrote:

> Hello,
>
> I noticed one of the replicas in Recovery Failed status, and after trying
> to recreate the replica / restart the node to recover, I see the below error
> in the solr log.  The leader for this shard8 replica2, i.e. replica1 @
> server61, seems to be fine and serving the queries.
>
> What does "SolrIndexWriter was not closed prior to
> finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!" in the log below
> indicate?
>
> 2018-09-22 06:00:13.686 ERROR (Finalizer) [   ] o.a.s.u.SolrIndexWriter Error 
> closing IndexWriter
> 2018-09-22 05:58:13.409 INFO  (qtp834133664-21) [c:COLL 
> s:shard8 r:core_node16 x:COLL_shard8_replica2] o.a.s.c.S.Request 
> [COLL_shard8_replica2]  webapp=/solr path=/update 
> params={update.distrib=FROMLEADER=cdcr-processor-chain=http://server61:8080/solr/COLL_shard8_replica1/=javabin=2}
>  status=0 QTime=5
> 2018-09-22 05:58:13.686 INFO  (qtp834133664-19) [c:COLL s:shard8 
> r:core_node16 x:COLL_shard8_replica2] o.a.s.c.S.Request 
> [COLL_shard8_replica2]  webapp=/solr path=/update 
> params={update.distrib=FROMLEADER&update.chain=cdcr-processor-chain&distrib.from=http://server61:8080/solr/COLL_shard8_replica1/&wt=javabin&version=2}
>  status=0 QTime=39
> 2018-09-22 05:58:19.338 ERROR 
> (recoveryExecutor-3-thread-3-processing-n:server62:8080_solr 
> x:COLL_shard8_replica2 s:shard8 c:COLL r:core_node16) [c:COLL s:shard8 
> r:core_node16 x:COLL_shard8_replica2] o.a.s.h.ReplicationHandler Index fetch 
> failed :org.apache.solr.common.SolrException: Index fetch failed :
> at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:598)
> at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:301)
> at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:400)
> at 
> org

POSSIBLE RESOURCE LEAK- solr 6.6.2

2018-09-22 Thread Susheel Kumar
Hello,

I noticed one of the replicas in Recovery Failed status, and after trying to
recreate the replica / restart the node to recover, I see the below error in
the solr log.  The leader for this shard8 replica2, i.e. replica1 @ server61,
seems to be fine and serving the queries.

What does "SolrIndexWriter was not closed prior to
finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!" in the log below indicate?

2018-09-22 06:00:13.686 ERROR (Finalizer) [   ]
o.a.s.u.SolrIndexWriter Error closing IndexWriter
2018-09-22 05:58:13.409 INFO  (qtp834133664-21) [c:COLL s:shard8 r:core_node16
x:COLL_shard8_replica2] o.a.s.c.S.Request [COLL_shard8_replica2]
webapp=/solr path=/update
params={update.distrib=FROMLEADER&update.chain=cdcr-processor-chain&distrib.from=http://server61:8080/solr/COLL_shard8_replica1/&wt=javabin&version=2}
status=0 QTime=5
2018-09-22 05:58:13.686 INFO  (qtp834133664-19) [c:COLL s:shard8
r:core_node16 x:COLL_shard8_replica2] o.a.s.c.S.Request
[COLL_shard8_replica2]  webapp=/solr path=/update
params={update.distrib=FROMLEADER&update.chain=cdcr-processor-chain&distrib.from=http://server61:8080/solr/COLL_shard8_replica1/&wt=javabin&version=2}
status=0 QTime=39
2018-09-22 05:58:19.338 ERROR
(recoveryExecutor-3-thread-3-processing-n:server62:8080_solr
x:COLL_shard8_replica2 s:shard8 c:COLL r:core_node16) [c:COLL s:shard8
r:core_node16 x:COLL_shard8_replica2] o.a.s.h.ReplicationHandler Index
fetch failed :org.apache.solr.common.SolrException: Index fetch failed
:
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:598)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:301)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:400)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:219)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:471)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:284)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.lucene.index.CorruptIndexException: codec header
mismatch: actual header=1997958933 vs expected header=1071082519
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica2/data/index/_8ktm.fnm")))
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:196)
at 
org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
at 
org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:117)
at 
org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1063)
at 
org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1075)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:118)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:257)
at 
org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:220)
at 
org.apache.solr.update.DefaultSolrCoreState.openIndexWriter(DefaultSolrCoreState.java:245)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:551)
... 12 more
Suppressed: org.apache.lucene.index.CorruptIndexException:
checksum failed (hardware problem?) : expected=e42fdf3e
actual=c0432e62
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica2/data/index/_8ktm.fnm")))
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462)
at
org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:171)
... 21 more

2018-09-22 05:58:19.338 ERROR
(recoveryExecutor-3-thread-3-processing-n:server62:8080_solr
x:COLL_shard8_replica2 s:shard8 c:COLL r:core_node16) [c:COLL s:shard8
r:core_node16 x:COLL_shard8_replica2] o.a.s.c.RecoveryStrategy Error
while trying to recover:org.apache.solr.common.SolrException:
Replication for recovery failed.
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:222)
at 

Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Susheel Kumar
I'd highly advise using the Java library (SolrJ) to connect to Solr rather
than .NET.  CloudSolrClient and the other classes take care of many things
when communicating with a SolrCloud cluster that has shards and replicas,
and if the .NET ports of SolrJ are not up to date or missing some of that
functionality (which I suspect), you may run into issues.

Thnx
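
A minimal SolrJ sketch of what that buys you (ZK addresses and collection
name are assumptions; the client reads cluster state from ZooKeeper and
routes requests to live replicas for you):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryExample {
  public static void main(String[] args) throws Exception {
    // ZK-aware client: no need to hardcode individual Solr node URLs
    CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("zk1:2181,zk2:2181,zk3:2181")
        .build();
    client.setDefaultCollection("mycoll");
    QueryResponse rsp = client.query(new SolrQuery("*:*").setRows(10));
    System.out.println("numFound=" + rsp.getResults().getNumFound());
    client.close();
  }
}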

On Mon, Sep 17, 2018 at 10:01 AM Jan Høydahl  wrote:

> > We are beginners to Apache SOLR, We need following clarifications from
> you.
> >
> >
> >
> > 1.  In SOLRCloud, how can we install more than one shard on a single
> > PC?
>
> You typically have one installation of Solr on each server. Then you can
> add a collection with multiple shards, specifying how many shards you wish
> when creating the collection, e.g.
>
> bin/solr create -c mycoll -shards 4
>
> Although possible, it is normally not advised to install multiple
> instances of Solr on the same server.
>
> > 2.  What is the maximum number of shards that can be added under one
> > SOLRCloud?
>
> There is no limit. You should find a good number based on the number of
> documents, the size of your data, the number of servers in your cluster,
> available RAM and disk size and the required performance.
>
> In practice you will guess the initial #shards and then benchmark a few
> different settings before you decide.
> Note that you can also adjust the number of shards as you go through
> CREATESHARD / SPLITSHARD APIs, so even if you start out with few shards you
> can grow later.
>
> > 3.  In my application there is no need of ACID properties, other than
> > this can I use SOLR as a Complete Database?
>
> You COULD, but Solr is not intended to be your primary data store. You
> should always design your system so that you can re-index all content from
> some source (does not need to be a database) when needed. There are several
> use cases for a complete re-index that you should consider.
>
> > 4.  In Which OS we can feel the better performance, Windows Server
> OS /
> > Linux?
>
> I'd say Linux if you can. If you HAVE to, then you could also run on
> Windows :-)
>
> > 5.  If a SOLR Core contains 2 Billion indexes, what is the
> recommended
> > RAM size and Java heap space for better performance?
>
> It depends. It is not likely that you will ever put 2bn docs in one single
> core (Lucene itself caps a single core at about 2.14 billion documents).
> Normally you would have sharded long before that number.
> The amount of physical RAM and the amount of Java heap to allocate to Solr
> must be calculated and decided on a per case basis.
> You could also benchmark this - test if a larger RAM size improves
> performance due to caching. Depending on your bottlennecks, adding more RAM
> may be a way to scale further before needing to add more servers.
>
> Sounds like you should consult with a Solr expert to dive deep into your
> exact usecase and architect the optimal setup for your case, if you have
> these amounts of data.
>
> > 6.  I have 20 fields per document; what is the maximum number of
> > documents that can be inserted / retrieved in a single request?
>
> No limit. But there are practical limits.
> For indexing (update), attempt various batch sizes and find which gives
> the best performance for you. It is just as important to do inserts
> (updates) in many parallell connections as in large batches.
>
> For searching, why would you want to know a maximum? Normally the usecase
> for search is to get TOP N docs, not a maximum number?
> If you need to retrieve thousands of results, you should have a look at
> /export handler and/or streaming expressions.
>
> > 7.   If I have billions of documents, and the "start" parameter is the
> > 10 millionth document with "end" at start+100, will any performance
> > issue be raised?
>
> Don't do it!
> This is a warning sign that you are using Solr in a wrong way.
>
> If you need to scroll through all docs in the index, have a look at
> streaming expressions or cursorMark instead!
>
> > 8.  Which .net client is best for SOLR?
>
> The only I'm aware of is SolrNET. There may be others. None of them are
> supported by the Solr project.
>
> > 9.  Is there any limitation on a single field, I mean on the size of
> > blob data?
>
> I think there is some default cutoff for very large values.
>
> Why would you want to put very large blobs into documents?
> This is a warning flag that you may be using the search index in a wrong
> way. Consider storing large blobs outside of the search index and reference
> them from the docs.
>
>
> In general, it would help a lot if you start telling us WHAT you intend to
> use Solr for, what you try to achieve, what performance goals/requirements
> you have etc, instead of a lot of very specific max/min questions. There
> are very seldom hard limits, and if there are, it is usually not a good
> idea to approach them :)
>
> Jan
>
>
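
To make the SPLITSHARD and cursorMark suggestions above concrete, two
sketches (host, collection and sort field are assumptions):

# grow an existing collection by splitting one shard into two
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'

# page through all documents without deep start= offsets; each response
# returns a nextCursorMark to pass as cursorMark in the next request
# (the sort must include the uniqueKey field)
curl 'http://localhost:8983/solr/mycoll/select?q=*:*&rows=100&sort=id+asc&cursorMark=*'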


Re: Corrupt Index error on Target cluster

2018-09-09 Thread Susheel Kumar
Thanks. I have 6.6.2.  Do you remember the exact minor version on which you
ran into the corrupt index?  I did fix it using CheckIndex.
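
For the delete-and-re-add route Stephen suggests below, the collections API
calls would look roughly like this (host, collection, shard and replica
names are assumptions; the re-added replica resyncs its index from the
shard leader):

curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=COLL&shard=shard8&replica=core_node16'
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=COLL&shard=shard8&node=server62:8080_solr'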

On Sat, Sep 8, 2018 at 2:00 AM Stephen Bianamara 
wrote:

> Hmm, when this occurred for me I was also on 6.6 between minor releases. So
> unclear if it's connected to 6.6 specifically.
>
> If you want to resolve the problem, you should be able to use the
> collections API to delete that node from the collection and then re-add it,
> which will trigger a resync.
>
>
> On Fri, Sep 7, 2018, 10:35 AM Susheel Kumar  wrote:
>
> > No. The Solr I have is 6.6.
> >
> > On Fri, Sep 7, 2018 at 10:51 AM Stephen Bianamara <
> > sdl1tinsold...@gmail.com>
> > wrote:
> >
> > > I've gotten incorrect checksums when upgrading solr versions across the
> > > cluster. Or in other words, when indexing into a mixed version cluster.
> > Are
> > > you running mixed versions by chance?
> > >
> > > On Fri, Sep 7, 2018, 6:07 AM Susheel Kumar 
> > wrote:
> > >
> > > > Does anyone have insight into / has anyone faced the above errors?
> > > >
> > > > On Thu, Sep 6, 2018 at 12:04 PM Susheel Kumar  >
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We had a running cluster with CDCR and there were some issues with
> > > > > indexing on the Source cluster which were resolved after restarting
> > > > > the nodes (in my absence...), and now I see the below errors on a
> > > > > shard of the Target cluster.  Any suggestions / ideas on what could
> > > > > have caused this and what's the best way to recover?
> > > > >
> > > > > Thnx
> > > > >
> > > > > Caused by: org.apache.solr.common.SolrException: Error opening new
> > > > searcher
> > > > > at
> > > > > org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
> > > > > at
> > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
> > > > > at
> > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
> > > > > at
> > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > > > > at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
> > > > > ... 34 more
> > > > > Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted
> > > > > bitsPerDocBase: 6033
> > > > >
> > > >
> > >
> >
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
> > > > > at
> > > > > org.apache.lucene.index.SegmentReader.<init>(SegmentReader.jav

Re: Corrupt Index error on Target cluster

2018-09-07 Thread Susheel Kumar
No. The Solr I have is 6.6.

On Fri, Sep 7, 2018 at 10:51 AM Stephen Bianamara 
wrote:

> I've gotten incorrect checksums when upgrading solr versions across the
> cluster. Or in other words, when indexing into a mixed version cluster. Are
> you running mixed versions by chance?
>
> On Fri, Sep 7, 2018, 6:07 AM Susheel Kumar  wrote:
>
> > Does anyone have insight into / has anyone faced the above errors?
> >
> > On Thu, Sep 6, 2018 at 12:04 PM Susheel Kumar 
> > wrote:
> >
> > > Hello,
> > >
> > > We had a running cluster with CDCR and there were some issues with
> > > indexing on the Source cluster which were resolved after restarting the
> > > nodes (in my absence...), and now I see the below errors on a shard of
> > > the Target cluster.  Any suggestions / ideas on what could have caused
> > > this and what's the best way to recover?
> > >
> > > Thnx
> > >
> > > Caused by: org.apache.solr.common.SolrException: Error opening new
> > searcher
> > > at
> > > org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
> > > at
> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
> > > at
> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
> > > at
> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
> > > at
> > >
> >
> org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
> > > at
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
> > > at
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> > > at
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > > at
> > >
> >
> org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
> > > ... 34 more
> > > Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted
> > > bitsPerDocBase: 6033
> > >
> >
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> > > at
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
> > > at
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
> > > at
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
> > > at
> > >
> >
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
> > > at
> > > org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
> > > at
> > >
> >
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
> > > at
> > >
> >
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
> > > at
> > >
> >
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:103)
> > > at
> > > org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:467)
> > > at
> > > org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:103)
> > > at
> > > org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:79)
> > > at
> > >
> >
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:39)
> > > at
> > > org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2033)
> > > ... 43 more
> > > Suppressed: org.apache.lucene.index.CorruptIndexException:
> > > checksum failed (hardware problem?) : expected=e5bf0d15 actual=21722825
> > >
> >
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> > > at
> > > org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
> > > at
> > > org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462)
> > > at
> > >
> >
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:131)
> > > ... 54 more
> > >
> >
>


Re: Corrupt Index error on Target cluster

2018-09-07 Thread Susheel Kumar
Does anyone have insight into / has anyone faced the above errors?

On Thu, Sep 6, 2018 at 12:04 PM Susheel Kumar  wrote:

> Hello,
>
> We had a running cluster with CDCR and there were some issues with
> indexing on the Source cluster which were resolved after restarting the
> nodes (in my absence...), and now I see the below errors on a shard of the
> Target cluster.  Any suggestions / ideas on what could have caused this and
> what's the best way to recover?
>
> Thnx
>
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
> at
> org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at
> org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
> ... 34 more
> Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted
> bitsPerDocBase: 6033
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> at
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
> at
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
> at
> org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
> at
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
> at
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
> at
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
> at
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
> at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:103)
> at
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:467)
> at
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:103)
> at
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:79)
> at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:39)
> at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2033)
> ... 43 more
> Suppressed: org.apache.lucene.index.CorruptIndexException:
> checksum failed (hardware problem?) : expected=e5bf0d15 actual=21722825
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
> at
> org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
> at
> org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462)
> at
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:131)
> ... 54 more
>


Re: Multi word searching is not working getting random search results

2018-09-06 Thread Susheel Kumar
How about searching with "Intermodal Schedules" (plural), and trying phrase
slop for better control over the relevancy ordering?

https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
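
For instance, a request along these lines (collection name, field name and
slop value are assumptions) pushes pages containing both words near each
other to the top:

http://test-servername/solr/collection/select?defType=edismax&q=intermodal+schedules&qf=text&pf=text&ps=2

Here pf re-ranks documents in which the whole query appears as a phrase in
the text field, and ps allows up to 2 intervening positions.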


On Thu, Sep 6, 2018 at 12:10 PM Muddapati, Jagadish <
jagadish.muddap...@nscorp.com> wrote:

> Label: newbie
> Environment:
> I am currently running solr on Linux platform.
>
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.5"
>
> openjdk version "1.8.0_181"
>
> AEM version: 6.2
>
> I recently integrated Solr with AEM, and when I search for multiple words
> the search results come back in a seemingly random order.
>
> Search words: Intermodal schedule
> Results: Solr first displays results related to "Intermodal", and after a
> few pages I see pages related to the term "schedule", randomly. I am not
> getting results matching both words at the top. For example, I am not
> seeing results like the [Terminals & Schedules | Intermodal | Shipping
> Options ...] page at the start; I get random results instead, and that
> page only shows up after the first 40 results.
>
> Here is the query on browser URL:
>
> http://test-servername/content/nscorp/en/search-results.html?start=0=Intermodal+Schedule
> <
> http://servername/content/nscorp/en/search-results.html?start=0=Intermodal+Schedule
> >
>
> I am using solr version 7.4
>
> Thanks,
> Jagadish M.
>
>
>


Corrupt Index error on Target cluster

2018-09-06 Thread Susheel Kumar
Hello,

We had a running cluster with CDCR and there were some issues with indexing
on the Source cluster which were resolved after restarting the nodes (in my
absence...), and now I see the below errors on a shard of the Target cluster.
Any suggestions / ideas on what could have caused this and what's the best
way to recover?

Thnx

Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
at
org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
... 34 more
Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted
bitsPerDocBase: 6033
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
at
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:103)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:467)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:103)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:79)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:39)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2033)
... 43 more
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum
failed (hardware problem?) : expected=e5bf0d15 actual=21722825
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:131)
... 54 more


Re: indexing two words, searching single word

2018-08-03 Thread Susheel Kumar
and, as you suggested, apply the stop-word filtering before the shingles...
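
For example, with outputUnigrams="true" and tokenSeparator="", indexing
"sound stage" yields the tokens

  sound, soundstage, stage

so a query for "soundstage" now matches the document.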

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV 
wrote:

> <analyzer type="index">
>   <tokenizer .../>
>   <filter class="solr.ShingleFilterFactory"
>     outputUnigrams="true" tokenSeparator=""/>
> </analyzer>
>
> seems to "work"
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV 
> Gesendet: Freitag, 3. August 2018 13:46
> An: solr-user@lucene.apache.org
> Betreff: AW: indexing two words, searching single word
>
> >Because you probably are not looking for "andthe" kind of tokens
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
> sounds interesting. How would that filter-chain look? Would that be
> a type="index" analyzer?
> I guess we could shingle after stop-word-filtering, and I guess
> maxShingleSize="2" would suffice
>
> -Ursprüngliche Nachricht-
> Von: Alexandre Rafalovitch 
> Gesendet: Freitag, 3. August 2018 13:33
> An: solr-user 
> Betreff: Re: indexing two words, searching single word
>
> But what is your generic problem then? Because you probably are not
> looking for "andthe" kind of tokens.
>
> However a shingle plus regex to remove whitespace can give you "anytwo
> wordstogether smooshed" tokens in the index.
>
> Regards,
>  Alex
>
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, 
> wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic
> > solution ...
> >
> > Is it "ok" to apply an NGRamFilter for query-analyzing?
> > <analyzer type="query">
> >   <tokenizer .../>
> >   <filter class="solr.NGramFilterFactory" minGramSize="..."
> >     maxGramSize="15" />
> > </analyzer>
> >
> > I guess (besides the performance impact) this reduces search results
> > accuracy?
> >
> > -Clemens
> >
> > -Ursprüngliche Nachricht-
> > Von: Markus Jelsma 
> > Gesendet: Freitag, 3. August 2018 12:43
> > An: solr-user@lucene.apache.org
> > Betreff: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English you could use synonyms to work around the
> > problem of the few compound words of the language. However, would you
> > be dealing with a Germanic compound language, the
> > HyphenationCompoundWordTokenFilter
> > [1] or DictionaryCompoundWordTokenFilter are a better choice. The
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> >
> > https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> >
> >
> > -Original message-
> > > From:Clemens Wyss DEV 
> > > Sent: Friday 3rd August 2018 12:22
> > > To: solr-user@lucene.apache.org
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
> > >
> >
>


Re: Solr fails even ZK quorum has majority

2018-07-24 Thread Susheel Kumar
Thank you, Shalin.

Here is the Jira: https://issues.apache.org/jira/browse/SOLR-12585

On Mon, Jul 23, 2018 at 11:21 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Can you please open a Jira issue? I don't think we handle DNS problems very
> well during startup. Thanks.
>
> On Tue, Jul 24, 2018 at 2:31 AM Susheel Kumar 
> wrote:
>
> > Something messed up with DNS, which resulted in an unknown host exception
> > for one of the machines in our env and caused Solr to throw the above
> > exception.
> >
> >  Eric,  I have Solr configured using the service installation script, with
> > the ZK_HOST entry in solr.in.sh set to
> > ZK_HOST="server1:2181,server2:2181,server3:2181/collection"
> > and after removing server1 from that list I was able to start Solr;
> > otherwise it kept throwing the above exception.
> >
> > Thnx
> >
> >
> > On Mon, Jul 23, 2018 at 4:20 PM, Erick Erickson  >
> > wrote:
> >
> > > And how do you start Solr? Do you use the entire 3-node ensemble
> address?
> > >
> > > On Mon, Jul 23, 2018 at 12:55 PM, Michael Braun 
> > wrote:
> > > > Per the exception, this looks like a network / DNS resolution issue,
> > > > independent of Solr and Zookeeper code:
> > > >
> > > > Caused by: org.apache.solr.common.SolrException:
> > > > java.net.UnknownHostException: ditsearch001.es.com: Name or service
> > not
> > > > known
> > > >
> > > > Is this address actually resolvable at the time?
> > > >
> > > > On Mon, Jul 23, 2018 at 3:46 PM, Susheel Kumar <
> susheel2...@gmail.com>
> > > > wrote:
> > > >
> > > >> Usually when one Zookeeper goes down while the other 2 are up,
> > > >> Solr continues to operate, but when one of the ZK machines was not
> > > >> reachable, with ping returning the results below, Solr couldn't start.
> > > >> See stack trace below
> > > >>
> > > >> ping: cannot resolve ditsearch001.es.com: Unknown host
> > > >>
> > > >>
> > > >> Setup: Solr 6.6.2 and Zookeeper 3.4.10
> > > >>
> > > >> I had to remove this server name from the ZK_HOST list (solr.in.sh) in
> > > >> order to get Solr started. Ideally, whatever the issue, as long as a
> > > >> majority of the ensemble is up, Solr should start.
> > > >>
> > > >> Has anyone noticed this issue?
> > > >>
> > > >> Thnx
> > > >>
> > > >> 2018-07-23 15:30:47.218 INFO  (main) [   ] o.e.j.s.Server
> > > >> jetty-9.3.14.v20161028
> > > >>
> > > >> 2018-07-23 15:30:47.817 INFO  (main) [   ]
> o.a.s.s.SolrDispatchFilter
> > > ___
> > > >> _   Welcome to Apache Solr™ version 6.6.2
> > > >>
> > > >> 2018-07-23 15:30:47.829 INFO  (main) [   ]
> o.a.s.s.SolrDispatchFilter
> > /
> > > __|
> > > >> ___| |_ _   Starting in cloud mode on port 8080
> > > >>
> > > >> 2018-07-23 15:30:47.830 INFO  (main) [   ]
> o.a.s.s.SolrDispatchFilter
> > > \__
> > > >> \/ _ \ | '_|  Install dir: /opt/solr
> > > >>
> > > >> 2018-07-23 15:30:47.861 INFO  (main) [   ]
> o.a.s.s.SolrDispatchFilter
> > > >> |___/\___/_|_|Start time: 2018-07-23T15:30:47.832Z
> > > >>
> > > >> 2018-07-23 15:30:47.863 INFO  (main) [   ]
> o.a.s.s.StartupLoggingUtils
> > > >> Property solr.log.muteconsole given. Muting ConsoleAppender named
> > > CONSOLE
> > > >>
> > > >> 2018-07-23 15:30:47.929 INFO  (main) [   ]
> o.a.s.c.SolrResourceLoader
> > > Using
> > > >> system property solr.solr.home: /app/solr/data
> > > >>
> > > >> 2018-07-23 15:30:48.037 ERROR (main) [   ]
> o.a.s.s.SolrDispatchFilter
> > > Could
> > > >> not start Solr. Check solr/home property and the logs
> > > >>
> > > >> 2018-07-23 15:30:48.235 ERROR (main) [   ] o.a.s.c.SolrCore
> > > >> null:org.apache.solr.common.SolrException: Error occurred while
> > loading
> > > >> solr.xml from zookeeper
> > > >>
> > > >> at
> > > >> org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(
> > > >> SolrDis

Re: Solr fails even ZK quorum has majority

2018-07-23 Thread Susheel Kumar
Something messed up with DNS, which resulted in an unknown host exception for
one of the machines in our env and caused Solr to throw the above exception.

 Eric,  I have Solr configured using the service installation script, with
the ZK_HOST entry in solr.in.sh set to
ZK_HOST="server1:2181,server2:2181,server3:2181/collection"
and after removing server1 from that list I was able to start Solr; otherwise
it kept throwing the above exception.

Thnx
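
When this happens again, a quick way to check each ensemble member
(hostnames and port are assumptions; ruok is one of ZooKeeper's four-letter
admin commands, and a healthy server answers imok):

for h in server1 server2 server3; do
  getent hosts $h || echo "$h does not resolve"
  echo ruok | nc $h 2181; echo
done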


On Mon, Jul 23, 2018 at 4:20 PM, Erick Erickson 
wrote:

> And how do you start Solr? Do you use the entire 3-node ensemble address?
>
> On Mon, Jul 23, 2018 at 12:55 PM, Michael Braun  wrote:
> > Per the exception, this looks like a network / DNS resolution issue,
> > independent of Solr and Zookeeper code:
> >
> > Caused by: org.apache.solr.common.SolrException:
> > java.net.UnknownHostException: ditsearch001.es.com: Name or service not
> > known
> >
> > Is this address actually resolvable at the time?
> >
> > On Mon, Jul 23, 2018 at 3:46 PM, Susheel Kumar 
> > wrote:
> >
> >> Usually when one Zookeeper goes down while the other 2 are up,
> >> Solr continues to operate, but when one of the ZK machines was not
> >> reachable, with ping returning the results below, Solr couldn't start.
> >> See stack trace below
> >>
> >> ping: cannot resolve ditsearch001.es.com: Unknown host
> >>
> >>
> >> Setup: Solr 6.6.2 and Zookeeper 3.4.10
> >>
> >> I had to remove this server name from the ZK_HOST list (solr.in.sh) in
> >> order to get Solr started. Ideally, whatever the issue, as long as a
> >> majority of the ensemble is up, Solr should start.
> >>
> >> Has anyone noticed this issue?
> >>
> >> Thnx
> >>
> >> 2018-07-23 15:30:47.218 INFO  (main) [   ] o.e.j.s.Server
> >> jetty-9.3.14.v20161028
> >>
> >> 2018-07-23 15:30:47.817 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> ___
> >> _   Welcome to Apache Solr™ version 6.6.2
> >>
> >> 2018-07-23 15:30:47.829 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
> __|
> >> ___| |_ _   Starting in cloud mode on port 8080
> >>
> >> 2018-07-23 15:30:47.830 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> \__
> >> \/ _ \ | '_|  Install dir: /opt/solr
> >>
> >> 2018-07-23 15:30:47.861 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> >> |___/\___/_|_|Start time: 2018-07-23T15:30:47.832Z
> >>
> >> 2018-07-23 15:30:47.863 INFO  (main) [   ] o.a.s.s.StartupLoggingUtils
> >> Property solr.log.muteconsole given. Muting ConsoleAppender named
> CONSOLE
> >>
> >> 2018-07-23 15:30:47.929 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> Using
> >> system property solr.solr.home: /app/solr/data
> >>
> >> 2018-07-23 15:30:48.037 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter
> Could
> >> not start Solr. Check solr/home property and the logs
> >>
> >> 2018-07-23 15:30:48.235 ERROR (main) [   ] o.a.s.c.SolrCore
> >> null:org.apache.solr.common.SolrException: Error occurred while loading
> >> solr.xml from zookeeper
> >>
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(
> >> SolrDispatchFilter.java:270)
> >>
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(
> >> SolrDispatchFilter.java:242)
> >>
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.init(
> >> SolrDispatchFilter.java:173)
> >>
> >> at
> >> org.eclipse.jetty.servlet.FilterHolder.initialize(
> FilterHolder.java:137)
> >>
> >> at
> >> org.eclipse.jetty.servlet.ServletHandler.initialize(
> >> ServletHandler.java:873)
> >>
> >> at
> >> org.eclipse.jetty.servlet.ServletContextHandler.startContext(
> >> ServletContextHandler.java:349)
> >>
> >> at
> >> org.eclipse.jetty.webapp.WebAppContext.startWebapp(
> >> WebAppContext.java:1404)
> >>
> >> at
> >> org.eclipse.jetty.webapp.WebAppContext.startContext(
> >> WebAppContext.java:1366)
> >>
> >> at
> >> org.eclipse.jetty.server.handler.ContextHandler.
> >> doStart(ContextHandler.java:778)
> >>
> >> at
> >> org.eclipse.jetty.servlet.ServletContextHandler.doStart(
> >> ServletContextHandler.java:262)
> >>
> >>  

Solr fails even ZK quorum has majority

2018-07-23 Thread Susheel Kumar
Usually when one Zookeeper goes down while the other 2 are up, Solr continues
to operate, but when one of the ZK machines was not reachable, with ping
returning the results below, Solr couldn't start.  See stack trace below

ping: cannot resolve ditsearch001.es.com: Unknown host


Setup: Solr 6.6.2 and Zookeeper 3.4.10

I had to remove this server name from the ZK_HOST list (solr.in.sh) in
order to get Solr started. Ideally, whatever the issue, as long as a majority
of the ensemble is up, Solr should start.

Has anyone noticed this issue?

Thnx

2018-07-23 15:30:47.218 INFO  (main) [   ] o.e.j.s.Server
jetty-9.3.14.v20161028

2018-07-23 15:30:47.817 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___
_   Welcome to Apache Solr™ version 6.6.2

2018-07-23 15:30:47.829 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __|
___| |_ _   Starting in cloud mode on port 8080

2018-07-23 15:30:47.830 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__
\/ _ \ | '_|  Install dir: /opt/solr

2018-07-23 15:30:47.861 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
|___/\___/_|_|Start time: 2018-07-23T15:30:47.832Z

2018-07-23 15:30:47.863 INFO  (main) [   ] o.a.s.s.StartupLoggingUtils
Property solr.log.muteconsole given. Muting ConsoleAppender named CONSOLE

2018-07-23 15:30:47.929 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using
system property solr.solr.home: /app/solr/data

2018-07-23 15:30:48.037 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could
not start Solr. Check solr/home property and the logs

2018-07-23 15:30:48.235 ERROR (main) [   ] o.a.s.c.SolrCore
null:org.apache.solr.common.SolrException: Error occurred while loading
solr.xml from zookeeper

at
org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(SolrDispatchFilter.java:270)

at
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:242)

at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:173)

at
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:137)

at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873)

at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)

at
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1404)

at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1366)

at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)

at
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)

at
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:520)

at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)

at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)

at
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)

at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)

at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)

at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)

at
org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)

at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)

at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)

at
org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)

at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)

at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)

at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)

at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)

at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)

at
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:561)

at
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:236)

at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)

at
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)

at org.eclipse.jetty.server.Server.start(Server.java:422)

at
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)

at
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)

at org.eclipse.jetty.server.Server.doStart(Server.java:389)

at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)

at
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1516)

at 

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
I think so. I used the exact config as in the github repo.

[fieldType definition pasted from the CJKFoldingFilter README; the XML was
stripped by the mail archive]

On Fri, Jul 20, 2018 at 10:12 AM, Amanda Shuman 
wrote:

> Thanks! That does indeed look promising... This can be added on top of
> Smart Chinese, right? Or is it an alternative?
>
>
> --
> Dr. Amanda Shuman
> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> <http://www.maoistlegacy.uni-freiburg.de/>
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 4925
>
>
> On Fri, Jul 20, 2018 at 3:11 PM, Susheel Kumar 
> wrote:
>
> > I think CJKFoldingFilter will work for you.  I put 舊小說 in index and then
> > each of A, B or C or D in query and they seems to be matching and CJKFF
> is
> > transforming the 舊 to 旧
> >
> > On Fri, Jul 20, 2018 at 9:08 AM, Susheel Kumar 
> > wrote:
> >
> > > Lack of my chinese language knowledge but if you want, I can do quick
> > test
> > > for you in Analysis tab if you can give me what to put in index and
> query
> > > window...
> > >
> > > On Fri, Jul 20, 2018 at 8:59 AM, Susheel Kumar 
> > > wrote:
> > >
> > >> Have you tried to use CJKFoldingFilter
> > >> https://github.com/sul-dlss/CJKFoldingFilter.  I am not sure if this would
> cover
> > >> your use case but I am using this filter and so far no issues.
> > >>
> > >> Thnx
> > >>
> > >> On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman <
> amanda.shu...@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> Thanks, Alex - I have seen a few of those links but never considered
> > >>> transliteration! We use lucene's Smart Chinese analyzer. The issue is
> > >>> basically what is laid out in the old blogspot post, namely this
> point:
> > >>>
> > >>>
> > >>> "Why approach CJK resource discovery differently?
> > >>>
> > >>> 2.  Search results must be as script agnostic as possible.
> > >>>
> > >>> There is more than one way to write each word. "Simplified"
> characters
> > >>> were
> > >>> emphasized for printed materials in mainland China starting in the
> > 1950s;
> > >>> "Traditional" characters were used in printed materials prior to the
> > >>> 1950s,
> > >>> and are still used in Taiwan, Hong Kong and Macau today.
> > >>> Since the characters are distinct, it's as if Chinese materials are
> > >>> written
> > >>> in two scripts.
> > >>> Another way to think about it:  every written Chinese word has at
> least
> > >>> two
> > >>> completely different spellings.  And it can be mix-n-match:  a word
> can
> > >>> be
> > >>> written with one traditional  and one simplified character.
> > >>> Example:   Given a user query 舊小說  (traditional for old fiction), the
> > >>> results should include matches for 舊小說 (traditional) and 旧小说
> > (simplified
> > >>> characters for old fiction)"
> > >>>
> > >>> So, using the example provided above, we are dealing with materials
> > >>> produced in the 1950s-1970s that do even weirder things like:
> > >>>
> > >>> A. 舊小說
> > >>>
> > >>> can also be
> > >>>
> > >>> B. 旧小说 (all simplified)
> > >>> or
> > >>> C. 旧小說 (first character simplified, last character traditional)
> > >>> or
> > >>> D. 舊小 说 (first character traditional, last character simplified)
> > >>>
> > >>> Thankfully the middle character was never simplified in recent times.
> > >>>
> > >>> From a historical standpoint, the mixed nature of the characters in
> the
> > >>> same word/phrase is because not all simplified characters were
> adopted
> > at
> > >>> the same time by everyone uniformly (good times...).
> > >>>
> > >>> The problem seems to be that Solr can easily handle A or B above, but
> > >>> NOT C
> > >>> or D using the Smart Chinese analyzer. I'm not really sure how to
> > change
> > >>> that at this point... maybe I should figure out how to contact the
> > >>> creators
> > >>> of the analyzer and 

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
I think CJKFoldingFilter will work for you. I put 舊小說 in the index and then
each of A, B, C, or D in the query, and they all seem to match; CJKFF is
transforming the 舊 to 旧.

On Fri, Jul 20, 2018 at 9:08 AM, Susheel Kumar 
wrote:

> Lack of my chinese language knowledge but if you want, I can do quick test
> for you in Analysis tab if you can give me what to put in index and query
> window...
>
> On Fri, Jul 20, 2018 at 8:59 AM, Susheel Kumar 
> wrote:
>
> >> Have you tried to use CJKFoldingFilter
> >> https://github.com/sul-dlss/CJKFoldingFilter.  I am not sure if this would cover
>> your use case but I am using this filter and so far no issues.
>>
>> Thnx
>>
>> On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman 
>> wrote:
>>
>>> Thanks, Alex - I have seen a few of those links but never considered
>>> transliteration! We use lucene's Smart Chinese analyzer. The issue is
>>> basically what is laid out in the old blogspot post, namely this point:
>>>
>>>
>>> "Why approach CJK resource discovery differently?
>>>
>>> 2.  Search results must be as script agnostic as possible.
>>>
>>> There is more than one way to write each word. "Simplified" characters
>>> were
>>> emphasized for printed materials in mainland China starting in the 1950s;
>>> "Traditional" characters were used in printed materials prior to the
>>> 1950s,
>>> and are still used in Taiwan, Hong Kong and Macau today.
>>> Since the characters are distinct, it's as if Chinese materials are
>>> written
>>> in two scripts.
>>> Another way to think about it:  every written Chinese word has at least
>>> two
>>> completely different spellings.  And it can be mix-n-match:  a word can
>>> be
>>> written with one traditional  and one simplified character.
>>> Example:   Given a user query 舊小說  (traditional for old fiction), the
>>> results should include matches for 舊小說 (traditional) and 旧小说 (simplified
>>> characters for old fiction)"
>>>
>>> So, using the example provided above, we are dealing with materials
>>> produced in the 1950s-1970s that do even weirder things like:
>>>
>>> A. 舊小說
>>>
>>> can also be
>>>
>>> B. 旧小说 (all simplified)
>>> or
>>> C. 旧小說 (first character simplified, last character traditional)
>>> or
>>> D. 舊小 说 (first character traditional, last character simplified)
>>>
>>> Thankfully the middle character was never simplified in recent times.
>>>
>>> From a historical standpoint, the mixed nature of the characters in the
>>> same word/phrase is because not all simplified characters were adopted at
>>> the same time by everyone uniformly (good times...).
>>>
>>> The problem seems to be that Solr can easily handle A or B above, but
>>> NOT C
>>> or D using the Smart Chinese analyzer. I'm not really sure how to change
>>> that at this point... maybe I should figure out how to contact the
>>> creators
>>> of the analyzer and ask them?
>>>
>>> Amanda
>>>
>>> --
>>> Dr. Amanda Shuman
>>> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
>>> <http://www.maoistlegacy.uni-freiburg.de/>
>>> PhD, University of California, Santa Cruz
>>> http://www.amandashuman.net/
>>> http://www.prchistoryresources.org/
>>> Office: +49 (0) 761 203 4925
>>>
>>>
>>> On Fri, Jul 20, 2018 at 1:40 PM, Alexandre Rafalovitch <
>>> arafa...@gmail.com>
>>> wrote:
>>>
>>> > This is probably your start, if not read already:
>>> > https://lucene.apache.org/solr/guide/7_4/language-analysis.html
>>> >
>>> > Otherwise, I think your answer would be somewhere around using ICU4J,
>>> > IBM's library for dealing with Unicode: http://site.icu-project.org/
>>> > (mentioned on the same page above)
>>> > Specifically, transformations:
>>> > http://userguide.icu-project.org/transforms/general
>>> >
>>> > With that, maybe you map both alphabets into latin. I did that once
>>> > for Thai for a demo:
>>> > https://github.com/arafalov/solr-thai-test/blob/master/
>>> > collection1/conf/schema.xml#L34
>>> >
>>> > The challenge is to figure out all the magic rules for that. You'd
>>> > have to dig through the ICU documentation and other web pages. I found

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
My Chinese language knowledge is lacking, but if you want I can do a quick
test for you in the Analysis tab if you give me what to put in the index and
query windows...

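The same check also works outside the UI via the field analysis handler, for
example (collection and field type names here are placeholders):

curl "http://host:8983/solr/COLL/analysis/field?analysis.fieldtype=text_cjk&analysis.fieldvalue=舊小說&analysis.query=旧小说"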
On Fri, Jul 20, 2018 at 8:59 AM, Susheel Kumar 
wrote:

> Have you tried to use CJKFoldingFilter https://github.com/sul-dlss/
> CJKFoldingFilter.  I am not sure if this would cover your use case but I
> am using this filter and so far no issues.
>
> Thnx
>
> On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman 
> wrote:
>
>> Thanks, Alex - I have seen a few of those links but never considered
>> transliteration! We use lucene's Smart Chinese analyzer. The issue is
>> basically what is laid out in the old blogspot post, namely this point:
>>
>>
>> "Why approach CJK resource discovery differently?
>>
>> 2.  Search results must be as script agnostic as possible.
>>
>> There is more than one way to write each word. "Simplified" characters
>> were
>> emphasized for printed materials in mainland China starting in the 1950s;
>> "Traditional" characters were used in printed materials prior to the
>> 1950s,
>> and are still used in Taiwan, Hong Kong and Macau today.
>> Since the characters are distinct, it's as if Chinese materials are
>> written
>> in two scripts.
>> Another way to think about it:  every written Chinese word has at least
>> two
>> completely different spellings.  And it can be mix-n-match:  a word can be
>> written with one traditional  and one simplified character.
>> Example:   Given a user query 舊小說  (traditional for old fiction), the
>> results should include matches for 舊小說 (traditional) and 旧小说 (simplified
>> characters for old fiction)"
>>
>> So, using the example provided above, we are dealing with materials
>> produced in the 1950s-1970s that do even weirder things like:
>>
>> A. 舊小說
>>
>> can also be
>>
>> B. 旧小说 (all simplified)
>> or
>> C. 旧小說 (first character simplified, last character traditional)
>> or
>> D. 舊小 说 (first character traditional, last character simplified)
>>
>> Thankfully the middle character was never simplified in recent times.
>>
>> From a historical standpoint, the mixed nature of the characters in the
>> same word/phrase is because not all simplified characters were adopted at
>> the same time by everyone uniformly (good times...).
>>
>> The problem seems to be that Solr can easily handle A or B above, but NOT
>> C
>> or D using the Smart Chinese analyzer. I'm not really sure how to change
>> that at this point... maybe I should figure out how to contact the
>> creators
>> of the analyzer and ask them?
>>
>> Amanda
>>
>> --
>> Dr. Amanda Shuman
>> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
>> <http://www.maoistlegacy.uni-freiburg.de/>
>> PhD, University of California, Santa Cruz
>> http://www.amandashuman.net/
>> http://www.prchistoryresources.org/
>> Office: +49 (0) 761 203 4925
>>
>>
>> On Fri, Jul 20, 2018 at 1:40 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> wrote:
>>
>> > This is probably your start, if not read already:
>> > https://lucene.apache.org/solr/guide/7_4/language-analysis.html
>> >
>> > Otherwise, I think your answer would be somewhere around using ICU4J,
>> > IBM's library for dealing with Unicode: http://site.icu-project.org/
>> > (mentioned on the same page above)
>> > Specifically, transformations:
>> > http://userguide.icu-project.org/transforms/general
>> >
>> > With that, maybe you map both alphabets into latin. I did that once
>> > for Thai for a demo:
>> > https://github.com/arafalov/solr-thai-test/blob/master/
>> > collection1/conf/schema.xml#L34
>> >
>> > The challenge is to figure out all the magic rules for that. You'd
>> > have to dig through the ICU documentation and other web pages. I found
>> > this one for example:
>> > http://avajava.com/tutorials/lessons/what-are-the-system-
>> > transliterators-available-with-icu4j.html;jsessionid=
>> > BEAB0AF05A588B97B8A2393054D908C0
>> >
>> > There is also 12 part series on Solr and Asian text processing, though
>> > it is a bit old now: http://discovery-grindstone.blogspot.com/
>> >
>> > Hope one of these things help.
>> >
>> > Regards,
>> >Alex.
>> >
>> >
>> > On 20 July 2018 at 03:54, Amanda Shuman 
>> wrote:
>> > > Hi all,
>

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
Have you tried using CJKFoldingFilter
(https://github.com/sul-dlss/CJKFoldingFilter)? I am not sure if it would
cover your use case, but I am using this filter and so far have had no issues.

Thnx

On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman 
wrote:

> Thanks, Alex - I have seen a few of those links but never considered
> transliteration! We use lucene's Smart Chinese analyzer. The issue is
> basically what is laid out in the old blogspot post, namely this point:
>
>
> "Why approach CJK resource discovery differently?
>
> 2.  Search results must be as script agnostic as possible.
>
> There is more than one way to write each word. "Simplified" characters were
> emphasized for printed materials in mainland China starting in the 1950s;
> "Traditional" characters were used in printed materials prior to the 1950s,
> and are still used in Taiwan, Hong Kong and Macau today.
> Since the characters are distinct, it's as if Chinese materials are written
> in two scripts.
> Another way to think about it:  every written Chinese word has at least two
> completely different spellings.  And it can be mix-n-match:  a word can be
> written with one traditional  and one simplified character.
> Example:   Given a user query 舊小說  (traditional for old fiction), the
> results should include matches for 舊小說 (traditional) and 旧小说 (simplified
> characters for old fiction)"
>
> So, using the example provided above, we are dealing with materials
> produced in the 1950s-1970s that do even weirder things like:
>
> A. 舊小說
>
> can also be
>
> B. 旧小说 (all simplified)
> or
> C. 旧小說 (first character simplified, last character traditional)
> or
> D. 舊小 说 (first character traditional, last character simplified)
>
> Thankfully the middle character was never simplified in recent times.
>
> From a historical standpoint, the mixed nature of the characters in the
> same word/phrase is because not all simplified characters were adopted at
> the same time by everyone uniformly (good times...).
>
> The problem seems to be that Solr can easily handle A or B above, but NOT C
> or D using the Smart Chinese analyzer. I'm not really sure how to change
> that at this point... maybe I should figure out how to contact the creators
> of the analyzer and ask them?
>
> Amanda
>
> --
> Dr. Amanda Shuman
> Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> 
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 4925
>
>
> On Fri, Jul 20, 2018 at 1:40 PM, Alexandre Rafalovitch  >
> wrote:
>
> > This is probably your start, if not read already:
> > https://lucene.apache.org/solr/guide/7_4/language-analysis.html
> >
> > Otherwise, I think your answer would be somewhere around using ICU4J,
> > IBM's library for dealing with Unicode: http://site.icu-project.org/
> > (mentioned on the same page above)
> > Specifically, transformations:
> > http://userguide.icu-project.org/transforms/general
> >
> > With that, maybe you map both alphabets into latin. I did that once
> > for Thai for a demo:
> > https://github.com/arafalov/solr-thai-test/blob/master/
> > collection1/conf/schema.xml#L34
> >
> > The challenge is to figure out all the magic rules for that. You'd
> > have to dig through the ICU documentation and other web pages. I found
> > this one for example:
> > http://avajava.com/tutorials/lessons/what-are-the-system-
> > transliterators-available-with-icu4j.html;jsessionid=
> > BEAB0AF05A588B97B8A2393054D908C0
> >
> > There is also 12 part series on Solr and Asian text processing, though
> > it is a bit old now: http://discovery-grindstone.blogspot.com/
> >
> > Hope one of these things help.
> >
> > Regards,
> >Alex.
> >
> >
> > On 20 July 2018 at 03:54, Amanda Shuman  wrote:
> > > Hi all,
> > >
> > > We have a problem. Some of our historical documents have mixed together
> > > simplified and Chinese characters. There seems to be no problem when
> > > searching either traditional or simplified separately - that is, if a
> > > particular string/phrase is all in traditional or simplified, it finds
> > it -
> > > but it does not find the string/phrase if the two different characters
> > (one
> > > traditional, one simplified) are mixed together in the SAME
> > string/phrase.
> > >
> > > Has anyone ever handled this problem before? I know some libraries seem
> > to
> > > have implemented something that seems to be able to handle this, but
> I'm
> > > not sure how they did so!
> > >
> > > Amanda
> > > --
> > > Dr. Amanda Shuman
> > > Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> > > 
> > > PhD, University of California, Santa Cruz
> > > http://www.amandashuman.net/
> > > http://www.prchistoryresources.org/
> > > Office: +49 (0) 761 203 4925
> >
>


Re: Suggestions for debugging performance issue

2018-06-27 Thread Susheel Kumar
Did you try to see which component (query, facet, highlight, ...) is taking
the time, via debugQuery=on, when performance is slow? Just to rule out that
some other component is the culprit...

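For example, something along these lines (host and collection name are
placeholders):

curl "http://host:8080/solr/COLL/select?q=*:*&rows=10&debugQuery=on&wt=json"

The "timing" block in the debug section breaks down the prepare and process
time per search component.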
Thnx

On Mon, Jun 25, 2018 at 2:06 PM, Chris Troullis 
wrote:

> FYI to all, just as an update, we rebuilt the index in question from
> scratch for a second time this weekend and the problem went away on 1 node,
> but we were still seeing it on the other node. After restarting the
> problematic node, the problem went away. Still makes me a little uneasy as
> we weren't able to determine the cause, but at least we are back to normal
> query times now.
>
> Chris
>
> On Fri, Jun 15, 2018 at 8:06 AM, Chris Troullis 
> wrote:
>
> > Thanks Shawn,
> >
> > As mentioned previously, we are hard committing every 60 seconds, which
> we
> > have been doing for years, and have had no issues until enabling CDCR. We
> > have never seen large tlog sizes before, and even manually issuing a hard
> > commit to the collection does not reduce the size of the tlogs. I believe
> > this is because when using the CDCRUpdateLog the tlogs are not purged
> until
> > the docs have been replicated over. Anyway, since we manually purged the
> > tlogs they seem to now be staying at an acceptable size, so I don't think
> > that is the cause. The documents are not abnormally large, maybe ~20
> > string/numeric fields with simple whitespace tokenization.
> >
> > To answer your questions:
> >
> > -Solr version: 7.2.1
> > -What OS vendor and version Solr is running on: CentOS 6
> > -Total document count on the server (counting all index cores): 13
> > collections totaling ~60 million docs
> > -Total index size on the server (counting all cores): ~60GB
> > -What the total of all Solr heaps on the server is - 16GB heap (we had to
> > increase for CDCR because it was using a lot more heap).
> > -Whether there is software other than Solr on the server - No
> > -How much total memory the server has installed - 64 GB
> >
> > All of this has been consistent for multiple years across multiple Solr
> > versions and we have only started seeing this issue once we started using
> > the CDCRUpdateLog and CDCR, hence why that is the only real thing we can
> > point to. And again, the issue is only affecting 1 of the 13 collections
> on
> > the server, so if it was hardware/heap/GC related then I would think we
> > would be seeing it for every collection, not just one, as they all share
> > the same resources.
> >
> > I will take a look at the GC logs, but I don't think that is the cause.
> > The consistent nature of the slow performance doesn't really point to GC
> > issues, and we have profiling set up in New Relic and it does not show
> any
> > long/frequent GC pauses.
> >
> > We are going to try and rebuild the collection from scratch again this
> > weekend as that has solved the issue in some lower environments, although
> > it's not really consistent. At this point it's all we can think of to do.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Thu, Jun 14, 2018 at 6:23 PM, Shawn Heisey 
> wrote:
> >
> >> On 6/12/2018 12:06 PM, Chris Troullis wrote:
> >> > The issue we are seeing is with 1 collection in particular, after we
> >> set up
> >> > CDCR, we are getting extremely slow response times when retrieving
> >> > documents. Debugging the query shows QTime is almost nothing, but the
> >> > overall responseTime is like 5x what it should be. The problem is
> >> > exacerbated by larger result sizes. IE retrieving 25 results is almost
> >> > normal, but 200 results is way slower than normal. I can run the exact
> >> same
> >> > query multiple times in a row (so everything should be cached), and I
> >> still
> >> > see response times way higher than another environment that is not
> using
> >> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just
> >> that
> >> > we are using the CDCRUpdateLog. The problem started happening even
> >> before
> >> > we enabled CDCR.
> >> >
> >> > In a lower environment we noticed that the transaction logs were huge
> >> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
> >> > restarting, and that seemed to fix the performance issue. We tried the
> >> same
> >> > thing in production the other day but it had no effect, so now I don't
> >> know
> >> > if it was a coincidence or not.
> >>
> >> There is one other cause besides CDCR buffering that I know of for huge
> >> transaction logs, and it has nothing to do with CDCR:  A lack of hard
> >> commits.  It is strongly recommended to have autoCommit set to a
> >> reasonably short interval (about a minute in my opinion, but 15 seconds
> >> is VERY common).  Most of the time openSearcher should be set to false
> >> in the autoCommit config, and other mechanisms (which might include
> >> autoSoftCommit) should be used for change visibility.  The example
> >> autoCommit settings might seem superfluous because they don't affect
> >> what's searchable, but it is 

Re: tlogs not deleting

2018-06-20 Thread Susheel Kumar
Not to my knowledge. Please double-check or wait for some time, but after
DISABLEBUFFER on the source your tlogs should start rolling. It's the exact
same issue I faced with 6.6, which I resolved with DISABLEBUFFER.

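Something like this against the source collection (host and collection name
are placeholders):

curl "http://host:8080/solr/COLL/cdcr?action=DISABLEBUFFER"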
On Tue, Jun 19, 2018 at 1:39 PM, Brian Yee  wrote:

> Does anyone have any additional possible causes for this issue? I checked
> the buffer status using "/cdcr?action=STATUS" and it says buffer disabled
> at both target and source.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, June 19, 2018 11:55 AM
> To: solr-user 
> Subject: Re: tlogs not deleting
>
> bq. Do you recommend disabling the buffer on the source SolrCloud as well?
>
> Disable them all on both source and target IMO.
>
> On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee  wrote:
> > Thank you Erick. I am running Solr 6.6. From the documentation:
> > "Replicas do not need to buffer updates, and it is recommended to
> disable buffer on the target SolrCloud."
> >
> > Do you recommend disabling the buffer on the source SolrCloud as well?
> It looks like I already have the buffer disabled at target locations but
> not the source location. Would it even make sense at the source location?
> >
> > This is what I have at the target locations:
> > <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
> >   <lst name="updateLogSynchronizer">
> >     <str name="schedule">100</str>
> >   </lst>
> >   <lst name="buffer">
> >     <str name="defaultState">disabled</str>
> >   </lst>
> > </requestHandler>
> > (tag names reconstructed; the mail archive stripped them)
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, June 19, 2018 11:00 AM
> > To: solr-user 
> > Subject: Re: tlogs not deleting
> >
> > Take a look at the CDCR section of your reference guide, be sure you get
> the version which you can download from here:
> > https://archive.apache.org/dist/lucene/solr/ref-guide/
> >
> > There's the CDCR API call you can use for in-flight disabling, and
> depending on the version of Solr you can set it in solrconfig.
> >
> > Basically, buffering was there in the original CDCR to allow a larger
> maintenance window, you could enable buffering and all updates were saved
> until you disabled it, during which period you could do whatever you needed
> with your target cluster and not lose any updates.
> >
> > Later versions can do the full sync of the index and buffering is being
> removed.
> >
> > Best,
> > Erick
> >
> > On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
> >> Thanks for the suggestion. Can you please elaborate a little bit about
> what DISABLEBUFFER does? The documentation is not very detailed. Is this
> something that needs to be done manually whenever this problem happens or
> is it something that we can do to fix it so it won't happen again?
> >>
> >> -Original Message-
> >> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> >> Sent: Monday, June 18, 2018 9:12 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: tlogs not deleting
> >>
> >> You may have to DISABLEBUFFER in source to get rid of tlogs.
> >>
> >> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
> >>
> >>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I
> >>> understand, after a hard commit, solr is supposed to delete old
> >>> tlogs depending on the numRecordsToKeep and maxNumLogsToKeep values
> >>> in the autocommit settings in solrconfig.xml. I am occasionally
> >>> seeing solr fail to do this and the tlogs just build up over time
> >>> and eventually we run out of disk space on the VM and this causes
> problems for us.
> >>> This does not happen all the time, only sometimes. I currently have
> >>> a tlog directory that has 123G worth of tlogs. The last hard commit
> >>> on this node was 10 minutes ago but these tlogs date back to 3 days
> ago.
> >>>
> >>> We have sometimes found that restarting solr on the node will get it
> >>> to clean up the old tlogs, but we really want to find the root cause
> >>> and fix it if possible so we don't keep getting disk space alerts
> >>> and have to adhoc restart nodes. Has anyone seen an issue like this
> before?
> >>>
> >>> My update handler settings look like this:
> >>>   [updateHandler config mangled by the mail archive: a solr.CdcrUpdateLog
> >>>   update log with dir ${solr.ulog.dir:} and numVersionBuckets
> >>>   ${solr.ulog.numVersionBuckets:65536}, plus autoCommit and related commit
> >>>   settings whose tag names were stripped (surviving values: 60, 25, false,
> >>>   12, 100)]
> >>>
>


Re: tlogs not deleting

2018-06-18 Thread Susheel Kumar
You may have to DISABLEBUFFER on the source to get rid of the tlogs.

On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:

> So I've read a bunch of stuff on hard/soft commits and tlogs. As I
> understand, after a hard commit, solr is supposed to delete old tlogs
> depending on the numRecordsToKeep and maxNumLogsToKeep values in the
> autocommit settings in solrconfig.xml. I am occasionally seeing solr fail
> to do this and the tlogs just build up over time and eventually we run out
> of disk space on the VM and this causes problems for us. This does not
> happen all the time, only sometimes. I currently have a tlog directory that
> has 123G worth of tlogs. The last hard commit on this node was 10 minutes
> ago but these tlogs date back to 3 days ago.
>
> We have sometimes found that restarting solr on the node will get it to
> clean up the old tlogs, but we really want to find the root cause and fix
> it if possible so we don't keep getting disk space alerts and have to adhoc
> restart nodes. Has anyone seen an issue like this before?
>
> My update handler settings look like this:
>   [updateHandler config mangled by the mail archive: a solr.CdcrUpdateLog
>   update log with dir ${solr.ulog.dir:} and numVersionBuckets
>   ${solr.ulog.numVersionBuckets:65536}, plus autoCommit and related commit
>   settings whose tag names were stripped (surviving values: 60, 25, false,
>   12, 100)]
>


Re: Suggestions for debugging performance issue

2018-06-13 Thread Susheel Kumar
Is this collection drastically different from the others in terms of schema,
number of fields, total documents, etc.? Is it sharded, and if so can you
look at which shard is taking more time with shards.info=true?

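For example (host and collection name are placeholders):

curl "http://host:8080/solr/COLL/select?q=*:*&rows=0&shards.info=true&wt=json"

The shards.info section of the response reports the time and hit count
contributed by each shard.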
Thnx
Susheel

On Wed, Jun 13, 2018 at 2:29 PM, Chris Troullis 
wrote:

> Thanks Erick,
>
> Seems to be a mixed bag in terms of tlog size across all of our indexes,
> but currently the index with the performance issues has 4 tlog files
> totally ~200 MB. This still seems high to me since the collections are in
> sync, and we hard commit every minute, but it's less than the ~8GB it was
> before we cleaned them up. Spot checking some other indexes show some have
> tlogs >3GB, but none of those indexes are having performance issues (on the
> same solr node), so I'm not sure it's related. We have 13 collections of
> various sizes running on our solr cloud cluster, and none of them seem to
> have this issue except for this one index, which is not our largest index
> in terms of size on disk or number of documents.
>
> As far as the response intervals, just running a default search *:* sorting
> on our id field so that we get consistent results across environments, and
> returning 200 results (our max page size in app) with ~20 fields, we see
> times of ~3.5 seconds in production, compared to ~1 second on one of our
> lower environments with an exact copy of the index. Both have CDCR enabled
> and have identical clusters.
>
> Unfortunately, currently the only instance we are seeing the issue on is
> production, so we are limited in the tests that we can run. I did confirm
> in the lower environment that the doc cache is large enough to hold all of
> the results, and that both the doc and query caches should be serving the
> results. Obviously production we have much more indexing going on, but we
> do utilize autowarming for our caches so our response times are still
> stable across new searchers.
>
> We did move the lower environment to the same ESX host as our production
> cluster, so that it is getting resources from the same pool (CPU, RAM,
> etc). The only thing that is different is the disks, but the lower
> environment is running on slower disks than production. And if it was a
> disk issue you would think it would be affecting all of the collections,
> not just this one.
>
> It's a mystery!
>
> Chris
>
>
>
> On Wed, Jun 13, 2018 at 10:38 AM, Erick Erickson 
> wrote:
>
> > First, nice job of eliminating all the standard stuff!
> >
> > About tlogs: Sanity check: They aren't growing again, right? They
> > should hit a relatively steady state. The tlogs are used as a queueing
> > mechanism for CDCR to durably store updates until they can
> > successfully be transmitted to the target. So I'd expect them to hit a
> > fairly steady number.
> >
> > Your lack of CPU/IO spikes is also indicative of something weird,
> > somehow Solr just sitting around doing nothing. What intervals are we
> > talking about here for response? 100ms? 5000ms?
> >
> > When you hammer the same query over and over, you should see your
> > queryResultCache hits increase. If that's the case, Solr is doing no
> > work at all for the search, just assembling the resopnse packet which,
> > as you say, should be in the documentCache. This assumes it's big
> > enough to hold all of the docs that are requested by all the
> > simultaneous requests. The queryResultCache cache will be flushed
> > every time a new searcher is opened. So if you still get your poor
> > response times, and your queryResultCache hits are increasing then
> > Solr is doing pretty much nothing.
> >
> > So does this behavior still occur if you aren't adding docs to the
> > index? If you turn indexing off as a test, that'd be another data
> > point.
> >
> > And, of course, if it's at all possible to just take the CDCR
> > configuration out of your solrconfig file temporarily that'd nail
> > whether CDCR is the culprit or whether it's coincidental. You say that
> > CDCR is the only difference between the environments, but I've
> > certainly seen situations where it turns out to be a bad disk
> > controller or something that's _also_ different.
> >
> > Now, assuming all that's inconclusive, I'm afraid the next step would
> > be to throw a profiler at it. Maybe pull a stack traces.
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 13, 2018 at 6:15 AM, Chris Troullis 
> > wrote:
> > > Thanks Erick. A little more info:
> > >
> > > -We do have buffering disabled everywhere, as I had read multiple posts
> > on
> > > the mailing list regarding the issue you described.
> > > -We soft commit (with opensearcher=true) pretty frequently (15 seconds)
> > as
> > > we have some NRT requirements. We hard commit every 60 seconds. We
> never
> > > commit manually, only via the autocommit timers. We have been using
> these
> > > settings for a long time and have never had any issues until recently.
> > And
> > > all of our other indexes are fine (some larger than this one).
> > > -We do have 

Re: Solaris 10

2018-05-24 Thread Susheel Kumar
I don't know much about Solaris, but the only option is to install manually
as you did and modify the bin/solr script to get rid of the errors you are
seeing.
Thnx

On Thu, May 24, 2018 at 5:40 AM, Takuya Kawasaki 
wrote:

> Please let me ask a question.
>
> I would like to use Solr on Solaris 10.
> But I encountered a lot of errors.
> First, I can’t install solr using install script in .tgz. script result
> shows I have to install manually not using the script.
> Second, I can’t use ‘start’ command using /bin/solr script because of the
> wrong options of ‘awk’ command in Solaris 10.
> Also, I can’t use ‘stop’ command using /bin/solr script and it happens
> just the same reason.
>
> I think this script is written only for Linux not for Unix or just only
> not for Solaris.
> (I know Solaris is unique OS and isn’t used widely in the world.)
> I could install Solr on Ubuntu 16.04 (Linux) system so easily reading your
> website. (Thank you for creating so detailed documents!)
> So I just would like to know whether Solr could run on the Solaris system
> or not.
>
> Best Regards,
> Takuya.
>


Re: Must clause with filter queries

2018-05-10 Thread Susheel Kumar
For 1, a) is accurate, and for 2, b) is accurate.

If query 1 a) is just an example then it's fine, but otherwise you usually
want to filter on fields which have low cardinality, like state, country,
gender, etc. Name is a high-cardinality field, and using a filter query on
it wouldn't be efficient and also doesn't help with caching.

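For example, filtering on two low-cardinality fields looks like this (host,
collection, and field values are made up):

curl "http://host:8080/solr/COLL/select?q=*:*&fq=state:NY&fq=gender:M&rows=0"

Each fq clause is required on its own and is cached independently in the
filterCache, which is what makes low-cardinality filters reusable.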
Thnx

On Wed, May 9, 2018 at 2:56 PM, root23  wrote:

> Thanks for the explanation shawn. I will look at our autowarming time.
> Looking at your response i am thinking i might be doing few more things
> wrong
> 1. Does Must clause with any of the filter query makes any sense or is
> automatically implied.
>   e.g if i want all the docs with firstName:michael and lastname:jordan,
> which of the following queries makes sense or both are equivalent
> a) q=*:*&fq=name:michael&fq=lastname:jordan
> b) q=*:*&fq=+name:michael&fq=+lastname:jordan
>
>
> 2.Does Must clause also implied with the join query. so in the following
> query i am joining between 2 cores, on field:id. It should filter first
> from
> the index "search" where title is full and then join on id and then only
> get
> the docs which also has status set to monitor.
>
>  a) q=*:*&fq=+{!join from=id to=id fromIndex=search
> force=true}title:full&fq=+status:monitor
>
>  b) q=*:*&fq={!join from=id to=id fromIndex=search
> force=true}title:full&fq=status:monitor
>
> so of the above which one is accurate a) or b)
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Query Regarding Solr Garbage Collection

2018-05-02 Thread Susheel Kumar
A very high rate of document indexing can push heap usage up (all the
temporary objects created during indexing live in JVM memory, so at a very
high ingest rate heap utilization climbs).

Caches that are not sized or configured correctly will also drive JVM usage
up, since ongoing searches keep filling the caches and thus the heap. Other
factors like sorting and faceting also require JVM memory, and deep paging
can even cause the JVM to run out of memory (OOM).

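You can watch the heap numbers from the metrics API if you are on Solr 6.4
or later (host and port are placeholders):

curl "http://host:8983/solr/admin/metrics?group=jvm&prefix=memory"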
Thnx

On Tue, May 1, 2018 at 6:18 PM, Greenhorn Techie 
wrote:

> Hi,
>
> Following the https://wiki.apache.org/solr/SolrPerformanceFactors article,
> I understand that Garbage Collection might be triggered due to significant
> increase in JVM heap usage unless a commit is performed. Given this
> background, I am curious to understand the reasons / factors that
> contribute to increased heap usage of Solr JVM, which would thus force a
> Garbage Collection cycle.
>
> Especially, what are the factors that contribute to heap usage increase
> during indexing time and what factors contribute during search/query time?
>
> Thanks
>


Re: Solr Heap usage

2018-05-02 Thread Susheel Kumar
Take a look at https://wiki.apache.org/solr/SolrPerformanceProblems. The
section "How much heap space do I need?" talks about that.
Caches also live in the JVM heap, so look at how much you need and are
allocating for the different caches.

Thnx


On Tue, May 1, 2018 at 7:33 PM, Greenhorn Techie 
wrote:

> Hi,
>
> Wondering what are the considerations to be aware to arrive at an optimal
> heap size for Solr JVM? Though I did discuss this on the IRC, I am still
> unclear on how Solr uses the JVM heap space. Are there any pointers to
> understand this aspect better?
>
> Given that Solr requires an optimally configured heap, so that the
> remaining unused memory can be used for OS disk cache, I wonder how to best
> configure Solr heap. Also, on the IRC it was discussed that having 31GB of
> heap is better than having 32GB due to Java’s internal usage of heap. Can
> anyone guide further on heap configuration please?
>
> Thanks
>


Re: Confusing SOLR results after upgrading from 4.10 to 7.1

2018-04-30 Thread Susheel Kumar
This may not be the reason, but I noticed you have FlattenGraphFilterFactory
at query time while it's only required at index time. I would suggest
checking in the Analysis tab if you haven't already.

Thnx


On Mon, Apr 30, 2018 at 2:22 PM, Hodder, Rick  wrote:

> I upgraded from SOLR 4.10 to SOLR 7.1
>
> In the core, I have a string field called "company" and string field
> "year", and I have an index on company called IDX_Company.
> Here is the definition of the company field, and the definition of
> text_general in my schema in 4.10
>
> <field ... stored="false" multiValued="true" />
>
> <fieldType name="text_general" ... positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer ... />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter ... />
>     <filter ... pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer ... />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter ... pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
>   </analyzer>
> </fieldType>
>
> (the "..." marks attributes and tag names that the mail archive stripped)
>
>
> Here is the field definition and the definition of text_general in 7.1
>
> <field ... stored="false" multiValued="true" />
>
> <fieldType name="text_general" ... positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer ... />
>     <filter ... maxGramSize="15"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>     <filter ... />
>     <filter ... pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer ... />
>     <filter ... maxGramSize="15"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter ... pattern="[^\p{L}\p{N} ]" replacement=" " replace="all" />
>   </analyzer>
> </fieldType>
>
> (again, "..." marks what the mail archive stripped)
>
> Among the documents in the core are:
>
> company year
> AB Landscaping Inc.  2001
> AB Landscaping Inc.  2002
> AB Landscaping : AB Landscaping and Excavating LLC  2001
> AB Landscaping : AB Landscaping and Excavating LLC  2002
> A B Landscaping : AB Landscaping and Excavating LLC  2000
> Landscaping Firm1999
> Landscaping Associates  1998
>
> Under 4.10 if I search for
> IDX_Company:(AB AND Landscaping)
> I see all 7 companies, and the two AB Landscaping Incs are at the top of
> the results
>
> Under 7.1 if I search for
> IDX_Company:(AB AND Landscaping)
>
> I only see the following, notice that documents with Excavating dont
> appear, and AB Landscapting are not at the top of the results - the
> Landscaping Firm and Landscaping Associates are
>
> Landscaping Associates  1998
> Landscaping Firm1999
> AB Landscaping Inc.  2001
> AB Landscaping Inc.  2002
>
> Any ideas that might be causing this? The query seems very straightforward.
>
>


Re: CDCR Bootstrap

2018-04-26 Thread Susheel Kumar
Thanks, Tom. Is it correct that I have to execute this for each shard?

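For reference, spelled out as a curl call against the target leader (host,
port, and collection name here are placeholders):

curl "http://target-leader:8080/solr/COLL/cdcr?action=BOOTSTRAP&masterUrl=http%3A%2F%2Fsource-leader%3A8080%2Fsolr%2FCOLL"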
On Thu, Apr 26, 2018 at 10:19 AM, Tom Peters <tpet...@synacor.com> wrote:

> I'm not sure under what conditions it will be automatically triggered, but
> if you manually wanted to trigger a CDCR Bootstrap you need to issue the
> following query to the leader in your target data center.
>
> /solr/<collection>/cdcr?action=BOOTSTRAP&masterUrl=<master URL>
>
> The masterUrl will look something like (change the necessary values):
> http%3A%2F%2Fsolr-leader.solrurl%3A8983%2Fsolr%2Fcollection
>
> > On Apr 26, 2018, at 10:15 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> >
> > Anybody has idea how to trigger Solr CDCR BOOTSTRAP or under what
> condition
> > it gets triggered ?
> >
> > Thanks,
> > Susheel
> >
> > On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar <susheel2...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >> I am wondering under what different conditions does that CDCR bootstrap
> >> process gets triggered.  I did notice it getting triggered after I
> stopped
> >> CDCR and then started again later and now I am trying to reproduce the
> same
> >> behavior.
> >>
> >> In case target cluster is left behind and buffer was disabled on
> source, i
> >> would like the CDCR bootstrap to trigger and sync target.
> >>
> >> Does deleting records from target and then starting CDCR would trigger
> >> bootstrap ?
> >>
> >> Thanks,
> >> Susheel
> >>
> >>
> >>
>
>
>
>
>
>


Re: CDCR Bootstrap

2018-04-26 Thread Susheel Kumar
Does anybody have an idea how to trigger a Solr CDCR BOOTSTRAP, or under
what conditions it gets triggered?

Thanks,
Susheel

On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Hello,
>
> I am wondering under what different conditions does that CDCR bootstrap
> process gets triggered.  I did notice it getting triggered after I stopped
> CDCR and then started again later and now I am trying to reproduce the same
> behavior.
>
> In case target cluster is left behind and buffer was disabled on source, i
> would like the CDCR bootstrap to trigger and sync target.
>
> Does deleting records from target and then starting CDCR would trigger
> bootstrap ?
>
> Thanks,
> Susheel
>
>
>


CDCR Bootstrap

2018-04-24 Thread Susheel Kumar
Hello,

I am wondering under what conditions the CDCR bootstrap process gets
triggered. I did notice it getting triggered after I stopped CDCR and then
started it again later, and now I am trying to reproduce the same behavior.

In case the target cluster is left behind and the buffer was disabled on
the source, I would like the CDCR bootstrap to trigger and sync the target.

Would deleting records from the target and then starting CDCR trigger a
bootstrap?

Thanks,
Susheel


Re: CdcrReplicator Forwarder not working on some shards

2018-04-18 Thread Susheel Kumar
I was able to resolve this issue by stopping and starting the CDCR process
a couple of times until all shard leaders started forwarding updates...

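That is, cycling the handler through the CDCR API on the source (host and
collection name are placeholders):

curl "http://host:8080/solr/COLL/cdcr?action=STOP"
curl "http://host:8080/solr/COLL/cdcr?action=START"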
Thnx

On Tue, Apr 17, 2018 at 3:20 PM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Hi Amrit,
>
> The cdcr?action=ERRORS is returning consecutiveErrors=1 on the shards
> which are not forwarding updates.  Any clue does that gives?
>
> [response XML mangled by the mail archive; the surviving values were 1, 1,
> 0 and an error type of "bad_request", apparently the consecutiveErrors,
> bad_request, and internal counters]
>
>
>
>
> On Tue, Apr 17, 2018 at 1:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
> wrote:
>
>> Susheel,
>>
>> At the time of core reload, logs must be complaining or atleast pointing
>> to
>> some direction. Each leader of shard is responsible to spawn a threadpool
>> for cdcr replicator to get the data over.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Tue, Apr 17, 2018 at 9:04 PM, Susheel Kumar <susheel2...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Has anyone gone thru this issue where few shard leaders are forwarding
>> > updates to their counterpart leaders in target cluster while some of the
>> > shards leaders are not forwarding the updates.
>> >
>> > on Solr 6.6,  4 of the shards logs I see below entries and their
>> > counterpart in target are getting updated but for other 4 shards I don't
>> > below entries and neither being replicated to target.
>> >
>> > Any suggestion on how / what can be done to start cdcr-replicator
>> threads
>> > on other shards?
>> >
>> > 2018-04-17 15:26:38.394 INFO
>> > (cdcr-replicator-24-thread-6-processing-n:dc2prsrcvap0049.
>> > whc.dc02.us.adp:8080_solr)
>> > [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
>> > 2018-04-17 15:26:39.394 INFO
>> > (cdcr-replicator-24-thread-7-processing-n:dc2prsrcvap0049.
>> > whc.dc02.us.adp:8080_solr)
>> > [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
>> >
>> > Thanks
>> > Susheel
>> >
>>
>
>


Re: CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Susheel Kumar
Hi Amrit,

The cdcr?action=ERRORS call is returning consecutiveErrors=1 on the shards
which are not forwarding updates. Any clue what that indicates?

[response XML mangled by the mail archive; the surviving values were 1, 1, 0
and an error type of "bad_request", apparently the consecutiveErrors,
bad_request, and internal counters]
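For reference, that status check is just this call (host and collection
anonymized):

curl "http://host:8080/solr/COLL/cdcr?action=ERRORS"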




On Tue, Apr 17, 2018 at 1:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
wrote:

> Susheel,
>
> At the time of core reload, logs must be complaining or atleast pointing to
> some direction. Each leader of shard is responsible to spawn a threadpool
> for cdcr replicator to get the data over.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Tue, Apr 17, 2018 at 9:04 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Has anyone gone thru this issue where few shard leaders are forwarding
> > updates to their counterpart leaders in target cluster while some of the
> > shards leaders are not forwarding the updates.
> >
> > on Solr 6.6,  4 of the shards logs I see below entries and their
> > counterpart in target are getting updated but for other 4 shards I don't
> > below entries and neither being replicated to target.
> >
> > Any suggestion on how / what can be done to start cdcr-replicator threads
> > on other shards?
> >
> > 2018-04-17 15:26:38.394 INFO
> > (cdcr-replicator-24-thread-6-processing-n:dc2prsrcvap0049.
> > whc.dc02.us.adp:8080_solr)
> > [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
> > 2018-04-17 15:26:39.394 INFO
> > (cdcr-replicator-24-thread-7-processing-n:dc2prsrcvap0049.
> > whc.dc02.us.adp:8080_solr)
> > [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
> >
> > Thanks
> > Susheel
> >
>


CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Susheel Kumar
Hi,

Has anyone run into this issue where a few shard leaders forward updates to
their counterpart leaders in the target cluster while other shard leaders
do not?

On Solr 6.6, in the logs of 4 of the shards I see the entries below, and
their counterparts in the target are getting updated; but for the other 4
shards I don't see these entries, and nothing is replicated to the target
either.

Any suggestions on how to get the cdcr-replicator threads started on the
other shards?

2018-04-17 15:26:38.394 INFO
(cdcr-replicator-24-thread-6-processing-n:dc2prsrcvap0049.whc.dc02.us.adp:8080_solr)
[   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
2018-04-17 15:26:39.394 INFO
(cdcr-replicator-24-thread-7-processing-n:dc2prsrcvap0049.whc.dc02.us.adp:8080_solr)
[   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL

Thanks
Susheel


Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Susheel Kumar
DISABLEBUFFER on the source cluster would solve this problem.

On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis 
wrote:

> Hi,
>
> We are attempting to use CDCR with solr 7.2.1 and are experiencing odd
> behavior with transaction logs. My understanding is that by default, solr
> will keep a maximum of 10 tlog files or 100 records in the tlogs. I assume
> that with CDCR, the records will not be removed from the tlogs until it has
> been confirmed that they have been replicated to the other cluster.
> However, even when replication has finished and the CDCR queue sizes are 0,
> we are still seeing large numbers (50+) and large sizes (over a GB) of
> tlogs sitting on the nodes.
>
> We are hard committing once per minute.
>
> Doing a lot of reading on the mailing list, I see that a lot of people were
> pointing to buffering being enabled as the cause for some of these
> transaction log issues. However, we have disabled buffering on both the
> source and target clusters, and are still seeing the issues.
>
> Also, while some of our indexes replicate very rapidly (millions of
> documents in minutes), other smaller indexes are crawling. If we restart
> CDCR on the nodes then it finishes almost instantly.
>
> Any thoughts on these behaviors?
>
> Thanks,
>
> Chris
>


Re: Does CDCR Bootstrap sync leaves replica's out of sync

2018-04-16 Thread Susheel Kumar
Thanks Amrit, Peter. I'll go with option #2, but what I am also seeing is
that after the bootstrap the target has not been synced further (even
though we have continuous indexing happening on the source). I believe this
is because the leaders on the source cluster show updateLogSynchronizer
stopped, while the replicas on the source cluster show updateLogSynchronizer
started.

How can we start the updateLogSynchronizer on a leader and stop it on a
replica on the source without switching the leader/follower? Any idea?

Thnx



On Mon, Apr 16, 2018 at 2:20 PM, Tom Peters <tpet...@synacor.com> wrote:

> There are two ways I've gotten around this issue:
>
> 1. Add replicas in the target data center after CDCR bootstrapping has
> completed.
>
> -or-
>
> 2. After the bootstrapping has completed, restart the replica nodes
> one-at-time in the target data center (restart, wait for replica to catch
> up, then restart the next).
>
>
> I recommend doing method #1 over #2 if you can. If you accidentally
> restart the leader node using method #2, it will promote an out-of-sync
> replica to the leader and all followers will receive that out-of-date index.
>
> I also recommend pausing indexing if you can while you let the target
> replicas catch up. I have run into issues where the replicas will not catch
> up if the leader has a fair amount of updates to replay from the source.
>
> > On Apr 16, 2018, at 2:15 PM, Amrit Sarkar <sarkaramr...@gmail.com>
> wrote:
> >
> > Hi Susheel,
> >
> > Pretty sure you are talking about this:
> > https://issues.apache.org/jira/browse/SOLR-11724
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Mon, Apr 16, 2018 at 11:35 PM, Susheel Kumar <susheel2...@gmail.com>
> > wrote:
> >
> >> Does anybody know about known issue where CDCR bootstrap sync leaves the
> >> replica's on target cluster non touched/out of sync.
> >>
> >> After I stopped and restart CDCR, it builds my target leaders index but
> >> replica's on target cluster still showing old index / not modified.
> >>
> >>
> >> Thnx
> >>
>
>
>
>


Does CDCR Bootstrap sync leaves replica's out of sync

2018-04-16 Thread Susheel Kumar
Does anybody know of a known issue where the CDCR bootstrap sync leaves the
replicas on the target cluster untouched/out of sync?

After I stopped and restarted CDCR, it rebuilt my target leaders' indexes,
but the replicas on the target cluster are still showing the old, unmodified
index.


Thnx


Re: Log reader for target is not initialised

2018-04-16 Thread Susheel Kumar
I figured out that restarting the nodes switched the leaders on the source
cluster, which was causing the warning above and stopping CDCR replication.
After stopping the CDCR process and then starting it again, the warning
disappeared and the bootstrap sync stepped in.

On Sun, Apr 15, 2018 at 7:54 PM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Hello,
>
> Over the weekend, we restarted nodes on target and source and after that I
> see the replication from source to target has stopped and see below warning
> messages in solr.log.
>
> What could be done to resolve this.  The leaders on source cluster shows 
> updateLogSynchronizer
> stoppped while replica's on source cluster shows updateLogSynchronizer
> started
>
> [solr.log warning output mangled by the mail archive; only two zeroed
> counter values survived]

Log reader for target is not initialised

2018-04-15 Thread Susheel Kumar
Hello,

Over the weekend we restarted nodes on the target and the source, and after
that the replication from source to target has stopped; I see the warning
messages below in solr.log.

What can be done to resolve this? The leaders on the source cluster show
updateLogSynchronizer stopped while the replicas on the source cluster show
updateLogSynchronizer started.

[solr.log warning output mangled by the mail archive; only two zeroed
counter values survived]

Re: ZK CLI script giving IOException doing upconfig

2018-04-04 Thread Susheel Kumar
Hi Doug, are you able to connect to Zookeeper through Zookeeper's zkCli.sh,
and does zookeeper.out show anything useful?

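For example (adjust host:port to your ensemble):

./zkCli.sh -server localhost:2181
# then at the zk prompt:
ls /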
Thnx

On Wed, Apr 4, 2018 at 2:13 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Thanks for the responses. Yeah I thought they were weird errors too... :)
>
> Below are the logs from zookeeper running in foreground after a connection
> attempt. But this Exception looks suspicous to me:
>
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] - Exception
> causing close of session 0x10024db7e280006: *Len error 5327937*
>
> Has anyone seen this before? The LenError seems to be a thread to google...
>
> 2018-04-04 14:06:01,210 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket
> connection
> from /127.0.0.1:55078
> 2018-04-04 14:06:01,218 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@938] - Client attempting to establish
> new session at /127.0.0.1:55078
> 2018-04-04 14:06:01,219 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@683]
> - Established session 0x10024db7e280006 with negotiated timeout 3 for
> client /127.0.0.1:55078
> 2018-04-04 14:06:01,361 [myid:] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] - Exception causing close of
> session 0x10024db7e280006: Len error 5327937
> 2018-04-04 14:06:01,362 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for
> client /127.0.0.1:55078 which had sessionid 0x10024db7e280006
> 2018-04-04 14:06:01,956 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket
> connection
> from /0:0:0:0:0:0:0:1:55079
> 2018-04-04 14:06:01,959 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@931] - Client attempting to renew
> session 0x10024db7e280006 at /0:0:0:0:0:0:0:1:55079
> 2018-04-04 14:06:01,960 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@683] - Established session
> 0x10024db7e280006 with negotiated timeout 3 for client
> /0:0:0:0:0:0:0:1:55079
> 2018-04-04 14:06:03,223 [myid:] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data
> from client sessionid 0x10024db7e280006, likely client has closed socket
> 2018-04-04 14:06:03,223 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for
> client /0:0:0:0:0:0:0:1:55079 which had sessionid 0x10024db7e2800
>
> On Wed, Apr 4, 2018 at 11:15 AM Shawn Heisey  wrote:
>
> > On 4/4/2018 7:14 AM, Doug Turnbull wrote:
> > > I've been struggling to do a basic upconfig both with embedded and
> actual
> > > Zookeeper in Solr 7.2.1 using the zkcli script on OSX.
> > >
> > > One variable, I recently upgraded to Java 9. I get slightly different
> > > errors on Java 8 vs 9
> >
> > 
> >
> > > Java 9:
> > >
> > > doug@wiz$~/ws/foo(mas) $
> > > /Users/doug/bin/solr-7.2.1/server/scripts/cloud-scripts/zkcli.sh
> -zkhost
> > > localhost:2181 -cmd upconfig -confdir solr_home/foo/ -confname foo_conf
> > > WARN  - 2018-04-04 09:05:28.194;
> > > org.apache.zookeeper.ClientCnxn$SendThread; Session 0x100244e8ffb0004
> for
> > > server localhost/127.0.0.1:2181, unexpected error, closing socket
> > > connection and attempting reconnect
> > > java.io.IOException: Connection reset by peer
> >
> > 
> >
> > > Java 8 gives the error
> > >
> > > java.io.IOException: Protocol wrong type for socket
> > >
> > > WARN  - 2018-04-04 09:10:11.879;
> > > org.apache.zookeeper.ClientCnxn$SendThread; Session 0x10024db7e280002
> for
> > > server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing
> socket
> > > connection and attempting reconnect
> > > java.io.IOException: Protocol wrong type for socket
> >
> > I'm with Erick on this.  These are REALLY weird errors. The stacktraces
> > for the errors are entirely in ZooKeeper and Java code, not Solr code.
> > The log for Java 9 does have an entry that mentions Solr classes, but
> > that's a disconnect after the error, not part of the error.
> >
> > Are you getting any corresponding log messages in the ZK server log?
> >
> > The ZkCLI class is part of Solr, and does interface to ZK through Solr
> > internals, but ultimately it's ZK doing the work.
> >
> > The ZK client that's in Solr 7.2.1 is version 3.4.10.
> >
> > Thanks,
> > Shawn
> >
> > --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>


Schema update/Reload during Live traffic

2018-03-23 Thread Susheel Kumar
Hello,

I did a schema update to the Solr cloud of the source CDCR cluster and the
same on the target. After the collection reload, I noticed "error opening
searcher" / "IndexWriter closed" etc. on the leader node, while all replicas
went into recovery mode.

Later, after restarting Solr on the leader, I noticed the "too many open
files" errors below.  The ulimit shows unlimited.

What could have gone wrong above, and how can we avoid it?  This is with
6.6.2, and there were ingestions/deletes and searches happening at the time I
reloaded the collection.

Caused by: org.apache.solr.common.SolrException:
java.io.FileNotFoundException:
/app/solr/data/COLL_shard1_replica4/data/tlog/tlog.0002214.1591528549576081408
(Too many open files)

at
org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:250)

at
org.apache.solr.update.CdcrTransactionLog.incref(CdcrTransactionLog.java:179)

at
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1342)

at org.apache.solr.update.UpdateLog.init(UpdateLog.java:393)

at org.apache.solr.update.CdcrUpdateLog.init(CdcrUpdateLog.java:77)

at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:153)

at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:110)

at
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:108)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)

at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:760)

at
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:822)

at
org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1088)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:947)


Re: CDCR performance issues

2018-03-23 Thread Susheel Kumar
Yeah, Amrit, to clarify: we have a 30 sec soft commit on the target data
center. For the test, when we use the Documents tab, the default Commit
Within=1000 ms makes the commit happen quickly on the source, and then we
just wait for the document to appear in the target data center per its commit
strategy.
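
A quick way to script that same smoke test (hypothetical hosts and doc id):

# index one test doc on the source with a 1 s commitWithin
curl 'http://source-host:8983/solr/COLL/update?commitWithin=1000' \
  -H 'Content-Type: application/json' -d '[{"id":"cdcr-smoke-1"}]'

# then poll the target until the doc shows up (bounded by its soft commit)
curl 'http://target-host:8983/solr/COLL/select?q=id:cdcr-smoke-1&rows=0'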

On Fri, Mar 23, 2018 at 8:47 AM, Amrit Sarkar <sarkaramr...@gmail.com>
wrote:

> Susheel,
>
> That is the correct behavior, "commit" operation is not propagated to
> target and the documents will be visible in the target as per commit
> strategy devised there.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Mar 23, 2018 at 6:02 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
> > Just a simple check, if you go to source solr and index single document
> > from Documents tab, then keep querying target solr for the same document.
> > How long does it take the document to appear in target data center.  In
> our
> > case, I can see document show up in target within 30 sec which is our
> soft
> > commit time.
> >
> > Thanks,
> > Susheel
> >
> > On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar <sarkaramr...@gmail.com>
> > wrote:
> >
> > > Hey Tom,
> > >
> > > I'm also having issue with replicas in the target data center. It will
> go
> > > > from recovering to down. And when one of my replicas go to down in
> the
> > > > target data center, CDCR will no longer send updates from the source
> to
> > > > the target.
> > >
> > >
> > > Are you able to figure out the issue? As long as the leaders of each
> > shard
> > > in each collection is up and serving, CDCR shouldn't stop.
> > >
> > > Sometimes we have to reindex a large chunk of our index (1M+
> documents).
> > > > What's the best way to handle this if the normal CDCR process won't
> be
> > > > able to keep up? Manually trigger a bootstrap again? Or is there
> > > something
> > > > else we can do?
> > > >
> > >
> > > That's one of the limitations of CDCR: it cannot handle bulk indexing.
> > > The preferable way to do it is:
> > > * stop cdcr
> > > * bulk index
> > > * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> > > * start cdcr
> > >
> > > 1. Is it accurate that updates are not actually batched in transit from
> > the
> > > > source to the target and instead each document is posted separately?
> > >
> > >
> > > The batchsize and schedule regulate how many docs are sent across
> target.
> > > This has more details:
> > > https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> > > html#the-replicator-element
> > >
> > >
> > >
> > >
> > > Amrit Sarkar
> > > Search Engineer
> > > Lucidworks, Inc.
> > > 415-589-9269
> > > www.lucidworks.com
> > > Twitter http://twitter.com/lucidworks
> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > > Medium: https://medium.com/@sarkaramrit2
> > >
> > > On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters <tpet...@synacor.com>
> > wrote:
> > >
> > > > I'm also having issue with replicas in the target data center. It
> will
> > go
> > > > from recovering to down. And when one of my replicas go to down in
> the
> > > > target data center, CDCR will no longer send updates from the source
> to
> > > the
> > > > target.
> > > >
> > > > > On Mar 12, 2018, at 9:24 AM, Tom Peters <tpet...@synacor.com>
> wrote:
> > > > >
> > > > > Anyone have any thoughts on the questions I raised?
> > > > >
> > > > > I have another question related to CDCR:
> > > > > Sometimes we have to reindex a large chunk of our index (1M+
> > > documents).
> > > > What's the best way to handle this if the normal CDCR process won't
> be
> > > able
> > > > to keep up? Manually trigger a bootstrap again? Or is there something
> > > else
> > > > we can do?
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters <tpet...@synacor.com>
> wrote:
> > > 

Re: CDCR performance issues

2018-03-23 Thread Susheel Kumar
Just a simple check, if you go to source solr and index single document
from Documents tab, then keep querying target solr for the same document.
How long does it take the document to appear in target data center.  In our
case, I can see document show up in target within 30 sec which is our soft
commit time.

Thanks,
Susheel

On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar 
wrote:

> Hey Tom,
>
> I'm also having issue with replicas in the target data center. It will go
> > from recovering to down. And when one of my replicas go to down in the
> > target data center, CDCR will no longer send updates from the source to
> > the target.
>
>
> Are you able to figure out the issue? As long as the leaders of each shard
> in each collection is up and serving, CDCR shouldn't stop.
>
> Sometimes we have to reindex a large chunk of our index (1M+ documents).
> > What's the best way to handle this if the normal CDCR process won't be
> > able to keep up? Manually trigger a bootstrap again? Or is there
> something
> > else we can do?
> >
>
> That's one of the limitations of CDCR: it cannot handle bulk indexing.
> The preferable way to do it is:
> * stop cdcr
> * bulk index
> * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> * start cdcr
>
> 1. Is it accurate that updates are not actually batched in transit from the
> > source to the target and instead each document is posted separately?
>
>
> The batchsize and schedule regulate how many docs are sent across target.
> This has more details:
> https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> html#the-replicator-element
>
>
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters  wrote:
>
> > I'm also having issue with replicas in the target data center. It will go
> > from recovering to down. And when one of my replicas go to down in the
> > target data center, CDCR will no longer send updates from the source to
> the
> > target.
> >
> > > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> > >
> > > Anyone have any thoughts on the questions I raised?
> > >
> > > I have another question related to CDCR:
> > > Sometimes we have to reindex a large chunk of our index (1M+
> documents).
> > What's the best way to handle this if the normal CDCR process won't be
> able
> > to keep up? Manually trigger a bootstrap again? Or is there something
> else
> > we can do?
> > >
> > > Thanks.
> > >
> > >
> > >
> > >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> > >>
> > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that
> the
> > requests to the target data center are not batched in any way. Each
> update
> > comes in as an independent update. Some follow-up questions:
> > >>
> > >> 1. Is it accurate that updates are not actually batched in transit
> from
> > the source to the target and instead each document is posted separately?
> > >>
> > >> 2. Are they done synchronously? I assume yes (since you wouldn't want
> > operations applied out of order)
> > >>
> > >> 3. If they are done synchronously, and are not batched in any way,
> does
> > that mean that the best performance I can expect would be roughly how
> long
> > it takes to round-trip a single document? ie. If my average ping is 25ms,
> > then I can expect a peak performance of roughly 40 ops/s.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>
> > >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] <
> > daniel.da...@nih.gov> wrote:
> > >>>
> > >>> These are general guidelines, I've done loads of networking, but may
> > be less familiar with SolrCloud  and CDCR architecture.  However, I know
> > it's all TCP sockets, so general guidelines do apply.
> > >>>
> > >>> Check the round-trip time between the data centers using ping or TCP
> > ping.   Throughput tests may be high, but if Solr has to wait for a
> > response to a request before sending the next action, then just like any
> > network protocol that does that, it will get slow.
> > >>>
> > >>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also
> > check whether some proxy/load balancer between data centers is causing it
> > to be a single connection per operation.   That will *kill* performance.
> >  Some proxies default to HTTP/1.0 (open, send request, server send
> > response, close), and that will hurt.
> > >>>
> > >>> Why you should listen to me even without SolrCloud knowledge -
> > checkout paper "Latency performance of SOAP Implementations".   Same
> > distribution of skills - I knew TCP well, but Apache Axis 1.1 not so
> well.
> >  I still improved response time of Apache Axis 1.1 by 250ms per call with
> > 1-line of code.
> > >>>
> > >>> -Original Message-
> > >>> From: Tom Peters 

Re: Solr Swap space

2018-02-22 Thread Susheel Kumar
Cool, thanks, Shawn.  I was also looking at the swappiness, and it is set to
60.  Will try this out and let you know.  Thanks again.

On Thu, Feb 22, 2018 at 10:55 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/21/2018 7:58 PM, Susheel Kumar wrote:
>
>> Below output for prod machine based on the steps you described.  Please
>> take a look.  The solr searches are returning fine and no issue with
>> performance but since last 4 months swap space started going up. After
>> restart, it comes down to zero and then few weeks, it utilization reaches
>> to 40-50% and thus requires restart of solr process.
>>
>
> I bet that if you run this command, it will show you a value of 60:
>
> cat /proc/sys/vm/swappiness
>
> This makes the OS very aggressive about using swap, even when there is
> absolutely no need for it to do so.
>
> If you type the following series of commands, it should fix the problem
> and prevent it from happening again until you reboot the system:
>
> echo "0" > /proc/sys/vm/swappiness
> swapoff -a
> swapon -a
>
> Note that when the swapoff command runs, it will force the OS to read all
> the swapped data back into memory.  It will take several minutes for this
> to occur, because it must read nearly a gigabyte of data and figure out how
> to put it back in memory. Both of the command outputs you included say that
> there is over 20GB of free memory.  So I do not anticipate the system
> having problems from running these commands.  It will slow the machine down
> temporarily, though -- so only do it during a quiet time for your Solr
> install.
>
> To make this setting survive a reboot, find the sysctl.conf file somewhere
> in your /etc directory and add this line to it:
>
> vm.swappiness = 0
>
> This setting does not completely disable swap.  If the system finds itself
> with real memory pressure and actually does NEED to use swap, it still will
> ... it just won't swap anything out before it's actually required.
>
> I do not think the behavior you are seeing is actually causing problems,
> based on your system load and CPU usage.  But what I've shared should fix
> it for you.
>
> Thanks,
> Shawn
>
>


Re: Solr Swap space

2018-02-21 Thread Susheel Kumar
[... truncated top output ...]

free -m
 total   used   free sharedbuffers cached
Mem: 64430  42816  21614132  0  38823
-/+ buffers/cache:   3992  60437
Swap: 2047920   1127

On Wed, Feb 21, 2018 at 8:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/21/2018 1:30 PM, Susheel Kumar wrote:
> > I did go thru your posts on swap usage  http://lucene.472066.n3.
> > nabble.com/Solr-4-3-1-memory-swapping-td4126641.html and my situation is
> > also similar.  Below is top output from our prod and performance test
> > machine and as you can see the swap utilization on Prod machine is 44%
> > while on test machines it is zero.
>
> Are those "top" outputs sorted by the default (CPU usage) or by memory?
> To make any useful determination, it needs to be by memory.  Press
> shift-M to sort by memory if your top program supports that key.   Also,
> the list only shows the first few processes.  More of the list needs to
> be visible.
>
> The system load on the second "top" output is quite low, and doesn't
> show much used CPU percentage.  So it looks like the system is not
> actually suffering due to the swap usage, which probably means that it
> is not actively swapping.  The machine has plenty of memory available --
> even though almost all of the memory is allocated, the vast majority of
> what's allocated is in the "cached" state -- used by the OS disk cache.
> The OS will instantly give up this memory if a program requests it.
>
> I've learned how to use top to show which processes are using swap.
> What I've described below should work on recent version of gnu top.  If
> the top is from another software provider, it may not support this.
>
> http://northernmost.org/blog/swap-usage-5-years-later/
>
> These steps are not precisely as described in that blog post:  Run top,
> press f, press p, press space, then press the right angle bracket (>)
> key three times.  If top is running with default settings, these
> keypresses should enable the SWAP column and move the sort to that
> column.  The list should be sorted by swap usage.  If a .toprc file
> exists in your home directory, then the program may be running with very
> different settings than default, and these keypresses might not work as
> expected.
>
> Thanks,
> Shawn
>
>


Re: Solr Swap space

2018-02-21 Thread Susheel Kumar
Hello Shawn,

I did go thru your posts on swap usage  http://lucene.472066.n3.
nabble.com/Solr-4-3-1-memory-swapping-td4126641.html and my situation is
also similar.  Below is top output from our prod and performance test
machine and as you can see the swap utilization on Prod machine is 44%
while on test machines it is zero.

I haven't been able to figure out what/where exactly the problem is.  Both the
test and prod machines run the same Java version, 1.8.0_91.  What could help
to further understand and debug the issue?

Thanks,
Susheel

*Performance test machine *

top - 15:03:52 up 37 days,  5:49,  1 user,  load average: 0.06, 0.09, 0.04
Tasks: 191 total,   1 running, 190 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0 st
KiB Mem:  65977076 total, 60079356 used,  5897720 free, 1080 buffers
KiB Swap:  2097148 total,0 used,  2097148 free. 56416304 cached Mem

  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
COMMAND
  433 solr  20   071.813g 3.484g 311548 S 1.667 5.538 105:37.51
java
1 root20   0   37504   5672   4020 S 0.000 0.009   3:43.54
systemd
2 root20   0   0  0  0 S 0.000 0.000   0:04.91
kthreadd
...
...


*Prod machine*

top - 15:10:20 up 116 days, 13:47,  3 users,  load average: 0.16, 0.20, 0.18
Tasks: 200 total,   1 running, 199 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.0 us,  0.0 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0 st
KiB Mem:  65976996 total, 65460256 used,   516740 free,   84 buffers
KiB Swap:  2097148 total,   *942288* used,  1154860 free. 61361628 cached
Mem

  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
COMMAND
20327 solr  20   0 53.792g 0.021t 0.017t S 15.95 33.42   5051:07 java

  481   root  20   0  267688 152256   9012 S 0.332 0.231  46:47.60
splunkd
1 root  20   0   37532   5360   3896 S 0.000 0.008   6:06.47
systemd
...
...


On Fri, Feb 9, 2018 at 5:44 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> No worries, I don't mind being confused with Erick ;)
>
> Emir
>
> On Feb 9, 2018 9:16 PM, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>
> > Sorry, I meant Emir.
> >
> > On Fri, Feb 9, 2018 at 3:15 PM, Susheel Kumar <susheel2...@gmail.com>
> > wrote:
> >
> > > Thanks, Shawn, Eric.  I see that same using swapon -s.  Looks like
> during
> > > the OS setup, it was set as 2 GB (Solr 6.0) and other 16GB (Solr 6.6)
> > >
> > > Our 6.0 instance has been running since 1+ year but recently our monit
> > > started reporting swap usage above 30% and Solr dashboard showing the
> > > same.  I haven't been able to find what causes / why Solr is using the
> > swap
> > > space.  The index size is well within memory size to fit and we have
> > double
> > > the index size on our performance test machines but there is no usage
> of
> > > swap space.
> > >
> > > Thanks,
> > > Susheel
> > >
> > > On Wed, Feb 7, 2018 at 3:09 PM, Shawn Heisey <apa...@elyograg.org>
> > wrote:
> > >
> > >> On 2/7/2018 12:01 PM, Susheel Kumar wrote:
> > >>
> > >>> Just trying to find where do we set swap space available to Solr
> > >>> process. I
> > >>> see in our 6.0 instances it was set to 2GB on and on 6.6 instances
> its
> > >>> set
> > >>> to 16GB.
> > >>>
> > >>
> > >> Solr has absolutely no involvement or control over swap space.
> Neither
> > >> does Java.  This is a function of your operating system's memory
> > >> management, and is typically set up when you first install your OS.
> > >>
> > >> https://www.linux.com/news/all-about-linux-swap-space
> > >> https://en.wikipedia.org/wiki/Paging#Windows_NT
> > >>
> > >> If your system is using swap space, it's a strong indication that you
> > >> don't have enough memory installed.  If any of the memory that Solr
> > uses is
> > >> swapped out to disk, Solr performance is going to be REALLY bad.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >
> > >
> >
>


Re: Solr Swap space

2018-02-09 Thread Susheel Kumar
Sorry, I meant Emir.

On Fri, Feb 9, 2018 at 3:15 PM, Susheel Kumar <susheel2...@gmail.com> wrote:

> Thanks, Shawn, Eric.  I see that same using swapon -s.  Looks like during
> the OS setup, it was set as 2 GB (Solr 6.0) and other 16GB (Solr 6.6)
>
> Our 6.0 instance has been running since 1+ year but recently our monit
> started reporting swap usage above 30% and Solr dashboard showing the
> same.  I haven't been able to find what causes / why Solr is using the swap
> space.  The index size is well within memory size to fit and we have double
> the index size on our performance test machines but there is no usage of
> swap space.
>
> Thanks,
> Susheel
>
> On Wed, Feb 7, 2018 at 3:09 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 2/7/2018 12:01 PM, Susheel Kumar wrote:
>>
>>> Just trying to find where do we set swap space available to Solr
>>> process. I
>>> see in our 6.0 instances it was set to 2GB on and on 6.6 instances its
>>> set
>>> to 16GB.
>>>
>>
>> Solr has absolutely no involvement or control over swap space.  Neither
>> does Java.  This is a function of your operating system's memory
>> management, and is typically set up when you first install your OS.
>>
>> https://www.linux.com/news/all-about-linux-swap-space
>> https://en.wikipedia.org/wiki/Paging#Windows_NT
>>
>> If your system is using swap space, it's a strong indication that you
>> don't have enough memory installed.  If any of the memory that Solr uses is
>> swapped out to disk, Solr performance is going to be REALLY bad.
>>
>> Thanks,
>> Shawn
>>
>
>


Re: Solr Swap space

2018-02-09 Thread Susheel Kumar
Thanks, Shawn, Eric.  I see the same using swapon -s.  Looks like during the
OS setup it was set to 2 GB on one machine (Solr 6.0) and to 16 GB on the
other (Solr 6.6).

Our 6.0 instance has been running for 1+ year, but recently our monit started
reporting swap usage above 30%, with the Solr dashboard showing the same.  I
haven't been able to find what causes it / why Solr is using the swap space.
The index is small enough to fit entirely in memory, and although our
performance test machines hold double the index size, there is no swap usage
there.

Thanks,
Susheel

On Wed, Feb 7, 2018 at 3:09 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/7/2018 12:01 PM, Susheel Kumar wrote:
>
>> Just trying to find where do we set swap space available to Solr process.
>> I
>> see in our 6.0 instances it was set to 2GB on and on 6.6 instances its set
>> to 16GB.
>>
>
> Solr has absolutely no involvement or control over swap space.  Neither
> does Java.  This is a function of your operating system's memory
> management, and is typically set up when you first install your OS.
>
> https://www.linux.com/news/all-about-linux-swap-space
> https://en.wikipedia.org/wiki/Paging#Windows_NT
>
> If your system is using swap space, it's a strong indication that you
> don't have enough memory installed.  If any of the memory that Solr uses is
> swapped out to disk, Solr performance is going to be REALLY bad.
>
> Thanks,
> Shawn
>


Solr Swap space

2018-02-07 Thread Susheel Kumar
Hello,

Just trying to find where we set the swap space available to the Solr process.
I see on our 6.0 instances it was set to 2GB, and on our 6.6 instances it's
set to 16GB.

Thanks,
Susheel


Re: problem with Solr Sorting by score and distance together

2018-01-04 Thread Susheel Kumar
Hi Deepak,  as Shawn mentioned, switch your q and fq values above, like:

q=facilityName:"orthodontist"+OR+facilityName:*orthodontist*
+OR+facilityName:"paul"+OR+facilityName:*paul*+OR+facilityName:*paul+
orthodontist*+OR+facilityName:"paul+orthodontist"+OR+
firstName:"orthodontist"+OR+firstName:*orthodontist*+OR+
firstName:"paul"+OR+firstName:*paul*+OR+firstName:*paul+
orthodontist*+OR+firstName:..
...&fq={!geofilt+sfield%3Dlocation+pt%3D37.564143,-122.004179+d%3D60.0}
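
In curl form, trimmed down to a couple of fields, that amounts to something
like this (hypothetical endpoint; the point is the relevance clauses in q and
the geofilt in fq):

curl -G 'http://localhost:8983/solr/provider_collection/select' \
  --data-urlencode 'q=facilityName:"paul orthodontist" OR firstName:"paul orthodontist"' \
  --data-urlencode 'fq={!geofilt sfield=location pt=37.564143,-122.004179 d=60.0}' \
  --data-urlencode 'sort=geodist(location,37.564143,-122.004179) asc,score desc' \
  --data-urlencode 'rows=10'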

Also, looking at your query, you would be better off using a catch-all field
when you are trying to find the same text in multiple fields.

Thnx


On Thu, Jan 4, 2018 at 7:26 PM, Deepak Udapudi  wrote:

> Hi Shawn,
>
> Thanks for the response.
>
> In the problem example in the below email I had used a hypothetical
> example for my query.
>
> Actually, we are trying to search for the name and specialty
> combination(for ex:- paul orthodontist) of the dentist sorted by the
> highest score and distance (in case of same dentists matching the free text
> criteria).
>
> Below are the Solr logs.
>
> 2018-01-05 00:13:05.835 INFO  (qtp1348949648-14) [
>  x:provider_collection] o.a.s.c.S.Request [provider_collection]
> webapp=/solr path=/select params={q=distance:{!geofilt+
> sfield%3Dlocation+pt%3D37.564143,-122.004179+d%3D60.0}&
> fl=*,distance:mul(geodist(location,37.5641425,-122.
> 004179),0.621371)=0=facilityName:"orthodontist"+
> OR+facilityName:*orthodontist*+OR+facilityName:"paul"+OR+
> facilityName:*paul*+OR+facilityName:*paul+orthodontist*+OR+facilityName:
> "paul+orthodontist"+OR+firstName:"orthodontist"+OR+
> firstName:*orthodontist*+OR+firstName:"paul"+OR+firstName:
> *paul*+OR+firstName:*paul+orthodontist*+OR+firstName:"
> paul+orthodontist"+OR+fullName:"orthodontist"+OR+
> fullName:*orthodontist*+OR+fullName:"paul"+OR+fullName:*
> paul*+OR+fullName:*paul+orthodontist*+OR+fullName:"paul+orthodontist"+OR+
> groupPracticeNpi:"orthodontist"+OR+groupPracticeNpi:*orthodontist*+OR+
> groupPracticeNpi:"paul"+OR+groupPracticeNpi:*paul*+OR+
> groupPracticeNpi:*paul+orthodontist*+OR+groupPracticeNpi:"paul+
> orthodontist"+OR+keywords:"orthodontist"+OR+keywords:*
> orthodontist*+OR+keywords:"paul"+OR+keywords:*paul*+OR+
> keywords:*paul+orthodontist*+OR+keywords:"paul+orthodontist"+OR+lastName:"
> orthodontist"+OR+lastName:*orthodontist*+OR+lastName:"
> paul"+OR+lastName:*paul*+OR+lastName:*paul+orthodontist*+
> OR+lastName:"paul+orthodontist"+OR+licenseNumber:"orthodontist"+
> OR+licenseNumber:*orthodontist*+OR+licenseNumber:"paul"+OR+
> licenseNumber:*paul*+OR+licenseNumber:*paul+orthodontist*+OR+
> licenseNumber:"paul+orthodontist"+OR+npi:"orthodontist"+OR+npi:*
> orthodontist*+OR+npi:"paul"+OR+npi:*paul*+OR+npi:*paul+
> orthodontist*+OR+npi:"paul+orthodontist"+OR+officeName:"
> orthodontist"+OR+officeName:*orthodontist*+OR+officeName:"
> paul"+OR+officeName:*paul*+OR+officeName:*paul+orthodontist*
> +OR+officeName:"paul+orthodontist"+OR+practiceLocationLanguages:"
> orthodontist"+OR+practiceLocationLanguages:*orthodontist*+OR+
> practiceLocationLanguages:"paul"+OR+practiceLocationLanguages:*paul*+OR+
> practiceLocationLanguages:*paul+orthodontist*+OR+
> practiceLocationLanguages:"paul+orthodontist"+OR+practiceLocationNpi:"
> orthodontist"+OR+practiceLocationNpi:*orthodontist*+OR+
> practiceLocationNpi:"paul"+OR+practiceLocationNpi:*paul*+OR+
> practiceLocationNpi:*paul+orthodontist*+OR+practiceLocationNpi:"paul+
> orthodontist"+OR+providerLanguages:"orthodontist"+OR+providerLanguages:*
> orthodontist*+OR+providerLanguages:"paul"+OR+providerLanguages:*paul*+OR+
> providerLanguages:*paul+orthodontist*+OR+providerLanguages:"paul+
> orthodontist"+OR+specialty:"orthodontist"+OR+specialty:*
> orthodontist*+OR+specialty:"paul"+OR+specialty:*paul*+OR+
> specialty:*paul+orthodontist*+OR+specialty:"paul+
> orthodontist"=geodist(location,37.564143,-122.
> 004179)+asc,score+desc=10=javabin=2} hits=577 status=0
> QTime=284
>
> 2018-01-05 00:13:06.886 INFO  (qtp1348949648-17) [
>  x:provider_collection] o.a.s.c.S.Request [provider_collection]
> webapp=/solr path=/admin/ping params={wt=javabin=2} hits=304592
> status=0 QTime=0
> 2018-01-05 00:13:06.886 INFO  (qtp1348949648-17) [
>  x:provider_collection] o.a.s.c.S.Request [provider_collection]
> webapp=/solr path=/admin/ping params={wt=javabin=2} status=0 QTime=0
> 2018-01-05 00:13:06.888 INFO  (qtp1348949648-16) [
>  x:provider_collection] o.a.s.c.S.Request [provider_collection]
> webapp=/solr path=/admin/ping params={wt=javabin=2} hits=304592
> status=0 QTime=0
> 2018-01-05 00:13:06.888 INFO  (qtp1348949648-16) [
>  x:provider_collection] o.a.s.c.S.Request [provider_collection]
> webapp=/solr path=/admin/ping params={wt=javabin=2} status=0 QTime=0
> 2018-01-05 00:13:06.891 INFO  (qtp1348949648-19) [   x:yelp_collection]
> o.a.s.c.S.Request [yelp_collection]  webapp=/solr path=/admin/ping
> params={wt=javabin=2} hits=13 status=0 QTime=0
> 2018-01-05 00:13:06.891 INFO  (qtp1348949648-19) [   

Re: How to routing document for send to particular shard range

2018-01-02 Thread Susheel Kumar
Hi Ketan,

I believe you need multiple shards, looking at the count of 800M.  How large
will the index be?  Assume it comes out to 400G, your VMs/machines have 64GB
each, and you practically want each shard's index to fit into memory. With
that I would create 10 shards on 10 machines (a 40 GB index on each, with
some buffer for growth).  Also utilize the _route_ parameter to make your
queries faster.
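
A minimal sketch of compositeId routing (hypothetical collection and ids; the
part before "!" is the routing key, and _route_ limits the query to the
shard(s) holding that key):

# documents sharing a routing key land on the same shard
curl 'http://localhost:8983/solr/COLL/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"USA!doc1"},{"id":"USA!doc2"},{"id":"CAN!doc3"}]'

# query only the shard(s) for that key
curl 'http://localhost:8983/solr/COLL/select?q=*:*&_route_=USA!'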

Thnx

On Tue, Jan 2, 2018 at 5:27 AM, hemanth  wrote:

> Hi Ketan,
>
> I also tried various ways to route documents to different shards based on
> some routing key value. eg:  status: active,inactive and terminated should
> go to 3 different shards. I tried creating implicit as well as composite id
> routers. I could not route the documents to the shard I want. Only thing
> which we can achieve is , documents will be routed based on the hash values
> of the field values. This will do automatically and it will not help to
> manually route to the shard we need. The api documents looks little fuzzy
> and I think solr will not route the documents to the desired shard
> manually.
> I am referring 6.6 version. I also tried creating some dummy "_route_"
> field
> and copied my status to this field and tried. But no luck. By any chance if
> you got the solution. Please let me know. I think , it will be one of the
> important feature , that can be enhanced. Creating different collections ,
> just for the difference of one field is of not good option. for eg: if we
> have sales documents, we want to partition them by sales country. i.e USA
> sales in one shard and Canada sales in one shard etc.. For this case , we
> need one collection with many shards and each shard should contain the data
> only to that particular shard.
>
> Thanks
> Hemanth
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Susheel Kumar
Technically I agree with you, Shawn, on fixing the OOME cause. In fact it is
not an issue any more, but I was testing for HA when planning for failures.
At the same time, it's hard to convince business folks that HA wouldn't be
there in case of an OOME.

I think the best option is to enable timeAllowed for now.
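
A minimal example (hypothetical host/collection; timeAllowed is in
milliseconds and can also be set as a default on the request handler in
solrconfig.xml):

curl 'http://localhost:8983/solr/COLL/select?q=*:*&rows=10&timeAllowed=5000'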

Thanks,
Susheel

On Mon, Dec 18, 2017 at 11:37 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/18/2017 9:01 AM, Susheel Kumar wrote:
> > Any thoughts on how one can provide HA in these situations.
>
> As I have said already a couple of times today on other threads, there
> are *exactly* two ways to deal with OOME.  No other solution is possible.
>
> 1) Configure the system to allow the process to access more of the
> resource that it's running out of.  This is typically the solution that
> people will utilize.  In your case, you would need to make the heap larger.
>
> 2) Change the configuration or the environment so fewer resources are
> required.
>
> OOME is special.  It is a problem that all the high availability steps
> in the world cannot protect you from, for precisely the reasons that
> Emir and I have described.  You must ensure that Solr is set up so there
> are enough resources that OOME cannot occur.
>
> I can see a general argument for making it possible to configure or
> disable any retry mechanism in SolrCloud, but that is not the solution
> here.  It would most likely only *delay* the problem to a later query.
> The OOME itself must be fixed, using one of the two solutions already
> outlined.
>
> Thanks,
> Shawn
>
>


Re: OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Susheel Kumar
Shawn/Emir - it's the Java heap space issue.  In GCViewer I can see a sudden
jump in heap utilization, then Full GC lines, and finally the oom killer
script killing Solr.

What I wonder is: if a retry from the coordinating node is what causes this
OOM query to spread to the next set of replicas, how can we tune / change
that behavior? Otherwise, even though we have a replication factor > 1, HA is
still not guaranteed in this situation, which defeats the purpose...

If we can't control this retry by the coordinating node, then I would say we
have something fundamentally wrong.  I know "timeAllowed" may save us in some
of these scenarios, but if the OOM happens before "timeAllowed" plus the
extra time it takes to really kill the query, we still have the issue.

Any thoughts on how one can provide HA in these situations?

Thanks,
Susheel



On Mon, Dec 18, 2017 at 9:53 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Ah, I misunderstood your usecase - it is not node that receives query that
> OOMs but nodes that are included in distributed queries are the one that
> OOMs. I would also say that it is expected because queries to particular
> shards fails and coordinating node retries using other replicas causing all
> replicas to fail. I did not check the code, but I would expect to have some
> sort of retry mechanism in place.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 18 Dec 2017, at 15:36, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> > Yes, Emir.  If I repeat the query, it will spread to other nodes but
> that's
> > not the case.  This is my test env and i am deliberately executing the
> > query with very high offset and wildcard to cause OOM but executing only
> > one time.
> >
> > So it shouldn't spread to other replica sets and at the end of my test,
> > the first 6 shard/replica set's which gets hit should go down while
> other 6
> > should survive but that's not what I see at the end.
> >
> > Setup :  400+ million docs, JVM is 12GB.  Yes, only one collection. Total
> > 12 machines with 6 shards and 6 replica's (replicationFactor = 2)
> >
> > On Mon, Dec 18, 2017 at 9:22 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Susheel,
> >> The fact that only node that received query OOM tells that it is about
> >> merging results from all shards and providing final result. It is
> expected
> >> that repeating the same query on some other node will result in a
> similar
> >> behaviour - it just mean that Solr does not have enough memory to
> execute
> >> this heavy query.
> >> Can you share more details on your test: size of collection, type of
> >> query, expected number of results, JVM settings, is that the only
> >> collection on cluster etc.
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 18 Dec 2017, at 15:07, Susheel Kumar <susheel2...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I was testing Solr to see if a query which would cause OOM and would
> >> limit
> >>> the OOM issue to only the replica set's which gets hit first.
> >>>
> >>> But the behavior I see that after all set of first replica's went down
> >> due
> >>> to OOM (gone on cloud view) other replica's starts also getting down.
> >> Total
> >>> 6 shards I have with each shard having 2 replica's and on separate
> >> machines
> >>>
> >>> The expected behavior is that all shards replica which gets hit first
> >>> should go down due to OOM and then other replica's should survive and
> >>> provide High Availability.
> >>>
> >>> The setup I am testing with is Solr 6.0 and wondering if this is would
> >>> remain same with 6.6 or there has been some known improvements made to
> >>> avoid spreading OOM to second/third set of replica's and causing whole
> >>> cluster to down.
> >>>
> >>> Any info on this is appreciated.
> >>>
> >>> Thanks,
> >>> Susheel
> >>
> >>
>
>


Re: OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Susheel Kumar
Yes, Emir.  If I repeated the query, it would spread to other nodes, but
that's not the case.  This is my test env, and I am deliberately executing
the query with a very high offset and a wildcard to cause OOM, but executing
it only once.

So it shouldn't spread to other replica sets, and at the end of my test the
first 6 shard replicas that get hit should go down while the other 6 should
survive, but that's not what I see at the end.

Setup: 400+ million docs, JVM is 12GB.  Yes, only one collection. Total 12
machines with 6 shards and 6 replicas (replicationFactor = 2).

On Mon, Dec 18, 2017 at 9:22 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Susheel,
> The fact that only node that received query OOM tells that it is about
> merging results from all shards and providing final result. It is expected
> that repeating the same query on some other node will result in a similar
> behaviour - it just mean that Solr does not have enough memory to execute
> this heavy query.
> Can you share more details on your test: size of collection, type of
> query, expected number of results, JVM settings, is that the only
> collection on cluster etc.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 18 Dec 2017, at 15:07, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> > Hello,
> >
> > I was testing Solr to see if a query which would cause OOM and would
> limit
> > the OOM issue to only the replica set's which gets hit first.
> >
> > But the behavior I see that after all set of first replica's went down
> due
> > to OOM (gone on cloud view) other replica's starts also getting down.
> Total
> > 6 shards I have with each shard having 2 replica's and on separate
> machines
> >
> > The expected behavior is that all shards replica which gets hit first
> > should go down due to OOM and then other replica's should survive and
> > provide High Availability.
> >
> > The setup I am testing with is Solr 6.0 and wondering if this is would
> > remain same with 6.6 or there has been some known improvements made to
> > avoid spreading OOM to second/third set of replica's and causing whole
> > cluster to down.
> >
> > Any info on this is appreciated.
> >
> > Thanks,
> > Susheel
>
>


OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Susheel Kumar
Hello,

I was testing Solr to see whether a query that causes OOM would limit the OOM
issue to only the replica set that gets hit first.

But the behavior I see is that after the first set of replicas went down due
to OOM (gone in the cloud view), the other replicas start going down as well.
I have 6 shards in total, each shard having 2 replicas on separate machines.

The expected behavior is that the replica of each shard which gets hit first
should go down due to OOM, and then the other replicas should survive and
provide high availability.

The setup I am testing with is Solr 6.0, and I am wondering whether this
would remain the same with 6.6, or whether known improvements have been made
to avoid spreading the OOM to the second/third set of replicas and taking the
whole cluster down.

Any info on this is appreciated.

Thanks,
Susheel


Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Susheel Kumar
Thank you, Yonik. Able to download directly.

On Tue, Oct 17, 2017 at 11:29 AM, Yonik Seeley <ysee...@gmail.com> wrote:

> It pointed to 7.1.0 for me, perhaps a browser cache issue?
> Anyway, you can go directly as well:
> http://www.apache.org/dyn/closer.lua/lucene/solr/7.1.0
>
> -Yonik
>
>
> On Tue, Oct 17, 2017 at 11:25 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> > Thanks, Shalin.
> >
> > But the download mirror still has 7.0.1 not 7.1.0.
> >
> > http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1
> >
> >
> >
> >
> > On Tue, Oct 17, 2017 at 5:28 AM, Shalin Shekhar Mangar
> > <shalinman...@gmail.com> wrote:
> >>
> >> 17 October 2017, Apache Solr™ 7.1.0 available
> >>
> >> The Lucene PMC is pleased to announce the release of Apache Solr 7.1.0
> >>
> >> Solr is the popular, blazing fast, open source NoSQL search platform
> >> from the Apache Lucene project. Its major features include powerful
> >> full-text search, hit highlighting, faceted search, dynamic
> >> clustering, database integration, rich document (e.g., Word, PDF)
> >> handling, and geospatial search. Solr is highly scalable, providing
> >> fault tolerant distributed search and indexing, and powers the search
> >> and navigation features of many of the world's largest internet sites.
> >>
> >> Solr 7.1.0 is available for immediate download at:
> >>
> >> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> >>
> >> See http://lucene.apache.org/solr/7_1_0/changes/Changes.html for a
> >> full list of details.
> >>
> >> Solr 7.1.0 Release Highlights:
> >>
> >> * Critical Security Update: Fix for CVE-2017-12629 which is a working
> >> 0-day exploit reported on the public mailing list. See
> >> https://s.apache.org/FJDl
> >>
> >> * Auto-scaling: Solr can now move replicas automatically when a new
> >> node is added or an existing node is removed using the auto scaling
> >> policy framework introduced in 7.0
> >>
> >> * Auto-scaling: The 'autoAddReplicas' feature which was limited to
> >> shared file systems is now available for all file systems. It has been
> >> ported to use the new autoscaling framework internally.
> >>
> >> * Auto-scaling: New set-trigger, remove-trigger, set-listener,
> >> remove-listener, suspend-trigger, resume-trigger APIs
> >>
> >> * Auto-scaling: New /autoscaling/history API to show past autoscaling
> >> actions and cluster events
> >>
> >> * New JSON based Query DSL for Solr that extends JSON Request API to
> >> also support all query parsers and their nested parameters
> >>
> >> * JSON Facet API: min/max aggregations are now supported on
> >> single-valued date fields
> >>
> >> * Lucene's Geo3D (surface of sphere & ellipsoid) is now supported on
> >> spatial RPT fields by setting spatialContextFactory="Geo3D".
> >> Furthermore, this is the first time Solr has out of the box support
> >> for polygons
> >>
> >> * Expanded support for statistical stream evaluators such as various
> >> distributions, rank correlations, distances and more.
> >>
> >> * Multiple other optimizations and bug fixes
> >>
> >> You are encouraged to thoroughly read the "Upgrade Notes" at
> >> http://lucene.apache.org/solr/7_1_0/changes/Changes.html or in the
> >> CHANGES.txt file accompanying the release.
> >>
> >> Solr 7.1 also includes many other new features as well as numerous
> >> optimizations and bugfixes of the corresponding Apache Lucene release.
> >>
> >> Please report any feedback to the mailing lists
> >> (http://lucene.apache.org/solr/discussion.html)
> >>
> >> Note: The Apache Software Foundation uses an extensive mirroring
> >> network for distributing releases. It is possible that the mirror you
> >> are using may not have replicated the release yet. If that is the
> >> case, please try another mirror. This also goes for Maven access.
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Susheel Kumar
Thanks, Shalin.

But the download mirror still has 7.0.1 not 7.1.0.

http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1




On Tue, Oct 17, 2017 at 5:28 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> 17 October 2017, Apache Solr™ 7.1.0 available
>
> The Lucene PMC is pleased to announce the release of Apache Solr 7.1.0
>
> Solr is the popular, blazing fast, open source NoSQL search platform
> from the Apache Lucene project. Its major features include powerful
> full-text search, hit highlighting, faceted search, dynamic
> clustering, database integration, rich document (e.g., Word, PDF)
> handling, and geospatial search. Solr is highly scalable, providing
> fault tolerant distributed search and indexing, and powers the search
> and navigation features of many of the world's largest internet sites.
>
> Solr 7.1.0 is available for immediate download at:
>
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
> See http://lucene.apache.org/solr/7_1_0/changes/Changes.html for a
> full list of details.
>
> Solr 7.1.0 Release Highlights:
>
> * Critical Security Update: Fix for CVE-2017-12629 which is a working
> 0-day exploit reported on the public mailing list. See
> https://s.apache.org/FJDl
>
> * Auto-scaling: Solr can now move replicas automatically when a new
> node is added or an existing node is removed using the auto scaling
> policy framework introduced in 7.0
>
> * Auto-scaling: The 'autoAddReplicas' feature which was limited to
> shared file systems is now available for all file systems. It has been
> ported to use the new autoscaling framework internally.
>
> * Auto-scaling: New set-trigger, remove-trigger, set-listener,
> remove-listener, suspend-trigger, resume-trigger APIs
>
> * Auto-scaling: New /autoscaling/history API to show past autoscaling
> actions and cluster events
>
> * New JSON based Query DSL for Solr that extends JSON Request API to
> also support all query parsers and their nested parameters
>
> * JSON Facet API: min/max aggregations are now supported on
> single-valued date fields
>
> * Lucene's Geo3D (surface of sphere & ellipsoid) is now supported on
> spatial RPT fields by setting spatialContextFactory="Geo3D".
> Furthermore, this is the first time Solr has out of the box support
> for polygons
>
> * Expanded support for statistical stream evaluators such as various
> distributions, rank correlations, distances and more.
>
> * Multiple other optimizations and bug fixes
>
> You are encouraged to thoroughly read the "Upgrade Notes" at
> http://lucene.apache.org/solr/7_1_0/changes/Changes.html or in the
> CHANGES.txt file accompanying the release.
>
> Solr 7.1 also includes many other new features as well as numerous
> optimizations and bugfixes of the corresponding Apache Lucene release.
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring
> network for distributing releases. It is possible that the mirror you
> are using may not have replicated the release yet. If that is the
> case, please try another mirror. This also goes for Maven access.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: streaming with SolrJ

2017-09-28 Thread Susheel Kumar
I have this snippet registering a couple of functions (e.g. "if"), if that
helps:

---
TupleStream stream;
List<Tuple> tuples;
StreamContext streamContext = new StreamContext();
SolrClientCache solrClientCache = new SolrClientCache();
streamContext.setSolrClientCache(solrClientCache);

StreamFactory factory = new StreamFactory()
 .withCollectionZkHost("gettingstarted", "localhost:2181")
.withFunctionName("search", CloudSolrStream.class)
  .withFunctionName("select", SelectStream.class)
  .withFunctionName("add", AddEvaluator.class)
  .withFunctionName("if", IfThenElseEvaluator.class)
  .withFunctionName("gt", GreaterThanEvaluator.class)
  .withFunctionName("let", LetStream.class)
  .withFunctionName("get", GetStream.class)
  .withFunctionName("echo", EchoStream.class)
  .withFunctionName("merge", MergeStream.class)
  .withFunctionName("sort", SortStream.class)
  .withFunctionName("tuple", TupStream.class)
  .withFunctionName("rollup",RollupStream.class)
  .withFunctionName("hashJoin", HashJoinStream.class)
  .withFunctionName("complement", ComplementStream.class)
  .withFunctionName("fetch", FetchStream.class)
  .withFunctionName("having",HavingStream.class)
//  .withFunctionName("eq", EqualsEvaluator.class)
  .withFunctionName("count", CountMetric.class)
  .withFunctionName("facet", FacetStream.class)
  .withFunctionName("sum", SumMetric.class)
  .withFunctionName("unique", UniqueStream.class)
  .withFunctionName("uniq", UniqueMetric.class)
  .withFunctionName("innerJoin", InnerJoinStream.class)
  .withFunctionName("intersect", IntersectStream.class)
  .withFunctionName("replace", ReplaceOperation.class)

  ;
try {
  String clause = getClause();
  stream = factory.constructStream(clause);
  stream.setStreamContext(streamContext);
  tuples = getTuples(stream);

  for(Tuple tuple : tuples )
  {
  System.out.println(tuple.getString("id"));
  System.out.println(tuple.getString("business_email_s"));


  }

  System.out.println("Total tuples returned " + tuples.size());

} catch (IOException e) {
  e.printStackTrace();
} finally {
  solrClientCache.close();
}

---
private static String getClause() {
String clause = "select(search(gettingstarted,\n" +
"q=*:* NOT personal_email_s:*,\n" +
"fl=\"id,business_email_s\",\n" +
"sort=\"business_email_s asc\"),\n" +
"id,\n" +
"business_email_s,\n" +
"personal_email_s,\n" +
"replace(personal_email_s,null,withField=business_email_s)\n" +
")";
return clause;
}
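
The getTuples helper isn't shown in the snippet; a minimal version, assuming
the standard TupleStream open/read/close contract (imports:
java.io.IOException, java.util.ArrayList, java.util.List, plus the solrj.io
Tuple/TupleStream classes already used above), could look like this:

private static List<Tuple> getTuples(TupleStream stream) throws IOException {
  List<Tuple> tuples = new ArrayList<>();
  try {
    stream.open();                       // open the underlying stream(s)
    for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
      tuples.add(tuple);                 // collect until the EOF tuple
    }
  } finally {
    stream.close();
  }
  return tuples;
}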


On Thu, Sep 28, 2017 at 3:35 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> I'm trying to use the streaming API via SolrJ but have some trouble with
> the documentation and samples. In the reference guide I found the below
> example in http://lucene.apache.org/solr/guide/6_6/streaming-expression
> s.html. Problem is that "withStreamFunction" does not seem to exist.
> There is "withFunctionName", which would match the arguments but there is
> no documentation in the JavaDoc nor is the sample stating why I would need
> all those "with" calls if pretty much everything is also in the last
> "constructStream" method call. I was planning to retrieve a few fields for
> all documents in a collection but have trouble to figure out what is the
> correct way to do so. The documentation also uses "/export" and "/search",
> with little explanation on the differences. Would really appreciate a
> pointer to some simple samples.
>
> The org.apache.solr.client.solrj.io package provides Java classes that
> compile streaming expressions into streaming API objects. These classes can
> be used to execute streaming expressions from inside a Java application.
> For example:
>
> StreamFactory streamFactory = new 
> StreamFactory().withCollectionZkHost("collection1",
> zkServer.getZkAddress())
> .withStreamFunction("search", CloudSolrStream.class)
> .withStreamFunction("unique", UniqueStream.class)
> .withStreamFunction("top", RankStream.class)
> .withStreamFunction("group", ReducerStream.class)
> .withStreamFunction("parallel", ParallelStream.class);
>
> ParallelStream pstream = (ParallelStream)streamFactory.
> constructStream("parallel(collection1, group(search(collection1,
> q=\"*:*\", fl=\"id,a_s,a_i,a_f\", sort=\"a_s asc,a_f asc\",
> partitionKeys=\"a_s\"), by=\"a_s asc\"), workers=\"2\",
> zkHost=\""+zkHost+"\", sort=\"a_s asc\")");
>
> regards,
> Hendrik
>


Re: Solr 5.5.2 - Custom Function Query update

2017-09-25 Thread Susheel Kumar
ignore solr version...

On Mon, Sep 25, 2017 at 11:21 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Check if your jar is present at solr-6.0.0/server/solr/<core>/lib/ or do
> a find under the solr directory...
>
> On Mon, Sep 25, 2017 at 9:59 AM, Florian Le Vern <florian.lev...@mappy.com
> > wrote:
>
>> Hi,
>>
>> I added a custom Function Query in a jar library that is loaded from the
>> `solr/data/lib` folder (same level as the cores) with the solrconfig line:
>> <valueSourceParser name="..." class="blah.blah.solr.search.function.MyFuncValueParser"
>> />
>>
>> I just updated this lib but after restarting Solr, it seems that it
>> still uses the previous version.
>> I also tried to delete the lib from the `solr/data/lib` folder without
>> changing the solrconfig but it was still working.
>>
>> Do you have any clues for updating a custom lib ?
>>
>> Thanks in advance,
>> Florian
>>
>>
>


Re: Solr 5.5.2 - Custom Function Query update

2017-09-25 Thread Susheel Kumar
Check if your jar is present at solr-6.0.0/server/solr/<core>/lib/ or do a
find under the solr directory...
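
For example (hypothetical jar name):

find solr-6.0.0 -name '*.jar' | grep -i myfunc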

On Mon, Sep 25, 2017 at 9:59 AM, Florian Le Vern 
wrote:

> Hi,
>
> I added a custom Function Query in a jar library that is loaded from the
> `solr/data/lib` folder (same level as the cores) with the solrconfig line:
> <valueSourceParser name="..." class="blah.blah.solr.search.function.MyFuncValueParser"
> />
>
> I just updated this lib but after restarting Solr, it seems that it
> still uses the previous version.
> I also tried to delete the lib from the `solr/data/lib` folder without
> changing the solrconfig but it was still working.
>
> Do you have any clues for updating a custom lib ?
>
> Thanks in advance,
> Florian
>
>


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-22 Thread Susheel Kumar
It may happen that you never find the query time logged for the queries which
caused the OOM, because your app never got the chance to log how long they
took...

So if you have proper exception handling in your client code, you may see the
exception being logged but not the query time for such queries.

Thnx

On Fri, Sep 22, 2017 at 6:32 AM, shamik  wrote:

> I usually log queries that took more than 1sec. Based on the logs, I
> haven't
> seen anything alarming or surge in terms of slow queries, especially around
> the time when the CPU spike happened.
>
> I don't necessarily have the data for deep paging, but the usage of sort
> parameter (date in our case) has been typically low. We also restrict 10
> results per page for pagination. Are there any recommendations around this?
>
> Again, I don't want to sound like a broken record, but I still don't get
> the
> part why these issues crop in 6.6 as compared to 5.5
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Not able to import timestamp data into Solr

2017-09-20 Thread Susheel Kumar
Check out this article on working with date types and formats:
http://lucene.apache.org/solr/guide/6_6/working-with-dates.html

On Wed, Sep 20, 2017 at 6:32 AM, shankhamajumdar <
shankha.majum...@lexmark.com> wrote:

> Hi,
>
> I have a field with timestamp data in Cassandra for example - 2017-09-20
> 10:25:46.752000+.
> I am not able to import the data using Solr DataImportHandler, getting the
> below error in the Solr log.
>
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
> range: -1
>
> I am able to import other datatype data from Cassandra to Solr. I am using
> below configuration
> managed-schema
>  required="true" multiValued="false" />
>  required="true" multiValued="false" />
>  required="true" multiValued="false" />
>  required="true"  multiValued="false" />
>
> dataconfig.xml
> query="SELECT test_data1,test_data2,test_data3, upserttime from
> test_table"
> autoCommit="true">
> 
> 
> 
> 
>
> Regards,
> Shankha
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr Streaming Question

2017-09-19 Thread Susheel Kumar
You can follow the section "Creating an Alert With the Topic Streaming
Expression" at http://joelsolr.blogspot.com/, use the random function to
get random records, and schedule with the daemon function to retrieve them
periodically.

Thanks,
Susheel
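
A rough sketch of that combination (collection names, fields, and the
interval are invented), pulling a fresh random sample from one collection
into another on a timer:

  daemon(id="sampler", runInterval="60000",
    update(devCollection, batchSize=250,
      random(mainCollection, q="*:*", rows="1000", fl="id,title_s")))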



On Tue, Sep 19, 2017 at 4:56 PM, Erick Erickson 
wrote:

> Webster:
>
> I think you're looking for UpdateStream. Unfortunately the fix version
> wasn't entered, so you'll have to check your particular version, but
> going strictly from the dates it appears in 6.0.
>
> David:
>
> Stored is irrelevant. Streaming only works with docValues="true"
> fields and moves the docValues content over.
>
> Best,
> Erick
>
> On Tue, Sep 19, 2017 at 12:39 PM, David Hastings
>  wrote:
> > I am also curious about this, specifically about indexed/non stored
> fields.
> >
> > On Tue, Sep 19, 2017 at 3:33 PM, Webster Homer 
> > wrote:
> >
> >> Is it possible to use the streaming API to stream documents from a
> >> collection and load them into a new collection? I was thinking that this
> >> would be a great way to get a random sample of data from our main
> >> collections to developer machines. Making it a random sample would be
> >> useful as well. This looks feasible, but I've only scratched the
> surface of
> >> streaming Solr
> >>
> >> Thanks
> >>
> >>
>


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-19 Thread Susheel Kumar
+1. Asking for far more than you need can result in an OOM.  rows
and facet.limit should be set carefully.
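
For example, rather than an unbounded facet on a high-cardinality field
(host, collection, and field names are illustrative):

  # collects every unique term from every shard before merging
  curl 'http://localhost:8983/solr/coll/select?q=*:*&rows=0&facet=true&facet.field=city&facet.limit=-1'
  # caps the per-shard work and the merged result
  curl 'http://localhost:8983/solr/coll/select?q=*:*&rows=0&facet=true&facet.field=city&facet.limit=100'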

On Tue, Sep 19, 2017 at 1:23 PM, Toke Eskildsen  wrote:

> shamik  wrote:
> > I've facet.limit=-1 configured for few search types, but facet.mincount
> is
> > always set as 1. Didn't know that's detrimental to doc values.
>
> It is if you have a lot (1000+) of unique values in your facet field,
> especially when you have more than 1 shard. Only ask for the number you
> need. Same goes for rows BTW.
>
> - Toke Eskildsen
>


Re: Zookeeper credentials are showed up on the Solr Admin GUI

2017-09-19 Thread Susheel Kumar
Hi Ivan, Can you please submit a JIRA/bug report for this at
https://issues.apache.org/jira/projects/SOLR

Thanks,
Susheel

On Tue, Sep 19, 2017 at 11:12 AM, Pekhov, Ivan (NIH/NLM/NCBI) [C] <
ivan.pek...@nih.gov> wrote:

> Hello Guys,
>
> We've been noticing this problem with Solr version 5.4.1 and it's still
> the case for the version 6.6.0. The problem is that we're using SolrCloud
> with secured Zookeeper and our users are granted access to Solr Admin GUI,
> and, at the same time, they are not supposed to have access to Zookeeper
> credentials, i.e. usernames and passwords. However, we (and some of our
> users) have found out that Zookeeper credentials are displayed on at least
> two sections of the Solr Admin GUI, i.e. "Dashboard" and "Java Properties".
>
> Having taken a look at the JavaScript code that runs behind the scenes for
> those pages, we can see that the sensitive parameters ( -DzkDigestPassword,
> -DzkDigestReadonlyPassword, -DzkDigestReadonlyUsername, -DzkDigestUsername
> ) are fetched via AJAX from the following two URL paths:
>
> /solr/admin/info/system
> /solr/admin/info/properties
>
> Could you please consider for the future Solr releases removing the
> Zookeeper parameters mentioned above from the output of these URLs and from
> other URLs that contain this information in their output, if there are any
> besides the ones mentioned? We find that it is pretty challenging (and
> probably impossible) to restrict users from accessing some particular paths
> with the security.json mechanism, and we think that it would be beneficial
> for overall Solr security to hide the Zookeeper credentials.
>
> Thank you so much for your consideration!
>
> Best regards,
> Ivan Pekhov
>
>


Re: Solr - Data search across multiple cores

2017-09-18 Thread Susheel Kumar
What fields do you want to search across the two separate collections/cores?
Please provide some details on your use case.

Thnx
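
For reference, classic distributed search can fan one query out over both
cores via the shards parameter, provided the queried and returned fields
exist in both schemas (host and core names are made up):

  curl 'http://localhost:8983/solr/core1/select?q=common_field:foo&shards=localhost:8983/solr/core1,localhost:8983/solr/core2'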

On Mon, Sep 18, 2017 at 1:42 AM, Agrawal, Harshal (GE Digital) <
harshal.agra...@ge.com> wrote:

> Hello Folks,
>
> I want to search data in two separate cores. The two cores have different
> schemas; only a few fields are common between them.
> I don't want to join the data. Is it possible to search data from two cores?
>
> I read about the distributed search concept but was not able to understand
> it. Is it the only way to search across multiple cores?
>
> Regards
> Harshal
>


Re: query with @ and *

2017-09-14 Thread Susheel Kumar
You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
analysis chain.

Thanks,
Susheel
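
A sketch of such a field type, which keeps e-mail addresses and URLs as
single tokens (the type name is invented):

  <fieldType name="text_email" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With that analysis, test@one.com is indexed as a single term, so a wildcard
query like test@one* can match it.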


On Thu, Sep 14, 2017 at 8:46 AM, Shawn Heisey  wrote:

> On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> > I have a problem when searching on email addresses.
> > @ seems to be handled as a special character but I don't find anything
> about it in the documentation.
> >
> > This is my test data
> > t...@one.com
> > t...@two.com
>
> Chances are that you have analysis defined on this field, and that the
> analysis includes a tokenizer or tokenizer/filter combination that
> splits on punctuation.  This means that for both entries, you have
> three terms.  For the first one, those terms are test, one, and com.
> For the second one, they are test, two, and com.  The rest of what I'm
> writing assumes that this is the case.
>
> > searching for test* results both, ok.
>
> This matches the term "test" in both entries.
>
> > searching for t...@one.com results the correct one, ok.
>
> Query analysis probably splits the same way index analysis does, so the
> actual search is for all three terms.
>
> > searching for test results both, what I didn't expect but it's ok.
>
> In this case, it matches the simple term "test" that's in the index on
> both documents.
>
> > searching for test@one* results none and that's the problem.
>
> When you include wildcards in a query, most query analysis is skipped,
> so it's looking for the literal text "test@one" followed by any
> characters.  Because the index analysis removed the @ character and
> split the things around it into separate terms, this will not match any
> of the terms in the index.
>
> Wildcards, while they do work in many cases, are often not the correct
> way to do queries.
>
> Thanks,
> Shawn
>
>


Re: Search if the indexed data is present in the query or not

2017-09-12 Thread Susheel Kumar
I am not quite able to follow what the use case/ask is, but you already have
the query. You can search/highlight whatever you want with the query string.
Remember that you run a single query against many (hundreds of)
documents.

On Tue, Sep 12, 2017 at 1:31 AM, Nithin Sreekumar 
wrote:

> My query is a message which can be of any length. For example "A quick
> brown lazy fox jump over the well and ran to the jungle" is my query to
> check. I have some indexed data. The indexed data contains strings, some of
> which are present in the message and some not.
> For example :- {'lazy','jump',etc}
> Obviously, we can check for the query string in the indexed data. But what
> I need is: if I search with that long message and any part of the message
> is present in the indexed data, then it must highlight the part of the
> message where the string is found.
> Example :-
>
> q: "A quick brown lazy fox jump over the well and ran to the jungle"
> If indexed data has a term 'lazy' and no other keywords which is present in
> the string, then if I search for the query then it must return the result
> with the term in the query 'lazy' as highlighted. If more terms are present
> in the indexed data, then they must also be highlighted.
>
> Thanks & Regards,
>
> NITHIN B SREEKUMAR
>


Re: SolrCloud 5.3.1 "IndexWriter is closed"

2017-09-12 Thread Susheel Kumar
During a collection reload, Solr does close the IndexWriter and open a new
one. So it looks like, with live traffic (is it heavy traffic, how many qps?)
and a collection reload, it is running into a situation where the IndexWriter
is not able to open up again. I am not sure whether this issue has been fixed
in later 5.x or 6.x releases; it may work fine in the latest 6.6.1 if you
could test that anyway.

https://www.quora.com/What-happens-during-a-Solr-Core-Reload


On Tue, Sep 12, 2017 at 12:47 PM, Kelly, Frank <frank.ke...@here.com> wrote:

> The schema change doesn't seem to be making any difference - just the act
> of a reload whilst handling live traffic.
>
> The reload takes about 30 seconds and soon after (within a few seconds) we
> start to see IndexWriter closed exceptions
>
> -Frank
>
>
> Frank Kelly
> Principal Software Engineer
> Identity Profile Team (SCBE, Traces, CDA)
>
> HERE
> 5 Wayside Rd, Burlington, MA 01803, USA
> 42° 29' 7" N 71° 11' 32" W
>
>
>
>
> On 9/12/17, 9:13 AM, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>
> >Kelly -
> >
> >If you do not make any change to the schema and just reload your collection,
> >does it work fine? How much time does it take to reload the collection?
> >
> >I am suspecting some conflict with commit frequency (5mins) and collection
> >reload.
> >
> >Thnx
>
>


Re: solr join query

2017-09-12 Thread Susheel Kumar
You may want to look at the fetch function of streaming expressions:

http://lucene.apache.org/solr/guide/6_6/stream-decorators.html

Thanks,
Susheel
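
A rough sketch of the idea (collection and field names are invented): stream
the matching suggestion docs and enrich each one with fields from the
products collection in a single request:

  fetch(products,
        search(suggestions, q="suggestion:coffee*", fl="id,suggestion",
               sort="id asc", rows="20"),
        fl="product_count",
        on="suggestion=product_name")

Suggestions that come back without the fetched field have no matching
product.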

On Tue, Sep 12, 2017 at 11:11 AM, Brian Yee  wrote:

> I have one solr collection used for auto-suggestions. If I submit a query
> with q="coffe", I will get a responses back with documents that have a
> field suggestion="coffee", "coffee table", "coffee maker", etc. I want to
> know if those suggestions would have results if I used them to query a
> second solr collection for products.
>
> Is there a way to do this without multiple solr queries? I could do the
> first query to the suggestions collection, then take the results from that
> and do more queries to the products collection. But I don't think that will
> meet my performance requirements.
>
> Is this possible with a single join query? Any other ideas?
>


Re: SolrCloud 5.3.1 "IndexWriter is closed"

2017-09-12 Thread Susheel Kumar
Kelly -

If you do not make any change to the schema and just reload your collection,
does it work fine? How much time does it take to reload the collection?

I am suspecting some conflict with commit frequency (5mins) and collection
reload.

Thnx

On Tue, Sep 12, 2017 at 6:59 AM, Kelly, Frank  wrote:

> No - these are new terms for new documents we will be adding later so no
> need to reindex old documents.
>
> Frank
>
>
> Frank Kelly
> Principal Software Engineer
> Identity Profile Team (SCBE, Traces, CDA)
>
> HERE
> 5 Wayside Rd, Burlington, MA 01803, USA
> 42° 29' 7" N 71° 11' 32" W
>
>
>
>
> On 9/12/17, 6:27 AM, "Rick Leir"  wrote:
>
> >Frank,
> >
> >I assume you re-index everything after changing schema.xml?
> >
> >cheers -- Rick
> >
>
>


Re: Solr Deleting Docs after Indexing

2017-09-11 Thread Susheel Kumar
Do all 4 documents have the same docID (unique key)?
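
If they do share a uniqueKey, each add overwrites the previous document, and
the overwritten versions are counted as Deleted Docs until a segment merge
purges them; that would explain exactly Num Docs: 1 / Max Doc: 4. A quick
check (core name and key field are assumptions):

  curl 'http://localhost:8983/solr/mycore/select?q=*:*&fl=id&rows=10&wt=json'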

On Mon, Sep 11, 2017 at 2:44 PM, Kaushik  wrote:

> I am using Solr 5.3 and have a custom Solr J application to write to Solr.
> When I index using this application, I expect to see 4 documents indexed.
> But for some strange reason, 3 documents get deleted and there is always
> only 1 document in the index. I say that because the final tally on the
> Solr Admin console is
> Num Docs: 1
> Max Doc: 4
> Deleted Docs: 3
>
>
> How and where in Solr/logs can I find why the documents are being deleted?
>
> Thanks,
> Kaushik
>


Re: commit time in solr cloud

2017-09-11 Thread Susheel Kumar
Hi Wei,

I'm assuming the lastModified time is when the latest hard commit happens. Is
that correct?

>> Yes, that's correct.

I also sometimes see a difference between replica and leader commit
timestamps, where the diff/lag < autoCommit interval. So in your case you
noticed up to 10 minutes.
My guess is that, because of the different start times of the replica and
leader nodes, their commits happen at different times, and thus you see the
difference.

Thanks,
Susheel
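
A quick way to compare commit times across cores (host and core names are
illustrative):

  curl 'http://host1:8080/solr/COLL_shard1_replica1/admin/luke?numTerms=0&wt=json' | grep -o '"lastModified":"[^"]*"'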

On Fri, Sep 8, 2017 at 3:06 PM, Wei  wrote:

> Hi,
>
> In solr cloud we want to track the last commit time on each node. The
> information source is from the luke handler:
>  admin/luke?numTerms=0&wt=json, e.g.
>
>
> "userData": { "commitTimeMSec": "1504895505447" },
> "lastModified": "2017-09-08T18:31:45.447Z"
>
>
>
> I'm assuming the lastModified time is when the latest hard commit happens.
> Is that correct?
>
> On all nodes we have autoCommit set to 15 mins interval. One observation I
> don't understand is that quite often the last commit time on shard leaders
> lags behind the last commit time on replicas; sometimes the lag is over 10
> minutes.  My understanding is that as update requests go to the leader first,
> the timer on the leaders would start earlier than the replicas. Am I
> missing something here?
>
> Thanks,
> Wei
>


Re: Solr list operator

2017-09-06 Thread Susheel Kumar
Nick, check out the terms query parser
(http://lucene.apache.org/solr/guide/6_6/other-parsers.html) or streaming
expressions.

Thnx
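
A sketch with the terms parser, assuming listOfIDs is indexed as a
multiValued field whose values are the individual IDs (1, 2, 4, 33) rather
than one "1,2,4,33" string:

  q={!terms f=listOfIDs}1     matches: 1 is among the values
  q={!terms f=listOfIDs}33    matches
  q={!terms f=listOfIDs}3     no match: 3 is not an actual value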

On Wed, Sep 6, 2017 at 8:33 AM, alex goretoy  wrote:

>
> On Wed, Sep 6, 2017 at 5:57 PM, Nick Way 
> wrote:
> > Hi, I have a custom field "listOfIDs" = "1,2,4,33"
> >
> > I want the equivalent of:
> >
> > select * where '1' IN (listOfIDs)  --> should get a match
> >
> > select * where '33' IN (listOfIDs)  --> should get a match
> >
> > select * where '3' IN (listOfIDs)  --> should NOT get a match
> >
> >
> > Can anyone help me out please as I can't seem to find any documentation
> on
> > this. Thanks very much in advance.
> >
> > Kind regards,
> >
> >
> > Nick Way
>


Re: Solr6.6 Issue/Bug

2017-09-06 Thread Susheel Kumar
Try following the steps described at
http://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html
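
The service installer described there sets up a dedicated solr user, so
nothing runs as root and -force is never needed; roughly (version and paths
illustrative):

  tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2
  sudo bash ./install_solr_service.sh solr-6.6.0.tgz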

On Wed, Sep 6, 2017 at 3:52 AM, Michael Kuhlmann  wrote:

> Why would you need to start Solr as root? You should definitely not do
> this, there's no reason for that.
>
> And even if you *really* want this: What's so bad about the -force option?
>
> -Michael
>
> On 06.09.2017 at 07:26, Kasim Jinwala wrote:
> > Dear team,
> >   I have been using Solr 5.0 for the last year; now we are planning to
> > upgrade to Solr 6.6.
> >  While trying to start Solr as the root user, we need to pass the -force
> > parameter to start Solr forcefully;
> > please help us start Solr as the root user without the -force option.
> >
> > Regards
> > Kasim J.
> >
>
>


Re: Overseer task timeout

2017-09-01 Thread Susheel Kumar
Which Solr and ZooKeeper versions do you have? And why do you have just a
1-node ZooKeeper?  Usually you run 3 or so to maintain a quorum.

Thnx

On Fri, Sep 1, 2017 at 7:24 AM, Mikhail Ibraheem <
arsenal2...@yahoo.com.invalid> wrote:

>
> Any help please?  From: Mikhail Ibraheem 
>  To: Solr-user 
>  Sent: Wednesday, 30 August 2017, 18:36
>  Subject: Overseer task timeout
>
> Hi, we have a one-node ZooKeeper and a one-node Solr. Sometimes when trying
> to create or delete a collection there is a "SEVERE:
> null:org.apache.solr.common.SolrException:
> delete the collection time out:180s" error.
> After checking the code I found that Solr puts a task node into ZooKeeper
> (/overseer/collection-queue-work/qnr-012764,
> /overseer/collection-queue-work/qn-012764); then a watcher listens for this
> and processes the task, then deletes the response node, which triggers the
> latchWatcher to notify the thread that the task finished. The timeout for
> this is 180 seconds (hard-coded). I think that sometimes the watcher that
> should trigger the processor is not triggered. Is that a bug? How can it be
> fixed? Please help.
> Thanks, Mikhail
>
>
>


Re: Index relational database

2017-08-31 Thread Susheel Kumar
Yes, if you can avoid joins and work with a flat/denormalized structure, then
that's the best.
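
For completeness, the join query parser Erick mentions below works like this
(field and collection names are invented): the inner query runs against
collection1, the values of its "from" field are collected, and documents in
the collection being queried whose "to" field matches those values are
returned:

  q={!join from=other_id to=id fromIndex=collection1}city:Delhi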

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti <renuka.srisht...@gmail.com>
wrote:

> Thanks Erick, Walter
> But I think a join query will reduce performance. Denormalization will be a
> better way than a join query, am I right?
>
>
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > index,
> > > that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" <renuka.srisht...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >>   - Let suppose there are two documents doc1 and doc2.
> > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Renuka Srishti
> > >>
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar <susheel2...@gmail.com
> >
> > >> wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest starting with your use case(s). Maybe start with your
> > >>> first use case and the below questions:
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc,
> city
> > >>> etc.)
> > >>> b) What is that you want to show part of search result (name, city
> > etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in
> from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > deal
> > >>> with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hii,
> > >>>>
> > >>>> What is the best way to index relational database, and how it
> impacts
> > >> on
> > >>>> the performance?
> > >>>>
> > >>>> Thanks
> > >>>> Renuka Srishti
> > >>>>
> > >>>
> > >>
> >
> >
>


Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

2017-08-30 Thread Susheel Kumar
1) As regards naming of the shards: Is using the same naming for the shards
o.k. in this constellation? I.e. does it create trouble to have e.g.
"Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002",
etc. as well in collection2?
>> The default naming convention for the underlying cores is
"<collection>_shard#_replica#", so the complete names will differ,
like coll1_shard1_replica1 and coll2_shard1_replica1.

2) Performance: In my current single collection setup, I have 2 shards per
node. After creating the second collection, there will be 4 shards per
node. Do I have to edit the RAM per node value (raise the -m parameter when
starting the node)? In my case, I am quite sure that the collections will
never be queried simultaneously. So will the "running but idle" collection
slow me down?
>> It's up to you how you set up the JVMs. You can have one JVM instance
running on, say, port 8080 and hosting multiple shards/collections, or you
can set up two JVM/Solr instances on a node running on different ports like
8080 and 8081. I would suggest starting and testing with one JVM and multiple
collections until you run into a performance bottleneck, and then splitting
into separate JVMs with different heaps etc.
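
A sketch of creating the second collection alongside the first (names and
config are illustrative). Note that maxShardsPerNode is counted per
collection, so 2 still suffices even though each node ends up hosting 4
cores in total:

  curl 'http://localhost:8080/solr/admin/collections?action=CREATE&name=collection2&numShards=38&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf&router.field=myRouterField'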



On Wed, Aug 30, 2017 at 12:42 PM, Johannes Knaus <kn...@mpdl.mpg.de> wrote:

> Thank you, Susheel, for the quick response.
>
> So, that means that when I create a new collection, its shards will be
> newly created at each node, right?
> Thus, if I have two collections with
> numShards=38,
> maxShardsPerNode=2 and
> replicationFactor=2
> on my 38 nodes, then this would result in each node "hosting" 4 shards
> (two from each collection).
>
> If this is correct, I have two follow up questions:
>
> 1) As regards naming of the shards: Is using the same naming for the
> shards o.k. in this constellation? I.e. does it create trouble to have e.g.
> "Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002",
> etc. as well in collection2?
>
> 2) Performance: In my current single collection setup, I have 2 shards per
> node. After creating the second collection, there will be 4 shards per
> node. Do I have to edit the RAM per node value (raise the -m parameter when
> starting the node)? In my case, I am quite sure that the collections will
> never be queried simultaneously. So will the "running but idle" collection
> slow me down?
>
> Johannes
>
> -Original Message-
> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> Sent: Wednesday, 30 August 2017 17:36
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the
> same nodes possible?
>
> Yes, absolutely.  You can create as many collections as you need (like you
> would create tables in the relational world).
>
> On Wed, Aug 30, 2017 at 10:13 AM, Johannes Knaus <kn...@mpdl.mpg.de>
> wrote:
>
> > I have a working SolrCloud-Setup with 38 nodes with a collection
> > spanning over these nodes with 2 shards per node and replication
> > factor 2 and a router field.
> >
> > Now I got some new data for indexing which has the same structure and
> > size as my existing index in the described collection.
> > However, although it has the same structure, the new data to be indexed
> > should not be mixed with the old data.
> >
> > Do I have to create another 38 new nodes and a new collection and index
> > the new data, or is there a better / more efficient way I could use the
> > existing nodes?
> > Is it possible that the 2 collections could share the 38 nodes without
> > the indexes being mixed?
> >
> > Thanks for your help.
> >
> > Johannes
> >
>

