RE: Why External File Field is marked as indexed in solr admin SCHEMA page?

2020-07-22 Thread Vadim Ivanov
Hello, Raj

I've just checked my Schema page for external file field

Solr version 8.3.1 shows only these parameters for an externalFileField:

Field: fff
Field-Type: org.apache.solr.schema.ExternalFileField
Flags: UnInvertible, Omit Term Frequencies & Positions
Properties: √ √

Are you sure you don't have (or had) fieldA in the main collection schema?

 

externalFileField is not part of the index. It resides in a separate file in the Solr
index directory and is loaded into memory on every commit.
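For context, the external file itself is a plain text file (named after the field, e.g. external_fff, in the index data directory) containing one key=value line per document. A minimal sketch of parsing that format — the file name, keys and the last-value-wins behavior here are illustrative, not a definitive statement of Solr internals:

```python
# Sketch: ExternalFileField reads plain "key=value" lines, one per document
# key, mapping each key to a float usable in function queries.

def parse_external_file(lines):
    """Parse external-file-field lines into a {doc_key: float} map.
    In this sketch, a later entry for the same key overrides an earlier one."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, raw = line.partition("=")
        values[key] = float(raw)
    return values

content = [
    "doc1=0.5",
    "doc2=1.25",
    "doc1=0.75",  # override: last value wins in this sketch
]
print(parse_external_file(content))  # {'doc1': 0.75, 'doc2': 1.25}
```

Because the whole file is re-read into memory on commit/reload, very large external files can add noticeable heap and reload cost.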

 

> -Original Message-
> From: Raj Yadav [mailto:rajkum...@cse.ism.ac.in]
> Sent: Wednesday, July 22, 2020 3:09 PM
> To: solr-user@lucene.apache.org
> Subject: Why External File Field is marked as indexed in solr admin SCHEMA
> page?
> 
> We have the following external file field type and field:
> 
> * class="solr.ExternalFileField" valType="float"/>*
> 
> **
> 
> In the official Solr documentation it is mentioned that:
> "The ExternalFileField type makes it possible to specify the values for a
> field in a file outside the Solr index. External fields are not searchable.
> They can be used only for function queries or display."
> 
> I was expecting that for field "fieldA" indexed would be marked as false and it
> would not be part of the index. But the Solr admin "SCHEMA page" (we get this
> option after selecting the collection name in the drop-down menu) is showing it
> as an indexed field (green tick mark under the Indexed flag).
> 
> We have not explicitly specified indexed=false for this external field in our
> schema. Wanted to know whether this field is really part of the index,
> or whether it is just a bug on the admin UI side.
> 
> Regards,
> Raj



RE: Unable to start solr server on "Ubuntu 18.04 bash shell on Windows 10"

2020-02-20 Thread Vadim Ivanov
Hi
That seems to be the reason Solr is not starting:

cannot open
'/home/pawasthi/projects/solr_practice/ex1/solr-8.4.1/example/cloud/node1/solr/../logs/solr.log'
for reading: No such file or directory


> -Original Message-
> From: Prabhat Awasthi [mailto:pawasthi.i...@gmail.com]
> Sent: Wednesday, February 19, 2020 6:34 PM
> To: solr-user@lucene.apache.org
> Subject: Unable to start solr server on "Ubuntu 18.04 bash shell on Windows
> 10"
> 
> Hello,
> 
> I am using Linux bash sell (Ubuntu app) on Windows 10 to run Solr on Ubuntu
> 18.04.
> 
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:Ubuntu 18.04.2 LTS
> Release:18.04
> Codename:   bionic
> 
> I already installed Java8 (Openjdk) on my Ubuntu environment.
> 
> $ java -version
> openjdk version "1.8.0_242"
> OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-
> b08)
> OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
> 
> But I face error when I try to start SolrCloud on my Ubuntu system.
> Could you please help to give me some pointers if I miss anything here ?
> Please find below the full logs.
> 
> Thanks in advance.
> - Prabhat
> 
> --
> -
> $ bin/solr start -e cloud
> *** [WARN] *** Your open file limit is currently 1024.
>  It should be set to 65000 to avoid operational disruption.
>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
> in your profile or solr.in.sh
> *** [WARN] ***  Your Max Processes Limit is currently 7823.
>  It should be set to 65000 to avoid operational disruption.
>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
> in your profile or solr.in.sh
> 
> Welcome to the SolrCloud example!
> 
> This interactive session will help you launch a SolrCloud cluster on your 
> local
> workstation.
> To begin, how many Solr nodes would you like to run in your local cluster?
> (specify 1-4 nodes) [2]:
> 
> Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
> Please enter the port for node1 [8983]:
> 
> Please enter the port for node2 [7574]:
> 
> Solr home directory
> /home/pawasthi/projects/solr_practice/ex1/solr-
> 8.4.1/example/cloud/node1/solr
> already exists.
> /home/pawasthi/projects/solr_practice/ex1/solr-
> 8.4.1/example/cloud/node2
> already exists.
> 
> Starting up Solr on port 8983 using command:
> "bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr"
> 
> *** [WARN] ***  Your Max Processes Limit is currently 7823.
>  It should be set to 65000 to avoid operational disruption.
>  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false
> in your profile or solr.in.sh *Waiting up to 180 seconds to see Solr running 
> on
> port 8983 [|]  bin/solr:
> line 664:   293 Aborted (core dumped) nohup "$JAVA"
> "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -Dsolr.log.muteconsole "-
> XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT
> $SOLR_LOGS_DIR" -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
> $SOLR_JETTY_ADDL_CONFIG > "$SOLR_LOGS_DIR/solr-$SOLR_PORT-
> console.log" 2>&1*  [|]  Still not seeing Solr listening on 8983 after 180
> seconds!
> tail: cannot open
> '/home/pawasthi/projects/solr_practice/ex1/solr-
> 8.4.1/example/cloud/node1/solr/../logs/solr.log'
> for reading: No such file or directory
> 
> ERROR: Did not see Solr at http://localhost:8983/solr come online within 30
> --
> -



RE: A question about solr filter cache

2020-02-17 Thread Vadim Ivanov
Hi!
Yes, it may depend on the Solr version.
The Solr 8.3 Admin filterCache page stats look like:

stats:
CACHE.searcher.filterCache.cleanupThread:false
CACHE.searcher.filterCache.cumulative_evictions:0
CACHE.searcher.filterCache.cumulative_hitratio:0.94
CACHE.searcher.filterCache.cumulative_hits:198
CACHE.searcher.filterCache.cumulative_idleEvictions:0
CACHE.searcher.filterCache.cumulative_inserts:12
CACHE.searcher.filterCache.cumulative_lookups:210
CACHE.searcher.filterCache.evictions:0
CACHE.searcher.filterCache.hitratio:1
CACHE.searcher.filterCache.hits:84
CACHE.searcher.filterCache.idleEvictions:0
CACHE.searcher.filterCache.inserts:0
CACHE.searcher.filterCache.lookups:84
CACHE.searcher.filterCache.maxRamMB:-1
CACHE.searcher.filterCache.ramBytesUsed:70768
CACHE.searcher.filterCache.size:12
CACHE.searcher.filterCache.warmupTime:1
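As a sanity check, the hitratio values above are derivable from the hits/lookups counters; a quick sketch of the same arithmetic (pure math, not Solr code):

```python
def hit_ratio(hits, lookups):
    """Cache hit ratio as the stats report it: hits / lookups (0 if no lookups)."""
    return round(hits / lookups, 2) if lookups else 0.0

# Numbers taken from the stats above
print(hit_ratio(198, 210))  # cumulative_hitratio: 0.94
print(hit_ratio(84, 84))    # current searcher's hitratio: 1.0
```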

> -Original Message-
> From: Hongxu Ma [mailto:inte...@outlook.com]
> Sent: Tuesday, February 18, 2020 5:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about solr filter cache
> 
> @Erick Erickson<mailto:erickerick...@gmail.com> and @Mikhail Khludnev
> 
> got it, the explanation is very clear.
> 
> Thank you for your help.
> 
> From: Hongxu Ma 
> Sent: Tuesday, February 18, 2020 10:22
> To: Vadim Ivanov ; solr-
> u...@lucene.apache.org 
> Subject: Re: A question about solr filter cache
> 
> Thank you @Vadim Ivanov<mailto:vadim.iva...@spb.ntk-intourist.ru>
> I know that admin page, but I cannot find the memory usage of filter cache
> (it only has "CACHE.searcher.filterCache.size", which I think is the used slot
> number of the filterCache)
> 
> There is my output (solr version 7.3.1):
> 
> filterCache
> 
>   *
> 
> class:
> org.apache.solr.search.FastLRUCache
>   *
> 
> description:
> Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460,
> acceptableSize=486, cleanupThread=false)
>   *   stats:
>  *
> 
> CACHE.searcher.filterCache.cumulative_evictions:
> 0
>  *
> 
> CACHE.searcher.filterCache.cumulative_hitratio:
> 0.5
>  *
> 
> CACHE.searcher.filterCache.cumulative_hits:
> 1
>  *
> 
> CACHE.searcher.filterCache.cumulative_inserts:
> 1
>  *
> 
> CACHE.searcher.filterCache.cumulative_lookups:
> 2
>  *
> 
> CACHE.searcher.filterCache.evictions:
> 0
>  *
> 
> CACHE.searcher.filterCache.hitratio:
> 0.5
>  *
> 
> CACHE.searcher.filterCache.hits:
> 1
>  *
> 
> CACHE.searcher.filterCache.inserts:
> 1
>  *
> 
> CACHE.searcher.filterCache.lookups:
> 2
>  *
> 
> CACHE.searcher.filterCache.size:
> 1
>  *
> 
> CACHE.searcher.filterCache.warmupTime:
> 0
> 
> 
> 
> 
> From: Vadim Ivanov 
> Sent: Monday, February 17, 2020 17:51
> To: solr-user@lucene.apache.org 
> Subject: RE: A question about solr filter cache
> 
> You can easily check the amount of RAM used by a core's filterCache in the Admin UI:
> Choose core - Plugins/Stats - Cache - filterCache. It shows useful information
> on configuration, statistics and current RAM usage by the filter cache, as well as
> some examples of current filterCaches in RAM. A core, for example, with 10 mln docs
> uses 1.3 MB of RAM for every filterCache entry.
> 
> 
> > -Original Message-
> > From: Hongxu Ma [mailto:inte...@outlook.com]
> > Sent: Monday, February 17, 2020 12:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: A question about solr filter cache
> >
> > Hi
> > I want to know the internal of solr filter cache, especially its
> > memory
> usage.
> >
> > I googled some pages:
> > https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> > https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.h
> > tml
> > (Erick Erickson's answer)
> >
> > All of them said its structure is: fq => a bitmap (total doc number bits),
> > but I think it's not so simple. Reason:
> > Given a total doc number of 1 billion, each filter cache entry will use
> > nearly 1GB (10^9/8 bit); it's too big and would very easily make Solr OOM (I have a
> > 1 billion doc cluster, and it looks like it works well)
> >
> > And I also checked solr node, but cannot find the details (only saw
> > using DocSets structure)
> >
> > So far, I guess:
> >
> >   *   degenerate into a doc id array/list when the bitmap is sparse
> >   *   using some compressed bitmap, e.g. roaring bitmaps
> >
> > which one is correct? or another answer, thanks you very much!
> 




RE: A question about solr filter cache

2020-02-17 Thread Vadim Ivanov
You can easily check the amount of RAM used by a core's filterCache in the Admin UI:
Choose core - Plugins/Stats - Cache - filterCache.
It shows useful information on configuration, statistics and current RAM
usage by the filter cache,
as well as some examples of current filterCaches in RAM.
A core, for example, with 10 mln docs uses 1.3 MB of RAM for every filterCache entry.
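That 1.3 MB figure is consistent with the back-of-the-envelope math for a full bitset: one bit per document in the core, plus some object overhead. A quick check (pure arithmetic, not Solr code):

```python
def bitset_bytes(num_docs):
    """Size in bytes of a plain one-bit-per-document filter bitset."""
    return (num_docs + 7) // 8

ten_million = 10_000_000
mb = bitset_bytes(ten_million) / (1024 * 1024)
print(f"{mb:.2f} MB")  # ~1.19 MB; close to the observed 1.3 MB once overhead is added
```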


> -Original Message-
> From: Hongxu Ma [mailto:inte...@outlook.com]
> Sent: Monday, February 17, 2020 12:13 PM
> To: solr-user@lucene.apache.org
> Subject: A question about solr filter cache
> 
> Hi
> I want to know the internal of solr filter cache, especially its memory
usage.
> 
> I googled some pages:
> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
> (Erick Erickson's answer)
> 
> All of them said its structure is: fq => a bitmap (total doc number bits),
> but I think it's not so simple. Reason:
> Given a total doc number of 1 billion, each filter cache entry will use
> nearly 1GB (10^9/8 bit); it's too big and would very easily make Solr OOM (I have a
> 1 billion doc cluster, and it looks like it works well)
> 
> And I also checked solr node, but cannot find the details (only saw using
> DocSets structure)
> 
> So far, I guess:
> 
>   *   degenerate into a doc id array/list when the bitmap is sparse
>   *   using some compressed bitmap, e.g. roaring bitmaps
> 
> which one is correct? or another answer, thanks you very much!
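For what it's worth, the first guess is essentially what Lucene/Solr does: a small, sparse result is kept as a sorted array of doc ids (SortedIntDocSet) while a dense one is kept as a bitset (BitDocSet), so a sparse filter over a billion-doc index is nowhere near the full-bitmap cost. A toy sketch of that size trade-off — the 4-bytes-per-id model is illustrative, not Solr's exact heuristic:

```python
def docset_bytes(num_matches, total_docs):
    """Approximate memory for a cached filter: whichever representation is
    smaller - a sorted int array (4 bytes per matching doc id) or a full
    bitset (1 bit per doc in the index)."""
    as_int_array = 4 * num_matches
    as_bitset = (total_docs + 7) // 8
    return min(as_int_array, as_bitset)

total = 1_000_000_000  # 1 billion docs
print(docset_bytes(1_000, total))        # sparse filter: 4 KB int array, not 125 MB
print(docset_bytes(500_000_000, total))  # dense filter: the 125 MB bitset wins
```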




RE: Solr grouping with offset

2020-02-14 Thread Vadim Ivanov
group.mincount? Never heard of it. Does it exist?
Maybe you have in mind facet.mincount and the second approach mentioned earlier:

> > > > Next approach was to use facet first with facet.mincount=3, then
> > > > find docs ids by every facet result  and then delete docs by id.
> > > > That way seems to me  too complicated for the task.

> -Original Message-
> From: Saurabh Sharma [mailto:saurabh.infoe...@gmail.com]
> Sent: Friday, February 14, 2020 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr grouping with offset
> 
> Hi,
> 
> If you want to sort on your field and want to put a count restriction too then
> you have to use mincount. That seems to be best approach for your
> problem.
> 
> Thanks
> Saurabh
> 
> On Fri, Feb 14, 2020, 6:24 PM Vadim Ivanov < vadim.iva...@spb.ntk-
> intourist.ru> wrote:
> 
> > Example of grouping with empty groups in results:
> > Field1 = rr_group, field2 = rr_updatedate. Problem is that I have tens
> > of millions of groups in the result and only several thousand with "numFound"
> > >2
> >
> > "params":{
> >   "q":"*:* ",
> >   "group.sort":"rr_updatedate desc ",
> >   "group.limit":"-1",
> >   "fl":"rr_group,rr_adl,rr_createdate,rr_calctaskkey ",
> >   "group.offset":"2",
> >   "wt":"json",
> >   "group.field":"rr_group",
> >   "group":"true"}},
> >   "grouped":{
> > "rr_group":{
> >   "matches":41475082,
> >   "groups":[{
> >   "groupValue":"164370:20200707:23:251",
> >   "doclist":{"numFound":1,"start":2,"docs":[]
> >   }},
> > {
> >   "groupValue":"163942:20200708:22:251",
> >   "doclist":{"numFound":1,"start":2,"docs":[]
> >   }},
> >     {
> >   "groupValue":"163943:20200708:22:251",
> >   "doclist":{"numFound":1,"start":2,"docs":[]
> >   }},
> > {
> >   "groupValue":"164355:20200708:22:251",
> >   "doclist":{"numFound":1,"start":2,"docs":[]
> >
> > > -Original Message-
> > > From: Paras Lehana [mailto:paras.leh...@indiamart.com]
> > > Sent: Friday, February 14, 2020 3:37 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Solr grouping with offset
> > >
> > > It would be better if you give us an example.
> > >
> > > On Fri, 14 Feb 2020 at 17:20, Vadim Ivanov
> > >  wrote:
> > >
> > > > Hello guys!
> > > > I need an advise. My task is to delete some documents in collection.
> > > > Del algorithm is following:
> > > > Group docs by field1  with sort by field2 and delete every 3 and
> > > > following occurrences in every group.
> > > > Unfortunately I didn't find easy way to do so.
> > > > Closest approach was to use group.offset = 2, but  result set is
> > > > polluted with empty groups with no documents (they have less then
> > > > 3
> > docs
> > > in group).
> > > > May be I'm missing smth and there is way not to receive empty
> > > > groups in results?
> > > > Next approach was to use facet first with facet.mincount=3, then
> > > > find docs ids by every facet result  and then delete docs by id.
> > > > That way seems to me  too complicated for the task.
> > > > What's the best use case for the task?
> > > >
> > >
> > >
> > > --
> > > --
> > > Regards,
> > >
> > > *Paras Lehana* [65871]
> > > Development Engineer, *Auto-Suggest*, IndiaMART InterMESH Ltd,
> > >
> > > 11th Floor, Tower 2, Assotech Business Cresterra, Plot No. 22,
> > > Sector
> > 135,
> > > Noida, Uttar Pradesh, India 201305
> > >
> > > Mob.: +91-9560911996
> > > Work: 0120-4056700 | Extn:
> > > *11096*
> > >
> > > --
> > > *
> > > *
> > >
> > >  <https://www.facebook.com/IndiaMART/videos/578196442936091/>
> >
> >



RE: Solr grouping with offset

2020-02-14 Thread Vadim Ivanov
Example of grouping with empty groups in results:
Field1 = rr_group, field2 = rr_updatedate
Problem is that I have tens of millions of groups in the result and only several
thousand with "numFound" > 2
   
"params":{
  "q":"*:* ",
  "group.sort":"rr_updatedate desc ",
  "group.limit":"-1",
  "fl":"rr_group,rr_adl,rr_createdate,rr_calctaskkey ",
  "group.offset":"2",
  "wt":"json",
  "group.field":"rr_group",
  "group":"true"}},
  "grouped":{
"rr_group":{
  "matches":41475082,
  "groups":[{
  "groupValue":"164370:20200707:23:251",
  "doclist":{"numFound":1,"start":2,"docs":[]
  }},
{
  "groupValue":"163942:20200708:22:251",
  "doclist":{"numFound":1,"start":2,"docs":[]
  }},
{
  "groupValue":"163943:20200708:22:251",
  "doclist":{"numFound":1,"start":2,"docs":[]
  }},
    {
  "groupValue":"164355:20200708:22:251",
  "doclist":{"numFound":1,"start":2,"docs":[]

> -Original Message-
> From: Paras Lehana [mailto:paras.leh...@indiamart.com]
> Sent: Friday, February 14, 2020 3:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr grouping with offset
> 
> It would be better if you give us an example.
> 
> On Fri, 14 Feb 2020 at 17:20, Vadim Ivanov
>  wrote:
> 
> > Hello guys!
> > I need an advise. My task is to delete some documents in collection.
> > Del algorithm is following:
> > Group docs by field1  with sort by field2 and delete every 3 and
> > following occurrences in every group.
> > Unfortunately I didn't find easy way to do so.
> > Closest approach was to use group.offset = 2, but  result set is
> > polluted with empty groups with no documents (they have less then 3 docs
> in group).
> > May be I'm missing smth and there is way not to receive empty groups
> > in results?
> > Next approach was to use facet first with facet.mincount=3, then find
> > docs ids by every facet result  and then delete docs by id.
> > That way seems to me  too complicated for the task.
> > What's the best use case for the task?
> >
> 
> 
> --
> --
> Regards,
> 
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
> 
> 11th Floor, Tower 2, Assotech Business Cresterra, Plot No. 22, Sector 135,
> Noida, Uttar Pradesh, India 201305
> 
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *11096*
> 
> --
> *
> *
> 
>  <https://www.facebook.com/IndiaMART/videos/578196442936091/>



RE: Deleting Data from SOLR Collection.

2020-02-14 Thread Vadim Ivanov
Probably the solution is here:
https://stackoverflow.com/questions/51416042/solr-error-stream-body-is-disabled/51420987
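Since stream.body is disabled by default in recent Solr versions, the usual fix is to send the delete as a POST body to /update instead of a URL parameter. A sketch building such a request with Python's standard library — the host and collection name are taken from the question, and actually sending it assumes a running Solr:

```python
import json
import urllib.request

def build_delete_request(base_url, collection, query):
    """Build (but don't send) a delete-by-query POST for Solr's /update handler."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

req = build_delete_request("http://localhost:8983", "TakTech", "*:*")
print(req.full_url)
print(req.data.decode())
# To actually send it (requires a running Solr): urllib.request.urlopen(req)
```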

> -Original Message-
> From: Nitish Kumar [mailto:nnitishku...@firstam.com]
> Sent: Friday, February 14, 2020 10:28 AM
> To: solr-user@lucene.apache.org
> Subject: Deleting Data from SOLR Collection.
> 
> Hi ,
> I am working on SOLR upgrade from my current version  to SOLR 8.4.1
version
> and I am unable to delete indexed data from solr collection .
> I have tried this URL
> http://localhost:8983/solr/TakTech/update?stream.body=%3cdelete%3e%3cquery%3e*:*%3c/query%3e%3c/delete%3e&commit=true
> (i.e. update?stream.body=<delete><query>*:*</query></delete>&commit=true)
> and I am getting response like this
> {  "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Stream Body is disabled. See
> http://lucene.apache.org/solr/guide/requestdispatcher-in-solrconfig.html
> for help",
> "code":400}}
> 
> I wanted to know ,how can I delete data from my collection is there any
> configuration issue ?
> 
> Thanks & Regards
> N Kumar
> 
> 
> **
> 
> This message may contain confidential or proprietary information intended
> only for the use of the
> addressee(s) named above or may contain information that is legally
> privileged. If you are not the intended addressee, or the person
responsible
> for delivering it to the intended addressee, you are hereby notified that
> reading, disseminating, distributing or copying this message is strictly
> prohibited. If you have received this message by mistake, please
> immediately notify us by replying to the message and delete the original
> message and any copies immediately thereafter.
> 
> If you received this email as a commercial message and would like to opt
out
> of future commercial messages, please let us know and we will remove you
> from our distribution list.
> 
> Thank you.~
> **
> 
> FAFLD



Solr grouping with offset

2020-02-14 Thread Vadim Ivanov
Hello guys!
I need advice. My task is to delete some documents in a collection.
The delete algorithm is the following:
Group docs by field1 with a sort by field2, and delete the 3rd and following
occurrences in every group.
Unfortunately I didn't find an easy way to do so.
The closest approach was to use group.offset=2, but the result set is polluted with
empty groups with no documents (those that have fewer than 3 docs in the group).
Maybe I'm missing something and there is a way not to receive empty groups in results?
The next approach was to use facets first with facet.mincount=3, then find doc ids
from every facet result and then delete docs by id.
That way seems too complicated for the task.
What's the best approach for the task?
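The selection logic itself ("keep the 2 newest per group, delete the rest") is easy to express client-side once the grouped results are fetched; a sketch of computing the ids to delete, with field names following the example in this thread:

```python
from collections import defaultdict

def ids_to_delete(docs, group_field, sort_field, keep=2):
    """Group docs by group_field, sort each group by sort_field descending,
    and return the ids of the 3rd and following docs in every group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)
    doomed = []
    for members in groups.values():
        members.sort(key=lambda d: d[sort_field], reverse=True)
        doomed.extend(d["id"] for d in members[keep:])
    return doomed

docs = [
    {"id": "a", "rr_group": "g1", "rr_updatedate": "2020-02-14"},
    {"id": "b", "rr_group": "g1", "rr_updatedate": "2020-02-12"},
    {"id": "c", "rr_group": "g1", "rr_updatedate": "2020-02-13"},
    {"id": "d", "rr_group": "g2", "rr_updatedate": "2020-02-10"},
]
print(ids_to_delete(docs, "rr_group", "rr_updatedate"))  # ['b']: oldest of g1's three
```

This avoids the empty-group problem entirely, at the cost of paging through the grouped results client-side.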


RE: Exceptions in solr log

2019-12-28 Thread Vadim Ivanov
Hi,
I'm facing the same problem with SolrCloud 7.x - 8.x.
I have TLOG-type replicas, and when I delete the leader, the log is always full
of this:
2019-12-28 14:46:56.239 ERROR (indexFetcher-45942-thread-1) [   ]
o.a.s.h.IndexFetcher No files to download for index generation: 7166
2019-12-28 14:48:03.157 ERROR (indexFetcher-45881-thread-1) [   ]
o.a.s.h.IndexFetcher No files to download for index generation: 10588
Unfortunately, from this error it's hard to tell even which exact replica, shard
and collection is in trouble.
Sometimes indexing helps - my guess is that after a commit the slave replicas
somehow understand what index generation should be retrieved from the new
leader.
Sometimes I have to restart the node.

-- 
Vadim

> -Original Message-
> From: Akreeti Agarwal [mailto:akree...@hcl.com]
> Sent: Friday, December 27, 2019 8:20 AM
> To: solr-user@lucene.apache.org
> Subject: Exceptions in solr log
> 
> Hi All,
> 
> Please help me with these exceptions and their workarounds:
> 
> 1. org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
> Cannot parse
> 2. o.a.s.h.IndexFetcher No files to download for index generation: 1394327
> 3. o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_b] (this
> one is a warning, as discussed)
> 
> I am getting these errors always in my solr logs, what can be the reason
> behind them and how should I resolve it.
> 
> 
> Thanks & Regards,
> Akreeti Agarwal
> ::DISCLAIMER::
> 
> The contents of this e-mail and any attachment(s) are confidential and
> intended for the named recipient(s) only. E-mail transmission is not
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or may contain
viruses in
> transmission. The e mail and its contents (with or without referred
errors)
> shall therefore not attach any liability on the originator or HCL or its
affiliates.
> Views or opinions, if any, presented in this email are solely those of the
> author and may not necessarily reflect the views or opinions of HCL or its
> affiliates. Any form of reproduction, dissemination, copying, disclosure,
> modification, distribution and / or publication of this message without
the
> prior written consent of authorized representative of HCL is strictly
> prohibited. If you have received this email in error please delete it and
notify
> the sender immediately. Before opening any email and/or attachments,
> please check them for viruses and other defects.
> 



RE: SQL data import handler

2019-09-09 Thread Vadim Ivanov
Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12
https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of the SQL Server JDBC driver from the Solr
installation (/solr/server/lib/ in my case).

-- 
Vadim

> -Original Message-
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind in JRE 9. I’m not
> a Java programmer so I’m only going by what I uncovered digging through the
> error logs. I am not positive this is the only error to deal with, for all I 
> know
> fixing that will just uncover something else that needs repair. There were
> solutions where you compile SOLR using Maven but this is moving out of my
> comfort zone as well as long term strategy to keep SOLR management (as well
> as other Linux systems management) out-of-the-box. There were also
> solutions to include some sort of dependency on this older library but I’m at 
> a
> loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and
> SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual
> network so it seems ok, but I try not to run very old versions of any 
> software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at
> least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu




RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-02 Thread Vadim Ivanov
The timeout causes DIH to finish with an error message. So, if I check the DIH
response to be sure that the DIH session has finished without any mistakes,
it causes some trouble :)
I haven't checked yet whether all records were successfully imported to Solr.
I supposed that after the timeout the shard does not accept records from DIH. Am I wrong?
-- 
Vadim


> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Monday, September 02, 2019 12:23 PM
> To: Vadim Ivanov; solr-user
> Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4
> 
> It seems like reasonable behavior. SolrWriter hangs while import is
> running, it holds DistributedZkUpdateProcessor, which holds
> SolrCmdDistributor, which keep client connection to shards open, which
> causes timeout.
> It might be worked around by supplying custom SolrWriter which finishes
> UpdateProcessor's chain from time to time and recreate it. However, that
> exception shouldn't cause any problem or it does?
> Also, it's worth to track as a jira, or mentioned in the ticket regarding
> adjusting DIH for Cloud.
> 
> On Mon, Sep 2, 2019 at 9:44 AM Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
> 
> > I’ve raised that timeout from 5 min to 40 min.
> >
> > It somehow mitigated the issue in my use case.
> >
> > Problem occurs when some updates goes to one shard in the beginning of
> > long DIH session, then all updates goes to other shards for more than 300
> > sec.
> >
> > And when ( after 300 sec) some updates goes again to the first shard it
> > faces closed connection problem.
> >
> >
> >
> > Either it’s right behavior or not – it’s hard to say….
> >
> >
> >
> >
> >
> > --
> >
> > Vadim
> >
> > *From:* Mikhail Khludnev [mailto:m...@apache.org]
> > *Sent:* Monday, September 02, 2019 1:31 AM
> > *To:* solr-user
> > *Cc:* vadim.iva...@spb.ntk-intourist.ru
> > *Subject:* Re: Idle Timeout while DIH indexing and implicit sharding in
> > 7.4
> >
> >
> >
> > Giving that
> >
> > org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:60)
> > at
> >
> >
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
> >
> >
> >
> >
> > JavabinLoader hangs on Stream.peek(), awaiting -1, and hit timeout. I
> > guess it's might be related with "closing sockets". I've looked through
> > CHANGES after 6.5.1, here's what might be examined for impacting this:
> >
> >
> >
> > * SOLR-10779: JavaBinCodec should use close consistently
> >
> > * SOLR-12290: Do not close any servlet streams and improve our servlet
> > stream
> >
> > Description Resource Path Location
> > * SOLR-12477: An update would return a client error(400) if it hit a
> > AlreadyClos
> >
> > * SOLR-12897: Introduce AlreadyClosedException to clean up silly close /
> > shutd
> >
> >
> >
> > Have nothing more than that so far.
> >
> >
> >
> > On Sun, Sep 1, 2019 at 8:50 PM swapna.minnaka...@copart.com <
> > swapna.minnaka...@copart.com> wrote:
> >
> > I am facing same exact issue. We never had any issue with 6.5.1 when doing
> > full index (initial bulk load)
> > After upgrading to 7.5.0, getting below exception and indexing is taking a
> > very long time
> >
> > 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_collection s:shard1
> > r:core_node3 x:c_collection_shard1_replica_n1] o.a.s.h.RequestHandlerBase
> > java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout
> > expired: 30/30 ms
> > at
> >
> >
> org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1080)
> > at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
> > at
> >
> >
> org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
> > at
> >
> >
> org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:100)
> > at
> >
> >
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
> > at
> > org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
> > at
> > org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:60)
> > at
> >
> >
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)

RE: Searches across Cores

2019-08-09 Thread Vadim Ivanov


Maybe consider having one collection with implicit sharding?
This way you can have all the advantages of SolrCloud and can control the content of
each core "manually", as well as query them independently (distrib=false)
... or some of them using shards=core1,core2 as was proposed before.
Quote from doc
" If you created the collection and defined the "implicit" router at the time 
of creation, you can additionally define a router.field parameter to use a 
field from each document to identify a shard where the document belongs. If the 
field specified is missing in the document, however, the document will be 
rejected. You could also use the _route_ parameter to name a specific shard."
-- 
Vadim
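The router.field behavior quoted above can be sketched in a few lines: the field's value names the target shard, and a document missing the field is rejected. Shard and field names here are illustrative:

```python
def route_document(doc, router_field, shard_names):
    """Mimic the implicit router: the value of router_field names the target
    shard; documents without the field are rejected."""
    shard = doc.get(router_field)
    if shard is None:
        raise ValueError(f"document {doc.get('id')!r} missing {router_field}: rejected")
    if shard not in shard_names:
        raise ValueError(f"unknown shard {shard!r}")
    return shard

shards = {"core1", "core2"}
print(route_document({"id": "1", "dest": "core1"}, "dest", shards))  # core1
```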


> -Original Message-
> From: Komal Motwani [mailto:motwani.ko...@gmail.com]
> Sent: Friday, August 09, 2019 7:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Searches across Cores
> 
> For some good reasons, SolrCloud is not an option for me.
> I need to run nested graph queries so firing parallel queries and taking
> union/intersection won't work.
> I am aware of achieving this via shards however I am looking for ways to
> achieve this via multiple cores. We already have data existing in multiple
> cores on which i need to add this feature.
> 
> Thanks,
> Komal Motwani
> 
> On Fri, Aug 9, 2019 at 8:57 PM Erick Erickson 
> wrote:
> 
> > So my question is why do you have individual cores? Why not use SolrCloud
> > and collections and have this happen automatically?
> >
> > There may be very good reasons, this is more of a sanity check….
> >
> > > On Aug 9, 2019, at 8:02 AM, Jan Høydahl  wrote:
> > >
> > > Use request param shards=core1,core2 or, if on separate machines,
> > shards=host:port/solr/core1,host:port/solr/core2
> > >
> > > Jan Høydahl
> > >
> > >> 9. aug. 2019 kl. 11:23 skrev Komal Motwani
> :
> > >>
> > >> Hi,
> > >>
> > >>
> > >>
> > >> I have a use case where I would like a query to span across Cores
> > >> (Multi-Core); all the cores involved do have same schema. I have started
> > >> using solr just recently and have been trying to find ways to achieve
> > this
> > >> but couldn’t find any solution so far (Distributed searches, shards are
> > not
> > >> what I am looking for). I remember in one of the tech talks, there was a
> > >> mention of this feature to be included in future releases. Appreciate
> > any
> > >> pointers to help me progress further.
> > >>
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Komal Motwani
> >
> >



RE: Solr join query

2019-07-29 Thread Vadim Ivanov
Erick,
I'm using a query-time join, which requires colocated collections.
I just have replicas of the dictionaries on all nodes of my cluster.

Like this:
fq={!join score=none from=id fromIndex=collection to=dictionary}*:*

-- 
Vadim


> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, July 29, 2019 3:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr join query
> 
> Vadim:
> 
> Are you using streaming or the special “cross collection” join that requires
> colocated collection?
> 
> > On Jul 29, 2019, at 4:23 AM, Vadim Ivanov  intourist.ru> wrote:
> >
> > I'm using join of multivalued field to the id field of dictionary (another
> collection).
> > It's working pretty well
> >
> > --
> > Vadim
> >
> >> -Original Message-
> >> From: Rajdeep Sahoo [mailto:rajdeepsahoo2...@gmail.com]
> >> Sent: Monday, July 22, 2019 9:19 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr join query
> >>
> >> Can we join two solr collection based on multivalued field.
> >



RE: Solr join query

2019-07-29 Thread Vadim Ivanov
I'm using a join from a multivalued field to the id field of a dictionary (another collection).
It's working pretty well.

-- 
Vadim

> -Original Message-
> From: Rajdeep Sahoo [mailto:rajdeepsahoo2...@gmail.com]
> Sent: Monday, July 22, 2019 9:19 PM
> To: solr-user@lucene.apache.org
> Subject: Solr join query
> 
> Can we join two solr collection based on multivalued field.



RE: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Vadim Ivanov


... and clean=false if you want to index just new records and keep old ones.
-- 
Vadim


> -Original Message-
> From: Jan Høydahl [mailto:jan@cominvent.com]
> Sent: Tuesday, June 25, 2019 10:48 AM
> To: solr-user
> Subject: Re: Solr 8.0.0 Customized Indexing
> 
> Adjust your SQL (located in data-config.xml) to extract just what you need
> (add a WHERE clause)
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 25. jun. 2019 kl. 07:23 skrev Anuj Bhargava :
> >
> > Customized Indexing date specific
> >
> > We have a huge database of more than 10 years. How can I index just some
> of
> > the records - say for last 30 days. One of the fields in the database is
> > *date_upload* which contains the date when the record was uploaded.
> >
> > Currently using Cron to index -
> > curl -q
> > "http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true"
> >> /dev/null 2>&1




RE: Cannot set pollInterval in SolrCloud for PULL or TLOG replica

2019-04-22 Thread Vadim Ivanov
In my use case, I don't have bulk updates. 
I just have continuously heavy updates on most cores

But maybe you can try to set updateHandler.autoCommit.maxTime through the
Config API before and after bulk updates:
https://lucene.apache.org/solr/guide/7_6/config-api.html
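For example, the Config API call could look like this. A sketch: the 15000 ms value is only illustrative, and "set-property" is the documented Config API command for overriding this property:

```python
import json

# JSON body for Solr's Config API "set-property" command, which
# overrides updateHandler.autoCommit.maxTime (value in milliseconds;
# 15000 is an illustrative choice, not a recommendation).
payload = {"set-property": {"updateHandler.autoCommit.maxTime": 15000}}
body = json.dumps(payload)
print(body)

# It would be POSTed roughly like:
#   curl http://localhost:8983/solr/<collection>/config \
#     -H 'Content-type:application/json' -d "$BODY"
```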

-- 
Vadim


> -Original Message-
> From: Dmitry Vorotilin [mailto:d.voroti...@gmail.com]
> Sent: Wednesday, April 17, 2019 7:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Cannot set pollInterval in SolrCloud for PULL or TLOG replica
> 
> It looks like `/solr//replication?command=disablepoll` doesn't work
> in cloud mode so there's no way to change settings for interval as well as
> to say replicas to stop polling.
> My own conclusion: if you have bulk updates and commit with
> openSearcher=true only at the end PULL/TLOG replicas isn't your choice, the
> only option you have is NRT which burns CPU on all machines slowing down
> all select queries.
> 
> On Wed, Apr 17, 2019 at 3:25 PM Dmitry Vorotilin 
> wrote:
> 
> > Hi Vadim, thank you seems like we both had similar questions.
> > So I think that all confirms that it's not configurable for now. That's in
> > fact a pity because it only makes sense to use PULL/TLOG replicas in order
> > to save CPU and not reindex docs on every node but current situation with
> > reopening searcher every time ruins it all at least for bulk updates. The
> > only solution I see now is to use manual replication and trigger it on
> > every node after leader optimized index and this configuration was
> > available on master-salve legacy...
> >
> > On Tue, Apr 16, 2019 at 6:30 PM Vadim Ivanov <
> > vadim.iva...@spb.ntk-intourist.ru> wrote:
> >
> >> Hi, Dmitri
> >> There was discussion here a while ago...
> >>
> >> http://lucene.472066.n3.nabble.com/Soft-commit-and-new-replica-types-td4417253.html
> >> May be it helps you somehow.
> >>
> >> --
> >> Vadim
> >>
> >>
> >> > -Original Message-
> >> > From: Dmitry Vorotilin [mailto:d.voroti...@gmail.com]
> >> > Sent: Tuesday, April 16, 2019 9:41 AM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Cannot set pollInterval in SolrCloud for PULL or TLOG replica
> >> >
> >> > Hi everyone,
> >> >
> >> > We have SolrCloud cluster with 3 zk and 3 solr nodes. It's 1 shard only
> >> and
> >> > all replicas are PULL.
> >> > We have bulk updates so like once a day we reindex all cores (no soft
> >> > commits, only hard commit every 15s), do commit with
> openSearcher=true
> >> > and
> >> > all our indexes become available for search.
> >> >
> >> > The issue is that for PULL replication when leader reindexing starts it
> >> > downloads index every
> >> > hard commit / 2 seconds (o.a.s.h.ReplicationHandler Poll scheduled at an
> >> > interval of 7000ms) then puts index into proper directory and just
> >> reopens
> >> > searcher so that we see no changes on leader because there was no
> commit
> >> > with openSearcher=true yet and that index keeps growing on PULL
> >> replicas.
> >> >
> >> > Judging by this page
> >> > <https://lucene.apache.org/solr/guide/7_7/index-replication.html#index-replication-in-solr>
> >> > there's no setting for pollInterval or when to start replication on
> >> slaves
> >> > in SolrCloud and the info is rather confusing because in cloud we still
> >> use
> >> > the same handlers which we cannot configure.
> >> >
> >> > We changed replication from NRT to PULL because we don't need
> realtime
> >> > and
> >> > burn CPU with bulk updates on every machine, but this constantly
> >> catching
> >> > up index on slaves isn't any better...
> >> >
> >> > Do you know any way to fix it?
> >>
> >>



RE: Cannot set pollInterval in SolrCloud for PULL or TLOG replica

2019-04-16 Thread Vadim Ivanov
Hi, Dmitry
There was discussion here a while ago...
http://lucene.472066.n3.nabble.com/Soft-commit-and-new-replica-types-td4417253.html
May be it helps you somehow.

-- 
Vadim


> -Original Message-
> From: Dmitry Vorotilin [mailto:d.voroti...@gmail.com]
> Sent: Tuesday, April 16, 2019 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: Cannot set pollInterval in SolrCloud for PULL or TLOG replica
> 
> Hi everyone,
> 
> We have SolrCloud cluster with 3 zk and 3 solr nodes. It's 1 shard only and
> all replicas are PULL.
> We have bulk updates so like once a day we reindex all cores (no soft
> commits, only hard commit every 15s), do commit with openSearcher=true
> and
> all our indexes become available for search.
> 
> The issue is that for PULL replication when leader reindexing starts it
> downloads index every
> hard commit / 2 seconds (o.a.s.h.ReplicationHandler Poll scheduled at an
> interval of 7000ms) then puts index into proper directory and just reopens
> searcher so that we see no changes on leader because there was no commit
> with openSearcher=true yet and that index keeps growing on PULL replicas.
> 
> Judging by this page
> <https://lucene.apache.org/solr/guide/7_7/index-replication.html#index-replication-in-solr>
> there's no setting for pollInterval or when to start replication on slaves
> in SolrCloud and the info is rather confusing because in cloud we still use
> the same handlers which we cannot configure.
> 
> We changed replication from NRT to PULL because we don't need realtime
> and
> burn CPU with bulk updates on every machine, but this constantly catching
> up index on slaves isn't any better...
> 
> Do you know any way to fix it?



RE: What's the deal with dataimporthandler overwriting indexes?

2019-02-12 Thread Vadim Ivanov
Hi!
If clean=true then the index will be replaced completely by the new import. That is
how it is supposed to work.
If you don't want to preemptively delete your index, set clean=false. And set
commit=true instead of optimize=true.
Are you sure about optimize? Do you really need it? Usually it's very costly.
So, I'd try:
dataimport?command=full-import&clean=false&commit=true
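Spelled out as a full request URL (a sketch; the host and the core name "mycore" are placeholders):

```python
from urllib.parse import urlencode

# DataImportHandler full-import request that keeps existing documents
# (clean=false) and commits at the end (commit=true).
params = {"command": "full-import", "clean": "false", "commit": "true"}
url = "http://localhost:8983/solr/mycore/dataimport?" + urlencode(params)
print(url)
```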

If nevertheless nothing imported, please check the log
-- 
Vadim



> -Original Message-
> From: Joakim Hansson [mailto:joakim.hansso...@gmail.com]
> Sent: Tuesday, February 12, 2019 12:47 PM
> To: solr-user@lucene.apache.org
> Subject: What's the deal with dataimporthandler overwriting indexes?
> 
> Hi!
> We are currently upgrading from solr 6.2 master slave setup to solr 7.6
> running solrcloud.
> I dont know if I've missed something really trivial, but everytime I start
> a full import (dataimport?command=full-import&clean=true&optimize=true)
> the
> old index gets overwritten by the new import.
> 
> In 6.2 this wasn't really a problem since I could disable replication in
> the API on the master and enable it once the import was completed.
> With 7.6 and solrcloud we use NRT-shards and replicas since those are the
> only ones that support rule-based replica placement and whenever I start a
> new import the old index is overwritten all over the solrcloud cluster.
> 
> I have tried changing to clean=false, but that makes the import finish
> without adding any docs.
> Doesn't matter if I use soft or hard commits.
> 
> I don't get the logic in this. Why would you ever want to delete an
> existing index before there is a new one in place? What is it I'm missing
> here?
> 
> Please enlighten me.



RE: unable to create new threads: out-of-memory issues

2019-02-12 Thread Vadim Ivanov
Hi!
I had the same issue and found that the actual problem was with the memory-map
limit (in spite of the error message).
To increase the limit:

On Linux, you can increase the limits by running the following command as root:
sysctl -w vm.max_map_count=262144

To set this value permanently, update the vm.max_map_count setting in 
/etc/sysctl.conf. 
To verify after rebooting, run sysctl vm.max_map_count.
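A quick way to sanity-check the current value on Linux (a sketch; 262144 is the threshold suggested above):

```python
from pathlib import Path

REQUIRED = 262144  # the value suggested above

def max_map_count(proc_file="/proc/sys/vm/max_map_count"):
    """Return the current vm.max_map_count, or None if it cannot be read."""
    try:
        return int(Path(proc_file).read_text().strip())
    except (OSError, ValueError):
        return None

current = max_map_count()
if current is not None and current < REQUIRED:
    print(f"vm.max_map_count={current} is below {REQUIRED}; raise it with sysctl")
```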

Hope, it'll help
-- 
Vadim


> -Original Message-
> From: Martin Frank Hansen (MHQ) [mailto:m...@kmd.dk]
> Sent: Tuesday, February 12, 2019 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: unable to create new threads: out-of-memory issues
> 
> Hi,
> 
> I am trying to create an index on a small Linux server running Solr-7.5.0, but
> keep running into problems.
> 
> When I try to index a file-folder of roughly 18 GB (18000 files) I get the
> following error from the server:
> 
> java.lang.OutOfMemoryError: unable to create new native thread.
> 
> From the server I can see the following limits:
> 
> User$ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals  (-i) 257568
> max locked memory (kbytes, -l) 64
> max memory size  (kbytes, -m) unlimited
> open files(-n) 1024
> pipe size   (512 bytes, -p) 8
> POSIX message queues(bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 257568
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> 
> I do not see any limits on threads only on open files.
> 
> I have added a autoCommit of a maximum of 1000 documents, but that did
> not help. How can I increase the thread limit, or is there another way of
> solving this issue? Any help is appreciated.
> 
> Best regards
> 
> Martin
> 



RE: SolrCloud recovery

2019-01-25 Thread Vadim Ivanov
 You can try to tweak solr.xml


coreLoadThreads
Specifies the number of threads that will be assigned to load cores in parallel.

https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html
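For reference, in solr.xml that setting would look something like this (a sketch; 16 is only an illustrative value, not a recommendation):

```xml
<!-- solr.xml fragment: number of cores loaded in parallel at startup -->
<solr>
  <int name="coreLoadThreads">16</int>
</solr>
```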

> 
> > -Original Message-
> > From: Hendrik Haddorp [mailto:hendrik.hadd...@gmx.net]
> > Sent: Friday, January 25, 2019 11:39 AM
> > To: solr-user@lucene.apache.org
> > Subject: SolrCloud recovery
> >
> > Hi,
> >
> > I have a SolrCloud with many collections. When I restart an instance and
> > the replicas are recovering I noticed that number replicas recovering at
> > one point is usually around 5. This results in the recovery to take
> > rather long. Is there a configuration option that controls how many
> > replicas can recover in parallel?
> >
> > thanks,
> > Hendrik



RE: SolrCloud recovery

2019-01-25 Thread Vadim Ivanov


You can try to tweak solr.xml

> -Original Message-
> From: Hendrik Haddorp [mailto:hendrik.hadd...@gmx.net]
> Sent: Friday, January 25, 2019 11:39 AM
> To: solr-user@lucene.apache.org
> Subject: SolrCloud recovery
> 
> Hi,
> 
> I have a SolrCloud with many collections. When I restart an instance and
> the replicas are recovering I noticed that number replicas recovering at
> one point is usually around 5. This results in the recovery to take
> rather long. Is there a configuration option that controls how many
> replicas can recover in parallel?
> 
> thanks,
> Hendrik



RE: join query and new searcher on joined collection

2019-01-15 Thread Vadim Ivanov
I see, thank you very much!

> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesday, January 15, 2019 6:45 PM
> To: solr-user
> Subject: Re: join query and new searcher on joined collection
> 
> It doesn't invalidate anything. It just doesn't match the join query
> from the older collection2 searcher; see
> https://github.com/apache/lucene-solr/blob/b7f99fe55a6fb6e7b38828676750b3512d6899a1/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L570
> So, after a commit on collection2, the following join at collection1 just won't hit
> the filter cache; it will be cached as a new entry, and later the old entry will
> be evicted.
> 
> On Tue, Jan 15, 2019 at 5:30 PM Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
> 
> > Thanx, Mikhail for reply
> > > collection1 has no idea about new searcher in collection2.
> > I suspected it. :)
> >
> > So, when the "join" query arrives, the searcher on collection1 has no chance to
> > use the filter cache stored before.
> > I suppose it invalidates filter cache, am I right?
> >
> > fq={!join score=none from=id fromIndex=collection2 to=field1}*:*
> >
> > > On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov <
> > > vadim.iva...@spb.ntk-intourist.ru> wrote:
> > >
> > > > Sory, I've sent unfinished message
> > > > So, query on collection1
> > > > q=*:*&fq={!join score=none from=id fromIndex=collection2 to=field1}*:*
> > > >
> > > > The question is what happened with autowarming and new searchers on
> > > > collection1 when new searcher starts on collection2?
> > > > IMHO when request with join comes it's impossible to use caches on
> > > > collection1 and ...
> > > > Does new searcher starts on collection1 as well?
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> > > > > Sent: Tuesday, January 15, 2019 1:00 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: join query and new searcher on joined collection
> > > > >
> > > > > Solr 6.3
> > > > >
> > > > >
> > > > >
> > > > > I have a query like this:
> > > > >
> > > > > q=*:*&fq={!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}&qq=*:*
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Vadim
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
> >
> 
> --
> Sincerely yours
> Mikhail Khludnev



RE: join query and new searcher on joined collection

2019-01-15 Thread Vadim Ivanov
Thanx, Mikhail for reply
> collection1 has no idea about new searcher in collection2.
I suspected it. :) 

So, when a "join" query arrives, the searcher on collection1 has no chance to use
the filter cache stored before.
I suppose it invalidates the filter cache, am I right?

fq={!join score=none from=id fromIndex=collection2 to=field1}*:*
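One way to picture what happens here (a toy model, not Solr code): the cached join entry is tied to the fromIndex searcher, so after a commit on collection2 the next join query simply misses the cache and a fresh entry is added, while the stale one is evicted later:

```python
# Toy model of a filter cache whose join entries are keyed by the
# "from" collection's searcher version; a new searcher there means
# old entries can no longer be matched.
cache = {}

def join_filter(to_collection, from_searcher_version):
    key = ("join", to_collection, from_searcher_version)
    if key not in cache:
        cache[key] = f"docset@{from_searcher_version}"  # pretend computation
    return cache[key]

join_filter("collection1", 1)
join_filter("collection1", 1)  # same searcher on collection2: cache hit
join_filter("collection1", 2)  # commit on collection2: miss, new entry
print(len(cache))  # two entries; the stale one awaits eviction
```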
 
> On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
> 
> > Sorry, I've sent an unfinished message
> > So, query on collection1
> > q=*:*&fq={!join score=none from=id fromIndex=collection2 to=field1}*:*
> >
> > The question is what happened with autowarming and new searchers on
> > collection1 when new searcher starts on collection2?
> > IMHO when request with join comes it's impossible to use caches on
> > collection1 and ...
> > Does new searcher starts on collection1 as well?
> >
> >
> > > -Original Message-
> > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> > > Sent: Tuesday, January 15, 2019 1:00 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: join query and new searcher on joined collection
> > >
> > > Solr 6.3
> > >
> > >
> > >
> > > I have a query like this:
> > >
> > > q=*:*&fq={!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}&qq=*:*
> > >
> > >
> > >
> > > --
> > >
> > > Vadim
> > >
> > >
> >
> >
> >
> 
> --
> Sincerely yours
> Mikhail Khludnev



RE: join query and new searcher on joined collection

2019-01-15 Thread Vadim Ivanov
Sorry, I've sent an unfinished message.
So, the query on collection1 is:
q=*:*&fq={!join score=none from=id fromIndex=collection2 to=field1}*:*

The question is: what happens with autowarming and new searchers on
collection1 when a new searcher starts on collection2?
IMHO, when a request with a join comes, it's impossible to use the caches on
collection1 and ...
Does a new searcher start on collection1 as well?


> -Original Message-
> From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Tuesday, January 15, 2019 1:00 PM
> To: solr-user@lucene.apache.org
> Subject: join query and new searcher on joined collection
> 
> Solr 6.3
> 
> 
> 
> I have a query like this:
> 
> q=*:*&fq={!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}&qq=*:*
> 
> 
> 
> --
> 
> Vadim
> 
> 




join query and new searcher on joined collection

2019-01-15 Thread Vadim Ivanov
Solr 6.3

 

I have a query like this:

q=*:*&fq={!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}&qq=*:*

 

-- 

Vadim

 



RE: Solr Replication

2019-01-07 Thread Vadim Ivanov
When using CDCR with the new replica types, be aware of
https://issues.apache.org/jira/browse/SOLR-12057

Parallel indexing to both clusters might be an option as well.
-- 
Vadim


> -Original Message-
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Monday, January 07, 2019 11:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication
> 
> In SolrCloud there are Data Centers.
> Your Cluster 1 is DataCenter 1 and your Cluster 2 is Data Center 2.
> You can then use CDCR (Cross Data Center Replication).
> http://lucene.apache.org/solr/guide/7_0/cross-data-center-replication-cdcr.html
> 
> Nevertheless I would spend your Cluster 2 another 2 zookeeper instances.
> 
> Regards, Bernd
> 
> Am 07.01.19 um 06:39 schrieb Mannar mannan:
> > Hi All,
> >
> > I would like to configure master slave between two solr cloud clusters (for
> > failover). Below is the scenario
> >
> > Solr version : 7.0
> >
> > Cluster 1:
> > 3 zookeeper instances :   zk1, zk2, zk3
> > 2 solr instances : solr1, solr2
> >
> > Cluster 2:
> > 1 zookeeper instance : bkpzk1,
> > 1 solr instances : bkpsolr1, bkpsolr2
> >
> > Master / Slave :  solr1 / bkpsolr1
> >solr2 / bkpsolr2
> >
> > Is it possible to have master / slave replication configured for solr
> > instances running in cluster1 & cluster2 (for failover). Kindly let me know
> > the possibility.
> >



Re: Solr reload process flow

2018-12-27 Thread Vadim Ivanov
Hi!
(Solr 7.6, TLOG replicas)
I have an issue while reloading a collection with 100 shards and 3 replicas per
shard residing on 5 nodes.
The configuration of that collection is pretty complex (90 external file fields).
When a node starts, the cores always load successfully.

When I reload the collection with the Collections API command:
/admin/collections?action=RELOAD&name=col
all 5 nodes stop responding and I have a dead cluster. Only restarting Solr on
all nodes revives it.

When I decreased the number of shards/cores by 5 times (to 20 shards instead of
100), the collection reloaded successfully.
My guess is that during the collection RELOAD the limit on threads is not honored
and all cores try to reload simultaneously.

Erick wrote here ( 
http://lucene.472066.n3.nabble.com/collection-reload-leads-to-OutOfMemoryError-td4380754.html#a4380791
 )
➢ There are a limited number of threads that load in parallel when 
➢ starting up, depends on the configuration. The defaults are 3 threads 
➢ in stand-alone and 8 in Cloud (see: NodeConfig.java) 
➢
➢ public static final int DEFAULT_CORE_LOAD_THREADS = 3; 
➢ public static final int DEFAULT_CORE_LOAD_THREADS_IN_CLOUD = 8; 

But unfortunately, stumbling through the source, I can't find the place and
confirm whether this thread limit plays any role in collection reload or not...
though I lack the necessary skills in Java.
Maybe somebody can give a hint where to look?

There was discussion here as well
http://lucene.472066.n3.nabble.com/Solr-reload-process-flow-td4379966.html#none
-- 
Vadim




RE: Not able to see newly added field in query results

2018-12-25 Thread Vadim Ivanov
In order to see newly added fields you have to reindex.

If there were any mistakes while reindexing they should appear in the log
file.

No clues in the log?

 

-- 

 

From: Surender Reddy [mailto:suren...@swooptalent.com] 
Sent: Tuesday, December 25, 2018 8:15 AM
To: solr-user@lucene.apache.org
> Subject: Not able to see newly added field in query results

 

Hi Experts,

  I am not able to see the newly added field (allJobReqIds) in search
results.

  From the Solr Admin UI, I don't see any indexed records count.

   Address is the one for which I am seeing data, and allJobReqIds is the one
which is missing.

  Please help.

 



 



 

Attached schema.xml also.

 

Thanks,

  Surender.



RE: No Leader after nodes restart

2018-12-23 Thread Vadim Ivanov
Unfortunately, neither of the two works well for me.
1. Restarting all nodes leads to the described situation on some shards.
Even if there is no alternative for the shard, it does not gain a leader
(all other replicas are on down nodes). I suppose it waits for some timeout.
But what timeout, and can it be altered?
2. FORCELEADER simply does not work,
as well as RELOAD and REBALANCELEADERS.

DELETEREPLICA until no alternatives remain, then ADDREPLICA - that trick works.
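That workaround boils down to two Collections API calls; a sketch of building them (the collection, shard, replica and node names are hypothetical placeholders):

```python
from urllib.parse import urlencode

BASE = "http://solr00:8983/solr/admin/collections"  # placeholder node

def collections_api(action, **params):
    """Build a Collections API request URL."""
    return BASE + "?" + urlencode({"action": action, **params})

# 1. Delete the stuck replica so a live one can take over leadership...
delete_url = collections_api("DELETEREPLICA", collection="col1",
                             shard="shard1", replica="core_node3")
# 2. ...then add a replacement replica back.
add_url = collections_api("ADDREPLICA", collection="col1",
                          shard="shard1", node="solr00:8983_solr")
print(delete_url)
print(add_url)
```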

> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, December 24, 2018 4:09 AM
> To: solr-user
> Subject: Re: Nol Leader after nodes restart
> 
> There are a couple of options:
> 
> 1> stop all your nodes. Start them one at a time and wait for "leader
> election" to occur. This can take several minutes, but eventually the
> replicas on that machine will become the leader. Then start the other
> nodes, again one at a time waiting for them to recover fully before
> starting the next node.
> 
> 2> you can try the FORCELEADER collecrions API option..
> 
> The leater election and retry logic has been vastly improved in 7.3+
> (with some of the last improvements in 7.5).
> 
> Best,
> Erick
> 
> On Sun, Dec 23, 2018 at 1:43 AM Vadim Ivanov
>  wrote:
> >
> > Hi!
> > After restart of  nodes I have situation when no leader on shard can be
> > elected
> > Shard rpk51_222_306 resides on 3 nodes (solr00, solr06, solr09) with
> > corresponding replica names
> > (rpk51_222_306_00, rpk51_222_306_06, rpk51_222_306_09)
> > Logs looks like this
> > PeerSync: core=rpk51_222_306_00 url=http://solr00:8983/solr Requested 26
> > updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
> > PeerSync: core=rpk51_222_306_06 url=http://solr06:8983/solr Requested 29
> > updates from http://solr00:8983/solr/rpk51_222_306_00/ but retrieved 24
> > PeerSync: core=rpk51_222_306_09 url=http://solr09:8983/solr Requested 26
> > updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
> >
> > 00 and 09 tries to recover from 06 and fail
> > 06 tries to recover from 00 and fail
> >
> > It goes continuously every minute and forever
> >
> > How to break this deadlock loop?
> > --
> > Vadim
> >
> >



No Leader after nodes restart

2018-12-23 Thread Vadim Ivanov
Hi!
After a restart of the nodes I have a situation where no leader on a shard can be
elected.
Shard rpk51_222_306 resides on 3 nodes (solr00, solr06, solr09) with
corresponding replica names 
(rpk51_222_306_00, rpk51_222_306_06, rpk51_222_306_09)
The logs look like this:
PeerSync: core=rpk51_222_306_00 url=http://solr00:8983/solr Requested 26
updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
PeerSync: core=rpk51_222_306_06 url=http://solr06:8983/solr Requested 29
updates from http://solr00:8983/solr/rpk51_222_306_00/ but retrieved 24
PeerSync: core=rpk51_222_306_09 url=http://solr09:8983/solr Requested 26
updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25

00 and 09 try to recover from 06 and fail;
06 tries to recover from 00 and fails.

It goes on continuously, every minute, forever.

How to break this deadlock loop?
-- 
Vadim




RE: REBALANCELEADERS is not reliable

2018-12-20 Thread Vadim Ivanov
Yes! It works!
I have tested RebalanceLeaders today with the patch provided by Endika Posadas. 
(http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html)
And at last it works as expected on my collection with 5 nodes and about 400 
shards.
The original patch was slightly incompatible with 7.6.0.
I hope this patch will help to try this feature with 7.6
https://drive.google.com/file/d/19z_MPjxItGyghTjXr6zTCVsiSJg1tN20

RebalanceLeaders was not a very useful feature before 7.0 (as all replicas were
NRT), but the new replica types made it very helpful for keeping big clusters in
order...

I wonder why there is no Jira about this case (or maybe I missed it)?
Anyone who cares, please help to create a Jira and improve this feature in the
nearest release.
-- 
Vadim

> -Original Message-
> From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Friday, December 07, 2018 6:13 PM
> To: solr-user@lucene.apache.org
> Subject: RE: REBALANCELEADERS is not reliable
> 
> I'm waiting for 7.6 or 7.5.1 and plan to apply patch from  Endika Posadas to 
> it.
> Then test again and hope it'll help
> --
> Vadim
> 
> 
> > -Original Message-
> > From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> > Sent: Friday, December 07, 2018 12:01 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: REBALANCELEADERS is not reliable
> >
> > Thanks for looking this up.
> > It could be a hint where to jump into the code.
> > I wonder why they rejected a jira ticket about this problem?
> >
> > Regards, Bernd
> >
> > Am 06.12.18 um 16:31 schrieb Vadim Ivanov:
> > > Is solr-dev forum I came across this post
> > > http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
> > > May be it will shed some light?
> > >
> > >
> > >> -Original Message-
> > >> From: Atita Arora [mailto:atitaar...@gmail.com]
> > >> Sent: Thursday, November 29, 2018 11:03 PM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: REBALANCELEADERS is not reliable
> > >>
> > >> Indeed, I tried that on 7.4 & 7.5 too, indeed did not work for me as 
> > >> well,
> > >> even with the preferredLeader property as recommended in the
> > >> documentation.
> > >> I handled it with a little hack but certainly this dint work as expected.
> > >> I can provide more details if there's a ticket.
> > >>
> > >> On Thu, Nov 29, 2018 at 8:42 PM Aman Tandon
> > >>  wrote:
> > >>
> > >>> ++ correction
> > >>>
> > >>> On Fri, Nov 30, 2018, 01:10 Aman Tandon  > >> wrote:
> > >>>
> > >>>> For me today, I deleted the leader replica of one of the two shard
> > >>>> collection. Then other replicas of that shard wasn't getting elected 
> > >>>> for
> > >>>> leader.
> > >>>>
> > >>>> After waiting for long tried the setting addreplicaprop preferred 
> > >>>> leader
> > >>>> on one of the replica then tried FORCELEADER but no luck. Then also
> > tried
> > >>>> rebalance but no help. Finally have to recreate the whole collection.
> > >>>>
> > >>>> Not sure what was the issue but both FORCELEADER AND
> REBALANCING
> > >> didn't
> > >>>> work if there was no leader however preferred leader property was
> > setted.
> > >>>>
> > >>>> On Wed, Nov 28, 2018, 12:54 Bernd Fehling <
> > >>> bernd.fehl...@uni-bielefeld.de
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Vadim,
> > >>>>>
> > >>>>> thanks for confirming.
> > >>>>> So it seems to be a general problem with Solr 6.x, 7.x and might
> > >>>>> be still there in the most recent versions.
> > >>>>>
> > >>>>> But where to start to debug this problem, is it something not
> > >>>>> correctly stored in zookeeper or is overseer the problem?
> > >>>>>
> > >>>>> I was also reading something about a "leader queue" where possible
> > >>>>> leaders have to be requeued or something similar.
> > >>>>>
> > >>>>> May be I should try to get a situation where a "locked" core
> &

RE: Soft commit and new replica types

2018-12-13 Thread Vadim Ivanov
9KDWU
> > > >
> > > > Regards,
> > > > Edward
> > > >
> > > > On Sun, Dec 9, 2018, 16:56,  > > wrote:
> > > >
> > > > >
> > > > >  If hard commit max time is 300 sec then commit happens every 300
> sec
> > > on
> > > > > tlog leader. And new segments pop up on the leader every 300 sec,
> > > during
> > > > > indexing. Polling interval on other replicas 150 sec, but not every
> > > poll
> > > > > attempt they fetch new segment from the leader, afaiu. Erick, do you
> > > mean
> > > > > that on all other  tlog replicas(not leaders) commit occurs every
> > poll?
> > > > > воскресенье, 09 декабря 2018г., 19:21 +03:00 от Erick Erickson
> > > > > erickerick...@gmail.com :
> > > > >
> > > > > >Not quite, 60. The polling interval is half the commit
> > > interval
> > > > > >
> > > > > >This has always bothered me a little bit, I wonder at the utility
> > of a
> > > > > >config param. We already have old-style replication with a
> > > > > >configurable polling interval. Under very heavy indexing loads, it
> > > > > >seems to me that either the tlogs will grow quite large or we'll be
> > > > > >pulling a lot of unnecessary segments across the wire, segments
> > > > > >that'll soon be merged away and the merged segment re-pulled.
> > > > > >
> > > > > >Apparently, though, nobody's seen this "in the wild", so it's
> > > > > >theoretical at this point.
> > > > > >On Sun, Dec 9, 2018 at 1:48 AM Vadim Ivanov
> > > > > < vadim.iva...@spb.ntk-intourist.ru> wrote:
> > > > > >
> > > > > > Thanks, Edward, for clues.
> > > > > > What bothers me is newSearcher start, warming, cache clear... all
> > > that
> > > > > CPU consuming stuff in my heavy-indexing scenario.
> > > > > > With NRT I had autoSoftCommit:  30 .
> > > > > > So I had new Searcher no more than  every 5 min on every replica.
> > > > > > To have more or less  the same effect with TLOG - PULL collection,
> > > > > > I suppose, I have to have  :  30
> > > > > > (yes, I understand that newSearchers start asynchronously on leader
> > > and
> > > > > replicas)
> > > > > > Am I right?
> > > > > > --
> > > > > > Vadim
> > > > > >
> > > > > >
> > > > > >> -Original Message-
> > > > > >> From: Edward Ribeiro [mailto:edward.ribe...@gmail.com]
> > > > > >> Sent: Sunday, December 09, 2018 12:42 AM
> > > > > >> To:  solr-user@lucene.apache.org
> > > > > >> Subject: Re: Soft commit and new replica types
> > > > > >>
> > > > > >> Some insights in the new replica types below:
> > > > > >>
> > > > > >> On Sat, December 8, 2018 08:42, Vadim Ivanov <
> > > > > >> vadim.iva...@spb.ntk-intourist.ru wrote:
> > > > > >>
> > > > > >>>
> > > > > >>> From Ref guide we have:
> > > > > >>> " NRT is the only type of replica that supports soft-commits..."
> > > > > >>> "If TLOG replica does become a leader, it will behave the same as
> > > if it
> > > > > >>> was a NRT type of replica."
> > > > > >>> Does it mean, that if we do not have NRT replicas in the cluster
> > > then
> > > > > >>> autoSoftCommit section in solconfig.xml Ignored completely (even
> > on
> > > > > TLOG
> > > > > >>> leader)?
> > > > > >>>
> > > > > >>
> > > > > >> No, not completely. Both TLOG and PULL nodes will periodically
> > poll
> > > the
> > > > > >> leader for changes in index segments' files and download those
> > > segments
> > > > > >> from the leader. If hard commit max time is defined in
> > > solrconfig.xml
> > > > > the
> > > > > >> polling interval of each replica will be half that value. Or else
> > > if the
> > > > > >

Re: Soft commit and new replica types

2018-12-09 Thread vadim . ivanov

 If hard commit max time is 300 sec, then a commit happens every 300 sec on the tlog 
leader, and new segments pop up on the leader every 300 sec during indexing. 
The polling interval on the other replicas is 150 sec, but not on every poll attempt 
do they fetch a new segment from the leader, afaiu. Erick, do you mean that on all other 
tlog replicas (not leaders) a commit occurs on every poll?

Sunday, 09 December 2018, 19:21 +03:00, from Erick Erickson  erickerick...@gmail.com :

>Not quite, 60. The polling interval is half the commit interval
>
>This has always bothered me a little bit, I wonder at the utility of a
>config param. We already have old-style replication with a
>configurable polling interval. Under very heavy indexing loads, it
>seems to me that either the tlogs will grow quite large or we'll be
>pulling a lot of unnecessary segments across the wire, segments
>that'll soon be merged away and the merged segment re-pulled.
>
>Apparently, though, nobody's seen this "in the wild", so it's
>theoretical at this point.
>On Sun, Dec 9, 2018 at 1:48 AM Vadim Ivanov
< vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> Thanks, Edward, for clues.
> What bothers me is newSearcher start, warming, cache clear... all that CPU 
> consuming stuff in my heavy-indexing scenario.
> With NRT I had autoSoftCommit:  30 .
> So I had new Searcher no more than  every 5 min on every replica.
> To have more or less  the same effect with TLOG - PULL collection,
> I suppose, I have to have  :  30
> (yes, I understand that newSearchers start asynchronously on leader and 
> replicas)
> Am I right?
> --
> Vadim
>
>
>> -Original Message-
>> From: Edward Ribeiro [mailto:edward.ribe...@gmail.com]
>> Sent: Sunday, December 09, 2018 12:42 AM
>> To:  solr-user@lucene.apache.org
>> Subject: Re: Soft commit and new replica types
>>
>> Some insights in the new replica types below:
>>
>> On Sat, December 8, 2018 08:42, Vadim Ivanov <
>> vadim.iva...@spb.ntk-intourist.ru wrote:
>>
>>>
>>> From Ref guide we have:
>>> " NRT is the only type of replica that supports soft-commits..."
>>> "If TLOG replica does become a leader, it will behave the same as if it
>>> was a NRT type of replica."
>>> Does it mean, that if we do not have NRT replicas in the cluster then
>>> autoSoftCommit section in solconfig.xml Ignored completely (even on TLOG
>>> leader)?
>>>
>>
>> No, not completely. Both TLOG and PULL nodes will periodically poll the
>> leader for changes in index segments' files and download those segments
>> from the leader. If hard commit max time is defined in solrconfig.xml the
>> polling interval of each replica will be half that value. Or else if the
>> soft commit max time is defined then the replicas will use half the soft
>> commit max time as the interval. If neither are defined then the poll
>> interval will be 3 seconds (hard coded). See here:
> > > > > >> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java#L68-L77
>>
>> If the TLOG is the leader it will index locally and append the doc to
>> transaction log as a NRT node would do as well as it will synchronously
>> replicate the data to other TLOG replicas' transaction logs (PULL nodes
>> don't have transaction logs). But TLOG/PULL replicas doesn't support soft
>> commits nor real time gets, afaik.
>>
>>>
>>
>>>
>>> 6
>>>
>>>
>>> Should we say that in autoCommit section openSearcher is always true in
>>> that case?
>>
>>
>>
>> 1
>> 3
>> 512m
>> false
>>
>>
>> Does it mean that new Searcher always starts on all replicas when hard
>> commit happens on leader?
>>
>>
>> Nope. Or at least, the searcher is not synchronously created. Each non
>> leader replica will periodically fetch the index changes from the leader
>> and open a new searcher to reflect those changes as seen here:
>> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L653
>> But it's important to note that the potential delay between the leader's
>> hard commit and the other replicas fetching those changes from the leader
>> and opening a new searcher to reflect latest changes.
>>
>> PS: I am still digging these new replica types so I can have misunderstood
>> or missed some aspect of it.
>>
>> Regards,
>> Edward
>


RE: Soft commit and new replica types

2018-12-09 Thread Vadim Ivanov
Thanks, Edward, for clues.
What bothers me is newSearcher start, warming, cache clear... all that CPU 
consuming stuff in my heavy-indexing scenario.
With NRT I had autoSoftCommit:   30. 
So I had new Searcher no more than  every 5 min on every replica.
To have more or less  the same effect with TLOG - PULL collection, 
I suppose, I have to have  :   30
(yes, I understand that newSearchers start asynchronously on leader and 
replicas)
Am I right?
-- 
Vadim


> -Original Message-
> From: Edward Ribeiro [mailto:edward.ribe...@gmail.com]
> Sent: Sunday, December 09, 2018 12:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Soft commit and new replica types
> 
> Some insights in the new replica types below:
> 
> On Sat, December 8, 2018 08:42, Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru wrote:
> 
> >
> > From Ref guide we have:
> > " NRT is the only type of replica that supports soft-commits..."
> > "If TLOG replica does become a leader, it will behave the same as if it
> > was a NRT type of replica."
> > Does it mean, that if we do not have NRT replicas in the cluster then
> > autoSoftCommit section in solconfig.xml Ignored completely (even on TLOG
> > leader)?
> >
> 
> No, not completely. Both TLOG and PULL nodes will periodically poll the
> leader for changes in index segments' files and download those segments
> from the leader. If hard commit max time is defined in solrconfig.xml the
> polling interval of each replica will be half that value. Or else if the
> soft commit max time is defined then the replicas will use half the soft
> commit max time as the interval. If neither are defined then the poll
> interval will be 3 seconds (hard coded). See here:
> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java#L68-L77
> 
> If the TLOG is the leader it will index locally and append the doc to
> transaction log as a NRT node would do as well as it will synchronously
> replicate the data to other TLOG replicas' transaction logs (PULL nodes
> don't have transaction logs). But TLOG/PULL replicas doesn't support soft
> commits nor real time gets, afaik.
> 
> >
> 
> > 
> >   6
> > 
> >
> > Should we say that in autoCommit section openSearcher is always true in
> > that case?
> 
> 
> 
>   1
>   3
>   512m
>   false
> 
> 
> Does it mean that new Searcher always starts on all replicas when hard
> commit happens on leader?
> 
> 
> Nope. Or at least, the searcher is not synchronously created. Each non
> leader replica will periodically fetch the index changes from the leader
> and open a new searcher to reflect those changes as seen here:
> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L653
> But it's important to note that the potential delay between the leader's
> hard commit and the other replicas fetching those changes from the leader
> and opening a new searcher to reflect latest changes.
> 
> PS: I am still digging these new replica types so I can have misunderstood
> or missed some aspect of it.
> 
> Regards,
> Edward
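
The interval-selection rule in Edward's reply (half the hard-commit max time if set, otherwise half the soft-commit max time, otherwise a hard-coded 3000 ms) can be sketched as follows; the millisecond values in the demo call are illustrative, not taken from any particular config:

```python
def poll_interval_ms(hard_commit_ms=None, soft_commit_ms=None):
    """Poll interval a TLOG/PULL replica uses to check the leader
    (sketch of the precedence rule in ReplicateFromLeader)."""
    if hard_commit_ms is not None and hard_commit_ms > 0:
        return hard_commit_ms // 2
    if soft_commit_ms is not None and soft_commit_ms > 0:
        return soft_commit_ms // 2
    return 3000  # hard-coded fallback when neither commit time is set

# e.g. a hard commit every 300 sec means replicas poll every 150 sec
print(poll_interval_ms(hard_commit_ms=300_000))  # -> 150000
```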



Soft commit and new replica types

2018-12-08 Thread Vadim Ivanov
Before 7.x, all replicas in SolrCloud were of the NRT type,
and the following rules applied:
https://stackoverflow.com/questions/45998804/when-should-we-apply-hard-commit-and-soft-commit-in-solr
and
https://lucene.apache.org/solr/guide/7_5/updatehandlers-in-solrconfig.html#commit-and-softcommit

But the new TLOG and PULL replica types cause some mess in those 
explanations.
From the Ref guide we have:
" NRT is the only type of replica that supports soft-commits..."
"If TLOG replica does become a leader, it will behave the same as if it was a 
NRT type of replica."
Does it mean that if we do not have NRT replicas in the cluster, then the 
autoSoftCommit section in solrconfig.xml is ignored completely (even on the TLOG 
leader)?

<autoSoftCommit>
  <maxTime>6</maxTime>
</autoSoftCommit>

Should we say that in autoCommit section openSearcher is always true in that 
case?

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>3</maxTime>
  <maxSize>512m</maxSize>
  <openSearcher>false</openSearcher>
</autoCommit>

Does it mean that a new Searcher always starts on all replicas when a hard commit 
happens on the leader?
Some words in the Ref Guide about the new replica types in section 
#commit-and-softcommit would be useful.
-- 
Vadim



RE: REBALANCELEADERS is not reliable

2018-12-07 Thread Vadim Ivanov
I'm waiting for 7.6 or 7.5.1 and plan to apply the patch from Endika Posadas to it.
Then I'll test again and hope it'll help.
-- 
Vadim


> -Original Message-
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Friday, December 07, 2018 12:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: REBALANCELEADERS is not reliable
> 
> Thanks for looking this up.
> It could be a hint where to jump into the code.
> I wonder why they rejected a jira ticket about this problem?
> 
> Regards, Bernd
> 
> On 06.12.18 at 16:31, Vadim Ivanov wrote:
> > In the solr-dev forum I came across this post
> > http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
> > May be it will shed some light?
> >
> >
> >> -Original Message-
> >> From: Atita Arora [mailto:atitaar...@gmail.com]
> >> Sent: Thursday, November 29, 2018 11:03 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: REBALANCELEADERS is not reliable
> >>
> >> Indeed, I tried that on 7.4 & 7.5 too, indeed did not work for me as well,
> >> even with the preferredLeader property as recommended in the
> >> documentation.
> >> I handled it with a little hack but certainly this didn't work as expected.
> >> I can provide more details if there's a ticket.
> >>
> >> On Thu, Nov 29, 2018 at 8:42 PM Aman Tandon
> >>  wrote:
> >>
> >>> ++ correction
> >>>
> >>> On Fri, Nov 30, 2018, 01:10 Aman Tandon  >> wrote:
> >>>
> >>>> For me today, I deleted the leader replica of one of the two shard
> >>>> collection. Then other replicas of that shard wasn't getting elected for
> >>>> leader.
> >>>>
> >>>> After waiting for long tried the setting addreplicaprop preferred leader
> >>>> on one of the replica then tried FORCELEADER but no luck. Then also
> tried
> >>>> rebalance but no help. Finally have to recreate the whole collection.
> >>>>
> >>>> Not sure what was the issue but both FORCELEADER AND REBALANCING
> >> didn't
> >>>> work if there was no leader however preferred leader property was
> setted.
> >>>>
> >>>> On Wed, Nov 28, 2018, 12:54 Bernd Fehling <
> >>> bernd.fehl...@uni-bielefeld.de
> >>>> wrote:
> >>>>
> >>>>> Hi Vadim,
> >>>>>
> >>>>> thanks for confirming.
> >>>>> So it seems to be a general problem with Solr 6.x, 7.x and might
> >>>>> be still there in the most recent versions.
> >>>>>
> >>>>> But where to start to debug this problem, is it something not
> >>>>> correctly stored in zookeeper or is overseer the problem?
> >>>>>
> >>>>> I was also reading something about a "leader queue" where possible
> >>>>> leaders have to be requeued or something similar.
> >>>>>
> >>>>> May be I should try to get a situation where a "locked" core
> >>>>> is on the overseer and then connect the debugger to it and step
> >>>>> through it.
> >>>>> Peeking and poking around, like old Commodore 64 days :-)
> >>>>>
> >>>>> Regards, Bernd
> >>>>>
> >>>>>
> >>>>>> On 27.11.18 at 15:47, Vadim Ivanov wrote:
> >>>>>> Hi, Bernd
> >>>>>> I have tried REBALANCELEADERS with Solr 6.3 and 7.5
> >>>>>> I had very similar results and notion that it's not reliable :(
> >>>>>> --
> >>>>>> Br, Vadim
> >>>>>>
> >>>>>>> -Original Message-
> >>>>>>> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> >>>>>>> Sent: Tuesday, November 27, 2018 5:13 PM
> >>>>>>> To: solr-user@lucene.apache.org
> >>>>>>> Subject: REBALANCELEADERS is not reliable
> >>>>>>>
> >>>>>>> Hi list,
> >>>>>>>
> >>>>>>> unfortunately REBALANCELEADERS is not reliable and the leader
> >>>>>>> election has unpredictable results with SolrCloud 6.6.5 and
> >>>>>>> Zookeeper 3.4.10.
> >>>>>>> Seen with 5 shards / 3 replicas.
> >>>>>>>
> >>>>>>> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active.
> >>>>>>> - setting with ADDREPLICAPROP the property preferredLeader to
> other
> >>>>> replicas
> >>>>>>> - calling REBALANCELEADERS
> >>>>>>> - some leaders have changed, some not.
> >>>>>>>
> >>>>>>> I then tried:
> >>>>>>> - removing all preferredLeader properties from replicas which
> >>>>> succeeded.
> >>>>>>> - trying again REBALANCELEADERS for the rest. No success.
> >>>>>>> - Shutting down nodes to force the leader to a specific replica left
> >>>>> running.
> >>>>>>>No success.
> >>>>>>> - calling REBALANCELEADERS responds that the replica is inactive!!!
> >>>>>>> - calling CLUSTERSTATUS reports that the replica is active!!!
> >>>>>>>
> >>>>>>> Also, the replica which don't want to become leader is not in the
> >>> list
> >>>>>>> of collections->[collection_name]->leader_elect->shard1..x->election
> >>>>>>>
> >>>>>>> Where is CLUSTERSTATUS getting it's state info from?
> >>>>>>>
> >>>>>>> Has anyone else problems with REBALANCELEADERS?
> >>>>>>>
> >>>>>>> I noticed that the Reference Guide writes "preferredLeader" (with
> >>>>> capital "L")
> >>>>>>> but the JAVA code has "preferredleader".
> >>>>>>>
> >>>>>>> Regards, Bernd
> >>>>>>
> >>>>>
> >>>>
> >>>
> >



RE: REBALANCELEADERS is not reliable

2018-12-06 Thread Vadim Ivanov
In the solr-dev forum I came across this post:
http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
Maybe it will shed some light?

-- 
Vadim

> -Original Message-
> From: Atita Arora [mailto:atitaar...@gmail.com]
> Sent: Thursday, November 29, 2018 11:03 PM
> To: solr-user@lucene.apache.org
> Subject: Re: REBALANCELEADERS is not reliable
> 
> Indeed, I tried that on 7.4 & 7.5 too, indeed did not work for me as well,
> even with the preferredLeader property as recommended in the
> documentation.
> I handled it with a little hack but certainly this didn't work as expected.
> I can provide more details if there's a ticket.
> 
> On Thu, Nov 29, 2018 at 8:42 PM Aman Tandon
>  wrote:
> 
> > ++ correction
> >
> > On Fri, Nov 30, 2018, 01:10 Aman Tandon  wrote:
> >
> > > For me today, I deleted the leader replica of one shard of a two-shard
> > > collection. Then the other replicas of that shard weren't getting elected
> > > leader.
> > >
> > > After waiting long, I tried setting the addreplicaprop preferredLeader
> > > on one of the replicas, then tried FORCELEADER, but no luck. Then I also
> > > tried rebalance, but no help. Finally I had to recreate the whole collection.
> > >
> > > Not sure what the issue was, but both FORCELEADER and REBALANCING
> > > didn't work when there was no leader, even though the preferredLeader
> > > property was set.
> > >
> > > On Wed, Nov 28, 2018, 12:54 Bernd Fehling <
> > bernd.fehl...@uni-bielefeld.de
> > > wrote:
> > >
> > >> Hi Vadim,
> > >>
> > >> thanks for confirming.
> > >> So it seems to be a general problem with Solr 6.x, 7.x and might
> > >> be still there in the most recent versions.
> > >>
> > >> But where to start to debug this problem, is it something not
> > >> correctly stored in zookeeper or is overseer the problem?
> > >>
> > >> I was also reading something about a "leader queue" where possible
> > >> leaders have to be requeued or something similar.
> > >>
> > >> May be I should try to get a situation where a "locked" core
> > >> is on the overseer and then connect the debugger to it and step
> > >> through it.
> > >> Peeking and poking around, like old Commodore 64 days :-)
> > >>
> > >> Regards, Bernd
> > >>
> > >>
> > >> On 27.11.18 at 15:47, Vadim Ivanov wrote:
> > >> > Hi, Bernd
> > >> > I have tried REBALANCELEADERS with Solr 6.3 and 7.5
> > >> > I had very similar results and notion that it's not reliable :(
> > >> > --
> > >> > Br, Vadim
> > >> >
> > >> >> -Original Message-
> > >> >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> > >> >> Sent: Tuesday, November 27, 2018 5:13 PM
> > >> >> To: solr-user@lucene.apache.org
> > >> >> Subject: REBALANCELEADERS is not reliable
> > >> >>
> > >> >> Hi list,
> > >> >>
> > >> >> unfortunately REBALANCELEADERS is not reliable and the leader
> > >> >> election has unpredictable results with SolrCloud 6.6.5 and
> > >> >> Zookeeper 3.4.10.
> > >> >> Seen with 5 shards / 3 replicas.
> > >> >>
> > >> >> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active.
> > >> >> - setting with ADDREPLICAPROP the property preferredLeader to other
> > >> replicas
> > >> >> - calling REBALANCELEADERS
> > >> >> - some leaders have changed, some not.
> > >> >>
> > >> >> I then tried:
> > >> >> - removing all preferredLeader properties from replicas which
> > >> succeeded.
> > >> >> - trying again REBALANCELEADERS for the rest. No success.
> > >> >> - Shutting down nodes to force the leader to a specific replica left
> > >> running.
> > >> >>No success.
> > >> >> - calling REBALANCELEADERS responds that the replica is inactive!!!
> > >> >> - calling CLUSTERSTATUS reports that the replica is active!!!
> > >> >>
> > >> >> Also, the replica which don't want to become leader is not in the
> > list
> > >> >> of collections->[collection_name]->leader_elect->shard1..x->election
> > >> >>
> > >> >> Where is CLUSTERSTATUS getting it's state info from?
> > >> >>
> > >> >> Has anyone else problems with REBALANCELEADERS?
> > >> >>
> > >> >> I noticed that the Reference Guide writes "preferredLeader" (with
> > >> capital "L")
> > >> >> but the JAVA code has "preferredleader".
> > >> >>
> > >> >> Regards, Bernd
> > >> >
> > >>
> > >
> >



RE: REBALANCELEADERS is not reliable

2018-11-27 Thread Vadim Ivanov
Hi, Bernd
I have tried REBALANCELEADERS with Solr 6.3 and 7.5.
I had very similar results, and the same notion that it's not reliable :(
-- 
Br, Vadim


> -Original Message-
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Tuesday, November 27, 2018 5:13 PM
> To: solr-user@lucene.apache.org
> Subject: REBALANCELEADERS is not reliable
> 
> Hi list,
> 
> unfortunately REBALANCELEADERS is not reliable and the leader
> election has unpredictable results with SolrCloud 6.6.5 and
> Zookeeper 3.4.10.
> Seen with 5 shards / 3 replicas.
> 
> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active.
> - setting with ADDREPLICAPROP the property preferredLeader to other replicas
> - calling REBALANCELEADERS
> - some leaders have changed, some not.
> 
> I then tried:
> - removing all preferredLeader properties from replicas which succeeded.
> - trying again REBALANCELEADERS for the rest. No success.
> - Shutting down nodes to force the leader to a specific replica left running.
>No success.
> - calling REBALANCELEADERS responds that the replica is inactive!!!
> - calling CLUSTERSTATUS reports that the replica is active!!!
> 
> Also, the replica which don't want to become leader is not in the list
> of collections->[collection_name]->leader_elect->shard1..x->election
> 
> Where is CLUSTERSTATUS getting it's state info from?
> 
> Has anyone else problems with REBALANCELEADERS?
> 
> I noticed that the Reference Guide writes "preferredLeader" (with capital "L")
> but the JAVA code has "preferredleader".
> 
> Regards, Bernd
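
The workflow Bernd describes maps onto two Collections API calls. A sketch that only constructs the request URLs, without sending them (host, port, collection, shard, and replica names here are hypothetical):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"  # assumed host/port

def addreplicaprop_url(collection, shard, replica):
    """URL to mark one replica as preferred leader.
    The Ref Guide spells the property "preferredLeader";
    the Java code uses "preferredleader", as noted in this thread."""
    params = {
        "action": "ADDREPLICAPROP",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "property": "preferredLeader",
        "property.value": "true",
    }
    return BASE + "?" + urlencode(params)

def rebalanceleaders_url(collection):
    """URL asking Solr to move leadership to the preferred leaders."""
    return BASE + "?" + urlencode({"action": "REBALANCELEADERS",
                                   "collection": collection})

print(addreplicaprop_url("mycoll", "shard1", "core_node3"))
print(rebalanceleaders_url("mycoll"))
```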



RE: Issue Searching Data from multiple Databases

2018-11-14 Thread Vadim Ivanov
Hi!
Have you tried naming the entity in the full data-import HTTP call, as in
/dataimport?command=full-import&entity=Document1&clean=true&commit=true
Is there anything sane in the log file after that command?

-- 
Vadim


> -Original Message-
> From: Santosh Kumar S [mailto:santoshkumar.saripa...@infinite.com]
> Sent: Wednesday, November 14, 2018 5:03 PM
> To: solr-user@lucene.apache.org
> Subject: Issue Searching Data from multiple Databases
> 
> I am trying to achieve search by connecting to multiple databases (in my
> case, 2 different DBs) to index data from multiple DB tables.
> I have tried the approach below to achieve my goal, but in vain:
> I am able to get data only from DB 1 when I perform a full-import.
> Steps performed :
> 1.  Added multiple data source in the data-config.xml file as shown below
:
> 
>  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB1;" user="TestUser"
> password="TestUser$"/>
>  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> url="jdbc:sqlserver://10.10.10.10;databaseName=TestDB2;" user="TestUser"
> password="TestUser$"/>
> 
> 2. Added multiple entities against each data source added as shown below :
> 
>  transformer="RegexTransformer"
> pk="Id" query="select * from MyTestTable">
> 
> 
> 
> 
>  transformer="RegexTransformer"
> pk="EmpId" query="select * from MySampleTable" >
> 
> 
> 
> 
> 3. Added appropriate fields in the managed-schema.xml file as well
> 
>  required="true" multiValued="false" />
>  required="false" multiValued="false" />
> 
>  required="false" multiValued="false" />
>  required="false" multiValued="false" />
> 
> 4. Reloaded the collection for the changes to take effect.
> 5. Performed a full import; observed that the data did not get imported
> from DB2.
> 6. Did a search, only to find that data from DB1 is getting fetched whereas
> data from DB2 is not getting fetched at all.
> 
> Suggestions/Guidance shall be highly appreciated.
> Please let me know in case you need any further information.
> 
> Note: I tried connecting 2 different DBs on 2 different servers, and also
> 2 different DBs on the same server.
> 
> Thank you in advance!!
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



RE: Replicationhandler with TLOG replicas

2018-11-12 Thread Vadim Ivanov
Hi, Erick
I have about 1300 cores in my test environment for 159 collections.
Today I wrote a script to check all of them.
For 138 out of 1300 cores, the "generation" and "indexversion" information returned 
by mbeans and the ReplicationHandler do not match.
Most of these replicas have a gap of more than 1 in generation (for ex. 14 returned 
by mbeans, 6 returned by the RH), so it's not indexing for sure.
None of these 138 replicas is the leader of its shard.
All of these 138 replicas, when queried with distrib=false, returned absolutely 
the same documents as their leaders.
I've checked some replicas for segments - yes, they have the same segments as 
their leaders, with absolutely the same sizes in bytes.

It seems to me this issue does not affect indexing or searching... it's just 
a curious misread of some information I faced.
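
A comparison like the one described above can be sketched as below; the response dicts are minimal stand-ins for real /select JSON responses, not data from this cluster:

```python
def same_docs(leader_resp, replica_resp):
    """True when a replica queried directly (distrib=false) returns the
    same result set as its shard leader (sketch over /select-style JSON)."""
    lead, repl = leader_resp["response"], replica_resp["response"]
    return (lead["numFound"] == repl["numFound"]
            and [d["id"] for d in lead["docs"]] ==
                [d["id"] for d in repl["docs"]])

leader = {"response": {"numFound": 2, "docs": [{"id": "1"}, {"id": "2"}]}}
replica = {"response": {"numFound": 2, "docs": [{"id": "1"}, {"id": "2"}]}}
print(same_docs(leader, replica))  # -> True
```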

My autocommit is:

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
</autoSoftCommit>

-- 
BR, Vadim


> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, November 11, 2018 9:51 PM
> To: solr-user
> Subject: Re: Replicationhandler with TLOG replicas
> 
> Vadim:
> 
> The next time you see this, is it possible to check that the replicas
> showing different index versions have the same documents? Actually, it
> should be sufficient to verify that they have the same segments in
> their data/index directory, and they should match the segments on the
> leader _assuming_ you're not actively indexing and you stopped
> indexing more than the polling interval ago.
> 
> If you are actively indexing, it should be sufficient to check that
> the questionable replica's index files are changing over time, that
> would mean that replication is happening.
> 
> And what's your commit interval? The polling interval on the followers is:
> 1> 1/2 the hard commit interval if defined to be > -1. If not
> 2> 1/2 the soft commit interval if defined to be > -1. If not
> 3> 3000ms
> 
> There are two possibilities here as I see it.
> 1> this is just a reporting error, which we should still address but
> doesn't worry me much.
> 2> the TLOG/PULL replication process has some bug and the indexes are,
> indeed different
> 2a> when you reloaded the collection, it's possible that the startup
> progress kicked off a replication
>and if there's really a bug reloading just masked it.
> 
> Best,
> Erick
> On Sun, Nov 11, 2018 at 2:34 AM Vadim Ivanov
>  wrote:
> >
> > Reload collection helps !
> > After reloading collection  generation and indexversion returned by
> Replicationhandler  catch up with the leader
> >
> >
> > > -Original Message-
> > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> > > Sent: Sunday, November 11, 2018 1:09 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: Replicationhandler with TLOG replicas
> > >
> > > Thanks, Shawn
> > > I have anticipated the answer about information returned by
> > > ReplicationHandler.
> > > What baffled me is that usually on most of replicas indexversion and
> generation
> > > returned by ReplicationHandler is right and it increases with commits.
> > > But on some replicas it's not - it stops changing at some moment in the 
> > > past
> > > forever.
> > > For example, I have 5 TLOG replicas:
> > > For leader(and all good 3 replicas)
> > > http://host_n:8983/solr/core_n/replication?command=indexversion
> returnes
> > > {
> > >   "responseHeader":{
> > > "status":0,
> > > "QTime":0},
> > >   "indexversion":1541885907200,
> > >   "generation":1704}
> > >
> > > But for one replica:
> > > {
> > >   "responseHeader":{
> > > "status":0,
> > > "QTime":0},
> > >   "indexversion":1540842454653,
> > >   "generation":1216}
> > >
> > > Could it be sign of some hidden issue? Where that information stored and
> why
> > > it stops changing at some moment?
> > > No indexing is going on of that collection at the moment of request. I'm
> > > "deltaimporting" that collection ones per hour and only if needed.
> > > So usually there is only 5-10 commits per day.
> > > It's not a crucial issue for my use case as I have adequate information of
> > > indexversion
> > > and generation returned by mbeans, just curious of that strange behavior.
> > >
> > > > -O

RE: Replicationhandler with TLOG replicas

2018-11-11 Thread Vadim Ivanov
Reloading the collection helps!
After reloading the collection, the generation and indexversion returned by 
the ReplicationHandler catch up with the leader.
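
For reference, that reload is a single Collections API call; a sketch of building the URL (host, port, and collection name are assumed):

```python
from urllib.parse import urlencode

def reload_collection_url(collection, base="http://localhost:8983/solr"):
    """Collections API RELOAD request for one collection."""
    return base + "/admin/collections?" + urlencode(
        {"action": "RELOAD", "collection": collection})

print(reload_collection_url("mycoll"))
```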


> -Original Message-
> From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Sunday, November 11, 2018 1:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Replicationhandler with TLOG replicas
> 
> Thanks, Shawn
> I have anticipated the answer about information returned by
> ReplicationHandler.
> What baffled me is that usually on most of replicas indexversion and 
> generation
> returned by ReplicationHandler is right and it increases with commits.
> But on some replicas it's not - it stops changing at some moment in the past
> forever.
> For example, I have 5 TLOG replicas:
> For leader(and all good 3 replicas)
> http://host_n:8983/solr/core_n/replication?command=indexversion returnes
> {
>   "responseHeader":{
> "status":0,
> "QTime":0},
>   "indexversion":1541885907200,
>   "generation":1704}
> 
> But for one replica:
> {
>   "responseHeader":{
> "status":0,
> "QTime":0},
>   "indexversion":1540842454653,
>   "generation":1216}
> 
> Could it be sign of some hidden issue? Where that information stored and why
> it stops changing at some moment?
> No indexing is going on of that collection at the moment of request. I'm
> "deltaimporting" that collection ones per hour and only if needed.
> So usually there is only 5-10 commits per day.
> It's not a crucial issue for my use case as I have adequate information of
> indexversion
> and generation returned by mbeans, just curious of that strange behavior.
> 
> > -Original Message-
> > From: Shawn Heisey [mailto:apa...@elyograg.org]
> > Sent: Saturday, November 10, 2018 6:46 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replicationhandler with TLOG replicas
> >
> > On 11/10/2018 8:05 AM, Vadim Ivanov wrote:
> > > Seems, the latter gets some wrong information as indexversion and
> > generation
> > > is far behind then leader.
> > > But core index seems up to date and healthy.
> > > Why such things could happen on some replicas? (Most of the replicas
> > retuned
> > > the same information by both commands)
> > > Is information returned  by Replicationhandler  not applicable to 
> > > tlog/pull
> > > replicas and is not reliable ?
> >
> > SolrCloud does not use the replication handler in the same way that
> > master/slave replication does.  It "manually" initiates any replication
> > that takes place -- the replication handler is not in charge.  You
> > cannot be sure that the indexes the replication handler thinks are
> > master and slave are in fact the indexes that will be replicated next.
> > Just ignore anything that the replication handler tells you.  It may
> > have absolutely no bearing on what's happening.
> >
> > Was indexing happening when you looked, or was it entirely stopped?  If
> > indexing is ongoing, you may have seen the difference in the index
> > versions in between data being indexed on the leader and the time that
> > the replication is initiated.
> >
> > Thanks,
> > Shawn



RE: Replicationhandler with TLOG replicas

2018-11-11 Thread Vadim Ivanov
Thanks, Shawn
I have anticipated the answer about information returned by ReplicationHandler. 
What baffled me is that usually, on most replicas, the indexversion and generation 
returned by the ReplicationHandler are right and increase with commits.
But on some replicas they are not - they stop changing at some moment in the past, 
forever.
For example, I have 5 TLOG replicas.
For the leader (and all 3 good replicas), 
http://host_n:8983/solr/core_n/replication?command=indexversion returns
{
  "responseHeader":{
"status":0,
"QTime":0},
  "indexversion":1541885907200,
  "generation":1704}

But for one replica:
{
  "responseHeader":{
"status":0,
"QTime":0},
  "indexversion":1540842454653,
  "generation":1216}
 
Could it be a sign of some hidden issue? Where is that information stored, and why does 
it stop changing at some moment?
No indexing of that collection is going on at the moment of the request. I'm 
"delta-importing" that collection once per hour and only if needed,
so usually there are only 5-10 commits per day.
It's not a crucial issue for my use case, as I have adequate information on the 
indexversion and generation returned by mbeans; I'm just curious about that 
strange behavior.
 
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Saturday, November 10, 2018 6:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replicationhandler with TLOG replicas
> 
> On 11/10/2018 8:05 AM, Vadim Ivanov wrote:
> > Seems, the latter gets some wrong information as indexversion and
> generation
> > is far behind then leader.
> > But core index seems up to date and healthy.
> > Why such things could happen on some replicas? (Most of the replicas
> retuned
> > the same information by both commands)
> > Is information returned  by Replicationhandler  not applicable to tlog/pull
> > replicas and is not reliable ?
> 
> SolrCloud does not use the replication handler in the same way that
> master/slave replication does.  It "manually" initiates any replication
> that takes place -- the replication handler is not in charge.  You
> cannot be sure that the indexes the replication handler thinks are
> master and slave are in fact the indexes that will be replicated next.
> Just ignore anything that the replication handler tells you.  It may
> have absolutely no bearing on what's happening.
> 
> Was indexing happening when you looked, or was it entirely stopped?  If
> indexing is ongoing, you may have seen the difference in the index
> versions in between data being indexed on the leader and the time that
> the replication is initiated.
> 
> Thanks,
> Shawn



Replicationhandler with TLOG replicas

2018-11-10 Thread Vadim Ivanov
Hello!
I have SolrCloud 7.5 with TLOG replicas.
I have noticed that information about replication state of replicas differs
when received from 
...core/admin/mbeans?stats=true=replication=true=REPLICATION
And 
...core/replication?command=indexversion

Seems the latter gets some wrong information, as its indexversion and generation
are far behind the leader's.
But the core index seems up to date and healthy.
Why could such things happen on some replicas? (Most of the replicas returned
the same information for both commands.)
Is the information returned by ReplicationHandler not applicable to tlog/pull
replicas, and therefore not reliable?

-- 
Vadim




RE: TLOG replica stucks

2018-11-02 Thread Vadim Ivanov
It seems to me that the issue is related to:
- restarting a Solr node
- rebalancing leaders
- reloading a collection
- reloading a core (the Core Admin API is not forbidden, but seems obsolete in SolrCloud)
If nothing is changing in the cluster state, everything goes smoothly.
Maybe it can be reproduced with the same test as in the "SolrCloud Replication
Failure" branch
-- Vadim

> -Original Message-
> From: Ere Maijala [mailto:ere.maij...@helsinki.fi]
> Sent: Thursday, November 01, 2018 5:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: TLOG replica stucks
> 
> Could it be related to reloading a collection? I need to do some
> testing, but it just occurred to me that reload was done at least once
> during the period the cluster had been up.
> 
> Regards,
> Ere
> 
> Ere Maijala kirjoitti 30.10.2018 klo 12.03:
> > Hi,
> >
> > We had the same happen with PULL replicas with Solr 7.5. Solr was
> > showing that they all had correct index version, but the changes were
> > not showing. Unfortunately the solr.log size was too small to catch any
> > issues, so I've now increased and waiting for it to happen again.
> >
> > Regards,
> > Ere
> >
> > Vadim Ivanov kirjoitti 25.10.2018 klo 18.42:
> >> Thanks Erick for you attention!
> >> My comments below, but supposing that the problem resides in zookeeper
> >> I'll collect more information  from zk logs and solr logs and be back
> >> soon.
> >>
> >>> bq. I've noticed that some replicas stop receiving updates from the
> >>> leader without any visible signs from the cluster status.
> >>>
> >>> Hmm, yes, this isn't expected at all. What are you seeing that causes
> >>> you to say this? You'd have to be monitoring the log for update
> >>> messages to the replicas that aren't leaders or the like.  If anyone is
> >>> going to have a prayer of reproducing we'll need more info on exactly
> >>> what you're seeing and how you're measuring this.
> >>
> >> Meanwhile, I have log level WARN... I'l decrease  it to INFO and see. Tnx
> >>
> >>>
> >>> Have you changed any configurations in your replicas at all? We'd need
> >>> the exact steps you performed if so.
> >> Command to create replicas was like this (implicit sharding and custom
> >> CoreName ) :
> >>
> >> mysolr07:8983/solr/admin/collections?action=ADDREPLICA
> >> =rpk94
> >> =rpk94_1_0
> >> =rpk94_1_0_07
> >> =tlog
> >> =mysolr07:8983_solr
> >>
> >>>
> >>> On a quick test I didn't see this, but if it were that easy to
> >>> reproduce I'd expect it to have shown up before.
> >>
> >> Yesterday I've tried to reproduce...  trying to change leader with
> >> REBALANCELEADERS command.
> >> It ended up with no leader at all for the shard  and I could not set
> >> leader at all for a long time.
> >>
> >> There was a problem trying to register as the
> >> leader:org.apache.solr.common.SolrException: Could not register as the
> >> leader because creating the ephemeral registration node in ZooKeeper
> >> failed
> >> ...
> >> Deleting duplicate registration:
> >>
> /collections/rpk94/leader_elect/rpk94_1_117/election/298318118789952308
> 5-core_node73-n_22
> >>
> >> ...
> >>Index fetch failed :org.apache.solr.common.SolrException: No
> >> registered leader was found after waiting for 4000ms , collection:
> >> rpk94 slice: rpk94_1_117
> >> ...
> >>
> >> Even to delete all replicas for the shard and recreate Replica to the
> >> same node with the same name did not help - no leader for that shard.
> >> I had to delete collection, wait till morning and then it recreated
> >> successfully.
> >> Suppose some weird znodes were deleted from  zk by morning.
> >>
> >>>
> >>> NOTE: just looking at the cloud graph and having a node be active is
> >>> not _necessarily_ sufficient for the node to be up to date. It
> >>> _should_ be sufficient if (and only if) the node was shut down
> >>> gracefully, but a "kill -9" or similar doesn't give the replicas on
> >>> the node the opportunity to change the state. The "live_nodes" znode
> >>> in ZooKeeper must also contain the node the replica resides on.
> >>
> >> Node was live, cluster was healthy
> >>
> >>>
> >>> If you see this state again, you could try ping

RE: Overseer could not get tags

2018-10-31 Thread Vadim Ivanov
Hi, Chris
I had the same messages in solr log while testing 7.4 and 7.5
The only remedy I've found is increasing the header size in
/opt/solr/server/etc/jetty.xml.
After a Solr restart there are no more annoying messages.
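For reference, the stock jetty.xml in Solr 7.x exposes this header size through a system property, so the kind of change meant here looks roughly like the following sketch (the 65536 value is purely illustrative, not the value from the original message):

```xml
<!-- /opt/solr/server/etc/jetty.xml (sketch; the value 65536 is illustrative) -->
<New id="httpConfig" class="org.eclipse.jetty.server.HttpConfiguration">
  <!-- the default request header size is 8192 bytes; raising it lets
       requests with large headers (e.g. auth data, long URLs) through -->
  <Set name="requestHeaderSize">
    <Property name="solr.jetty.request.header.size" default="65536"/>
  </Set>
</New>
```

The same property can usually be set at startup instead of editing the file, which survives Solr upgrades better.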

> -Original Message-
> From: Chris Ulicny [mailto:culicny@iq.media]
> Sent: Wednesday, October 31, 2018 7:40 PM
> To: solr-user
> Subject: Re: Overseer could not get tags
> 
> I've managed to replicate this issue with the 7.5.0 release as well by
> starting up a single instance of solr in cloud mode (on windows) and
> uploading the security.json file below to it.
> 
> After a short while, the "could not get tags from node..." messages start
> coming through every 60 seconds. The accompanying logged error and
> expecting stacktrace are also included below.
> 
> Is there a JIRA ticket for this issue (or a directly related one)? I
> couldn't seem to find one.
> 
> Thanks,
> Chris
> 
> *security.json:*
> {
> "authentication":{"blockUnknown":true,"class":"solr.BasicAuthPlugin",
> "credentials":{
> "solradmin":"...",
> "solrreader":"...",
> "solrwriter":"..."}
> },
> "authorization":{"class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[
> {"name":"read","role":"reader"},
> {"name":"security-read","role":"reader"},
> {"name":"schema-read","role":"reader"},
> {"name":"config-read","role":"reader"},
> {"name":"core-admin-read","role":"reader"},
> {"name":"collection-admin-read","role":"reader"},
> {"name":"update","role":"writer"},
> {"name":"security-edit","role":"admin"},
> {"name":"schema-edit","role":"admin"},
> {"name":"config-edit","role":"admin"},
> {"name":"core-admin-edit","role":"admin"},
> {"name":"collection-admin-edit","role":"admin"},
> {"name":"all","role":"admin"}],
> "user-role":{
> "solradmin":["reader","writer","admin"],
> "solrreader":["reader"],
> "solrwriter":["reader","writer"]}
> }
> }
> 
> *StackTrace:*
> 2018-10-31 16:20:01.994 WARN  (MetricsHistoryHandler-12-thread-1) [   ]
> o.a.s.c.s.i.SolrClientNodeStateProvider could not get tags from node
> ip:8080_solr
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://ip:8080/solr: Expected mime type
> application/octet-stream but got text/html. 
> 
> 
> Error 401 require authentication
> 
> HTTP ERROR 401
> Problem accessing /solr/admin/metrics. Reason:
> require authentication
> 
> 
> 
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.ja
> va:607)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$ClientSnitchCtx.in
> voke(SolrClientNodeStateProvider.java:342)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchReplicaMetri
> cs(SolrClientNodeStateProvider.java:195)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitc
> h.getRemoteInfo(SolrClientNodeStateProvider.java:241)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:7
> 6)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(S
> olrClientNodeStateProvider.java:139)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(S
> olrClientNodeStateProvider.java:128)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
> jimczi - 2018-09-18 13:07:58]
> at
> org.apache.solr.handler.admin.MetricsHistoryHandler.collectGlobalMetrics(Me
> tricsHistoryHandler.java:498)
> ~[solr-core-7.5.0.jar:7.5.0 

RE: TLOG replica stucks

2018-10-25 Thread Vadim Ivanov
Thanks, Erick, for your attention!
My comments are below; but, supposing that the problem resides in ZooKeeper,
I'll collect more information from the ZK logs and Solr logs and be back soon.

> bq. I've noticed that some replicas stop receiving updates from the
> leader without any visible signs from the cluster status.
> 
> Hmm, yes, this isn't expected at all. What are you seeing that causes
> you to say this? You'd have to be monitoring the log for update
> messages to the replicas that aren't leaders or the like.  If anyone is
> going to have a prayer of reproducing we'll need more info on exactly
> what you're seeing and how you're measuring this.

Meanwhile, I have log level WARN... I'll decrease it to INFO and see. Tnx

> 
> Have you changed any configurations in your replicas at all? We'd need
> the exact steps you performed if so.
Command to create replicas was like this (implicit sharding and custom CoreName 
) :

mysolr07:8983/solr/admin/collections?action=ADDREPLICA
=rpk94
=rpk94_1_0
=rpk94_1_0_07
=tlog
=mysolr07:8983_solr

> 
> On a quick test I didn't see this, but if it were that easy to
> reproduce I'd expect it to have shown up before.

Yesterday I tried to reproduce it by trying to change the leader with the
REBALANCELEADERS command.
It ended up with no leader at all for the shard, and I could not set a leader at
all for a long time.

   There was a problem trying to register as the 
leader:org.apache.solr.common.SolrException: Could not register as the leader 
because creating the ephemeral registration node in ZooKeeper failed
...
   Deleting duplicate registration: 
/collections/rpk94/leader_elect/rpk94_1_117/election/2983181187899523085-core_node73-n_22
...
  Index fetch failed :org.apache.solr.common.SolrException: No registered 
leader was found after waiting for 4000ms , collection: rpk94 slice: rpk94_1_117
...

Even deleting all replicas for the shard and recreating a replica on the same node
with the same name did not help - no leader for that shard.
I had to delete the collection and wait till morning, and then it recreated
successfully.
I suppose some weird znodes were deleted from ZK by morning.

> 
> NOTE: just looking at the cloud graph and having a node be active is
> not _necessarily_ sufficient for the node to be up to date. It
> _should_ be sufficient if (and only if) the node was shut down
> gracefully, but a "kill -9" or similar doesn't give the replicas on
> the node the opportunity to change the state. The "live_nodes" znode
> in ZooKeeper must also contain the node the replica resides on.

Node was live, cluster was healthy

> 
> If you see this state again, you could try pinging the node directly,
> does it respond? Your URL should look something like:
> http://host:port/solr/colection_shard1_replica_t1/query?q=*:*=false

Yes, sure I did. The ill replica responded, and its number of documents differed from
the leader's.

> 
> The "distrib=false" is important as it won't forward the query to any
> other replica. If what you're reporting is really happening, that node
> should respond with a document count different from other nodes.
> 
> NOTE: there's a delay between the time the leader indexes a doc and
> it's visible on the follower. Are you sure you're waiting for
> leader_commit_interval+polling_interval+autowarm_time before
> concluding that there's a problem? I'm a bit suspicious that checking
> the versions is concluding that your indexes are out of sync when
> really they're just catching up normally. If it's at all possible to
> turn off indexing for a few minutes when this happens and everything
> just gets better then it's not really a problem.

Sure - the problem was on many shards, though not on all of them,
and for a long time.

> 
> If we prove out that this is really happening as you think, then a
> JIRA (with steps to reproduce) is _definitely_ in order.
> 
> Best,
> Erick
> On Wed, Oct 24, 2018 at 2:07 AM Vadim Ivanov
>  wrote:
> >
> > Hi All !
> >
> > I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.
> >
> > My collection has shards and every shard has 3 TLOG replicas on different
> > nodes.
> >
> > I've noticed that some replicas stop receiving updates from the leader
> > without any visible signs from the cluster status.
> >
> > (all replicas active and green in Admin UI CLOUD graph). But indexversion of
> > 'ill' replica not increasing with the leader.
> >
> > It seems to be dangerous, because that 'ill' replica could become a leader
> > after restart of the nodes and I already experienced data loss.
> >
> > I didn't notice any meaningfull records in solr log, except that probably
> > problem occurs when leader changes.
> >
> > Meanwhile, I monitor indexversion of all replicas in a cluster by mbeans and
> > recreate ill replicas when difference with the leader indexversion  more
> > than one
> >
> > Any suggestions?
> >
> > --
> >
> > Best regards, Vadim
> >
> >
> >



TLOG replica stucks

2018-10-24 Thread Vadim Ivanov
Hi All !

I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.

My collection has shards and every shard has 3 TLOG replicas on different
nodes.

I've noticed that some replicas stop receiving updates from the leader
without any visible signs from the cluster status.

(all replicas are active and green in the Admin UI Cloud graph), but the indexversion
of the 'ill' replica does not increase with the leader's.

This seems dangerous, because such an 'ill' replica could become the leader
after a restart of the nodes, and I have already experienced data loss this way.

I didn't notice any meaningful records in the Solr log, except that the problem
probably occurs when the leader changes.

Meanwhile, I monitor the indexversion of all replicas in the cluster via mbeans and
recreate ill replicas when the difference from the leader's indexversion is more
than one.
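The monitoring described above can be sketched like this. Only the comparison logic is shown; a real checker would fetch each core's /admin/mbeans?stats=true response over HTTP, and the input shape here (a plain dict of replica name to generation) is a simplification I'm assuming, not Solr's actual response format:

```python
def find_stale_replicas(leader_generation, replica_generations, max_lag=1):
    """Return replicas whose index generation lags the leader's by more than max_lag.

    replica_generations: {replica_name: generation}, e.g. values parsed out of
    each core's REPLICATION mbeans stats.
    """
    return [name
            for name, generation in sorted(replica_generations.items())
            if leader_generation - generation > max_lag]

# Numbers resembling the example earlier in this thread: the leader is at
# generation 1704, one replica is stuck at 1216.
replicas = {"core_node1": 1704, "core_node2": 1704, "core_node3": 1216}
print(find_stale_replicas(1704, replicas))  # -> ['core_node3']
```

Anything this returns is a candidate for DELETEREPLICA/ADDREPLICA, after allowing for normal commit/replication lag.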

Any suggestions?

-- 

Best regards, Vadim

 



RE: Join across shards?

2018-10-23 Thread Vadim Ivanov
Hi,
You CAN join across collections with a runtime "join".
The only limitation is that the FROM collection must not be sharded, and the joined
data must reside on the same node.
Solr cannot join across nodes (a distributed join is not supported).
Though, using streaming expressions, it's possible to do various things...
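As an illustration of such a runtime cross-collection join (the collection and field names here are made up, not taken from the question), a query against the main collection joining in a single-shard "users" collection co-located on the same node could look like:

```
q={!join fromIndex=users from=user_id to=owner_id}department:support
```

This returns main-collection documents whose owner_id matches the user_id of a "users" document that itself matches department:support.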
-- 
Vadim

-Original Message-
From: e_bri...@videotron.ca [mailto:e_bri...@videotron.ca] 
Sent: Tuesday, October 23, 2018 2:38 PM
To: solr-user@lucene.apache.org
Subject: Join across shards?

Hi
all,

Sorry if the question was already covered.

We are using joins across documents with the limitation of having the
documents to be joined sitting on the same shard. Is there a way around this
limitation and even join across collections? Are there plans to support this
out of the box?

Thanks!

Eric Briere.



RE: Trying to retrieve two values from two different collections by sql (V 7.2.1)

2018-10-18 Thread Vadim Ivanov
...but using Streaming Expressions it's possible to achieve the goal, AFAIK:
https://lucene.apache.org/solr/guide/7_5/stream-decorators.html#innerjoin

Though it probably won't be as fast as a search.
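For example (collection and join-field names follow the quoted question; the rest is a sketch), an innerJoin requires both underlying streams to be sorted on the join key, typically via the /export handler:

```
innerJoin(
  search(collection1, q="*:*", fl="id,name,age", sort="name asc", qt="/export"),
  search(collection2, q="*:*", fl="id,name",     sort="name asc", qt="/export"),
  on="name"
)
```

Each emitted tuple merges the fields of the matching documents from both collections, which is exactly the "id from one, age from the other" result the SQL attempt was after.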
-- 
Vadim

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, October 18, 2018 1:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Trying to retrieve two values from two different collections by 
sql (V 7.2.1)

Joins are not currently supported with Solr SQL. We should create a ticket
through a proper exception in this  scenario.



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Oct 16, 2018 at 10:56 PM deniz  wrote:

> found out sth strange regarding this case. If i change one of the values
> into
> sth else, and the field names are not the same any more, then i can get the
> different values
>
> so the initial query was
>
> select *collection1.id* as collection1id, collection2.id as collection2id
> from
> collection1 join collection2 on collection1.name = collection2.name where
> collection1.name = 'dummyname';
>
>
> once i change it into
>
> select *collection1.age* as collection1id, collection2.id as collection2id
> from
> collection1 join collection2 on collection1.name = collection2.name where
> collection1.name = 'dummyname';
>
> I am able to get the age from one collection and id from the second one.
> but
> if use age for both of the collections, like id field, i am getting only
> one
> value from one of the collections.
>
>
>
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



RE: Correct join syntax

2018-10-16 Thread Vadim Ivanov
Hi
You cannot join on two fields in Solr the way you do in SQL.
In the same situation, I add a new string field to both collections (concatenating
the Type and Id fields) and fill it at index time,
then join the two collections on that field at query time.
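The idea can be sketched as follows. This is plain Python emulating what the concatenated field buys you, not Solr code, and all field values are made up: at index time both collections get, say, a type_id field holding "Type_Id", and the query-time join then needs only that single field:

```python
def combined_key(doc):
    # The single string field written into both collections at index time.
    return f"{doc['Type']}_{doc['Id']}"

core1 = [{"Type": "journey", "Id": "42"},
         {"Type": "label",   "Id": "42"}]   # same Id, different Type
core2 = [{"Type": "journey", "Id": "42", "UserId": "u1", "Access": "rw"}]

# Emulate {!join from=type_id to=type_id fromIndex=core2}: only core1 docs whose
# combined key exists in core2 survive, so Type and Id are matched together.
keys_in_core2 = {combined_key(d) for d in core2}
joined = [d for d in core1 if combined_key(d) in keys_in_core2]
print(joined)  # -> [{'Type': 'journey', 'Id': '42'}]
```

Note that joining on Type and Id separately would wrongly match the "label"/"42" document too; the concatenation is what makes the match pairwise.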
-- 
Vadim

-Original Message-
From: dami...@gmail.com [mailto:dami...@gmail.com] 
Sent: Tuesday, October 16, 2018 1:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Correct join syntax

Hi Christoph,

The closest I can get is:

{!join from=id to=id v="+UserId= +Access="}

If there was a combined id/type field then you could join on that.

Regards,
Damien.

On Tue, 16 Oct 2018 at 04:10, Christoph <
christoph+develo...@project-mayhem.org> wrote:

> I have two cores.
>
> One core has the following fields:
>
> Type
> Id
>
> The other core has
>
> Type
> Id
> UserId
> Access
>
> How can I join where core1.Type = core2.Type, core1.Id = core2.id,
> core2.UserId = , and core2.Access = ?
>
> When querying core1, I've tried variations of
>
> {!join from=Type to=Type fromIndex=core2)(UserId: AND
> Access:)
> {!join from=Id to=Id fromIndex=core2)(UserId: AND Access:)
>
> {!join from=Type to=Type fromIndex=core2)UserId:
> {!join from=Id to=Id fromIndex=core2)UserId:
> {!join from=Type to=Type fromIndex=core2)Access:
> {!join from=Id to=Id fromIndex=core2)Access:
>
> et. al.
>
> How can I get done what I'm trying to do?
>
> thnx,
> Christoph
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



RE: join query in same collection

2018-09-14 Thread Vadim Ivanov
Hi,
AFAIK Solr can join only local indexes, no matter whether you join the same
collection or two different ones.
So, in your case, shard1 will be joined to shard1 and shard2 to shard2.
Unfortunately, it's hard to say from your data which document resides in which
shard, but you can test using distrib=false
-- 
BR, Vadim



-Original Message-
From: Steve Pruitt [mailto:bpru...@opentext.com] 
Sent: Friday, September 14, 2018 9:22 PM
To: solr-user@lucene.apache.org
Subject: join query in same collection

I see nothing in the documentation suggesting a query with a join filter 
doesn't work when a single collection is involved.  There is the special 
deployment instructions when joining across two distinct collections, but this 
is not my case.

I have a single collection:
I have two VM's, both running Solrcloud.
My collection has 2 shards on two different nodes.  Max shards per node is set 
to 1 and replication factor is set to 1.

The join filter is: {!join from=expctr-label-memberIds 
to=expctr-id}expctr-id:4b6f7d34-a58b-3399-b077-685951d06738

When I run the query, I get back only the document with expctr-id: 
2087d22a-6157-306f-8386-8352e7d8e4d4
This looks, maybe, like it only finds a document on the replica handling the 
query.  Shouldn't it search and filter across the entire collection?

The documents:
   {
"expctr-name":"online account opening",
"expctr-description":["Journey for online customers"],
"expctr-created":1536947719132,
"expctr-to-date":154623240,
"expctr-from-date":153836640,
"expctr-id":"89ec679b-24df-3559-8428-124640c96230",
"expctr-creator":"SP",
"expctr-type":"journey",
"_version_":1611606406752894976},
  {
"expctr-name":"drop-in account opening",
"expctr-description":["Journey for dropin customers"],
"expctr-created":1536947827643,
"expctr-to-date":154623240,
"expctr-from-date":153836640,
"expctr-id":"2087d22a-6157-306f-8386-8352e7d8e4d4",
"expctr-creator":"SP",
"expctr-type":"journey",
"_version_":1611606520475156480},
  {
"expctr-name":"placeholder",
"expctr-label":"customers",
"expctr-created":1536947679984,
"expctr-to-date":0,
"expctr-from-date":0,
"expctr-id":"4b6f7d34-a58b-3399-b077-685951d06738",
"expctr-creator":"SP",
"expctr-type":"label",
"expctr-label-memberIds":["89ec679b-24df-3559-8428-124640c96230", 
"2087d22a-6157-306f-8386-8352e7d8e4d4"],
"_version_":1611606544788488192}]
  }



RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-14 Thread Vadim Ivanov
Hello.
Mikhail, thank you for the support.
I have already tested this case a lot, to be sure what is happening under the
hood.
As you proposed, I've shuffled the data coming from SQL to Solr to see how Solr
reacts:
I have 6 shards, s0 ... s5.
shard is the routing field in my collection
(router.name=implicit, router.field=shard).
My SQL query looks like this:

Select
  id,
  Case when 100 > RowNumber then 's5'
       else 's_' + cast(RowNumber % 4 as varchar)
  end as shard
from ...

Only the first 99 rows go to shard s5; all the rest of the data spreads evenly
between s0 ... s3.
After 120 sec of indexing I receive an IdleTimeout from the shard leader of s5.
s4 receives no data and seems not to open a connection at all, so no timeout
occurs there.
s0...s3 receive data, and no timeout occurs there either.

When I tweak the IdleTimeout in /opt/solr-7.4.0/server/etc/jetty-http.xml it
helps, but I have concerns about increasing it from 120 sec to 30 min.
Is it safe? What could the consequences be?
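For reference, in the stock Solr 7.4 jetty-http.xml the timeout in question is wired to a system property, roughly like this sketch (the 1800000 ms = 30 min override is the illustrative value discussed above, not a recommendation):

```xml
<!-- /opt/solr-7.4.0/server/etc/jetty-http.xml (sketch) -->
<Set name="idleTimeout">
  <!-- default is 120000 ms (120 s); rather than editing the file it can
       usually be overridden at startup, e.g.
       -Dsolr.jetty.http.idleTimeout=1800000 -->
  <Property name="solr.jetty.http.idleTimeout" default="120000"/>
</Set>
```

The main cost of a very long idle timeout is that dead or stalled client connections hold server resources (threads, sockets) much longer before being reaped.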

I have noticed that the IdleTimeout in Jetty 9.3.8 (shipped with Solr 6.3.0) was 50
sec, and no such behavior was observed in Solr 6.3. So the default was increased
significantly in 9.4.10 for some reason.
Maybe someone could shed some light on the reasons. What was changed in the
document routing behavior, and why?
Maybe there was a discussion about it that I could not find?

-- 
BR Vadim

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Friday, September 14, 2018 12:10 PM
To: solr-user
Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

Hello, Vadim.
My guess (and only guess) that bunch of updates coming into a shard causes
a heavy merge that blocks new updates in its' order. This can be verified
with logs or threaddump from the problematic node. The probable measures
are: try to shuffle updates to load other shards for a while and let
parallel merge to pack that shard. And just wait a little by increasing
timeout in jetty.
Let us know what you will encounter.

On Thu, Sep 13, 2018 at 3:54 PM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi,
> I've put some more tests on the issue and managed to find out more details.
> Time out occurs when while long indexing some documents in the beginning is
> going to one shard and then for a long time (more than 120 sec) no data at
> all is going to that shard.
> Connection to that core, opened in the beginning of indexing, goes to  idle
> timeout :( .
> If no data at all going to the shard during indexing - no timeout occurs on
> that shard.
> If Indexing finishes earlier than 120 sec - no timeout occurs on that
> shard.
> Unfortunately, in our use-case there are lot of long  indexing up to 30
> minutes with uneven shard distribution of documents.
> Any suggestion how to mitigate issue?
> --
> BR
> Vadim Ivanov
>
>
> -Original Message-
> From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Wednesday, September 12, 2018 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4
>
> Hello gurus,
> I am using solrCloud with DIH for indexing my data.
> Testing 7.4.0 with implicitly sharded collection  I have noticed that any
> indexing
> longer then 2 minutes always failing with many timeout records in log
> coming
> from all replicas in collection.
>
> Such as:
> x:Mycol_s_0_replica_t40 RequestHandlerBase
> java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout
> expired: 120001/12 ms
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 12/12 ms
> at
>
> org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
> at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
> at
>
> org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWra
> pper.java:74)
> ...
> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12 ms
> at
> org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
> at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$
> 201(ScheduledThreadPoolExecutor.java:180)
> at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Sch
> eduledThreadPoolExecutor.java:293)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
> 49)
> at
>
> java.util.concurrent.Threa

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-13 Thread Vadim Ivanov
Hi,
I've put some more tests on the issue and managed to find out more details.
The timeout occurs when, during a long indexing run, some documents go to a shard
at the beginning and then, for a long time (more than 120 sec), no data at all
goes to that shard.
The connection to that core, opened at the beginning of indexing, hits the idle
timeout :( .
If no data at all goes to the shard during indexing, no timeout occurs on that
shard.
If indexing finishes in less than 120 sec, no timeout occurs on that shard.
Unfortunately, in our use case there are lots of long indexing runs, up to 30
minutes, with uneven shard distribution of documents.
Any suggestion on how to mitigate the issue?
--
BR
Vadim Ivanov


-Original Message-
From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru] 
Sent: Wednesday, September 12, 2018 4:29 PM
To: solr-user@lucene.apache.org
Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4

Hello gurus,
I am using SolrCloud with DIH for indexing my data.
Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
indexing run
longer than 2 minutes always fails, with many timeout records in the log coming
from all replicas in the collection.

Such as:
x:Mycol_s_0_replica_t40 RequestHandlerBase
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout
expired: 120001/12 ms
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 12/12 ms
at
org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
at
org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWra
pper.java:74)
...
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired:
12/12 ms
at
org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$
201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Sch
eduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
49)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
24)
... 1 more
Suppressed: java.lang.Throwable: HttpInput failure
at
org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
at
org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConn
ection.java:649)
at
org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)

Resulting indexing status:
  "statusMessages":{
"Total Requests made to DataSource":"1",
"Total Rows Fetched":"2828323",
"Total Documents Processed":"2828323",
"Total Documents Skipped":"0",
"Full Dump Started":"2018-09-12 14:28:21",
"":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0
documents.",
"Committed":"2018-09-12 14:33:41",
"Time taken":"0:5:19.507",
"Full Import failed":"2018-09-12 14:33:41"}}

Nevertheless, all these documents seem to be indexed fine and searchable.
If the same collection is not sharded, or is sharded with the "compositeId" router,
indexing completes without any errors.
The type of replica - nrt or tlog - doesn't matter.
Small indexing runs (taking less than 2 minutes) go smoothly.

Testing environment - 1 node, Collection with 6 shards, 1 replica for each
shard
Collection:
/admin/collections?action=CREATE=Mycol
=6
=implicit
=s_0,s_1,s_2,s_3,s_4,s_5
=sf_shard
=Mycol 
    =10
    =0=1


I have never noticed such behavior before on my prod configuration (Solr
6.3.0).
It seems like a bug in the new version, but I could not find any JIRA on the issue.

Any ideas, please...

--
BR
Vadim Ivanov



RE: how to access solr in solrcloud

2018-09-12 Thread Vadim Ivanov
Hi, Steve.
If you are using solr1:8983 to access Solr and solr1 is down, then IMHO nothing
will help you reach a dead IP.
You should switch to any other live node in the cluster, or - as I'd propose -
put nginx in front of SolrCloud as a load-balancing frontend.
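A minimal sketch of that nginx frontend (the host names and port are the ones from the question; the file path and everything else are illustrative). Clients then use one stable address, and requests fail over between the two Solr nodes:

```
# /etc/nginx/conf.d/solr.conf (sketch)
upstream solrcloud {
    server solr1:8983;
    server solr2:8983;   # nginx stops sending traffic to a node that fails
}
server {
    listen 80;
    location /solr/ {
        proxy_pass http://solrcloud;
    }
}
```

Note that for SolrJ's CloudSolrClient this is unnecessary: it talks to ZooKeeper (zk1:2181, zk1:2182) and discovers live nodes itself; the proxy mainly helps plain HTTP clients.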

-- 
BR, Vadim



-Original Message-
From: Gu, Steve (CDC/OD/OADS) (CTR) [mailto:c...@cdc.gov] 
Sent: Wednesday, September 12, 2018 4:38 PM
To: 'solr-user@lucene.apache.org'
Subject: how to access solr in solrcloud

Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for
failover and load balance.   There are three zookeeper servers (zk1:2181,
zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the
solr url should the client to use for access?  Will it be solr1:8983, the
leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?
Will the request be routed to solr2:8983 via the zookeeper?  I understand
that zookeeper is doing all the coordination works but wanted to understand
how this works.

Any insight would be greatly appreciated.
Steve