Re: Solr 7.4 and log4j2 JSONLayout

2018-09-06 Thread Michael Aleythe, Sternwald
Hey,

I tried solr/server/lib/ext and solr/server/lib. I also tried without them but 
it doesn't change anything.

Best regards
Michael

-----Original Message-----
From: Varun Thacker  
Sent: Thursday, 6 September 2018 16:23
To: solr-user@lucene.apache.org
Subject: Re: Solr 7.4 and log4j2 JSONLayout

Hi,

Where did you add the jackson-core and databind libs? Maybe they're 
conflicting with the JARs that Solr already ships?

Solr already comes with jackson dependencies

ls server/solr-webapp/webapp/WEB-INF/lib/ | grep jackson

jackson-annotations-2.9.5.jar

jackson-core-2.9.5.jar

jackson-core-asl-1.9.13.jar

jackson-databind-2.9.5.jar

jackson-dataformat-smile-2.9.5.jar

jackson-mapper-asl-1.9.13.jar
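
For reference, a minimal sketch of a JSON-logging RollingFile appender using log4j2's
JsonLayout. This assumes jackson-core, jackson-databind and jackson-annotations are
resolvable on the classpath and that file locations follow the stock Solr log4j2.xml;
it is only a sketch, not the configuration from the original mail:

  <RollingFile name="RollingFile"
               fileName="${sys:solr.log.dir}/solr.log"
               filePattern="${sys:solr.log.dir}/solr.log.%d{yyyy-MM-dd-hh}_%i.gz">
    <JsonLayout compact="true" eventEol="true" properties="true"/>
    <Policies>
      <OnStartupTriggeringPolicy/>
      <SizeBasedTriggeringPolicy size="32 MB"/>
    </Policies>
    <DefaultRolloverStrategy max="10"/>
  </RollingFile>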

On Thu, Sep 6, 2018 at 6:46 AM Michael Aleythe, Sternwald < 
michael.aley...@sternwald.com> wrote:

> Hey,
>
> I'm trying to edit the log4j2 logging configuration for solr. The goal 
> is to get a log file in JSON format. I configured the JSONLayout 
> for this purpose inside the rollingFile appender in the log4j2.xml. 
> After this solr stops logging entirely. Solr.log file is empty. Only 
> the solr-8983-console.log file contains 10 lines. The line "2018-09-06
> 13:22:25.378:INFO:oejs.Server:main: Started @2814ms" is the last one.
> My first guess was that the jackson-core and jackson-databind jars 
> were missing, but that did not fix the problem.
>
> Does anyone know where to find error-messages or exceptions that point 
> me towards what's going wrong here?
>
> This is my current log4j2 config file:
>
>  
>   
>
> 
>   
> 
>   %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} 
> %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> 
>   
> 
>
>  name="RollingFile"
> fileName="${sys:solr.log.dir}/solr.log"
>
> filePattern="${sys:solr.log.dir}/solr.log.%d{-MM-dd-hh}_%i.zip" >
>   
>   
> 
> 
> 
>   
>   
> 
>
>  name="SlowFile"
> fileName="${sys:solr.log.dir}/solr_slow_requests.log"
> filePattern="${sys:solr.log.dir}/solr_slow_requests.log.%i" >
>   
> 
>   %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} 
> %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> 
>   
>   
> 
> 
>   
>   
> 
>
>   
>   
> 
> 
> 
>  additivity="false">
>   
> 
>   
>   
> 
>   
> 
>
> Best regards
> Michael Aleythe
>
> Michael Aleythe
> Team --(sr)^(ch)--
> Java Entwickler | STERNWALD SYSTEMS GMBH
>
> Fon +49 351 31 40 6010
> Fax +49 351 31 40 6001
>
> E-Mail michael.aley...@sternwald.com
> Skype michael.aley...@sternwald.com
> Web www.sternwald.com
>
> STERNWALD SYSTEMS GMBH
> Pohlandstraße 19, 01309 Dresden, Germany Geschäftsführer Ard Meier 
> Registergericht Handelsregister Dresden, HRB 33480 UmSt-ID DE157125091
>
> SUPPORT / HOTLINE
> Fon +49 173 38 54 752
> E-Mail hotl...@sternwald.com
> Web support.sternwald.net
>
> STERNWALD Offices
> Berlin | Dresden | Düsseldorf | Hamburg | Sofia | Würzburg
>
>


preferLocalShards setting

2018-09-06 Thread Wei
Hi,

I am setting up a Solr Cloud with an external load balancer.  I noticed the
'preferLocalShards' configuration and I am wondering how it would impact
performance. If one host can hold replicas from all shards it will surely be
beneficial; but in my 5 shard / 2 replica cloud on 5 servers, each server
will only host 2 of the 5 shards (2 JVMs per server, each JVM has one
replica, each from a different shard). Is it useful to set preferLocalShards=true
in this case?

Thanks,
Wei
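
A minimal SolrJ sketch for testing the parameter per request, since preferLocalShards
is an ordinary query parameter; the ZooKeeper address and collection name below are
hypothetical:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PreferLocalShardsDemo {
    public static void main(String[] args) throws Exception {
        // preferLocalShards only avoids remote hops for shards the queried node actually
        // hosts, so with 2 of 5 shards per server the other 3 shards are still fetched remotely.
        CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build();
        SolrQuery query = new SolrQuery("*:*");
        query.set("preferLocalShards", "true");
        QueryResponse rsp = client.query("mycollection", query);
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        client.close();
    }
}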


Re: Solr range faceting

2018-09-06 Thread Erick Erickson
Indeed this doesn't look right. By my count, you're missing 599 counts
you'd expect in that range, although the after and between numbers
total the numFound.

What kind of a field is Value? Given the number of docs missing, I'd
guess you could get the number of docs down really small and post
them. Something like
values 1, 2, 3, 4, 5, 
and your range query so we could try it.

What is the fieldType definition and field for Value?

And finally, do you get different results if you use json faceting?

Best,
Erick
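
A hedged SolrJ sketch of the same range facet expressed through the JSON Facet API, in
case that comparison helps; the field name and range bounds are taken from the post,
while the facet label is made up:

import org.apache.solr.client.solrj.SolrQuery;

public class JsonRangeFacetDemo {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        // the JSON Facet API accepts relaxed JSON, so keys do not need quoting
        q.add("json.facet",
              "{ value_ranges : { type : range, field : Value,"
            + "  start : 0.0, end : 2000.0, gap : 100, other : \"all\" } }");
        System.out.println(q);  // inspect the parameters this request would send
    }
}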
On Thu, Sep 6, 2018 at 5:51 PM Dwane Hall  wrote:
>
> Thanks Jan, that has fixed the bucket issue, but I'm a little confused as to why 
> zero counts exist for some buckets when there appear to be values in them?
>
> "response":{"numFound":869,"start":0,"docs":[
>   {
> "Value":9475.08},
>   {
> "Value":780.0},
>   {
> "Value":9475.08},
>   {
> "Value":1000.0},
>   {
> "Value":50.0},
>   {
> "Value":50.0},
>   {
> "Value":0.0},
>   {
> "Value":800.0},
>   {
> "Value":0.0},
>   {
> "Value":1000.0},
>   {
> "Value":1000.0},
>   {
> "Value":5000.0},
>   {
> "Value":2000.0},
>   {
>"Value":4000.0},
>   {
> "Value":1500.0},
>   {
> "Value":0.0},
>   {
> "Value":1.0},
>   {
> "Value":5000.0},
>   {
> "Value":1000.0},
>   {
> "Value":0.0},
>   {
> "Value":1200.0},
>   {
> "Value":9000.0},
>   {
> "Value":1500.0},
>   {
> "Value":1.0},
>   {
> "Value":5000.0},
>   {
> "Value":4000.0},
>   {
> "Value":5000.0},
>   {
> "Value":5000.0},
>   {
> "Value":1.0},
>   {
> "Value":1000.0}]
>   },
>
>   "facet_counts":{
> "facet_queries":{},
> "facet_ranges":{
>   "Value":{
> "counts":[
>   "0.0",9,
>   "100.0",0,
>   "200.0",0,
>   "300.0",0,
>   "400.0",80,
>   "500.0",0,
>   "600.0",0,
>   "700.0",69,
>   "800.0",0,
>   "900.0",0,
>   "1000.0",0,
>   "1100.0",0,
>   "1200.0",0,
>   "1300.0",0,
>   "1400.0",0,
>   "1500.0",0,
>   "1600.0",0,
>   "1700.0",0,
>   "1800.0",0,
>   "1900.0",9],
> "gap":100.0,
> "before":0,
> "after":103,
> "between":766,
> "start":0.0,
> "end":2000.0}
>
> Cheers,
>
> Dwane
> 
> From: Jan Høydahl 
> Sent: Friday, 7 September 2018 9:23:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr range faceting
>
> Try facet.minCount=0
>
> Jan
>
> > 7. sep. 2018 kl. 01:07 skrev Dwane Hall :
> >
> > Good morning Solr community.  I'm having a few facet range issues for which 
> > I'd appreciate some advice when somebody gets a spare couple of minutes.
> >
> > Environment
> > Solr Cloud (7.3.1)
> > Single Shard Index, No replicas
> >
> > Facet Configuration (I'm using the request params API and useParams at 
> > runtime)
> > "facet":"true",
> > "facet.mincount":1,
> > "facet.missing":"false",
> > "facet.range":"Value"
> > "f.Value.facet.range.start":0.0,
> > "f.Value.facet.range.end":2000.0,
> > "f.Value.facet.range.gap":100,
> > "f.Value.facet.range.include":"edge",
> > "f.Value.facet.range.other":"all",
> >
> > My problem
> > With my range facet configuration I'm expecting to see a facet range entry 
> > for every 'step' (100 in my case) between my facet.range.start and 
> > facet.range.end settings. Something like the following 0.0,100.0,200.0, 
> > ...2000.0 with a sum of the number of values that occur between each range 
> > step.  This does not appear to be the case and in some instances I don't 
> > appear to get counts for some range steps (800.0 and 1000.0 for example are 
> > present in my result set range below but I don't get range facet values 
> > for these values?)
> >
> > Am I completely misunderstanding how range facets are supposed to work or 
> > is my configuration a little askew?
> >
> > Any advice would be greatly appreciated.
> >
> > The Solr Response
> > "responseHeader":{
> >"zkConnected":true,
> >"status":0,
> >"QTime":121},
> >
> >  "response":{"numFound":869,"start":0,"docs":[
> >  {
> >"Value":9475.08},
> >  {
> >"Value":780.0},
> >  {
> >"Value":1000.0},
> >  {
> >"Value":50.0},
> >  {
> >"Value":50.0},
> >  {
> >"Value":0.0},
> >  {
> >"Value":800.0},
> >  {
> >"Value":0.0},
> >  {
> >"Value":1000.0},
> >  {
> >"Value":1000.0},
> >  {
> >"Value":5000.0},
> >  {
> >"Value":2000.0},
> >  {
> >"Value":4000.0},
> >  {
> >"Value":1500.0},
> >  {
> >"Val

Re: Solr range faceting

2018-09-06 Thread Dwane Hall
Thanks Jan, that has fixed the bucket issue, but I'm a little confused as to why 
zero counts exist for some buckets when there appear to be values in them?

"response":{"numFound":869,"start":0,"docs":[
  {
"Value":9475.08},
  {
"Value":780.0},
  {
"Value":9475.08},
  {
"Value":1000.0},
  {
"Value":50.0},
  {
"Value":50.0},
  {
"Value":0.0},
  {
"Value":800.0},
  {
"Value":0.0},
  {
"Value":1000.0},
  {
"Value":1000.0},
  {
"Value":5000.0},
  {
"Value":2000.0},
  {
   "Value":4000.0},
  {
"Value":1500.0},
  {
"Value":0.0},
  {
"Value":1.0},
  {
"Value":5000.0},
  {
"Value":1000.0},
  {
"Value":0.0},
  {
"Value":1200.0},
  {
"Value":9000.0},
  {
"Value":1500.0},
  {
"Value":1.0},
  {
"Value":5000.0},
  {
"Value":4000.0},
  {
"Value":5000.0},
  {
"Value":5000.0},
  {
"Value":1.0},
  {
"Value":1000.0}]
  },

  "facet_counts":{
"facet_queries":{},
"facet_ranges":{
  "Value":{
"counts":[
  "0.0",9,
  "100.0",0,
  "200.0",0,
  "300.0",0,
  "400.0",80,
  "500.0",0,
  "600.0",0,
  "700.0",69,
  "800.0",0,
  "900.0",0,
  "1000.0",0,
  "1100.0",0,
  "1200.0",0,
  "1300.0",0,
  "1400.0",0,
  "1500.0",0,
  "1600.0",0,
  "1700.0",0,
  "1800.0",0,
  "1900.0",9],
"gap":100.0,
"before":0,
"after":103,
"between":766,
"start":0.0,
"end":2000.0}

Cheers,

Dwane

From: Jan Høydahl 
Sent: Friday, 7 September 2018 9:23:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr range faceting

Try facet.minCount=0

Jan

> 7. sep. 2018 kl. 01:07 skrev Dwane Hall :
>
> Good morning Solr community.  I'm having a few facet range issues for which 
> I'd appreciate some advice when somebody gets a spare couple of minutes.
>
> Environment
> Solr Cloud (7.3.1)
> Single Shard Index, No replicas
>
> Facet Configuration (I'm using the request params API and useParams at 
> runtime)
> "facet":"true",
> "facet.mincount":1,
> "facet.missing":"false",
> "facet.range":"Value"
> "f.Value.facet.range.start":0.0,
> "f.Value.facet.range.end":2000.0,
> "f.Value.facet.range.gap":100,
> "f.Value.facet.range.include":"edge",
> "f.Value.facet.range.other":"all",
>
> My problem
> With my range facet configuration I'm expecting to see a facet range entry 
> for every 'step' (100 in my case) between my facet.range.start and 
> facet.range.end settings. Something like the following 0.0,100.0,200.0, 
> ...2000.0 with a sum of the number of values that occur between each range 
> step.  This does not appear to be the case and in some instances I don't 
> appear to get counts for some range steps (800.0 and 1000.0 for example are 
> present in my result set range below but I don't get range facet values for 
> these values?)
>
> Am I completely misunderstanding how range facets are supposed to work or is 
> my configuration a little askew?
>
> Any advice would be greatly appreciated.
>
> The Solr Response
> "responseHeader":{
>"zkConnected":true,
>"status":0,
>"QTime":121},
>
>  "response":{"numFound":869,"start":0,"docs":[
>  {
>"Value":9475.08},
>  {
>"Value":780.0},
>  {
>"Value":1000.0},
>  {
>"Value":50.0},
>  {
>"Value":50.0},
>  {
>"Value":0.0},
>  {
>"Value":800.0},
>  {
>"Value":0.0},
>  {
>"Value":1000.0},
>  {
>"Value":1000.0},
>  {
>"Value":5000.0},
>  {
>"Value":2000.0},
>  {
>"Value":4000.0},
>  {
>"Value":1500.0},
>  {
>"Value":0.0},
>  {
>"Value":1.0},
>  {
>"Value":1000.0}]
>  },
>  "facet_counts":{
>"facet_ranges":{
>  "Value":{
>"counts":[
>  "0.0",9,
>  "400.0",80,
>  "700.0",69,
>  "1900.0",9],
>"gap":100.0,
>"before":0,
>"after":103,
>"between":766,
>"start":0.0,
>"end":2000.0}}
>
> Cheers,
>
> Dwane


Re: Solr range faceting

2018-09-06 Thread Jan Høydahl
Try facet.minCount=0

Jan
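
With facet.mincount relaxed to 0, range faceting returns every bucket between start and
end, including the empty ones. Against the request-params configuration in the quoted
post the change is a single line (shown in the same style as the post):

"facet.mincount":0,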

> 7. sep. 2018 kl. 01:07 skrev Dwane Hall :
> 
> Good morning Solr community.  I'm having a few facet range issues for which 
> I'd appreciate some advice when somebody gets a spare couple of minutes.
> 
> Environment
> Solr Cloud (7.3.1)
> Single Shard Index, No replicas
> 
> Facet Configuration (I'm using the request params API and useParams at 
> runtime)
> "facet":"true",
> "facet.mincount":1,
> "facet.missing":"false",
> "facet.range":"Value"
> "f.Value.facet.range.start":0.0,
> "f.Value.facet.range.end":2000.0,
> "f.Value.facet.range.gap":100,
> "f.Value.facet.range.include":"edge",
> "f.Value.facet.range.other":"all",
> 
> My problem
> With my range facet configuration I'm expecting to see a facet range entry 
> for every 'step' (100 in my case) between my facet.range.start and 
> facet.range.end settings. Something like the following 0.0,100.0,200.0, 
> ...2000.0 with a sum of the number of values that occur between each range 
> step.  This does not appear to be the case and in some instances I don't 
> appear to get counts for some range steps (800.0 and 1000.0 for example are 
> present in my result set range below but I don't get range facet values for 
> these values?)
> 
> Am I completely misunderstanding how range facets are supposed to work or is 
> my configuration a little askew?
> 
> Any advice would be greatly appreciated.
> 
> The Solr Response
> "responseHeader":{
>"zkConnected":true,
>"status":0,
>"QTime":121},
> 
>  "response":{"numFound":869,"start":0,"docs":[
>  {
>"Value":9475.08},
>  {
>"Value":780.0},
>  {
>"Value":1000.0},
>  {
>"Value":50.0},
>  {
>"Value":50.0},
>  {
>"Value":0.0},
>  {
>"Value":800.0},
>  {
>"Value":0.0},
>  {
>"Value":1000.0},
>  {
>"Value":1000.0},
>  {
>"Value":5000.0},
>  {
>"Value":2000.0},
>  {
>"Value":4000.0},
>  {
>"Value":1500.0},
>  {
>"Value":0.0},
>  {
>"Value":1.0},
>  {
>"Value":1000.0}]
>  },
>  "facet_counts":{
>"facet_ranges":{
>  "Value":{
>"counts":[
>  "0.0",9,
>  "400.0",80,
>  "700.0",69,
>  "1900.0",9],
>"gap":100.0,
>"before":0,
>"after":103,
>"between":766,
>"start":0.0,
>"end":2000.0}}
> 
> Cheers,
> 
> Dwane


Solr range faceting

2018-09-06 Thread Dwane Hall
Good morning Solr community.  I'm having a few facet range issues for which I'd 
appreciate some advice when somebody gets a spare couple of minutes.

Environment
Solr Cloud (7.3.1)
Single Shard Index, No replicas

Facet Configuration (I'm using the request params API and useParams at runtime)
"facet":"true",
"facet.mincount":1,
"facet.missing":"false",
"facet.range":"Value"
"f.Value.facet.range.start":0.0,
"f.Value.facet.range.end":2000.0,
"f.Value.facet.range.gap":100,
"f.Value.facet.range.include":"edge",
"f.Value.facet.range.other":"all",

My problem
With my range facet configuration I'm expecting to see a facet range entry for 
every 'step' (100 in my case) between my facet.range.start and facet.range.end 
settings. Something like the following 0.0,100.0,200.0, ...2000.0 with a sum of 
the number of values that occur between each range step.  This does not appear 
to be the case and in some instances I don't appear to get counts for some 
range steps (800.0 and 1000.0 for example are present in my result set range 
below but I don't get range facet values for these values?)

Am I completely misunderstanding how range facets are supposed to work or is my 
configuration a little askew?

Any advice would be greatly appreciated.

The Solr Response
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":121},

  "response":{"numFound":869,"start":0,"docs":[
  {
"Value":9475.08},
  {
"Value":780.0},
  {
"Value":1000.0},
  {
"Value":50.0},
  {
"Value":50.0},
  {
"Value":0.0},
  {
"Value":800.0},
  {
"Value":0.0},
  {
"Value":1000.0},
  {
"Value":1000.0},
  {
"Value":5000.0},
  {
"Value":2000.0},
  {
"Value":4000.0},
  {
"Value":1500.0},
  {
"Value":0.0},
  {
"Value":1.0},
  {
"Value":1000.0}]
  },
  "facet_counts":{
"facet_ranges":{
  "Value":{
"counts":[
  "0.0",9,
  "400.0",80,
  "700.0",69,
  "1900.0",9],
"gap":100.0,
"before":0,
"after":103,
"between":766,
"start":0.0,
"end":2000.0}}

Cheers,

Dwane


Re: Multi word searching is not working getting random search results

2018-09-06 Thread Susheel Kumar
How about searching with "Intermodal Schedules" (plural), and trying phrase slop
for better control over the relevancy order?

https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
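
A hedged SolrJ sketch of an eDisMax request with phrase boosting and slop; the field
names and boosts are hypothetical and have to match the real schema:

import org.apache.solr.client.solrj.SolrQuery;

public class PhraseSlopDemo {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("Intermodal Schedules");
        q.set("defType", "edismax");
        q.set("qf", "title^2 text");    // fields the individual terms are matched against
        q.set("pf", "title^4 text^2");  // boost documents where the whole phrase matches
        q.set("ps", "2");               // phrase slop: allow up to 2 positions between the terms
        System.out.println(q);
    }
}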


On Thu, Sep 6, 2018 at 12:10 PM Muddapati, Jagadish <
jagadish.muddap...@nscorp.com> wrote:

> Label: newbie
> Environment:
> I am currently running solr on Linux platform.
>
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.5"
>
> openjdk version "1.8.0_181"
>
> AEM version: 6.2
>
> I recently integrated Solr with AEM, and when I search for multiple words
> the search results come back in a seemingly random order.
>
> Search words: Intermodal schedule
> Results: Solr first displays results related only to Intermodal, and a few
> pages later I see pages related to the search term schedule, scattered
> randomly. I am not getting results that match all of the words on the page.
> For example, I am not seeing results like the [Terminals & Schedules |
> Intermodal | Shipping Options ...] page at the start; instead I get random
> results and that page only shows up after about 40 results.
>
> Here is the query on browser URL:
>
> http://test-servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> <
> http://servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule
> >
>
> I am using solr version 7.4
>
> Thanks,
> Jagadish M.
>
>
>


Re: Expected mime type application/octet-stream but got text/html

2018-09-06 Thread Alexandre Rafalovitch
Why is this http://host:port/solr-master/get? The normal URL is
http://host:port/solr/<collection>/<handler>.  "get" as a handler
is fine, and <collection> can be empty for "collection1" in older
Solr. But what is "solr-master" here and where is the required "/solr"
part? What is your collection name?

So, to me, either your URL is wrong or you were trying to do a custom
URL and your troubleshooting should be focused around that first.

Regards,
   Alex.
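
For comparison, against a stock install the real-time get handler is normally reached at
a URL like http://localhost:8983/solr/mycollection/get?id=SOMEID (collection name and id
are placeholders), i.e. with the /solr context and the collection name in the path.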

On 6 September 2018 at 14:39, Walter Underwood  wrote:
> I thought there was a fix for this in SolrJ. This is a bug. It is getting 
> some error,
> like a 400 or 503, but instead of reporting the error, it makes a different 
> error about
> the content-type.
>
> That is just busted, but I can’t find a Jira for it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Sep 6, 2018, at 11:29 AM, nalsrini  wrote:
>>
>> Here is the error message I am getting:
>>
>> https://screencast.com/t/XwEjA22jX 
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Expected mime type application/octet-stream but got text/html

2018-09-06 Thread Walter Underwood
I thought there was a fix for this in SolrJ. This is a bug. It is getting some 
error,
like a 400 or 503, but instead of reporting the error, it makes a different 
error about
the content-type.

That is just busted, but I can’t find a Jira for it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 6, 2018, at 11:29 AM, nalsrini  wrote:
> 
> Here is the error message I am getting:
> 
> https://screencast.com/t/XwEjA22jX   
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Expected mime type application/octet-stream but got text/html

2018-09-06 Thread nalsrini
Here is the error message I am getting:

https://screencast.com/t/XwEjA22jX   



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Autoscaling with core properties

2018-09-06 Thread James Strassburg
I created SOLR-12752 ( https://issues.apache.org/jira/browse/SOLR-12752 )
for this issue. We're also using user properties in our dataimporthandler
data-config.xml so SOLR-11529 ( https://issues.apache.org/jira/browse/SOLR-11529 ) prevented us
from validating the Config API (unless we had something configured wrong).
Moving on to SOLR_OPTS to see if that will work but it is undesirable since
the admin UI will display the settings (some of them sensitive).

JiM
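
For reference, a hedged sketch of the set-user-property call being discussed; the host,
collection and property value are hypothetical:

curl http://localhost:8983/solr/products/config -H 'Content-type:application/json' -d '{ "set-user-property": { "synonyms_datasource": "jdbc/productsSynonymsDS" } }'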

On Thu, Sep 6, 2018 at 8:24 AM James Strassburg 
wrote:

> Shalin,
>
> We actually found the ConfigAPI yesterday and started testing that with
> set-user-property. I should know today whether that will work or not and I
> will comment on this thread.
>
> I can open a Jira for the core props and replica props later today as well.
>
> JiM
>
> On Thu, Sep 6, 2018 at 12:37 AM Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Hi Jim,
>>
>> Very interesting scenario that we didn't anticipate. I think this is a
>> limitation of the MoveReplica API which does not move core properties.
>>
>> But it also raises questions such as whether to always move all core
>> properties? I personally think core properties are an artifact that was
>> suitable for non-cloud Solr but does not lend well to cloud environments.
>> We also have replica properties in Solr and corresponding APIs to
>> add/update/delete them. See
>>
>> https://lucene.apache.org/solr/guide/7_4/collections-api.html#addreplicaprop
>> (however, I think even these are not carried forward by move replica
>> today). Do you mind opening a Jira to discuss how we can fix the current
>> behavior?
>>
>> Would using the Config API to set user properties work for your use-case?
>> See
>>
>> https://lucene.apache.org/solr/guide/7_4/configuring-solrconfig-xml.html#substituting-properties-in-solr-config-files
>> and
>>
>> https://lucene.apache.org/solr/guide/7_4/config-api.html#commands-for-user-defined-properties
>>
>> We can improve autoscaling actions such as ComputePlanAction to add custom
>> core properties to any add replica or move replica command. That is
>> probably worth another Jira as well.
>>
>>
>> On Wed, Sep 5, 2018 at 11:54 PM James Strassburg 
>> wrote:
>>
>> > Hello,
>> >
>> > We're creating a SolrCloud in AWS and attempting to use autoscaling to
>> add
>> > replicas during nodeAdded/nodeLost. This was working fine for us until
>> we
>> > started creating collections specifying core properties (e.g.
>> >
>> >
>> /solr/admin/collections?action=CREATE&property.synonyms_datasource=REDACTED).
>> > It seems that when the nodeLost/Added trigger fires the properties don't
>> > manifest in the core create invocation and we get errors like the
>> > following:
>> >
>> > products_20180904200015_shard1_replica_n39:
>> >
>> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> > Could not load conf for core products_20180904200015_shard1_replica_n39:
>> > Can't load schema schema.xml: No system property or default value
>> specified
>> > for synonyms_datasource value:jdbc/${synonyms_datasource}
>> >
>> > The autoscaling API also doesn't appear to have a way to set the
>> properties
>> > when we create the triggers.
>> >
>> > Are we missing something or is this not supported at this time? I
>> couldn't
>> > find a relevant JIRA or other documentation or solr-user discussion on
>> > this.
>> >
>> > thanks,
>> >
>> > JiM
>> >
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>


Re: Need to connect solr with solrj from AWS lambda

2018-09-06 Thread nalsrini
Hi Mikhail,
I am good now after some changes in the AWS including security group.

thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr 7.4.0 - bug in JMX cache stats?

2018-09-06 Thread Bojan Šmid
Hi,

  it seems the format of the cache mbeans changed with 7.4.0.  And from what I
can see, a similar change wasn't made for other mbeans, which may mean it was
accidental and may be a bug.

  In Solr 7.3.* format was (each attribute on its own, numeric type):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  lookups java.lang.Long = 0
  hits java.lang.Long = 0
  cumulative_evictions java.lang.Long = 0
  size java.lang.Long = 0
  hitratio java.lang.Float = 0.0
  evictions java.lang.Long = 0
  cumulative_lookups java.lang.Long = 0
  cumulative_hitratio java.lang.Float = 0.0
  warmupTime java.lang.Long = 0
  inserts java.lang.Long = 0
  cumulative_inserts java.lang.Long = 0
  cumulative_hits java.lang.Long = 0


  With 7.4.0 there is a single attribute "Value" (java.lang.Object):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  Value java.lang.Object = {lookups=0, evictions=0,
cumulative_inserts=0, cumulative_hits=0, hits=0, cumulative_evictions=0,
size=0, hitratio=0.0, cumulative_lookups=0, cumulative_hitratio=0.0,
warmupTime=0, inserts=0}


  So the question is - was this intentional change or a bug?

  Thanks,

Bojan
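
A hedged sketch of reading that MBean over JMX; the object name is the one from the post,
while the JMX host and port are hypothetical:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CacheMBeanCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName(
            "solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,"
          + "category=CACHE,scope=searcher,name=filterCache");
        // on 7.4.0 this returns one composite "Value" object; on 7.3.x each statistic
        // (lookups, hits, hitratio, ...) was its own numeric attribute
        System.out.println(conn.getAttribute(name, "Value"));
        connector.close();
    }
}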


Use ListNet or LambdaRank algorithm to train the model for the Solr LTR

2018-09-06 Thread Zheng Lin Edwin Yeo
Hi,

Has anyone tried to use ListNet or LambdaRank algorithm to train the model
for the Solr LTR?
I am looking at NeuralNetworkModel with a listwise approach, and these two
algorithms fit the bill. However, there is not much documentation on using
these two algorithms with Solr.

I am using Solr 7.4.0

Regards,
Edwin


Multi word searching is not working getting random search results

2018-09-06 Thread Muddapati, Jagadish
Label: newbie
Environment:
I am currently running solr on Linux platform.

NAME="Red Hat Enterprise Linux Server"
VERSION="7.5"

openjdk version "1.8.0_181"

AEM version: 6.2

I recently integrated Solr with AEM, and when I search for multiple words the 
search results come back in a seemingly random order.

Search words: Intermodal schedule
Results: Solr first displays results related only to Intermodal, and a few pages 
later I see pages related to the search term schedule, scattered randomly. I am 
not getting results that match all of the words on the page.
For example, I am not seeing results like the [Terminals & Schedules | 
Intermodal | Shipping Options ...] page at the start; instead I get random results 
and that page only shows up after about 40 results.

Here is the query on browser URL:
http://test-servername/content/nscorp/en/search-results.html?start=0&q=Intermodal+Schedule

I am using solr version 7.4

Thanks,
Jagadish M.




Corrupt Index error on Target cluster

2018-09-06 Thread Susheel Kumar
Hello,

We had a running cluster with CDCR and there were some issues with indexing
on the Source cluster which got resolved after restarting the nodes (in my
absence...), and now I see the errors below on a shard of the Target cluster.  Any
suggestions / ideas on what could have caused this and what's the best way to
recover?

Thnx

Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1826)
at
org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:127)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:310)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:267)
... 34 more
Caused by: org.apache.lucene.index.CorruptIndexException: Corrupted
bitsPerDocBase: 6033
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.<init>(CompressingStoredFieldsIndexReader.java:89)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:126)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:128)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
at
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:103)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:467)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:103)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:79)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:39)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2033)
... 43 more
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum
failed (hardware problem?) : expected=e5bf0d15 actual=21722825
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard8_replica1/data/index.20180903220548447/_9nsy.tvx")))
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462)
at
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:131)
... 54 more
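
One hedged first step is Lucene's CheckIndex tool, which only reports unless -exorcise is
passed; the jar path below assumes a stock install, and the index path is the one from the
stack trace above:

java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-*.jar org.apache.lucene.index.CheckIndex /app/solr/data/COLL_shard8_replica1/data/index.20180903220548447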


Re: SolrCloud CDCR with 3+ DCs

2018-09-06 Thread cdatta
Hi Amrit, Thanks for your response.

We wiped out our complete installation and started a fresh one. Now the
multi-direction replication is working but we are seeing errors related to
the authentication sporadically. 

Thanks & Regards,
Chandi Datta



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Concurrent Update Client Stops on Exceptions Randomly v7.4

2018-09-06 Thread Erick Erickson
I would seriously consider moving away from DIH to SolrJ if you want
to tweak on this level, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

One other alternative is to incorporate a ScriptUpdateProcessor in
your update chain to intercept these on the way in to being indexed
and "do something" to fix it up.

ConcurrentUpdateSolrServer shouldn't "just quit", I'd guess something in DIH.

Best,
Erick
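
A minimal sketch of the client-side "fix it up" idea for the dirty dates from the thread
below; the field name and the zero-to-01 mapping are assumptions chosen only for
illustration:

public class DateCleaner {
    // crude normalization: map a missing month or day ("-00") to "-01" so the value parses as a date
    static String cleanDay(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.replace("-00", "-01");
    }

    public static void main(String[] args) {
        System.out.println(cleanDay("1966-00-00"));  // 1966-01-01
        System.out.println(cleanDay("1987-10-00"));  // 1987-10-01
    }
}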
On Thu, Sep 6, 2018 at 12:52 AM deniz  wrote:
>
> I am trying to write a wrapper for DIH, so I can leverage the field type
> guessing while importing the SQL data.
>
> The query is supposed to retrieve 400K+ documents. In the test data in the db,
> there are dirty date fields, which contain data like '1966-00-00' or
> '1987-10-00' as well.
>
> I am running the code below:
>
>  public void dataimport(ConcurrentUpdateSolrClient updateClient, String
> importSql) {
>
> try {
>
> Connection conn = DriverManager.getConnection("connection
> string","user","pass");
> Statement stmt =
> conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
> ResultSet.CONCUR_READ_ONLY);
> stmt.setFetchSize(Integer.MIN_VALUE);
> ResultSet rs = stmt.executeQuery(importSql);
> ResultSetMetaData resultSetMetaData = rs.getMetaData();
> List<SolrFieldObject> fields = new ArrayList<>();
> for(int index=1; index < resultSetMetaData.getColumnCount();
> index++){
> fields.add(new
> SolrFieldObject(resultSetMetaData.getColumnLabel(index),
> resultSetMetaData.getColumnClassName(index)));
> }
> while(rs.next()){
> SolrInputDocument solrInputDocument = new
> SolrInputDocument();
> for(SolrFieldObject field : fields){
> try{
> Object dataObject = rs.getString(field.name());
> Optional.ofNullable(dataObject).ifPresent(
> databaseInfo ->{
> solrInputDocument.addField(field.name(),
> String.valueOf(databaseInfo));
> }
> );
> }catch(Exception e){
> e.printStackTrace();
> }
>
> }
> try{
>  UpdateRequest updateRequest = new UpdateRequest();
>  updateRequest.setCommitWithin(1);
> try{
>   updateRequest.add(solrInputDocument);
>   updateRequest.process(updateClient);
>
> }catch(Exception e){
>   e.printStackTrace();
> }
> }catch(Exception e){
> System.out.println("Inner -> " + e.getMessage());
> }
> }
> stmt.close();
> conn.close();
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
>
> The code is working fine, except that it randomly stops with logs
> like 'Error adding field 'day'='1976-00-00' msg=Invalid Date
> String:'1976-00-00' on random documents. Although there are many other
> documents with invalid dates, those are logged as errors on the server side,
> while the client keeps working and continues to push further documents, until it stops
> on a random document with the given error.
>
> Is there an error threshold value that makes the concurrent update client
> stop after some time, or are there some other points I am missing while
> dealing with this kind of update?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-06 Thread Erick Erickson
All:

Let's move the rest of the conversation over to the JIRA Tomás raised,
so we have a record of what's been attempted to track this.

It would be great if Markus and Björn could add some environment info
on the JIRA, in particular the version of Java you're both using and
the op system etc...

Thanks,
Erick
On Thu, Sep 6, 2018 at 1:29 AM Markus Jelsma  wrote:
>
> Thanks Tomás!
>
> Björn, can you reproduce the problem in a local and controlled environment?
>
> Markus
>
>
>
> -Original message-
> > From:Tomás Fernández Löbbe 
> > Sent: Wednesday 5th September 2018 18:32
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> >
> > I think this is pretty bad. I created
> > https://issues.apache.org/jira/browse/SOLR-12743. Feel free to add any more
> > details you have there.
> >
> > On Mon, Sep 3, 2018 at 1:50 PM Markus Jelsma 
> > wrote:
> >
> > > Hello Björn,
> > >
> > > Take great care, 7.2.1 cannot read an index written by 7.4.0, so you
> > > cannot roll back but need to reindex!
> > >
> > > Andrey Kudryavtsev made a good suggestion in the thread on how to find the
> > > culprit, but it will be a tedious task. I have not yet had the time or
> > > courage to venture there.
> > >
> > > Hope it helps,
> > > Markus
> > >
> > >
> > >
> > > -Original message-
> > > > From:Björn Häuser 
> > > > Sent: Monday 3rd September 2018 22:28
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> > > >
> > > > Hi Markus,
> > > >
> > > > this reads exactly like what we have. Where you able to figure out
> > > anything? Currently thinking about rollbacking to 7.2.1.
> > > >
> > > >
> > > >
> > > > > On 3. Sep 2018, at 21:54, Markus Jelsma 
> > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > Getting an OOM plus the fact you are having a lot of IndexSearcher
> > > instances rings a familiar bell. One of our collections has the same issue
> > > [1] when we attempted an upgrade 7.2.1 > 7.3.0. I managed to rule out all
> > > our custom Solr code but had to keep our Lucene filters in the schema, the
> > > problem persisted.
> > > > >
> > > > > The odd thing, however, is that you appear to have the same problem,
> > > but not with 7.3.0? Since you shortly after 7.3.0 upgraded to 7.4.0, can
> > > you confirm the problem is not also in 7.3.0?
> > > > >
> > > >
> > > > We had very similar problems with 7.3.0 but never analyzed them and just
> > > updated to 7.4.0 because I thought thats the bug we hit:
> > > https://issues.apache.org/jira/browse/SOLR-11882 <
> > > https://issues.apache.org/jira/browse/SOLR-11882>
> > > >
> > > >
> > > > > You should see the instance count for IndexSearcher increase by one
> > > for each replica on each commit.
> > > >
> > > >
> > > > Sorry, where can I find this? ;) Sorry, did not find anything.
> > > >
> > > > Thanks
> > > > Björn
> > > >
> > > > >
> > > > > Regards,
> > > > > Markus
> > > > >
> > > > > [1]
> > > http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
> > > > >
> > > > >
> > > > >
> > > > > -Original message-
> > > > >> From:Erick Erickson 
> > > > >> Sent: Monday 3rd September 2018 20:49
> > > > >> To: solr-user 
> > > > >> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> > > > >>
> > > > >> I would expect at least 1 IndexSearcher per replica, how many total
> > > > >> replicas hosted in your JVM?
> > > > >>
> > > > >> Plus, if you're actively indexing, there may temporarily be 2
> > > > >> IndexSearchers open while the new searcher warms.
> > > > >>
> > > > >> And there may be quite a few caches, at least queryResultCache and
> > > > >> filterCache and documentCache, one of each per replica and maybe two
> > > > >> (for queryResultCache and filterCache) if you have a background
> > > > >> searcher autowarming.
> > > > >>
> > > > >> At a glance, your autowarm counts are very high, so it may take some
> > > > >> time to autowarm leading to multiple IndexSearchers and caches open
> > > > >> per replica when you happen to hit a commit point. I usually start
> > > > >> with 16-20 as an autowarm count, the benefit decreases rapidly as you
> > > > >> increase the count.
> > > > >>
> > > > >> I'm not quite sure why it would be different in 7x .vs. 6x. How much
> > > > >> heap do you allocate to the JVM? And do you see similar heap dumps in
> > > > >> 6.6?
> > > > >>
> > > > >> Best,
> > > > >> Erick
> > > > >> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser 
> > > > >> 
> > > wrote:
> > > > >>>
> > > > >>> Hello,
> > > > >>>
> > > > >>> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard
> > > each, 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We
> > > are running Zookeeper 3.4.13.
> > > > >>>
> > > > >>> Since the upgrade to 7.3.0 and also 7.4.0 we encountering heap space
> > > exhaustion. After obtaining a heap dump it looks like that we have a lot 
> > > of
> > > IndexSearchers open for our larg

Re: Solr 7.4 and log4j2 JSONLayout

2018-09-06 Thread Varun Thacker
Hi,

Where did you add the jackson-core and databind libs? Maybe they're
conflicting with the JARs that Solr already ships?

Solr already comes with jackson dependencies

ls server/solr-webapp/webapp/WEB-INF/lib/ | grep jackson

jackson-annotations-2.9.5.jar

jackson-core-2.9.5.jar

jackson-core-asl-1.9.13.jar

jackson-databind-2.9.5.jar

jackson-dataformat-smile-2.9.5.jar

jackson-mapper-asl-1.9.13.jar

On Thu, Sep 6, 2018 at 6:46 AM Michael Aleythe, Sternwald <
michael.aley...@sternwald.com> wrote:

> Hey,
>
> I'm trying to edit the log4j2 logging configuration for solr. The goal is
> to get a log file in JSON format. I configured the JSONLayout for this
> purpose inside the rollingFile appender in the log4j2.xml. After this solr
> stops logging entirely. Solr.log file is empty. Only the
> solr-8983-console.log file contains 10 lines. The line "2018-09-06
> 13:22:25.378:INFO:oejs.Server:main: Started @2814ms" is the last one.
> My first guess was that the jackson-core and jackson-databind jars were
> missing, but that did not fix the problem.
>
> Does anyone know where to find error-messages or exceptions that point me
> towards what's going wrong here?
>
> This is my current log4j2 config file:
>
> 
> 
>   
>
> 
>   
> 
>   %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard}
> %X{replica} %X{core}] %c{1.} %m%n
> 
>   
> 
>
>  name="RollingFile"
> fileName="${sys:solr.log.dir}/solr.log"
>
> filePattern="${sys:solr.log.dir}/solr.log.%d{-MM-dd-hh}_%i.zip" >
>   
>   
> 
> 
> 
>   
>   
> 
>
>  name="SlowFile"
> fileName="${sys:solr.log.dir}/solr_slow_requests.log"
> filePattern="${sys:solr.log.dir}/solr_slow_requests.log.%i" >
>   
> 
>   %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard}
> %X{replica} %X{core}] %c{1.} %m%n
> 
>   
>   
> 
> 
>   
>   
> 
>
>   
>   
> 
> 
> 
>  additivity="false">
>   
> 
>   
>   
> 
>   
> 
>
> Best regards
> Michael Aleythe
>
> Michael Aleythe
> Team --(sr)^(ch)--
> Java Entwickler | STERNWALD SYSTEMS GMBH
>
> Fon +49 351 31 40 6010
> Fax +49 351 31 40 6001
>
> E-Mail michael.aley...@sternwald.com
> Skype michael.aley...@sternwald.com
> Web www.sternwald.com
>
> STERNWALD SYSTEMS GMBH
> Pohlandstraße 19, 01309 Dresden, Germany
> Geschäftsführer Ard Meier
> Registergericht Handelsregister Dresden, HRB 33480
> UmSt-ID DE157125091
>
> SUPPORT / HOTLINE
> Fon +49 173 38 54 752
> E-Mail hotl...@sternwald.com
> Web support.sternwald.net
>
> STERNWALD Offices
> Berlin | Dresden | Düsseldorf | Hamburg | Sofia | Würzburg
>
>


Re: Null Pointer Exception without details on Update in schemaless 7.4

2018-09-06 Thread Steve Rowe
Hi,

Null handling in AddSchemaFieldsUpdateProcessorFactory has been added for Solr 
7.5, see https://issues.apache.org/jira/browse/SOLR-12704 .

--
Steve
www.lucidworks.com

> On Sep 6, 2018, at 1:11 AM, deniz  wrote:
> 
> The server is also 7.4.
> 
> And your assumption/check on null values in the input doc seems legit... I have
> added some checks before pushing the doc to Solr and replaced null values
> with some default values, and updates seem to be going through without problems...
> though having slightly more explanatory logs on the server side would be
> useful...
> 
> thanks a lot for pointing out the null fields 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Solr 7.4 and log4j2 JSONLayout

2018-09-06 Thread Michael Aleythe, Sternwald
Hey,

I'm trying to edit the log4j2 logging configuration for solr. The goal is to 
get a log file in JSON format. I configured the JSONLayout for this purpose 
inside the rollingFile appender in the log4j2.xml. After this solr stops 
logging entirely. Solr.log file is empty. Only the solr-8983-console.log file 
contains 10 lines. The line "2018-09-06 13:22:25.378:INFO:oejs.Server:main: 
Started @2814ms" is the last one.
My first guess was that the jackson-core and jackson-databind jars were 
missing, but that did not fix the problem.

Does anyone know where to find error-messages or exceptions that point me 
towards what's going wrong here?

This is my current log4j2 config file:



  


  

  %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} 
%X{replica} %X{core}] %c{1.} %m%n

  



  
  



  
  



  

  %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} 
%X{replica} %X{core}] %c{1.} %m%n

  
  


  
  


  
  




  

  
  

  


Best regards
Michael Aleythe

Michael Aleythe
Team --(sr)^(ch)--
Java Entwickler | STERNWALD SYSTEMS GMBH

Fon +49 351 31 40 6010
Fax +49 351 31 40 6001

E-Mail michael.aley...@sternwald.com
Skype michael.aley...@sternwald.com
Web www.sternwald.com

STERNWALD SYSTEMS GMBH
Pohlandstraße 19, 01309 Dresden, Germany
Geschäftsführer Ard Meier
Registergericht Handelsregister Dresden, HRB 33480
UmSt-ID DE157125091

SUPPORT / HOTLINE
Fon +49 173 38 54 752
E-Mail hotl...@sternwald.com
Web support.sternwald.net

STERNWALD Offices
Berlin | Dresden | Düsseldorf | Hamburg | Sofia | Würzburg



Re: Autoscaling with core properties

2018-09-06 Thread James Strassburg
Shalin,

We actually found the ConfigAPI yesterday and started testing that with
set-user-property. I should know today whether that will work or not and I
will comment on this thread.

I can open a Jira for the core props and replica props later today as well.

JiM

On Thu, Sep 6, 2018 at 12:37 AM Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Jim,
>
> Very interesting scenario that we didn't anticipate. I think this is a
> limitation of the MoveReplica API which does not move core properties.
>
> But it also raises questions such as whether to always move all core
> properties? I personally think core properties are an artifact that was
> suitable for non-cloud Solr but does not lend well to cloud environments.
> We also have replica properties in Solr and corresponding APIs to
> add/update/delete them. See
>
> https://lucene.apache.org/solr/guide/7_4/collections-api.html#addreplicaprop
> (however, I think even these are not carried forward by move replica
> today). Do you mind opening a Jira to discuss how we can fix the current
> behavior?
>
> Would using the Config API to set user properties work for your use-case?
> See
>
> https://lucene.apache.org/solr/guide/7_4/configuring-solrconfig-xml.html#substituting-properties-in-solr-config-files
> and
>
> https://lucene.apache.org/solr/guide/7_4/config-api.html#commands-for-user-defined-properties
>
> We can improve autoscaling actions such as ComputePlanAction to add custom
> core properties to any add replica or move replica command. That is
> probably worth another Jira as well.
>
>
> On Wed, Sep 5, 2018 at 11:54 PM James Strassburg 
> wrote:
>
> > Hello,
> >
> > We're creating a SolrCloud in AWS and attempting to use autoscaling to
> add
> > replicas during nodeAdded/nodeLost. This was working fine for us until we
> > started creating collections specifying core properties (e.g.
> >
> >
> /solr/admin/collections?action=CREATE&property.synonyms_datasource=REDACTED).
> > It seems that when the nodeLost/Added trigger fires the properties don't
> > manifest in the core create invocation and we get errors like the
> > following:
> >
> > products_20180904200015_shard1_replica_n39:
> >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core products_20180904200015_shard1_replica_n39:
> > Can't load schema schema.xml: No system property or default value
> specified
> > for synonyms_datasource value:jdbc/${synonyms_datasource}
> >
> > The autoscaling API also doesn't appear to have a way to set the
> properties
> > when we create the triggers.
> >
> > Are we missing something or is this not supported at this time? I
> couldn't
> > find a relevant JIRA or other documentation or solr-user discussion on
> > this.
> >
> > thanks,
> >
> > JiM
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Streaming timeseries() and buckets with no docs

2018-09-06 Thread Jan Høydahl
Thanks!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 6. sep. 2018 kl. 15:09 skrev Joel Bernstein :
> 
> I found the ticket you created and commented on it. I'll work on this today.
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Thu, Sep 6, 2018 at 9:04 AM Joel Bernstein  wrote:
> 
>> Ok, I'll create a ticket for this, it's a very quick fix. I'll try to
>> commit today.
>> 
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>> 
>> 
>> On Thu, Sep 6, 2018 at 6:52 AM Jan Høydahl  wrote:
>> 
>>> Created https://issues.apache.org/jira/browse/SOLR-12749
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
 5. sep. 2018 kl. 23:48 skrev Jan Høydahl :
 
 Checked git history for TimeSeriesStream on master, and I cannot see
>>> any commits related to this?
 
 SOLR-11914: Deprecated some SolrParams methods. * toSolrParams(nl)
>>> moved to a NamedList method, which is more natural. David Smiley
>>> 23.04.2018, 19:26
 SOLR-11629: Add new CloudSolrClient.Builder ctors Jason Gerlowski
>>> 10.03.2018, 15:30
 SOLR-11799: Fix NPE and class cast exceptions in the TimeSeriesStream
>>> Joel Bernstein 28.12.2017, 17:14
 SOLR-11490: Add missing @since tags To all descendants of TupleStream
>>> Alexandre Rafalovitch 19.10.2017, 03:38
 SOLR-10770: Fix precommit Joel Bernstein 30.05.2017, 20:51
 SOLR-10770: Add date formatting to timeseries Streaming Expression Joel
>>> Bernstein 30.05.2017, 20:38
 SOLR-10566: Fix error handling Joel Bernstein 01.05.2017, 18:06
 SEARCH-313: Handled unescaped plus sign in gap Joel Bernstein
>>> 27.04.2017, 04:34
 SOLR-10566: Fix precommit Joel Bernstein 26.04.2017, 17:17
 SOLR-10566: Add timeseries Streaming Expression Joel Bernstein
>>> 26.04.2017, 16:57
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com 
 
> 5. sep. 2018 kl. 16:12 skrev Jan Høydahl >> >:
> 
> I have tested this with latest released ver 7.4.0
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com 
> 
>> 4. sep. 2018 kl. 16:32 skrev Joel Bernstein >> >:
>> 
>> Which version are you using?
>> 
>> I remember addressing this issue, but it may have been in Alfresco's
>> version of Solr and never got ported back.
>> 
>> I do agree that in a time series a null value is not what people
>>> want. It
>> is a very small change to populate with zeros if it has not already
>>> been
>> done in the latest versions.
>> 
>> Joel Bernstein
>> http://joelsolr.blogspot.com/ 
>> 
>> 
>> On Mon, Sep 3, 2018 at 8:58 AM Jan Høydahl >> > wrote:
>> 
>>> Hi
>>> 
>>> We have a timeseries expression with gap="+1DAY" and a sum(imps_l) to
>>> aggregate sums of an integer for each bucket.
>>> Now, some day buckets do not contain any documents at all, and
>>> instead of
>>> returning a tuple with value 0, it returns
>>> a tuple with no entry at all for the sum, see the bucket for date_dt
>>> 2018-06-22 below:
>>> 
>>> {
>>> "result-set": {
>>>   "docs": [
>>> {
>>>   "sum(imps_l)": 0,
>>>   "date_dt": "2018-06-21",
>>>   "count(*)": 5
>>> },
>>> {
>>>   "date_dt": "2018-06-22",
>>>   "count(*)": 0
>>> },
>>> {
>>>   "EOF": true,
>>>   "RESPONSE_TIME": 3
>>> }
>>>   ]
>>> }
>>> }
>>> 
>>> 
>>> Now when we want to convert this into a column using
>>> col(a,'sum(imps_l)')
>>> then that array will get mostly numbers
>>> but also some string entries 'sum(imps_l)' which is the key name. I
>>> need
>>> purely integers in the column.
>>> 
>>> Should the timeseries() have output values for all functions even if
>>> there
>>> are no documents in the bucket?
>>> Or is there something similar to the select() expression that can
>>> take a
>>> stream of tuples not originating directly
>>> from search() and replace values? Or is there perhaps a function
>>> that can
>>> loop through the column produced by col()
>>> and replace non-numeric values with 0?
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com 
>>> 
>>> 
> 
 
>>> 
>>> 



Re: Streaming timeseries() and buckets with no docs

2018-09-06 Thread Joel Bernstein
I found the ticket you created and commented on it. I'll work on this today.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 6, 2018 at 9:04 AM Joel Bernstein  wrote:

> Ok, I'll create a ticket for this, it's a very quick fix. I'll try to
> commit today.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 6, 2018 at 6:52 AM Jan Høydahl  wrote:
>
>> Created https://issues.apache.org/jira/browse/SOLR-12749
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > 5. sep. 2018 kl. 23:48 skrev Jan Høydahl :
>> >
>> > Checked git history for TimeSeriesStream on master, and I cannot see
>> any commits related to this?
>> >
>> > SOLR-11914: Deprecated some SolrParams methods. * toSolrParams(nl)
>> moved to a NamedList method, which is more natural. David Smiley
>> 23.04.2018, 19:26
>> > SOLR-11629: Add new CloudSolrClient.Builder ctors Jason Gerlowski
>> 10.03.2018, 15:30
>> > SOLR-11799: Fix NPE and class cast exceptions in the TimeSeriesStream
>> Joel Bernstein 28.12.2017, 17:14
>> > SOLR-11490: Add missing @since tags To all descendants of TupleStream
>> Alexandre Rafalovitch 19.10.2017, 03:38
>> > SOLR-10770: Fix precommit Joel Bernstein 30.05.2017, 20:51
>> > SOLR-10770: Add date formatting to timeseries Streaming Expression Joel
>> Bernstein 30.05.2017, 20:38
>> > SOLR-10566: Fix error handling Joel Bernstein 01.05.2017, 18:06
>> > SEARCH-313: Handled unescaped plus sign in gap Joel Bernstein
>> 27.04.2017, 04:34
>> > SOLR-10566: Fix precommit Joel Bernstein 26.04.2017, 17:17
>> > SOLR-10566: Add timeseries Streaming Expression Joel Bernstein
>> 26.04.2017, 16:57
>> >
>> > --
>> > Jan Høydahl, search solution architect
>> > Cominvent AS - www.cominvent.com 
>> >
>> >> 5. sep. 2018 kl. 16:12 skrev Jan Høydahl > >:
>> >>
>> >> I have tested this with latest released ver 7.4.0
>> >>
>> >> --
>> >> Jan Høydahl, search solution architect
>> >> Cominvent AS - www.cominvent.com 
>> >>
>> >>> 4. sep. 2018 kl. 16:32 skrev Joel Bernstein > >:
>> >>>
>> >>> Which version are you using?
>> >>>
>> >>> I remember addressing this issue, but it may have been in Alfresco's
>> >>> version of Solr and never got ported back.
>> >>>
>> >>> I do agree that in a time series a null value is not what people
>> want. It
>> >>> is a very small change to populate with zeros if it has not already
>> been
>> >>> done in the latest versions.
>> >>>
>> >>> Joel Bernstein
>> >>> http://joelsolr.blogspot.com/ 
>> >>>
>> >>>
>> >>> On Mon, Sep 3, 2018 at 8:58 AM Jan Høydahl > > wrote:
>> >>>
>>  Hi
>> 
>>  We have a timeseries expression with gap="+1DAY" and a sum(imps_l) to
>>  aggregate sums of an integer for each bucket.
>>  Now, some day buckets do not contain any documents at all, and
>> instead of
>>  returning a tuple with value 0, it returns
>>  a tuple with no entry at all for the sum, see the bucket for date_dt
>>  2018-06-22 below:
>> 
>>  {
>>   "result-set": {
>> "docs": [
>>   {
>> "sum(imps_l)": 0,
>> "date_dt": "2018-06-21",
>> "count(*)": 5
>>   },
>>   {
>> "date_dt": "2018-06-22",
>> "count(*)": 0
>>   },
>>   {
>> "EOF": true,
>> "RESPONSE_TIME": 3
>>   }
>> ]
>>   }
>>  }
>> 
>> 
>>  Now when we want to convert this into a column using
>> col(a,'sum(imps_l)')
>>  then that array will get mostly numbers
>>  but also some string entries 'sum(imps_l)' which is the key name. I
>> need
>>  purely integers in the column.
>> 
>>  Should the timeseries() have output values for all functions even if
>> there
>>  are no documents in the bucket?
>>  Or is there something similar to the select() expression that can
>> take a
>>  stream of tuples not originating directly
>>  from search() and replace values? Or is there perhaps a function
>> that can
>>  loop through the column produced by col()
>>  and replace non-numeric values with 0?
>> 
>>  --
>>  Jan Høydahl, search solution architect
>>  Cominvent AS - www.cominvent.com 
>> 
>> 
>> >>
>> >
>>
>>


Re: Streaming timeseries() and buckets with no docs

2018-09-06 Thread Joel Bernstein
Ok, I'll create a ticket for this, it's a very quick fix. I'll try to
commit today.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 6, 2018 at 6:52 AM Jan Høydahl  wrote:

> Created https://issues.apache.org/jira/browse/SOLR-12749
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 5. sep. 2018 kl. 23:48 skrev Jan Høydahl :
> >
> > Checked git history for TimeSeriesStream on master, and I cannot see any
> commits related to this?
> >
> > SOLR-11914: Deprecated some SolrParams methods. * toSolrParams(nl) moved
> to a NamedList method, which is more natural. David Smiley 23.04.2018, 19:26
> > SOLR-11629: Add new CloudSolrClient.Builder ctors Jason Gerlowski
> 10.03.2018, 15:30
> > SOLR-11799: Fix NPE and class cast exceptions in the TimeSeriesStream
> Joel Bernstein 28.12.2017, 17:14
> > SOLR-11490: Add missing @since tags To all descendants of TupleStream
> Alexandre Rafalovitch 19.10.2017, 03:38
> > SOLR-10770: Fix precommit Joel Bernstein 30.05.2017, 20:51
> > SOLR-10770: Add date formatting to timeseries Streaming Expression Joel
> Bernstein 30.05.2017, 20:38
> > SOLR-10566: Fix error handling Joel Bernstein 01.05.2017, 18:06
> > SEARCH-313: Handled unescaped plus sign in gap Joel Bernstein
> 27.04.2017, 04:34
> > SOLR-10566: Fix precommit Joel Bernstein 26.04.2017, 17:17
> > SOLR-10566: Add timeseries Streaming Expression Joel Bernstein
> 26.04.2017, 16:57
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com 
> >
> >> 5. sep. 2018 kl. 16:12 skrev Jan Høydahl  >:
> >>
> >> I have tested this with latest released ver 7.4.0
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com 
> >>
> >>> 4. sep. 2018 kl. 16:32 skrev Joel Bernstein  >:
> >>>
> >>> Which version are you using?
> >>>
> >>> I remember addressing this issue, but it may have been in Alfresco's
> >>> version of Solr and never got ported back.
> >>>
> >>> I do agree that in a time series a null value is not what people want.
> It
> >>> is a very small change to populate with zeros if it has not already
> been
> >>> done in the latest versions.
> >>>
> >>> Joel Bernstein
> >>> http://joelsolr.blogspot.com/ 
> >>>
> >>>
> >>> On Mon, Sep 3, 2018 at 8:58 AM Jan Høydahl  > wrote:
> >>>
>  Hi
> 
>  We have a timeseries expression with gap="+1DAY" and a sum(imps_l) to
>  aggregate sums of an integer for each bucket.
>  Now, some day buckets do not contain any documents at all, and
> instead of
>  returning a tuple with value 0, it returns
>  a tuple with no entry at all for the sum, see the bucket for date_dt
>  2018-06-22 below:
> 
>  {
>   "result-set": {
> "docs": [
>   {
> "sum(imps_l)": 0,
> "date_dt": "2018-06-21",
> "count(*)": 5
>   },
>   {
> "date_dt": "2018-06-22",
> "count(*)": 0
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 3
>   }
> ]
>   }
>  }
> 
> 
>  Now when we want to convert this into a column using
> col(a,'sum(imps_l)')
>  then that array will get mostly numbers
>  but also some string entries 'sum(imps_l)' which is the key name. I
> need
>  purely integers in the column.
> 
>  Should the timeseries() have output values for all functions even if
> there
>  are no documents in the bucket?
>  Or is there something similar to the select() expression that can
> take a
>  stream of tuples not originating directly
>  from search() and replace values? Or is there perhaps a function that
> can
>  loop through the column produced by col()
>  and replace non-numeric values with 0?
> 
>  --
>  Jan Høydahl, search solution architect
>  Cominvent AS - www.cominvent.com 
> 
> 
> >>
> >
>
>


Re: Streaming timeseries() and buckets with no docs

2018-09-06 Thread Jan Høydahl
Created https://issues.apache.org/jira/browse/SOLR-12749

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 5. sep. 2018 kl. 23:48 skrev Jan Høydahl :
> 
> Checked git history for TimeSeriesStream on master, and I cannot see any 
> commits related to this?
> 
> SOLR-11914: Deprecated some SolrParams methods. * toSolrParams(nl) moved to a 
> NamedList method, which is more natural. David Smiley 23.04.2018, 19:26
> SOLR-11629: Add new CloudSolrClient.Builder ctors Jason Gerlowski 10.03.2018, 
> 15:30
> SOLR-11799: Fix NPE and class cast exceptions in the TimeSeriesStream Joel 
> Bernstein 28.12.2017, 17:14
> SOLR-11490: Add missing @since tags To all descendants of TupleStream 
> Alexandre Rafalovitch 19.10.2017, 03:38
> SOLR-10770: Fix precommit Joel Bernstein 30.05.2017, 20:51
> SOLR-10770: Add date formatting to timeseries Streaming Expression Joel 
> Bernstein 30.05.2017, 20:38
> SOLR-10566: Fix error handling Joel Bernstein 01.05.2017, 18:06
> SEARCH-313: Handled unescaped plus sign in gap Joel Bernstein 27.04.2017, 
> 04:34
> SOLR-10566: Fix precommit Joel Bernstein 26.04.2017, 17:17
> SOLR-10566: Add timeseries Streaming Expression Joel Bernstein 26.04.2017, 
> 16:57
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com 
> 
>> 5. sep. 2018 kl. 16:12 skrev Jan Høydahl > >:
>> 
>> I have tested this with latest released ver 7.4.0
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com 
>> 
>>> 4. sep. 2018 kl. 16:32 skrev Joel Bernstein >> >:
>>> 
>>> Which version are you using?
>>> 
>>> I remember addressing this issue, but it may have been in Alfresco's
>>> version of Solr and never got ported back.
>>> 
>>> I do agree that in a time series a null value is not what people want. It
>>> is a very small change to populate with zeros if it has not already been
>>> done in the latest versions.
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/ 
>>> 
>>> 
>>> On Mon, Sep 3, 2018 at 8:58 AM Jan Høydahl >> > wrote:
>>> 
 Hi
 
 We have a timeseries expression with gap="+1DAY" and a sum(imps_l) to
 aggregate sums of an integer for each bucket.
 Now, some day buckets do not contain any documents at all, and instead of
 returning a tuple with value 0, it returns
 a tuple with no entry at all for the sum, see the bucket for date_dt
 2018-06-22 below:
 
 {
  "result-set": {
"docs": [
  {
"sum(imps_l)": 0,
"date_dt": "2018-06-21",
"count(*)": 5
  },
  {
"date_dt": "2018-06-22",
"count(*)": 0
  },
  {
"EOF": true,
"RESPONSE_TIME": 3
  }
]
  }
 }
 
 
 Now when we want to convert this into a column using col(a,'sum(imps_l)')
 then that array will get mostly numbers
 but also some string entries 'sum(imps_l)' which is the key name. I need
 purely integers in the column.
 
 Should the timeseries() have output values for all functions even if there
 are no documents in the bucket?
 Or is there something similar to the select() expression that can take a
 stream of tuples not originating directly
 from search() and replace values? Or is there perhaps a function that can
 loop through the column produced by col()
 and replace non-numeric values with 0?
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com 
 
 
>> 
> 



RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-06 Thread Markus Jelsma
Thanks Tomás!

Björn, can you reproduce the problem in a local and controlled environment?

Markus
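
For anyone trying to reproduce this in a controlled environment: the question
further down about where to see the searcher count is usually answered with a
heap dump (instance counts of SolrIndexSearcher), but the Metrics API is a
cheaper way to watch searcher behaviour between commits. A small SolrJ sketch
follows; the host and the group/prefix filter values are assumptions on my
side, and this reports the currently registered searcher per core rather than
counting leaked instances:

  import org.apache.solr.client.solrj.SolrRequest;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.GenericSolrRequest;
  import org.apache.solr.client.solrj.response.SimpleSolrResponse;
  import org.apache.solr.common.params.ModifiableSolrParams;

  public class SearcherMetricsPoll {
    public static void main(String[] args) throws Exception {
      try (HttpSolrClient client =
               new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("group", "core");       // per-core metrics
        params.set("prefix", "SEARCHER");  // searcher-related gauges only
        GenericSolrRequest req =
            new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);
        SimpleSolrResponse rsp = req.process(client);
        // Dump the raw response; run this after each commit and compare
        // warm-up times and searcher names over time.
        System.out.println(rsp.getResponse());
      }
    }
  }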

 
 
-Original message-
> From:Tomás Fernández Löbbe 
> Sent: Wednesday 5th September 2018 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> 
> I think this is pretty bad. I created
> https://issues.apache.org/jira/browse/SOLR-12743. Feel free to add any more
> details you have there.
> 
> On Mon, Sep 3, 2018 at 1:50 PM Markus Jelsma 
> wrote:
> 
> > Hello Björn,
> >
> > Take great care, 7.2.1 cannot read an index written by 7.4.0, so you
> > cannot roll back but need to reindex!
> >
> > Andrey Kudryavtsev made a good suggestion in the thread on how to find the
> > culprit, but it will be a tedious task. I have not yet had the time or
> > courage to venture there.
> >
> > Hope it helps,
> > Markus
> >
> >
> >
> > -Original message-
> > > From:Björn Häuser 
> > > Sent: Monday 3rd September 2018 22:28
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> > >
> > > Hi Markus,
> > >
> > > this reads exactly like what we have. Were you able to figure out
> > anything? I am currently thinking about rolling back to 7.2.1.
> > >
> > >
> > >
> > > > On 3. Sep 2018, at 21:54, Markus Jelsma 
> > wrote:
> > > >
> > > > Hello,
> > > >
> > > > Getting an OOM plus the fact that you are seeing a lot of IndexSearcher
> > instances rings a familiar bell. One of our collections had the same issue
> > [1] when we attempted an upgrade from 7.2.1 to 7.3.0. I managed to rule out
> > all our custom Solr code but had to keep our Lucene filters in the schema;
> > the problem persisted.
> > > >
> > > > The odd thing, however, is that you appear to have the same problem,
> > but not with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can
> > you confirm that the problem is not also in 7.3.0?
> > > >
> > >
> > > We had very similar problems with 7.3.0 but never analyzed them and just
> > updated to 7.4.0 because I thought that's the bug we hit:
> > https://issues.apache.org/jira/browse/SOLR-11882 <
> > https://issues.apache.org/jira/browse/SOLR-11882>
> > >
> > >
> > > > You should see the instance count for IndexSearcher increase by one
> > for each replica on each commit.
> > >
> > >
> > > Sorry, where can I find this? ;) I did not find anything.
> > >
> > > Thanks
> > > Björn
> > >
> > > >
> > > > Regards,
> > > > Markus
> > > >
> > > > [1]
> > http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
> > > >
> > > >
> > > >
> > > > -Original message-
> > > >> From:Erick Erickson 
> > > >> Sent: Monday 3rd September 2018 20:49
> > > >> To: solr-user 
> > > >> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> > > >>
> > > >> I would expect at least 1 IndexSearcher per replica. How many total
> > > >> replicas are hosted in your JVM?
> > > >>
> > > >> Plus, if you're actively indexing, there may temporarily be 2
> > > >> IndexSearchers open while the new searcher warms.
> > > >>
> > > >> And there may be quite a few caches, at least queryResultCache and
> > > >> filterCache and documentCache, one of each per replica and maybe two
> > > >> (for queryResultCache and filterCache) if you have a background
> > > >> searcher autowarming.
> > > >>
> > > >> At a glance, your autowarm counts are very high, so it may take some
> > > >> time to autowarm, leading to multiple IndexSearchers and caches open
> > > >> per replica when you happen to hit a commit point. I usually start
> > > >> with 16-20 as an autowarm count; the benefit decreases rapidly as you
> > > >> increase the count.
> > > >>
> > > >> I'm not quite sure why it would be different in 7x vs. 6x. How much
> > > >> heap do you allocate to the JVM? And do you see similar heap dumps in
> > > >> 6.6?
> > > >>
> > > >> Best,
> > > >> Erick
> > > >> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser 
> > wrote:
> > > >>>
> > > >>> Hello,
> > > >>>
> > > >>> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard
> > each, 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We
> > are running Zookeeper 4.1.13.
> > > >>>
> > > >>> Since the upgrade to 7.3.0 and also 7.4.0 we are encountering heap space
> > exhaustion. After obtaining a heap dump it looks like we have a lot of
> > IndexSearchers open for our largest collection.
> > > >>>
> > > >>> The dump contains around 60 IndexSearchers, each holding
> > around 40 MB of heap. Another 500 MB of heap is the fieldcache, which is
> > expected in my opinion.
> > > >>>
> > > >>> The current config can be found here:
> > https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 <
> > https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844>
> > > >>>
> > > >>> Analyzing the heap dump, Eclipse MAT says this:
> > > >>>
> > > >>> Problem Suspect 1
> > > >>>
> > > >>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded
> > by "org.eclipse.jetty.weba

Concurrent Update Client Stops on Exceptions Randomly v7.4

2018-09-06 Thread deniz
I am trying to write a wrapper for DIH, so I can leverage the field type
guessing while importing the SQL data.

The query is supposed to retrieve 400K+ documents. In the test data in the
database there are dirty date fields, which contain values like '1966-00-00'
or '1987-10-00'.

I am running the code below:

// Requires (among others): java.sql.*, java.util.*, org.apache.solr.common.SolrInputDocument,
// org.apache.solr.client.solrj.request.UpdateRequest,
// org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient
public void dataimport(ConcurrentUpdateSolrClient updateClient, String importSql) {
    try {
        Connection conn = DriverManager.getConnection("connection string", "user", "pass");
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        // Negative fetch size is the MySQL-driver hint for row-by-row streaming,
        // so 400K+ rows are not buffered in memory at once.
        stmt.setFetchSize(Integer.MIN_VALUE);
        ResultSet rs = stmt.executeQuery(importSql);
        ResultSetMetaData resultSetMetaData = rs.getMetaData();
        List<SolrFieldObject> fields = new ArrayList<>();
        // JDBC column indexes are 1-based and inclusive of getColumnCount()
        for (int index = 1; index <= resultSetMetaData.getColumnCount(); index++) {
            fields.add(new SolrFieldObject(resultSetMetaData.getColumnLabel(index),
                    resultSetMetaData.getColumnClassName(index)));
        }
        while (rs.next()) {
            SolrInputDocument solrInputDocument = new SolrInputDocument();
            for (SolrFieldObject field : fields) {
                try {
                    String value = rs.getString(field.name());
                    // Skip null columns; everything else is sent as a string so that
                    // Solr's field type guessing can decide the type.
                    Optional.ofNullable(value)
                            .ifPresent(v -> solrInputDocument.addField(field.name(), v));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
            try {
                UpdateRequest updateRequest = new UpdateRequest();
                updateRequest.setCommitWithin(1); // commitWithin is in milliseconds
                updateRequest.add(solrInputDocument);
                updateRequest.process(updateClient);
            } catch (Exception e) {
                System.out.println("Inner -> " + e.getMessage());
            }
        }
        stmt.close();
        conn.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The code works fine, except that it randomly stops with a log message
like 'Error adding field 'day'='1976-00-00' msg=Invalid Date
String:'1976-00-00' on a random document. There are many other
documents with invalid dates; those are logged as errors on the server side
while the client keeps running and continues to push further documents, until
it stops on some random document with the error above.

Is there an error threshold that makes the concurrent update client stop
after some time? Or is there some other point I am missing when dealing
with this kind of update?
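
A side note on the invalid dates themselves: one way to keep them from ever
reaching Solr is to validate date columns client-side with a strict java.time
parse and skip (or normalize) values like '1976-00-00'. A small sketch of that
idea; the skip-on-invalid policy is just an assumption for illustration:

  import java.time.LocalDate;
  import java.time.format.DateTimeFormatter;
  import java.time.format.DateTimeParseException;
  import java.time.format.ResolverStyle;
  import java.util.Optional;

  public final class DateSanitizer {

    // Strict ISO parsing: '1976-00-00' and '1987-10-00' fail, '1987-10-05' passes.
    private static final DateTimeFormatter STRICT_ISO =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns the value if it is a valid date, otherwise empty so the field can be skipped.
    public static Optional<String> validDate(String raw) {
      try {
        LocalDate.parse(raw, STRICT_ISO);
        return Optional.of(raw);
      } catch (DateTimeParseException e) {
        return Optional.empty();
      }
    }
  }

In the import loop above, date columns could then go through validDate(...)
before addField(...), so documents with dirty dates are still indexed with
their remaining fields instead of being rejected by the server.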



-
Smart, but doesn't apply himself... He would get it done if he did...
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html