Re: Getting repeated Error - RunExecutableListener java.io.IOException

2019-02-14 Thread Jan Høydahl
Yes, we need more details.
I wonder why you have configured the RunExecutableListener in your
solrconfig.xml and why you have told it to execute Linux commands on Windows.
RunExecutableListener is removed in later Solr versions due to security
concerns, so you should move away from it anyway.

It would help if you told us things like the Solr version, whether this has worked
before, what change caused it to stop working, and also
a copy of the relevant solrconfig.xml section. What are you expecting the
listener to do for you?
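
For reference, a RunExecutableListener registration in solrconfig.xml looks
roughly like this (a sketch of the old documented format; the event, exe and
dir values are placeholders, not your actual settings):

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

If yours points at "sh", "bash" or "curl" with Unix-style paths, that alone
would explain the CreateProcess errors on Windows.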

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 15 Feb 2019, at 07:52, Hemant Verma wrote:
> 
> We have a Sitecore and Solr setup.
> Solr is running on Windows.
> 
> The errors below are appearing repeatedly in the logs.
> 
> o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "sh"
> (in directory "\bin"): CreateProcess error=2, The system cannot find the
> file specified
> 
> o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "bash"
> (in directory "\bin"): CreateProcess error=2, The system cannot find the
> file specified
> 
> o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "curl"
> (in directory "\usr\bin"): CreateProcess error=2, The system cannot find the
> file specified
> 
> Let me know if you need more details.
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Getting repeated Error - RunExecutableListener java.io.IOException

2019-02-14 Thread Hemant Verma
We have a Sitecore and Solr setup.
Solr is running on Windows.

The errors below are appearing repeatedly in the logs.

o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "sh"
(in directory "\bin"): CreateProcess error=2, The system cannot find the
file specified

o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "bash"
(in directory "\bin"): CreateProcess error=2, The system cannot find the
file specified

o.a.s.c.RunExecutableListener java.io.IOException: Cannot run program "curl"
(in directory "\usr\bin"): CreateProcess error=2, The system cannot find the
file specified

Let me know if you need more details.




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-14 Thread Zheng Lin Edwin Yeo
Hi,

For your info, this issue is occurring in Solr 7.7.0 as well.
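
As a further data point, the plain JDK regex engine agrees with regex101 here.
A minimal standalone check (a sketch; the literal below only approximates the
exact whitespace of our Example 2):

import java.util.regex.Pattern;

public class NewlineRegexCheck {
    public static void main(String[] args) {
        // Approximation of Example 2's content: runs of newlines mixed with spaces
        String original =
            "exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa Chu Kang Avenue 4, Singapore";
        // The same pattern we use in the update chain
        String replaced =
            Pattern.compile("(\\s*\\n){2,}").matcher(original).replaceAll("<br/><br/>");
        // Each newline run collapses into exactly one <br/><br/>:
        // exalted<br/><br/>   Psalm 89:17<br/><br/>  3 Choa Chu Kang Avenue 4, Singapore
        System.out.println(replaced);
    }
}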

Regards,
Edwin

On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Should we report this as a bug in Solr?
>
> Regards,
> Edwin
>
> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Paul,
>>
>> Regarding the regex (\n\s*){2,} that we are using, when we try it on
>> https://regex101.com/, it gives us the correct result for all
>> the examples (i.e. all of them end up with only <br/><br/>, and not more
>> than that, unlike what we are getting in Solr in our earlier examples).
>>
>> Could there be a possibility of a bug in Solr?
>>
>> Regards,
>> Edwin
>>
>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo 
>> wrote:
>>
>>> Hi Paul,
>>>
>>> We have tried it with the space preceding the \n, i.e. <str name="pattern">(\s*\n){2,}</str>, with the following configuration:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>    <str name="fieldName">content</str>
>>>    <str name="pattern">(\s*\n){2,}</str>
>>>    <str name="replacement">&lt;br/&gt;&lt;br/&gt;</str>
>>> </processor>
>>>
>>> However, we are also getting the exact same results as the earlier
>>> Examples 1, 2 and 3.
>>>
>>> As for your point 2, that perhaps the data has other (non-printing)
>>> characters than \n: we have found that there are no non-printing
>>> characters. It is just a newline with a space. You can refer to the
>>> original content in the same examples below.
>>>
>>>
>>> Example 1: The sentence that the above regex pattern is working
>>> correctly
>>> *Original content in EML file:*
>>> Dear Sir,
>>>
>>>
>>> I am terminating
>>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
>>> *Index content: *Dear Sir, <br/><br/> I am terminating
>>>
>>> Example 2: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 <br/>, there are 4 <br/>)
>>> *Original content in EML file:*
>>>
>>> *exalted*
>>>
>>> *Psalm 89:17*
>>>
>>>
>>> 3 Choa Chu Kang Avenue 4
>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
>>> Choa Chu Kang Avenue 4, Singapore
>>> *Index content: *exalted <br/><br/> Psalm 89:17 <br/><br/><br/><br/> 3
>>> Choa Chu Kang Avenue 4, Singapore
>>>
>>> Example 3: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 <br/>, there are 4 <br/>)
>>> *Original content in EML file:*
>>>
>>> http://www.concordpri.moe.edu.sg/
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n
>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18,
>>> 2018 at 10:07 AM
>>> *Index content: *http://www.concordpri.moe.edu.sg/ <br/><br/><br/><br/>
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>>
>>>
>>> Appreciate any other ideas or suggestions that you may have.
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Edwin
>>>
>>> On Thu, 7 Feb 2019 at 22:49,  wrote:
>>>
 Hi Edwin



   1.  Sorry, the pattern was wrong, the space should precede the \n
 i.e. (\s*\n){2,}
   2.  Perhaps in the data you have other (non printing) characters than
 \n?



 Sent from Mail for
 Windows 10



 From: Zheng Lin Edwin Yeo
 Sent: Thursday, 7 February 2019 15:23
 To: solr-user@lucene.apache.org
 Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



 Hi Paul,

 We have tried this suggested regex pattern as follows:
 <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">(\n\s*){2,}</str>
    <str name="replacement">&lt;br/&gt;&lt;br/&gt;</str>
 </processor>

 But we still have exactly the same problem in Examples 1, 2 and 3 below.

 Example 1: The sentence that the above regex pattern is working
 correctly
 *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
 *Index content: *Dear Sir, <br/><br/> I am terminating

 Example 2: The sentence that the above regex pattern is partially
 working
 (as you can see, instead of 2 <br/>, there are 4 <br/>)
 *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
 Chu Kang Avenue 4, Singapore
 *Index content: *exalted <br/><br/> Psalm 89:17 <br/><br/><br/><br/> 3 Choa
 Chu Kang Avenue 4, Singapore

 Example 3: The sentence that the above regex pattern is partially
 working
 (as you can see, instead of 2 <br/>, there are 4 <br/>)
 *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n
 \n\n
 \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18,
 2018
 at 10:07 AM
 *Index content: *http://www.concordpri.moe.edu.sg/ <br/><br/><br/><br/>
 On
 Tue, Dec 18, 2018 at 10:07 AM

 Any further suggestion?

 Thank you.

 Regards,
 Edwin

 On Thu, 7 Feb 2019 at 22:20,  wrote:

 > To avoid the «\n+\s*» matching too many \n and then failing on the
 {2,}
 > part you could try
 >
 >
 >
 > (\n\s*){2,}
 >
 >
 >
 > If you also want to match CRLF then
 >
 > (\r?\n\s*){2,}
 >
 >
 >
 >
 >
 > Sent from 

Re: solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-14 Thread Zheng Lin Edwin Yeo
Hi,

Which version of zookeeper are you using?

Also, if you tried to query the index, did you get any error message?

Regards,
Edwin


On Fri, 15 Feb 2019 at 02:34, Jeff Courtade  wrote:

> Hi,
>
> I am working on doing a simple point upgrade from Solr 7.6 to 7.7 cloud.
>
> 6 servers
> 3 zookeepers
> one simple test collection using the prepackaged _default config.
>
> I stop all Solr servers, leaving the ZooKeepers up.
>
> I change out the binaries and put the solr.in.sh file back in place with
> the memory and directory settings.
>
> The index directory does not move and the files don't change.
>
> I start up the new binaries and Solr starts with no errors in the logs, but
> all of the indexes are "down".
>
> I have no clue here; nothing in the logs.
>


Re: Migrate from Solr 5.3.1 to 7.5.0

2019-02-14 Thread ramyogi
Do we need to reindex if we change synonymQueryStyle values for a fieldType?
I hope not.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Migrate from Solr 5.3.1 to 7.5.0

2019-02-14 Thread ramyogi
Hi Erick, found the reason: making all shard requests go through the /select
flow solves the problem.

Regarding the fieldType in question:

In Solr 7, when relevancy is added to the search it does not work (not the
expected results) for that fieldType, but the same works fine for /select,
because /select uses the Lucene parser while our flow uses edismax.
Solr 5.3.1 works fine in both cases.

As soon as we change it as below, adding synonymQueryStyle="as_same_term" to
the fieldType, both cases work in the 7.5.0 flow.
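
A sketch of the changed fieldType (the name and analyzer chain here are
placeholders, not our real schema):

<fieldType name="text_synonyms" class="solr.TextField"
           positionIncrementGap="100" synonymQueryStyle="as_same_term">
  <analyzer>
    ...
  </analyzer>
</fieldType>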

Are there any intentional changes for the above scenario in Solr 7.5.0 that
I can refer to?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: edismax: sorting on numeric fields

2019-02-14 Thread Gus Heck
Hi Nicolas,

Solr has no difficulty sorting on numeric fields if they are indexed as a
numeric type. Just use "sort=weight asc". If your field is indexed as
text it of course won't sort properly, but then you should fix your schema.
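
For example, a complete request might look like this (a sketch; the collection
name and field list are assumptions based on your example):

http://localhost:8983/solr/animals/select?defType=edismax&q=kind:animal&sort=weight+asc&fl=name,kind,weight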

-Gus

On Thu, Feb 14, 2019 at 4:10 PM David Hastings 
wrote:

> I'm not clearly understanding your question here. If your query is
> q=kind:animal weight:50 you will get no results, as nothing matches
> (assuming a q.op of AND).
>
>
> On Thu, Feb 14, 2019 at 4:06 PM Nicolas Paris 
> wrote:
>
> > Hi
> >
> > I have a numeric field (say "weight") and I'd like to be able to get
> > results sorted.
> > q=kind:animal weight:50
> > pf=kind^2 weight^3
> >
> > would return:
> > name=dog, kind=animal, weight=51
> > name=tiger, kind=animal,weight=150
> > name=elephant, kind=animal,weight=2000
> >
> >
> > In other terms, how do I deal with numeric fields?
> >
> > My first idea is to encode the numeric value into letters (one x per value)
> > dog x
> > tiger x
> > elephant
> >
> 
> >
> > and the query would be
> > kind:animal, weight:xxx
> >
> >
> > How do I deal with numeric fields?
> >
> > Thanks
> > --
> > nicolas
> >
>


-- 
http://www.the111shift.com


Under-utilization during streaming expression execution

2019-02-14 Thread Gus Heck
Hi Folks,

I'm looking for ideas on how to speed up processing for a streaming
expression. I can't post the full details because it's customer related,
but the structure is shown here: https://imgur.com/a/98sENVT. What that does
is take the results of two queries, join them and push them back into the
collection as a new (denormalized) doc. The second (hash) join just updates
a field that distinguishes the new docs from either of the old docs so it's
hashing exactly one value, and thus this is not of concern for performance
(if there were a good way to tell select to modify only one field and keep
all the rest without listing the fields explicitly, it wouldn't be needed).
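
The overall shape is roughly the sketch below (collection, field and query
values are anonymized placeholders; the new-id construction and the
single-value hash join stage are omitted):

update(myCollection, batchSize=1000,
  leftJoin(
    search(myCollection, q="type:left", fl="join_key,fieldA", sort="join_key asc", qt="/export"),
    search(myCollection, q="type:right", fl="join_key,fieldB", sort="join_key asc", qt="/export"),
    on="join_key"))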


When I run it across a test index with 1377364 and 5146620 docs for the two
queries, it inserts 4742322 new documents in ~10 minutes. This seems pretty
spiffy, except this test index is ~1/1000 of the real index... so obviously I
want to find *at least* a factor of 10 improvement. So far I have managed a
factor of about 3, getting it down to slightly over 200 seconds, by
programmatically building the queries, partitioning on a set of percentiles
from a stats query on one of the fields, a floating point number with good
distribution. But this seems to stop helping beyond 10-12 splits on my 50
node cluster; scaling up to split across all 50 nodes brings things back to
~400 seconds.

The CPU utilization on the machines mostly stabilizes around 30-50%; disk
metrics don't seem to look bad (the disk idle stat in AWS stays over 90%).
Still trying to get a good handle on network numbers, but I'm guessing that
I'm either network limited or there's an inefficiency with contention
somewhere inside solr (no I haven't put a profiler on it yet).

Here's the interesting bit. I happen to know that the join key in the
leftJoin is on a key that is used for document routing, so we're only
joining up with documents on the same node. Furthermore, the id generated
is a concatenation of these id's with a value from one of the fields and
should also route to the same node... Is there any way to make the whole
expression run locally on the nodes to avoid throwing the data back and
forth across the network needlessly?

Any other ideas for making this go another factor of 2-3 faster?

-Gus


SolrCloud exclusive features

2019-02-14 Thread Arnold Bronley
Hi,

Are there any features that are only exclusive to SolrCloud?

e.g. when I am reading the Streaming Expressions documentation, the first
sentence says 'Streaming Expressions provide a simple yet powerful stream
processing language for Solr Cloud.'

So, does this mean that streaming expressions are only available in
SolrCloud mode and not in Solr master-slave mode?

If yes, is there a list of such features that are exclusively available in
SolrCloud?


Re: edismax: sorting on numeric fields

2019-02-14 Thread David Hastings
I'm not clearly understanding your question here. If your query is
q=kind:animal weight:50 you will get no results, as nothing matches
(assuming a q.op of AND).


On Thu, Feb 14, 2019 at 4:06 PM Nicolas Paris 
wrote:

> Hi
>
> I have a numeric field (say "weight") and I'd like to be able to get
> results sorted.
> q=kind:animal weight:50
> pf=kind^2 weight^3
>
> would return:
> name=dog, kind=animal, weight=51
> name=tiger, kind=animal,weight=150
> name=elephant, kind=animal,weight=2000
>
>
> In other terms, how do I deal with numeric fields?
>
> My first idea is to encode the numeric value into letters (one x per value)
> dog x
> tiger x
> elephant
> 
>
> and the query would be
> kind:animal, weight:xxx
>
>
> How do I deal with numeric fields?
>
> Thanks
> --
> nicolas
>


edismax: sorting on numeric fields

2019-02-14 Thread Nicolas Paris
Hi

I have a numeric field (say "weight") and I'd like to be able to get
results sorted.
q=kind:animal weight:50
pf=kind^2 weight^3

would return:
name=dog, kind=animal, weight=51
name=tiger, kind=animal,weight=150
name=elephant, kind=animal,weight=2000


In other terms, how do I deal with numeric fields?

My first idea is to encode the numeric value into letters (one x per value)
dog x
tiger x
elephant 


and the query would be
kind:animal, weight:xxx


How do I deal with numeric fields?

Thanks
-- 
nicolas


solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-14 Thread Jeff Courtade
Hi,

I am working on doing a simple point upgrade from Solr 7.6 to 7.7 cloud.

6 servers
3 zookeepers
one simple test collection using the prepackaged _default config.

I stop all Solr servers, leaving the ZooKeepers up.

I change out the binaries and put the solr.in.sh file back in place with
the memory and directory settings.

The index directory does not move and the files don't change.

I start up the new binaries and Solr starts with no errors in the logs, but
all of the indexes are "down".

I have no clue here; nothing in the logs.


Re: Migrate from Solr 5.3.1 to 7.5.0

2019-02-14 Thread Walter Underwood
I can’t find the original post right now, but putting a load balancer in front
of Zookeeper is a really bad idea. Do not do that. There is a stateful protocol
between one client and one Zookeeper node. This is not a stateless protocol
that you can just bounce around between servers.
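
Clients should instead be given the full ensemble and let the ZooKeeper client
library handle failover itself, e.g. in solr.in.sh (hostnames are
placeholders):

ZK_HOST=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr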

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 14, 2019, at 7:42 AM, Erick Erickson  wrote:
> 
> bq. 2. Did you take the solrconfig that came with 7.5 and modify it rather
> than copy your 5.3 solrconfig?
> No, we adjusted the existing 5.3.1 config, which we had used for a couple of
> years, for Solr 7.5.0.
> 
> I don't think this is the root of your problem, but this is always
> suspect. Not only
> can the format of solrconfig change, but certain values don't make any sense,
> e.g. LuceneMatchVersion would be 2 major versions back, which is unsupported.
> You may well have changed _that_ value, but can you guarantee all
> _other_ changes
> are accounted for?
> 
> So I'd take the 7.x solrconfig (and schema for that matter) and overlay
> your changes
> rather than try to update the 5x configs.
> 
> Best,
> Erick
> 
> On Thu, Feb 14, 2019 at 1:59 AM Jan Høydahl  wrote:
>> 
>> You may of course ignore the warning in the UI, it is just a warning 
>> intended to help you avoid mis-configurations.
>> But there may be side effects of placing a load balancer in between client 
>> and zk cluster, see
>> https://stackoverflow.com/questions/30905007/load-balancer-with-zookeeper
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 14 Feb 2019, at 01:24, ramyogi wrote:
>>> 
>>> Thanks Jan, I am unaware how our DevOps team decided this, but it was
>>> working well without any issues with Solr 5.3.1 for a couple of years. Just
>>> wanted to make any changes that Solr 7 mandates.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 



Re: Solr Index Size after reindex

2019-02-14 Thread David Hastings
The other thing I would be curious about is your reindexing process: do
you clear out the entire index beforehand? If so, perhaps there is content
missing/moved.

On Thu, Feb 14, 2019 at 11:07 AM Erick Erickson 
wrote:

> Basically, this is not possible ;). Therefore there's something I
> don't understand
>
> There's nothing anywhere except what's in the index. By that I mean that
> _if_
> you copy an index (the data directory and children) from one place to
> another,
> that's all there is. No information about what's in the index is stored
> anywhere
> else. So there are a couple of possibilities I see:
>
> 1> Your rsync isn't doing what you think. By that I mean that "somehow" it
> isn't
> copying segments (perhaps with the same name, although the size and time
> checks would make it extremely unlikely to skip one). What happens if
> you _delete_ the data index on your target system first?
>
> 2> I'm not entirely sure what happens if there are multiple
> "segments_n" files. in
> the index. That file "points" to all the current segments. From a strictly
> theoretical standpoint, my _guess_ is that Lucene chooses the one with the
> highest "_n" value. So if you have multiple ones of those, it would be
> interesting
> to know.
>
> 3> Has Solr been restarted (or at least the core reloaded) on the target?
>
> So here's the experiment I'd run:
> 1> shut down the Solr running on the target
> 2> delete the data dir.
> 3> restart Solr and verify that you have zero docs. This will recreate
> the data dir and verify that that Solr instance is pointing where you
> think it is as a sanity check.
> 4> stop Solr again on the target.
> 5> do a hard commit on the source.
> 6> get a long listing "ls -l" on your source index. This should be a
> lot of files like _0.tim, _0.fdt, _1.tim, _1.fdt etc.
> 7> do your rsync. You should _not_ be indexing to the source at this time.
> 8> start Solr on the target.
> 9> check the target again. Assuming that you have _not_ been adding
> any documents to the source system during the rsync, I'd be stunned if
> there were any differences.
> 10> If there are incorrect counts or other anomalies:
> 10.1> double-check your rsync. Is it really getting the files from your
> source?
> 10.2> compare the long listing from your index you took in <6> with
> the target. Are all files identical size-wise? Are there any files on
> the target that are not on the source and vice-versa? If there are
> differences, that would explain your issues and would point to your
> rsync process being messed up.
>
> If the index directories are identical on the source and target and
> you _still_ see differences then there's an alternate reality that we
> occupy ;).
>
> And the Alfresco folks would probably be the ones to contact.
>
> Best,
> Erick
>
>
>
> On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
>  wrote:
> >
> > Hello Andrea,
> >
> > I'm really sorry for the delay in my answer, but I needed more information
> > before answering you.
> >
> > Yes, 5.365.213 is the numDocs we got just after the sync, and yes,
> > 4.537.651 is the numDocs we got on the staging server after the reindexing,
> > and the colleague who performed the rsync confirms that it completed
> > entirely.
> >
> > I don't see any uncompleted transactions, which normally means that the
> > indexing is complete. That's why I don't understand the difference.
> >
> > Kind Regards
> >
> > Matthieu
> >
> > -----Original Message-----
> > From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> > Sent: samedi 9 février 2019 16:56
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr Index Size after reindex
> >
> > Yes, those numbers are different and that should explain the different
> size. I think you should be able to find some information in the Alfresco
> or Solr log. There must be a reason about the missing content.
> > For example, are those numbers coming from two comparable snapshots? In
> other words, I imagine that at a given moment X you rsync-ed the two servers
> >
> >   * 5.365.213 is the numDocs you got just after the sync, isn't it?
> >   * 4.537.651 is the numDocs you got in the staging server after the
> > reindexing isn't it? Are you sure the whole reindexing is completed?
> >
> > MaxDocs is the number of documents you have in the index including the
> deleted docs not yet cleared by a merge. In the console you should also see
> the "Deleted docs" count which should be equal to (maxdocs - numdocs)
> >
> > Ciao
> >
> > Andrea
> >
> > On 08/02/2019 15:53, Mathieu Menard wrote:
> > >
> > > Hi Andrea,
> > >
> > > I've checked this information and here is the result:
> > >
> > >              PRODUCTION    STAGING
> > > numDocs      5.365.213     4.537.651
> > > MaxDoc       5.845.469     5.129.556
> > >
> > > It seems that there are more than 800.000 docs in PRODUCTION, which would
> > > explain the larger index size.

Re: Solr Index Size after reindex

2019-02-14 Thread Erick Erickson
Basically, this is not possible ;). Therefore there's something I
don't understand

There's nothing anywhere except what's in the index. By that I mean that _if_
you copy an index (the data directory and children) from one place to another,
that's all there is. No information about what's in the index is stored anywhere
else. So there are a couple of possibilities I see:

1> Your rsync isn't doing what you think. By that I mean that "somehow" it isn't
copying segments (perhaps with the same name, although the size and time
checks would make it extremely unlikely to skip one). What happens if
you _delete_ the data index on your target system first?

2> I'm not entirely sure what happens if there are multiple
"segments_n" files. in
the index. That file "points" to all the current segments. From a strictly
theoretical standpoint, my _guess_ is that Lucene chooses the one with the
highest "_n" value. So if you have multiple ones of those, it would be
interesting
to know.

3> Has Solr been restarted (or at least the core reloaded) on the target?

So here's the experiment I'd run:
1> shut down the Solr running on the target
2> delete the data dir.
3> restart Solr and verify that you have zero docs. This will recreate
the data dir and verify that that Solr instance is pointing where you
think it is as a sanity check.
4> stop Solr again on the target.
5> do a hard commit on the source.
6> get a long listing "ls -l" on your source index. This should be a
lot of files like _0.tim, _0.fdt, _1.tim, _1.fdt etc.
7> do your rsync. You should _not_ be indexing to the source at this time.
8> start Solr on the target.
9> check the target again. Assuming that you have _not_ been adding
any documents to the source system during the rsync, I'd be stunned if
there were any differences.
10> If there are incorrect counts or other anomalies:
10.1> double-check your rsync. Is it really getting the files from your source?
10.2> compare the long listing from your index you took in <6> with
the target. Are all files identical size-wise? Are there any files on
the target that are not on the source and vice-versa? If there are
differences, that would explain your issues and would point to your
rsync process being messed up.

If the index directories are identical on the source and target and
you _still_ see differences then there's an alternate reality that we
occupy ;).

And the Alfresco folks would probably be the ones to contact.

Best,
Erick



On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
 wrote:
>
> Hello Andrea,
>
> I'm really sorry for the delay in my answer, but I needed more information
> before answering you.
>
> Yes, 5.365.213 is the numDocs we got just after the sync, and yes, 4.537.651 is
> the numDocs we got on the staging server after the reindexing, and the
> colleague who performed the rsync confirms that it completed entirely.
>
> I don't see any uncompleted transactions, which normally means that the
> indexing is complete. That's why I don't understand the difference.
>
> Kind Regards
>
> Matthieu
>
> -----Original Message-----
> From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> Sent: samedi 9 février 2019 16:56
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Index Size after reindex
>
> Yes, those numbers are different and that should explain the different size. 
> I think you should be able to find some information in the Alfresco or Solr 
> log. There must be a reason about the missing content.
> For example, are those numbers coming from two comparable snapshots? In other 
> words, I imagine that at a given moment X you rsync-ed the two servers
>
>   * 5.365.213 is the numDocs you got just after the sync, isn't it?
>   * 4.537.651 is the numDocs you got in the staging server after the
> reindexing isn't it? Are you sure the whole reindexing is completed?
>
> MaxDocs is the number of documents you have in the index including the 
> deleted docs not yet cleared by a merge. In the console you should also see 
> the "Deleted docs" count which should be equal to (maxdocs - numdocs)
>
> Ciao
>
> Andrea
>
> On 08/02/2019 15:53, Mathieu Menard wrote:
> >
> > Hi Andrea,
> >
> > I've checked this information and here is the result:
> >
> >              PRODUCTION    STAGING
> > numDocs      5.365.213     4.537.651
> > MaxDoc       5.845.469     5.129.556
> >
> > It seems that there are more than 800.000 docs in PRODUCTION, which would
> > explain the larger index size. But there is a thing that I don't
> > understand: we have copied the DB and the contentstore, so the
> > numDocs for the two environments should be the same, no?
> >
> > Could you also explain to me the meaning of the maxDocs value, please?
> >
> > Thanks
> >
> > Matthieu
> >
> > *From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> > *Sent:* vendredi 8 février 2019 14:54
> > *To:* solr-user@lucene.apache.org
> > *Subject:* Re: Solr Index Size after reindex

Re: Migrate from Solr 5.3.1 to 7.5.0

2019-02-14 Thread Erick Erickson
bq. 2. Did you take the solrconfig that came with 7.5 and modify it rather
than copy your 5.3 solrconfig?
No, we adjusted the existing 5.3.1 config, which we had used for a couple of
years, for Solr 7.5.0.

I don't think this is the root of your problem, but this is always
suspect. Not only
can the format of solrconfig change, but certain values don't make any sense,
e.g. LuceneMatchVersion would be 2 major versions back, which is unsupported.
You may well have changed _that_ value, but can you guarantee all
_other_ changes
are accounted for?

So I'd take the 7.x solrconfig (and schema for that matter) and overlay
your changes
rather than try to update the 5x configs.

Best,
Erick

On Thu, Feb 14, 2019 at 1:59 AM Jan Høydahl  wrote:
>
> You may of course ignore the warning in the UI, it is just a warning intended 
> to help you avoid mis-configurations.
> But there may be side effects of placing a load balancer in between client 
> and zk cluster, see
> https://stackoverflow.com/questions/30905007/load-balancer-with-zookeeper
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 14 Feb 2019, at 01:24, ramyogi wrote:
> >
> > Thanks Jan, I am unaware how our DevOps team decided this, but it was
> > working well without any issues with Solr 5.3.1 for a couple of years. Just
> > wanted to make any changes that Solr 7 mandates.
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Solr 7.7 UpdateRequestProcessor broken

2019-02-14 Thread Andreas Hubold

Hi,

while trying to upgrade from Solr 7.6 to 7.7 I ran into some unexpected
incompatibilities with UpdateRequestProcessors.


The SolrInputDocument passed to UpdateRequestProcessor#processAdd does 
not return Strings for string fields anymore but instances of 
org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some 
related JIRA issues (SOLR-12983?) but nothing under the "Upgrade Notes" 
section.


I can adapt our UpdateRequestProcessor implementations but at least the 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor 
is broken now as well and needs to be fixed in Solr. It expects String 
values and logs messages such as the following now:


2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field 
name_tokenized not a String value, not including in detection


I wonder what kind of plugins are affected by the change. Does this only 
affect UpdateRequestProcessors or more plugins? Do I need to handle 
these ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
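
In the meantime I am adapting our own processors defensively, roughly like
this (a sketch, not our actual code):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class CharSequenceSafeProcessor extends UpdateRequestProcessor {

    public CharSequenceSafeProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object raw = doc.getFieldValue("name_tokenized");
        // Since 7.7, string field values may arrive as ByteArrayUtf8CharSequence,
        // so test for CharSequence instead of String and convert explicitly.
        if (raw instanceof CharSequence) {
            String value = raw.toString();
            // ... existing String-based logic goes here ...
        }
        super.processAdd(cmd);
    }
}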


Cheers,
Andreas




Null Pointer Exception during periodic deletion of expired doc in solr 7.6

2019-02-14 Thread manjula potula
Hello All,

We have upgraded Solr from 6.6 to 7.6 and are seeing a null pointer exception
with DocExpirationUpdateProcessorFactory. We didn't see this issue in Solr
6.6. Not sure if any changes were made to the
updateRequestProcessorChain handling in 7.6; can anyone please help?


2019-02-14 03:21:58.417 INFO  (autoExpireDocs-29-thread-1) [   ]
o.a.s.u.p.IgnoreCommitOptimizeUpdateProcessorFactory commit from client
application ignored with status code: 200
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE FINISH
LocalSolrQueryRequest{}
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.c.s.i.ConcurrentUpdateSolrClient STATS pollInteruppts=1 pollExists=1
blockLoops=3 emptyQueueLoops=1
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.c.s.i.ConcurrentUpdateSolrClient STATS pollInteruppts=1 pollExists=1
blockLoops=5 emptyQueueLoops=0
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.c.s.i.ConcurrentUpdateSolrClient STATS pollInteruppts=1 pollExists=1
blockLoops=2 emptyQueueLoops=0
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.c.s.i.ConcurrentUpdateSolrClient STATS pollInteruppts=1 pollExists=1
blockLoops=1 emptyQueueLoops=0
2019-02-14 03:21:58.417 DEBUG (autoExpireDocs-29-thread-1) [   ]
o.a.s.c.s.i.ConcurrentUpdateSolrClient STATS pollInteruppts=1 pollExists=1
blockLoops=1 emptyQueueLoops=0
2019-02-14 03:21:58.417 ERROR (autoExpireDocs-29-thread-1) [   ]
o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic
deletion of expired docs: null
java.lang.NullPointerException: null
at
org.apache.solr.update.processor.DistributedUpdateProcessor.handleReplicationFactor(DistributedUpdateProcessor.java:952)
~[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:920)
~[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1967)
~[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
~[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at
org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80)
~[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at
org.apache.solr.update.processor.DocExpirationUpdateProcessorFactory$DeleteExpiredDocsRunnable.run(DocExpirationUpdateProcessorFactory.java:419)
[solr-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f -
nknize - 2018-12-07 14:47:52]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_121]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[?:1.8.0_121]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_121]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[?:1.8.0_121]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[?:1.8.0_121]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

We have the below defined in solrconfig:

<updateRequestProcessorChain name="..." default="true">
   <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
      <int name="autoDeletePeriodSeconds">300</int>
      <str name="expirationFieldName">expire_at</str>
   </processor>
   ...
</updateRequestProcessorChain>

Below is the corresponding field in schema.xml, approximately:
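
<!-- a sketch; the exact attributes here are an assumption -->
<field name="expire_at" type="pdate" stored="true" multiValued="false" />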


Thanks,


Re: Incorrect shard placement during Collection creation in 7.6

2019-02-14 Thread Bram Van Dam
Thanks Erick, I just created SOLR-13247 and linked it to SOLR-12944.

 - Bram

On 13/02/2019 18:31, Erick Erickson wrote:
> I haven't verified, but this looks like a JIRA to me. Looks like some
> of the create logic may have issues, see: SOLR-12944 and maybe link to
> that JIRA?


Re: SOLR and AWS comprehend

2019-02-14 Thread Charlie Hull

On 13/02/2019 12:17, Gareth Baxendale wrote:

This is perhaps more of an architecture question than dev code, but I
appreciate collective thoughts!

We are using Solr to order records and to categorise them to allow users to
search and find specific medical conditions. We have an opportunity to make
use of Machine Learning to aid and improve the results. AWS Comprehend is
the product we are looking at but there is a question over whether one
should replace the other as they would compete or if in fact both should
work together to provide the solution we are after.


One is an open source search engine and one is a closed source hosted 
NLP service you pay for. I think you're comparing chalk and cheese here: 
you would use an NLP service to enhance the source data before indexing 
with something like Solr, or extract information from a query before 
searching. Although Solr does contain some classification features it 
doesn't contain any NLP features - although as my colleague Liz writes 
you can now easily integrate Solr & OpenNLP, another open source 
toolkit. 
https://opensourceconnections.com/blog/2018/08/06/intro_solr_nlp_integrations/


By the way are you aware that NHS Wales are using Solr to power their 
patient records service?


Best

Charlie


Appreciate any insights people have.

Thanks Gareth




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Migrate from Solr 5.3.1 to 7.5.0

2019-02-14 Thread Jan Høydahl
You may of course ignore the warning in the UI, it is just a warning intended 
to help you avoid mis-configurations.
But there may be side effects of placing a load balancer in between client and 
zk cluster, see
https://stackoverflow.com/questions/30905007/load-balancer-with-zookeeper

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Feb 2019, at 01:24, ramyogi wrote:
> 
> Thanks Jan, I am unaware how our DevOps team decided this, but it was
> working well without any issues with Solr 5.3.1 for a couple of years. Just
> wanted to make any changes that Solr 7 mandates.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



HDFS cache for Solr

2019-02-14 Thread Antczak, Lukasz
Hello,
We are using Cloudera 5.12.1 with Solr 4.10.3.
We want to store our index in memory, since the HDFS storage where the data
lives is too slow.

We have been trying the Solr HDFS block cache, but we are struggling with
warming it to be sure that the whole index is in memory after updating
some documents.
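
For reference, we enable the block cache on the directory factory roughly
like this (a sketch; the path and sizes are illustrative, not our production
values):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.global">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">4</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
</directoryFactory>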

We are thinking about using the HDFS cache for the directories holding the
Solr index. Has anyone seen such an approach?

Regards
Łukasz

-- 
Łukasz Antczak
IT Professional
GS Business Intelligence Team

Roche Polska Sp. z o.o.
ADMD Group Services - Business Intelligence Team
HQ: ul. Domaniewska 39B, 02-672 Warszawa
Office: ul. Abpa Baraniaka 88D, 61-131 Poznań

Tel: +48 22 260 55 21
Mobile: +48 519 515 010
mailto: lukasz.antc...@roche.com


Confidentiality Note: This message is intended only for the use of the
named recipient(s) and may contain confidential and/or proprietary
information. If you are not the intended recipient, please contact the
sender and delete this message. Any unauthorized use of the
information contained in this message is prohibited.


Re: Multiplicative Boosts broken since 7.3 (LUCENE-8099)

2019-02-14 Thread Mikhail Khludnev
The fix proposal is attached at
https://issues.apache.org/jira/browse/SOLR-13126; reviews and
opinions are appreciated.
I can push it quite soon if there's no veto.

On Wed, Feb 13, 2019 at 9:59 PM Burgmans, Tom <
tom.burgm...@wolterskluwer.com> wrote:

> I'd like to bump this issue up, since this is a showstopper for us to
> upgrade from Solr 6. In https://issues.apache.org/jira/browse/SOLR-13126
> I described a couple of more use cases in which this bug appears. We see
> different scores in the EXPLAIN compared to the actual scores and our
> analysis is that the EXPLAIN in fact is correct. It happens when a
> multiplicative boost is used (via the "boost" parameter) in combination
> with some function queries, like "query" and "field".
>
> One example (tested on Solr 7.5.0), when running:
>
> http://localhost:8983/solr/test/select?defType=edismax&fl=id,score,[explain
> style=text]&q=*:*&boost=sum(field(price),4)
>
> then the expectation is that a document that doesn't have the price field
> gets a score of 4. The result however is:
>
> {
> "id": "docid123576",
> "score": 1.0,
> "[explain]": "4.0 = product of:\n  1.0 = boost\n  4.0 = product of:\n
>   1.0 = *:*\n4.0 = sum(float(price)=0.0,const(4))\n"
> }
>
> EXPLAIN and score are not consistent.
>
> Best regards Tom
>
>
> -----Original Message-----
> From: Tobias Ibounig [mailto:t.ibou...@netconomy.net]
> Sent: dinsdag 22 januari 2019 10:14
> To: solr-user@lucene.apache.org
> Subject: Multiplicative Boosts broken since 7.3 (LUCENE-8099)
>
> Hello,
>
> As described in
> https://issues.apache.org/jira/browse/SOLR-13126
> multiplicative boosts (in certain conditions) seem to be broken since 7.3.
> The error seems to be introduced in
> https://issues.apache.org/jira/browse/LUCENE-8099.
> Reverting the SOLR parts to the now deprecated BoostingQuery again fixes
> the issue.
> The filed issue contains a test case and a patch with the revert (for
> testing purposes, not really a clean fix).
> We sadly couldn't find the actual issue, which seems to lie with the use
> of "FunctionScoreQuery" for boosting.
>
> We were able to patch our 7.5 installation with the patch. As others might
> be affected as well, we hope this can be helpful in resolving this bug.
>
> To all SOLR/Lucene developers, thank you for your work. Looking trough the
> code base gave me a new appreciation of your work.
>
> Best Regards,
> Tobias
>
> PS: This issue was already posted by a colleague, "Inconsistent debugQuery
> score with multiplicative boost", but I wanted to create a new post with a
> clearer title.
>
>

-- 
Sincerely yours
Mikhail Khludnev