Re: Safe to change numVersionBuckets?

2018-03-30 Thread Shawn Heisey

On 3/30/2018 10:24 PM, Randy Fradin wrote:

> I understand from reading the discussion in SOLR-6820 that 65536 is the
> recommended default for this setting now because it results in higher
> document write rates than the old default of 256. I would like to reduce my
> heap utilization and I'm OK with somewhat slower document writing
> throughput. My question is: is it safe to reduce the value
> of numVersionBuckets on all of my existing cores without reindexing my data?
>
> My solrconfig.xml contains this for all of my collections:
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
> </updateLog>


It looks like that is something that applies during indexing, and 
doesn't affect the data that's already indexed.


If you do heavy indexing, then you probably shouldn't reduce it quite 
that far.  If your indexing is light, then a low value is not likely to 
be problematic.
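As a rough illustration of why heavy indexing benefits from more buckets: as I understand Solr's update path, each update synchronizes on the version bucket picked by a hash of the document id, so the bucket array behaves like a set of striped locks. The sketch below is a simplified model under stated assumptions (uniformly distributed 32-bit hashes, array size rounded up to a power of two, slot chosen by masking the low bits); `next_power_of_two`, `bucket_for` and `p_shared_bucket` are hypothetical helpers, not Solr code. It estimates, birthday-problem style, how likely it is that at least two in-flight updates land in the same bucket and therefore wait on each other.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (assumed sizing rule for the bucket array)."""
    return 1 << (n - 1).bit_length()

def bucket_for(id_hash: int, num_buckets: int) -> int:
    """Pick a bucket slot by masking the low bits of the id hash (lock striping)."""
    return id_hash & (next_power_of_two(num_buckets) - 1)

def p_shared_bucket(concurrent_updates: int, num_buckets: int) -> float:
    """Probability that at least two concurrent updates share a bucket,
    assuming uniformly distributed hashes (birthday problem)."""
    n = next_power_of_two(num_buckets)
    p_all_distinct = 1.0
    for i in range(concurrent_updates):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct

for buckets in (256, 65536):
    print(f"{buckets:>6} buckets, 16 concurrent updates: "
          f"{p_shared_bucket(16, buckets):.4f} chance of sharing a bucket")
```

With only a few updates in flight, the chance of a shared bucket stays small even at 256; it grows quickly as concurrency rises, which is why the bucket count matters more for heavy indexing.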



> Assuming it is safe to change, can I just add a vm arg to the Solr process
> like "-Dsolr.ulog.numVersionBuckets=256" to override the value for all
> cores at once? Or do I have to change and re-upload the solrconfig.xml
> files and reload the cores?


Setting it with the system property will require restarting every Solr 
service.  If you change it in solrconfig.xml and then upload the updated 
config to zookeeper under the same config name, all you will need to do 
to apply the change is reload collections using that config.
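For context on the two options: a placeholder such as `${solr.ulog.numVersionBuckets:65536}` is resolved against a property (for example one passed with `-D` on the JVM command line) and falls back to the text after the colon when that property is not set. The sketch below is a simplified model of that lookup, not Solr's implementation: `substitute` is a hypothetical helper that consults a single property map, whereas Solr can also draw on other property sources. Since `-D` values are only set when the JVM starts, that route needs a restart, while changing the default in solrconfig.xml can be picked up by a reload.

```python
import re

# Matches ${name} or ${name:default}; the default may be empty, as in ${solr.ulog.dir:}.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

def substitute(text: str, props: dict) -> str:
    """Replace each placeholder with props[name], else its default (simplified model)."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")
    return _PLACEHOLDER.sub(repl, text)

line = '<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>'
print(substitute(line, {}))                                         # default applies
print(substitute(line, {"solr.ulog.numVersionBuckets": "256"}))     # -D style override
```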


With 3300 cores per server, a reload is probably significantly less 
stress on the system than a full Solr restart.
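Reloading can be scripted against the Collections API `RELOAD` action, which takes the collection name as `name`. This sketch assumes the updated config has already been uploaded to ZooKeeper, that a node answers at `http://localhost:8983/solr` (an assumption; adjust it), and that no authentication is required; the collection names are placeholders. It reloads collections one at a time so the load on the cluster stays gradual.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of any node in the cluster.
SOLR_BASE = "http://localhost:8983/solr"

def reload_url(collection: str, base: str = SOLR_BASE) -> str:
    """Build a Collections API RELOAD URL for one collection."""
    query = urllib.parse.urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base}/admin/collections?{query}"

def reload_collections(collections, base: str = SOLR_BASE, timeout: float = 600.0) -> dict:
    """Reload each collection in turn; map name -> responseHeader.status (0 = success).
    urlopen raises HTTPError if Solr rejects a request."""
    statuses = {}
    for name in collections:
        with urllib.request.urlopen(reload_url(name, base), timeout=timeout) as resp:
            body = json.load(resp)
        statuses[name] = body.get("responseHeader", {}).get("status")
    return statuses
```

For example, `reload_collections(["collection1", "collection2"])` would reload two hypothetical collections. If authentication is enabled, the requests would also need an `Authorization` header.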


Thanks,
Shawn



Safe to change numVersionBuckets?

2018-03-30 Thread Randy Fradin
I have a SolrCloud cluster (version 6.5.1) with around 3300 cores per
instance. I've been investigating what is driving heap utilization since it
is higher than I expected. I took a heap dump and found the largest driver
of heap utilization is the array of VersionBucket objects in the
org.apache.solr.update.VersionInfo class. The array is size 65536 and there
is one per SolrCore instance. Each instance of the array is 1.8MB so the
aggregate size is 6GB in heap.
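A quick back-of-the-envelope check of these numbers, assuming the array's footprint scales linearly with numVersionBuckets (an assumption: each bucket is one small object plus one array slot). The 1.8MB and 3300 figures come from the heap dump described above; the helper names are illustrative.

```python
# Rough heap estimate for the per-core VersionBucket arrays.
# Observed inputs come from the heap dump; linear scaling is an assumption.
OBSERVED_BYTES_PER_CORE = 1.8e6   # ~1.8 MB per array at the default size
OBSERVED_BUCKETS = 65536          # default numVersionBuckets
CORES = 3300                      # cores per Solr instance

def bytes_per_core(num_buckets: int) -> float:
    """Scale the observed per-core size linearly with the bucket count."""
    return OBSERVED_BYTES_PER_CORE * num_buckets / OBSERVED_BUCKETS

def total_bytes(num_buckets: int, cores: int = CORES) -> float:
    """Aggregate across every core on one instance."""
    return bytes_per_core(num_buckets) * cores

for buckets in (65536, 256):
    print(f"{buckets:>6} buckets: {bytes_per_core(buckets) / 1e3:8.1f} KB per core, "
          f"{total_bytes(buckets) / 1e9:6.3f} GB total")
```

Under that assumption, 65536 buckets works out to about 5.9 GB across 3300 cores, while 256 buckets would be roughly 7 KB per core, or about 23 MB in total.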

I understand from reading the discussion in SOLR-6820 that 65536 is the
recommended default for this setting now because it results in higher
document write rates than the old default of 256. I would like to reduce my
heap utilization and I'm OK with somewhat slower document writing
throughput. My question is: is it safe to reduce the value
of numVersionBuckets on all of my existing cores without reindexing my data?

My solrconfig.xml contains this for all of my collections:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>

Assuming it is safe to change, can I just add a vm arg to the Solr process
like "-Dsolr.ulog.numVersionBuckets=256" to override the value for all
cores at once? Or do I have to change and re-upload the solrconfig.xml
files and reload the cores?

Thanks


RE: Query redg : diacritics in keyword search

2018-03-30 Thread Allison, Timothy B.
For a simple illustration of Charlie's point and a side bonus on the 78 reasons 
to use the ICUFoldingFilter if you happen to be processing Arabic script 
languages, see slides 31-33:

https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf
 

-----Original Message-----
From: Charlie Hull [mailto:char...@flax.co.uk] 
Sent: Thursday, March 29, 2018 9:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Query redg : diacritics in keyword search

On 29/03/2018 14:12, Peter Lancaster wrote:
> Hi,
> 
> You don't say whether the AsciiFolding filter is at index time or query time. 
> In any case you can easily look at what's happening using the admin analysis 
> tool which helpfully will even highlight where the analysed query and index 
> token match.
> 
> That said, I'd expect what you want to work if you simply use
> <filter class="solr.ASCIIFoldingFilterFactory"/> on both index and query.

Simply put:

You use the filter at indexing time to collapse any variants of a term into a 
single variant, which is then stored in your index.

You use the filter at query time to collapse any variants of a term that users 
type into a single variant, and if this exists in your index you get a match.

If you don't use the same filter at both ends you won't get a match.
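A toy model makes the point concrete. Here `fold` approximates ASCII folding by decomposing each token (Unicode NFD) and dropping combining marks; that is close to, but not the same as, Lucene's ASCIIFoldingFilter, which uses its own character mapping. `analyze` and `search` are hypothetical stand-ins for an analysis chain and an inverted index. Folding only at query time (with preserveOriginal) is one configuration that reproduces the behaviour Lulu describes; folding on both sides lets either spelling find both documents.

```python
import unicodedata

def fold(token: str) -> str:
    """Approximate ASCII folding: NFD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text: str, folding: bool, preserve_original: bool = False) -> set:
    """Lowercase and whitespace-split; optionally fold, keeping the original if asked."""
    terms = set()
    for token in text.lower().split():
        if not folding:
            terms.add(token)
            continue
        folded = fold(token)
        terms.add(folded)
        if preserve_original and folded != token:
            terms.add(token)
    return terms

def search(docs: dict, query: str, index_folding: bool, query_folding: bool,
           preserve_original: bool = False) -> list:
    """Build a tiny inverted index, then return ids of docs sharing any query term."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, index_folding, preserve_original):
            index.setdefault(term, set()).add(doc_id)
    hits = set()
    for term in analyze(query, query_folding, preserve_original):
        hits |= index.get(term, set())
    return sorted(hits)

docs = {1: "Carré", 2: "Carre"}
# Folding only at query time: the accented query finds both, the plain one only doc 2.
print(search(docs, "Carré", index_folding=False, query_folding=True, preserve_original=True))  # [1, 2]
print(search(docs, "Carre", index_folding=False, query_folding=True, preserve_original=True))  # [2]
# Folding at both index and query time: either spelling finds both documents.
print(search(docs, "Carre", index_folding=True, query_folding=True))  # [1, 2]
```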

Cheers

Charlie

> 
> Cheers,
> Peter.
> 
> -----Original Message-----
> From: Paul, Lulu [mailto:lulu.p...@bl.uk]
> Sent: 29 March 2018 12:03
> To: solr-user@lucene.apache.org
> Subject: Query redg : diacritics in keyword search
> 
> Hi,
> 
> The keyword search Carré returns values Carré and Carre (this works
> well as I added the filter <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/> in
> the schema config to enable returning of both sets of values)
> 
> Now it looks like we want Carre to return both Carré and Carre, and this doesn't
> work: Solr only returns Carre. Any ideas on how this scenario can be
> achieved?
> 
> Thanks & Best Regards,
> Lulu Paul
> 
> 
> 
> 


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Resetting Authentication/Authorization

2018-03-30 Thread Shawn Heisey

On 3/30/2018 7:18 AM, Terry Steichen wrote:

> The output resembles the contents of security.json, except that there's
> only one authenticated user, which is the one whose credentials are
> supplied.  And there are only two permissions.


I was actually wanting to SEE it.  Redact things like the encrypted 
passwords and the usernames if you like.  There should be stuff in the 
output OTHER than the json itself.



> That's the essence of my question: yes, I think it should logically do
> what you say, but I don't know if or how it does that.  I don't think it
> loads security.json because I have to start from scratch no matter
> what's in security.json, and no matter where I place that file.  I would
> be happy if it did that because I could prepare a fine-tuned set of
> authentications and permissions and reuse it each time.  I simply don't
> know how to do that (or even if it can be done).


When you're running SolrCloud, security.json (and most other config 
files) are NOT on your disk.  They're in zookeeper. An exception is 
sometimes solr.xml ... but you can put that in zookeeper too.  Any 
versions of config files that you put on disk are completely ignored.


Unless you're doing something that creates a brand new ZK database every 
time you restart Solr, which is a very bad idea, the security settings 
should be surviving restarts.
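Since the live security.json lives in ZooKeeper, one way to get a reusable, fine-tuned setup is to keep a prepared security.json and upload it there. The sketch below builds such a file in the shape the Solr Reference Guide shows for BasicAuthPlugin and RuleBasedAuthorizationPlugin. The hashing in `hash_password` is my understanding of Solr's scheme (SHA-256 applied twice over salt plus password, with hash and salt base64-encoded and separated by a space); treat it as an assumption and compare it against a hash Solr generated before relying on it. User names, passwords and permissions are placeholders.

```python
import base64
import hashlib
import json
import os

def hash_password(password: str, salt=None) -> str:
    """Return '<base64 hash> <base64 salt>' for the credentials map.
    Assumed scheme: sha256(sha256(salt + password))."""
    if salt is None:
        salt = os.urandom(32)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()
    return base64.b64encode(digest).decode("ascii") + " " + base64.b64encode(salt).decode("ascii")

def build_security_json(users: dict, user_roles: dict, permissions: list) -> str:
    """users: name -> plaintext password; user_roles: name -> list of roles;
    permissions: dicts such as {"name": "security-edit", "role": "admin"}."""
    security = {
        "authentication": {
            "blockUnknown": True,
            "class": "solr.BasicAuthPlugin",
            "credentials": {name: hash_password(pw) for name, pw in users.items()},
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            "permissions": permissions,
            "user-role": user_roles,
        },
    }
    return json.dumps(security, indent=2)

doc = build_security_json(
    users={"solr": "SolrRocks"},
    user_roles={"solr": ["admin"]},
    permissions=[{"name": "security-edit", "role": "admin"}],
)
print(doc)
```

The output can be saved as security.json and uploaded; the reference guide's example for that is `server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json` (adjust the ZooKeeper address). Once it is in ZooKeeper it should be picked up on every restart.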


Thanks,
Shawn



Re: Resetting Authentication/Authorization

2018-03-30 Thread Terry Steichen

On 03/29/2018 11:07 PM, Shawn Heisey wrote:
> On 3/29/2018 8:28 PM, Terry Steichen wrote:
>> When I set up the initial authentications and authorizations (I'm using
>> 6.6.0 and running in cloud mode.), I call "bin/solr auth enable
>> -credentials xxx:yyy".
>
> What does this command output?  There should definitely be something
> output when that command is run.  I don't know if it will be a lot of
> output or a little bit, but whatever it is, can you provide it?
*The output resembles the contents of security.json, except that there's
only one authenticated user, which is the one whose credentials are
supplied.  And there are only two permissions.*
>
>> I then use a series of additional API calls (to
>> create additional users and permissions).  This creates my desired
>> security environment (and, BTW, it seems to function as it should).
>
> Can you elaborate on exactly what you did when you say "a series of
> additional API calls"?
*I issued the well-documented curl-based commands to create a user and
to create a permission.  Multiple times as needed.*
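For reference, the curl-based calls mentioned here correspond to POSTs of JSON commands to `/admin/authentication` (for example `set-user`) and `/admin/authorization` (for example `set-user-role` and `set-permission`). The sketch below only builds those requests with Python's standard library; the base URL, admin credentials, and user and role names are illustrative assumptions. In SolrCloud each of these calls edits security.json in ZooKeeper.

```python
import base64
import json
import urllib.request

# Assumption: adjust to a node in your cluster.
SOLR_BASE = "http://localhost:8983/solr"

def security_request(endpoint: str, command: dict, admin_user: str,
                     admin_password: str, base: str = SOLR_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST to /admin/<endpoint> with a JSON command body."""
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base}/admin/{endpoint}",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )

# Example commands; user, password, and role names are placeholders.
commands = [
    ("authentication", {"set-user": {"tom": "TomIsCool"}}),
    ("authorization", {"set-user-role": {"tom": ["admin"]}}),
    ("authorization", {"set-permission": {"name": "read", "role": "admin"}}),
]

for endpoint, command in commands:
    req = security_request(endpoint, command, "solr", "SolrRocks")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # urllib.request.urlopen(req)  # uncomment to actually send the request
```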
>
>> If I restart solr, it appears I must reactivate it with the same
>> 'bin/solr auth enable -credentials xxx:yyy' command.  But, it seems that
>> when solr is restarted this way, only the authorizations are retained
>> persistently.  But the authentications have to be created again from
>> scratch.
>
> Enabling the authentication when running in cloud mode should upload a
> "security.json" file to zookeeper.  It should also write some
> variables to your solr.in.sh file, so that future usage of the
> bin/solr tool can provide the authentication that is required.
*That's the essence of my question: yes, I think it should logically do
what you say, but I don't know if or how it does that.  I don't think it
loads security.json because I have to start from scratch no matter
what's in security.json, and no matter where I place that file.  I would
be happy if it did that because I could prepare a fine-tuned set of
authentications and permissions and reuse it each time.  I simply don't
know how to do that (or even if it can be done).*
>
> Thanks,
> Shawn
>
>



Re: New 7.2.1 install on linux; "permission denied" on exec?

2018-03-30 Thread Shawn Heisey

On 3/30/2018 6:01 AM, hal...@xsmail.com wrote:
> WHY that works, that's still an open question for me ...


If you had tried the "-x" trick, it might have given me some insight.  
But if your solution is acceptable to you, then we can let the matter 
drop.  If you ever upgrade Solr, you're probably going to be in the same 
situation again.  At that point you can either try to figure it out, or 
apply the same fix.


Thanks,
Shawn



Re: New 7.2.1 install on linux; "permission denied" on exec?

2018-03-30 Thread hal469
hi

On Thu, Mar 29, 2018, at 10:35 AM, Shawn Heisey wrote:
> Looks fine.  It's a little odd to be changing the install location to
> /opt/solr instead of /opt ... but if that's what you really want, it
> won't cause any issues.

Just testing that it does what I want, where I want.  I always *1st* install 
into a dedicated subdir ... have had one too many apps fail to create their own 
subdir, and 'pollute'!

> > chown -R solr:solr /opt/solr
> 
> Why are you doing this step?

Because it was complaining about permissions.  1st assumption was ownership ...

>  Those files are *MEANT* to be owned by
> root.  The solr user has no need to write to files in that location. 
> (Changing permissions in this way is unlikely to hurt anything, but
> isn't at all necessary)

Noted.

> distribution 

I'm on openSUSE, so I still have to apply this first:

  https://github.com/apache/lucene-solr/pull/305/files

> Try the following as a troubleshooting step.  Either log in as "solr" or
...


I managed to 'fix' the problem.

rm -f /etc/init.d/solr

and replace it with a systemd unit file,

/etc/systemd/system/solr.service

That seems to do the trick:

ps aux | grep solr
   solr 35445  181  1.6 4047996 267116 ?  Sl   04:57   0:11 java 
-server -Xms512m -Xmx51 ...

and all's good.

WHY that works, that's still an open question for me ...