from:"Christopher Schultz"

Re: Help using Noggit for streaming JSON data

2020-10-07 Thread Christopher Schultz

Yonic,

Thanks for the reply, and apologies for the long delay in this reply. Also 
apologies for top-posting, I’m writing from my phone. :(

Oh, of course... simply subclass the CharArr.

In my case, I should be able to immediately base64-decode the value (saves 1/4 
in-memory representation) and, if I do everything correctly, may be able to 
stream directly to my database.

With a *very* complicated CharArr implementation of course :)

Thanks,
-chris

> On Sep 17, 2020, at 12:22, Yonik Seeley  wrote:
> 
> See this method:
> 
>  /** Reads a JSON string into the output, decoding any escaped characters.
> */
>  public void getString(CharArr output) throws IOException
> 
> And then the idea is to create a subclass of CharArr to incrementally
> handle the string that is written to it.
> You could overload write methods, or perhaps reserve() to flush/handle the
> buffer when it reaches a certain size.
> 
> -Yonik
> 
> 
>> On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
>> ch...@christopherschultz.net> wrote:
>> 
>> All,
>> 
>> Is this an appropriate forum for asking questions about how to use
>> Noggit? The Github doesn't have any discussions available and filing an
>> "issue" to ask a question is kinda silly. I'm happy to be redirected to
>> the right place if this isn't appropriate.
>> 
>> I've been able to figure out most things in Noggit by reading the code,
>> but I have a new use-case where I expect that I'll have very large
>> values (base64-encoded binary) and I'd like to stream those rather than
>> calling parser.getString() and getting a potentially huge string coming
>> back. I'm streaming into a database so I never need the whole string in
>> one place at one time.
>> 
>> I was thinking something like this:
>> 
>> JSONParser p = ...;
>> 
>> int evt = p.nextEvent();
>> if(JSONParser.STRING == evt) {
>>  // Start streaming
>>  boolean eos = false;
>>  while(!eos) {
>>char c = p.getChar();
>>if(c == '"') {
>>  eos = true;
>>} else {
>>  append to stream
>>}
>>  }
>> }
>> 
>> But getChar() is not public. The only "documentation" I've really been
>> able to find for Noggit is this post from Yonic back in 2014:
>> 
>> http://yonik.com/noggit-json-parser/
>> 
>> It mostly says "Noggit is great!" and specifically mentions huge, long
>> strings but does not actually show any Java code to consume the JSON
>> data in any kind of streaming way.
>> 
>> The ObjectBuilder class is a great user of JSONParser, but it just
>> builds standard objects and would consume tons of memory in my case.
>> 
>> I know for sure that Solr consumes huge JSON documents and I'm assuming
>> that Noggit is being used in that situation, though I have not looked at
>> the code used to do that.
>> 
>> Any suggestions?
>> 
>> -chris
>>

Help using Noggit for streaming JSON data

2020-09-17 Thread Christopher Schultz

All,

Is this an appropriate forum for asking questions about how to use
Noggit? The Github doesn't have any discussions available and filing an
"issue" to ask a question is kinda silly. I'm happy to be redirected to
the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code,
but I have a new use-case where I expect that I'll have very large
values (base64-encoded binary) and I'd like to stream those rather than
calling parser.getString() and getting a potentially huge string coming
back. I'm streaming into a database so I never need the whole string in
one place at one time.

I was thinking something like this:

JSONParser p = ...;

int evt = p.nextEvent();
if(JSONParser.STRING == evt) {
  // Start streaming
  boolean eos = false;
  while(!eos) {
char c = p.getChar();
if(c == '"') {
  eos = true;
} else {
  append to stream
}
  }
}

But getChar() is not public. The only "documentation" I've really been
able to find for Noggit is this post from Yonic back in 2014:

http://yonik.com/noggit-json-parser/

It mostly says "Noggit is great!" and specifically mentions huge, long
strings but does not actually show any Java code to consume the JSON
data in any kind of streaming way.

The ObjectBuilder class is a great user of JSONParser, but it just
builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents and I'm assuming
that Noggit is being used in that situation, though I have not looked at
the code used to do that.

Any suggestions?

-chris

Re: Dynamic reload of TLS configuration

2020-05-28 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

Ping. Any options for no-downtime TLS reconfiguration?

- -chris

On 4/23/20 11:35, Christopher Schultz wrote:
> All,
>
> Does anyone know if it is possible to reconfigure Solr's TLS
> configuration (specifically, the server key and certificate)
> without a restart?
>
> I'm looking for a zero-downtime situation with a single-server and
> an updated TLS certificate.
>
> Thanks, -chris
>
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl7P9FEACgkQHPApP6U8
pFh6Eg//ceIMMoWnTEEcWk+vI40cgTe7bOUJJ+KcIRSuN9MwLOy6/RnHiNgvF1ma
VS9+AkpzM9oOmoTu+p6Je9ZZi55cKqvwUm//Q92lO1GW3q7UVLTpESqTv6sUGi2t
umIs9Qm51pGVzGS8G0unfkgFvcBy1j+0uJ58wEIaZdEa7DbSdHodo+UWJw/69wys
H6yVaxGVRAwDSaR4EzhoDOvMT+Cze9WQoSvGxWFjJGa8WPMWetbOYmWsI7GJxXXt
5GzoMeVGv3ITbjMExDKyIHnoQYNZePxzegNBKD0FFAny2ozKEqBXeH6qOooYs56S
XWubqMriFhnUgjrpbS+iwwOjEMuHjBZq2VXGGQ0XGCkv9e2iOIKFCtbY8O2IXZS0
grU3U7lC1wgZi594RrGXTYT2xYw0esYbi6jvDAKDE/zy33zHha/GlQy+4FGRdzqv
Iaj1mvqlr1BNXuVl5yvuh6zAiw4cYjOWykAhnFztuSRIEEE5yEA1yT6g35e413QI
nf3cUEFsczV04soSOwrsxEhqMG4+u6rBMVpT5zLvRyai8F1xXReVv/RPJesrB7bw
Ow2hgGhEk0WSTCoeKnw6rXn89PDAe3V0oS0Nhug6KxfUiFfuVxf3rX07O4uOWaJz
T8InerreM4D1JuMWEnjVWuFseBIX+Nroolo9uKZqNrhbPwlZz+0=
=i1PJ
-END PGP SIGNATURE-

Re: using S3 as the Directory for Solr

2020-04-23 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rahul,

On 4/23/20 21:49, dhurandar S wrote:
> Thank you for your reply. The reason we are looking for S3 is since
> the volume is close to 10 Petabytes. We are okay to have higher
> latency of say twice or thrice that of placing data on the local
> disk. But we have a requirement to have long-range data and
> providing Seach capability on that.  Every other storage apart from
> S3 turned out to be very expensive at that scale.
>
> Basically I want to replace
>
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
>
> with S3 based implementation.

Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of
/index data/?

You can theoretically store your source data anywhere, of course. 10
PiB sounds like a truly enormous index.

- -chris

> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl 
> wrote:
>
>> Hi,
>>
>> Is your data so partitioned that it makes sense to consider
>> splitting up in multiple collections and make some arrangement
>> that will keep only a few collections live at a time, loading
>> index files from S3 on demand?
>>
>> I cannot see how an S3 directory would be able to effectively
>> cache files in S3 and what units the index files would be stored
>> as?
>>
>> Have you investigated EFS as an alternative? That would look like
>> a normal filesystem to Solr but might be cheaper storage wise,
>> but much slower.
>>
>> Jan
>>
>>> 23. apr. 2020 kl. 06:57 skrev dhurandar S
>>> :
>>>
>>> Hi,
>>>
>>> I am looking to use S3 as the place to store indexes. Just how
>>> Solr uses HdfsDirectory to store the index and all the other
>>> documents.
>>>
>>> We want to provide a search capability that is okay to be a
>>> little slow
>> but
>>> cheaper in terms of the cost. We have close to 2 petabytes of
>>> data on
>> which
>>> we want to provide the Search using Solr.
>>>
>>> Are there any open-source implementations around using S3 as
>>> the
>> Directory
>>> for Solr ??
>>>
>>> Any recommendations on this approach?
>>>
>>> regards, Rahul
>>
>>
>
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIyBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl6iTwUACgkQHPApP6U8
pFjRaw/4sGbH286gZJe+wfKsLc4JPvyJZjjwVDCdpiR2SHt50IA23wYSK97R6xRj
dbWWReA7C3JNWp6x21i8Bb6sIeLDnotbc7IOSmOMuNep1BtVaYBMJ8wyW6uUtXf6
hQbY0Ew93ZhDlS9CWMJqbQtWfrQEqH51Xbz+4uqqvJU8Bq9o9Vv0rnuVp/5f73lV
ihek0sbA73oGle0gC5NFmrKItnn+14X8vIxUC8JRZlY4rDSiOdOcIil3DExxOQNQ
UodIvwKKhzALFY77PeGSSjKiy0X3JJ1rKzLeIBrW0JCNMprYLzL2CQjZ5F09MraZ
WxXdA64lEg2diEwHywNrsaaygbEZYTWd8gaeGA7kzCk78Y2KuhWuEQej6KmE3Iq2
AW+K7JgFakUpzB5oorCtKNLQOqFHX85ne57gCYKr42S3Htfxmf98pBdudQy4RvuT
+tJvGYx8NLqgeOoZN4u+G/8WunlzUC+u2vUxVcIoK3Ozz0usMioFDqn69vmOxxoH
cN2Y4T1ZZZGtndiAGZww1JXKAbVN0U41isXg2F8tHQV9dxaeoYDQ/xYbAoWEhhlM
SVtEdr76eMJ08T6h5711gtrhSK+RQFPD2Jbr8B/Xl063xPfN2TpqmcJCKXkucvpc
CEDLFqeKX6qIRZDgMf8EICmbFl6aF5knbDP0MkyYk4urB+uFaw==
=Y/6Y
-END PGP SIGNATURE-

Dynamic reload of TLS configuration

2020-04-23 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

Does anyone know if it is possible to reconfigure Solr's TLS
configuration (specifically, the server key and certificate) without a
restart?

I'm looking for a zero-downtime situation with a single-server and an
updated TLS certificate.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl6htdYACgkQHPApP6U8
pFhIpA/9EZ/jC3QjGUfx+g9fpNel5AxzpCV0NnTaJulGLkWNVeGoGNY2IcwfG+Oe
13piWVfRWs3OTLWYwiEhuxbY3FzBxJZL9EJ6QFNNmCVkyg2MMgQzR+bdSWiT0P7F
K8hPyOzEMqLML5y6c1TOQHF8Bn09mHwgLACHdnvzfFKcaaUSzzBKItDlIvDTB5Vm
m1x/GOBQ4P7uYr+Gi7hUbr+Zz6MwDI9HT2arUwAiG0aeTO///FrZEtVdKdJtrDWk
tBwZz+qzkOzWj6EuTWgLU2/64QVzJsutGJmhkpixLaGaAnrpQ5d+3PjhxYraKA3j
tahzRYJGC2PEUxQMZsWWCPSJodDsB/5h4zo5DsdIOZLmrAuuI367j5fcb9fO/J3c
KxStUZf04ZCXWb17xMIrcYecWwkNQydjuwH5yRQHb9c7C3oRCpYNxY0yueUg/+8W
voJUvCwR9qRD9NdSAUB9JOkt0Tj0c/SEgP5X8zllF5kISb7q7KcUVoyjG+vei1H0
E+4VNV3KnqnIJQgnFIsUU6ZiGznn+uy0I29+we8P08GX27MlEL1+KxjsT8la6h97
OWXwuH44e4ntFFsYbC9lOmn3ib/zA45l1sO77wTdDH9iBwKZXmLmf24ABlXvy8uI
4AH3dvOxjFeXWtYq9m2jebotiirzpkPaxvzBHJ+WDcVgtQKZ7wo=
=5MDT
-END PGP SIGNATURE-

Re: Require searching only for file content and not metadata

2019-08-26 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Kushal,

On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> This is Kushal Khare, a new addition to the user-list. I started 
> working with Solr few days ago for implementing it in my project.
> 
> Now, I have the basics done, and reached the query stage.
> 
> My problem is – I need to restrict the solr to search only for the 
> file content and not the metadata. I have gone through various 
> articles on the internet, but could not get any help.
> 
> Therefore, I hope I could get some solutions here.

How are you querying Solr? Are you querying from a web application? From
a thick-client application? Directly from a web browser?

What do you consider "metadata" versus "content"? To Solr, everything
is the same...

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1j268ACgkQHPApP6U8
pFi6GA//VY8SU6H5T3G6fpUqQrVp05E9g7f0oGGVW1eaRY3NjgQzfbwJQmJqg16Y
MyUKpp0/P6EpR/dMPmiKBPvLppSqjT1SUNgrFi2btwtBaTibxWXd0WtEqNdinWCo
DFyJaPQaIT20IR887SPWrQSYc4oC8aKNAEDAXxlyWDzEgImE23AyCeWs++gJsaKm
RphkleBeIKCX6SkRzDFeEzx4VyKBZKcjI+Ks/9z2s9tcGmElxyMDPHYf5VXJQgcz
A1D3jPVPqm2OMvThXd2ll4NlnXe2PWV5eYfZQt/6YMwx4jF+rqG66jDXEhTHzDro
jmiZVj1VbQ0RlFLqP6OHu2YRj+01a0OtE8l4mWiGSNIrKymp+ycT9E+L0eC9yGIT
hLUfo7a3ONfOTTNAbuI/363+2WA1wBxSHm2m3kQT8Ho8ydjd7w/umR1L6/wr+q9B
jEZfAHs1TLFXd6lgqLtmIyf6Ya5bloWM+yjwnjfpniOuHCcXTiJi+5GvxLwih8yE
6CQ32kIUuspJ7N5hyiJvM4AcuWWMldDlZaYoHuUwhVbWCCT+Y4X6R1+IZfyXZnvn
wFEMD3+3r382M3G0uyh2MJk899l1kSPcX+BtRg3pOqDZh0WR+2xWpTndeiMxsmGj
UC1J1PssKUa1P0dMk7wLvgOl0BiiGC+WwgD7ZfHjF7NPL1jPtW8=
=LWwW
-END PGP SIGNATURE-

Can't start Solr 7.7.1 due to name-resolution issue

2019-08-22 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I'm getting a failure to start my Solr instance. Here's the error from
the console log:

Error: Exception thrown by the agent : java.net.MalformedURLException:
Local host name unknown: java.net.UnknownHostException: [hostname]:
[hostname]: Name or service not known
sun.management.AgentConfigurationError:
java.net.MalformedURLException: Local host name unknown:
java.net.UnknownHostException: [hostname]: [hostname]: Name or service
not known
at
sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(C
onnectorBootstrap.java:480)
at sun.management.Agent.startAgent(Agent.java:262)
at sun.management.Agent.startAgent(Agent.java:452)
Caused by: java.net.MalformedURLException: Local host name unknown:
java.net.UnknownHostException: [hostname]: [hostname]: Name or service
not known
at
javax.management.remote.JMXServiceURL.(JMXServiceURL.java:289)
at
javax.management.remote.JMXServiceURL.(JMXServiceURL.java:253)
at
sun.management.jmxremote.ConnectorBootstrap.exportMBeanServer(ConnectorB
ootstrap.java:739)
at
sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(C
onnectorBootstrap.java:468)
... 2 more


Now, my hostname is just the first part of the hostname, so like "www"
instead of "www.example.com". Running "host [hostname]" on the CLI
returns "Host [hostname]" not found: 3(NXDOMAIN)" so it's not entirely
surprising that this name resolution is failing.

What's the best way for me to get around this?

I'm running on Debian Stretch in Amazon EC2. I've tried fixing the
local name resolution so that it actually works, but when I reboot,
the EC2 instance reverts my DNS settings so those changes won't
survive a reboot.

Can I give the fully-qualified hostname to the JMX component in some way
?

I've this answer[1] on SO and everyone seems to say "edit /etc/hosts"
and, as I said, the EC2 startup scripts end up resetting those files
during a reboot.

Any ideas?

- -chris

[1]
https://stackoverflow.com/questions/20093854/jmx-agent-throws-java-net-m
alformedurlexception-when-host-name-is-set-to-all-num
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl1e9IQACgkQHPApP6U8
pFjG6g//XM6nPgioIuJs40gB9534GnsG9q8d42AUIoiDzJ+t8isFFxtphEChdcye
9/5ePo36fODIsNkzzXsAJh9L1iRgmnVy7QGQIDp07WEo9v2bVo2RkWl42zm+UQ5u
XIz//bpT+J9y3eBPdPCKaXou+UYeR9/2W/UYyN08/uayP2QVVd2ZavC6AbFW93i1
IF5vOmETOsxBgVlgngX4TQRNSKfe5gCqWT0l/diHpm7PjT2BDzNO7x3vRbfioOMS
ktXcRqdBJAzM9XLV1acI+0z7I1kzs/A+jCymT/4++VmI0Lf4AACIhoaqnmS9pxyY
nrXU8tttozbaHMiBS3dIIMZP1ZF4jzY0+/UPBfgXqM4OcErWTjrha4G/5oBlLqf8
msuVRTg6qbsQJP//UcDhN8kl593xCK/bcQMkzq1ABkwFUhb8PhXp/3IJCRjJm5q3
U3gTwMwA/k+R4aM8qGaLw+07aFCdVJKrIUW0NEEHEnwkjJxAeqIRdpV8acfrT6uy
3v78cVFvWaxcOtAyioUhek0jhKzCobcxsZEcxZqWWxY0DOFHWbip/agTJESC/sXV
wLY2P9lldo+S5dAoaGM7Ze1WJ5FOSLm6Juvl4CvyMeebyPFie4PrWX7b7ess8I+A
YwLyqfKQOV4qmWoiO7yNGcwfgIYNn3bJ/1b/vkmo+ua0KvjscYk=
=zeBa
-END PGP SIGNATURE-

Re: Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Paul,

On 6/7/19 11:02, Paul wrote:
> Can someone please outline how to use mutual TLS 1.2 with SOLR. Or,
> point me at docs/tutorials/other where I can read up further on
> this (version currently onsite is SOLR 7.6).

Here's a copy/paste from our internal guide for how to do this. YMMV.

Enjoy!

[...]

5. Configure Solr for TLS
   Create a server key and certificate:
   $ sudo mkdir /etc/solr
   $ sudo keytool -genkey -keyalg RSA -sigalg SHA256withRSA -keysize
4096 -validity 730 \
  -alias 'solr-ssl' -keystore /etc/solr/solr.p12 -storetype
PKCS12 \
  -ext san=dns:localhost,ip:192.168.10.20
 Use the following information for the certificate:
 First and Last name: 192.168.10.20 (or "localhost", or your
IP address)
 Org unit:  CHADIS Solr (Prod) (or dev)
 Everything else should be obvious

   Now, export the public key from the keystore.

   $ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore
/etc/solr/solr.p12 -storetype PKCS12 -alias solr-ssl

   Copy that certificate and paste it into this command's stdin:

   $ sudo keytool -importcert -keystore /etc/solr/solr-server.p12
- -storetype PKCS12 -alias 'solr-ssl'

   Now, fix the ownership and permissions on these files:

   $ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
   $ sudo chmod 0640 /etc/solr/solr.p12

   Edit the file /etc/default/solr.in.sh

   Set the following settings:

   SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
   SOLR_SSL_KEY_STORE_TYPE=PKCS12
   SOLR_SSL_KEY_STORE_PASSWORD=whatever

   # You MUST set the trust store for some reason.
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

6. Configure Solr to Require Client TLS Certificates

  On each client, create a client key and certificate:

  $ keytool -genkey -keyalg EC -sigalg SHA256withECDSA \
-validity 730 -alias 'solr-client-ssl' \
-keystore /etc/solr/solr-client.p12 -storetype PKCS12

  Now dump the certificate for the next step:

  $ keytool -exportcert -keystore /etc/solr/solr-client.p12 -storetype
PKCS12 \
-alias 'solr-client-ssl' -rfc

  Don't forget that you might want to generate your own client certifica
te
  to use from you own web browser if you want to be able to connect to t
he
  server's dashboard.

  Use the output of that command on each client to put the cert(s)
into this
  trust store on the server:

  $ sudo keytool -importcert -keystore
/etc/solr/solr-trusted-clients.p12 \
 -storetype PKCS12 -alias '[client key alias]'

  Then, export the server's certificate and put IT into the
trusted-clients
  trust store, because command-line tools will use the server's own key
to
  contact itself.

  $ keytool -exportcert -keystore /etc/solr/solr-server.p12 -storetype
PKCS12 \
-alias 'solr-ssl'

  $ sudo keytool -importcert -keystore
/etc/solr/solr-trusted-clients.p12 \
 -storetype PKCS12 -alias 'solr-server'

  Now, set the proper file ownership and permissions:

  $ sudo chown root:solr /etc/solr/solr-trusted-clients.p12
  $ sudo chmod 0640 /etc/solr/solr-trusted-clients.p12

Edit /etc/default/solr.in.sh and add the following entries:

  # NOTE: Some of these are changing from "basic TLS" configuration.
  SOLR_SSL_NEED_CLIENT_AUTH=true
  SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
  SOLR_SSL_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_TRUST_STORE_PASSWORD=whatever
  SOLR_SSL_CLIENT_TRUST_STORE=/etc/solr/solr-server.p12
  SOLR_SSL_CLIENT_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=whatever
  SOLR_SSL_CLIENT_KEY_STORE=/etc/solr/solr-client.p12
  SOLR_SSL_CLIENT_KEY_STORE_TYPE=PKCS12
  SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
- -

solr.p12  Server keystore. Contains server key and certificate.
  Used by server to identify itself to clients.
  Should exist on Solr server.

solr-server.p12   Client trust store. Contains server's certificate.
  Used by clients to identify and trust the server.
  Should exist on Solr clients.

solr-client.p12   Client keystore. Contains client key and certificate.
  Used by clients to identify themselves to the server.
  Should exist on Solr clients when TLS client certs
are used.

solr-trusted-clients.p12
  Server trust store. Contains trusted client
certificates.
  Used by server to trust clients.
  Should exist on Solr servers when TLS client certs
are used.

[...]

Loading Data into a Core (Index)
- 
If you have installed Solr as a service using TLS, you will need to do
some
additional work to call Solr's "post" program. First, ensure you have
patched
bin/post according to the installation instructions above.

Re: Using Solr as a Database?

2019-06-03 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Daniel,

On 6/3/19 16:26, Davis, Daniel (NIH/NLM) [C] wrote:
> I think the sweet spot of Cassandra and Solr should be mentioned
> in this discussion.   Cassandra is more scalable/clusterable than
> an RDBMS, without losing all of the structure that is desirable in
> an RDBMS.

Amusingly enough, there is also Solandra if you don't want to choose :)

https://github.com/tjake/Solandra

It's a lot like DataStax.

> In contrast, if you use a full document store such as MongoDB, you 
> lose some of the abilities to know what is in your schema.
> 
> DataStax markets a platform that combines Cassandra (as a
> distributed replacement for an RDBMS) that is integrated with Solr
> so that records in managed in Cassandra are indexed and
> up-to-date.
> 
> If your real problem with an RDBMS is the lack of scaling, but you 
> like the ability to specify columnar structure explicitly, then
> this combination might be a good fit.
> 
> Now, MongoDB is also a strong alternative to an RDBMS.
> 
> The other thing to recall though is that the power of sharding has 
> reached into the databases themselves, and databases such as 
> PostgreSQL can operate with some tables sharded and other tables 
> duplicated.   See
> https://pgdash.io/blog/postgres-11-sharding.html.

Even MySQL and MariaDB -- the most bare-bones solutions in the RDBMS
space -- now have clustering available to them, to it's hard to defend
an RDBMS solution at this point that does NOT provide clustering, or
something similar.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlz1kA8ACgkQHPApP6U8
pFif4w/+Ph5ZsQdEiVuK96ygWYJcq0x5RzBfrQhQ5oq7IvhdlLzdzwIPilwLZoaO
9/JcwQOUfVo5XNC72mpclg6J+1jhkBuvee7tMqvSA90PLoTmJLft/oeFoBBm374Z
9UAhJgHF/lhcyp00w4L1JjRH+jQzZia3cohi56oeLReKnyHY//EvqzHKNe2TbiPf
7m5jOIiscxmzAMaI2pEBE4gHWUL8rXVG0SVkUbMQYqR+dRj50sOKk3w2lO2akWV/
rLkYD175LAtpQ7qMXU+CAGro2UAIdTXJOtp7yhCquA6T6Vo4BcBsvQ2bGBMDpeld
MsnyxzM1hiOZ71DOhyFjfGN9Ivqr1/UijVNsZWazBYtYp9N9/H1l3hl6NlKUVGIF
c+pSVWleNAzsO4ShUGJrOkdfv64vRjfK1s/unggAnu/XtyTWKoNV6vxhXwceEYlD
1xVDk8O4ANErXxj4XvQvtgrvBeYOK5sJ5aqn0guN1UIX6Q2gE61bclYwJp9r4NO9
cJjTQedEPdVdRYAz+lDucmSESETQITghhSgub8558BmTSc1PF61f3nAKEYiWrhfN
NnxR0dLKY+QOQ5Mo9lX6RSsCYb9x5F8K1jAoy/GSllpnGc88oswquJT/7Vm6R0yX
9YvFI7JsUHfhIwSkV8uupBZ03KJpYgJvXwirBGzV4j7i4M4qr7o=
=9ZXf
-END PGP SIGNATURE-

Re: Using Solr as a Database?

2019-06-03 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Ralph,

On 6/2/19 16:32, Ralph Soika wrote:
> The whole system is highly transactional as it runs on Java EE with
> JPA and Session EJBs.

And you write-through from your application -> RDBMS -> Lucene/Solr?

How are you handling commits (both soft and hard) and re-opening the
index?

> So, as far as I understand, you recommend to leave the data in the
> RDBMS?

I certainly would, even if it's just to allow a rebuild of the index
from a "trusted" source.

> The problem with RDBMS is that you can not easily scale over many
> nodes with a master less cluster.

That sounds like it's a problem with your choice of RDBMS, and not of
RDBMS's in general.

> This was why I thought Solr can solve this problem easily. On the 
> other hand my Lucene index also did not scale over multiple nodes.

If you want a clustered document-store[1], you might want to look at a
storage system designed for that purpose such as CouchDB or MongoDB.
Lucene/Solr is really best used as a distillation of data stored
elsewhere and not as a backing-store itself.

> Maybe Solr would be a solution to scale just the index?

That's exactly what Solr is for.

> Another solution I am working on is to store all my data in a HA 
> Cassandra cluster because I do not need the SQL-Core
> functionallity. But in this case I only replace the RDBMS with
> Cassandra and Lucene/Solr holds again only the index.

This seems like another plausible solution.

> So Solr can't improve my architecture, with the exception of the
> fact that the search index could be distributed across multiple
> nodes with Solr. Did I get that right?

Yes.

Hope that helps,
- -chris

[1]
https://en.wikipedia.org/wiki/Document-oriented_database#Implementations
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlz1jYYACgkQHPApP6U8
pFjw6w/9GGv4Z4FIoypv8XQrtIf5heT8yH0On6pQaFI313mglmzerTrD4W9Jz3y7
VWQHeMw5Q5LBg56KKMGSKv/PEnNmiA+69YTMdXB+R5gJnwtW0ZEZU0jP1uhPO+af
UO6ZpdbMnIuIyeZK8oeo99rL7nrb0CaPvzrVP7LoF+flX9gp5qt30841QPTVwNgZ
ryC+mrlWTidRpFF/uKCctDOwDJgw6pKNf352F+n/Oc85maBTySgIla1ZEqz+B+G3
tdgdTiDT/ueZY0BNFubnWlpjVTP+rwQjOrq1cD/Z53zV6APs4v7RQ0JBqDeJcadj
5xohEmZh47lKiNqsrSpB+CZy5mebxEalB3ptB+O7zexwLoixzJB4wmqfbP/hcO69
ijp58mhdoYDZqqwNJXoRNQ6OfQ9KlTyxtQwQGNcKCDiOOzZkhPInaYFnDo4AARG7
bI4z4eMpDuAm0VKi+b1voASSDxvIcT1gUZVVEtQWR5O3lzWDYmpKLsdMXQi34TKG
CXtpjgq5CR8x8kFhVQD8QijTG/zOsDf0pksF1AZx/6DQvN3JaFy3hy2dSW1Plbm6
n0WMDIkJ8w9IxofU+pFcu+tJuSRvKdcieK6dHSMHSrTvUAZc3VcCXWI4w25eODX2
985JoQF5tP6IizxBOv334VwizGu7GRyPmLLMSnQFuJXzjB52v2w=
=cN5b
-END PGP SIGNATURE-

Re: SolrJ, CloudSolrClient and basic authentication

2019-05-31 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Dimitris,

On 6/1/18 02:46, Dimitris Kardarakos wrote:
> Thanks a lot Shawn. I had tried with the documented approach, but
> since I use SolrClient.add to add documents to the index, I could
> not "port" the documented approach to my case (probably I do miss
> something).
> 
> The custom HttpClient suggestion worked as expected!

Can you please explain how you did this?

I'm facing a problem where the simplest possible solution is giving
the error "org.apache.http.client.NonRepeatableRequestException:
Cannot retry request with a non-repeatable request entity.".

It seems that SolrClient is using something like BasicHttpEntity which
isn't "repeatable" when using HTTP Basic auth (where the server is
supposed to challenge the client and the client only then sends the
credentials). I need to either make the client data repeatable (which
is in SolrClient, which I'd prefer to avoid) or I need to make
HttpClient use an "expectant" credential-sending technique, or I need
to just stuff things into a header manually.

What did you do to solve this problem? It seems like this should
really probably come up more often than it does. Maybe nobody bothers
to lock-down their Solr instances?

Thanks,
- -chris

> On 31/05/2018 06:16 μμ, Shawn Heisey wrote:
>> On 5/31/2018 8:03 AM, Dimitris Kardarakos wrote:
>>> Following the feedback in the "Index protected zip" thread, I
>>> am trying to add documents to the index using SolrJ API.
>>> 
>>> The server is in SolrCloud mode with BasicAuthPlugin for
>>> authentication.
>>> 
>>> I have not managed to figure out how to pass username/password
>>> to my client.
>> There are two ways to approach this.
>> 
>> One approach is to build a custom HttpClient object that uses 
>> credentials by default, and then use that custom HttpClient
>> object to build your CloudSolrClient.  Exactly how to correctly
>> build the HttpClient object will depend on exactly which
>> HttpClient version you've included into your program.  If you go
>> with SolrJ dependency defaults, then the HttpClient version will
>> depend on the SolrJ version.
>> 
>> The other approach is the method described in the documentation,
>> where credentials are added to each request object:
>> 
>> https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin.
html#BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ
>>
>>
>>
>> 
There are several different kinds of request objects.  A few examples:
>> UpdateRequest, QueryRequest, CollectionAdminRequest.
>> 
>> Thanks, Shawn
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlzxjlEACgkQHPApP6U8
pFhoeQ/7BzlhjGGE8tnMcrdmruP+N2rgvawfLcTdzDg3U4cQFNUVRoCclZcM8LiA
iuZf+cAewTTQTjLpQuSv2WoknQgO/YRgaqTlo+b3hv9zR2awY8Mob/m5RYcYAwmn
i+2SJrG7+u+qhpfDQGSjwppUKpm2WrfvGXL3lcRF48UXQ+z7J95o2g88SnP44FKH
87/X/iYX+xMsj0bkIEOkyppuXENQQwUZ7QWhgfAxSItJr2A0Ma6zkuuNPf4FvBJ1
JQM/c33WWbAXK3B7tI5iQsstVi5CMOhRF0Z336/vZgq6aF9uEZvIOWEVAlM+E8Qp
mYlZz7tERzUMs+QbcBcSdDIb8VSPwYy5kvKiJ9eEpjFGXmPBLOqiJ4M+4SOeGFq7
BA5sbm6k4gwHc33MiKvnHE1K+k3r1OBPngjxvelsyIaqSnX3zpKPTFhkU2dvWMPt
XPo/ICuiliGowD8xh5EhB6w0BuYZhK3dW7AKMCLbyoANwk7SLfHxC6O+rdmYyDQF
UwiR65+3ImmeKJOZt7lFoR43BXoFuz6L1SILU8XRcclS5KwXHg3moBElU7jM9iKV
9vMwWkuPGUA2gq5K0oV4XFEOShxUxFiCL4FXjd/P7x9Evhio+itvaUlHzP8FGblh
YyK+l2YqjKBnTJ0G4XE8UnJcmH8C23jJ05gwMgq92pXBQy5ly6s=
=6kab
-END PGP SIGNATURE-

Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn and Paul,

On 5/23/19 08:57, Shawn Heisey wrote:
> On 5/23/2019 5:45 AM, Paul wrote:
>> unable to find valid certification path to requested target
> 
> This seems to be the root of your problem with the connection to
> SQL server.
> 
> If I have all the context right, Java is saying it can't validate
> the certificate returned by the SQL server.
> 
> This page:
> 
> https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-
encryption?view=sql-server-2017
>
> 
> 
> Talks about a "trustCertificate" property you can set to "true" in
> the JDBC URL that will cause Microsoft's JDBC driver to NOT
> validate the server certificate.

It would be much better to use the "trustStore" setting on the
connection properties. As Shawn mentions later in this thread:

On 5/23/19 12:06, Shawn Heisey wrote:
> Enabling SSL should have no *direct* effect on JDBC.
> 
> But it might have an indirect effect by changing some of Java's
> SSL settings that in turn could filter down to the JDBC driver.

You have probably been relying on the JVM's VM-wide default trust
store and when you change that, your SSL connections to SQL Server no
longer work.

I would argue that it is always a best-practice to configure trust
stores separately for every type of connection.

So, if you follow the link above you can read about the "trustStore"
connection parameter and point that config setting at a trust store
which contains the SQL Server's TLS certificate -- that your
application should trust.

I think that will clear-up your issue.

You may also with to set the "trustStorePassword" and "trustStoreType"
options as well.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlzoVekACgkQHPApP6U8
pFjxDw/9EhDR1Pgxta9s05htdChMhuD+zpCTRPMvnppQoAQ6Qa5XCeSuVGaRNuGG
uDkJam93k/zxpLUoDSXmTxivKOoCljqdL3dDBEFzuuefrCTT3Hea4yCzzRJpjQa4
vttVBFJZdfvkjzMy1r3gtHe+IfMnYa404PKqNB+J9JTmsE2J/6cfbQ7/NzQPJ5p1
X1zQGlvhTSICkuqtuj+nuAh2WtrZpkG578N9STUfyCMcYHQqvZKNfo814Su1sBi4
PgKm0duc3QGS97kLf7qmsOq2hcbi2bF4snLw/Nii25pyLLKsw2mgpqKrVCPlCI3B
ic5cFGfMkJvwfakORFMeUV6oLAAY2wkm/itDPGkN/Iifsdx5SjqRa/z56k7FbOx5
y0Bt1lKJm+CJg+OcUq+qWIoZKSyqn6CjuOJmgq2UZQJlG24GUDhLNFPP/qHeA5/E
vaL7kJBcshPpAIFtg8r6T07mwfA9n0c0JKrp0a3RSzk22xVn5Uy44MAWd/z7jMhR
QU9UiOOJ+p8Om02td5UMv93liVB4xqA1biZ0l4LaIGIbbxDin4XZI3Ww2vWdSb8s
8famu8OFPBJ9IWjBqx6X48BscDIJVv3oYDBdfOS9LlVlGOG9sWI3/hu33Z9OOM4F
r4aoxgeD7fHj5G6vgkCkJ7FSHcdiy9NsIwQcyhUu9JgHIj6UNHU=
=ksmI
-END PGP SIGNATURE-

Re: Commits and new document visibility

2019-03-14 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 3/14/19 10:46, Shawn Heisey wrote:
> On 3/14/2019 8:23 AM, Christopher Schultz wrote:
>> I believe that the only thing I want to do is to set the 
>> autoSoftCommit value to something "reasonable". I'll probably
>> start with maybe 15000 (15sec) to match the hard-commit setting
>> and see if we get any complaints about delays between "save" and
>> "seeing the user".
> 
> In my opinion, 15 seconds is far too frequent for opening a new 
> searcher.  If the index reaches any real size, you may be in a
> situation where the full soft commit takes longer than 15 seconds
> to complete - mostly due to warming or autowarming.  Commits that
> open a searcher can be very resource-intensive ... if they happen
> too frequently, then heavy indexing will cause your Solr instance
> to never "calm down" ... it will always be hitting the CPU and disk
> hard. I'd personally start with one minute and adjust from there
> based on how long the commits take.
Okay. Current core size is ~1M documents. I think users can live with
a 1-minute delay, but I'll have to ask :)

Is the log file the best resource for information on (soft)
commit-duration?

>> In our case, we don't have a huge number of documents being
>> created in a minute. Probably once per minute, if that.
>> 
>> Does that seem reasonable?
>> 
>> As for actually SETTING the setting, I'd prefer not to edit the 
>> solrconfig.xml document. Instead, can I set this in my
>> solr.in.sh script? I see an example like this right in the file:
>> 
>> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
> 
> 3 seconds is even more problematic than 15.

Sorry, that was just a copy/paste directly from the default solr.in.sh
script that ships with Solr. I wouldn't do a 3-second soft-commit.

> I believe that when you use "bin/solr create" to create an index
> with the default config, that it does set the autoSoftCommit to 3
> seconds. Which as I stated, I believe to be far too frequent.

Nope, it sets it to "never soft commit", unless the defaults have
changed since I built this service with, I think, 7.3.0.

Is there any way to change this value at runtime, or does it require a
service-restart?

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlyKxt8ACgkQHPApP6U8
pFg9ChAAkSgsvn3+xufyLM9bA8WIWqICwmDWRdFM9nbSiy4bDH1Zl/86FKjzcvbB
lmyVFYlpFGedcSKLVsqXGEZiu8n0YgR6iVw6udfIJOWzex5JkwUBUsmS6bHP5ZAj
8wkTyWPyBQVBSBWUxQnEzfrgJCFxzEbzBt8no0gt0f7vbgXm+HaFBkb+l2MQzTK9
wrhsLh36cb17ig+/w16Eo4Rq5VQ5f/P4Y7PkTfzS5CaWyPi16mTP8Z7vTxQ+ltHQ
IPAVnZ4U6Tx4hFxf2Ox99qRX5wAlX0lMD063Gx7Q348Xn+u8VH8Aur8hudnb9Icf
MK9OqU0bxdeWkhDxGDCuxY4h+t+kE1YI0cPI5KWTkBVAU24dCOAPkJQ0LMGs/rGR
B3KareFltLztowvM8rxOeNcLzeoKn1ZpWrtPuK9tuaCy9LnwxgfTOGJFRuzhzxPF
WHA7R4LtQrjjmAXV1a/BgkNVXXmGnq1qJNyICiV6nYS/ALJXKidrexgcyJ4FoWK4
uEcy/62mtbTVz7I4mdmkNH/vwjjOTxZy2FXfwoUIQYe9R2RHM9NbF0Fzzrvx3hQH
vp2GD+AhzhIQUuqBe50XqUkC0T199ZgR4YkCBX7LdPDPcv54QgAfgjfImidQAiqn
s+i/J/rBFZPTD2vAgix+A74UNpePrKhODt0GNg92J4NvTU8P9kM=
=FwiA
-END PGP SIGNATURE-

Commits and new document visibility

2019-03-14 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I recently had a situation where a document wasn't findable in a
fairly small Solr core/collection and I didn't see any errors in
either the application using Solr or within Solr itself. A Solr
service restart caused the document to become visible.

So I started reading.

I believe the "problem" is that the document was indexed but not
visible due to the default commit settings in Solr 7.5 -- which is the
version  I happen to be running right now.

I never bothered so change anything from the defaults because, well, I
didn't know what I was doing. Now that I (a) have a problem to solve
and (b) know a little more about what is happening, I just wanted a
quick sanity-check on what I'd like to do.

[Quick background: my core/collection stores user data so that other
users can quickly find anyone in the system via text-search. This
replaced our previous RDBMS-based "SELECT ... WHERE name LIKE
'%whatever%'" implementation which of course wasn't scaling well.
Generally, users will expect that when a new user is created, they
will be findable "fairly soon" (probably immediately) afterwards.]

We are using SolrJ as a client from our application, btw.

Initially, we were doing:

SolrInputDocument document = ...;
SolrClient solr = ...;
solr.add(document);
solr.commit();

Someone told me that committing after every document-add was wasteful
and it seemed like good advice -- allow Solr's autoCommit mechanism to
handle the commits and we'll get better performance. The problem was
that no new documents are visible unless we take additional action.

So, here's the default settings:

autoCommit   = max 15sec
openSearcher = false

autoSoftCommit = never[*]

This means that every 15 seconds (plus OS/disk sync time), I'll get a
safe snapshot of the data. I'm okay with losing 15 seconds worth of
data if there is some catastrophe.

It also means that my documents are pretty much never made visible.

I believe that the only thing I want to do is to set the
autoSoftCommit value to something "reasonable". I'll probably start
with maybe 15000 (15sec) to match the hard-commit setting and see if
we get any complaints about delays between "save" and "seeing the user".

In our case, we don't have a huge number of documents being created in
 a minute. Probably once per minute, if that.

Does that seem reasonable?

As for actually SETTING the setting, I'd prefer not to edit the
solrconfig.xml document. Instead, can I set this in my solr.in.sh
script? I see an example like this right in the file:

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"

Is that a fairly standard way to set the autoSoftCommit value for all
cores?

Thanks,
- -chris

[*] This setting is documented only in a single place: in the
"near-real-time" documentation. It would be nice if that special value
was called-out in other places so it wasn't so hard to find.
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlyKY9wACgkQHPApP6U8
pFhxzRAAnxLCMPSFwJxChXZ8q7UJ9hHAGyMPHNs3k0tFilt9/aT+eR7rUEFGupvR
anl+o7QNU8fOreF/l0KoFeGpjNLHZqEJRSKrZkaEb0PH3gabH5IKpgwY9hr+CS9N
bcKC7GwQAs19TdkTorxY+MIBeQo0/bO51Ux7XallzYPdX6BW/+kRGlHCuiAQj3fg
+EwQan0iXLslk/bDxvCvg95B1zlvr7R4iRAOwp9GxIsk4tL8X/B7sOS5pm0RK19/
tiVJuAqTBwD2fQ3lZ1oQftadKMuajgedJdrrgd94jCuwzWVLjJpIXql2AKA/QcsM
7e2zJqOsPy/4eGFUJ+St5/JYxFfm/yzFjV4rTW1/wng65mmbYAGpLsQ3A+05A8s1
o8ciDQ/80/fvnislr3/NGxZF5hSMjJG4xVriDWpdHX+PqfbqfpeaWnR4j8HEP3vy
tPklo3MflnPLk0oA6wqvjSX32ujucVd+X5tKKtkqnE6rorD41FpJGVRvgUrq7Zof
kwNro/r7ObqD72hioJJIkjol3ImL3NGSyeZ6XZtsKx+kEsGoyvW5lsRtC580ksXN
tYaJbCWQbrHmXnf3ooQV0PatQi0YkG70BQceKPXNQJ3l8Fmc2MjrP7aJ9//ptrMl
Pvc0qh4mpzGJKMBjSjaItadmouZdc3dn308xP4WIvpt2a4RYmjo=
=PrAt
-END PGP SIGNATURE-

Re: Get details about server-side errors

2019-02-13 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Jason,

On 2/13/19 07:39, Jason Gerlowski wrote:
> Hey Chris,
> 
> Unfortunately I think you covered the main/only options above.
> 
> HTTP status code isn't the most useful, but it's worth pointing
> out that there are a few things you can do with it.  Some status
> codes are easy to identify and come up with a good message to
> display to your end user e.g. 403 codes.  But of course it doesn't
> do anything to help you disambiguate 400 error messages you get.
> 
> Error handling has always been one of SolrJ's weak spots.  One
> thing people have suggested before is adding some sort of enum to
> error responses that is less ambiguous and easier to interpret 
> programmatically, but it's never been picked up.  A bit more 
> information on SOLR-7170.  Feel free to vote for it or chime in
> there if you think that'd be an improvement.

I've added some comments and a proposed fix that meets *my* needs, but
I want to make sure that it will be useful for others (and not just my
specific use-case).

Thanks,
- -chris

> On Tue, Feb 12, 2019 at 5:09 PM Christopher Schultz 
>  wrote:
>> 
> Hello, everyone.
> 
> I'm trying to get some information about a (fairly) simple case
> when a user is searching using a wide-open query where they can
> type in anything they want, including field-names. Of course, it's
> possible that they will try to enter a field-name that does not
> exist and Solr will complain, like this:
> 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>
> 
Error from server at http://localhost:8983/solr/users: undefined field
> bad_field
> 
> (This is what happens when I search my user database for
> "bad_field:foo" .)
> 
> What is the best way to discover what happened on the server --
> from a code perspective. I can certainly read the above as a human
> and see what the problem is. But my users won't understand
> (exactly) what that means and I don't always have English-language
> searching my user databas e.
> 
> Is there a way to check for "was the error a bad field name?" and 
> "what was the bad field name (or names) detected?"
> 
> I looked at javadoc and saw two hopefuls:
> 
> 1.   code -- unfortunately, this is the HTTP response code
> 
> 2.  metadata -- unfortunately, this just returns 
> {error-class=org.apache.solr.common.SolrException,root-error-class=org
.a
>
> 
pache.solr.common.SolrException},
> which is already obvious from the exception type.
> 
> Is there something in SolrJ that I'm overlooking, here, or am I 
> limited to what I can parse out of the exception's "getMessage"
> string?
> 
> Thanks, -chris
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlxkPP8ACgkQHPApP6U8
pFiWkg/+Jd9kHBUc0dYPw9EqkiqjzKDc+/adtERK1TktD/GxYoJaXoKwkNeSt+5C
nOysBDwPoPBZELKmCQVDyyyIKrGfSYw2Pva0fDuMr1fKNoazf6I68/5BusjNf5iL
ETkPuCtGuV6fmETGnK9xLFKE41tTO2u32erWnCcxbBPC858qNhafYfO1UZ3lzjuj
kvuV81RESL4LQvbfx98FKxhgiHJGCV9maY4xFGQeNpI0nc3btnneAGfqUIBxJdhk
RT97PdMF1yZ37aLx4H4wUTtey8hAvJhHSpDg1fw+UDNoGXcefpTwh+KQMqK5D3Cg
QRLzdbzu2BR14saV2tkJ+lKbt0zvurYgOJ2J2CaCz2o44n0P82ll3hCnUCV8WfYW
G70iKi8+8y73jMCOYf5hPO3O5uUJXg3dpGjgaRHHzkoOks2A+3QEWlX0CWEyoO4U
Zg2avKpZNgHj6I5TxyiHD4EkhU3/e3GHbB4neUyvU36zpC6+g54a3CM7HoxWBTUn
NtU2C7jDHJozUnn1S3IGOIdwv5CJ7rJNfgp+m/BOw9xuF1g/Rt7QG68J5KK0/JQE
IL68zAQzWX/1KubIT3Ro5AD/2tR8CKXsCv72U8CdpjSQFFnV+6rFAvS2M7e1D6dm
Lj3yRS4EcKQEgYUKltyWGX2GqnLENGLOUa2wd3aiJY7kiOdNgrA=
=75gr
-END PGP SIGNATURE-

Get details about server-side errors

2019-02-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hello, everyone.

I'm trying to get some information about a (fairly) simple case when a
user is searching using a wide-open query where they can type in
anything they want, including field-names. Of course, it's possible
that they will try to enter a field-name that does not exist and Solr
will complain, like this:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://localhost:8983/solr/users: undefined field
bad_field

(This is what happens when I search my user database for "bad_field:foo"
.)

What is the best way to discover what happened on the server -- from a
code perspective. I can certainly read the above as a human and see
what the problem is. But my users won't understand (exactly) what that
means and I don't always have English-language searching my user databas
e.

Is there a way to check for "was the error a bad field name?" and
"what was the bad field name (or names) detected?"

I looked at javadoc and saw two hopefuls:

1.   code -- unfortunately, this is the HTTP response code

2.  metadata -- unfortunately, this just returns
{error-class=org.apache.solr.common.SolrException,root-error-class=org.a
pache.solr.common.SolrException},
which is already obvious from the exception type.

Is there something in SolrJ that I'm overlooking, here, or am I
limited to what I can parse out of the exception's "getMessage" string?

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlxjRDIACgkQHPApP6U8
pFihjBAAty32GuiOj8XnwJu55Y9tYWFoQOhNEEJEGmeh1mOv4fxj5D4Rh+7MXTJB
7APLZ5IlNjpGMQ5ygLpfFTrLIEljn/f/a8hRslH/g+H3p/y4EJgeyvbNHaQZdkuh
HlKQ9Z/M6HK+1KGvVNB+9onU3hs7+Tct7TjWO/cZ031CPovDknsYTbOBoLW+tszS
BrsR7up0s7AOWYNkXTu8i0tf6A6nkF8+YJvml2mxNvXUCZrhHh71eL3R+v1/zGun
6yYyGCPm5rO9Pkxq+It4Fo8pkvo3z6k65NAflMXsFcEwWaf/5OmzAjE+TrDdqfeQ
InKDsXj3w6ZOHOEWN/lq8kK1alZUP0i8MQJHpAXzlPL213joP9mN2AeNk7airIXE
hPPmUGKjOVlMDJg6ICJiPVibMjwLBiy68TQJj2DX+dMVeYTQSroPBw5VUJhrxinV
+4y6podDJ6xs+27LxfI8DZ8nGAZP/tFYMCLNIdnhOg682PfaiD3ZiDDu5dJvm871
7N0EK3oCkoAmQ3l7xQNtz/0nDdI5TKSOtI3KBXTY72/8dfZlSoE4kwmBh56SrKQJ
KNfT54Cj329p5qKoNBy1bKxw4GyUx0UbKQo8HyFqzK0gQHlH+23taq5IePhocW12
uUMGSvVUnm/E+C5w3OGLJ96Y6a3aiNUORinkTJePz+sJoUbCIwY=
=Ril5
-END PGP SIGNATURE-

Re: Page faults

2019-01-07 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 1/7/19 11:52, Erick Erickson wrote:
> Images do not come through, so we don't see what you're seeing.
> 
> That said, I'd expect page faults to happen:
> 
> 1> when indexing. Besides what you'd expect (new segments written
> to disk), there's segment merging going on in the background which
> has to read segments from disk in order to merge.
> 
> 2> when querying, any fields returned as part of a doc that has
> stored=true docValues=false will require a disk access to get the
> stored data.

A page fault is not necessarily a disk access. It almost always *is*,
but it's not because the application is calling fopen(). It's because
the OS is performing a memory operation which often results in a dip
into virtual memory.

Jeremy, are these page-faults occurring on all the machines in your
cluster, or only some? What is the hardware configuration of each
machine (specifically, memory)? What are your JVM settings for your
Solr instances? Is anything else running on these nodes?

It would help to understand what's happening on your servers. "I'm
seeing page faults" doesn't really help us help you.

Thanks,
- -chris

> On Mon, Jan 7, 2019 at 8:35 AM Branham, Jeremy (Experis) 
>  wrote:
>> 
>> Does anyone know if it is typical behavior for a SOLR cluster to
>> have lots of page faults (50-100 per second) under heavy load?
>> 
>> We are performing load testing on a cluster with 8 nodes, and my
>> performance engineer has brought this information to attention.
>> 
>> I don’t know enough about memory management to say it is normal
>> or not.
>> 
>> 
>> 
>> The performance doesn’t appear to be suffering, but I don’t want
>> to overlook a potential hazard.
>> 
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Jeremy Branham
>> 
>> jb...@allstate.com
>> 
>> Allstate Insurance Company | UCV Technology Services |
>> Information Services Group
>> 
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwzpYsACgkQHPApP6U8
pFgSHxAAgaXV5wkwV7Ru2QyhnvxUnIWY4Iom0IdZYrDuZBDxmFx9wzE7P33zmR3E
nrgZCqBtAMdxRSwG9BfyKircChZBssqtQpskw6mgJyzRyGvKVJjJ68r0vEio3Kjo
HjaJczBFWvdOKm42W1Li4SeymGyYXu/jmdkWLcIbEM4BgDQLf1HhSEphDeZzP4ST
GNDBrIA6XkUJwE1r58FUuj9l0XSKUAPLOPNAx1qGiAn4fKdbysVHvLcvJvJzC0pC
1kx000r+Mqdd61EzhM20ZDIvg2F3vgFgGCUtB31hIi18bfD8whoAafL2FSMkIccD
H7X09PpUK8qPM/oQgqCKTtfmVR3M2pi3CSxLFSQ1/QucnF2wxWknOOWUH1TMU/L2
KUQHS6GwuTk+R/8PxdBRsZI8ON3MVb690ECV4QplYlkrtygXrLRg2YOgifgAXsKL
5Kg2mrpKoxfNnDWaRksy4GUDTsSxbkd1rpnHJEZ8le26HXvz9wrug/FtNPzqP8S9
dan2gkgiSqOM9GKlKkA72ROyQDhZa5YiXfGNdRrmfkiQzlDBEcGpD8pg1GwskRJl
yidTBfvRSyCHsI5NBGf65nTG+2WfUnr8wClHVK5QQGVilHBn6KzeHeDTL9ZpHvcn
GhkDMvc+9f8DR7Hr/mTiGjYIAvJZYiIJeYUoe0Bl2BHmGDv0tEk=
=OpZo
-END PGP SIGNATURE-

Re: solr optimize command

2018-11-30 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 11/29/18 18:53, Shawn Heisey wrote:
> On 11/29/2018 4:41 PM, Christopher Schultz wrote:
>> When mine returned (with wait=true as a request parameter), I got
>> a JSON response telling me how long it took.
> 
> That's what I would expect.
> 
> If you have to explicitly include parameters like "wait" or 
> "waitSearcher" to make it block until the optimize is done, then in
> my mind, that's a bug.  That should be the default setting.  In the
> 7.5 reference guide, I only see "waitSearcher", and it says the
> default is true.

I didn't test it without that parameter. I used it because it was
suggested to me earlier this week on this list. It may in fact be
optional. I was using Solr 7.4.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwBZjQACgkQHPApP6U8
pFj2ZBAAq741UaizWQkea2dsupyJMUAs+K0A3oHh3Z9QCJqonXdgew620HMmlj2v
iTD1ECZ0OxUy6h4fDKAUFw96FO0/86gsGGMI+BVGZjbBN46oXwpUsNik3gEj3h/E
VjEZ0Nh0qpA783ug2Ezl7zHfeEBd+TRo6tHP1T7S6xp1JFqAs+kB5hxnepipFA/Q
SFssFmdub/0TTDSfxi2taPWxkHVCJO6Atse2HGhiLiRve/ZnV1LabnZnV92OCK6q
YucL3HzrOe23mu1qGJ2uzRM6M8pVkw5QioAUm/ESOFTVv5wqTwMPQ/HGTqO7W/Mp
qU0v3D8+ziKUtCW94UGSEDC5eBOhlr270JWOplYyrxhL/szCCSZ2yVLYaIz6ZXyI
EF5jh1WUsh6w+TrPPN0obUtbN/ZH6SLFzQzocbV6ZhZZL7kqgrAGmw1TVcokR0fC
HhXj0sEukrhRGBaog3+8w21j/ACywb02kTyl21ntpo/+flKHKpitafU2juLHJswD
nb3Q2YAD2bIWX8Ms9QTtozAc+EFVmNw5j2piFprTtWYdbAfqqTS/MxKqZoy/8L49
qiS1lY3eivOGDQufhAhdTO8jTzly5V6Y6xlJ8i0n0oQiPP2FY8yZeCLphdE5Wo/i
jfoauU9WwRGWdq1dwPUe1ZAg9eft2rlvexrVyjh7vjVk92sp17M=
=0Tlc
-END PGP SIGNATURE-

Re: solr optimize command

2018-11-29 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 11/29/18 17:56, Shawn Heisey wrote:
> On 11/28/2018 6:22 PM, Wei wrote:
>> I use the following http request to start solr index
>> optimization:
>> 
>> http://localhost:8983/solr//update?skipError=true -F
>> stream.body=' '
>> 
>> The request returns status code 200 shortly, but when looking at
>> the solr instance I noticed that actual optimization has not
>> completed yet as there are more than 1 segments. Is the optimize
>> command async? What is the best approach to validate that
>> optimize is truly completed?
> 
> I do not know how that request can return a 200 before the optimize
> job completes.  The "wait" parameters (one of which Christopher
> mentioned) should all default to true, and I don't see them on your
> request.  As far as I know, the operation is NOT asynchronous.  Are
> you absolutely sure that it returned a 200? I'd like to see the
> actual response to verify.
> 
> I hate to assume you're wrong, but I think it's probably more
> likely that your HTTP request timed out because of overly
> aggressive timeout settings, probably a socket timeout.  If you
> have definitive proof that you received the 200 and a
> normal-looking response, then we'll need to look deeper.  Do you
> have the entry in solr.log for the optimize request?

When mine returned (with wait=true as a request parameter), I got a
JSON response telling me how long it took.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwAeT0ACgkQHPApP6U8
pFiXchAAxMzdVbXF5WrAC3K0E5rwg99hTh9n6WdzrtaZvGfKGCI9HyxMSrp/mZ8l
CzHXCx7gYZboW2qPHQtfZM0jknNtWHdOd5CahmXzd4vpFee85PJlWWru8cVEsnHZ
hQfNhX/kVRbFlA3lA++1gYZbl/cqdlqMdfF3pn/X3nnwto7xSsYg1vKKi0+4HW/5
yWm8AmsLYK8eluHOcpheCTOGhT9NPt5OkTsT6FxLSDfyAoSVN8GnCIKZJwRtX6Ni
m826mtc55BSb0dM6Zh3xRyLl5O1BIknIC8QaZtL1OiAb/8r3iJoc/vfhP64Jzq+5
enVORXbdqeWjPF+mJoBNPnCb14VnvzyUX+G4PhrN9jPgsWzlv2FDBwWBopOiAl/L
GZKSRRasxQ6Uwk09U2x6PPwlWCP6fC3i4xJoM++Rj1VRRCu6j7duyats9UBXlQ7M
bJcjlvAVQgaAMgndBJikPEFljyhgg+Tl8iAtf1PMUO8nPoboAwIGmZZwRsoBAPXP
rvvi1/V5KHlO6tDjQ5PLZVq9Bo71BbVDEUrJkyEUU+pAU1xZKyAhWANydCuasZ+n
CLShdIlGb4LTzRdv8L0WklTdl9BAEGa0hhNjdNNJkNxBngaX9cCyTJdZi0ImswsG
CZUlriNR0Ojue/yVDF+K5YxtQmw2slFysadX4kgNPO6LS2dwkeM=
=Xd+S
-END PGP SIGNATURE-

Re: solr optimize command

2018-11-29 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Wei,

On 11/28/18 20:22, Wei wrote:
> Hi,
> 
> I use the following http request to start solr index optimization:
> 
> http://localhost:8983/solr//update?skipError=true -F
> stream.body=' '
> 
> 
> The request returns status code 200 shortly, but when looking at
> the solr instance I noticed that actual optimization has not
> completed yet as there are more than 1 segments. Is the optimize
> command async? What is the best approach to validate that optimize
> is truly completed?

Try this instead:

http://localhost:8983/solr//update?optimize=true=true

This will wait until the operation has completed. Note that your
client (e.g. curl) may time-out after some time, so you'll want to
adjust that timeout to make sure the client doesn't give-up before the
optimization operation has completed.

As others have said, perhaps you don't actually need to optimize anythin
g.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlv//R4ACgkQHPApP6U8
pFi3+w/8C+pvp/XBqHUPeVCd7rEvU1v7mPOx+9lQ/zmU/OE3Y7rmAmVBXiiFvXeT
p2tKwhaNSrpx+MoGtaLu0GKg+nczD6K7yxOuRiltmr2KCg+6vCexJAd4yHFIt3H6
FmBnS3Couja7DwD/49pk75o/IkgXj3zok49fbt75AObttQOwXYo06yuijqN/08Wt
ieKo/4iLYLwGd3Pii8DnBTu3+IXlQG2eBbdOsNBazr2az0UrOkO+Xuj+IKv8brYr
LwMJ36e+m+Q2Gj8ZUvTQ8lTQNs7HD5giqtQXMelUXF7dcGPSwG9jCMvSTHfb+0rs
woMIt6ehRsW2CeP2Vrm2qY5gxeVIK5LwkwRcjZUq4gIDes3eiOImDLCE8Fhxxn2Z
xifKL7fQPlwdQWWXm2KDfTN+VvLVyWeA1n5z7drgD13VARdbA5c66iaIgguw0uKP
an3YC8uYbcZJolyWt/yu9r01pBTUsnxCpXDo5s5xUAz0LWdoRSNRDS872ohZxRIR
mcfCPbYUwNyhnclvzIPPcE8Z2sbCNaHcc2b5ZuavlA4PgEwFxgI1PweDXSa2Tuxg
lzuus5uS/U8lGSrkheeQDBmX6nCl2n1jsnXS4CXLGNHzH3uOVkJFmFraVNZCav16
t7SKTQc8Yc9P3AbdesG13C0iQDGjo3WLoKg7ghO3khoEL+NMKbQ=
=1wy3
-END PGP SIGNATURE-

Re: Period on-line index optimization

2018-11-28 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 11/27/18 20:47, Erick Erickson wrote:
> And do note one implication of the link Shawn gave you. Now that 
> you've optimized, you probably have one huge segment. It _will not_
> be merged unless and until it has < 2.5G "live" documents. So you
> may see your percentage of deleted documents get quite a bit larger
> than you've seen before merging kicks in. Solr 7.5 will rewrite
> this segment (singleton merge) over time as deletes accumulate, or
> you can optimize/forceMerge and it'll gradually shrink (assuming
> you do not merge down to 1 segment).

Ack. It sounds like I shouldn't worry too much about "optimization" at
all. If I find that I have a performance problem (hah! I'm comparing
the performance to a relational table-scan, which was intolerably
long), I can investigate whether or not optimization will help me.

> Oh, and the admin UI segments view is misleading prior to Solr
> 7.5. Hover over each one and you'll see the number of deleted docs.
> It's _supposed_ to be proportional to the number of deleted docs,
> with light gray being live docs and dark gray being deleted, but
> the calculation was off. If you hover over you'll see the raw
> numbers and see what I mean.

Thanks for this clarification. I'm using 7.4.0, so I think that's what
was confusing me.

I'm fairly certain to upgrade to 7.5 in the next few weeks. For me,
it's basically a untar/stop/ln/start operation as long as testing goes
well.

- -chris

> On Tue, Nov 27, 2018 at 2:11 PM Shawn Heisey 
> wrote:
>> 
>> On 11/27/2018 10:04 AM, Christopher Schultz wrote:
>>> So, it's pretty much like GC promotion: the number of live
>>> objects is really the only things that matters?
>> 
>> That's probably a better analogy than most anything else I could
>> come up with.
>> 
>> Lucene must completely reconstruct all of the index data from
>> the documents that haven't been marked as deleted.  The fastest
>> I've ever seen an optimize proceed is about 30 megabytes per
>> second, even on RAID10 disk subsystems that are capable of far
>> faster sustained transfer rates.  The operation strongly impacts
>> CPU and garbage generation, in addition to the I/O impact.
>> 
>>> I was thinking once per day. AFAIK, this index hasn't been
>>> optimized since it was first built which was a few months ago.
>> 
>> For an index that small, I wouldn't expect a once-per-day
>> optimization to have much impact on overall operation.  Even for
>> big indexes, if you can do the operation when traffic on your
>> system is very low, users might never even notice.
>> 
>>> We aren't explicitly deleting anything, ever. The only deletes 
>>> occurring should be when we perform an update() on a document,
>>> and Solr/Lucene automatically deletes the existing document
>>> with the same id
>> 
>> If you do not use deleteByQuery, then ongoing index updates and
>> segment merging (which is what an optimize is) will not interfere
>> with each other, as long as you're using version 4.0 or later.
>> 3.6 and earlier were not able to readily mix merging with ongoing
>> indexing operations.
>> 
>>> I'd want to schedule this thing with cron, so curl is better
>>> for me. "nohup optimize &" is fine with me, especially if it
>>> will give me stats on how long the optimization actually took.
>> 
>> If you want to know how long it takes, it's probably better to
>> throw the whole script into the background rather than the curl
>> itself.  But you're headed in the right general direction.  Just
>> a few details to think about.
>> 
>>> I have dev and test environments so I have plenty of places to 
>>> play-around. I can even load my production index into dev to
>>> see how long the whole 1M document index will take to optimize,
>>> though the number of segments in the index will be different,
>>> unless I just straight-up copy the index files from the disk. I
>>> probably won't do that because I'd prefer not to take-down the
>>> index long enough to take a copy.
>> 
>> If you're dealing with the small index, I wouldn't expect copying
>> the index data while the machine is online to be problematic --
>> the I/O load would be small.  But if you're running on Windows, I
>> wouldn't be 100% sure that you could copy index data that's in
>> use -- Windows does odd things with file locking that aren't a
>> problem on most other operating systems.
>> 
>>> You skipped question 4 which was "can I update my index during
>>> an optimization&qu

Re: Period on-line index optimization

2018-11-27 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Walter,

On 11/27/18 12:31, Walter Underwood wrote:
> Optimize is just forcing a full merge. Solr does merges
> automatically in the background.
Understood.

> It has been automatically doing merges for the months you’ve been 
> using it. Let it continue. Don’t bother with optimize.
Fair enough.

> It was a huge mistake to name that function “optimize”. Ultraseek
> had a button labeled “Merge”.
I understand that "optimize" makes it sounds like, without performing
that operation, that the index is "not optimized" which sounds bad.
I'm not hung-up on the terminology.

In my live index, I can see total 20 segments. 7 of them are "all
gray" and the other 13 are at various levels of "dark grayness". I
haven't been able to find a reference for what those colors mean, but
they don't seem to be correlated with any data I can see on each segment
.

When I have run an "optimize" operation on a test index, I can see a
single segment which is shown all in "light gray", whatever that means.

Other than wasting my time, are there any negative consequences for
periodically "optimizing" (or merging) the index?

Thanks,
- -chris

>> On Nov 27, 2018, at 9:04 AM, Christopher Schultz
>>  wrote:
>> 
> Shawn,
> 
> On 11/27/18 11:01, Shawn Heisey wrote:
>>>> On 11/27/2018 7:47 AM, Christopher Schultz wrote:
>>>>> I've got a single-core Solr instance with something like 1M
>>>>> small documents in it. It contains user information for
>>>>> fast-lookups, and it gets updated any time relevant
>>>>> user-info changes.
>>>>> 
>>>>> Here's the basic info from the Core Dashboard:
>>>> 
>>>> 
>>>> 
>>>>> I'm wondering how often it makes sense to "optimize" my
>>>>> index, because there is plenty of turnover of existing
>>>>> documents. That is, plenty of existing users update their
>>>>> info and therefore the Lucene index is being updated as
>>>>> well -- causing a document-delete and document-add
>>>>> operation to occur. My understanding is that leaves a lot
>>>>> of dead space over time, and I'm assuming that it might
>>>>> even slow things down as the ratio of useful data to total
>>>>> data is reduced.
>>>> 
>>>> The percentage of deleted documents here is fairly low. About
>>>> 7.6 percent.  Doing an optimize with deleted percentage that
>>>> low may not be worthwhile.
>>>> 
>>>> On the other hand, it *would* improve performance by a little
>>>> bit to optimize.  For the index with the stats you mentioned,
>>>> you'd be going from 15 segments to one segment.  And with an
>>>> index size of under 300 MB, the optimize operation would
>>>> complete pretty quickly - likely a few minutes, maybe even
>>>> less than one minute.
> Okay. What I really don't want to do is interrupt normal
> operation.
> 
>>>>> Presumably, optimizing more often will reduce the time to 
>>>>> perform a single optimization operation, yes?
>>>> 
>>>> No, not really.  It depends on what documents are in the
>>>> index, not so much on whether an optimization was done
>>>> previously. Subsequent optimizes will take about as long as
>>>> the previous optimize did.
> 
> So, it's pretty much like GC promotion: the number of live objects
> is really the only things that matters?
> 
>>>>> Anyhow, I'd like to know a few things:
>>>>> 
>>>>> 1. Is manually-triggered optimization even worth doing at
>>>>> all?
>>>> 
>>>> Maybe.  See how long it takes, how much impact it has on 
>>>> performance while it's happening, and see if you can get an 
>>>> estimate of how much extra performance you get from it once
>>>> it's done.  If the impact is low and/or the benefit is high,
>>>> then by all means, optimize regularly.
>>>> 
>>>>> 2. If so, how often? Or, maybe not "how often [in 
>>>>> hours/days/months]" but maybe "how often [in deletes,
>>>>> etc.]"?
>>>> 
>>>> For an index that size, I would say you should aim for an
>>>> interval between once an hour and once every 24 hours.  Set
>>>> up this timing based on what kind of impact the optimize
>>>> operation has on performance while it's occurring.  Might be
>>>> best to

Re: Period on-line index optimization

2018-11-27 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 11/27/18 11:01, Shawn Heisey wrote:
> On 11/27/2018 7:47 AM, Christopher Schultz wrote:
>> I've got a single-core Solr instance with something like 1M small
>> documents in it. It contains user information for fast-lookups,
>> and it gets updated any time relevant user-info changes.
>> 
>> Here's the basic info from the Core Dashboard:
> 
> 
> 
>> I'm wondering how often it makes sense to "optimize" my index, 
>> because there is plenty of turnover of existing documents. That 
>> is, plenty of existing users update their info and therefore the 
>> Lucene index is being updated as well -- causing a 
>> document-delete and document-add operation to occur. My 
>> understanding is that leaves a lot of dead space over time, and 
>> I'm assuming that it might even slow things down as the ratio of 
>> useful data to total data is reduced.
> 
> The percentage of deleted documents here is fairly low. About 7.6 
> percent.  Doing an optimize with deleted percentage that low may 
> not be worthwhile.
> 
> On the other hand, it *would* improve performance by a little bit 
> to optimize.  For the index with the stats you mentioned, you'd be 
> going from 15 segments to one segment.  And with an index size of 
> under 300 MB, the optimize operation would complete pretty quickly 
> - likely a few minutes, maybe even less than one minute.
Okay. What I really don't want to do is interrupt normal operation.

>> Presumably, optimizing more often will reduce the time to
>> perform a single optimization operation, yes?
> 
> No, not really.  It depends on what documents are in the index,
> not so much on whether an optimization was done previously. 
> Subsequent optimizes will take about as long as the previous 
> optimize did.

So, it's pretty much like GC promotion: the number of live objects is
really the only things that matters?

>> Anyhow, I'd like to know a few things:
>> 
>> 1. Is manually-triggered optimization even worth doing at all?
> 
> Maybe.  See how long it takes, how much impact it has on 
> performance while it's happening, and see if you can get an 
> estimate of how much extra performance you get from it once it's 
> done.  If the impact is low and/or the benefit is high, then by
> all means, optimize regularly.
> 
>> 2. If so, how often? Or, maybe not "how often [in 
>> hours/days/months]" but maybe "how often [in deletes, etc.]"?
> 
> For an index that size, I would say you should aim for an interval
>  between once an hour and once every 24 hours.  Set up this timing 
> based on what kind of impact the optimize operation has on 
> performance while it's occurring.  Might be best to do it once a 
> day at a low activity time, perhaps 03:00.  With indexes slightly 
> bigger than that, I was doing an optimize once an hour. And for
> the bigger indexes, once a day.

I was thinking once per day. AFAIK, this index hasn't been optimized
since it was first built which was a few months ago.

>> 3. During the optimization operation, can clients still issue 
>> (read) queries? If so, will they wait until the optimization 
>> operation has completed?
> 
> Yes.  And as long as you don't use deleteByQuery, you can even 
> update the index while it's optimizing.  The deleteByQuery 
> operation will cause problems, especially when the index gets 
> large.  With your small index size, you might not even notice the 
> problems that mixing optimize and deleteByQuery will cause. 
> Replacing deleteByQuery with a standard query to retrieve ID
> values and then doing a deleteById will get rid of the problems
> that DBQ causes with optimize.

We aren't explicitly deleting anything, ever. The only deletes
occurring should be when we perform an update() on a document, and
Solr/Lucene automatically deletes the existing document with the same id
.

>> 5. Is it possible to abort an optimization operation if it's 
>> taking too long, and simply discard the new data -- basically, 
>> fall-back to the previously-existing index data?
> 
> I am not aware of a way to abort an optimize.  I suppose there 
> might be one ... but in general it doesn't sound like a good idea 
> to me, even if it's possible.
> 
>> 6. What's a good way to trigger an optimization operation? I 
>> didn't see anything directly in the web UI, but there is an 
>> "optimize" method in the Solr/J client. If I can fire-off a 
>> fire-and-forget "optimize" request via e.g. curl or similar tool 
>> rather than writing a Java client, that would be slightly more 
>> convenient for me.
> 
> Removal of the optimize button fr

Period on-line index optimization

2018-11-27 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I've got a single-core Solr instance with something like 1M small
documents in it. It contains user information for fast-lookups, and it
gets updated any time relevant user-info changes.

Here's the basic info from the Core Dashboard:

Last Modified:
less than a minute ago
Num Docs:
1011023
Max Doc:
1095364
Heap Memory Usage:
-1
Deleted Docs:
84341
Version:
2582476
Segment Count:
15
Current: Ø


Replication (Master)
Version Gen Size
Master (Searching)  
1543329227929   491727  277.23 MB

Each document add/update operation has an immediate explicit "commit"
operation, which may be unnecessary, but it's there in case it makes
any difference for this question.

I'm wondering how often it makes sense to "optimize" my index, because
there is plenty of turnover of existing documents. That is, plenty of
existing users update their info and therefore the Lucene index is
being updated as well -- causing a document-delete and document-add
operation to occur. My understanding is that leaves a lot of dead
space over time, and I'm assuming that it might even slow things down
as the ratio of useful data to total data is reduced.

Presumably, optimizing more often will reduce the time to perform a
single optimization operation, yes?

Anyhow, I'd like to know a few things:

1. Is manually-triggered optimization even worth doing at all?

2. If so, how often? Or, maybe not "how often [in hours/days/months]"
but maybe "how often [in deletes, etc.]"?

3. During the optimization operation, can clients still issue (read)
queries? If so, will they wait until the optimization operation has
completed?

4. During the optimization operation, can clients still issue writes?
If so, will they wait until the optimization operation has completed?

5. Is it possible to abort an optimization operation if it's taking
too long, and simply discard the new data -- basically, fall-back to
the previously-existing index data?

6. What's a good way to trigger an optimization operation? I didn't
see anything directly in the web UI, but there is an "optimize" method
in the Solr/J client. If I can fire-off a fire-and-forget "optimize"
request via e.g. curl or similar tool rather than writing a Java
client, that would be slightly more convenient for me.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlv9WOgACgkQHPApP6U8
pFjPhQ//TOkMwES1ytAbugFE/bZdpwff9LS3sRbCEEL6Bbl9yeqZMXqDf652p2CN
P9EusGW0WTSvhRaJb50H+jo4y5QxJmV36aBkMej7/o4yFw0hIRSqqihlbEFAVkI1
VMGWtr7s0Vv9O+/Wj0MP8FAizwm8d7nYl03rTvfY0b+BESOQHXv5I8DEai1+/mgF
Mx49HG82qXo/9OZocrv4tal97juF7UcNDowVlnk0wcuk5LjEuilhzpOXtcTG9QmB
Nc4H//d6hcDN0tp/az5hY1EoU3xmSdW2m243kgdzjVjz/Q9FotB0jAo3WGbD5EiB
nmM1Yp0bKfRX/xLPHbtJ/wlQSSY4Dm/E0Y5Nb5fZFjnHtEke7/hWX1Qxps28gOs+
hXfm4WyjaTirnJk5h+I3wVJvzaHycD0vIFNwJ18JkLpPaVZ56iDfHcKVc5eHlWaa
gaKYyLhz8DluZC//ydVFAbqDy7xOIeh/fiACFHM/SH9KjdempaVD1KrlO1/fxG0v
U9Z4xI5GladTUnelcvvggCbl+9wFe3pO8xLqN4NMdftn5CNDFDTIs9Diph19jQJr
sf7ETDQwWBebc6BesXdmFyKT8zHzX+x9uU3LtF9Tww5H0AS4JseEfogB3bsF6r3X
MlRId02UPSuAMmzbMLn52jX0NljbMRNN1rHy3tVGpJD9OPgU3A8=
=pb5Z
-END PGP SIGNATURE-

Re: Solr JVM Memory settings

2018-10-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hendrik,

On 10/12/18 02:36, Hendrik Haddorp wrote:
> Those constraints can be easily set if you are using Docker. The
> problem is however that at least up to Oracle Java 8, and I believe
> quite a bit further, the JVM is not at all aware about those
> limits. That's why when running Solr in Docker you really need to
> make sure that you set the memory limits lower. I usually set the
> heap and metaspace size. How you set them depends again a bit on
> your Solr configuration. I prefer the JVM to crash due to memory
> limits rather then the Linux OOM Killer killing the JVM as the
> OutOfMemoryError from the JVM does at least state what memory was
> out.

Limiting the native memory used by attempting to limit the heap is not
actually limiting the native memory used. It's just an attempt to do
so. If you limit the native memory using OS limits (or, using Docker,
simply make it look like there is less system memory) then you haven't
actually achieved anything. You could have done that simply by
lowering heap values and avoided the complexity of Docker, etc.

- -chris

> On 11.10.2018 16:45, Christopher Schultz wrote: Shawn,
> 
> On 10/11/18 12:54 AM, Shawn Heisey wrote:
>>>> On 10/10/2018 10:08 PM, Sourav Moitra wrote:
>>>>> We have a Solr server with 8gb of memory. We are using solr
>>>>> in cloud mode, solr version is 7.5, Java version is Oracle
>>>>> Java 9 and settings for Xmx and Xms value is 2g but we are
>>>>> observing that the RAM getting used to 98% when doing
>>>>> indexing.
>>>>> 
>>>>> How can I ensure that SolrCloud doesn't use more than N GB
>>>>> of memory ?
>>>> Where precisely are you seeing the 98% usage?  It is
>>>> completely normal for a modern operating system to report
>>>> that almost all the system memory is in use, at least after
>>>> the system has been shuffling a lot of data.  All modern
>>>> operating systems will use memory that has not been
>>>> specifically allocated to programs for disk caching purposes,
>>>> and system information tools will generally indicate that
>>>> this memory is in use, even though it can be instantly
>>>> claimed by any program that requests it.
>>>> 
>>>> https://en.wikipedia.org/wiki/Page_cache
>>>> 
>>>> If you tell a Java program that it is limited to a 2GB heap,
>>>> then that program will never use more than 2GB, plus a little
>>>> extra for the java runtime itself.  I cannot give you an
>>>> exact figure for that little bit extra.  But every bit of
>>>> data on disk that Solr accesses will end up (at least
>>>> temporarily) in the operating system's disk cache -- using
>>>> that unallocated memory.
>>>> 
>>>> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
> To be fair, the JVM can use *much more* memory than you have
> specified for your Java heap. It's just that the Java heap itself
> wont exceed those values.
> 
> The JVM uses quite a bit of native memory which isn't counted in
> the Java heap. There is only one way I know of to control that, and
> it's to set a process-limit at the OS level on the amount of
> memory allowed. I'm not sure how sensitive to those limits the JVM
> actually is, so attempting to artificially constrain the JVM might
> end up with a native OOM crash.
> 
> -chris
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlvA4O0ACgkQHPApP6U8
pFiCHg/+P+/yoSrvMd2uMyDK16nMCOIdxAL1gdS++DqS+qPmch1BHJTA9nuHybF4
j6WElpCI7Q3HP/sgsGE8kHE6Kg+DFJNz7mGJqgXjnSkm90LzETRFMqa959fTgBo6
SILD4n4LnZI844VoaKb2gIVibr804hloxX5UDe0XYFp3EtcVi4QMC5Q2ovn8+RoJ
S/LJx/VQi3AqtcCaEYAAKpYrKxO3OkoIKnN+oC55ag/16zh9StT2TUI03bBslcxn
PkS5zdsSmsS7NydSR4Gn4C7wAGyL3hGoU6pD+GhvYE9EF29KxHXFSIe2FJQ6mdRf
ikZvm17U8OFNwqlB4OOLziGvOkcmIgtqchnhUm80Qwtn0ZMbql2zwlIhOSPWbuPL
lq3F09p1QBqPjbxJdrcmpoSFH8jvmIPdrPOl3BbPEmDzNdnF03sEGP5gDyJ9/INB
AD/QhqvQEKUtMBPX+1/9dxOm+JyUDlARZQ7p4k1BeFjl2BI8imLUK/c6JlWJ757G
QWk+0Ff3R02va+ITWNvGs5C1uOnu2g58eqAggREPWXmXAj9nqJ5EyPkNAaGJBheo
NasGNSXVnjN+hk4QlMTAJ3C5u0Q5lW3HCOXj8Mufo7LE8M96OjRkM09o87NG9sGT
EdX7V8Ypw758Jt9xcms6U9tC2TqekJ9AYu+VLsoGa4OZgy5hfDk=
=Sq+f
-END PGP SIGNATURE-

Re: Solr JVM Memory settings

2018-10-11 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 10/11/18 12:54 AM, Shawn Heisey wrote:
> On 10/10/2018 10:08 PM, Sourav Moitra wrote:
>> We have a Solr server with 8gb of memory. We are using solr in
>> cloud mode, solr version is 7.5, Java version is Oracle Java 9
>> and settings for Xmx and Xms value is 2g but we are observing
>> that the RAM getting used to 98% when doing indexing.
>> 
>> How can I ensure that SolrCloud doesn't use more than N GB of
>> memory ?
> 
> Where precisely are you seeing the 98% usage?  It is completely
> normal for a modern operating system to report that almost all the
> system memory is in use, at least after the system has been
> shuffling a lot of data.  All modern operating systems will use
> memory that has not been specifically allocated to programs for
> disk caching purposes, and system information tools will generally
> indicate that this memory is in use, even though it can be
> instantly claimed by any program that requests it.
> 
> https://en.wikipedia.org/wiki/Page_cache
> 
> If you tell a Java program that it is limited to a 2GB heap, then
> that program will never use more than 2GB, plus a little extra for
> the java runtime itself.  I cannot give you an exact figure for
> that little bit extra.  But every bit of data on disk that Solr
> accesses will end up (at least temporarily) in the operating
> system's disk cache -- using that unallocated memory.
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

To be fair, the JVM can use *much more* memory than you have specified
for your Java heap. It's just that the Java heap itself wont exceed
those values.

The JVM uses quite a bit of native memory which isn't counted in the
Java heap. There is only one way I know of to control that, and it's
to set a process-limit at the OS level on the amount of memory
allowed. I'm not sure how sensitive to those limits the JVM actually
is, so attempting to artificially constrain the JVM might end up with
a native OOM crash.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlu/YgoACgkQHPApP6U8
pFjcbBAAgYegk20pYvfu3vcrAKxj3s+RSMGRPZ+nN5g0KYQFuhFgptYE+TqjLfBX
geekQUNqNUHO5psMA5q/6m6b3LwpqrMxJiapv0wWQ2wPah21CgLs/P/iG+elNQ63
H0ZXbe3wX0P0onZbP4+sfDyzhujZ+5+gMooK87o8Q4z91hIVX1EZfM4lcaZ3pbnb
JJ44YorWGPpXjQNEtOHfS7l/Q+8+6+XfEyfKha3JpRFcwcqgLpv23Koy4xgxgYr+
PMqfjptMBMjZ04xSdd491crm2yZowv3KH1Ss8v/L51rknGYPxCEkdKvPrUlpn+Rb
4WnQS6H//dJvQaLum/qR9Jxd+3vc13K7Mn++5Lu+jMbeEgaJU2hD4/ap/KMtFCqn
eIXl6HQYPW36sVcm/MIpkRvAgx8vri17sd3/5sOYaETrp4SMxMN5W44GvgDdkbGF
R9/tVBCFWb3p+o8eSKUf7QmARiN69DHGVwtQHWMIp8K9893IeHUNgVXKD7281zLB
AjHPc7QTvAn4xne0X9lvQjr+YKOPxd9FFqMBejdKht9aBFQvApma9LtJT3FInrob
QkSIx594KhoRltRy7E9t3XuWWGg8ujiuzKl6SEPsgXUC2Opwr4Wwu1yn9dCWkFJz
RzCKbaDBaNmrK6HSEsoNvS+yQPksPxM8MuchFaCAMZpVOsobCM0=
=77dD
-END PGP SIGNATURE-

Re: Auto recovery of a failed Solr Cloud Node?

2018-09-28 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 9/27/18 10:00, Shawn Heisey wrote:
> On 9/27/2018 7:24 AM, Kimber, Mike wrote:
>> I'm trying to determine if there is any health check available
>> to determine the above and then if the issue happens then an
>> automated mechanism in SolrCloud to restart the instance. Or is
>> this something we have to code ourselves?
> 
> As shipped by the project, Solr will never restart itself 
> automatically.  If it dies, it's dead until you start it again,
> unless you implement something to restart it automatically.This is
> intentional -- Solr almost never dies unless there's some kind of
> problem -- not enough memory, corrupt software, etc.If Solr *does*
> die, you need to figure out why and fix it, not rely on an
> automatic restart.

I thought someone recently mentioned (but I cannot find a reference,
sorry) that Solr would automatically restart if an OutOfMemoryError
was encountered.

Is that only for single-note Solr (i.e. non-cloud/ZK)?

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluuqJ8ACgkQHPApP6U8
pFgxNQ/7BFG6RbF1I/jQ0Pevs4Yum4BElkAEknEv7JLar9sWwuGCNBe4Zj1wgpaF
Gwmkt9TsQEs7/1amR4nUu1SUAaFkhw3R920/5ad/mz+qzvtyV0VyEEYhiJrAxCoH
EA+fxKYjy/9DgZ5ZFLaBbOl0JUk+6uqoaEX7RoNAZxyGjqVzeVR7JXBzeNl1Wagg
9wiq2MQrP1o8xwsBvQzQPO/sB6YZOlGLiAiAcJ7NAlt7RF4V5XvvG1fz7NM84w1e
iKImZiBorxEl6eangxr8TU2HqkDdfMHxXmAGlmqGuGEkut/agPjM1HeR63vzjy1p
Jslr3Ef2+NIslyMg0jk4e6VBppg1wHJOrrqOyxg0xlNvvJIa7XoinQH3zmu48pFN
fLd4cXXHcZ2Xn4X7g74ey1o4HZyxgY+hu2aSNRUtQrSpcTO3WeF4lYe8cHk871K5
7YF9jJ7SVZblHPqzLNxj1BItmh0FyRflfW7XMPGYHzCs2dKS0IlNtSJYsSZsYKpn
Z85nct0/gQ6uF2LMJdL7MKVbdyn/jtPndIHVSq6fP867r7kCtKY20njnnmjbQFd0
U5Ox+LJ+NU5nKBsckHsfS4TEr5PrUqlAhesgLhNmAhd1GL8iXYvBCLeE/fCNpNjj
ixGNDKX9//z00TOOULyQVzwRjHvFLQyJ+LBmLf/11CxPIt3vxVg=
=2fgU
-END PGP SIGNATURE-

Re: Java version 11 for solr 7.5?

2018-09-26 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Jeff,

On 9/26/18 11:35, Jeff Courtade wrote:
> My concern with using g1 is solely based on finding this. Does
> anyone have any information on this?
> 
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_
.2F_OpenJDK_Bugs
>
>  "Do not, under any circumstances, run Lucene with the G1 garbage
> collector. Lucene's test suite fails with the G1 garbage collector
> on a regular basis, including bugs that cause index corruption.
> There is no person on this planet that seems to understand such
> bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open
> for over a year), so don't count on the situation changing soon.
> This information is not out of date, and don't think that the next
> oracle java release will fix the situation."

That language is 3 years old and likely just hasn't been updated after
it was no longer relevant. Also, it isn't attributed to anyone in
particular (it's anonymous), so ... maybe it was one person's opinion
and not a project-initiated warning.

- -chris

> On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood
>  wrote:
> 
>> We’ve been running G1 in prod for at least 18 months. Our biggest
>> cluster is 48 machines, each with 36 CPUs, running 6.6.2. We also
>> run it on our 4.10.4 master/slave cluster.
>> 
>> wunder Walter Underwood wun...@wunderwood.org 
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Sep 26, 2018, at 7:37 AM, Jeff Courtade
>>> 
>> wrote:
>>> 
>>> Thanks for that... I am just starting to look at this I was
>>> unaware of the license debacle.
>>> 
>>> Automated testing up to 10 is great.
>>> 
>>> I am still curious about the GC1 being supported now...
>>> 
>>> On Wed, Sep 26, 2018 at 10:25 AM Zisis T. 
>>> wrote:
>>> 
 Jeff Courtade wrote
> Can we use GC1 garbage collection yet or do we still need
> to use CMS?
 
 I believe you should be safe to go with G1. We've applied it
 in in a
>> Solr
 6.6 cluster with 10 shards, 3 replicas per shard and an index
 of about 500GB (1,5T counting all replicas) and it works
 extremely well (throughput > 99%). The use-case includes
 complex search queries and faceting. There is also this post
 you can use as a starting point
 
 
>> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-p
roduction/





>> 
- --
 Sent from:
 http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
 
>>> --
>>> 
>>> Jeff Courtade M: 240.507.6116 <(240)%20507-6116>
>> 
>> --
> 
> Jeff Courtade M: 240.507.6116
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlur5i0ACgkQHPApP6U8
pFioLw/9HzPmNo1wtsfZDIsJjXy4i+F0YYqBFKRqXcPH8mLgpxicE5enRVI9he6p
1Z3Mz0wzFj/H91eWktmGyNSKSmFjkYI2IgCBrsZv1gPDFn3mI3TwapJgTR0J4GAg
wXB/9GRHuCCTz7qvfQexBOwOt25OKVOhcvNFVI8bxV0hFl58Nlo56Qzt33X/JS32
jH2jIlz77pal1t5ZhnXJwCSWQyWsLnr5GtoxDisvvOl1o3Ey/WIllvCe8x7M+PvA
0/DIK/5niTSCwcv0LVCPIWsE/HCjsSWfdhnhtTnu1088OTKwb2dsa7wyBJItZUzw
fCTcmcGclViGUa2QAnXNFiVPj1y0PhFxAPMCU6mWPerCSH6cYn5neicsp2AYovoj
dRcs4LGrGf0S7PVJBq/DQdb44XbzvFkkp2SjS9WAnLpBv7RwP4bWfDvMCJsZWJOU
8J2r4ZbkVUjByQ3mAXMZN7bKC6hHBQLLzAwodloAV0OWHJ+Io96flTclDRPt4N6e
J8olEQezDKcgkZDg0GV8I9WxUzeTHI+QvnZxUzwsT/sJUPgxjSDjHlous5HU29ay
6lynoEjVFJd4yYAwh6gaRPMw34xKFT6a62D6bDmcL0MqPCpbcbOny+kgx0k7bzl5
FNsapJ5vCIaG0/tPTuWEY/jaqmhNNznXDr+sEX5l8Sk1ZQz8+/U=
=Y8qg
-END PGP SIGNATURE-

Re: [SolrJ Client] Error calling add: connection is still allocated

2018-09-21 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

On 9/18/18 11:10, Christopher Schultz wrote:
> All,
> 
> Our single-instance Solr server is just getting its first taste of 
> production load, and I'm seeing this periodically:
> 
> java.lang.IllegalStateException: Connection is still allocated
> 
> The stack trace shows it's coming from HTTP Client as called from 
> within Solr.
> 
> We are using SolrJ 7.2.1 and Solr (server) 7.4.0.
> 
> Our code looks something like this:
> 
> private HashMap CLIENT_REGISTRY = new 
> HashMap();
> 
> synchronized HttpSolrClient getSolrClient(String url) throws
> ServiceException, SolrServerException, IOException, 
> GeneralSecurityException { HttpSolrClient solrClient =
> CLIENT_REGISTRY.get(url);
> 
> if(null == solrClient) { log.info("Creating new HttpSolrClient
> connected to " + url);
> 
> solrClient = new HttpSolrClient.Builder(url) 
> .withHttpClient(getHttpClient()) .build();
> 
> solrClient.ping();
> 
> CLIENT_REGISTRY.put(url, solrClient); }
> 
> return solrClient; }
> 
> 
> [here's the code that uses the above]
> 
> SolrClient solr = getSolrRegistry().getSolrClient(url);
> 
> SolrInputDocument doc = new SolrInputDocument();
> 
> // Add stuff to the document
> 
> solr.add(doc); solr.commit();
> 
> That's it.
> 
> Other than not really needing the "commit" at the end, is there 
> anything wrong with how we are using SolrJ client? Are instances
> of SolrJClient not thread-safe? My assumption was that they were 
> threadsafe and that HTTP Client would manage the connection pool
> under the covers.
> 
> Here is the full stack trace:
> 
> com.chadis.api.business.RegistrationProcessor- Error processing 
> registration request java.lang.IllegalStateException: Connection is
> still allocated at
> org.apache.http.util.Asserts.check(Asserts.java:34) at 
> org.apache.http.impl.conn.BasicHttpClientConnectionManager.getConnecti
on
>
> 
(BasicHttpClientConnectionManager.java:251)
> at 
> org.apache.http.impl.conn.BasicHttpClientConnectionManager$1.get(Basic
Ht
>
> 
tpClientConnectionManager.java:202)
> at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.j
av
>
> 
a:191)
> at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:
18
>
> 
5)
> at 
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>
> 
at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:
11
>
> 
1)
> at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpC
li
>
> 
ent.java:185)
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC
li
>
> 
ent.java:83)
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC
li
>
> 
ent.java:56)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSol
rC
>
> 
lient.java:542)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClien
t.
>
> 
java:255)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClien
t.
>
> 
java:244)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>
> 
at
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173) 
> at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138) 
> at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152) at
> [my code, calling SolrClient.add()]
> 
> Any ideas?
> 
> Thanks, -chris
> 

For those interested, it looks like I was naïvely using
BasicHttpClientConnectionManager, which is totally inappropriate in a
multi-user threaded environment.

I switched to PooledHttpClientConnectionManager and that seems to be
working much better, now. :)

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlulHN0ACgkQHPApP6U8
pFiWnRAAxIprILEjjF6rhwVZfmIFjLo8G5QNqMLZZ/4lIPEIDE8ojnCT8oiFpOru
LakIn60DkQvgIioihvNtILjO7YJUPr0Pmoq+1feCQSSRtIFwBRoyvYahzUfx0v55
rWMRcJwWo/Vr1YsyQpH33O80F07himXxqmpiQeaQd+t+d9WYOpmBn8ENuG8QEd9g
fc6yELLpJSpC6DFslCjRtAhMVNt3thdpbmYBwuKtoxHV8tuenXoxm/QQRLzJia/J
AWsNB9boYNPF1T8rGt+eft4wej71t8ac00jzj+ylkQjPpPdexp+NSEGDRCfYoz6I
bEVIVEy39f1SoyAlBnrS1QJqas9FwzMPd2tNv3y5fFCbYnKnHh50YaLgv1JAUali
UQVDtlKGwPOrbbB2SBJiX3dK263RCQSSP9eJIDvyrGzRyRAgE9fzsVmvpokicMzx
ZFiCZuIPPvmmGDvXBQ+lmtBvbav6ajsU3XyGEu+aawo6Lo7MgbdcLCPj839GR5Yd
tDxMM2O8Wpkr4FRo7hbMlKJb5KoWJNtHjs5QQNFYUFmYwSXnU9OwH7B3fCpPVC2t
OfBT5EKb8L1TWPog3zxFzrY5MQgJ2wSfBBphh2zeiFUSSLzb6T6F+ryv3rAzRO1U
6u6pfdf8AZ22gonPXs/mM4HbsL8dpP1Oyb6poHlaprxggKP7XqQ=
=sHv2
-END PGP SIGNATURE-

Re: Command Line Indexer

2018-09-18 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Dan,

On 9/18/18 2:51 PM, Dan Brown wrote:
> I've been working on this for a while and it's finally in a state
> where it's ready for public consumption.
> 
> This is a command line indexer that will index CSV or JSON
> documents: https://github.com/likethecolor/solr-indexer
> 
> There are quite a few parameters/options that can be set.
> 
> One thing to note is that it will update individual fields.  That
> is, unlike the Data Import Handler, it does not replace entire
> documents.
> 
> Please check it out and let me know what you think.

How is this different from the bin/post tool that ships with Solr?

Or is that you meant when you said "this is unlike the Data Import
Handler".

AIUI, Solr doesn't support updating a single field in a document. The
document is replaced no matter how hard to try to be surgical about
updating a single field.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
=AmkJ
-END PGP SIGNATURE-

Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Walter,

On 9/18/18 11:24, Walter Underwood wrote:
> It isn’t very clear from that page, but the two backup methods make
> a copy of the indexes in a commit-aware way. That is all. One
> method copies them to a new server, the other to files in the data
> directory.
> 
> Database backups generally have a separate backup format which is 
> independent of the database version. For example, mysqldump
> generates a backup as SQL statements.
> 
> The Solr backup is version-locked, because it is just a copy of the
> index files. People who are used to database backups might be very
> surprised when they could not load a Solr backup into a server with
> a different version or on a different architecture.
> 
> The only version-independent restore in Solr is to reload the data
> from the source repository.

Thanks for the explanation.

We recently re-built from source and it took about 10 minutes. If we
can get better performance for a restore starting with a "backup"
(which is likely), we'll probably go ahead and do that, with the
understanding that the ultimate fallback is reload-from-source.

When upgrading to a new version of Solr, what are the rules for when
you have to discard your whole index and reload from source? We have
been in the 7.x line since we began development and testing and have
not had any reason to reload from source so far. (Well, except when we
had to make schema changes.)

Thanks,
- -chris

>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz
>>  wrote:
>> 
> Walter,
> 
> On 9/17/18 11:39, Walter Underwood wrote:
>>>> Do not use Solr as a database. It was never designed to be a 
>>>> database. It is missing a lot of features that are normal in 
>>>> databases.
>>>> 
>>>> [...] * no real backups (Solr backup is a cold server, not a 
>>>> dump/load)
> 
> I'm just curious... if Solr has "no real backups", why is there a 
> complete client API for performing backups and restores?
> 
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.
ht
>
> 
ml
> 
> Thanks, -chris
> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhQlkACgkQHPApP6U8
pFgcyRAAm4/FeeGn3eGv4CwNVfc9GrsUYc4/YexdwRT7oFUgqTC2kYeegj/YAgm3
ZwgfLDkDL0HR51i/pp4UG8MDTB5NFtp8Jg6+JSE4SutAA72N6vnwnC1Z/T52i0xG
OqT0lFKeIL7Tt5c0FffbAMx5rgbFkzWHNWgFFqYFB0WZEzj4JM6rmAiDqLunRGPA
xAZUnZCRMXhcVZT0bmmnSGlyU+JHL0ZQrJD/WX4DOJo2ZyAvP7pSYBEU+nTfyjzJ
kE3rx1W9o269yc052FJTk5rRADuHIdirQQ/SrUN3O7Nn7Hqqi2/6sqyM34CF6wmX
IPv9frb/WTvXQ3nsFYmQVB1jEBBr5S+9pztO3jOtUbGGKCjBpVGDcOXJVBwEDzPW
yII5EjpjkoYwVB6shUI2nfaM/Y6r4aQLrZO6A5FFePhQTm6BGa/i2i1A1uLqfvHY
WMmv/QMYqXZu7hXW6l5NKpO1AtSKTZBq8iXi9BiOXSHNSxo9mT9kPLu40Uh63Gyp
EHI/SfAPWNwOj01pkbyV+siyhAWBVWpolN1SinnW3ZR16Yddd2lRmNxdfVCC32pL
OfRxrChtZ736kvm4ELzmUAUjITxpZf7AFgsrB6zyTlPRn/jvnW7sRsIsOa4BHdGC
e4oCzK7waITu6jam4Zz6e3efyxSDfT2YZ7811L098mody1n2g5k=
=PaVE
-END PGP SIGNATURE-

Re: [OT] 20180917-Need Apache SOLR support

2018-09-18 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Walter,

On 9/17/18 11:39, Walter Underwood wrote:
> Do not use Solr as a database. It was never designed to be a
> database. It is missing a lot of features that are normal in
> databases.
> 
> [...] * no real backups (Solr backup is a cold server, not a
> dump/load)

I'm just curious... if Solr has "no real backups", why is there a
complete client API for performing backups and restores?

https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.ht
ml

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFp8ACgkQHPApP6U8
pFgnhBAAre3Zb2mu++WVmY6rZlcc3uoRkDRva6iR602wA/w/EUabCmHEkO9maYEm
NoUREgBH9NtFPvYnjkEEL7/P/2hUErvRw0RfwsAo89ClYjjyMEH25+p5SNmudUmK
fKRSLRUyCbpE8ahKTPG44gRlki03uJJ2GA0r3vbTLvdqm1p5KO6sE4k/r3IYJ0QI
qZfUY4Un+LQ5vGMQ7qeGRcFhaAXVOaJmnLCRqGTS2hMTM1uM01TCblhOaeX5XHYD
Yra4m15Sr1H8p3S0CFsP8oqvDND0jEC4MxM9mQvHOvq9IwMreTSwACga35Wm6ItD
h1/Td9H/Puo8o9vQMaVfNcFD4TAqt+FkIHzQEb+FkQAMfbC9ZHsmBgvl8EUtPBq1
h2ODETEcD5SsmdfrP5OWUz+0OBhH7/HEgWRjHW9nSMzhPn4kYgpF/7VuFL8iy3re
/8TviTf446I859QNragWXACdARhCzMo8AoXIs/dC70CGDvxuKmEcI6tad9Zsxcf2
+yaFa3Fzddulaeao4juZVbRVJ9eewFOSawMXDc14TeL6t13CxzxFasHiYu0C5euV
XhKSWEHYj58ijS/KU4FMDCEWZhr1KWEKwfVp7hZ2CZZNW5kNPbv97otKvxB0cKyS
LTK6PtZoZbTWXFa8rT3yq28/x6gMULQeo0ZBZLTXEJKpfAT2vAU=
=Fh1S
-END PGP SIGNATURE-

[SolrJ Client] Error calling add: connection is still allocated

2018-09-18 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

Our single-instance Solr server is just getting its first taste of
production load, and I'm seeing this periodically:

java.lang.IllegalStateException: Connection is still allocated

The stack trace shows it's coming from HTTP Client as called from
within Solr.

We are using SolrJ 7.2.1 and Solr (server) 7.4.0.

Our code looks something like this:

private HashMap CLIENT_REGISTRY = new
HashMap();

synchronized HttpSolrClient getSolrClient(String url)
throws ServiceException, SolrServerException, IOException,
GeneralSecurityException
{
HttpSolrClient solrClient = CLIENT_REGISTRY.get(url);

if(null == solrClient) {
log.info("Creating new HttpSolrClient connected to " + url);

solrClient = new HttpSolrClient.Builder(url)
.withHttpClient(getHttpClient())
.build();

solrClient.ping();

CLIENT_REGISTRY.put(url, solrClient);
}

return solrClient;
}


[here's the code that uses the above]

SolrClient solr = getSolrRegistry().getSolrClient(url);

SolrInputDocument doc = new SolrInputDocument();

// Add stuff to the document

solr.add(doc);
solr.commit();

That's it.

Other than not really needing the "commit" at the end, is there
anything wrong with how we are using SolrJ client? Are instances of
SolrJClient not thread-safe? My assumption was that they were
threadsafe and that HTTP Client would manage the connection pool under
the covers.

Here is the full stack trace:

com.chadis.api.business.RegistrationProcessor- Error processing
registration request
java.lang.IllegalStateException: Connection is still allocated
at org.apache.http.util.Asserts.check(Asserts.java:34)
at
org.apache.http.impl.conn.BasicHttpClientConnectionManager.getConnection
(BasicHttpClientConnectionManager.java:251)
at
org.apache.http.impl.conn.BasicHttpClientConnectionManager$1.get(BasicHt
tpClientConnectionManager.java:202)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.jav
a:191)
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:18
5)
at
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:11
1)
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpCli
ent.java:185)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpCli
ent.java:83)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpCli
ent.java:56)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrC
lient.java:542)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.
java:255)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.
java:244)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
at
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
at [my code, calling SolrClient.add()]

Any ideas?

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhFVQACgkQHPApP6U8
pFhGeRAAgrg2GAmwhS9J/RBQC19SnebhevncmgMAF6nHhKegnXr8uv2fGvvySg53
BHCW0N3dtt9ZhI1VB7C9aBO65o/esW5rHi3/sIiY5QRfNIl39ajL8y98RWHJQEeA
mhjoqNdqW/GopA3YaiCmf1YJZ0FsZV7iK04KboD5DRwhsqoa8XVDa44RYfdU4iDP
cleMkQYY2KDSID0gJ2pf/Qj1acwR/hI2Q9+6kxc11/bXKCrWYAmLawV+DH6ZHqLF
HT/7bNNJ+zV0df0WEKzUDQ9wVzTKXkzvYP7ueINIiomyZN7Pv+pF58BaAiICdlUr
aqQMulLcKRC7qmN/5XqBZG00hkbH82n80o5foveTlQlC9yltSTbXjwFqd+FfOH8Y
kBU+mHWkrZr/Ic29LkgLLzX1tG+QoXAgoEAASHOockaTX5oj2vsyFYQ5nVddOMNj
/w1AgdpNztP5DLr1HQ6JhA+3nLZX43GaDxs/nENIOI2Xe36kXfS/so9Cv7DaAjQ8
OkGdOLUksQaukFZ/3MUwbgan5tQYYp4zSmky4RGS7Nd0ePTgvk4pH1uD4NFJnHWK
fsSydLT43tiOWltQkzzby6QcpSg9WrV+0zsnEPQSQHH+ubDbFt03aXS1/tjYAZTF
r8ttwGFfMQLa58hfWwBKMWtyM8m6n9gVMivhp5oENa3uFdo76kQ=
=+WJu
-END PGP SIGNATURE-

Re: Solr standalone health checks

2018-09-17 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 9/17/18 17:21, Shawn Heisey wrote:
> On 9/17/2018 3:01 PM, Christopher Schultz wrote:
>> The basic questions I'd like to have answered on a regular basis
>> are:
>> 
>> 1. Is the JVM up (this can be done with a ping, of course) 2. Is
>> the heap healthy? Any OOMEs? 3. Will a sample query return in a
>> reasonable amount of time?
>> 
>> 1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2
>> is trickier. I can do this via JMX, but I'd prefer to avoid
>> spinning-up a whole JVM just to probe Solr for one or two
>> values.
> 
> If your Solr version is at least 5.5.1 and you're NOT on Windows,
> number 2 can also be verified by a ping request.

Interesting. I did mention 7.4.0 but not my OS. I'm on Debian Linux,
and I'm running Solr using the Solr-supplied init.d scripts (via solr
install).

> With a new enough version on the correct operating system, Solr is 
> started with an option that will kill the process should an 
> OutOfMemoryError occur.  When that happens, it won't be able to
> answer a ping request.
> 
> Here's the issue that fixes a problem with the startup on 5.5.1 or
> later:
> 
> https://issues.apache.org/jira/browse/SOLR-8145

Given that, I'll go ahead and set things up to do a simple
/solr/[c]/ping request for health-monitoring.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlugIGMACgkQHPApP6U8
pFi8YQ//dwJdJG1VtqUgbFI437HzUMhuI+9SBOf0nateQFqQbfoqkLhC/z3dwjvj
qqhqcT68D2x1bYYk/5we7KD9I6PZ50mL5sZlU34NYC9AFMB5QEdTtWljlqGM/Xoe
elvsKYJVmZn9kvc6iwqyLU71clcRX27NhEDAFrPrCmhgZKRTpNqtgYyEOsIJZ/CL
muMml4hV5eNIc+VOle+jcqwTrWY4xtaf6Fmo6NLCsUvC2CB5/QI7JoYzvnLvVVMD
IVn6AnsLd/wIVSJiPyVYDA58/pVj1w6Jb36L8eg0fxfoO+eAkObUU3s71QglZlIx
m9Qkd8lGQ7qNxUDOMSgPNW/j7tZcxn39FRsM9b3z7kWJGriBcz/S5jX9QSNcArmh
pyHIf48y8wOgl/wQsmsGgXsHtdlwJu+84B3sFGjUKQU/2JPO88XJEo+pKluaMFDO
E2yZGdTvfRbXLTqe/XCGN89yKyIOKJAX2ZXP9EU0PmFSFbeod6oqbT/MKO3+DzCm
PpkUV10vlmqnsJ+5edj89hmM5gJOKcwQTDZ2E/U5tvs4DJHZTG578hnZp1coDU/c
m7M80m5SyE/5ycYBODp6oyJNAkEf6suJ+BIyQkr61t9/L7yvwSm80nFheFpVMIMX
N/lRL9ar4U/lLDL00aVhDecyNSFOvDjSUBlIlQ4hUb80bZiz3xY=
=lOp1
-END PGP SIGNATURE-

Solr standalone health checks

2018-09-17 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I can see three possibilities for monitoring a Solr (7.4.0) deployment:

1. bin/solr healthcheck
2. curl /solr/[collection]/admin/ping
3. JMX

Option #1 isn't available unless ZK is in use, and I'm not using ZK in
my case.

Option #2 issues a very simple query and essentially returns a
"service is up" response.

Option #3 requires a JVM to be launched in order to check to see if
things are working well.

I have read about the Prometheus/Grafana reporting, but that includes
much more information about the performance of Solr that I'm currently
interested in.

The basic questions I'd like to have answered on a regular basis are:

1. Is the JVM up (this can be done with a ping, of course)
2. Is the heap healthy? Any OOMEs?
3. Will a sample query return in a reasonable amount of time?

1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2 is
trickier. I can do this via JMX, but I'd prefer to avoid spinning-up a
whole JVM just to probe Solr for one or two values.

Are there any other options for monitoring Solr that I am missing?

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlugFhwACgkQHPApP6U8
pFjShRAAoV3auXv0PIhztMW7385hi0jV2Fl6V4PrF/TZUYpEQ0jDXdseC5bm+tip
rwKmAYTqu6smvNtC4Qlj27+BdFSmaDP2MwfGN9sWCPahRLHdUKHfxwi4MnWTegM/
OkGuTiVYjzLe2vUlf4BACFFTRAz2bkRHua81SqiOMU1nZFQlj8mHy4qRBFK57Zcd
R6GGry2zcnDTkXql5v/kOCaJiXUj76n8regMVaC0M04AFIvGrIqqJ/BfxkTPmVEf
v1kC+zbKiThTl2fOSLRzwoLJvMpPghLKg5cvb9QQyRgrTQbYcYTPgytstKYS4c87
1mlbj92+T5D6kbw5snBoNIXqfPP+3kUQEeoEwz9m05SRYeoV/SR/M+wqqag5Vmz9
1Gje4TrLAfNOCxk1jSBkUWsgR5lC3msyDSDbLE/2i/m6iANxUoPnin0bQHpau6XN
tGvxyTzyZa4O1hfsWyuTywipdJOadtjyDkAEEU5CeExFAY4EILxRr78mqMx1g+CV
lefLYavs0rfQzvkkL01meL2nqitk82/x6l0PCyIh6WHHrIJ1XYWR+nQszeqY8HJE
BX0NITMqQ2gk50JpzbKqrcLWNGLvAZTzFvLKUUq4pgtab3tBwwaDzVHsxhNy517Z
933Cz92cP1VJtUKkQrw4YDChQzZt9wIHIm5vcAaBgwKCZPRWcds=
=yiXp
-END PGP SIGNATURE-

Re: How secure is Zookeeper digest auth?

2018-09-16 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Jan,

On 9/16/18 16:22, Jan Høydahl wrote:
> We plan to enable (digest) authentication and ACL with Zookeeper to
> improve security.

Can you be more explicit? There is HTTP DIGEST auth and then there are
"digested" (hashed) passwords for the user-database. The former is
secure on the wire and the other one is wire-agnostic.

> However, we have not been able to answer the question of how secure
> such a setup will be, given that ZK 3.4.x TCP communication is
> unencrypted.
> 
> So, do anyone know if ZK sends the password in cleartext over the
> network, so that anyone who can sniff the network can also pick up
> the password, and connect and read/write nodes in ZK?
> 
> We'll of course add all the firewall and IP filtering we can. Do
> you have any other tricks you use to increase ZK security?

I'm not using ZK (yet) so this may be supremely ignorant since I don't
know what protocol it uses to communicate: I would recommend using
mutual-TLS authentication everywhere. I have just deployed such a
system (single-node, no cluster/ZK) and all of the communication for
both admin and querying are over client-authenticated TLS.

Even if an attacker gets onto the box where Solr is running, they
cannot attack it without also breaking filesystem privileges or
exploiting the users who have access to the Solr client key stores.

(I just did a little Googling and it looks like only ZK 3.5+ has TLS
available. At any rate, that should be your target for the future if
you really want a secure environment.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluewOgACgkQHPApP6U8
pFiE1g/8CiRxFySxCPZRU+OdGaw5JjtMNGs3oBDaf75LIQYDnsXAU9wJFjaEKymD
snceusjikN85XyPIFBWLhbWvrdjKhJxm29q8xqqnwTkY1WmGis53Es9NHyT/I1UX
dY3UGAbf148+ZR6NtCFDQPVQtKKfHqE/VAl2bJzMARTC1nPS3v3mtgKEbrAC5ZqX
WMMkb6pOFH58Yj7jeEdHi/y8MKEOeXV3MynWrsSRqGsJsG4Ms55pdBvWtZmIZR+c
0sM4d7zUl18/JjP82YvhhHvHW0IQL+TGKLE1s22p6JRrMU9fzcxNoD9b1r9WORGl
UixQETpBPkKw+VWXBesTxTNkprddMH6oGzm2KkWb9zOH0BehF/ChjB1W0vnC7RXB
lEKWdNkwbLfrP1r+plpy2aVc3PV0lw3jsJdxLf3tMTEPgzeU6wweiJR+YMW6J0iS
4TWFouuL6yGSY7jT99lW+CmBfKHGEXoUlrxS2WSM9BvYuV8pJvzVuEkb1PmXUQdI
rgQIW30Vk0jDwS6SMxdOy/TkbCDAV9dFqsqmYFTSN9W8jBdSx9RevOPnJyVnvCvI
qq96sTqhPa0iSHYWWK5PAzZAvfbcRmohcut/1ZWml1pNZlZzBT0QGQUJm9CzXfS7
v6FNf7PrpIiqOlai1Js67Fm6QrWzjGPVhDl474Q1tAG1rFU2cSM=
=U0Fj
-END PGP SIGNATURE-

Re: solr, multiple ports

2018-09-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

David,

On 9/12/18 12:21 PM, David Hastings wrote:
>> On Sep 12, 2018, at 12:15 PM, Christopher Schultz
>> mailto:ch...@christopherschultz.net>>
>> wrote:
>> 
>> David,
>> 
>> On 9/12/18 11:03 AM, David Hastings wrote:
>>> is there a way to start the default solr installation on more
>>> than one port? Only thing I could find was adding another
>>> connector to Jetty, via
>>> 
https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-li
sten-to-multiple-ports
>>> however the default solr start command takes the -p parameter, 
>>> can this start listening on multiple ports?>>
>> What's your use-case?
> 
> Use case is we are upgrading our servers, and have been running
> solr 5 and 7 side by side on the same machines to make sure we got
> 7 to reflect the results of our current install. However to finally
> make the switch, it would require changing many many scripts and
> servers that have already been modified to use both servers
Can you configure your servers to redirect port X -> port Y?

This is trivial using iptables, but you didn't mention your
environment. What OS, etc. are you using?

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluZPqgACgkQHPApP6U8
pFifoA//eDjEVraAMrtiHBcUGvMEIpNy2mCQKt0tfk2GqOvq3phO03pYta8ygJeg
UgS6c9E5Zrx7UzLF+x8ngyA+2YVvOatpRABx20k2Q8d4zq/gc2QrK+w/DwffDlV2
r3cFJkVa1k/NrDqownYAGlCvUPLUGa0JSEFnzokoh44Drn5TgRolWNqowDPitOoL
FitT97n95XhpuQrIXG1wA0nicpeBKYCLrp5HJkVVHQrLZPqkIm2FjtJZuwZN54pN
PpvBNnNIvaYfjtWxpmJ9sW2/PmwqmT4RwSmRsJUQ6H/iWFMsi1e/MCQQFWx/mOLK
4YS/yBvRaT1dRJMrzL0517zlrqdStwBh005bBeZ0+EE7DROwufYcT7hD9VBytG4y
vzgFybRA3yo5LELp2Loj2MqMvbSHNFiT290m9JgLcJRf861dGD/Luj/AYEN4qV6k
TrhlyzijKiUJmAjBIP/i8FxRNX9YkGl8QleDb4iIi5WUdPog5Enz0rw2O/l5Xie9
cz8pGj+OOEmuMLoMLBII7Crkqnmsla+hPpB2x9+lqoE0erjrngCShAiCLi9vGOJY
u6oETiGTcZjgTNnXYpZLBxZw71q4sbZhpkUIC68NJE0IIO322Vu4yreM9AaYhObq
Ak9fFrPPfgCyF7IAB6kkvWfP5eYOfK7TzTB4b9pWVN7J6owF8nA=
=+X20
-END PGP SIGNATURE-

Re: solr, multiple ports

2018-09-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

David,

On 9/12/18 11:03 AM, David Hastings wrote:
> is there a way to start the default solr installation on more than
> one port?  Only thing I could find was adding another connector to
> Jetty, via 
> https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-
listen-to-multiple-ports
>
>  however the default solr start command takes the -p parameter, can
> this start listening on multiple ports?

What's your use-case?

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluZO50ACgkQHPApP6U8
pFhJIg//ZS7N9E/JlYjh6dxI9x5nOUMPw3wBMKsVo290e5HnQ+6Vx+CZYEtq/nQA
Pxh7TdyvgdD67cj+jZCWYn4a3JLoVT+MVSciIZoVIDcvMEFHgmsviUwGCEq7+xsg
WvuCPEo6IzY+yZZ33wzdr7jlv+jNbFHtrF5t9nuQk8YfNrLqwvaEVor6g6+t+R/j
2L0+UOPPzRvposLiJBUKhYedBxtWas7A05WSFHpYou9wmDhJSB6P1RfnlJSdUNVd
M1BBzJpLTGo5fFgP1zZTVns+jdo6lFTo/g/UpBvVhgv1pPTkN/6vXCTbYlhXpkhO
PxXplyX6OZfavTJUvAzDoUH44xSIuAgLi+7G2nzogXqejZPjwDj6J42jvL/v3ZXP
dz4CFas0gtY/PW9eaug/nGD7UMqCrqSMxOxBatKWnNEl6R359Zp4tRD9fmU097Vc
rmky2kjZazFNGcA0RU7F8Z/pNIbWmoVAkc08yDJ6uyqfh63PI5+CEBclHZqYcAhj
cZWoDvmZL56bT8gQ5leGxME7+QQNLm6nTV1O1l9u+HeWqhBYOlbhDFOlzPVL5cYQ
SZeaseWGamn1HtyZGJN+dZoQxB3QXlHQY9Nj837QDV9tdLlHsujJ3u7w8uJBoJF0
sILKM1oQoPNcCTjv+JbhGKu6z/eq7syVwkwE9zKTlITlcemEoY8=
=6Dnr
-END PGP SIGNATURE-

Re: Error while creating a new solr core

2018-09-11 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shalvak,

On 9/11/18 01:51, Shalvak Mittal (UST, ) wrote:
> I have recently installed solr 7.2.1 in my ubuntu 16.04 system.
> While creating a new core, the solr logging shows an error saying
> 
> 
> " Caused by: org.apache.solr.common.SolrException: fips module was
> not loaded."
> 
> 
> I have downloaded the necessary jar files like cryptoj.jar and
> copied them in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/ but
> the error still persists.
> 
> I have also updated the java.security file with 
> security.provider.x=com.rsa.jsafe.provider.JsafeJCE

Does JsafeJCE provide a FIPS-compliant JSSE back-end? If so, it looks
like it's not configured properly.

Does Solr work as expected when you are using the built-in JSSE (Sun)
provider?

> Can you please suggest a solution to the FIPS module problem. Are 
> there any files I am missing while creating the solr core?
You'll have to talk to your security module vendor about fixing this
issue... it's got nothing to do with Solr.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluX9qgACgkQHPApP6U8
pFjM4BAAv1/bXM8VwKo+ObWaRR07Y2YC/K0v6BG1yQnxv5M3OWA9zQnrm0ktAFW6
yYzF/OP8HpDKjXoS/ZahHaDS54hjLwcnBbNzDK6vbSfk556gI55v6RIEpZ/R4aYE
cae7dQIYqiGQ18igIEoGxj8ZXcNHfLmfMhVoLBCd7JJnUucToTUVhpNY4UqzBlBq
sxUzziTuMsm0RWYB4HedK8k0Tg0Sltw1XgYzeFb325Dmhw9HOLQukVvjRHrg/tCW
+n0JVzXJdANqpJpHDhmEnv3/Lw6j/8kl9APOt0cLP3bRAmD2V7QkvDBsNpOnlwiE
TfBjkv4gCkBjcB9aPInMQOdwpVp+i28RqQzw+lMipqCUVY/F0/u45WHuM9BF2IED
7fZ6PhxY953qGn5KSKpg2ol6H5X9BMswI5Az+MMGfri2dNRjgU8UfW2sr/YdrNvN
KMzo9vKsbiTGQ6sxb3Ot1ARjDUivletvI4mGjb5dUwV+xKCpWe+CSrwSZDhk5JsE
mR9jeil7QtMBuSl1ts4KB7JJ4Hlx0bHmSX7UOGSUfoqqrdfKQYDGV0GIBsDfX4uC
olcW4HEmDBnwRkxuAfm+GHCtTyWMOYBkQ3LG0uUD/HptBeXAHtVrMP7Hy5EztDiq
VxdrHG7siEKo/kIUO1yQJxUz7cXo8ZFyS7BCMYQZgiuCU3bdowM=
=8e2z
-END PGP SIGNATURE-

Re: “solr.data.dir” can only config a single directory

2018-08-27 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 8/27/18 22:37, Shawn Heisey wrote:
> On 8/27/2018 8:29 PM, zhenyuan wei wrote:
>> I found the  “solr.data.dir” can only config a single directory.
>> I think it is necessary to be config  multi dirs，such as 
>> ”solr.data.dir:/mnt/disk1,/mnt/disk2,/mnt/disk3" ， due to one
>> disk overload or capacity limitation.  Any reason to support why
>> not do so？
> 
> Nobody has written the code to support it.  It would very likely
> not be easy code to write.  Supporting one directory for that
> setting is pretty easy ... it would require changing a LOT of
> existing code to support more than one.

Also, there are better ways to do this:

- - multi-node Solr with sharding
- - LVM or similar with multi-disk volumes
- - ZFS surely has something for this
- - buy a bigger disk (disk is cheap!)
- - etc.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluEvn8ACgkQHPApP6U8
pFgTTg//ayed4AXtocVrB6e/ZK0eWz5/E1Q7Oa7kF0c34l0MH6BIe4iOHDmrR+J9
A+t6SzVQqURMrDE8plg/xbPTlyGF8wGrEjZUZF4fpWlgnY/qNYxl5S9zJ1hPgBh7
fCKkb+LuLGdZMM4oORfCYtMgpDjOnLihHmDTfkrvZzyZwOQGeFpgEZDZKFYAjcur
wqIGTMTTWfSCoPQgQzvI8Husq7Rs75BEc+mAkaPOL0LvT9PQDEPEXXt3Kf5vXgM+
Eet1ymltZM/Xz+V/em/oeumCoCE18uxi9seuDhTpHRLjS9tCBbPWA0NmobriY3ct
GskwCnsFDAeGjG/7dcA/zmB8BK4t6JpUvI+OcJU5dvQczpQbhB9WT4GQUiME9Tvr
RjBES53HoEEKA8gb0kiuPN1pE2MSX8vO3uKpQtzVS2MOmuOeV/IebrnP/zLTll18
awtWWbPmzaAGAUfXL2ExK0+ism0o31i46CNfLfBBM8jh3lkc2HNdz5TLe8YfN3Sp
Tj0HfmYynhtH1CggOAcI1M4PIEbIGfoywX/ICSGHnLwfQoDUnBmjqXhGkFUIstWk
Dcntx+4E4NRny6zDZfg5UMjWYyo+fOVSoaDf6dfgBWIB1I3xPn5Dt0In7+oRtZ9i
Xlkw6DSaSZZ5caBqjaF278xj7IwEw2zipLPWB7hVCcUhKuJBbDY=
=rbrT
-END PGP SIGNATURE-

Re: Data Import from Command Line

2018-08-20 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Adam,

On 8/20/18 1:45 PM, Adam Blank wrote:
> I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way
> to import the index from the command line instead of using the
> admin console?  I don't have the ability to use a HTTP client such
> as cURL to connect to the console.

I'm not sure when it was added, but there is a program called "post"
which comes with later versions of Solr that can be used to load data
into an index.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlt7AfcACgkQHPApP6U8
pFgtgw/7BTV7shvNcXKrpTB11g0wjYXAJOlqARlYgWFcQIhVcs1jfbJi8O6Yxh0x
BIA/EAdob9zC/EgYbMfkM/duibr2A1/wF+CkhhTd6M/HcoSOXbI31L1LDo/xa0lg
z6t3AO9WYYKnFmD2JIxdidH1zHpIz74cAc3q43PFVtLNW2fVT2cNlg7Vn6vdVmoi
79VLPnvdyxZRdQtxbhdvCribPdFP6YLC3dgxh1KeeZzdO0OcjQykSrssX/hd207z
9iuw2TusoUIgXQsMLRtnKqqVp38MYPppk49uGprhB8iTJjDAVlvgD3jURef7S7s/
w1KBPVZTGQFh6cvzjOOZHUkaj0hX4PuYkun/hQY3Uy5kBIw5fo0Y10bjVcRZGYrb
SQDTUe0sdfU27qaY8DLqSf21to5K+wTIuOO28C1TkHkjKymg0w7THz583o0aOCzr
5fjNN00FevrWFLm+n7c2tToW3H1cAZkh5XRDDDUYnqzVzchSOHlFKM1X0gMOq8Lf
If434uctruwsqBrkscTWcS5UALGLxuwtNk9trLLeRII8YapB6MI6xoUnCvWFv1sO
fziqKXXwBmrI+v/1FqiR8Md3r32jm8Gy54acViJc9+szUEM26C+FSzvsdGnf5oVr
tlsHVwLBPORS6hGJ+MvqMGkrxlO1WNm5MrJxHNoyQ5KqAL7WT+s=
=+VTK
-END PGP SIGNATURE-

Re: Searching by dates

2018-08-16 Thread Christopher Schultz

Shawn,

On 8/16/18 10:37 AM, Shawn Heisey wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this, yes, but from the docs I'm guessing that
>> I can't search for a DOB using e.g. 2018-08-16 but instead I need to
>> search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
> 
> If you use the field class called DateRangeField, instead of the trie or
> point classes, you can get what you're after.
> 
> It allows both searching and indexing dates as vague as "2018".
> 
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html

Hmm. I could have sworn the documentation I read in the past (maybe as
long as 3-4 months ago) indicated that date+timestamp was necessary.
Maybe that was just for the index, while the searches can be partial.

As long as users don't have to enter timestamps to search, I think all
is well in terms of index/search for me.

As for i18n, is there a way to have the query analyzer convert strings
like "mm/dd/" into "-mm-dd"?

I'm sure we can take the query (before handing-off to Solr), look for
anything that looks like a date and convert it into ISO-8601 for
searching, but if Solr already provides a facility to do that, I'd
rather not complicate my code in order to get it working.

> For an existing index, you will have to change the schema and completely
> reindex.

That's okay. The index doesn't actually exist, yet :) This is all just
planning.

Thanks,
-chris

signature.asc
Description: OpenPGP digital signature

Searching by dates

2018-08-16 Thread Christopher Schultz

All,

My understanding is that Solr (really Lucene) only handles temporal data
using full timestamps (date+time, always UTC). I have a use-case where
I'd like to store and search for people by their birth dates, so the
timestamp information is not relevant for me.

I haven't actually tried this, yes, but from the docs I'm guessing that
I can't search for a DOB using e.g. 2018-08-16 but instead I need to
search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.

No user is ever going to do that.

I can also offer a separate form-field for "enter your DOB search here"
and then correctly-format it for Solr/Lucene, but then users can't
conveniently search for e.g. "chris schultz 2018-08-16" and have the DOB
match anything useful.

Is there any standard way of handling dates, or any ideas people have
come up with that kind of work for this use-case?

I could always convert dates to unparsed strings (so I don't get
separate tokens like 2018, 08, and 16 in the document), but then I won't
be able to do range queries against the index.

I would definitely want to be able to search for "chris [born in] august
2018" and find any matches.

Any ideas?

Thanks
-chris



signature.asc
Description: OpenPGP digital signature

Re: [OT] Lucene/Solr bug list caused by JVM's implementations

2018-08-15 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 8/15/18 12:56 PM, Erick Erickson wrote:
> Also note that the OpenJDK devs regularly get to test very early 
> (unreleased) Java versions, which flushes out a lot of issues long 
> before a general release of Java

We (dev@tomcat) get emails from Oracle about pre-release versions of
Java releases as well. I'm sure you guys could get on that list so
solr-dev@lucene can get notifications of pre-release versions to test
to make sure Solr is good-to-go on each forthcoming version.

- -chris

> On Wed, Aug 15, 2018 at 5:25 AM, Shawn Heisey 
> wrote:
>> On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:
>>> 
>>> I am looking for Lucene/Solr's bug list caused by JVM's
>>> implementations. And I found the following, but it seems not to
>>> be updated. https://wiki.apache.org/lucene-java/JavaBugs
>>> 
>>> Where can I check the latest one?
>> 
>> 
>> That is the only such list that I'm aware of.  There are not very
>> many JVM bugs that affect Solr, and most of them have either been
>> fixed or have a workaround.  I don't know the state of the IBM
>> bugs ... except to say we strongly recommend that you don't run
>> IBM Java.
>> 
>> Best course of action:  Run the latest release of whatever Java
>> version you have chosen, and only use Oracle or OpenJDK.  For
>> Java 8, the current Oracle release is 8u181.  At this time, I
>> wouldn't use Java 10 except in a development environment.  It's
>> still early days for that -- newest Oracle version is 10.0.2.
>> 
>> If you use the latest Oracle/OpenJDK release of Java 8, Solr
>> ought to work quite well.
>> 
>> Thanks, Shawn
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlt0cQgACgkQHPApP6U8
pFg3FRAAi8BZICgv57H5zb6qxUh9Ic5scn0BBoT+lVRjpOkemoMQ8Fki3MViE69o
jCABKQ70HItM1SKFu7a8xys8M/qs81gDHHdO/atW8Q9VzfrWlBJadrnIdVrvqW4y
ZWBgUfx0fsucgKBGy9U7uFlvpTqj+H/gpRLm+1CEzi3Eb3V43YRkxy1FY5qPH2yA
YApJMqyLWdSw9p2axwCSRswILfnCTI6VV0YXNAbaJIxJNqbmDat4yLGN/e6mMcP+
+y8jndPMxQHTgB4OH2B1DLAIlop4p3/eF2Oy+PiKEziALlvd+TYSlc07XWYv+TuO
NELBS0eEty/ay8wLSoCx+er9N18eiPa3eaMr7LQcRFs2wtBmYe8OGQpb9NC0/Bpm
WNUuDGxc2rIGAnHYI7CTi/Y8ncCX1XBGstMGuYpnguoEWMSOUSrdWDVYjJJEB+cP
qFCGRHdsK3qeXze2UQ/9FHNXYGjv9TwKYsTAX06ZZZPGC0VD8l0mxXeHfsx3aTQA
u7/cmj+i86LFnjQ/gvsc4vUzXEk163Pgd/dutqpaMFmENTdN6cvBHHnj9T7TV/PJ
WpJemYvje4xFZrFvbkdQ1XMij/s3+8gNqHYmaaTjZ7JHvnlbCDofqtwLbFH9hKDt
n87iJUmTe6zGtn6/RUTrRA8ONH/5j2Yok+2reHqzgo2XSosqpc8=
=9CZ5
-END PGP SIGNATURE-

Re: Add Wildcard Certificate to Java Keystore

2018-08-13 Thread Christopher Schultz

Kelly,

On 8/13/18 12:37 PM, Kelly Rusk wrote:
> All I have is the .p12 and password so it has already gone through 
> the CSR process. How do I import this file into the keystore?
Java's keytool won't merge keystores. You'll have to export the
certificates from the PKCS12 file you got from your CA and import each
of them separately into your own keystore.

> On the Windows side, does it need to reside in the Personal Store or
> Trusted Root Store?
Umm... is this for a server certificate? If so, you definitely don't
want to import any of those certificates into any system-wide or
user-wide certificate trust stores.

Is this certificate signed by a real CA, or are you building your own,
internal, private CA who is signing these certficates?

-chris

> -Original Message- From: Christopher Schultz
>  Sent: Monday, August 13, 2018 12:00
> PM To: solr-user@lucene.apache.org Subject: Re: Add Wildcard
> Certificate to Java Keystore
> 
> Kelly,
> 
> On 8/13/18 11:55 AM, Kelly Rusk wrote:
>> I have imported a Wildcard Certificate to my Java Keystore and it 
>> displays, but when I pull up Internet Explorer and browse to my
>> Solr site, it fails to load and presents TLS errors.
> 
> What do you mean "it displays"?
> 
> How did you import your signed certificate into your keystore? What
> was in the keystore before you performed the import?
> 
>> Has anyone run into this, what commands do you run to import a
>> Public CA into Solr?
> 
> Generally, you want to generate a key+cert/CSR and send the CSR to a
> CA. The CA signs it and returns it, typically with one or more
> intermediate certificates to build a chain of trust between the CA's
> root cert (present in browser trust stores) and your server's
> certificate (which was signed by a subordinate certificate, not
> directly by the CA's root cert).
> 
> Import them into your keystore in this order:
> 
> 1. Highest (closest to the root) CA cert 2. [any other intermediate
> certs from the CA, in order] 3. Your server's cert
> 
> Most server software needs a bounce to reload the keystore.
> 
> -chris
> 



signature.asc
Description: OpenPGP digital signature

Re: Add Wildcard Certificate to Java Keystore

2018-08-13 Thread Christopher Schultz

Kelly,

On 8/13/18 11:55 AM, Kelly Rusk wrote:
> I have imported a Wildcard Certificate to my Java Keystore and it
> displays, but when I pull up Internet Explorer and browse to my Solr
> site, it fails to load and presents TLS errors.

What do you mean "it displays"?

How did you import your signed certificate into your keystore? What was
in the keystore before you performed the import?

> Has anyone run into this, what commands do you run to import a Public
> CA into Solr?

Generally, you want to generate a key+cert/CSR and send the CSR to a CA.
The CA signs it and returns it, typically with one or more intermediate
certificates to build a chain of trust between the CA's root cert
(present in browser trust stores) and your server's certificate (which
was signed by a subordinate certificate, not directly by the CA's root
cert).

Import them into your keystore in this order:

1. Highest (closest to the root) CA cert
2. [any other intermediate certs from the CA, in order]
3. Your server's cert

Most server software needs a bounce to reload the keystore.

-chris

signature.asc
Description: OpenPGP digital signature

Re: 4 days and no solution - please help on Solr

2018-08-10 Thread Christopher Schultz

Ravion,

What's wrong with "update request"? Updating a document that does not
exist... will add it.

-chris

On 8/10/18 3:01 PM, ☼ R Nair wrote:
> Do you feel that this is only partially complete?
> 
> Best, Ravion
> 
> On Fri, Aug 10, 2018, 1:37 PM ☼ R Nair  wrote:
> 
>> I saw this. Please provide for add. My issue is with add. There is no
>> "AddRequesg". So how to do that, thanks
>>
>> Best Ravion
>>
>> On Fri, Aug 10, 2018, 12:58 PM Jason Gerlowski 
>> wrote:
>>
>>> The "setBasicAuthCredentials" method works on all SolrRequest
>>> implementations.  There's a corresponding SolrRequest object for most
>>> common Solr APIs.  As you mentioned, I used QueryRequest above, but
>>> the same approach works for any SolrRequest object.
>>>
>>> The specific one for indexing is "UpdateRequest".  Here's a short example
>>> below:
>>>
>>> final List docsToIndex = new ArrayList<>();
>>> ...Prepare your docs for indexing
>>> final UpdateRequest update = new UpdateRequest();
>>> update.add(docsToIndex);
>>> update.setBasicAuthCredentials("solr", "solrRocks");
>>> update.process(client, "techproducts");
>>> On Fri, Aug 10, 2018 at 12:47 PM ☼ R Nair 
>>> wrote:

 Hi Jason,

 Thanks for replying.

 I am adding a document, not querying. I am using 7.3 apis. Adding a
 document is done via solrclient.add(). How to set authentication in
 this case? Seems I can't use SolrRequest.

 Thx, bye
 RAVION

 On Fri, Aug 10, 2018, 10:46 AM Jason Gerlowski 
 wrote:

> I'd tried to type my previous SolrJ example snippet from memory.  That
> didn't work out so great.  I've corrected it below:
>
> final List zkUrls = new ArrayList<>();
> zkUrls.add("localhost:9983");
> final SolrClient client = new CloudSolrClient.Builder(zkUrls,
> Optional.empty()).build();
>
> final Map queryParamMap = new HashMap>> String>();
> queryParamMap.put("q", "*:*");
> final QueryRequest query = new QueryRequest(new
> MapSolrParams(queryParamMap));
> query.setBasicAuthCredentials("solr", "solrRocks");
>
> query.process(client, "techproducts"); // or, client.request(query)
> On Fri, Aug 10, 2018 at 10:12 AM Jason Gerlowski <
>>> gerlowsk...@gmail.com>
> wrote:
>>
>> I would also recommend removing the username/password from your Solr
>> base URL.  You might be able to get things working that way, but
>>> it's
>> definitely less common, and it wouldn't surprise me if some parts of
>> SolrJ mishandle a URL in that format.  Though that's just a hunch on
>> my part.
>> On Fri, Aug 10, 2018 at 10:09 AM Jason Gerlowski <
>>> gerlowsk...@gmail.com>
> wrote:
>>>
>>> Hi Ravion,
>>>
>>> (Note: I'm not sure what Solr version you're using.  My answer
>>> below
>>> assumes Solr 7 APIs.  These APIs don't change often, but you might
>>> find them under slightly different names in your version of Solr.)
>>>
>>> SolrJ provides 2 ways (that I know of) to provide basic auth
> credentials.
>>>
>>> The first (and IMO simplest) way is to use the
>>> setBasicAuthCredentials
>>> method on each individual SolrRequest.  You can see what this
>>> looks
>>> like in the example below:
>>>
>>> final SolrClient client = new
>>>
>>> CloudSolrCLient.Builder(solrURLs).withHttpClient(myHttpClient).build();
>>> client.setDefaultCollection("collection1");
>>> SolrQuery req = new SolrQuery("*:*");
>>> req.setBasicAuthCredentials("yourUsername", "yourPassword);
>>> client.query(req);
>>>
>>> SolrJ also has a PreemptiveBasicAuthClientBuilderFactory, which
>>> reads
>>> the username/password from Java system properties, and is used to
>>> configure the HttpClient that SolrJ creates internally for sending
>>> requests.  I find this second method a little more complex, and it
>>> looks like you're providing your own HttpClient anyways, so for
>>> both
>>> those reasons I'd recommend sticking with the first approach (at
>>> least
>>> while you're getting things up and running).
>>>
>>> Hope that helps.
>>>
>>> Best,
>>>
>>> Jason
>>>
>>> On Thu, Aug 9, 2018 at 5:47 PM ☼ R Nair <
>>> ravishankar.n...@gmail.com>
> wrote:

 Dear all,

 I have tried my best to do it - searched all Google. But I an=m
 unsuccessful. Kindly help.

 We have a solo environment. Its secured with userid and
>>> password.

 I used

>
>>> CloudSolrClient.Builder(solrURLs).withHttpClient(mycloseablehttpclient)
 method to access it. The url is of the form
>>> http:/userid:password@/
 passionbytes.com/solr. I set defaultCollectionName later.
 In mycloseablehttpclient, I set Basic Authentication with
 CredentialProvider and gave url, port, userid and password.
 I have changed HTTPCLIENT to 4.4.1 version,

Re: Schema Change for Solr 7.4

2018-08-03 Thread Christopher Schultz

Joe,

On 8/3/18 11:44 AM, Joe Lerner wrote:
> OK--yes, I can see how that would work. But it would require some quick
> infrastructure flexibility that, at least to this point, we don't really
> have.

The only thing that needs swapping is the URL that your application uses
to connect to Solr, so you don't need anything terribly complicated to
proxy it.

Something like Squid would work, and you'd only have a few seconds of
downtime to set it up initially, and then another few seconds to swap later.

Heck, you can even remove the proxy after you are all done. It doesn't
have to be a permanent fixture in your infrastructure.

-chris

signature.asc
Description: OpenPGP digital signature

Re: Schema Change for Solr 7.4

2018-08-03 Thread Christopher Schultz

Joe,

On 8/3/18 11:09 AM, Joe Lerner wrote:
> We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3
> zookeepers. We need to make a schema change. What I want to do is simply
> push the updated schema to Solr, and then re-index all the content to pick
> up the change. But I am being told that I need to:
> 
> 1.Delete the collection that depends on this config-set.
> 2.Reload the config-set
> 3.Recreate the dependent collection
> 
> It seems to me that between steps #1 and #3, users will not be able to
> search, which is not cool.
> 
> Can I avoid the outage to my search capabilitty?

I dunno about how to do any online-updates like this, but you could
always instead:

0. place a proxy between your application and Solr
1. stand-up a new service
2. load the config-set
3. create the collection
4. load all the data from source
5. swap the service at the proxy to the newly-created service

-chris

signature.asc
Description: OpenPGP digital signature

Re: Search for a specific unicode char

2018-07-31 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

To whom it may concern,

On 7/31/18 2:56 PM, tedsolr wrote:
> I'm having some trouble with non printable, but valid, UTF8 chars
> when exporting to Amazon Redshift. The export fails but I can't yet
> find this data in my Solr collection. How can I search, say from
> the admin console, for a particular character? I'm looking for
> U+001E and U+001F

Try copy/pasting from e.g.
https://www.fileformat.info/info/unicode/char/001e/browsertest.htm

Or url-decode this string (%1e) here:
https://meyerweb.com/eric/tools/dencoder/

and paste it into your search box.

Do you have the source-data for the index? Maybe it's easier to locate
the character in the source-data than in the index.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltgzZoACgkQHPApP6U8
pFh5LQ//XEHKxGXd50kujey1H2i9SCoF0MYPIL255Mm/CXI2CEkHBiZnEEN7mrEH
xW87KbpKcahikEYT2fc/VDoctWtoJYpzi3WrizONNf1W7J4Nq9sSfdQ8UEDEuHy7
ITma15LkVseKmWxcFJP5rOtRatHw+L0j8EzwvYrC+BfpP7c9hqO8h4VO+9fkmSbn
5wB49kfot4quvJf4iMud+/qd6+4rLD1XR2nO1P7ZRuU7yqEGy5w9fLFNYkAVZmxR
1WXidEnAgLXxFoR061k0OwrxCwgVD0K/NqhzO5cWpmv5DbGoFiWcuOavzlOedp7u
ZPP32TuAM3PqmTpO6ku1MEsI70jVNlaRx6M1dzp6RUARFNEzLRbw93F3Vo9A34PL
94JhDaKMqbA74s2OdG+qNna7Fwe4mbIXMxUbwY80AC+1RMkEzRC/f1erNK1sfCzA
6cn06FNLuwbNhHvEpPAcS7TX0w0uhy4tCbbBt8rw0pbZDWee4Jz/aF7eRfMIiLdt
SlILSJZyte0CCMuC7Rm5qs/lpObfOaynVNSHpyPOJircqOyvYDy/UWq6C1t5/NuB
0X6vpBy/QSZhmmq7GHc6a8A6udDd8cfW1rXEt1vRcG9qnke1zSR7Trcb6n+GV19s
wooo3fHIsvU7393MHUZqAspaU20WqY9r9coNRHmje40Uj5ckFzU=
=NdlT
-END PGP SIGNATURE-

Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Georg,

On 7/31/18 12:33 PM, Georg Fette wrote:
> Yes ist is only one of the processors that is at maximum capacity.

Ok.

> How do I do something like a thread-dump of a single thread ?

Here's how to get a thread dump of the whole JVM:
https://wiki.apache.org/tomcat/HowTo#How_do_I_obtain_a_thread_dump_of_my
_running_webapp_.3F

The "tid" field of each thread is usually the same as the process-id
from a "top" or "ps" listing, except it's often shown in hex instead
of decimal.

Have a look at this for some guidance:
http://javadrama.blogspot.com/2012/02/why-is-java-eating-my-cpu.html

Some tools dump the tid in hex, others in decimal. It's frustrating
sometimes.

> We run the Solr from the command line out-of-the-box and not in a 
> code development environment. Are there parameters that can be 
> configured so that the server creates dumps ?
You don't want this to happen automatically. Instead, you'll want to
trigger a dump manually for debugging purposes.

- -chris


> Am 31.07.2018 um 15:07 schrieb Christopher Schultz: Georg,
> 
> On 7/31/18 4:39 AM, Georg Fette wrote:
>>>> We run the server version 7.3.1. on a machine with 32GB RAM
>>>> in a mode having -10g.
>>>> 
>>>> When requesting a query with
>>>> 
>>>> q={!boost 
>>>> b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)=str
ing
>
>>>> 
_field_type:catalog_entry=2147483647
>>>> 
>>>> 
>>>> the server takes all available memory up to 10GB and is then
>>>> no longer accessible with one processor at 100%.
> Is it a single thread which takes the CPU or more than one? Can
> you identify that thread and take a thread dump to get a backtrace
> for that thread?
> 
>>>> When we reduce the rows parameter to 1000 the query
>>>> works. The query returns only 581 results.
>>>> 
>>>> The documentation at 
>>>> https://wiki.apache.org/solr/CommonQueryParameters states
>>>> that as the "rows" parameter a "ridiculously large value" may
>>>> be used, but this could pose a problem. The number we used
>>>> was Int.max from Java.
> Interesting. I wonder if Solr attempts to pre-allocate a result 
> buffer. Requesting 2147483647 rows can have an adverse affect on
> most pre-allocated data structures.
> 
> -chris
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltgySgACgkQHPApP6U8
pFjgKxAAxfbUmcj81+CpmTwHaPsz8Zb70HX4o/1eDGwALMhuvg8MyTaZnR9rSPy3
LHhAn0dtdnhp7Pe3NWRrYFdzKOZjQ85jiEcW96bzCe5ggJmnvs9a9VeEJ+5b4AXN
XMtSMo8Ph7BvAWeTQcwmsiK8w2grAzaV6zXEetxaXgL0+16wfIjyNBteiQHkpcjo
T5T5UzSzwyuAxFJkxSdbsF6SAJD7+zwbOEUQlURlUBsmzgam124ojgNl3gEG8d/V
SSFhI1vnuj7pkdFLSZm7BDdAw6KjnOeM3yE3VKh5Lem4CRNLrP3ZvKrzKVlWTFJ4
dAIuJL6GUSMEFU0MCwQZjFxmtWNMwl/MIdDD8Yp9m/GislLXbcOi4oBbmWTNnuqU
SPtmjdV+7fcIRl8AWc0bzLbK4nFYlVFzhiijR5am+pvF13TB/WQ8eOn9uifSPxWb
OHzrU+fMV0fvIe5pZxqkcHEBas5QiZKZ5yH6Zz+xLldF4nh9Q4A6CJu/21qU/Kxd
Dp2lenZEjKc90FKpSVMXqxJNM0n7geRmTSgv8imeoQf5+H6VU7dll1xGQkTnXtR9
UyV/U1fj12z2UjzcY6ePuJ8BadIx+cSf6H3q4bcJOGZ884lI+bDX08C/89hb/5vT
2NE5+tK1jAOX/ESClb6eFFMcJzBww/CoIxb9PpRqgw3HJKYuVpY=
=mS/y
-END PGP SIGNATURE-

Re: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Georg,

On 7/31/18 4:39 AM, Georg Fette wrote:
> We run the server version 7.3.1. on a machine with 32GB RAM in a
> mode having -10g.
> 
> When requesting a query with
> 
> q={!boost 
> b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)=string
_field_type:catalog_entry=2147483647
>
> 
> 
> the server takes all available memory up to 10GB and is then no
> longer accessible with one processor at 100%.

Is it a single thread which takes the CPU or more than one? Can you
identify that thread and take a thread dump to get a backtrace for
that thread?

> When we reduce the rows parameter to 1000 the query works. The
> query returns only 581 results.
> 
> The documentation at
> https://wiki.apache.org/solr/CommonQueryParameters states that as
> the "rows" parameter a "ridiculously large value" may be used, but
> this could pose a problem. The number we used was Int.max from 
> Java.

Interesting. I wonder if Solr attempts to pre-allocate a result
buffer. Requesting 2147483647 rows can have an adverse affect on most
pre-allocated data structures.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltgXy4ACgkQHPApP6U8
pFicOQ//c1Qe0hLOHIbSvmxAMVEhqZTjQlzEGFoYYhC1aGrYpw++RKQYtBLD2kmN
DcLLkwFOmwv5CDft+Mn+g5ZWhEuZSKnwFgxsPfTAbRGjDYGQ7qCzzGq2JGacoxTJ
rPgizyRlZQ4f5QY0RHohAGFx/QhgPtLdSl0V32eERWH8fVJWvDH3iYTTTSDN4UCY
/bpB34nrruBgh2iTz9UcGR1jnTw9iU57OVYRwtTk8ETeOivcBM5MTXzKbwQ8/w5m
c7lmKWqMG0G5XKKu6KDbWFZwSwYLBvHTUQurqgS2pkm+r2c4xP5/U0+uI5D9EseS
1HiOjWBuhWFEIveioKCOQbPAWL+C0i4xMbBLiC4RZPnTs6LSQ0aXm4Jx05NFoAWt
3HA2VCb9rrK5y8cICSCbVGaPNNBT9HHqJqeo2eGbzLaZXP5iRCc8BdkjHTPrSqCq
gh8FEAK9pVS3ejO96DZvIoiIEpcmRNuSHczdE7YKwCv5XvytSh4QXa0SKluEhpYo
acPXOtjIbqFcTZ1f+hZTfiG1/PeCUnYshta8VdSyvIjm748wOB7wqs7uYhl0b6zx
i6OgoQ3bOel8e7oAO4Fmv5LE56b8A4tOPzPBf4Y1ehb8e8HbBdSzZuzqZZrQqChQ
AUfrEzaXUKIBsmlaUneT2qjsLLZZmU+Gk0EYJnmHw63RQR/QxKg=
=IXGx
-END PGP SIGNATURE-

Re: Upgrading SOLR (not clustered)

2018-07-25 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Phil,

On 7/25/18 4:38 PM, Staley, Phil R - DCF wrote:
> Christopher,
> 
> Testing an upgrade from version 7.2.1 to 7.4.0 on SUSE Linux 12
> 
> From the /etc/init.d/solr file?
> 
> SOLR_INSTALL_DIR="/opt/solr"
> 
> From the /etc/default/solr.in.sh file? (and these are my data
> and/indexing core locations
> 
> SOLR_PID_DIR="/var/solr" SOLR_HOME="/var/solr/data" 
> LOG4J_PROPS="/var/solr/log4j.properties" 
> SOLR_LOGS_DIR="/var/solr/logs" SOLR_PORT="8983"

I would expect your process to work. Did it?

- -chris

> -Original Message- From: Christopher Schultz
>  Sent: Wednesday, July 25, 2018 3:23
> PM To: solr-user@lucene.apache.org Subject: Re: Upgrading SOLR (not
> clustered)
> 
> Phil,
> 
> On 7/25/18 12:38 PM, Staley, Phil R - DCF wrote:
>> What are the steps for upgrading a non-clustered SOLR version? 
>> Here's what I thought should work:
> 
> 
> 
>> 1.  Open a bash window and ssh login to desired server with 
>> your Linux admin credentials
> 
>> 2.  Change directories:  cd /opt
> 
>> 3.  Download the latest Linux/OSX version direct to server: 
>> sudo wget 
>> http://secure-web.cisco.com/1m3u-zHHzT7PG9DMKzh18vroXutH5_t3ai-gl70-Y
x
>>
>> 
ZzDhjDAlBf5297ajnpoZ0PptxeKUldcLaRREkQF6UwpkpjGJvBhFyMYKEleNgOv2KiAXuZ
>> Qw4HjRFeUCRluU7gPGPLiYF7_aaBeutMU6Kr0LxiOTpUTv2z9qZiIQYU2M-YN1lNy-acH
K
>>
>> 
rY5ZfGuMw0fSBmdRa9PSzP9ZUj1qGEY94PCLQXxVNkYx_u4CXx-TaA0Fo-aKqvl2x9ejFB
>> uVt2jF1e8zf3i9E367USmyBdbEQ/http%3A%2F%2Fapache.claz.org%2Flucene%2Fs
o
>>
>> 
lr%2Fx.x.x%2Fsolr-x.x.x.tgz (replace x.x.x with the latest version
>> number)
> 
>> a.  Additional download mirror servers are available @ 
>> http://secure-web.cisco.com/1Mafx4QIn9BgkDPtPKbw6pF3EugYCWQHwgSifrgOr
_
>>
>> 
5l1VTprI53j3huCKwyUxst3FIbRgyqah-96wu9NC3fcClwiEqV6ww9g796bhMz6OQDxYb17q
2WPVzIhkB8ozsOw6CJoJKu9xvQuPlab4QkH5DAqOfWBFbtBavS2s-eRdGexv327ATH5BZZP0
snS49XnaiUJjYgYPf4ILzXPp5DLQmbLYSxuHlIp0UP3J_4b_gxq9JEB7_E6dcDiq1hrEN_wW
4n8MvuaRQ3PqPgO_ucjaFoYOL5ZoFSM-svWmZcoD1E/http%3A%2F%2Fwww.apache.org%2
Fdyn%2Fcloser.lua%2Flucene%2Fsolr%2F7.3.1
if the
http://secure-web.cisco.com/1mEnNfQ3nil_pfEFLpG5wMsugkz7vhDU0czyVu2MH7pe
J0aomngulTED-W-zTbK-ywavVjNDYF95PcgmerYe2J4MSIrpaWALkysbyL5rYu4BVb9VZXQg
GuPso0kODrtnA_F4Igw1cE2qjoeoRLk6Pff9Or3lnLbyVCuHjIfECo_JOGvuw91ulYljWU3e
113vxCGB8x9ogaAPR06C1qoqDhu4_b1j2tXqAfJb9iiJKLvOHB-RsxGLu1jxdk4_enK1CVE5
5nj2gyHh2QgAgqVmaBA/http%3A%2F%2Fapache.claz.org
is site is slow.
> 
>> 4.  Login as root user:  sudo -i  and enter you admin
>> password
> 
>> 5.  Unzip the .tgz file:  tar zxf solr-x.x.x.tgz
> 
>> 6.  Change directories:  cd /
> 
>> 7.  Stop SOLR service: service solr stop
> 
>> 8.  Confirm that SOLR is stopped:   service solr status
> 
>> 9.  Change directories to your user home directory:  cd 
>> /home/myadminlogonid
> 
>> 10.  Create new solr symbolic link in your user home folder that 
>> points to new SOLR version:   ln -s /opt/solr-x.x.x  solr
> 
>> 11.  Move/replace current symbolic link:  mv solr /opt
> 
> What version are you going from/to?
> 
> What OS is this?
> 
> Do you have an /etc/init.d/solr file? If so, where does
> SOLR_INSTALL_DIR point?
> 
> Do you have an /etc/default/solr.in.sh file? If it points to all of
> your data-locations, then you should be okay.
> 
> -chris
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltY4DkACgkQHPApP6U8
pFgShhAAysteX3nkqj/LwLZ8Q5uiLVo/OCNz0kk/Q4VOFM7sgnkydLTCdfRD6hBD
FtqoJue/ayXvGPoRC46R0+LGBYyRQwPajFPBe0S5Ay4dCIe4de1lMjmsc6zfgMpb
tu15YduBFT2O6vdKUyHISHIANqvnaAZnRvfp6P3rlgN1ADL0Ui8y2Kdrx+iHszi2
mIc3fuJY1t8LVpAjMH5Vu8ZD8LuBkH3DOHPLErPoJPkOF+0CaiLrR7DBStrKdsOF
5k5Jlgv/oYueCS0X1SAtc1W7t/vqUHgqnqNqnNaInGDOTblW/FTVOxRt8BF90sgS
UPBy8K2EyhS/rZqEBEp7sLndzNhtGHmhCNOIptHsaixt+zh7bepdXEvSNThDLHs2
Pg+NTyGGsr5JzdzkjZZwV4Re5jPY5vNL9LTOqIr/x3rQiSo04M5u1rCuHKRFP3Dw
ZFxamOXPDSo1Oo32042/yAwgpI+En1YVEEvXwNhudCeG1mEAxW+UejdhEAkxrpFt
+BqDo+XWh9jNyqBFUMtMjjzbF3SWfjeDtMfFPCy6IamUqCWJoXk6uhGF6RH1GHqL
QBJP3NSxMMU5X68fVG/dr2DKiprZmuJuuNup3qZJwGWZZAHV3/Z+3gmTLL/ISqDH
RZnm2lc1aIgIFdD7s1cFxHueD3j7shxbwmFXkn5Jd+RUjKf7nDU=
=enmW
-END PGP SIGNATURE-

Re: Upgrading SOLR (not clustered)

2018-07-25 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Phil,

On 7/25/18 12:38 PM, Staley, Phil R - DCF wrote:
> What are the steps for upgrading a non-clustered SOLR version?
> Here's what I thought should work:
> 
> 
> 
> 1.  Open a bash window and ssh login to desired server with
> your Linux admin credentials
> 
> 2.  Change directories:  cd /opt
> 
> 3.  Download the latest Linux/OSX version direct to server: 
> sudo wget http://apache.claz.org/lucene/solr/x.x.x/solr-x.x.x.tgz
> (replace x.x.x with the latest version number)
> 
> a.  Additional download mirror servers are available @
> http://www.apache.org/dyn/closer.lua/lucene/solr/7.3.1 if the
> http://apache.claz.org is site is slow.
> 
> 4.  Login as root user:  sudo -i  and enter you admin password
> 
> 5.  Unzip the .tgz file:  tar zxf solr-x.x.x.tgz
> 
> 6.  Change directories:  cd /
> 
> 7.  Stop SOLR service: service solr stop
> 
> 8.  Confirm that SOLR is stopped:   service solr status
> 
> 9.  Change directories to your user home directory:  cd
> /home/myadminlogonid
> 
> 10.  Create new solr symbolic link in your user home folder that
> points to new SOLR version:   ln -s /opt/solr-x.x.x  solr
> 
> 11.  Move/replace current symbolic link:  mv solr /opt

What version are you going from/to?

What OS is this?

Do you have an /etc/init.d/solr file? If so, where does
SOLR_INSTALL_DIR point?

Do you have an /etc/default/solr.in.sh file? If it points to all of
your data-locations, then you should be okay.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltY3BgACgkQHPApP6U8
pFgPBRAAvhRGsL2b7WN9ivgerORr8P9Q5+eSrfiCh+DHcBWGUXbLvC5286U2Ieua
F6A5xPbjg6hhNoo7TVN/b+5iPBJbZL/Ea9UnuR7ZdVL+xTVktoN+Y5HWUEHwEFk1
PTzfAw/GyNmN4hQbFLVQbYQn+hzYyj1xXCtwa/RKO82c7CEM5H43aTO90CoZa2Vh
rNBeBiXXKPmlaL+RJdDs2yRZAjpTYO2FMJAZWPrzNq9R956tuZj8rPMrERhpLBuk
Dh/33EZKaanLzEBEfOU5O5Qqm5oOlKqDDOK3hs25ru8o6pZ7wAPsiiBof0dSBkj1
V/DGdUfrSMjzVi7DYC1Ie0m1RI8IvHwUZZV7cT23S73U6+QvP+9ap/m8/P4CZCtH
i06aSfEFHEhcvjM2DQ2+sbn2VRinbiQWggGtlr0lrauOSdJ/NCTb4fgiZ3w/esbC
xdY4O9HwQhkjyKFgagKKIBdx/4klusrM+mx/VdhqQ5RtfiWcO3gqKZPlVHYfyc6m
FWMW4i06QfmZLyLeH6xzBqOVUcUdY7UbwALEOO/Kgm2B9J9t/azDlkM4XcnMLeBT
Ee7WuqREe4JoV9iH+MvReHfA+FbrO5vt0b2LFgI2RmcEgFzp1CDq/vbqkcEESN6C
5tRG1VjOpBTINvAKo2hNNwmiNIDGa6ZpWcxqgknZqQJiZ7AsuUw=
=GP6+
-END PGP SIGNATURE-

Re: Possible to define a field so that substring-search is always used?

2018-07-25 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Chris,

On 7/24/18 4:46 PM, Chris Hostetter wrote:
> 
> : We are using Solr as a user index, and users have email
> addresses. : : Our old search behavior used a SQL substring match
> for any search : terms entered, and so users are used to being able
> to search for e.g. : "chr" and finding my email address
> ("ch...@christopherschultz.net"). : : By default, Solr doesn't
> perform substring matches, and it might be : difficult to re-train
> users to use *chr* to find email addresses by : substring.
> 
> In the past, were you really doing arbitrary substring matching, or
> just prefix matching?  ie would a search for "sto" match 
> "ch...@christopherschultz.net"

Yes. Searching for "sto" would result in a SQL query with a " WHERE
... LIKE '%sto%'" clause. So it was slow as hell, of course.

> Personally, if you know you have an email field, would suggest
> using a custom tokenizer that splits on "@" and "." (and maybe
> other punctuation characters like "-") and then take your raw user
> input and feed it to the prefix parser (instead of requiring your
> users to add the "*")...
> 
> q={!prefix f=email v=$user_input}_input=chr
> 
> ...which would match ch...@gmail.com, f...@chris.com, f...@bar.chr
> etc.
> 
> (this wouldn't help you though if you *really* want arbitrary
> substring matching -- as erick suggested ngrams is pretty much your
> best bet for something like that)
> 
> Bear in mind, you can combine that "forced prefix" query against 
> the (otkenized) email field with other queries that could parse
> your input in other ways...
> 
> user_input=... q=({!prefix f=email v=$user_input} OR {!dismax
> qf="first_name last_name" ..etc.. v=$user_input})
> 
> so if your user input is "chris" you'll get term matches on the 
> first_name field, or the last_name field as well as prefix matches
> on the email field.

The problem is that our users (admins) sometimes need to locate users
by their email address, and people often forget the exact spelling. So
they'll call and say "I can't get in" and we have to search for "chris
schultz" and then "chris" and then it turns out that their email
address was actually sexylove...@yahoo.com, so they often have to try
a bunch of searches before finding the right user record. Having to
search for "sexylover42", a complete-match word, isn't going to work
for their use-case. They need to be able to search for "lover" and
have it work. I think n-grams sounds like the only way to get this
done. I'll have to play-around with it a little bit to see how it behave
s.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltYedQACgkQHPApP6U8
pFjzgQ/9GW7kI9Lefnmj7zH8JsqZfW1Y/PrF4YA1RjbliNWRn2dRPz7Q7C2ITO/n
Ys73uUII3qPz8M/H6d0LN57Un96BGAjIhf6WZSiIRAQcvenhGaS/lROciq6I8iN8
hB+1X2GixTG8fbq6Q6Q3jRG22S0GpW+OL2mJcu3wCkQ2dzyBWObWxjF1ag5O4pT+
AP0lqAgpUTsWAeMPPd6dkuStOhXraJQc+1WwwEw36gohwaZwLMftcOl2ohnys/DM
pdyqQEQ6fOldJLBHLU8PyNVHxJA5qZjVTwu3S7zv7w+2N+V8bHOl6y5ir3krOEs0
OIvFX+Do+pbsg+QQ5VY8LDxbPBCjgDiWTpplh3Ym0raaVMoMQ6GfFfsOPF9jYhxS
gb0eMwVTJFWM0xvMaH4xSXLR/Dh6upT/0do1sTr7kKjhIlwc3pfR/vIwqsVer1HJ
Qsj6Pc+ZJckOrPGGIYCZEWZwlS8ONinAx4fh23/C1GltU19kHtRvGTQLzRT+9sus
2stvkD44Lv7zuc49/Y07NISxcUceTlbZHKC5ebzAtKNDS2p+qYLJlbdTZQIofMsb
zmncdP+s5cSYgiCZZS19E2GxP7Yw2rmSn2zsSF6yJMgMy9logJi5HS1UQ54IWvn7
eAzvM+TcV6i+8Hf9kijNcg4/OZPv67DZt6HDcXO2K+a/AMyQElE=
=4Y/b
-END PGP SIGNATURE-

Re: Alias field names when searching (not for results)

2018-07-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Chris,

On 7/24/18 1:40 PM, Chris Hostetter wrote:
> 
> : So if I want to alias the "first_name" field to "first" and the :
> "last_name" field to "last", then I would ... do what, exactly?
> 
> se the last example here...
> 
> https://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-par
ser.html#examples-of-edismax-queries
>
>  defType=edismax q=sysadmin name:Mike qf=title text last_name
> first_name

Aside: I'm curious about the use of "qf", here. Since I didn't want my
users to have to specify any particular field to search, I created an
"all" field and dumped everything into it. It seems like it would be
better to change that so that I don't have an "all" field at all and
instead I mention all of the fields I would normally have packed into
the "all" field in the "qf" parameter. That would reduce my index size
and also help with another question I had today (subject: Possible to
define a field so that substring-search is always used?).

Does that sound like a better approach than packing-together an "all"
field during indexing?

> f.name.qf=last_name first_name
> 
> the "f.name.qf" has created an "alias" so that when the "q"
> contains "name:Mike" it searches for "Mike" in both the last_name
> and first_name fields.  if it were "f.name.qf=last_name
> first_name^2" then there would be a boost on matches in the
> first_name field.
> 
> For your usecase you want something like...
> 
> defType=edismax q=sysadmin first:Mike last:Smith qf=title text
> last_name first_name f.first.qf=first_name f.last.qf=last_name
> 
> : I'm using SolrJ as the client.
> 
> ...the examples above all show the request params, so "f.last.qf"
> is a param name, "last_name" is the corrisponding param value.

Awesome. I didn't realize that "f.alias.qf" was the name of the actual
parameter to send. I was staring at the Solr Dashboard's selection of
edismax parameters and not seeing anything that seemed correct. That's
because it's a new parameter! Makes sense, now.

Thanks a bunch,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltXb9IACgkQHPApP6U8
pFifZxAAgQGXwsMzSQf9shJYmjgLgFWTYxQQBRJDFRgtEz0wtYkRS0nEoE+kO0xs
BEGC6iXfXChAkOQ3Bv/QittRCxCQvXL+aoZA5ewcyumf0XhmU0My4R7AJOoIRGpO
C9oPfUf8bwqynrTN0cXBIN8pr+KAG4rimAEMLxuscVeQAm3McrNbmmX22LL9VNRv
/QBDnil8rPCYiprQn7SnN88IkU9irgwN/1QQ+YaUhwOMubPwygfxGTdkTJivi0KA
fi5nmYE8A+wOzAGlP8GrMUZpkIfVx8VV96fwKdCyw+fi8MXVF+6rd+Z0u4TOI6Yq
ZQ3d/GK7W5OImWpQOJUX9oHRmoKiUgE/27XRb6QSC/WwF1WOonClmHggSKkh24a8
dGa+5A6tbPdCxJwv9T2NPn7XBqOyvNfxzMUnItpIdNoM0lrHCOMmANoU6nnSjrPg
iInAM9oG2p41zO8S83tv7KLVbOwS1xogmeUn5fr/5XQ5Z7g7V5yBE5oYgVTiUleB
Sd+wjoCWeZIfLSJJfRYFLLjQmFqQOh2Fc6XCoyBYQeGLrlCiNLRHIS6dEisHFNq8
PLbXNuMyZOkrvLNFUWwYhC9pwQ8Q8z3C0i1uVSYlOVDd1GHVwJowVI9XCFbAGFoO
0ZXSy3TuHMgk8VGUZNNO0H9nHf3i8MAoMo4TDsgROs2Y9TXRVPM=
=AEkI
-END PGP SIGNATURE-

Re: Alias field names when searching (not for results)

2018-07-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Emir,

On 3/6/18 2:42 AM, Emir Arnautović wrote:
> I did not try it, but the first thing that came to my mind is to
> use edismax’s ability to define field aliases, something like 
> f.f1.fq=field_1. Note that it is not recommended to have field
> name starting with number so not sure if it will work with “1”.

So if I want to alias the "first_name" field to "first" and the
"last_name" field to "last", then I would ... do what, exactly?

I'm using SolrJ as the client.

   queryParamMap.put("defType", "edismax");
   queryParamMap.put([??], "f.first.fq=first_name f.last.fq=last_name");

??

Thanks,
- -chris

>> On 5 Mar 2018, at 17:51, Christopher Schultz
>>  wrote:
>> 
> All,
> 
> I'd like for users to be able to search a field by multiple names 
> without performing a "copy-field" when analyzing a document. Is
> that possible? Whenever I search for "solr alias field" I get
> results about how to re-name fields in the results.
> 
> Here's what I'd like to do. Let's say I have a document:
> 
> { id: 1234, field_1: valueA, field_2: valueB, field_3: valueC }
> 
> I'd like users to be able to find this document using any of the 
> following queries:
> 
> field_1:valueA f1:valueA 1:valueA
> 
> I just want the query parser to say "oh, 'f1' is an alias for 
> 'field_1'" and substitute that when performing the search. Is that 
> possible?
> 
> -chris
> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltXUWwACgkQHPApP6U8
pFgytBAAnI5pSrmfo4vPr2Tvol+qeuBXXcJ1WogXI2CvE1wWfHUm0xOXXzJ+YzMb
glv9UFs+VBfzksM9p4anJ0zLSQ82DxMv+dQ4c/rgMxTMkA/Yj7/9yBxp2jniFz5k
Jaq6FlAcpmQYDKTTx8pZb9srIWfXRoQg2Kv4zFDftD9jQi5Fekn1wt4PuhIWdrWi
9ROX4Pajx6wyJccamfTr5xSiBnzDcA6CBGGMFPmXVPWozYqcDfz4Ohry5MgbHMaR
wz0NMHSFjQ6zF9ZI28RM1z7gMT5xB1mG5HgC5oQWVD2V0PULdAIWC7tDZhlFGE6p
USjELBdeV6NNARz3sIbI8MD+T0Ww0SIekJgz3xNcs8TMIi2k5s1ksEdJl5flrsZ5
wbR7hNYol2nb0Bx6p/wk9wXwxqfDrW9yT3gNg+kYRrEWZdfLqLOXrytTZ7BhTz1O
6xoUX58FugULPyj9zT/DFTxMicjzdLrXUZR9kpRZXZSDhhn9NrzC1zFYJVs/E7W5
2LzguS3zD6pR7stxAory4KaeuJEaU3pBo80P9jslOjBDrmZRIKFLCSaynTwxi2pF
Z0LXwGw/Vpc96sznBe4BYvWmxKkjYGCAUjrXM+tortr2SxH2dd2/umXySB5uQRV8
hAjBkidVLm1pB6jirzxLOzOMeIXb6zXnlLhBbvXBvYVpY9yQNuw=
=f0ub
-END PGP SIGNATURE-

Re: Alias field names when searching (not for results)

2018-07-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rick,

On 3/6/18 6:39 PM, Rick Leir wrote:
> The first thing that came to mind is that you are planning not to 
> have an app in front of Solr. Without a web app, you will need to 
> trust whoever can get access to Solr. Maybe you are on an
> intranet.
Nope, we have a web application between the user and Solr. But I would
rather not parse the user's query string and re-write it so that the
search field-names are canonicalized.

Thanks,
- -chris

> On March 6, 2018 2:42:26 AM EST, "Emir Arnautović"
>  wrote:
>> Hi, I did not try it, but the first thing that came to my mind is
>> to use edismax’s ability to define field aliases, something like 
>> f.f1.fq=field_1. Note that it is not recommended to have field
>> name starting with number so not sure if it will work with “1”.
>> 
>> HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly
>> Detection Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>> 
>> 
>> 
>>> On 5 Mar 2018, at 17:51, Christopher Schultz
>>  wrote:
>>> 
> All,
> 
> I'd like for users to be able to search a field by multiple names 
> without performing a "copy-field" when analyzing a document. Is
> that possible? Whenever I search for "solr alias field" I get
> results
>>> about
> how to re-name fields in the results.
> 
> Here's what I'd like to do. Let's say I have a document:
> 
> { id: 1234, field_1: valueA, field_2: valueB, field_3: valueC }
> 
> I'd like users to be able to find this document using any of the 
> following queries:
> 
> field_1:valueA f1:valueA 1:valueA
> 
> I just want the query parser to say "oh, 'f1' is an alias for 
> 'field_1'" and substitute that when performing the search. Is that 
> possible?
> 
> -chris
> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltXTxwACgkQHPApP6U8
pFjHaA/9HYIjEges8puLMM8S+hwoKigLFzrGstbyWlrj5xHQsBnaeQbNCAv8TSO5
/Yx911UbJEc00etpJSiXVUbMWbwvzt1QmjKZADaUtUQpKJ3i/eORnFOu3/FXrojX
LJFWNxasO/gpFMqz6ADqdsfjKLDiqDQHg6letg0QVQ4d3k3diD3rahJaoJYg67/e
OeEOHqK9LTY+v9HGdLUzLQ87C2FQScsvnTX6vmCU7HLXcbJFOly/KXamL8gulM5g
+sVQbMSB1l+jkU3TOkWZ2ovJJzB49qVto2ZxcrT682GHyHq8sZIX6nsFSRZQl7Af
rCe0Esgdk0SPCf3NIcZugEKmlawqWulzDhheyFVDwc5kQhMmi9CFU+/JbQcT4yeM
Q72TRCdESnH8W9jWDa9+WuBT7PW+BPBogBXhTT2JgptqPxA2iUPl1M9HdjqiZd4K
qdt65YZrpomAQpcDBa4Rzl0yG7UXOuu5A3Ms6nYFyOB0lHdsQqtSVLVSgw1hw3g9
3tnRlyBi1FrrSpwDew8oNobMGVMigb3sxvjAO3lv6g6DH8YEcIyJE197xFVd5091
m+OQSpgO3iZtr7YxruDlM/fvofOLNevQS4LcdhXZoZ4Txi6cAi12svxId8w4yycq
SEOfyXZvd9S0IOdC4UZVfJ+8Ome6Iy1BV+WHsdO8SWKoHW+m7cE=
=x/55
-END PGP SIGNATURE-

Possible to define a field so that substring-search is always used?

2018-07-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

We are using Solr as a user index, and users have email addresses.

Our old search behavior used a SQL substring match for any search
terms entered, and so users are used to being able to search for e.g.
"chr" and finding my email address ("ch...@christopherschultz.net").

By default, Solr doesn't perform substring matches, and it might be
difficult to re-train users to use *chr* to find email addresses by
substring.

Is there a way to define the field such that searches are always done
as a substring? While we are at it, I'd like to define the field to
avoid tokenization because it's never useful to search for
"m...@gmail.com" and find a few million search results because many
users use @gmail.com email addresses.

Here is the current field definition from our create-schema script:

  "add-field":{
 "name":"email_address",
 "type":"text_general",
 "multiValued" : false,
 "stored":true },

Later, we add the email address to the "all" field (which aggregates
everything from all useful fields into the field used as the
default-field):

  "add-copy-field":{
 "source":"email_address",
 "dest":"all" },

Is there a way to define these fields such that:

1. The email_address field is always searched using a substring
2. The email_address field is not tokenized
3. The copied-email-address is not tokenized in the "all" field

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltXTkcACgkQHPApP6U8
pFh1aRAAilB2nVGycjVyY2taAJv6x2ss33UcVL6xBATRUkHTCbyAr5LFN3FWmcOR
iCbZdxCU5LSa0x0clMTlRjR0U8HF+l2J4ArMQYiveA9mXc6fZz+ovAYrBqDguE6b
UZnbOcR3pDF+P5h3ch9aMbdkHAhsVN7AX5yiSIS0fqKn6irNrI7TkvRmiZqNzVFx
sDIPChL9meMfh8rz7vVmu5IjaImnQZ+2tmc+QruFsbgKGXJMR4n+d0CjacIfd5vp
hoZDpg9qcasnYau925xqlj4BBrPS1XiYOqvdgCxnO1l6qqVfBK+lVsPaP5FOtXZP
7Fe/unkzuK8j1Y0mZNpcZtMYYhsMHboT1Kegrn1mUZp9S6iL1NzbqzmsbDQyNqlg
8HghvGG7ROj/hkqLPOlGy6wp72GFQYrHuIEzdyDI9wHOaP+cdliCdkkmqIAQJilR
ketzTVhEbOHGEHGa9obHg0NPqmYwP4DDmSOZ42z5UPr2KqaqpeXsqcB2CV7nnvB3
6hvKuHVWIrHE1P1k1XFwMF3Vy+YbeojFbvKLH+eNKXXOXu8PEn2MaZU5v12WNWEr
0l6K16VnFf436WqH/fSa1DZUfuphA4z0qg/oHqcUcfhVFjc+U1wSZVvdvpG+rSf1
n3NS9pqFAWruWq7V0ID5cV0PVRwp9g6pgs4XJAhKYEkiXVO8u7Y=
=wAsa
-END PGP SIGNATURE-

Re: solr basic authentication

2018-06-21 Thread Christopher Schultz

Dinesh,

On 6/21/18 11:40 AM, Dinesh Sundaram wrote:
> is there any way to disable basic authentication for particular domain. i
> have proxy pass from a domain to solr which is always asking credentials so
> wanted to disable basic auth only for that domain. is there any way?

I wouldn't recommend this, in general, because it's not really all that
secure, but since you have a reverse-proxy in between the client and
Solr, why not have the proxy provide the HTTP BASIC authentication
information to Solr?

That may be a more straightforward solution.

-chris

signature.asc
Description: OpenPGP digital signature

Re: Solr Suggest Component and OOM

2018-06-11 Thread Christopher Schultz

Ratnadeep,

On 6/11/18 12:25 PM, Ratnadeep Rakshit wrote:
> I am using the Solr Suggester component in Solr 5.5 with a lot of address
> data. My Machine has allotted 20Gb RAM for solr and the machine has 32GB
> RAM in total.
> 
> I have an address book core with the following vitals -
> 
> "numDocs"=153242074
> "segmentCount"=34
> "size"=30.29 GB
> 
> My solrconfig.xml looks something like this -
> 
> 
> 
>   mySuggester1
>   FuzzyLookupFactory
>   suggester_fuzzy_dir
> 
>   
> 
>   DocumentDictionaryFactory
>   site_address
>   suggestType
>   property_metadata
>   false
>   false
> 
> 
>   mySuggester2
>   AnalyzingInfixLookupFactory
>   suggester_infix_dir
> 
>   DocumentDictionaryFactory
>   site_address_other
>   suggestType
>   property_metadata
>   false
>   false
> 
> 
> 
> The handler is defined like so -
> 
> 
> 
>   true
>   10
>   mySuggester1
>   mySuggester2
>   false
>   explicit
> 
> 
>   suggest
> 
> 
> 
> *Problem Statement*
> 
> Every time I try to build the suggest index using the suggest.build=true
> url parameter, I end up with an OutOfMemory error. I have no clue how I can
> make this work with the current setup. Can anyone explain why this is
> happening? And how can I fix this issue?
> *StackOverflow:*
> https://stackoverflow.com/questions/50802122/solr-suggest-component-and-outofmemory-error
> 

Can you explain the nature of the OOM? Not all OOMs are due to heap
exhaustion...

-chris




signature.asc
Description: OpenPGP digital signature

Re: Collections unable to load after setting up SSL

2018-06-11 Thread Christopher Schultz

Edwin,

On 6/10/18 10:22 PM, Zheng Lin Edwin Yeo wrote:
> I have found that we can't set it this way either, as we will get the below
> error on "no valid keystore".
> 
> set SOLR_SSL_KEY_STORE=/etc/solr-ssl.keystore.jks
> set SOLR_SSL_TRUST_STORE=/etc/solr-ssl.keystore.jks
> 
> Error:
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:221)
> at org.eclipse.jetty.start.Main.start(Main.java:504)
> at org.eclipse.jetty.start.Main.main(Main.java:78)
> Caused by: java.lang.IllegalStateException: no valid keystore
> 
> 
> Any other ways can that we set or to generate the keystore?

File permissions on /etc/solr-*?

Effective user-id of the process trying to connect to Solr?

If you use relative paths, do you have any idea what the paths are
relative TO?

-chris

> On 9 June 2018 at 21:30, Zheng Lin Edwin Yeo  wrote:
> 
>> Hi Chris,
>>
>> I have deployed these files on the {SolrHome}\server\etc folder.
>>
>> Currently this is the setting of the path in edm.in.cmd.
>>
>> set SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
>> set SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
>>
>> For your meaning of absolute paths actually start with a slash, meaning
>> we have to set it like this?
>>
>> set SOLR_SSL_KEY_STORE=/etc/solr-ssl.keystore.jks
>> set SOLR_SSL_TRUST_STORE=/etc/solr-ssl.keystore.jks
>>
>> Regards,
>> Edwin
>>
>>
>> On 9 June 2018 at 00:15, Christopher Schultz >> wrote:
>>
>>> Edwin,
>>>
>>> On 6/8/18 12:02 PM, Zheng Lin Edwin Yeo wrote:
>>>> I followed the steps from
>>>> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html.
>>>>
>>>> 1)
>>>>
>>>> keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass
>>>> secret -storepass secret -validity  -keystore
>>>> solr-ssl.keystore.jks -ext
>>>> SAN=DNS:localhost,IP:192.168.1.3,IP:127.0.0.1 -dname "CN=localhost,
>>>> OU=Organizational Unit, O=Organization, L=Location, ST=State,
>>>> C=Country"
>>>>
>>>>
>>>> 2)
>>>>
>>>> keytool -importkeystore -srckeystore solr-ssl.keystore.jks
>>>> -destkeystore solr-ssl.keystore.p12 -srcstoretype jks -deststoretype
>>>> pkcs12
>>>>
>>>>
>>>> 3)
>>>>
>>>> openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem
>>>>
>>>>
>>>>
>>>> I have also set these in solr.in.cmd:
>>>>
>>>> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jksSOLR_SSL_KEY_STO
>>> RE_PASSWORD=secretSOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore
>>> .jksSOLR_SSL_TRUST_STORE_PASSWORD=secret#
>>>> Require clients to authenticateSOLR_SSL_NEED_CLIENT_AUTH=false# Enable
>>>> clients to authenticate (but not
>>>> require)SOLR_SSL_WANT_CLIENT_AUTH=false# Define Key Store type if
>>>> necessarySOLR_SSL_KEY_STORE_TYPE=JKSSOLR_SSL_TRUST_STORE_TYPE=JKS
>>>
>>> You didn't describe how you have deployed each of these files on each of
>>> your servers.
>>>
>>> You might want to make sure that all your (attempted) absolute paths
>>> actually start with a slash, though.
>>>
>>> -chris
>>>
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature

Re: Collections unable to load after setting up SSL

2018-06-08 Thread Christopher Schultz

Edwin,

On 6/8/18 12:02 PM, Zheng Lin Edwin Yeo wrote:
> I followed the steps from
> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html.
> 
> 1)
> 
> keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass
> secret -storepass secret -validity  -keystore
> solr-ssl.keystore.jks -ext
> SAN=DNS:localhost,IP:192.168.1.3,IP:127.0.0.1 -dname "CN=localhost,
> OU=Organizational Unit, O=Organization, L=Location, ST=State,
> C=Country"
> 
> 
> 2)
> 
> keytool -importkeystore -srckeystore solr-ssl.keystore.jks
> -destkeystore solr-ssl.keystore.p12 -srcstoretype jks -deststoretype
> pkcs12
> 
> 
> 3)
> 
> openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem
> 
> 
> 
> I have also set these in solr.in.cmd:
> 
> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jksSOLR_SSL_KEY_STORE_PASSWORD=secretSOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jksSOLR_SSL_TRUST_STORE_PASSWORD=secret#
> Require clients to authenticateSOLR_SSL_NEED_CLIENT_AUTH=false# Enable
> clients to authenticate (but not
> require)SOLR_SSL_WANT_CLIENT_AUTH=false# Define Key Store type if
> necessarySOLR_SSL_KEY_STORE_TYPE=JKSSOLR_SSL_TRUST_STORE_TYPE=JKS

You didn't describe how you have deployed each of these files on each of
your servers.

You might want to make sure that all your (attempted) absolute paths
actually start with a slash, though.

-chris



signature.asc
Description: OpenPGP digital signature

Re: Collections unable to load after setting up SSL

2018-06-08 Thread Christopher Schultz

Edwin,

On 6/7/18 11:11 PM, Zheng Lin Edwin Yeo wrote:
> Hi,
> 
> I am running SolrCloud on Solr 7.3.1 on External ZooKeeper 3.4.11, and I am
> setting up the security aspect of Solr.
> 
> After setting up the SSL based on the steps from
> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html, the collections
> that are with 2 replica are no longer able to be loaded.
> 
> What could be causing the issue?
> 
> I remember that wasn't this problem when I tried the same thing in Solr 6
> and even Solr 7.1.

I've fought a bit to get Solr running on a single instance with SSL, so
I can imagine that ZK might be an issue for you.

Can you describe how each server's truststores and keystores are
configured? Are you using client-validated servers (e.g. one-way TLS
like you would with most public web sites) or are you using
mutual-authentication where the server is also checking the client's
certificate?

-chris

signature.asc
Description: OpenPGP digital signature

Re: Windows monitoring software for Solr recommendation

2018-06-05 Thread Christopher Schultz

TK

On 6/5/18 1:12 PM, TK Solr wrote:
> My client's Solr 6.6 running on a Windows server is mysteriously
> crashing without any JVM crash log. No unusual activities recorded in
> solr.log. GC log does not indicate the OOM situation. It's a simple
> single-core, single node deployment (no solrCloud). It has very light
> load. No indexing activities were running near the crash time.
> 
> After exhausting all possibilities (suggestions are welcome), I'd like
> to recommend to install some monitoring software but I couldn't find one
> that works on Windows for a Java based software. (Some I found can
> monitor only EXEs. Since all java software shares the same EXE,
> java.EXE, those won't work.) Can anyone recommend some? They don't need
> to be free but can't be very expensive since it's a very lightly used
> Solr system. Perhaps less than $500?

How about Apache procrun/commons-daemon?

https://commons.apache.org/proper/commons-daemon/procrun.html

I don't know how much of a pain it would be to set it up to run Solr,
but it runs Apache Tomcat quite well and has its own logs for things
like "process died, restarted" which might give you some insight.

-chris



signature.asc
Description: OpenPGP digital signature

Re: Self Signed Certificate for Load Balancer and Solr Nodes

2018-06-01 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Kelly,

On 6/1/18 5:41 PM, Kelly Rusk wrote:
> I can directly connect to either node without issue, it is only
> when the Load Balancer routes to either solr1 or solr2 that I get
> the security error (ex. https://solrlb.com:8983/solr). The Load
> Balancer is not managing HTTPS but just acting as a pure TCP proxy.
> Nothing more complex than sending traffic to either solr1 or
> solr2... however, the URL will be displayed as solrlb.com as it
> hides the real address of what is being routed to.
> 
> In this case, do we need a certificate for solrlb.com installed on
> both solr1 and solr2?

That's exactly what you need. It would be best to:

1. Create a certificate for solrlb.com
2. Install the same key + certificate on both Solr nodes
3. Always use solrlb.com for any links and redirects you generate

Optionally, you could add SANs for that certificate for both solr1 and
solr2 just in case you want to be able to connect directly to either
back-end node without getting hostname mismatch complaints.

> In our previous environments we used the same load balancer setup,
>  but that worked since the Solr nodes were serving over http and
> not https.
You probably never noticed that redirects were occurring that were
sending users to a particular node instead of always using the lb's
hostname because there was never anything double-checking the hostname.

In your previous message, you mentioned that you got an error message
including the hostname "b-win-solr-01.azure-dfa.com" which probably
isn't your load-balancer's hostname. That suggests to me that some
kind of redirect (or similar) is occurring and that the redirect
doesn't understand that there is a reverse-proxy/lb out in front of
the node.

Hope that helps,
- -chris

> -Original Message- From: Shawn Heisey 
>  Sent: Friday, June 1, 2018 5:25 PM To:
> solr-user@lucene.apache.org Subject: Re: Self Signed Certificate
> for Load Balancer and Solr Nodes
> 
> On 6/1/2018 2:01 PM, Kelly Rusk wrote:
>> We have solr1.com and solr2.com self-signed certs that correspond
>> to the two servers. We also have a load balancer with an address
>> named solrlb.com. When we hit the load balancer it gives us an
>> SSL error, as it is passing us back to either solr1.com or
>> solr2.com, but since these two Solr servers only have each
>> other's self-signed cert installed in their Keystore, it doesn't
>> resolve when it comes in through the load balanced address of
>> solrlb.com.
>> 
>> We tried a san certificate that has all 3 addresses, but when we
>> do this, we get the following error:
>> 
>> This page can't be displayed Turn on TLS 1.0, TLS 1.1, and TLS
>> 1.2 in Advanced settings and try connecting to
>> https://b-win-solr-01.azure-dfa.com:8983  again. If this error
>> persists, it is possible that this site uses an unsupported
>> protocol or cipher suite such as RC4 (link for the details),
>> which is not considered secure. Please contact your site
>> administrator.
> 
> One really important question is whether the load balancer acts as
> a pure TCP proxy, or whether the load balancer is configured with a
> certificate and handles HTTPS itself.
> 
> If the load balancer is handling HTTPS, it's very likely that the
> load balancer either cannot use modern TLS protocols and/or
> ciphers, or that it has the modern protocols/ciphers turned off.
> There's probably nothing that we can do to help you in this
> situation.  You will need to find support for your load balancer.
> 
> If the load balancer is just a TCP proxy and lets the back end
> server handle HTTPS, then you may need to ensure that you're
> running a very recent version of Java 8.  You may also need to
> install the JCE policy files for unlimited strength encryption into
> your Java.  I see from other messages on the list that you're
> running Solr 6.6.2, so it would not be a good idea for you to use
> Java 9 or Java 10.  If you need them, the JCE policy files for Java
> 8 can be found here:
> 
> http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-
2133166.html
>
>  One thing you didn't explicitly mention is whether the connection
> works when talking directly to one of the Solr servers instead of
> the load balancer.  If that works, then your Java version is
> probably fine, and it's even more evidence that the problem is on
> the load balancer.
> 
> Thanks, Shawn
> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlsRwYwACgkQHPApP6U8
pFh5LA/+MWkaeylVbsXwL9TxU/qe6fyW82/OVznqDNVKF2KwvtloMjKIyo90ZdqB
N2fqRfczyqN2NporI7dZtj68Qcb7JiOkzfKUQJX/4Ecgfl6WhcrcnzC6jt9B6oQR
c0W02QGGKREz2l719ZI4wohgGPX7HD+u+GXlUdz+v1Bw+4vZlG9LzDJ7YC9XDgXX
1hUDfdmBHS2krMnp5/1bsIvg9Xr58Orrwz20EKyumzUZ/P9WekoUw7WeqJSuuQoN
n3+yM8BMPp/AUy7+5gcvaKtd9mB6J4oUyQQAfj+cNOg/eOiY2t+EFr8b+pVBDG+z

Re: [OT] Self Signed Certificate for Load Balancer and Solr Nodes

2018-06-01 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 6/1/18 5:25 PM, Shawn Heisey wrote:
> On 6/1/2018 2:01 PM, Kelly Rusk wrote:
>> We have solr1.com and solr2.com self-signed certs that correspond
>> to the two servers. We also have a load balancer with an address
>> named solrlb.com. When we hit the load balancer it gives us an
>> SSL error, as it is passing us back to either solr1.com or
>> solr2.com, but since these two Solr servers only have each
>> other's self-signed cert installed in their Keystore, it doesn't
>> resolve when it comes in through the load balanced address of
>> solrlb.com.
>> 
>> We tried a san certificate that has all 3 addresses, but when we
>> do this, we get the following error:
>> 
>> This page can't be displayed Turn on TLS 1.0, TLS 1.1, and TLS
>> 1.2 in Advanced settings and try connecting to
>> https://b-win-solr-01.azure-dfa.com:8983  again. If this error
>> persists, it is possible that this site uses an unsupported
>> protocol or cipher suite such as RC4 (link for the details),
>> which is not considered secure. Please contact your site
>> administrator.
> 
> One really important question is whether the load balancer acts as
> a pure TCP proxy, or whether the load balancer is configured with
> a certificate and handles HTTPS itself.
> 
> If the load balancer is handling HTTPS, it's very likely that the
> load balancer either cannot use modern TLS protocols and/or
> ciphers, or that it has the modern protocols/ciphers turned off.
> There's probably nothing that we can do to help you in this
> situation.  You will need to find support for your load balancer.
> 
> If the load balancer is just a TCP proxy and lets the back end
> server handle HTTPS, then you may need to ensure that you're
> running a very recent version of Java 8.  You may also need to
> install the JCE policy files for unlimited strength encryption into
> your Java.  I see from other messages on the list that you're
> running Solr 6.6.2, so it would not be a good idea for you to use
> Java 9 or Java 10.  If you need them, the JCE policy files for Java
> 8 can be found here:
> 
> http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-
2133166.html

Starting
> 
with Oracle Java 8u151 and later, the "unlimited strength
jurisdiction policy files" are included in the default build, so you
no longer have to manually-install them.

Nice to see that Java finally got out of the 1990s mindset when it
comes to cryptography.

Unfortunately, Java 8 is close to EOL[2] so it's time to look at newer
versions of Java, which likely means newer versions of Solr if you
want to be safe and secure. I say "close to EOL" even though it's 7
months away because it can take a looong time to plan and execute an
upgrade of both Solr and Java.

- -chris

[1]
https://golb.hplar.ch/2017/10/JCE-policy-changes-in-Java-SE-8u151-and-8u
152.html
[2] http://www.oracle.com/technetwork/java/javase/eol-135779.html
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlsRwEIACgkQHPApP6U8
pFi+ng/+KhdiGaSN4PHjHvroqcNXKmmvXbMIIHcHCAARGnTS+0LuZGhopWbJA0u+
NhE/fHJTyRFtfCBaY6gL9NsumAQTXA2kCLsKpWv86WaVEWZSH55BC/0aJCNp/xOU
/QheBJ255RDBYeLZvGAngAS7mWK1wPh6BhsD0bNwtoU7xGCZQtvLt7CdQLu+F8Dm
uJczJOipp8SS/TlTJcP9t02WW3RvjqIZbn4EEr0DZj7hzy1ST8/yzu7cNpo+uQw5
AmoIDik8TmVKmT7h/gW8/frpz7brI+Zw3qm+YELpJK2SQywqhZFdhPjnnAqYKqY0
AuVJlYeC+0ivw/3oHQM/kShzqgXiMTv8bp63BbEYcWt1z9pb2Ltrx/jHsEQYr6k1
bxHAnrXXoQQTq8wm4jqYBSfEB97JyYWqCKJ04HyhxJ9Tzqv5vUwL1xXf4mY0m6dA
eDGoKQ3fjHZaMzUhc0c/zv4MwMH+KYzZ05Y5mdT1UHaYGX3sUMGhdSyNlvWZy4Np
G7ehzOdsuEO+b5+YBQQpWarei76I5soPttkz5rrvWfksn8jUHo0VoqDVs0/g6uY4
5p85OJPF/C4quLDWHN1swpVQJ2q4R3C4RdGjdb2WT+hkks6c1WkqGfkAH2ONA+DS
dxG83u9aDxm+eyoj+GvMlTIAGnqutU2nNQrErb5sGjVHkQaLLaw=
=PK53
-END PGP SIGNATURE-

Re: CURL command problem on Solr

2018-05-30 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Roee,

On 5/30/18 3:38 AM, Roee T wrote:
> Thank you so much all of you the following worked for me!
> 
> curl -X PUT -H "Content-Type: application/json" -d
> "@Myfeatures.json" 
> "http://localhost:8983/solr/techproducts/schema/feature-store;

Curl assumes that the URL is the last argument to the program, so it
stops reading options (left-to-right) when it gets to the URL.

So if you put options after the URL they will be ignored.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlsOzDIACgkQHPApP6U8
pFjHjhAAjwI0xMzqK1Pzhogiumq6XVNLr8BqqlL2jMMXcf3EcwOx5WT62oqDFs92
JILArQSPp22GUOvR3cTxlmTVAYsjvMqsvkbVugxeU5VtBBz1VVwy3bU220nKlooo
El9T0292wbuP/QbUGdt0qfnpMkXIfbdwKJhd7MQ30J8S7XxvQx8j5YQhe2MAkPlz
x7Bc4Qy2J6ov5wNq2sd4wuj5XvvjDE+8pFDXWtC6m7mfjsbGrHTAIoTI843GAVRz
RkMd08vzsmoS81cNsaQAqxJCX0tP2Hwbx0asH94ZO0ohlHe8dB5hmk1fS0TDgNae
QR4hczJ3lYQCpvZXYFCUihC/7Sfpe3/yjs/Ke2DlbUtXJHaLulSYoo7RrgTl3JZy
zBne6HNtcruvQAqDIjKq8xcAzszLsxVPA4RGqO/J5uY96hyuUe/NuJUeUdRTkIbU
wC+DYs8ch7PeOMGkW1MYWSeakPRdQ1/5EKS1mtubJNBVOCri+hy4I+KT5V1f9y8x
8GIySXaoH52xt3b/hsJajQ2PdHd4KRGgB1H7mx9ntXsoVzmPSanuxQ6w+E/XUHDt
iyl2WheLtUop+ukE7ahGUe+IPEVqTMXtdiQBCDB0IWyGbsB00M5P9ZUeFbOCCfle
B0N3Jafv7hGjLHzfjpu3lAUneS3ct2Ljy4Za2snW/ZgMzezHUUY=
=xZzo
-END PGP SIGNATURE-

Re: CURL command problem on Solr

2018-05-29 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Roee,

On 5/29/18 11:02 AM, Roee Tarab wrote:
> I am having some troubles with pushing a features file to solr
> while building an LTR model. I'm trying to upload a JSON file on
> windows cmd executable from an already installed CURL folder, with
> the command:
> 
> curl -XPUT
> 'http://localhost:8983/solr/techproducts/schema/feature-store' 
> --data-binary "@/path/myFeatures.json" -H
> 'Content-type:application/json'.
> 
> I am receiving the following error massage:
> 
> { "responseHeader":{ "status":500, "QTime":7}, "error":{ "msg":"Bad
> Request", "trace":"Bad Request (400) - Invalid content type 
> application/x-www-form-urlencoded; only application/json is 
> supported.\r\n\tat
> org.apache.solr.rest.RestManager$ManagedEndpoint. 
> parseJsonFromRequestBody(RestManager.java:407)\r\n\tat
> org.apache.solr.rest. 
> RestManager$ManagedEndpoint.put(RestManager.java:340) 
> 
> This is definitely a technical issue, and I have not been able to
> overcome it for 2 days.
> 
> Is there another option of uploading the file to our core? Is
> there something we are missing in our command?

What happens if you put the URL as the very last command-line option,
instead of the second one?

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlsNpFMACgkQHPApP6U8
pFjCxQ//WvN5ISrSf1Hoek+HA9e/1jgvHdtbPTz3SKx9Cxv0M28VDE41dOD8/TJU
Yeu8WIIyjbAOugPxYd6X/1Q+ksmzp8DwcANO4uWjM7m9KnrKUcgUqFbiEx5DCWFv
cCO49lD6pbnP7M21BFqIUPdRu4Sk84bObhb8+pFiANDurGG9iDGsk4z5JG8kph1n
QtJeyGss79GF4Fb8Ojs+rju+fcMW9tssi2NCbPI/OUmcEntonmVQKW6Zg8WaqlXD
w29gjss9P6sMloyIe4QbusxfwCL//HdCjuTBOAOZg/Od+Xb4bHG3AkZGqjmf21qC
oR7hjwkQtjl9C9yK5pHMPvAK1bUR8NCuv993dCOw3ddwdPsScv7K7TsI7GqVOfCD
X+PwkrE1PeZbPfSJGO4jVEwRIZ1zx5jRwl2WFpa0HSTnN2+GHVZnezqqIOW6HVax
Hb/7r13vs+6jOUBQPZvzcWtnGl7DurAwYM3nREgBjzMeXYMKqI67lwSBieoyC/da
a8GxkZBn6J+vutLI/hodi8ymUB+wNxiV6W4XTG8t2HSLGmZWD9fUgW6gr4a8WRQk
LM8yzmVSADjTkf5/fdKZ9ausYoMwHzrxKc0ceuK1iEF9WNts6AdOoIcIxrrFfr0v
yPyXnVaGS/5eLnwEt3vR8DROZRpX6OUKgteZRln0QQpAWegzW/I=
=U23/
-END PGP SIGNATURE-

Re: Question regarding TLS version for solr

2018-05-24 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Anchal,

On 5/24/18 6:02 AM, Anchal Sharma2 wrote:
> Thanks a lot for sharing the steps . I tried few of them .Actually
> we already have been using solr in our application since an year or
> so  .We just want to encrypt it to use secure solr now .So ,I
> followed the steps where you have created the certificates ,etc
> .But when I go to start the solr back ,it doesnt start . We are
> using zookeeper .Following is the error I get ,on running solr
> start command.
> 
> Command:./solr -c -m 1g -p 8984 -z :2181 -s  folder containing data>
> 
> Error:
> 
> lsof 4.55 (latest revision at
> ftp://vic.cc.purdue.edu/pub/tools/unix/lsof) usage:
> [-?abhlnNoOPRstUvVX] [-c c] [+|-d s] [+|-D D] [+|-f[cfgGn]] [-F
> [f]] [-g [s]] [-i [i]] [+|-L [l]] [-m m] [+|-M] [-o [o]] [-p s] 
> [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [--] [names] Use the
> ``-h'' option to get more help information. Still not seeing Solr
> listening on 8984 after 30 seconds! at
> java.security.KeyStore.load(KeyStore.java:1456) at
> org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(Certifica
teUtils.java:55)
>
> 
at
org.eclipse.jetty.util.ssl.SslContextFactory.loadKeyStore(SslContextFact
ory.java:871)
> at
> org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory
.java:273)
>
> 
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc
le.java:68)
> at
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLif
eCycle.java:132)
>
> 
at
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLif
eCycle.java:114)
> at
> org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFac
tory.java:64)
>
> 
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc
le.java:68)
> at
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLif
eCycle.java:132)
>
> 
at
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLif
eCycle.java:114)
> at
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.j
ava:256)
>
> 
at
org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetwor
kConnector.java:81)
> at
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:
236)
>
> 
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc
le.java:68)
> at org.eclipse.jetty.server.Server.doStart(Server.java:366) at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeC
ycle.java:68)
>
> 
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:12
55)
> at
> java.security.AccessController.doPrivileged(AccessController.java:594)
>
> 
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:117
4)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
ava:90)
>
> 
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:508) at
> org.eclipse.jetty.start.Main.invokeMain(Main.java:321) at
> org.eclipse.jetty.start.Main.start(Main.java:817) at
> org.eclipse.jetty.start.Main.main(Main.java:112) 2018-05-24
> 09:05:16.714 INFO
> (zkCallback-3-thread-1-processing-n:9.109.122.113:8984_solr) [   ]
> o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
> state:SyncConnected type:NodeDataChanged path:/clusterstate.json,
> has occurred - updating... (live nodes size: 1) 2018-05-24
> 09:05:17.018 INFO
> (zkCallback-3-thread-1-processing-n:9.109.122.113:8984_solr) [   ]
> o.a.s.c.c.ZkStateReader Updated cluster state version to 9702 
> 2018-05-24 09:05:17.153 INFO
> (coreLoadExecutor-7-thread-2-processing-n:9.109.122.113:8984_solr)
> [c:document  r:core_node1 x:document] o.a.s.u.SolrIndexConfig
> IndexWriter infoStream solr logging is enabled [\]  sleep: bad
> character in argument


What does the solr.log file say? The above stack trace isn't terribly
helpful, and it's incomplete.

- -chris

> -Christopher Schultz <ch...@christopherschultz.net> wrote:
> - To: solr-user@lucene.apache.org From: Christopher Schultz
> <ch...@christopherschultz.net> Date: 05/23/2018 07:29PM Subject:
> Re: Question regarding TLS version for solr
> 
> Anchal,
> 
> On 5/23/18 2:38 AM, Anchal Sharma2 wrote:
>> Thank you for replying .But ,I checked the java version solr
>> using ,and it is already  version 1.8.
> 
>> @Christopher ,can you let me know what steps you followed for
>> TLS authentication on solr version 7.3.0.
> 
> Sure. Here are my deployment notes. You may have to adjust them 
> slightly for your environment. Note that we are using standalone

Re: Question regarding TLS version for solr

2018-05-23 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Anchal,

On 5/23/18 2:38 AM, Anchal Sharma2 wrote:
> Thank you for replying .But ,I checked the java version solr using
> ,and it is already  version 1.8.
> 
> @Christopher ,can you let me know what steps you followed for TLS
> authentication on solr version 7.3.0.

Sure. Here are my deployment notes. You may have to adjust them
slightly for your environment. Note that we are using standalone Solr
without any Zookeeper, clustering, etc. This is just about configuring
a single instance. Also, this guide says 7.3.0, but 7.3.1 would be
better as it contains a fix for a CVE.

=== CUT ===

 Instructions for installing Solr and working with Cores

Installation
- 

Installing Solr is fairly simple. One can simply untar the distribution
tarball and work from that directory, but it is better to install it
in a somewhat more centralized place with a separate data directory
to facilitate upgrades, etc.

1. Obtain the distribution tarball
   Go to https://lucene.apache.org/solr/mirrors-solr-latest-redir.html
   and obtain the latest supported version of Solr.
   (7.3.0 as of this writing).

2. Untar the archive
   $ tar xzf solr-x.y.x.tgz

3. Install Solr
   $ cd solr-x.y.z
   $ sudo bin/install_solr_service.sh ../solr-x.y.z.tgz \
 -i /usr/local \
 -d /mnt/securefs/solr \
 -n
   (that last -n says "don't start Solr")

4. Configure Solr Settings
   Edit the file /etc/default/solr.in.sh

   Settings you may want to explicitly set:

   SOLR_JAVA_HOME=(java home)
   SOLR_HEAP="1024M"

5. Configure Solr for TLS
   Create a server key and certificate:
   $ sudo mkdir /etc/solr
   $ sudo keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize
256 -validity 730 \
  -alias 'solr-ssl' -keystore /etc/solr/solr.p12 -storetype
PKCS12 \
  -ext san=dns:localhost,ip:192.168.10.20
 Use the following information for the certificate:
 First and Last name: 192.168.10.20 (or "localhost", or your
IP address)
 Org unit:  [whatever]
 Everything else should be obvious

   Now, export the public key from the keystore.

   $ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore
/etc/solr/solr.p12 -storetype PKCS12 -alias solr-ssl

   Copy that certificate and paste it into this command's stdin:

   $ sudo keytool -importcert -keystore /etc/solr/solr-server.p12
- -storetype PKCS12 -alias 'solr-ssl'

   Now, fix the ownership and permissions on these files:

   $ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
   $ sudo chmod 0640 /etc/solr/solr.p12

   Edit the file /etc/default/solr.in.sh

   Set the following settings:

   SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
   SOLR_SSL_KEY_STORE_TYPE=PKCS12
   SOLR_SSL_KEY_STORE_PASSWORD=whatever

   # You MUST set the trust store for some reason.
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

   Then, patch the file bin/post; you are going to need this, later.

- --- bin/post2017-09-03 13:29:15.0 -0400
+++ /usr/local/solr/bin/post2018-04-11 20:08:17.0 -0400
@@ -231,8 +231,8 @@
   PROPS+=('-Drecursive=yes')
 fi

- -echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
- -"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
+echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
+"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS}
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"

6. Configure Solr to Require Client TLS Certificates

  On each client, create a client key and certificate:

  $ keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize 256 \
-validity 730 -alias 'solr-client-ssl'

  Now dump the certificate for the next step:

  $ keytool -exportcert -keystore [client-key-store] -storetype PKCS12 \
-alias 'solr-client-ssl'

  Don't forget that you might want to generate your own client certifica
te
  to use from you own web browser if you want to be able to connect to t
he
  server's dashboard.

  Use the output of that command on each client to put the cert(s)
into this
  trust store on the server:

  $ sudo keytool -importcert -keystore
/etc/solr/solr-trusted-clients.p12 \
 -storetype PKCS12 -alias '[client key alias]'

Edit /etc/default/solr.in.sh and add the following entries:

  SOLR_SSL_NEED_CLIENT_AUTH=true
  SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
  SOLR_SSL_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_TRUST_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
- -

solr-client.p12   Client keystore. Contains client key and certificate.
  Used by clients to

Re: Question regarding TLS version for solr

2018-05-17 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 5/17/18 4:23 AM, Shawn Heisey wrote:
> On 5/17/2018 1:53 AM, Anchal Sharma2 wrote:
>> We are using solr version 5.3.0 and  have been  trying to enable 
>> security on our solr .We followed steps mentioned on site 
>> -https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html .But
>> by default it picks ,TLS version  1.0,which is causing an issue
>> as our application uses TLSv 1.2.We tried using online resources
>> ,but could not find anything regarding TLS enablement for solr .
>> 
>> It will be a huge help if anyone can provide some suggestions as
>> to how we can enable TLS v 1.2 for solr.
> 
> The choice of ciphers and encryption protocols is mostly made by
> Java. The servlet container might influence it as well. The only
> servlet container that is supported since Solr 5.0 is the Jetty
> that is bundled in the Solr download.
> 
> TLS 1.2 was added in Java 7, and it became default in Java 8. If
> you can install the latest version of Java 8 and make sure that it
> has the policy files for unlimited crypto strength installed,
> support for TLS 1.2 might happen automatically.

There is no "default" TLS version for either the client or the server:
the two endpoints always negotiate the highest mutual version they
both support. The key agreement, authentication, and cipher suites are
the items that are negotiated during the handshake.

> Solr 5.3.0 is running a fairly old version of Jetty -- 9.2.11. 
> Information for 9.2.x versions is hard to find, so although I think
> it probably CAN do TLS 1.2 if the Java version supports it, I can't
> be absolutely sure.  You'll need to upgrade Solr to get an upgraded
> Jetty.

I would be shocked if Jetty ships with its own crypto libraries; it
should be using JSSE.

Anchal,

Java 1.7 or later is an absolute requirement if you want to use
TLSv1.2 (and you SHOULD want to use it).

I have recently spent a lot of time getting Solr 7.3.0 running with
TLS mutual-authentication, but I haven't worked with the 5.3.x line. I
can tell you have I've done things for my version, but they may need
some adjustments for yours.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlr9fKYACgkQHPApP6U8
pFh8lRAAmmvBMUSk35keW0OG0/SHpUy/ExJK69JGIKGwi96ddbz2yH8MG+OjjE3G
GNq/o5+EMT7tP/nW6XuPQou5UQvA2nlA9jsskox3A+CqOH7e6cbSxfxIkTqf9YDl
Kxr4J6mYjvTIjJAqLXGF+ghJfswS6RjZezDgo1PdSUox+gUOvmY61tlSjuYTaAYw
vH1i1DRzb8PkkR4ULePF48Y4r5+ZYz/4ZwSvnJTTkyl97KCw93rZ/kI5v9p3cCHK
Ycuwi/ZirO/VNf/9ruAOtgET3aojNfuNCX/A+vrSbJfiY7mXo05lYKN+eT80elQr
X8OKQaqHP6haF2aNPHrqXGtY2YoiGrdyaGtrXkUHFDfXgQeOmlk/eSVWemcSsatk
eEHSWW9NALMaalRAM7NuXQtgqq1badJhKysiJwSqFgcdgVKcSt8SsQ/09qTPjaNE
Ce1/EHdR6j1hM0Bnv5Hzf85cZjM7PfLmh7P8fnUD5d8eSbBpeWYVBDsS+fXp8WWv
FO5axbnSYIScOIz33i0UZyxpJgcsAkABLGghL6WWQSkfBf4ANgdTumS7K9Pn7Thz
Uq+lD9QPEPWJ91Fc0gnCWtDAEIRjOyLLbYzgI4ebV5qo41GO1WDDHfQZEcqA0Vod
+K8oAMD8nnwU+TprTFkjlQwbDnW1q1efTD6IrpEL5H7h6Xw2cgg=
=RpO6
-END PGP SIGNATURE-

Re: Using Solr / Lucene with OpenJDK

2018-05-10 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 4/24/18 11:23 AM, Shawn Heisey wrote:
> On 4/24/2018 8:50 AM, Steven White wrote:
>> Does anyone use Solr, any version, with OpenJDK?  If so, what
>> has been you experience?  Also, what platforms have you used it
>> on?
> 
> I've used it on Linux.  It worked without issues.

What version of OpenJDK did you happen to run?

My preference would be for Java 8, but on my Debian Wheezy install
only Java 7 is available.

In general, I prefer package-managed versions of everything whenever
possible, but I have found that on Debian the openjdk package requires
many dependencies that I might not otherwise want to install (at least
not globally)[1], so I tend to go with the tarball from Oracle.

I'm still on the fence for a production deployment.

- -chris


[1] Here's what Debian Wheezy currently says it wants to install when
I tell it to install the "default-jre" package:

  ca-certificates-java dbus default-jre default-jre-headless
fontconfig fontconfig-config hicolor-icon-theme java-common libasound2
  libasyncns0 libatk-wrapper-java libatk-wrapper-java-jni libatk1.0-0
libatk1.0-data libavahi-client3 libavahi-common-data
  libavahi-common3 libcairo2 libcups2 libdatrie1 libdbus-1-3
libdrm-intel1 libdrm-nouveau1a libdrm-radeon1 libdrm2 libffi5 libflac8
  libfontconfig1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgif4
libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglib2.0-0
  libglib2.0-data libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libice6
libjasper1 libjbig0 libjpeg8 libjson0 liblcms2-2 libnspr4
  libnss3 libogg0 libpango1.0-0 libpciaccess0 libpcsclite1
libpixman-1-0 libpng12-0 libpulse0 libsctp1 libsm6 libsndfile1
  libsystemd-login0 libthai-data libthai0 libtiff4 libvorbis0a
libvorbisenc2 libx11-6 libx11-data libx11-xcb1 libxau6 libxcb-glx0
  libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1
libxdamage1 libxdmcp6 libxext6 libxfixes3 libxft2 libxi6 libxinerama1
  libxrandr2 libxrender1 libxtst6 libxxf86vm1 lksctp-tools
openjdk-7-jre openjdk-7-jre-headless shared-mime-info ttf-dejavu-core
  ttf-dejavu-extra tzdata-java x11-common

-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlr0XYMACgkQHPApP6U8
pFhldRAAr5/LCc8ufhGjMWbtK2GrUcinD/znUq8QV4CGjfBclg+FHGAMQ00Er8na
TTeY4shHdsdHq82sEzTXX7ySoeKV4pd6+LbsxT2q0uzTIoXbXFdpsPThBESaKTNB
BvdA+T7CHpknE7zH4b0ebxiCnWlQM5VkDgKK/bgte2IXoK7y1iXxh30id3DET1qo
e9i96umNDSZ6Ik7s03rK/JoU6j1EHCz+80mbERWSie/z9T/6+avCcYp3fB570ue9
aysX8yzBhiwp+YFiEJ9cDlOrccmC4vaWgZgRHRWnbIlvnPQys4pq+qSHSqU22iy7
e1HBob0f6ZN1yK1gM8UC29w4XVwDF6CCh+xlH5arvoX38ucNvhOVj2EPyUY2sLAy
uEsqwhjDPRphYLRoMiis/3RV9MksvbUs+HOIFhciFB7OnOd4MsQA5a9VJi8txeVA
adLEoAYKZw0u9wvue/J5481aja+JPBJwE9f5zbTCliTK9Ojk2FKY8syB6FYs2qvX
42Epr7eaj22gxEMrektH0WcH+keSg6fzzPh9QypNHRYGjSDsbDkyoa/cFRdVHt4D
NrvvaGMFhf1/KzQFVvsiVo5zBF5xPzh9EQBu3HhIb7yQFdKTuCx2mxgnJ4rOl7pg
twXGB+oRTQeT70LxEDN4ozUgAe/dT7CCtj3LPoWK8yRylvzReWc=
=ExdU
-END PGP SIGNATURE-

Sorting using "packed" fields?

2018-04-16 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I have documents that need to appear to have different attributes
depending upon which user is trying to search them. One of the fields
I currently have in the document is called "latest_submission" and
it's a multi-valued text field that contains fields packed with a
numeric identifier prefix and then the real data. Something like this:

101:2018-04-16T16:41:00Z
102:2017-01-25T22:08:17Z
103:2018-11-19T02:52:28Z

When searching, I will know which prefixes are valid for a certain
user, so I know I can search by *other* fields and then pull-out the
values that are appropriate for a particular user.

But if I want Solr/Lucene to searcg/sort by the "latest submission", I
need to be able to tell Solr/Lucene which values are appropriate to
use for that user.

Is this kind of thing possible? I'd like to be able to issue a search
that says e.g.:

  find documents matching name:foo sort by latest_submission starting
with ("102:" or "103:")

I'm just starting out with this data set, so I can completely change
the organization of the data within the index if necessary.

Does anyone have any suggestions?

I've seen some questions on the list about "child documents", and it
seems like that might be relevant. Right now, my input data looks like
this:

{
  { "name" : "document name",
"latest_submission" : [ "prefix:date", "prefix:date", etc. ]
  }
}

But that could easily be changed to be:

{
  { "name" : "document name",
"latest_submission" : { "prefix" : "101",
"date" : "[date]" },
  { "prefix" : "103",
"date" : "[date]" },
  }
}


Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlrVDAQACgkQHPApP6U8
pFj7Mw//dnM0ZMRhbvAlMptYSH3LEj08I0l/oJWMQYilWOIltpZ148QOJp+5Iqu/
Q9uYfkItdv0Fw77Ebgtmm7N5PUzH7utiyDfKNayvL9d9+MtfFzx4CKPyqdjNDXvC
2LLUks9ABTX93h7AUdeO5rM4NsPci6LMY8dcxU6fbVDbDT5nYTRULUrbGfDxmY6E
SyMwk25DOzmrIoFCOJcyhuluvHhax753mOQCCljuFaCM3J8ap0+2ZqX8Nl5D2NLz
CqU5ROTGxm+qMVQ8dbqhT6LRdbjj6KqazutOxZl+H+Ix6yVeWZG/9TiAtkKZklvJ
6wjMB2te4utj35YPhpMkghkIYwo7s6jt9DXyBaf2gv1fbiNKmvPN2eqhsI870f0t
UmknH8Atx3ygeru3ddjIvb2Fn17E7EpKHWxkmmrexKE8uzCo9Ith6BWqL8ae19o/
LtBQ7RNCNjIbyNk3GcUJmvboM+PAAvUWbnpwQ4V2oI8b5sO9zeopE4JlzbWmG89H
WVmtPpIdw0H8AwLNbJuGaaksY5ZIcYg2iFH56BHvvu1ri3ArSgcQuyHfxEZD7gs3
cjh+mX9QEgbCVrz2i0CwRkgAMMIffG2SjBsHhUs5ESYqeskkDcyFDi70Q+5wNJ71
GhAESSbgpI31lpbhkGwh7gdXiJyKJG3EMFDEEZVN5sLhFYv96Q8=
=V+EE
-END PGP SIGNATURE-

Re: Appropriate field type for date-without-time

2018-04-16 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 4/15/18 4:49 PM, Shawn Heisey wrote:
> On 4/15/2018 2:31 PM, Christopher Schultz wrote:
>> I'd usually call this a "date", but Solr's documentation says
>> that a "date" is what I would call a timestamp (including time
>> zone).
> 
> That is correct.  Lucene dates are accurate to the millisecond.
> They don't actually handle timezones the way you might be thinking
> -- the information is UTC.  When using date rounding (NOW/WEEK,
> NOW/DAY, etc) you can tell Solr what the timezone is so that the
> boundaries are correct, but the information in the index is UTC.
> 
>> https://lucene.apache.org/solr/guide/7_3/field-types-included-with-so
lr.
>>
>> 
html
>> 
>> [ I remember reading but cannot currently seem to find a
>> reference page with the actual pre-defined field types Solr ships
>> with. That page above lists the class names, but not the aliases
>> used by a real Solr installation.
> 
> That info is what you need to define the fieldType in the schema.
> So you would put something like "solr.DatePointField" as the
> class.

What about the "standard" aliases for existing fieldTypes? I remember
reading a page where "int" versus "pint" were compared, but I can't
seem to find that, now.

>> Is there an existing appropriate field type for
>> "date-without-time"?
> 
> The answer to this question is not yes, but it's also not no.  All
> date types in Solr have millisecond precision.

Okay, so if I want to have a date-without-timestamp, I'll either need
to set all timestamps to 00:00:00 or invent something like
pint-encoded-date, right?

> But if you use DateRangeField, you can deal with larger time
> periods.  A query like "2018" actually works.  At both query and
> index time, the less precise syntax is translated internally to a
> *range* before the query or indexing happens.

Sounds like wasting a little space with 00:00:00 timestamps is
probably the way to go. Even if using pint would be equivalent (and
perhaps even a little more efficient), I think using a "real" date
field is more appropriate.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlrUoEoACgkQHPApP6U8
pFj4lBAAzBSwzlq/mYpK9KraK3UkRhvDfQY5Tk9UpjaDvigROMks5oaGUybZmYLa
6oIguO+xwrMpYU08X3RCtDMPkJKFxXcQhj4x3zgMj/JM2FaCjgkWMsE1oU+68qKB
Ad4HMMqPsmDuG22zcXJWlMLNIfgZk89u2c97Tt/eWvtUYMnZMjT+6CfA43z8JRnM
i8ixDaEl7TZVDD3G4YW/cXCQacpIPmynMOH60gng5ylC04nMLCQyvf3zV0WB7X+t
JTGEjGmMENJhqVq3PnH6VYjGeSU92c8/bbEf+us1nRkIjayEnA7Uv7L87l56viVY
3jpEvHPjGiluDpTfLRUQzaTvu7PUwL1MefmKYnri9NP+HB2v8AhGN+oCyRI/RM5r
hYMTOdyX9VcVOUF3DluWpOCpG9WaJaEfT6ifw6bifNQpWG9lj6B8zxAfGGWRL9dU
iOOCBYwDioYaolRz6oIcTny22/mm3SE4IXGkrH9C2U9WU/nUFhWEjqbw4MWF0ten
0RSJ8coj05fsFdA0A1owA2wOqXuJGmaMfNjZiPR05ucgIFaM0MxgIyFzNeMGxKSd
aUp5EfrS2EHa23DDgsMF0i7C5KTw/Xlzr0Y+9WWdSlRWtYGvBZThP261lJ/jHmpS
FcDsNz4Y5/V2XnNcp0ieD+RoaAMctiehFuzPu9h2awZcF25CGDI=
=vaBk
-END PGP SIGNATURE-

[OT] Re: What is the correct URL for POSTing new data?

2018-04-16 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 4/15/18 4:33 PM, Shawn Heisey wrote:
> On 4/15/2018 2:24 PM, Christopher Schultz wrote:
>> No, it wouldn't have. It doesn't read any configuration files
>> and guesses its way through everything. Simply adding HTTPS
>> support required me to modify the script and manually-specify the
>> URL. That's why I went through the trouble of explaining so in my
>> initial post.
> 
> Gotcha.  I haven't used SSL with Solr myself.  Nobody can get
> directly to the Solr servers, so we don't need it.  If somebody is
> able to penetrate our systems to the point where they can sniff
> Solr traffic, they will already have full access to things far more
> sensitive than our search index.

Not necessarily, but that depends entirely upon your environment. We
have a policy of "no privileged network positions" so we don't even
trust our "private networks". Someone at the data center could
inadvertently configure a switch port to suddenly join our VLAN or a
network plug might be incorrectly assigned, etc. So we don't want our
data flying around in a way that can be intercepted.

> I'll see what I can do about the documentation to make it clear
> that the URL given to the post tool needs the request handler
> path.

That would be great. Even poking-around in the Solr web UI doesn't
reveal that path because of all the javascript magic in the interface.

It's unreasonable to expect everyone to read source code in order to
learn how to use tools that don't require direct programming.

Let me take a step back and say that Solr in fact has great
documentation. There are evidently some things it lacks for the
uninitiated.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlrUn5wACgkQHPApP6U8
pFiN+hAAyyO69VE5ZLGFTk4ti2a4L2+Cvtgdag9GYIvUbX72Zhdwlu2OWBSLVbix
Ibx6XNYKfv88IFzYWrFhTQmPS7Ce35H5Wss2YNfSnGZBhbSrifkCDam06zFZlesH
HTSwrBFs32rTB41c4d6WrBR1wgSOirRsIQ4iDitoIRcGhDsdn3y4nANqoSp3/ZmM
hYJEZ57pa7+aon4hbXde5aYKs5NGqkvOg0XAvctscDSPifZ9sijOgwM7DmABoqit
9oUB5s9pvOt0eA1czhI+gAvgscXdReo8A2i2l1hFxGhvaZ0Xnl2OJqjkNSwhUfaB
J9sc/j/LYWSzapBFl6b9fDYAqjxIcwkLtlX/BOOwLzZWa0Gjnj3OkJSfO6pZjtC3
ZQkBC2a8cyBbx3OW7GyyTzCDKQdYceslXiyYvFiqAEJL5u1SpPfbD8l9XdoTRDzL
M+lsmq9NW7ZDDk5VCAzHr6WVrcTGVM9wZPy4lJ+Wi5sOA/VS8QrXP/J+lJg8blID
MhUCstVZHY9MT6NwQxYpfBb/Sc00/sksakhkdSt95GOEnUnz3cxiW/gqaYEq6b6q
LugrqUuLz9Iy+OVPRzIj7dT31JQERpLm1wELcbY0QutI2hPICkIaec5Pw/avdRBW
UmRrESPK7+zOly+j+WVy2noX2+Y6/orje4oP3ETTPRA1Ey4Y2Xs=
=2ntw
-END PGP SIGNATURE-

Appropriate field type for date-without-time

2018-04-15 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I'd usually call this a "date", but Solr's documentation says that a
"date" is what I would call a timestamp (including time zone).

https://lucene.apache.org/solr/guide/7_3/field-types-included-with-solr.
html

[ I remember reading but cannot currently seem to find a reference
page with the actual pre-defined field types Solr ships with. That
page above lists the class names, but not the aliases used by a real
Solr installation.

For example, if I want to store an integral numeric value, I know I
want to use "pint", but can't actually find the reference for that. ]

I have dates that have no timestamps on them, and I'd like to store
them and probably sort by them. I'm not sure whether we would care to
search for documents whose date fields are within a certain range,
etc. at this point.

I could convert the date into a number e.g. 20180415 for today and
simply store it as a "pint", but that might, ahem, surprise someone
looking at documents in the collection and expect that an obvious
"date"-oriented field was in fact an int. Also, the year 1 bug
will rear its ugly head many generations from now.

Is there an existing appropriate field type for "date-without-time"?

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlrTtr8ACgkQHPApP6U8
pFgulw//VEY527dP/rSvar7Q/XK6lXBNNrl3C7QOse1WlZq27+WRy7A4JwgfKzaR
gvIvLCIytDBznI+Xye72iLuyYnbKn92OLv4sz/jazQfIK9qwlEIRe0ZDKqWZI8k0
CNz3HrfKC5o4Qe84H8dj91PK8U00Q2EGjHe/WY2yS0vYhs4bp4xaVM0Ks2VcRvo1
Jw1DyPwyODTPEQRQ0DdowE6InIJzJ2r+A6OrexvRUMng6AldbOKJjanqgSbZf5lF
07+nnT5Raejs3pIQCbyrCWuxOMGiTsR5rxYy8TTlnUdyqgRChDEaJD4tFBFv/sis
ez03T3EsIBz6Ha4BLhFRLhtssjYX6+5gyrJUd32xaUYtvsQR0ca0iE9gzNBVXNzz
ZsRNGEmjOE3khJX4UL1MuGgQRbLlKfSunz/58HdXlzzmIG9LwryKj3G85diRYUmh
Ge9PUmjUg9u+VfzqgfFqO3Mf1FhQkW/ejAli7I3N8hHk81Iyvhdm+eqyuhq5GFNy
U7Kxmmg1DfJIumXu+4jczUuN8TI+xanvB2yiTgsycbIfGAL5LRMoRi/yN8+DhaUX
HOvGhWprzzuNb+AM4heLq/dAk2vD/zWK91Vc2YLAy9/W/WW9xeoIzRLvb32y6oq7
OVUuni0IjVzphLJOgfZOtCBUdAKWAwSMOohJ6+v7GcAW1xBzSP4=
=cgb/
-END PGP SIGNATURE-

Re: What is the correct URL for POSTing new data?

2018-04-15 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 4/13/18 6:02 PM, Shawn Heisey wrote:
> On 4/13/2018 7:49 AM, Christopher Schultz wrote:
>> $
>> SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
>>
>> 
- -Djavax.net.ssl.trustStorePassword=whatevs
>> -Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post
>> -c new_core https://localhost:8983/solr/new_core
>> 
>> [time passes while bin/post uploads a very large file]
>> 
>> SimplePostTool version 5.0.0 Posting files to [base] url
>> https://localhost:8983/solr/new_core... Entering auto mode. File
>> endings considered are 
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp
,ots,rtf,htm,html,txt,log
>>
>> 
POSTing file new_core.json (application/json) to [base]/json/docs
>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found)
>> for url: https://localhost:8983/solr/new_core/json/docs
> 
> The URL path (beyond the core name) it's ending up with is
> /json/docs, when it should be /update/json/docs.

Looks like that worked. I could find that nowhere in the documentation.

> If you hadn't given the command a specific URL, it probably would
> have figured out the correct URL on its own.

No, it wouldn't have. It doesn't read any configuration files and
guesses its way through everything. Simply adding HTTPS support
required me to modify the script and manually-specify the URL. That's
why I went through the trouble of explaining so in my initial post.

> The base URL for the post tool normally includes the /update path, 
> which is different than the base URL for something like 
> HttpSolrClient (in the SolrJ library).  Changing the handler path
> is done differently in SolrJ than it is with the post tool.
> 
> I know, we've violated that principle again. :)

;)

I don't mind all surprises. It's the ones that have zero documentation
that are the most surprising.

> The bin/post tool is a *simple* tool.  The java class that it calls
> is even named "SimplePostTool".  It is expected that most users
> will outgrow its functionality quickly and write their own indexing
> software that does whatever custom processing they require.  The
> tool doesn't get a lot of improvements because we don't intend it
> to be used as a production indexing mechanism.

I'm using it as a bulk-loading operation. I have no need in production
to completely bootstrap a document collection unless the existing one
has been trashed for some reason. Why bother writing my own client
that does the equivalent of "SELECT * FROM table" and then loop over
the ResultSet calling SolrJ's add-document method.

The SimplePostTool should be able to handle that for me, and if it
did, I'd have less code to babysit in perpetuity.

> If it does what you need, there's nothing wrong with production 
> usage, but you need to be aware that it doesn't have robust error 
> handling, which is usually pretty important for production.
I'm okay with terse error messages.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlrTtQ4ACgkQHPApP6U8
pFi4iA/+MQ97WTAkA6t06PqJWEjbu948gJSS5gaVo3HZlTtLqmzT3/4HhypKolId
aVWEU4KpdGGyOp9N2nkc31Zg8Wu4eLRa0k3GaOJ146b9CgJmUqgedJi/6sDlAXFL
mM472eAxDhVRpZB2wGpXp8HZyVxbjOd/ggCVX5ln6vj8TaRfkdDlhWWTX4Bci/uQ
Ia3M50whXIMxKVHmNKLziIsSbvJ/Bt1/rPoz9CzSBDch665yFK+21cXz3u8dAMsv
fdseYYvJ53tnZi6i8xDlGxsTQFbbWpYNWefs0tQjQGLF67t33NNdX5oC6ihChVjD
OlAxh+sL0TX10eGq8Q+1nQcvyg87QAiipY2yDM3CnFxFLbfn/9rdn28mFxtsNIRd
YQyNsVJN2NNXEPzjAYZe9khsIouvioQlmeX0XWhmuQOPdLbO0otiEGNRtwyUhDnt
ytXwkZ70htwRrAh9UC6GFXwgLkMgTN2E4KRjnOBJCbHSYmjL6YAFPWeeAQFX9fW1
18BVNlsyi2Qyo+v86Jbl50Ld3+64UQukjvNCJn8v/uQJ1O8NT2qfcV6jAZ9Wj273
QSzg1eVCiycmKSL+12EojS4ksSmmBVEuMa4pmFimR2JNEYZnzjyO/egaGgIx2FmQ
Sar14gER2OCeI2dXkrRI8sIiLmOaJOatkHCf9lMebpcuyvq+un8=
=D+Pm
-END PGP SIGNATURE-

What is the correct URL for POSTing new data?

2018-04-13 Thread Christopher Schultz

All,

I've recently been encountering some frustrations with Solr 7.3 after
configuring TLS; since the command-line tools (which are a breeze to use
when you have a "toy" Solr installation) stop working when TLS is
enabled, I'm finding myself having to perform the following tasks in
order to get bin/post to work:

1. patch bin/post:

234,235c234,235
< echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
< "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
---
> echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}"
${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
> "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS}
org.apache.solr.util.SimplePostTool "${PARAMS[@]}"


2. Run the command with lots of manual options:

$ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
-Djavax.net.ssl.trustStorePassword=whatevs
-Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post -c
new_core https://localhost:8983/solr/new_core

[time passes while bin/post uploads a very large file]

SimplePostTool version 5.0.0
Posting files to [base] url https://localhost:8983/solr/new_core...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file new_core.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core/json/docs
SimplePostTool: WARNING: Response: 


Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/new_core/json/docs. Reason:
Not Found


SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
https://localhost:8983/solr/new_core/json/docs
1 files indexed.
COMMITting Solr index changes to https://localhost:8983/solr/new_core...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core?commit=true
SimplePostTool: WARNING: Response: 


Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/new_core. Reason:
Not Found


Time spent: 0:00:04.710

I'm guessing that I just don't know what the URL is supposed to be for
that core. When browsing the web UI, I can examine the core here:

https://localhost:8983/solr/#/~cores/new_core

Solr reports:

startTime:a day ago
instanceDir:/var/solr/data/new_core
dataDir:/var/solr/data/new_core/data/

Index
lastModified:-
version:2
numDocs:0
maxDoc:0
deletedDocs:0
current: [check-mark]


So the core is there. I suspect I'm simply not addressing it correctly.
How should I modify the URL I pass on the command-line so that bin/post
can inject a new batch of data?

Thanks,
-chris

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-11 Thread Christopher Schultz

Shawn,

On 4/10/18 10:16 AM, Shawn Heisey wrote:
> On 4/10/2018 7:32 AM, Christopher Schultz wrote:
>>> What happened is that the new core directory was created as root,
>>> owned by root.
>> Was it? If my server is running as solr, how can it create directories
>> as root?
> 
> Unless you run Solr in cloud mode (which means using zookeeper), the
> server cannot create the core directories itself. When running in
> standalone mode, the core directory is created by the bin/solr program
> doing the "create" -- which was running as root.

That is ... surprising.[1]

> I know that because
> you needed the "-force" option.  So the core directory and its "conf"
> subdirectory (with the config) are created by the script, then Solr is
> asked (using the CoreAdmin API via http) to add that core.  It can't,
> because the new directory was created by root, and Solr can't write the
> core.properties file that defines the core for Solr.

Okay, then that makes sense. I'll try running bin/solr as "solr" via
sudo instead of merely as root. I was under the mistaken impression that
the server kept its own files in order.

It also means that one cannot remote-admin a Solr server. :(

> When running Solr in cloud mode, the configs are in zookeeper, so the
> create command on the script doesn't have to make the core directory in
> order for Solr to find the configuration.  It can simply upload the
> config to zookeeper and then tell Solr to create the collection, and
> Solr will do so, locating the configuration in ZooKeeper.

Good to know, though I'm not at the stage where I'm using ZK.

> You might be wondering why Solr can't create the core directories itself
> using the CoreAdmin API except in cloud mode.  This is because the
> CoreAdmin API is *OLD* and its functionality has not really changed
> since it was created.  Historically, it was only designed to add a core
> that had already been created.

*snapping sounds from inside brain*

> We probably need to "fix" this ... but
> it has never been a priority.  There are bigger problems and features to
> work on.  Cloud mode is much newer, and although the Collections API
> does utilize the CoreAdmin API behind the scenes, the user typically
> doesn't use CoreAdmin directly in cloud mode.
> 
>> The client may be running as root, but the server is running as 'solr'.
>> And the error occurs on the server, not the client. So, what's really
>> going on, here?
> 
> I hope I've explained that clearly above.

You have. Running bin/solr as user 'solr' was able to create the core.

The way the installer and server work together is very unfortunate.
bin/solr knows the euid of the server and, if running under root/sudo
could easily mkdir/chown without crapping itself. Having installed a
"service" using the Solr installer practically requires you to run
bin/solr using sudo, and then it doesn't work. Is there a JIRA ticket
already in existence where I can leave a comment?

Thanks,
-chris

[1] https://en.wikipedia.org/wiki/Principle_of_least_astonishment

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-10 Thread Christopher Schultz

Shawn,

On 4/9/18 8:04 PM, Shawn Heisey wrote:
> On 4/9/2018 12:58 PM, Christopher Schultz wrote:
>> After playing-around with a Solr 7.2.1 instance launched from the
>> extracted tarball, I decided to go ahead and create a "real service" on
>> my Debian-based server.
>>
>> I've run the 7.3.0 install script, configured Solr for TLS, and moved my
>> existing configuration into the data directory, here:
> 
> What was the *precise* command you used to install Solr?

$ sudo bin/install_solr_service.sh ../solr-7.3.0.tgz -i /usr/local/


> Looking for
> all the options you used, so I know where things are.  There shouldn't
> be anything sensitive in that command, so I don't think you need to
> redact it at all.  Also, what exactly did you add to
> /etc/default/solr.in.sh?  Redact any passwords you put there if you need to.


# Set by installer
SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j.properties"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

# Set by me
SOLR_JAVA_HOME=/usr/local/java-8
SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
SOLR_SSL_KEY_STORE_PASSWORD=xxx
SOLR_SSL_KEY_STORE_TYPE=PKCS12
SOLR_SSL_TRUST_STORE=/etc/solr/solr-client.p12
SOLR_SSL_TRUST_STORE_PASSWORD=xxx
SOLR_SSL_TRUST_STORE_TYPE=PKCS12

>> When trying to create a new core, I get an NPE running:
>>
>> $ /usr/local/solr/bin/solr create -V -c new_core
>>
>> WARNING: Using _default configset with data driven schema functionality.
>> NOT RECOMMENDED for production use.
>>  To turn off: bin/solr config -c new_core -p 8983 -property
>> update.autoCreateFields -value false
>> Exception in thread "main" java.lang.NullPointerException
>>  at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
>>  at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
>>  at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
>>  at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
>>  at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)
> 
> Due to the way the code is written there in version 7.3, the exact
> nature of the problem is lost and it's not possible to see it without a
> change to the source code.  If you want to build a patched version of
> 7.3, you could re-run it to see exactly what happened.  Here's an issue
> for the NPE problem:
> 
> https://issues.apache.org/jira/browse/SOLR-12206

Thanks.

> Best guess about the error that it got:  When you ran the create
> command, I think that Java was not able to validate the SSL certificate
> from the Solr server.  This would be consistent with what I saw in the
> source code.

This particular scenario was that the solr client was trying to use HTTP
on port 8983 (because solr.in.sh could not be read with the TLS hints)
and getting a (broken) TLS handshake response. So it wasn't even an HTTP
response, which is probably why the client was (very) confused.

> For the problem you had later with "-force" ... this is *exactly* why
> you shouldn't run bin/solr as root.

Not running as root. I'm on the Tomcat security team. I'm obviously not
wanting to run the server as root.

$ ps aux | grep -e 'PID\|solr'
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
solr 18309  0.0  3.3 2148524 257164 ?  Sl   Apr09   0:22 [cmd]

File permissions make sense, too:

$ sudo ls -ld /var/solr/data
drwxr-x--- 3 solr solr 4096 Apr  9 15:06 /var/solr/data

$ sudo ls -l /var/solr/data
total 12
drwxr-xr-x 4 solr solr 4096 Mar  5 15:12 test_core
-rw-r- 1 solr solr 2117 Apr  9 09:49 solr.xml
-rw-r- 1 solr solr  975 Apr  9 09:49 zoo.cfg

> What happened is that the new core directory was created as root,
> owned by root.
Was it? If my server is running as solr, how can it create directories
as root?

> But then when Solr tried to add the core, it needed to write a
> core.properties file to that directory, but was not able to do so,
> probably because it's running as "solr" and has no write permission
> in a directory owned by root.
That makes absolutely no sense whatsoever. The server is running under a
single egid, and it's 'solr', not 'root'. Also, there is no new
directory in /var/solr/data (owned by either solr OR root) and if Solr
was able to create that directory, it should be able to write to it.

The client may be running as root, but the server is running as 'solr'.
And the error occurs on the server, not the client. So, what's really
going on, here?

> The error in the message from the command with "-force" seems to have
> schizophrenia.
I absolutely edited the log and failed to do so completely.

-chris

Re: Confusing error when creating a new core with TLS, service enabled

2018-04-09 Thread Christopher Schultz

All,

On 4/9/18 2:58 PM, Christopher Schultz wrote:
> All,
> 
> After playing-around with a Solr 7.2.1 instance launched from the
> extracted tarball, I decided to go ahead and create a "real service" on
> my Debian-based server.
> 
> I've run the 7.3.0 install script, configured Solr for TLS, and moved my
> existing configuration into the data directory, here:
> 
> $ sudo ls -l /var/solr/data
> total 12
> drwxr-xr-x 4 solr solr 4096 Mar  5 15:12 test_core
> -rw-r- 1 solr solr 2117 Apr  9 09:49 solr.xml
> -rw-r- 1 solr solr  975 Apr  9 09:49 zoo.cfg
> 
> I have a single node, no ZK.
> 
> When trying to create a new core, I get an NPE running:
> 
> $ /usr/local/solr/bin/solr create -V -c new_core
> 
> WARNING: Using _default configset with data driven schema functionality.
> NOT RECOMMENDED for production use.
>  To turn off: bin/solr config -c new_core -p 8983 -property
> update.autoCreateFields -value false
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
>   at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
>   at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
>   at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
>   at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)
> 
> 
> There is nothing being printed in the log files.
> 
> I thought it might be because I enabled TLS.
> 
> My /etc/default/solr.in.sh (which was created during installation)
> contains the minor configuration required for TLS, among other obvious
> things such as where my data resides.
> 
> I checked the /usr/local/solr/bin/solr script, and I can see that
> /etc/default/solr.in.sh in indeed checked and run it readable.
> 
> Readable.
> 
> The Solr installer (reasonably) makes all scripts, etc. readable only by
> the Solr user, and I'm never logged-in as Solr, so I can't read this
> file normally. I therefore ended up having to run the command like this:
> 
> $ sudo /usr/local/solr/bin/solr create -V -c new_core

Actually, then I got this error:

WARNING: Creating cores as the root user can cause Solr to fail and is
not advisable. Exiting.
 If you started Solr as root (not advisable either), force core
creation by adding argument -force

When adding "-force" to the command-line, I get an error about not being
able to persist core properties to a directory on the disk, with not
much detail:

2018-04-09 19:03:14.796 ERROR (qtp2114889273-17) [   ]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error
CREATEing SolrCore 'cschultz_patients': Couldn't persist core properties
to /var/solr/data/new_core/core.properties :
/var/solr/data/new_core/core.properties
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:989)
at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:90)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:358)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:389)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)

Confusing error when creating a new core with TLS, service enabled

2018-04-09 Thread Christopher Schultz

All,

After playing-around with a Solr 7.2.1 instance launched from the
extracted tarball, I decided to go ahead and create a "real service" on
my Debian-based server.

I've run the 7.3.0 install script, configured Solr for TLS, and moved my
existing configuration into the data directory, here:

$ sudo ls -l /var/solr/data
total 12
drwxr-xr-x 4 solr solr 4096 Mar  5 15:12 test_core
-rw-r- 1 solr solr 2117 Apr  9 09:49 solr.xml
-rw-r- 1 solr solr  975 Apr  9 09:49 zoo.cfg

I have a single node, no ZK.

When trying to create a new core, I get an NPE running:

$ /usr/local/solr/bin/solr create -V -c new_core

WARNING: Using _default configset with data driven schema functionality.
NOT RECOMMENDED for production use.
 To turn off: bin/solr config -c new_core -p 8983 -property
update.autoCreateFields -value false
Exception in thread "main" java.lang.NullPointerException
at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)


There is nothing being printed in the log files.

I thought it might be because I enabled TLS.

My /etc/default/solr.in.sh (which was created during installation)
contains the minor configuration required for TLS, among other obvious
things such as where my data resides.

I checked the /usr/local/solr/bin/solr script, and I can see that
/etc/default/solr.in.sh in indeed checked and run it readable.

Readable.

The Solr installer (reasonably) makes all scripts, etc. readable only by
the Solr user, and I'm never logged-in as Solr, so I can't read this
file normally. I therefore ended up having to run the command like this:

$ sudo /usr/local/solr/bin/solr create -V -c new_core

This was unexpected, because "everything goes through the web service."
Well, everything except for figuring out how to connect to the web
service, of course.

I think maybe the bin/solr script should maybe dump a message saying
"Can't read file $configfile ; might not be able to connect to Solr" or
something? It would have saved me a ton of time.

Thanks,
-chris

Re: Apache commons fileupload migration

2018-03-20 Thread Christopher Schultz

Shawn,

On 3/20/18 9:13 AM, Shawn Heisey wrote:
> On 3/15/2018 6:40 AM, padmanabhan1616 wrote:
>> Hi Team,We are using Apache SOLR-5.2.1 as index engine for our data
>> analytics
>> application. As part of this SOLR uses commons-fileupload-1.2.1.jar
>> for file
>> manipulation.There is security Vulnerability identified in
>> commons-fileupload library: *CVE-2016-131 Apache Commons FileUpload:
>> DiskFileItem file manipulation*As per official notice from apache
>> software
>> foundations this issue has been addressed in commons-fileupload-1.3.3.jar
>> and available for all the dependency vendors.*Is this good toupgrade
>> commons-fileupload from 1.2.1 to 1.3.3 version directly?*
> 
> Solr previously addressed two other vulnerabilites in
> commons-fileupload, both of them after the version you're running.
> 
> https://issues.apache.org/jira/browse/SOLR-9819
> https://issues.apache.org/jira/browse/SOLR-9053
> 
> One of these fixes just did a jar upgrade, but the other also included
> code changes.  So it looks like just replacing the jar with 1.3.3 MIGHT
> cause problems. The commons-fileupload dependency is only used in one
> place in Solr -- the multipart request parser.  I cannot tell what
> actually uses this functionality, though.  I suspect that whatever it is
> is not something really common.
> 
> Looking at the way that Solr uses DiskFileItem and related classes, I
> don't see any evidence that it actually uses serialization or
> deserialization, so I don't think Solr is vulnerable to the problem
> fixed in 1.3.3, but there are two other vulnerabilities that the version
> you're running has.  I haven't assessed whether Solr is vulnerable to
> either of those problems.

I think you are misunderstanding the attack vector. It doesn't matter
how Solr uses DiskFileItem. It matters how a running JVM will behave if
it is tricked into deserializing such an object.

Let me give an example:

1. Solr is running on a system in read-only mode (by whatever
definition) and therefore is not firewalled or anything like that.

2. No "normal" users have access to Solr due to process/file permissions.

3. JMX is enabled on the Solr instance, but only for the 127.0.0.1
interface.

This environment might seem to be "secure" because of the isolation
provided by the OS for files and processes, and the restriction of JMX
to the localhost interface.

But JMX uses RMI which uses serialization to marshal objects over the
wire to the JMX server. So an attacker can construct a malicious
serialized object (say, an Object[] which contains a DiskFileItem
somewhere in there) and merely the presence of the vulnerable
commons-fileupload library can be used to trick the JVM into executing
arbitrary code.

The fact that Solr itself doesn't use DiskFileItem is irrelevant.

> FYI: If only trusted admins and applications can reach the Solr server,
> then any remote vulnerability Solr has cannot be exploited unless
> somebody first breaches the security on something else that DOES have
> access to Solr.  If they manage to do that, they probably have access
> that's far more damaging than access to Solr would be.

This is absolutely true. But it's easy to overlook things that are
outside of the bounds of the application (like JMX/RMI) or in
little-used corners where deserialization is occurring for whatever
reason. It's those edge-cases that can get you into trouble.

The proper mitigation is to upgrade to the latest version of the
library. I suggested that the OP read the changelog because it describes
all the changes and whether or not they are backward-compatible.

-chris

signature.asc
Description: OpenPGP digital signature

Re: Question liste solr

2018-03-19 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Mariano,

On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> Hello
> 
> We have an index Solr with 3 nodes, 1 shard et 2 replicas.
> 
> Our goal is to index 42 millions rows. Indexing time is important.
> The data source is an oracle database.
> 
> Our indexing strategy is :
> 
> * Reading from Oracle to a big CSV file.
> 
> * Reading from 4 files (big file chunked) and injection via
> ConcurrentUpdateSolrClient
> 
> Is it the optimal way of injecting such mass of data into Solr ?
> 
> For information, estimated time for our solution is 6h.

How big are the CSV files? If most of the time is taken performing the
various SELECT operations, then it's probably a good strategy.

However, you may find that using the disk as a buffer slows everything
down because disk-writes can be very slow.

Why not perform your SELECT(s) and write directly to Solr using one of
the APIs (either a language-specific API, or through the HTTP API)?

Hope that helps,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqv7aEdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFgJrg//RushznZlTg60TxdE
s/XKK+69s9c0+DwZ/IrU366j2ZOcJl8Osu9TpzaCSEpdWuulFG8qCSYThTngaijH
I02YCqnK9Ey4+6B7u9QECWNXjdlQXoeINjCnRLVENWzkSmht/U2nW3WTFEPKOvQ3
6ISTPATFnfo6Wt4VYrVefqO/yCCiR5bGL5LsSZYwvqlh9egR8K/wtf4sQ5kji3z+
r2Z0gYpR9igE3ZCIByf6QGq0Ftku90oFCG+kCVNOdgfqwkUaMdc7krv92oTSH4o5
BH+trc2jPf3HKFmp/ywRAPEhAfA5BwbT8vB9gwl/6vuT6efAot7xrLqduF3h7jG6
ffPtkEBbD/ld3inIVta6/hnUwxX9O1fBtJrZegD14cezLV9QcEWFJ8/lUfgGOTdX
ZuvwxBFhmCXE9EMWLlpdUOWK9iVBsZoQZxawoqw9xQauBp/Adg29fdeXmEkUssey
85HGDv/x33Bcr1xPGa8nOygWcZRUgGFCh871qStg9GeTNx3C/mSk0wxdKeUDRePg
GEuL0p803yCJYAddyF66nnx676LfFeDaocBJelx5UbiteNT23xut7jWP/COyOvoy
tpq3c9UfIkobgcA7bZ3IL2Og+hExgo+tLQXiOx6bf2TD1Jk2UOWWk1TAUspuUybD
VH6PlwgqcrO28Jx799mJvpIotoE=
=aMPk
-END PGP SIGNATURE-

Recommendations for non-narrative data

2018-03-16 Thread Christopher Schultz

All,

I'm using Solr to index and search a database of user data (username,
email, first and last name), so there aren't really "terms" in the data
to search for, like you might search for words that describe products in
a catalog, for example.

I have set up my schema to include plain-old text fields for each of the
data mentioned above, plus I have a copy-field called "all" which
includes everything all together, plus I have a first + last field which
uses a phonetic index and query analyzer.

Since I don't need things such as term-replacement (spanner == wrench),
stemming (first name 'chris' -> 'chri'), and possibly other features
that I don't know about, I'm wondering what might be a recommended set
of tokenizer(s), analyzer(s), etc. for such data.

We will definitely want to be able to search by substring (to find
'cschultz' as a username with 'schultz' as input) but some substrings
are probably useless (such as @gmail.com for email addresses) and don't
need to be supported.

What are some good options to look at for this type of data?

In production, we have fewer than 5M records to handle, so this is more
of an academic exercise than an actual performance requirement (since
Solr is at least an order of magnitude faster than our current
RDBMS-searching implementation).

If it makes any difference, we are trying to keep the index up-to-date
with all user changes made in real time (okay, maybe delayed by a few
seconds, but basically realtime). We have a few hundred new-user
registrations per day and probably half as many changes to user records
as that, so perhaps 2 document-updates per minute on average (during ~12
business hours in the US on weekdays).

Thanks for any advice anyone may have,
-chris



signature.asc
Description: OpenPGP digital signature

Re: Apache commons fileupload migration

2018-03-15 Thread Christopher Schultz

To whom it may concern,

On 3/15/18 8:40 AM, padmanabhan1616 wrote:
> Hi Team,We are using Apache SOLR-5.2.1 as index engine for our data analytics
> application. As part of this SOLR uses commons-fileupload-1.2.1.jar for file
> manipulation.There is security Vulnerability identified in
> commons-fileupload library: *CVE-2016-131 Apache Commons FileUpload:
> DiskFileItem file manipulation*As per official notice from apache software
> foundations this issue has been addressed in commons-fileupload-1.3.3.jar
> and available for all the dependency vendors.*Is this good toupgrade
> commons-fileupload from 1.2.1 to 1.3.3 version directly?* Please suggest us
> best way to handle this. Note  - *Currently we don't have any requirements
> to upgrade solr, So please suggest best way to handle  this vulnarability
> without upgrade entire SOLR.*Thanks,Padmanabhan

Have you read the changelog?[1]

-chris

[1] https://commons.apache.org/proper/commons-fileupload/changes-report.html



signature.asc
Description: OpenPGP digital signature

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

(Sorry... hit sent inadvertently before completion...)

On 3/12/18 2:50 PM, Erick Erickson wrote:
> Something like:
> 
> solr/collection/query?q=chris shultz=edismax=all^10
> phonetic

Interesting. Looks like the "qf=all phonetic" would take the place of
my existing "df=all" parameter.

> The point of edismax is to take whatever the input is and
> distribute it among one or more fields defined by the "qf"
> parameter.

That's an entirely lucid explanation. That's not evident from reading
the official documentation :)

> In this case, it'll look for "chris" and "shultz" in both the
> "all" and "phonetic" fields. It would boost matches in the "all"
> field by 10, giving you an easy knob to tweak for "this field is
> more important than this other one".

Cool, like "if I spell it exactly right, I want that result to float
to the top"?

> You can combine  "fielded" searches, something like: 
> solr/collection/query?q=firstName:chris 
> shultz=edismax=all phonetic
> 
> would search for "shultz" in the "all" and "phonetic" fields while 
> searching for "chris" only in the "firstName" field.

Perfect.

> As you have noticed, there are a _lot_ of knobs to tweak when it
> comes to edismax, and the result of adding =query to the URL
> can be...bewildering. But edismax was created exactly to spread the
> input out across multiple fields automatically.
> 
> You can also put these as defaults in your requesthandler in 
> solrconfig.xml. The "browse" handler in some of the examples will
> give you a template, I'd copy/paste from the "browse" handler to
> you main handler (usually "selsect"), as the "browse" handler is
> tied into the Velocity templating engine

Since I'm taking the query from the user in my backend Java to convert
it into a Solr call, I'm comfortable doing everything in the Java code
itself. I'd actually rather not have too much automated stuff, because
then I think I'll confuse myself when using the Solr dashboard for
debugging, etc.

> To start, since there are a lot of parameters to tweak, I'd just
> start with the "qf" field (plus some boosts perhaps). Then move on
> to pf, pf2, pf3. mm will take a while to get your head around all
> by itself. I think once you see the basic operation, then the rest
> of the parameters will be easier to understand.
> 
> And I urge you to take it a little at a time, just use two fields
> and two terms and look at the result of =query, the parsed
> query bits, 'cause each new thing you add adds a further
> complication. Fortunately you can just put different parameters on
> the URL and see the results for rapidly iterating.

Exactly :)

Thanks for the hints,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqm//AdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFhn2Q/+KgmjtAbKbak3qSB9
eHqNz58HS1TQ5XAosMw5WvWikqPcSH+rWVyOQfk+UPNNnI/lsK9dt1Tqpg3LPSHd
cdJFEweoWQWhqWkj5lYj+/cJHcuS2Bd4TP3wOuAIdm7heP3iHVsjfRS7YodRVGCn
JRbmiJBmtSlw1K+leMf4IF4kkBCzDEuZU/LcKfzyU3VoNORwtGYGHq9EXxaDtFyh
0v8v8PJWGHXgAKxdCf9a1qK9Jb40mTciGIhEQ1V083sN4U/Dieq+u9/VCVTzqlwC
KuZ9YWSA58Pqx3biJYwNrjJJITFRFZT4C/TNKeiDENe53n3fL+HsSAhxs2RDvLO0
qK3NXN75B32gLZi7n/+s0SCqQcJeV/HlomLjHeB+0bUTi9Mwwqng7qoaJ49FIdjq
N4lgjVLJMZmp87m883PlLev0ZXrTuoX/QRj4a5xh7tENfQ3StoUz0cC0D8GDO+XO
WERL5p98KZtfca95SHAQSK41H74O5AbfG/h85iZitRQaM4mYt/cs5DAdGif9T4+z
ZDzKgk1kutsTKDRyFZM6qK1O/K+9mk8ye6op+RGCYRr5qbJZpgwgUO8Vl+kOgLS7
WljUkmLbOGsGo8a2pJNJ481OhD3e+C5pa+SFGaxtYT7GBiuGJ/y8LA4HqtXzd+k3
wiHOJ0Bixyo1T4aEjbGZ+tFTOTM=
=ehg4
-END PGP SIGNATURE-

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 3/12/18 2:50 PM, Erick Erickson wrote:
> Something like:
> 
> solr/collection/query?q=chris shultz=edismax=all^10
> phonetic

Interesting. Looks like the "qf=all phonetic" would take the place of
my existing "df=all" parameter.

> The point of edismax is to take whatever the input is and
> distribute it among one or more fields defined by the "qf"
> parameter. In this case, it'll look for "chris" and "shultz" in
> both the "all" and "phonetic" fields. It would boost matches in the
> "all" field by 10, giving you an easy knob to tweak for "this field
> is more important than this other one".
> 
> You can combine  "fielded" searches, something like: 
> solr/collection/query?q=firstName:chris 
> shultz=edismax=all phonetic
> 
> would search for "shultz" in the "all" and "phonetic" fields while 
> searching for "chris" only in the "firstName" field.
> 
> As you have noticed, there are a _lot_ of knobs to tweak when it
> comes to edismax, and the result of adding =query to the URL
> can be...bewildering. But edismax was created exactly to spread the
> input out across multiple fields automatically.
> 
> You can also put these as defaults in your requesthandler in 
> solrconfig.xml. The "browse" handler in some of the examples will
> give you a template, I'd copy/paste from the "browse" handler to
> you main handler (usually "selsect"), as the "browse" handler is
> tied into the Velocity templating engine
> 
> To start, since there are a lot of parameters to tweak, I'd just
> start with the "qf" field (plus some boosts perhaps). Then move on
> to pf, pf2, pf3. mm will take a while to get your head around all
> by itself. I think once you see the basic operation, then the rest
> of the parameters will be easier to understand.
> 
> And I urge you to take it a little at a time, just use two fields
> and two terms and look at the result of =query, the parsed
> query bits, 'cause each new thing you add adds a further
> complication. Fortunately you can just put different parameters on
> the URL and see the results for rapidly iterating.
> 
> Best, Erick
> 
> 
> On Mon, Mar 12, 2018 at 11:30 AM, Christopher Schultz 
> <ch...@christopherschultz.net> wrote: Erick,
> 
> On 3/12/18 1:36 PM, Erick Erickson wrote:
>>>> Did you try edismax?
> 
> Err no, and I must admit that it's a lot to take in. Did you
> have a particular suggestion for how to use it?
> 
> Thanks, -chris
> 
>>>> On Mon, Mar 12, 2018 at 10:20 AM, Christopher Schultz 
>>>> <ch...@christopherschultz.net> wrote: All,
>>>> 
>>>> I have a Solr index containing application user information 
>>>> (username, first/last, etc.). I have created an "all" field
>>>> for the purpose of using it as a default. It contains most
>>>> but not all fields.
>>>> 
>>>> I recently added phonetic searching for the first and last
>>>> names (together in a single field) but it will only work if
>>>> the query specifies that field like this:
>>>> 
>>>> chris or phonetic:schultz
>>>> 
>>>> Is there a way to add the phonetic field to the "all" field
>>>> and have it searched phonetically alongside the non-phonetic 
>>>> fields/terms? I see I cannot have multiple "default fields"
>>>> :)
>>>> 
>>>> I know on the back-end I can construct a query like this:
>>>> 
>>>> all:[query] phonetic:[query]
>>>> 
>>>> ...but I'd prefer to do as little massaging of the query as 
>>>> possible.
>>>> 
>>>> Thanks, -chris
>>>> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqm/vIdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFjWfhAAumgA99yS8li4nT+r
FLtssQ4bitBaMk3QtJQCaHULV90OHYWtL2h0JDewsve+1QP1pUU0R2d/uzHdAF5G
Nzo1bqqUCkt11NwsOC4Fe7B06ZCYTlUf7r/qWiNI2fJmRy7Wit8+6qqtMuXNpGMr
U+dgsiB9Wn6ygfuDKsMU8++MIxPT908Tu2wDarTTRQ6DvGyGucuRMf8ItYKklBIv
I4pDuuS5UY8CpZcIN8bw8Hm7rbfXskC12Lezk81QDnNimbC4u8J9uinReqpWGzC0
d2VsNDkKONBzjGaeUwvtyBLJrEXqWn9F75nSq8PYC/eOalEO8iO9pwolmbtnJ3na
VRE8TjsuapoOTYZEZcxbw39/U0gCcO4Ns5Fs3W405gA5ouQ5qnOKPHnk5hRxAEBo
QW/31n+mXsjt3S8EzRtlCwyXcVykyyafS5exzzZqgx4j8hJMw3zfHUkF8oJC5nyt
f5Tvk/8w5epe/3xKSeAf+5QTtAT5/5DftYiOvMqraTxwVuO/d4QeehAMtzUVCdm8
8JXpAbvp9HiADt52fP8YqQwMs4aX3cbA8SeDaVMZK130XM72hcg7ykspwnEejE8a
DIRbHG4I5Z6F4M9mTBudKbJhhDzkWYjo5mAjnd+1apOv1wKH1qELz7XSevVsyQb0
SI+WDftPFpdJ8bAu5XTGvs5TOpw=
=RM6r
-END PGP SIGNATURE-

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 3/12/18 1:36 PM, Erick Erickson wrote:
> Did you try edismax?

Err no, and I must admit that it's a lot to take in. Did you have
a particular suggestion for how to use it?

Thanks,
- -chris

> On Mon, Mar 12, 2018 at 10:20 AM, Christopher Schultz 
> <ch...@christopherschultz.net> wrote: All,
> 
> I have a Solr index containing application user information
> (username, first/last, etc.). I have created an "all" field for the
> purpose of using it as a default. It contains most but not all
> fields.
> 
> I recently added phonetic searching for the first and last names 
> (together in a single field) but it will only work if the query 
> specifies that field like this:
> 
> chris or phonetic:schultz
> 
> Is there a way to add the phonetic field to the "all" field and
> have it searched phonetically alongside the non-phonetic
> fields/terms? I see I cannot have multiple "default fields" :)
> 
> I know on the back-end I can construct a query like this:
> 
> all:[query] phonetic:[query]
> 
> ...but I'd prefer to do as little massaging of the query as
> possible.
> 
> Thanks, -chris
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmxzEdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFiejxAAwYVKaztnIESfGTik
W2AVf1b/gOKp4WKzOK4Jg37Qjmguzfm2yPuD38gauP62ph5JPKeWjIAU4JHi/O8Y
U5tWO2AuJBC/l7+t++vDTITP1iwFS1c7iXCpaMpZ16Ji3ScuKW5ZV2vMIioLXw6U
9BH5xtgh63D2JUyse89LzNVQULBENZx2uzAz7+q7ne7LYdWSR8949ry0EUOssVX+
HEV1Be2QmmLUVHg1bSsTt678mrqV8EGm8Z1pf7WOBK6OJKA36iRTSxlyboShMP5D
5OnIfUoL+HxwIOnjevAmZU4zDZVllXdBme66xF/WT8+HP3NEWqBMDzcRlO8M1TPe
yY9Y0By9cpkIXasm2uVYZzmUy5Hb+CcQOUXKLrvxqkE018+iey49pivr7ne2+W+B
m0XF7qha4zPBT3onZNm2iDqNuMbVv1443aaAMNjh/E6RwsEgJ6PKSZjMKxM6QEdj
5Jy6dUGqhkQHgFfEa6srz1XHSbL5vwPyH3WQNv1pIRuvASbdfEinZPNmiPM36iMi
itlO9HulZ80/It7aQR9llqt10bEK0vh7CzN7EzI6Yu+st1g/uVHeg8GMupNYd0u0
/1t0NQafKjaf1UTO5ubLN/QiDSA1NYVNHkAczbnNzdHj0Z7sI5l2fkNWWxFzikQi
RuYSZRH+EG4C2ey1f8vkxlrwIwE=
=lb3e
-END PGP SIGNATURE-

Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I have a Solr index containing application user information (username,
first/last, etc.). I have created an "all" field for the purpose of
using it as a default. It contains most but not all fields.

I recently added phonetic searching for the first and last names
(together in a single field) but it will only work if the query
specifies that field like this:

  chris or phonetic:schultz

Is there a way to add the phonetic field to the "all" field and have
it searched phonetically alongside the non-phonetic fields/terms? I
see I cannot have multiple "default fields" :)

I know on the back-end I can construct a query like this:

  all:[query] phonetic:[query]

...but I'd prefer to do as little massaging of the query as possible.

Thanks,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmtt8dHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFggzQ//SRglgqRRvh9GgwCk
A2xAjiMZSnf6iYsVJDMPlP1IhlLVrgPH5gm3tKWbOHdf52s3RPaWXHlDE3PAuHtx
gpDPOief+mo4X/XmlWnj8R461XahOAhEDKeJ6uIS4X2qR2hZvSQd+gXMN4/aoA/l
BcRNSTiQGKUVDDj+3wZayFhjElrrfaDbWC2dwnM2ULMrgK7xyhnulCVjUV+hOswY
AmMyTuDJNjPuiT867x8Cckoh8J468OkBtQUUkdHn9UiHwShD8TxaDSUVcpqyxihr
oODmvLfPN6EIkv0CN4h/pNrRCvQBlNTSeIh2AqFk7rnD4W0nWSzRXcV5seyGushI
pzPvebtIsYcxx+DU7d/4jqH42yba9fADPFa+xhHckbwY4e2lZxvKF7HpLu6aoVnH
zhCl/Cdu3cwZPmDWsUX+3Xkb7r28pe1iUrdNoYrbrhfL2WR6xJX5hL32QQyoyy8V
w/SU8XLgETYSe0oN773F/Lxjwf2AVATdKY9acrsKr+KSI5VFEFBHNJctkpk0o630
OOI1FYencsbiCdIVoRQ5b94EU/iAqs7r8wi6OdyeawyPOFDZhFwFMeYUPasuWrCP
MEB0iSbCI9OIG5pi6tiTtbOQQ85Qb41u2VcyOqKHkneEMn58/nx2QY0FiS3XvcC7
HuZC0A7VssLRu2g5+joWp4NBILI=
=yyJm
-END PGP SIGNATURE-

Re: Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 3/12/18 1:00 PM, Erick Erickson wrote:
> bq: which you aren't supposed to edit directly.
> 
> Well, kind of. Here's why it's "discouraged": 
> https://lucene.apache.org/solr/guide/6_6/schema-api.html.
> 
> But as long as you don't mix-and-match hand-editing with using the 
> schema API you can hand edit it freely. You're then in charge of 
> pushing it to ZK and reloading your collections that use it
> yourself however.

No Zookeeper (yet), but I suspect I'll end up there. I'm mostly
toying-around with it right now, but it won't be long before I'll want
to go live with it and having a single Solr instance isn't going to
help me sleep well at night. I'm sure I'll end up with two instances
to begin with, which requires ZK, right?

> As a side note, even if I _never_ hand-edited it I'd make it a 
> practice to regularly pull it from ZK and put it in some VCS system
> ;)

Actually, I have the script that builds the schema in VCS, so it's
roughly the same.

As for the schema modifications... did I get those right?

Thanks,
- -chris

> On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz 
> <ch...@christopherschultz.net> wrote: All,
> 
> I'd like to add a new synthesized field that uses a phonetic
> analyzer such as Beider-Morse. I'm using Solr 7.2.
> 
> When I request the current schema via the schema API, I get a list
> of existing fields, dynamic fields, and analyzers, none of which
> appear to be what I'm looking for.
> 
> Conceptually, I think I'd like to do something like this:
> 
> add-field: { name: phoneticname, type: phonetic, multiValued: true
> }
> 
> ... but how do I define what type of data "phonetic" should be?
> 
> I can see the example XML definition in this document: 
> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#Filt
er
>
> 
Descriptions-Beider-MorseFilter
> 
> But I'm not sure how to add an analyzer to the schema using the
> schema API:
> https://lucene.apache.org/solr/guide/7_2/schema-api.html
> 
> Under "Add a new field type", it says that new analyzers can be 
> defined, but I'm not entirely sure how to do that ... the API docs 
> refer to the field type definitions page[1] which just shows what
> XML you'd have to put into your schema XML -- which you aren't
> supposed to edit directly.
> 
> When looking at the JSON version of my schema, I can see for
> example thi s:
> 
> "fieldTypes":[{ "name":"ancestor_path", "class":"solr.TextField", 
> "indexAnalyzer":{ "tokenizer":{ 
> "class":"solr.KeywordTokenizerFactory"}}, "queryAnalyzer":{ 
> "tokenizer":{ "class":"solr.PathHierarchyTokenizerFactory", 
> "delimiter":"/"}}},
> 
> So should I create a new field type like this?
> 
> "add-field-type" : { "name" : "phonetic", "class" :
> "solr.TextField",
> 
> "analyzer" : { "tokenizer": { "class" :
> "solr.StandardTokenizerFactory" },
> 
> "filters" : [{ "class": "solr.BeiderMorseFilterFactory", 
> "nameType": "GENERIC", "ruleType": "APPROX", "concat": "true", 
> "languageSet": "auto" }] } }
> 
> Then, use copy-field as "usual":
> 
> "add-field":{ "name":"phonetic", "type":"phonetic", multiValued:
> true, "stored":false },
> 
> "add-copy-field":{ "source":"first_name", "dest":"phonetic" },
> 
> "add-copy-field":{ "source":"last_name", "dest":"phonetic" },
> 
> This seems to work but I wanted to know if I was doing it the right
> way.
> 
> Thanks, -chris
> 
> [1] 
> https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-pr
op
>
> 
erties.html#field-type-definitions-and-properties
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmtY4dHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFhdIA/9GkZ/yimVmkwB725L
uS4kcy4YJowyYw+eMtvurpIq/ZV/U8H4hFJY/ddsT+bdrjeZMsTdc7B9Tdlha8xt
dmuj1VcvDn3uyIUGooTOob6ZvZwjeJEZIJrbwUM5gNq7uJW8xpCU0/3+iP6Km7OY
1Nia5uCuwarLWcsRFdtjCvR3M7ZppBYHec3kVGGOUL637AC6ISgpxhuzOnuTHAss
wCjuR1y6AdTjRbHpis3MJdiVIjEENfyzGpEnqvumsu1e+0F/A0DNbhU9nAPv+73d
aOLfOW9Fs6jjnq96qzIBAkHLWkqU1GHKYNYHql7/59x8rFcjGkGC7ziSY69lKc+f
ivrIEqLH1Go7kawz+1og3dPyl/n0CFWE3UK+wj5QeTY5XLduq0x6EmFKW6D790BS
ywmFuqr4cmvKbs3N6BbxHz5QVbjgRsWO4jp4kJi3KDCepd8vKW+2xwHfX/zAcBKY
rSDuVkM3KtxQal8xgm4tsvyU3g1dXpNEVa7PFXYJzd3uA2yij9OU6s83NS9LHK3N
2zssPfNDj7QddAEhYan0O4r4wSUN2UNT9nMhBVXXYRpoD6WzrhC5TdRUDh66rkOB
AvhAUKsV0rfjct+MUBpQA9W+SUG7i911wNSBJJmB58MYbyxMAJb8NKGk1yEs1MyH
FQHEgiEEFRCD9ZFd/fqwfuPyKQo=
=Vqz6
-END PGP SIGNATURE-

Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I'd like to add a new synthesized field that uses a phonetic analyzer
such as Beider-Morse. I'm using Solr 7.2.

When I request the current schema via the schema API, I get a list of
existing fields, dynamic fields, and analyzers, none of which appear
to be what I'm looking for.

Conceptually, I think I'd like to do something like this:

add-field: { name: phoneticname, type: phonetic, multiValued: true }

... but how do I define what type of data "phonetic" should be?

I can see the example XML definition in this document:
https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#Filter
Descriptions-Beider-MorseFilter

But I'm not sure how to add an analyzer to the schema using the schema
API: https://lucene.apache.org/solr/guide/7_2/schema-api.html

Under "Add a new field type", it says that new analyzers can be
defined, but I'm not entirely sure how to do that ... the API docs
refer to the field type definitions page[1] which just shows what XML
you'd have to put into your schema XML -- which you aren't supposed to
edit directly.

When looking at the JSON version of my schema, I can see for example thi
s:

"fieldTypes":[{
"name":"ancestor_path",
"class":"solr.TextField",
"indexAnalyzer":{
  "tokenizer":{
"class":"solr.KeywordTokenizerFactory"}},
"queryAnalyzer":{
  "tokenizer":{
"class":"solr.PathHierarchyTokenizerFactory",
"delimiter":"/"}}},

So should I create a new field type like this?

"add-field-type" : {
  "name" : "phonetic",
  "class" : "solr.TextField",

  "analyzer" : {
"tokenizer": { "class" : "solr.StandardTokenizerFactory" },

"filters" : [{
  "class": "solr.BeiderMorseFilterFactory",
  "nameType": "GENERIC",
  "ruleType": "APPROX",
  "concat": "true",
  "languageSet": "auto"
}]
  }
}

Then, use copy-field as "usual":

  "add-field":{
 "name":"phonetic",
 "type":"phonetic",
 multiValued: true,
 "stored":false },

  "add-copy-field":{
 "source":"first_name",
 "dest":"phonetic" },

  "add-copy-field":{
 "source":"last_name",
 "dest":"phonetic" },

This seems to work but I wanted to know if I was doing it the right way.

Thanks,
- -chris

[1]
https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-prop
erties.html#field-type-definitions-and-properties
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqmsC4dHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFjZWRAAisee5Ya+5dyix91A
cGpwgZtFpcVldhd0wDG8qwihq9528vBZCdDSM3yotojMd+Y9dYLm+Q+oM/RT/zoO
IXVfRRc352GqG00++hYKpZONUp9Eb3RNjl64+TCufz7vSpr3U/TsJL4wwIMQAY3r
eItN/v6TWvvb6jd0z/zL1eITeheOm7bFGjZhGRNv2A7LaQbqTLs6N+SgYphUv7mr
E6oQZD5VsdNDqmQdpXVA+Z+eiHweST5JHm1T2ePPz2S7lYunmAcGkAhCmTn2Kwew
H3C8+h+mD14YlfYK5J0VcQ2WMZtOkgNNvBiUGIUoEGoqu82dX81408cS49/ZYD/3
c9/p41nfzz2V9M3HwgYqbQTI9vV5HP33t44BsWIQr34x86yAPfnMIH3Yv5iEfXTk
aGAyeQjkfmMfJbiKTtmVu8Z7q/AiacgzUFUh3yMzGnoDQKz/OWw0A3JkdJ0TT/vY
Y6ZiwarooO1tuhG+wm4h+6rUQpoueJS7K8cdWi7LfVb9LGLgj7NCaOQtyIn9QAmk
1UxaJjIOiyO1hsV31nC0kXfKW2A/gkN444gitSi51106QuzIXpEtCeAc4QmqjJt9
yeI61DFbQRnr76oVCiyYQwEmOj+C0bOkZqkLU7ZvMonWLLjgX0ydrpNSfm0fDDNv
tdfbE/POTM+uJlgX0UEEJhN7qz0=
=bgGi
-END PGP SIGNATURE-

Re: Solr Read-Only?

2018-03-06 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Terry,

On 3/6/18 4:55 PM, Terry Steichen wrote:
> Chris,
> 
> Thanks for your suggestion.  Restarting solr after an in-memory 
> corruption is, of course, trivial (compared to rebuilding the
> indexes).
> 
> Are there any solr directories that MUST be read/write (even with
> a pre-built index)?  Would it suffice (for my purposes) to make
> only the data/index directory R-O?

I installed Solr for the first time 2 weeks ago, so I'm not a great
resource, here. But I've used Lucene in the past and the on-disk
storage is basically the same AFAICT.

When starting with a expand-the-tarball-and-just-go-for-it deployment
model, I'd probably make sure that the server/solr directory and
everything below it was non-writable by the Solr-user.

Obviously, once you have set this up in a test lab, just try to break
it and see what happens :)

- -chris

> On 03/06/2018 04:20 PM, Christopher Schultz wrote:
>> Terry,
>> 
>> On 3/6/18 4:08 PM, Terry Steichen wrote:
>>> Is it possible to run solr in a read-only directory?
>> 
>>> I'm running it just fine on a ubuntu server which is
>>> accessible only through SSH tunneling.  At the platform level,
>>> this is fine: only authorized users can access it (via a
>>> browser on their machine accessing a forwarded port).
>> 
>>> The problem is that it's an all-or-nothing situation so
>>> everyone who's authorized access to the platform has, in
>>> effect, administrator privileges on solr.  I understand that
>>> authentication is coming, but that it isn't here yet.  (Or, to
>>> add complexity, I had to downgrade from 7.2.1 to 6.4.2 to
>>> overcome a new bug concerning indexing of eml files, and 6.4.2
>>> definitely doesn't have authentication.)
>> 
>>> Anyway, what I was wondering is if it might be possible to run
>>> solr not as me (the administrator), but as a user with lesser
>>> privileges so that no one who came through the SSH tunnel could
>>> (inadvertently or otherwise) screw up the indexes.
>> 
>> With shell access, the only protection you could provide would
>> be through file-permissions. But of course Solr will need to be 
>> read-write in order to build the index in the first place. So
>> you'd probably have to run read-write at first, build the index
>> (perhaps that's already been done in the past), then (possibly)
>> restart in read-only mode.
>> 
>> Read-only can be achieved by simply revoking write-access to the
>> data directories from the euid of the Solr process.
>> Theoretically, you could switch from being read-write to
>> read-only merely by changing file-permissions... no Solr restarts
>> required.
>> 
>> I'm not sure if it matters to you very much, but a user can still
>> do some damage to the index even if the "server" is read-only
>> (through file-permissions): they can issue a batch of DELETE or
>> ADD requests that will effect the in-memory copies of the index.
>> It might be temporary, but it might require that you restart the
>> Solr instance to get back to a sane state.
>> 
>> Hope that helps, -chris
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqfFf8dHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFhNbQ//SNP5gVLO/Ntt3OA5
9Cg05Gzvc7lNvLQVW1SSDFiQHbAJ91/6CB1N/AHhCTOLyRzmAoYBsOF+wgOuufrV
Z8FZBbSCVACiNi48n+agNfA/QQ79pBgTBaharAZqFaEybxhLgivAw5f9VyhABxSt
5Ceq2UffHzOFL4q8yRSpPPwOTAPnPzSH2Qvsv7039ZRJRehiV5WZiwU318Tkbtoy
M3LbTjWWlm9/IvqzYyf3KuKAytWDIvXs7aSwGi9RI0K9PtGCJwzz4Dp8G6dJCTo3
+2jLe5Q/bRATEwrNO+uriOUk6DOT2+9giUJbyBQjwW2e9jWCxiUCN/NVosjY1M6F
zb9beuQ8Oglkzz/PlcsLpavH7vNayeVhVB2+yGK1L5XiRKz5qtvY7GaFuol4Lb7s
21PR5911vuuw79Kqi7q7srmJF/AtIPbsnBK9c/6Ts6h+VzR1BH+eflec9tSvH5rK
OuSyX6KKFjjMskZglHQz5kzdrn6tb1KLt0+lXr5SZpVSUt6YEtlyZMKDFVuxrLFB
SsZ8jhjxBh2YYYOhPCkan69bZoz4yyoE49g70+raAwKILZi1z4INFJ0Lf0eS9BSg
jXCjUAa+53Ne4/PyVRvycQYEHvPobSyPAW7dMXucldeUmIimn8mC/eLUgV0YTGaM
K6WVWl+oMrE5kLhyUEXtEYcdYwM=
=IAv7
-END PGP SIGNATURE-

Re: Solr Read-Only?

2018-03-06 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Terry,

On 3/6/18 4:08 PM, Terry Steichen wrote:
> Is it possible to run solr in a read-only directory?
> 
> I'm running it just fine on a ubuntu server which is accessible
> only through SSH tunneling.  At the platform level, this is fine:
> only authorized users can access it (via a browser on their machine
> accessing a forwarded port).
> 
> The problem is that it's an all-or-nothing situation so everyone
> who's authorized access to the platform has, in effect,
> administrator privileges on solr.  I understand that authentication
> is coming, but that it isn't here yet.  (Or, to add complexity, I
> had to downgrade from 7.2.1 to 6.4.2 to overcome a new bug
> concerning indexing of eml files, and 6.4.2 definitely doesn't have
> authentication.)
> 
> Anyway, what I was wondering is if it might be possible to run solr
> not as me (the administrator), but as a user with lesser privileges
> so that no one who came through the SSH tunnel could (inadvertently
> or otherwise) screw up the indexes.

With shell access, the only protection you could provide would be
through file-permissions. But of course Solr will need to be
read-write in order to build the index in the first place. So you'd
probably have to run read-write at first, build the index (perhaps
that's already been done in the past), then (possibly) restart in
read-only mode.

Read-only can be achieved by simply revoking write-access to the data
directories from the euid of the Solr process. Theoretically, you
could switch from being read-write to read-only merely by changing
file-permissions... no Solr restarts required.

I'm not sure if it matters to you very much, but a user can still do
some damage to the index even if the "server" is read-only (through
file-permissions): they can issue a batch of DELETE or ADD requests
that will effect the in-memory copies of the index. It might be
temporary, but it might require that you restart the Solr instance to
get back to a sane state.

Hope that helps,
- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqfBiEdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFg9WBAAw1AoqeNTmndplMwT
YRLznWAaiSi2/bCzxQEFf8KlTXh80rMc9zVPvMhgqJQYx0EGiMqyUqQEAk1xc/Vq
5XGNk0Vf2efnjA4HVS5pHvhWJz2t4ATagqX6Z98qFvvO0OqkX7lpZat8612jfDYA
f2PmZ1GGlkxZhU7eP4u7FX1drVTFJPBWeUndZoPiSZg6Sj/zz4+rbfaCIEhcl2hC
1CorI3OIos4NgJjLwCqHLCuurkN0+NEJOFE+n2wsEJA69UES8sBo4rwZMR7TECWN
mv+bFHVc4RQIvmppFPSptQIAX4T0k7PgNY38pfGPKgpHgET8RbvpKP34S434uR06
w8jhwOCUOSY7iUP718vbzK9RKcJFzYB6hb2hIUe/C8Hig2K1EfOys7NHd96uBYvS
7fKL6zHByCw9Fw+XiA1O8q5D6Clo3DAWEix5JUl7FDmbXIeUftHEmzb7axfDisec
B80ZYFSUmtOAshaRhKT1dSaw6wIi8io/VDYw+UMIyKh4MFZFDDiN2fF8JLwGkFF4
whZvIaaP8iUBdrhc6ZlOupMA2mjjq+ugAjelyeVjxc/ogaqSOQzIyah7NgW0yvYY
u7xaMsVSg6OJWluAe6lEh0U1CYpdBABgdkSjs7rHefIQ/n4du+7sq0fQUcE32dX8
jMOD3In9TqX4JXP3c6EDfMQCN1g=
=FrpI
-END PGP SIGNATURE-

Alias field names when searching (not for results)

2018-03-05 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

All,

I'd like for users to be able to search a field by multiple names
without performing a "copy-field" when analyzing a document. Is that
possible? Whenever I search for "solr alias field" I get results about
how to re-name fields in the results.

Here's what I'd like to do. Let's say I have a document:

{
  id: 1234,
  field_1: valueA,
  field_2: valueB,
  field_3: valueC
}

I'd like users to be able to find this document using any of the
following queries:

   field_1:valueA
   f1:valueA
   1:valueA

I just want the query parser to say "oh, 'f1' is an alias for
'field_1'" and substitute that when performing the search. Is that
possible?

- -chris

-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqddZMdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFgbFg/9HIgJgX4Lib2X4XYU
P2F4uW9TyDWtp6mA9xsfdYxRNe4K3yFPbkUUwJW2MI2V62SR6apB+TghOMqbmCD/
gaQ0CFWgLsn5Egulj2taUN+MAYD/4GMO9ltyXNc2g9siSMIDUS5N09fwJbxfBXrP
SPvSQqUOVD5wKCgoCXCVd+RM+SEClX4k1ZuWDbVAiO4YPpJwFy6+BN2uTCaqP3Ll
XOqn+/6ejnPCcvoQrTlE1/DiBTUti8H7V0LOjzEZns8YqZOAH+pAVxYRRQM5UzZS
pUBGpHokoaZ0tMf/aCmHp5pI5VWrxrXcS47csBRvoAn8Z7uRxH8p0wYE8BkGs2rw
dEzOSOKdhma11ZDkWKg2/sBw8v9swyWy9W3MuA0tqYzfZicsXT2GBHzyPDsqabDq
mBPWuxUdqZEaz+fE8SRsW84ELcqe1fbltscng/ZhNRkLOtmn6aeMc+XABhpcVE7o
Rfodl/PrQetgzZ4WLyzb7m2bz2w38x6WSPhuQIZHVrHNoCXG+gWY3zMxF6EBEFCV
CJvsXaQ1ZpGLjO/uCXJ9iHKxsSoUzWap9qws82xH3QJ52Q7vCoxF5G/2MZWvvgje
+MsZbh8L5D0HBM1jTKWx3X+r3FbdURu6P8yUFD/Hywy2J/jev1MiU4Zh3Yw+JByo
mR8TdvleHAHfA01tArVgk2yscqI=
=44DX
-END PGP SIGNATURE-

Re: Updating documents and commit/rollback

2018-03-05 Thread Christopher Schultz

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>> 
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
> 
> Correct.  Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
> 
> Solr doesn't have this capability.  The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
> 
> If updates are happening concurrently from multiple sources, then 
> there's no way to have any kind of meaningful rollback.
> 
> I see two solutions:
> 
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update.  Then rolling back becomes possible,
> because there is only one source for updates.  The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly.  Also, it can be a single point of
> failure.  If the rate of updates is low, then the bottleneck may
> not be a problem.
> 
> 2) Have your updating software revert the changes "manually" in 
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made
any sense. Serializing updates is probably more trouble than it's worth.

In an environment where I'd probably expect to have maybe 50 - 100
"writes" daily to a Solr core, how do you recommend commits be done?
The documents are quite small (user metadata like username, first/last
and email). Can I add/commit simultaneously? There seems to be no
reason to perform separate add/commit steps in this scenario.

- -chris
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqddMUdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFjQHBAAiZaJLBQM6t6OLYea
LsGtqCtDTCmUuJGpBq7q8/+26OkgCTK0KDOGWlqpMeMvCe8uLlN0qDTGHEm0nLCk
Ils9Yv+UOP8iiYMvodUxv5d5Y75Yt5aQ0yZ8X7vp1KOCXTZhXIjmAdtw8KaC3z4y
zYJcI3DAEYurkmJcFVwZNQ7LRck2RWRNNsRfWaZ0yGAd2AUvvCp2zV3e0i5cs7hA
xICklU+5+5Nsy90pyDalnpgwrbc0uE6ZFGSkAocSDBdvNNONbNAq+sUYsov8af0+
6qhQWOqZOT2M+Ue51Nlqy+PtECzWOsqXcpFNyM/2Rsz1cnKCzAUbDs2Hi7m5R1UX
tST10VBvFTJ4GukGVPxHysVxwTHVg1HYCEngfHKS7HqiVtwkqWMzm315toWoDRfQ
J8EMeFZ/cQx716D+DPAKudGBWZ3akyODsb9h1KB4i85pGT4rijKhY7bxddhFDnHi
gbCdnpU9/pv8G/Y2SUhW4SgEUd3X6YZZD/4cZ4ocrf8KaXBFrLe8iz1aoFYI5ldh
i3TAi28dFHqxrofBTo4f42AXm9SYsycCQ2kBj7Yegyt5Sljfr3yoOckoJnNR05mX
2qjBIJJjJT0CvnV18azerdhpkZtcVbdVYC4WZHEjf6doC3SqqLHL6Pfu5Ha4APZ8
hc0tRk3wV+Cn/XVVx691QN0X1Nw=
=0s2n
-END PGP SIGNATURE-

Updating documents and commit/rollback

2018-03-02 Thread Christopher Schultz

Hey, folks. I've been a long-time Lucene user (running a hilariously-old
1.9.1 version forever), but I'm only just now getting into using Solr.

My particular use-case is storing information about web-application
users so they can be found more quickly than our current RDBMS-based
search (SELECT ... FROM user WHERE username LIKE '%foo%' OR
email_address LIKE '%foo%' OR last_name LIKE '%foo%'...).

I've set up my Solr (very basic... just untar, bin/solr start), created
a core/collection (I'm running single-server for now, no cloudy
zookeeper stuff ATM), customized my schema (using the Schema API, since
hand-editing is discouraged) and loaded my data. I can search just fine
through the Solr dashboard.

I've also user solr-solrj to perform searches from within my
application, replacing the previous JDBC-based search with the
Solr-based one. All is well.

Now I'm trying to figure out the best way to update users in the index
when their information (e.g. first/last names) change. I have used
solr-solrj quite simply like this:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", user.getId());
doc.addField("username", user.getUsername());
doc.addField("first_name", user.getFirstName());
doc.addField("last_name", user.getLastName());
...
solrClient.add("users", doc);
solrClient.commit();

I'm having a problem, though, and I'd like to know what the "right"
solution is.

The problem is that I'm updating the index after my SQL UPDATE(s) have
run, but before my SQL COMMIT occurs. I have had a problem where the SQL
fails and rolls-back, but the solrClient is not rolled-back.

I'm a little wary of rolling-back Solr because, as I understand it, the
client itself doesn't carry any transactional information. That is, it
should be a shared-resource (within the web application) and indeed,
other clients could be connecting from other places (like other app
servers running the same application). Performing either commit() or
rollback() on the Solr client will commit/rollback *all* writes since
the last commit, right?

That means that there is no meaningful way that I can say to Solr "oops,
I actually need you to NOT add that document I just told you about".
Instead, I have to either commit the document I don't want (and, I
dunno, delete it later or whatever) or risk rolling-back other writes
that other clients have performed.

Do I have that right?

So... what's the best way to do this kind of thing? Can I ask Solr to
add-and-commit at the same time? If so, how? Is there a meaningful
"rollback this one addition" that I can perform? If so, how?

Thanks for a great product,
-chris



signature.asc
Description: OpenPGP digital signature

98 matches

Mail list logo