Re: Help using Noggit for streaming JSON data
Yonik,

Thanks for the reply, and apologies for the long delay in this reply. Also apologies for top-posting; I'm writing from my phone. :(

Oh, of course... simply subclass the CharArr.

In my case, I should be able to immediately base64-decode the value (saves 1/4 of the in-memory representation) and, if I do everything correctly, may be able to stream directly to my database. With a *very* complicated CharArr implementation, of course :)

Thanks,
-chris

> On Sep 17, 2020, at 12:22, Yonik Seeley wrote:
>
> See this method:
>
> /** Reads a JSON string into the output, decoding any escaped characters. */
> public void getString(CharArr output) throws IOException
>
> And then the idea is to create a subclass of CharArr to incrementally
> handle the string that is written to it.
> You could override the write methods, or perhaps reserve(), to flush/handle
> the buffer when it reaches a certain size.
>
> -Yonik
>
>> On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
>> ch...@christopherschultz.net> wrote:
>>
>> All,
>>
>> Is this an appropriate forum for asking questions about how to use
>> Noggit? The GitHub project doesn't have any discussions available and
>> filing an "issue" to ask a question is kinda silly. I'm happy to be
>> redirected to the right place if this isn't appropriate.
>>
>> I've been able to figure out most things in Noggit by reading the code,
>> but I have a new use-case where I expect that I'll have very large
>> values (base64-encoded binary) and I'd like to stream those rather than
>> calling parser.getString() and getting a potentially huge string coming
>> back. I'm streaming into a database so I never need the whole string in
>> one place at one time.
>>
>> I was thinking something like this:
>>
>>     JSONParser p = ...;
>>
>>     int evt = p.nextEvent();
>>     if(JSONParser.STRING == evt) {
>>       // Start streaming
>>       boolean eos = false;
>>       while(!eos) {
>>         char c = p.getChar();
>>         if(c == '"') {
>>           eos = true;
>>         } else {
>>           // append c to the stream
>>         }
>>       }
>>     }
>>
>> But getChar() is not public. The only "documentation" I've really been
>> able to find for Noggit is this post from Yonik back in 2014:
>>
>> http://yonik.com/noggit-json-parser/
>>
>> It mostly says "Noggit is great!" and specifically mentions huge, long
>> strings but does not actually show any Java code to consume the JSON
>> data in any kind of streaming way.
>>
>> The ObjectBuilder class is a great user of JSONParser, but it just
>> builds standard objects and would consume tons of memory in my case.
>>
>> I know for sure that Solr consumes huge JSON documents and I'm assuming
>> that Noggit is being used in that situation, though I have not looked at
>> the code used to do that.
>>
>> Any suggestions?
>>
>> -chris
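
The base64 part of that plan can be sketched independently of Noggit. The class below is a hypothetical helper (the name Base64ChunkDecoder and its methods are made up for illustration, not part of Noggit): it accepts base64 text in arbitrary chunks, decodes every complete 4-character group, and writes the bytes to an OutputStream. A CharArr subclass could presumably hand its buffer to accept() whenever it fills up, as Yonik suggests, but that wiring is an assumption and is not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;

/**
 * Hypothetical helper (not part of Noggit): decodes base64 text that
 * arrives in arbitrary chunks and writes the decoded bytes to a sink.
 */
class Base64ChunkDecoder {
    private final OutputStream sink;
    // Pending characters that do not yet form a complete 4-char group.
    private final StringBuilder carry = new StringBuilder();
    private final Base64.Decoder decoder = Base64.getDecoder();

    Base64ChunkDecoder(OutputStream sink) {
        this.sink = sink;
    }

    /** Accepts the next chunk of base64 characters. */
    void accept(char[] buf, int off, int len) throws IOException {
        carry.append(buf, off, len);
        // Only whole 4-char groups can be decoded safely mid-stream.
        int usable = carry.length() - (carry.length() % 4);
        if (usable > 0) {
            sink.write(decoder.decode(carry.substring(0, usable)));
            carry.delete(0, usable);
        }
    }

    /** Decodes whatever is left once the JSON string has ended. */
    void finish() throws IOException {
        if (carry.length() > 0) {
            sink.write(decoder.decode(carry.toString()));
            carry.setLength(0);
        }
    }
}
```

Only the current chunk plus at most three carried-over characters are held in memory at any time, so the full value never has to exist as one String. The sink could be any OutputStream, for example one obtained from a JDBC Blob.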
Help using Noggit for streaming JSON data
All,

Is this an appropriate forum for asking questions about how to use Noggit? The GitHub project doesn't have any discussions available and filing an "issue" to ask a question is kinda silly. I'm happy to be redirected to the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code, but I have a new use-case where I expect that I'll have very large values (base64-encoded binary) and I'd like to stream those rather than calling parser.getString() and getting a potentially huge string coming back. I'm streaming into a database so I never need the whole string in one place at one time.

I was thinking something like this:

    JSONParser p = ...;

    int evt = p.nextEvent();
    if(JSONParser.STRING == evt) {
      // Start streaming
      boolean eos = false;
      while(!eos) {
        char c = p.getChar();
        if(c == '"') {
          eos = true;
        } else {
          // append c to the stream
        }
      }
    }

But getChar() is not public. The only "documentation" I've really been able to find for Noggit is this post from Yonik back in 2014:

http://yonik.com/noggit-json-parser/

It mostly says "Noggit is great!" and specifically mentions huge, long strings but does not actually show any Java code to consume the JSON data in any kind of streaming way.

The ObjectBuilder class is a great user of JSONParser, but it just builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents and I'm assuming that Noggit is being used in that situation, though I have not looked at the code used to do that.

Any suggestions?

-chris
Re: Dynamic reload of TLS configuration
All,

Ping. Any options for no-downtime TLS reconfiguration?

-chris

On 4/23/20 11:35, Christopher Schultz wrote:
> All,
>
> Does anyone know if it is possible to reconfigure Solr's TLS
> configuration (specifically, the server key and certificate)
> without a restart?
>
> I'm looking for a zero-downtime situation with a single-server and
> an updated TLS certificate.
>
> Thanks, -chris
Re: using S3 as the Directory for Solr
Rahul,

On 4/23/20 21:49, dhurandar S wrote:
> Thank you for your reply. The reason we are looking at S3 is that
> the volume is close to 10 Petabytes. We are okay with a higher
> latency of, say, twice or thrice that of placing data on the local
> disk. But we have a requirement to keep long-range data and
> provide Search capability on that. Every other storage apart from
> S3 turned out to be very expensive at that scale.
>
> Basically I want to replace
>
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
>
> with an S3-based implementation.

Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of /index data/? You can theoretically store your source data anywhere, of course. 10 PiB sounds like a truly enormous index.

-chris

> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl wrote:
>
>> Hi,
>>
>> Is your data so partitioned that it makes sense to consider
>> splitting up in multiple collections and make some arrangement
>> that will keep only a few collections live at a time, loading
>> index files from S3 on demand?
>>
>> I cannot see how an S3 directory would be able to effectively
>> cache files in S3 and what units the index files would be stored
>> as?
>>
>> Have you investigated EFS as an alternative? That would look like
>> a normal filesystem to Solr but might be cheaper storage wise,
>> but much slower.
>>
>> Jan
>>
>>> 23. apr. 2020 kl. 06:57 skrev dhurandar S:
>>>
>>> Hi,
>>>
>>> I am looking to use S3 as the place to store indexes. Just how
>>> Solr uses HdfsDirectory to store the index and all the other
>>> documents.
>>>
>>> We want to provide a search capability that is okay to be a
>>> little slow but cheaper in terms of the cost. We have close to
>>> 2 petabytes of data on which we want to provide the Search
>>> using Solr.
>>>
>>> Are there any open-source implementations around using S3 as
>>> the Directory for Solr?
>>>
>>> Any recommendations on this approach?
>>>
>>> regards, Rahul
Dynamic reload of TLS configuration
All,

Does anyone know if it is possible to reconfigure Solr's TLS configuration (specifically, the server key and certificate) without a restart?

I'm looking for a zero-downtime situation with a single-server and an updated TLS certificate.

Thanks,
-chris
Re: Require searching only for file content and not metadata
Kushal,

On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
> This is Kushal Khare, a new addition to the user-list. I started
> working with Solr a few days ago for implementing it in my project.
>
> Now, I have the basics done, and reached the query stage.
>
> My problem is: I need to restrict Solr to search only the
> file content and not the metadata. I have gone through various
> articles on the internet, but could not get any help.
>
> Therefore, I hope I could get some solutions here.

How are you querying Solr? Are you querying from a web application? From a thick-client application? Directly from a web browser?

What do you consider "metadata" versus "content"? To Solr, everything is the same...

-chris
Can't start Solr 7.7.1 due to name-resolution issue
All,

I'm getting a failure to start my Solr instance. Here's the error from the console log:

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: [hostname]: [hostname]: Name or service not known
sun.management.AgentConfigurationError: java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: [hostname]: [hostname]: Name or service not known
    at sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(ConnectorBootstrap.java:480)
    at sun.management.Agent.startAgent(Agent.java:262)
    at sun.management.Agent.startAgent(Agent.java:452)
Caused by: java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: [hostname]: [hostname]: Name or service not known
    at javax.management.remote.JMXServiceURL.<init>(JMXServiceURL.java:289)
    at javax.management.remote.JMXServiceURL.<init>(JMXServiceURL.java:253)
    at sun.management.jmxremote.ConnectorBootstrap.exportMBeanServer(ConnectorBootstrap.java:739)
    at sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(ConnectorBootstrap.java:468)
    ... 2 more

Now, my hostname is just the first part of the hostname, so like "www" instead of "www.example.com". Running "host [hostname]" on the CLI returns "Host [hostname] not found: 3(NXDOMAIN)" so it's not entirely surprising that this name resolution is failing.

What's the best way for me to get around this?

I'm running on Debian Stretch in Amazon EC2. I've tried fixing the local name resolution so that it actually works, but when I reboot, the EC2 instance reverts my DNS settings so those changes won't survive a reboot.

Can I give the fully-qualified hostname to the JMX component in some way?

I've seen this answer[1] on SO and everyone seems to say "edit /etc/hosts" and, as I said, the EC2 startup scripts end up resetting those files during a reboot.

Any ideas?

-chris

[1] https://stackoverflow.com/questions/20093854/jmx-agent-throws-java-net-malformedurlexception-when-host-name-is-set-to-all-num
Re: Configure mutual TLS 1.2 to secure SOLR
Paul,

On 6/7/19 11:02, Paul wrote:
> Can someone please outline how to use mutual TLS 1.2 with SOLR. Or,
> point me at docs/tutorials/other where I can read up further on
> this (version currently onsite is SOLR 7.6).

Here's a copy/paste from our internal guide for how to do this. YMMV. Enjoy!

[...]

5. Configure Solr for TLS

Create a server key and certificate:

$ sudo mkdir /etc/solr
$ sudo keytool -genkey -keyalg RSA -sigalg SHA256withRSA -keysize 4096 -validity 730 \
    -alias 'solr-ssl' -keystore /etc/solr/solr.p12 -storetype PKCS12 \
    -ext san=dns:localhost,ip:192.168.10.20

Use the following information for the certificate:

First and Last name: 192.168.10.20 (or "localhost", or your IP address)
Org unit: CHADIS Solr (Prod) (or dev)
Everything else should be obvious

Now, export the server's certificate from the keystore.

$ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore /etc/solr/solr.p12 -storetype PKCS12 -alias solr-ssl

Copy that certificate and paste it into this command's stdin:

$ sudo keytool -importcert -keystore /etc/solr/solr-server.p12 -storetype PKCS12 -alias 'solr-ssl'

Now, fix the ownership and permissions on these files:

$ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
$ sudo chmod 0640 /etc/solr/solr.p12

Edit the file /etc/default/solr.in.sh

Set the following settings:

SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
SOLR_SSL_KEY_STORE_TYPE=PKCS12
SOLR_SSL_KEY_STORE_PASSWORD=whatever

# You MUST set the trust store for some reason.
SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
SOLR_SSL_TRUST_STORE_TYPE=PKCS12
SOLR_SSL_TRUST_STORE_PASSWORD=whatever

6. Configure Solr to Require Client TLS Certificates

On each client, create a client key and certificate:

$ keytool -genkey -keyalg EC -sigalg SHA256withECDSA \
    -validity 730 -alias 'solr-client-ssl' \
    -keystore /etc/solr/solr-client.p12 -storetype PKCS12

Now dump the certificate for the next step:

$ keytool -exportcert -keystore /etc/solr/solr-client.p12 -storetype PKCS12 \
    -alias 'solr-client-ssl' -rfc

Don't forget that you might want to generate your own client certificate to use from your own web browser if you want to be able to connect to the server's dashboard.

Use the output of that command on each client to put the cert(s) into this trust store on the server:

$ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
    -storetype PKCS12 -alias '[client key alias]'

Then, export the server's certificate and put IT into the trusted-clients trust store, because command-line tools will use the server's own key to contact itself.

$ keytool -exportcert -keystore /etc/solr/solr-server.p12 -storetype PKCS12 \
    -alias 'solr-ssl'
$ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
    -storetype PKCS12 -alias 'solr-server'

Now, set the proper file ownership and permissions:

$ sudo chown root:solr /etc/solr/solr-trusted-clients.p12
$ sudo chmod 0640 /etc/solr/solr-trusted-clients.p12

Edit /etc/default/solr.in.sh and add the following entries:

# NOTE: Some of these are changing from "basic TLS" configuration.
SOLR_SSL_NEED_CLIENT_AUTH=true
SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
SOLR_SSL_TRUST_STORE_TYPE=PKCS12
SOLR_SSL_TRUST_STORE_PASSWORD=whatever
SOLR_SSL_CLIENT_TRUST_STORE=/etc/solr/solr-server.p12
SOLR_SSL_CLIENT_TRUST_STORE_TYPE=PKCS12
SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=whatever
SOLR_SSL_CLIENT_KEY_STORE=/etc/solr/solr-client.p12
SOLR_SSL_CLIENT_KEY_STORE_TYPE=PKCS12
SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
-----------------------------

solr.p12
  Server keystore. Contains server key and certificate.
  Used by server to identify itself to clients.
  Should exist on Solr server.

solr-server.p12
  Client trust store. Contains server's certificate.
  Used by clients to identify and trust the server.
  Should exist on Solr clients.

solr-client.p12
  Client keystore. Contains client key and certificate.
  Used by clients to identify themselves to the server.
  Should exist on Solr clients when TLS client certs are used.

solr-trusted-clients.p12
  Server trust store. Contains trusted client certificates.
  Used by server to trust clients.
  Should exist on Solr servers when TLS client certs are used.

[...]

Loading Data into a Core (Index)
--------------------------------

If you have installed Solr as a service using TLS, you will need to do some additional work to call Solr's "post" program. First, ensure you have patched bin/post according to the installation instructions above.
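
For a Java client, the two client-side files from the summary above map onto the standard JSSE classes. The following is a minimal sketch, assuming PKCS12 stores like the ones created in the guide; the class name MutualTlsContext and its methods are hypothetical, and how the resulting SSLContext gets handed to SolrJ or HttpClient is not shown because it depends on the client library version.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: builds a client-side SSLContext for mutual TLS. */
class MutualTlsContext {

    /** Loads a PKCS12 store from disk, e.g. /etc/solr/solr-client.p12. */
    static KeyStore loadPkcs12(String path, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            store.load(in, password);
        }
        return store;
    }

    /**
     * clientKeys:     the client keystore (solr-client.p12 in the guide).
     * trustedServers: the client trust store (solr-server.p12 in the guide).
     */
    static SSLContext build(KeyStore clientKeys, char[] keyPassword,
                            KeyStore trustedServers) throws Exception {
        // Key managers present the client certificate to the server.
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKeys, keyPassword);

        // Trust managers decide whether the server's certificate is accepted.
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustedServers);

        SSLContext context = SSLContext.getInstance("TLSv1.2");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }
}
```

Keeping the key store and trust store explicit here, rather than relying on JVM-wide system properties, means other TLS connections made by the same JVM are not affected.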
Re: Using Solr as a Database?
Daniel,

On 6/3/19 16:26, Davis, Daniel (NIH/NLM) [C] wrote:
> I think the sweet spot of Cassandra and Solr should be mentioned
> in this discussion. Cassandra is more scalable/clusterable than
> an RDBMS, without losing all of the structure that is desirable in
> an RDBMS.

Amusingly enough, there is also Solandra if you don't want to choose :)

https://github.com/tjake/Solandra

It's a lot like DataStax.

> In contrast, if you use a full document store such as MongoDB, you
> lose some of the abilities to know what is in your schema.
>
> DataStax markets a platform that combines Cassandra (as a
> distributed replacement for an RDBMS) with Solr, integrated so
> that records managed in Cassandra are indexed and up-to-date.
>
> If your real problem with an RDBMS is the lack of scaling, but you
> like the ability to specify columnar structure explicitly, then
> this combination might be a good fit.
>
> Now, MongoDB is also a strong alternative to an RDBMS.
>
> The other thing to recall though is that the power of sharding has
> reached into the databases themselves, and databases such as
> PostgreSQL can operate with some tables sharded and other tables
> duplicated. See
> https://pgdash.io/blog/postgres-11-sharding.html.

Even MySQL and MariaDB -- the most bare-bones solutions in the RDBMS space -- now have clustering available to them, so it's hard to defend an RDBMS solution at this point that does NOT provide clustering, or something similar.

-chris
Re: Using Solr as a Database?
Ralph,

On 6/2/19 16:32, Ralph Soika wrote:
> The whole system is highly transactional as it runs on Java EE with
> JPA and Session EJBs.

And you write-through from your application -> RDBMS -> Lucene/Solr? How are you handling commits (both soft and hard) and re-opening the index?

> So, as far as I understand, you recommend to leave the data in the
> RDBMS?

I certainly would, even if it's just to allow a rebuild of the index from a "trusted" source.

> The problem with RDBMS is that you can not easily scale over many
> nodes with a master less cluster.

That sounds like it's a problem with your choice of RDBMS, and not of RDBMSs in general.

> This was why I thought Solr can solve this problem easily. On the
> other hand my Lucene index also did not scale over multiple nodes.

If you want a clustered document-store[1], you might want to look at a storage system designed for that purpose such as CouchDB or MongoDB. Lucene/Solr is really best used as a distillation of data stored elsewhere and not as a backing-store itself.

> Maybe Solr would be a solution to scale just the index?

That's exactly what Solr is for.

> Another solution I am working on is to store all my data in a HA
> Cassandra cluster because I do not need the SQL-Core
> functionality. But in this case I only replace the RDBMS with
> Cassandra and Lucene/Solr holds again only the index.

This seems like another plausible solution.

> So Solr can't improve my architecture, with the exception of the
> fact that the search index could be distributed across multiple
> nodes with Solr. Did I get that right?

Yes.

Hope that helps,
-chris

[1] https://en.wikipedia.org/wiki/Document-oriented_database#Implementations
Re: SolrJ, CloudSolrClient and basic authentication
Dimitris,

On 6/1/18 02:46, Dimitris Kardarakos wrote:
> Thanks a lot Shawn. I had tried with the documented approach, but
> since I use SolrClient.add to add documents to the index, I could
> not "port" the documented approach to my case (probably I do miss
> something).
>
> The custom HttpClient suggestion worked as expected!

Can you please explain how you did this?

I'm facing a problem where the simplest possible solution is giving the error "org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.".

It seems that SolrClient is using something like BasicHttpEntity, which isn't "repeatable", and that becomes a problem with HTTP Basic auth (where the server is supposed to challenge the client and the client only then sends the credentials, which means re-sending the request).

I need to either make the client data repeatable (which is in SolrClient, which I'd prefer to avoid), or make HttpClient use an "expectant" (preemptive) credential-sending technique, or just stuff things into a header manually.

What did you do to solve this problem? It seems like this should really probably come up more often than it does. Maybe nobody bothers to lock down their Solr instances?

Thanks,
-chris

> On 31/05/2018 06:16 μμ, Shawn Heisey wrote:
>> On 5/31/2018 8:03 AM, Dimitris Kardarakos wrote:
>>> Following the feedback in the "Index protected zip" thread, I
>>> am trying to add documents to the index using SolrJ API.
>>>
>>> The server is in SolrCloud mode with BasicAuthPlugin for
>>> authentication.
>>>
>>> I have not managed to figure out how to pass username/password
>>> to my client.
>>
>> There are two ways to approach this.
>>
>> One approach is to build a custom HttpClient object that uses
>> credentials by default, and then use that custom HttpClient
>> object to build your CloudSolrClient. Exactly how to correctly
>> build the HttpClient object will depend on exactly which
>> HttpClient version you've included into your program. If you go
>> with SolrJ dependency defaults, then the HttpClient version will
>> depend on the SolrJ version.
>>
>> The other approach is the method described in the documentation,
>> where credentials are added to each request object:
>>
>> https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin.html#BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ
>>
>> There are several different kinds of request objects. A few examples:
>> UpdateRequest, QueryRequest, CollectionAdminRequest.
>>
>> Thanks, Shawn
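
The "stuff things into a header manually" option comes down to computing the value of the Authorization header up front, so that no challenge and retry is needed. Here is a minimal sketch of that computation using only the JDK; the class name BasicAuthHeader is made up for illustration, and attaching the value to outgoing requests (for example through an HttpClient request interceptor) is an assumption that is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: builds the value of a preemptive Basic auth header. */
class BasicAuthHeader {

    /** Returns the string to send as the "Authorization" request header. */
    static String value(String user, String password) {
        // HTTP Basic auth is "user:password", base64-encoded.
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

Because the credentials go out with the first request, the request body is only sent once and the non-repeatable entity should no longer matter. The header is only base64-encoded, not encrypted, so this is only reasonable over TLS.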
Re: Enabling SSL on SOLR breaks my SQL Server connection
Shawn and Paul,

On 5/23/19 08:57, Shawn Heisey wrote:
> On 5/23/2019 5:45 AM, Paul wrote:
>> unable to find valid certification path to requested target
>
> This seems to be the root of your problem with the connection to
> SQL server.
>
> If I have all the context right, Java is saying it can't validate
> the certificate returned by the SQL server.
>
> This page:
>
> https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-encryption?view=sql-server-2017
>
> Talks about a "trustServerCertificate" property you can set to
> "true" in the JDBC URL that will cause Microsoft's JDBC driver to
> NOT validate the server certificate.

It would be much better to use the "trustStore" setting on the connection properties. As Shawn mentions later in this thread:

On 5/23/19 12:06, Shawn Heisey wrote:
> Enabling SSL should have no *direct* effect on JDBC.
>
> But it might have an indirect effect by changing some of Java's
> SSL settings that in turn could filter down to the JDBC driver.

You have probably been relying on the JVM's VM-wide default trust store, and now that you have changed that, your SSL connections to SQL Server no longer work.

I would argue that it is always a best-practice to configure trust stores separately for every type of connection.

So, if you follow the link above you can read about the "trustStore" connection parameter and point that config setting at a trust store which contains the SQL Server's TLS certificate -- the one that your application should trust. I think that will clear up your issue.

You may also wish to set the "trustStorePassword" and "trustStoreType" options.

-chris
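
As a rough illustration of the per-connection trust store idea, the sketch below assembles a SQL Server JDBC URL that carries its own trust store settings. The property names (databaseName, encrypt, trustServerCertificate, trustStore, trustStorePassword) are connection properties of Microsoft's JDBC driver; the class name SqlServerUrl, the host, and the file path used in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a SQL Server JDBC URL with its own trust store. */
class SqlServerUrl {

    static String withTrustStore(String host, int port, String database,
                                 String trustStorePath, String trustStorePassword) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", database);
        props.put("encrypt", "true");
        // Keep certificate validation on; the trust store below supplies the anchor.
        props.put("trustServerCertificate", "false");
        props.put("trustStore", trustStorePath);
        props.put("trustStorePassword", trustStorePassword);

        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
            .append(host).append(':').append(port);
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(';').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return url.toString();
    }
}
```

For example, withTrustStore("db.example.com", 1433, "app", "/etc/solr/sqlserver-trust.p12", "whatever") produces a URL whose TLS validation no longer depends on the JVM-wide trust store that Solr's SSL settings replace. Values containing ';' or braces would need escaping, which this sketch does not handle.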
Re: Commits and new document visibility
Shawn,

On 3/14/19 10:46, Shawn Heisey wrote:
> On 3/14/2019 8:23 AM, Christopher Schultz wrote:
>> I believe that the only thing I want to do is to set the
>> autoSoftCommit value to something "reasonable". I'll probably
>> start with maybe 15000 (15sec) to match the hard-commit setting
>> and see if we get any complaints about delays between "save" and
>> "seeing the user".
>
> In my opinion, 15 seconds is far too frequent for opening a new
> searcher. If the index reaches any real size, you may be in a
> situation where the full soft commit takes longer than 15 seconds
> to complete - mostly due to warming or autowarming. Commits that
> open a searcher can be very resource-intensive ... if they happen
> too frequently, then heavy indexing will cause your Solr instance
> to never "calm down" ... it will always be hitting the CPU and disk
> hard. I'd personally start with one minute and adjust from there
> based on how long the commits take.

Okay. Current core size is ~1M documents. I think users can live with a 1-minute delay, but I'll have to ask :)

Is the log file the best resource for information on (soft) commit-duration?

>> In our case, we don't have a huge number of documents being
>> created in a minute. Probably once per minute, if that.
>>
>> Does that seem reasonable?
>>
>> As for actually SETTING the setting, I'd prefer not to edit the
>> solrconfig.xml document. Instead, can I set this in my
>> solr.in.sh script? I see an example like this right in the file:
>>
>> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
>
> 3 seconds is even more problematic than 15.

Sorry, that was just a copy/paste directly from the default solr.in.sh script that ships with Solr. I wouldn't do a 3-second soft-commit.

> I believe that when you use "bin/solr create" to create an index
> with the default config, that it does set the autoSoftCommit to 3
> seconds. Which as I stated, I believe to be far too frequent.

Nope, it sets it to "never soft commit", unless the defaults have changed since I built this service with, I think, 7.3.0.

Is there any way to change this value at runtime, or does it require a service-restart?

-chris
Commits and new document visibility
All,

I recently had a situation where a document wasn't findable in a
fairly small Solr core/collection and I didn't see any errors in
either the application using Solr or within Solr itself. A Solr
service restart caused the document to become visible.

So I started reading.

I believe the "problem" is that the document was indexed but not
visible due to the default commit settings in Solr 7.5 -- which is
the version I happen to be running right now.

I never bothered to change anything from the defaults because, well,
I didn't know what I was doing. Now that I (a) have a problem to
solve and (b) know a little more about what is happening, I just
wanted a quick sanity-check on what I'd like to do.

[Quick background: my core/collection stores user data so that other
users can quickly find anyone in the system via text-search. This
replaced our previous RDBMS-based "SELECT ... WHERE name LIKE
'%whatever%'" implementation which of course wasn't scaling well.
Generally, users will expect that when a new user is created, they
will be findable "fairly soon" (probably immediately) afterwards.]

We are using SolrJ as a client from our application, btw.

Initially, we were doing:

  SolrInputDocument document = ...;
  SolrClient solr = ...;
  solr.add(document);
  solr.commit();

Someone told me that committing after every document-add was wasteful
and it seemed like good advice -- allow Solr's autoCommit mechanism
to handle the commits and we'll get better performance.

The problem was that no new documents are visible unless we take
additional action.

So, here's the default settings:

  autoCommit = max 15sec
  openSearcher = false
  autoSoftCommit = never[*]

This means that every 15 seconds (plus OS/disk sync time), I'll get a
safe snapshot of the data. I'm okay with losing 15 seconds worth of
data if there is some catastrophe.

It also means that my documents are pretty much never made visible.

I believe that the only thing I want to do is to set the
autoSoftCommit value to something "reasonable". I'll probably start
with maybe 15000 (15sec) to match the hard-commit setting and see if
we get any complaints about delays between "save" and "seeing the
user".

In our case, we don't have a huge number of documents being created
in a minute. Probably once per minute, if that.

Does that seem reasonable?

As for actually SETTING the setting, I'd prefer not to edit the
solrconfig.xml document. Instead, can I set this in my solr.in.sh
script? I see an example like this right in the file:

  SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"

Is that a fairly standard way to set the autoSoftCommit value for all
cores?

Thanks,
-chris

[*] This setting is documented only in a single place: in the
"near-real-time" documentation. It would be nice if that special
value was called-out in other places so it wasn't so hard to find.
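A minimal sketch of why the -D option in solr.in.sh can change the setting, under one loud assumption: that solrconfig.xml refers to the solr.autoSoftCommit.maxTime system property and falls back to -1 ("never") when it is unset. That would be consistent with the "never" default described above, but it should be checked against the actual solrconfig.xml. The class and method names are hypothetical and are not part of Solr.

```java
// Hypothetical illustration; not Solr code.
class SoftCommitSetting {
    // Property name taken from the SOLR_OPTS example in solr.in.sh.
    static final String PROPERTY = "solr.autoSoftCommit.maxTime";

    // ASSUMPTION: the config falls back to -1 when the property is unset.
    static long resolveMaxTimeMs() {
        return Long.parseLong(System.getProperty(PROPERTY, "-1"));
    }

    // A negative value is read here as "never soft-commit automatically".
    static String describe(long maxTimeMs) {
        return maxTimeMs < 0
                ? "never soft-commit automatically"
                : "soft-commit within " + maxTimeMs + " ms of an add";
    }

    public static void main(String[] args) {
        // Run with -Dsolr.autoSoftCommit.maxTime=60000 to see the other branch.
        System.out.println(describe(resolveMaxTimeMs()));
    }
}
```

Because -D options are JVM-wide, a value set this way would apply to every core whose config reads the property, which is the "for all cores" behaviour asked about above.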
Re: Get details about server-side errors
Jason,

On 2/13/19 07:39, Jason Gerlowski wrote:
> Hey Chris,
>
> Unfortunately I think you covered the main/only options above.
>
> HTTP status code isn't the most useful, but it's worth pointing
> out that there are a few things you can do with it. Some status
> codes are easy to identify and come up with a good message to
> display to your end user e.g. 403 codes. But of course it doesn't
> do anything to help you disambiguate 400 error messages you get.
>
> Error handling has always been one of SolrJ's weak spots. One
> thing people have suggested before is adding some sort of enum to
> error responses that is less ambiguous and easier to interpret
> programmatically, but it's never been picked up. A bit more
> information on SOLR-7170. Feel free to vote for it or chime in
> there if you think that'd be an improvement.

I've added some comments and a proposed fix that meets *my* needs,
but I want to make sure that it will be useful for others (and not
just my specific use-case).

Thanks,
-chris

> On Tue, Feb 12, 2019 at 5:09 PM Christopher Schultz wrote:
>> Hello, everyone.
>>
>> I'm trying to get some information about a (fairly) simple case
>> when a user is searching using a wide-open query where they can
>> type in anything they want, including field-names. Of course, it's
>> possible that they will try to enter a field-name that does not
>> exist and Solr will complain, like this:
>>
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> Error from server at http://localhost:8983/solr/users: undefined
>> field bad_field
>>
>> (This is what happens when I search my user database for
>> "bad_field:foo".)
>>
>> What is the best way to discover what happened on the server --
>> from a code perspective? I can certainly read the above as a human
>> and see what the problem is. But my users won't understand
>> (exactly) what that means and I don't always have English-language
>> users searching my user database.
>>
>> Is there a way to check for "was the error a bad field name?" and
>> "what was the bad field name (or names) detected?"
>>
>> I looked at javadoc and saw two hopefuls:
>>
>> 1. code -- unfortunately, this is the HTTP response code
>>
>> 2. metadata -- unfortunately, this just returns
>> {error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},
>> which is already obvious from the exception type.
>>
>> Is there something in SolrJ that I'm overlooking, here, or am I
>> limited to what I can parse out of the exception's "getMessage"
>> string?
>>
>> Thanks, -chris
Get details about server-side errors
Hello, everyone.

I'm trying to get some information about a (fairly) simple case when
a user is searching using a wide-open query where they can type in
anything they want, including field-names. Of course, it's possible
that they will try to enter a field-name that does not exist and Solr
will complain, like this:

  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
  Error from server at http://localhost:8983/solr/users: undefined
  field bad_field

(This is what happens when I search my user database for
"bad_field:foo".)

What is the best way to discover what happened on the server -- from
a code perspective? I can certainly read the above as a human and see
what the problem is. But my users won't understand (exactly) what
that means and I don't always have English-language users searching
my user database.

Is there a way to check for "was the error a bad field name?" and
"what was the bad field name (or names) detected?"

I looked at javadoc and saw two hopefuls:

1. code -- unfortunately, this is the HTTP response code

2. metadata -- unfortunately, this just returns
   {error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},
   which is already obvious from the exception type.

Is there something in SolrJ that I'm overlooking, here, or am I
limited to what I can parse out of the exception's "getMessage"
string?

Thanks,
-chris
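For the fallback of parsing the "getMessage" string, here is a minimal sketch. It assumes the message keeps the "undefined field <name>" wording seen in the single example above; that wording is not a documented contract and may differ between Solr versions, so this is brittle by nature. The class and method names are hypothetical and are not part of SolrJ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of SolrJ.
class UndefinedFieldError {
    // ASSUMPTION: the server message contains "undefined field <name>",
    // as in the one example quoted in this thread.
    private static final Pattern UNDEFINED_FIELD =
            Pattern.compile("undefined field\\s+(\\S+)");

    // Returns the offending field name, or empty if the message does not
    // look like an undefined-field error.
    static Optional<String> fieldName(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = UNDEFINED_FIELD.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String msg = "Error from server at http://localhost:8983/solr/users: "
                + "undefined field bad_field";
        // prints "bad_field"
        System.out.println(fieldName(msg).orElse("(not an undefined-field error)"));
    }
}
```

In an application, the string passed in would be the caught exception's getMessage() value, and the extracted name could be substituted into a localized message for the end user. Only the first bad field name is extracted.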
Re: Page faults
Erick,

On 1/7/19 11:52, Erick Erickson wrote:
> Images do not come through, so we don't see what you're seeing.
>
> That said, I'd expect page faults to happen:
>
> 1> when indexing. Besides what you'd expect (new segments written
> to disk), there's segment merging going on in the background which
> has to read segments from disk in order to merge.
>
> 2> when querying, any fields returned as part of a doc that has
> stored=true docValues=false will require a disk access to get the
> stored data.

A page fault is not necessarily a disk access. It almost always *is*,
but it's not because the application is calling fopen(). It's because
the OS is performing a memory operation which often results in a dip
into virtual memory.

Jeremy, are these page-faults occurring on all the machines in your
cluster, or only some? What is the hardware configuration of each
machine (specifically, memory)? What are your JVM settings for your
Solr instances? Is anything else running on these nodes?

It would help to understand what's happening on your servers. "I'm
seeing page faults" doesn't really help us help you.

Thanks,
-chris

> On Mon, Jan 7, 2019 at 8:35 AM Branham, Jeremy (Experis) wrote:
>>
>> Does anyone know if it is typical behavior for a SOLR cluster to
>> have lots of page faults (50-100 per second) under heavy load?
>>
>> We are performing load testing on a cluster with 8 nodes, and my
>> performance engineer has brought this information to attention.
>>
>> I don’t know enough about memory management to say it is normal
>> or not.
>>
>> The performance doesn’t appear to be suffering, but I don’t want
>> to overlook a potential hazard.
>>
>> Thanks!
>>
>> Jeremy Branham
>> jb...@allstate.com
>> Allstate Insurance Company | UCV Technology Services |
>> Information Services Group
Re: solr optimize command
Shawn,

On 11/29/18 18:53, Shawn Heisey wrote:
> On 11/29/2018 4:41 PM, Christopher Schultz wrote:
>> When mine returned (with wait=true as a request parameter), I got
>> a JSON response telling me how long it took.
>
> That's what I would expect.
>
> If you have to explicitly include parameters like "wait" or
> "waitSearcher" to make it block until the optimize is done, then in
> my mind, that's a bug. That should be the default setting. In the
> 7.5 reference guide, I only see "waitSearcher", and it says the
> default is true.

I didn't test it without that parameter. I used it because it was
suggested to me earlier this week on this list. It may in fact be
optional.

I was using Solr 7.4.

-chris
Re: solr optimize command
Shawn,

On 11/29/18 17:56, Shawn Heisey wrote:
> On 11/28/2018 6:22 PM, Wei wrote:
>> I use the following http request to start solr index
>> optimization:
>>
>> http://localhost:8983/solr//update?skipError=true -F
>> stream.body=' '
>>
>> The request returns status code 200 shortly, but when looking at
>> the solr instance I noticed that actual optimization has not
>> completed yet as there are more than 1 segments. Is the optimize
>> command async? What is the best approach to validate that
>> optimize is truly completed?
>
> I do not know how that request can return a 200 before the optimize
> job completes. The "wait" parameters (one of which Christopher
> mentioned) should all default to true, and I don't see them on your
> request. As far as I know, the operation is NOT asynchronous. Are
> you absolutely sure that it returned a 200? I'd like to see the
> actual response to verify.
>
> I hate to assume you're wrong, but I think it's probably more
> likely that your HTTP request timed out because of overly
> aggressive timeout settings, probably a socket timeout. If you
> have definitive proof that you received the 200 and a
> normal-looking response, then we'll need to look deeper. Do you
> have the entry in solr.log for the optimize request?

When mine returned (with wait=true as a request parameter), I got a
JSON response telling me how long it took.

-chris
Re: solr optimize command
Wei,

On 11/28/18 20:22, Wei wrote:
> Hi,
>
> I use the following http request to start solr index optimization:
>
> http://localhost:8983/solr//update?skipError=true -F
> stream.body=' '
>
> The request returns status code 200 shortly, but when looking at
> the solr instance I noticed that actual optimization has not
> completed yet as there are more than 1 segments. Is the optimize
> command async? What is the best approach to validate that optimize
> is truly completed?

Try this instead:

  http://localhost:8983/solr//update?optimize=true&wait=true

This will wait until the operation has completed. Note that your
client (e.g. curl) may time-out after some time, so you'll want to
adjust that timeout to make sure the client doesn't give-up before
the optimization operation has completed.

As others have said, perhaps you don't actually need to optimize
anything.

-chris
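The advice about client timeouts can be sketched with the JDK's own HTTP client (Java 11 or later). This is only an illustration under stated assumptions: the core name "users" and the 30-minute timeout are placeholders, the class name is hypothetical, and the sketch uses waitSearcher=true, which is the parameter the 7.5 reference guide lists according to the follow-up in this thread. The request is built but not sent, so no Solr server is needed to run it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper; not part of Solr or SolrJ.
class OptimizeRequest {
    // Builds a blocking optimize request with a generous per-request
    // timeout, so the client does not give up before Solr has finished.
    static HttpRequest build(String baseUrl, String core, Duration timeout) {
        URI uri = URI.create(baseUrl + "/" + core
                + "/update?optimize=true&waitSearcher=true");
        return HttpRequest.newBuilder(uri)
                .timeout(timeout)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // "users" and 30 minutes are placeholder values (assumptions).
        HttpRequest request = build("http://localhost:8983/solr", "users",
                Duration.ofMinutes(30));
        System.out.println(request.method() + " " + request.uri());
        // To send it for real:
        // HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The timeout has to cover the whole merge, so it is better chosen from a measured run on a test index than guessed.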
Re: Period on-line index optimization
Erick,

On 11/27/18 20:47, Erick Erickson wrote:
> And do note one implication of the link Shawn gave you. Now that
> you've optimized, you probably have one huge segment. It _will not_
> be merged unless and until it has < 2.5G "live" documents. So you
> may see your percentage of deleted documents get quite a bit larger
> than you've seen before merging kicks in. Solr 7.5 will rewrite
> this segment (singleton merge) over time as deletes accumulate, or
> you can optimize/forceMerge and it'll gradually shrink (assuming
> you do not merge down to 1 segment).

Ack. It sounds like I shouldn't worry too much about "optimization"
at all. If I find that I have a performance problem (hah! I'm
comparing the performance to a relational table-scan, which was
intolerably long), I can investigate whether or not optimization will
help me.

> Oh, and the admin UI segments view is misleading prior to Solr
> 7.5. Hover over each one and you'll see the number of deleted docs.
> It's _supposed_ to be proportional to the number of deleted docs,
> with light gray being live docs and dark gray being deleted, but
> the calculation was off. If you hover over you'll see the raw
> numbers and see what I mean.

Thanks for this clarification. I'm using 7.4.0, so I think that's
what was confusing me.

I'm fairly certain to upgrade to 7.5 in the next few weeks. For me,
it's basically a untar/stop/ln/start operation as long as testing
goes well.

-chris

> On Tue, Nov 27, 2018 at 2:11 PM Shawn Heisey wrote:
>>
>> On 11/27/2018 10:04 AM, Christopher Schultz wrote:
>>> So, it's pretty much like GC promotion: the number of live
>>> objects is really the only thing that matters?
>>
>> That's probably a better analogy than most anything else I could
>> come up with.
>>
>> Lucene must completely reconstruct all of the index data from
>> the documents that haven't been marked as deleted. The fastest
>> I've ever seen an optimize proceed is about 30 megabytes per
>> second, even on RAID10 disk subsystems that are capable of far
>> faster sustained transfer rates. The operation strongly impacts
>> CPU and garbage generation, in addition to the I/O impact.
>>
>>> I was thinking once per day. AFAIK, this index hasn't been
>>> optimized since it was first built which was a few months ago.
>>
>> For an index that small, I wouldn't expect a once-per-day
>> optimization to have much impact on overall operation. Even for
>> big indexes, if you can do the operation when traffic on your
>> system is very low, users might never even notice.
>>
>>> We aren't explicitly deleting anything, ever. The only deletes
>>> occurring should be when we perform an update() on a document,
>>> and Solr/Lucene automatically deletes the existing document
>>> with the same id
>>
>> If you do not use deleteByQuery, then ongoing index updates and
>> segment merging (which is what an optimize is) will not interfere
>> with each other, as long as you're using version 4.0 or later.
>> 3.6 and earlier were not able to readily mix merging with ongoing
>> indexing operations.
>>
>>> I'd want to schedule this thing with cron, so curl is better
>>> for me. "nohup optimize &" is fine with me, especially if it
>>> will give me stats on how long the optimization actually took.
>>
>> If you want to know how long it takes, it's probably better to
>> throw the whole script into the background rather than the curl
>> itself. But you're headed in the right general direction. Just
>> a few details to think about.
>>
>>> I have dev and test environments so I have plenty of places to
>>> play-around. I can even load my production index into dev to
>>> see how long the whole 1M document index will take to optimize,
>>> though the number of segments in the index will be different,
>>> unless I just straight-up copy the index files from the disk. I
>>> probably won't do that because I'd prefer not to take-down the
>>> index long enough to take a copy.
>>
>> If you're dealing with the small index, I wouldn't expect copying
>> the index data while the machine is online to be problematic --
>> the I/O load would be small. But if you're running on Windows, I
>> wouldn't be 100% sure that you could copy index data that's in
>> use -- Windows does odd things with file locking that aren't a
>> problem on most other operating systems.
>>
>>> You skipped question 4 which was "can I update my index during
>>> an optimization"
Re: Period on-line index optimization
Walter,

On 11/27/18 12:31, Walter Underwood wrote:
> Optimize is just forcing a full merge. Solr does merges
> automatically in the background.

Understood.

> It has been automatically doing merges for the months you’ve been
> using it. Let it continue. Don’t bother with optimize.

Fair enough.

> It was a huge mistake to name that function “optimize”. Ultraseek
> had a button labeled “Merge”.

I understand that "optimize" makes it sound like, without performing
that operation, the index is "not optimized", which sounds bad. I'm
not hung-up on the terminology.

In my live index, I can see a total of 20 segments. 7 of them are
"all gray" and the other 13 are at various levels of "dark grayness".
I haven't been able to find a reference for what those colors mean,
but they don't seem to be correlated with any data I can see on each
segment.

When I have run an "optimize" operation on a test index, I can see a
single segment which is shown all in "light gray", whatever that
means.

Other than wasting my time, are there any negative consequences for
periodically "optimizing" (or merging) the index?

Thanks,
-chris

>> On Nov 27, 2018, at 9:04 AM, Christopher Schultz wrote:
>>
> Shawn,
>
> On 11/27/18 11:01, Shawn Heisey wrote:
>>>> On 11/27/2018 7:47 AM, Christopher Schultz wrote:
>>>>> I've got a single-core Solr instance with something like 1M
>>>>> small documents in it. It contains user information for
>>>>> fast-lookups, and it gets updated any time relevant
>>>>> user-info changes.
>>>>>
>>>>> Here's the basic info from the Core Dashboard:
>>>>
>>>>> I'm wondering how often it makes sense to "optimize" my
>>>>> index, because there is plenty of turnover of existing
>>>>> documents. That is, plenty of existing users update their
>>>>> info and therefore the Lucene index is being updated as
>>>>> well -- causing a document-delete and document-add
>>>>> operation to occur. My understanding is that leaves a lot
>>>>> of dead space over time, and I'm assuming that it might
>>>>> even slow things down as the ratio of useful data to total
>>>>> data is reduced.
>>>>
>>>> The percentage of deleted documents here is fairly low. About
>>>> 7.6 percent. Doing an optimize with deleted percentage that
>>>> low may not be worthwhile.
>>>>
>>>> On the other hand, it *would* improve performance by a little
>>>> bit to optimize. For the index with the stats you mentioned,
>>>> you'd be going from 15 segments to one segment. And with an
>>>> index size of under 300 MB, the optimize operation would
>>>> complete pretty quickly - likely a few minutes, maybe even
>>>> less than one minute.
>
> Okay. What I really don't want to do is interrupt normal
> operation.
>
>>>>> Presumably, optimizing more often will reduce the time to
>>>>> perform a single optimization operation, yes?
>>>>
>>>> No, not really. It depends on what documents are in the
>>>> index, not so much on whether an optimization was done
>>>> previously. Subsequent optimizes will take about as long as
>>>> the previous optimize did.
>
> So, it's pretty much like GC promotion: the number of live objects
> is really the only thing that matters?
>
>>>>> Anyhow, I'd like to know a few things:
>>>>>
>>>>> 1. Is manually-triggered optimization even worth doing at
>>>>> all?
>>>>
>>>> Maybe. See how long it takes, how much impact it has on
>>>> performance while it's happening, and see if you can get an
>>>> estimate of how much extra performance you get from it once
>>>> it's done. If the impact is low and/or the benefit is high,
>>>> then by all means, optimize regularly.
>>>>
>>>>> 2. If so, how often? Or, maybe not "how often [in
>>>>> hours/days/months]" but maybe "how often [in deletes,
>>>>> etc.]"?
>>>>
>>>> For an index that size, I would say you should aim for an
>>>> interval between once an hour and once every 24 hours. Set
>>>> up this timing based on what kind of impact the optimize
>>>> operation has on performance while it's occurring. Might be
>>>> best to
Re: Period on-line index optimization
Shawn,

On 11/27/18 11:01, Shawn Heisey wrote:
> On 11/27/2018 7:47 AM, Christopher Schultz wrote:
>> I've got a single-core Solr instance with something like 1M small
>> documents in it. It contains user information for fast-lookups,
>> and it gets updated any time relevant user-info changes.
>>
>> Here's the basic info from the Core Dashboard:
>
>> I'm wondering how often it makes sense to "optimize" my index,
>> because there is plenty of turnover of existing documents. That
>> is, plenty of existing users update their info and therefore the
>> Lucene index is being updated as well -- causing a
>> document-delete and document-add operation to occur. My
>> understanding is that leaves a lot of dead space over time, and
>> I'm assuming that it might even slow things down as the ratio of
>> useful data to total data is reduced.
>
> The percentage of deleted documents here is fairly low. About 7.6
> percent. Doing an optimize with deleted percentage that low may
> not be worthwhile.
>
> On the other hand, it *would* improve performance by a little bit
> to optimize. For the index with the stats you mentioned, you'd be
> going from 15 segments to one segment. And with an index size of
> under 300 MB, the optimize operation would complete pretty quickly
> - likely a few minutes, maybe even less than one minute.

Okay. What I really don't want to do is interrupt normal operation.

>> Presumably, optimizing more often will reduce the time to
>> perform a single optimization operation, yes?
>
> No, not really. It depends on what documents are in the index,
> not so much on whether an optimization was done previously.
> Subsequent optimizes will take about as long as the previous
> optimize did.

So, it's pretty much like GC promotion: the number of live objects is
really the only thing that matters?

>> Anyhow, I'd like to know a few things:
>>
>> 1. Is manually-triggered optimization even worth doing at all?
>
> Maybe. See how long it takes, how much impact it has on
> performance while it's happening, and see if you can get an
> estimate of how much extra performance you get from it once it's
> done. If the impact is low and/or the benefit is high, then by
> all means, optimize regularly.
>
>> 2. If so, how often? Or, maybe not "how often [in
>> hours/days/months]" but maybe "how often [in deletes, etc.]"?
>
> For an index that size, I would say you should aim for an interval
> between once an hour and once every 24 hours. Set up this timing
> based on what kind of impact the optimize operation has on
> performance while it's occurring. Might be best to do it once a
> day at a low activity time, perhaps 03:00. With indexes slightly
> bigger than that, I was doing an optimize once an hour. And for
> the bigger indexes, once a day.

I was thinking once per day. AFAIK, this index hasn't been optimized
since it was first built which was a few months ago.

>> 3. During the optimization operation, can clients still issue
>> (read) queries? If so, will they wait until the optimization
>> operation has completed?
>
> Yes. And as long as you don't use deleteByQuery, you can even
> update the index while it's optimizing. The deleteByQuery
> operation will cause problems, especially when the index gets
> large. With your small index size, you might not even notice the
> problems that mixing optimize and deleteByQuery will cause.
> Replacing deleteByQuery with a standard query to retrieve ID
> values and then doing a deleteById will get rid of the problems
> that DBQ causes with optimize.

We aren't explicitly deleting anything, ever. The only deletes
occurring should be when we perform an update() on a document, and
Solr/Lucene automatically deletes the existing document with the same
id.

>> 5. Is it possible to abort an optimization operation if it's
>> taking too long, and simply discard the new data -- basically,
>> fall-back to the previously-existing index data?
>
> I am not aware of a way to abort an optimize. I suppose there
> might be one ... but in general it doesn't sound like a good idea
> to me, even if it's possible.
>
>> 6. What's a good way to trigger an optimization operation? I
>> didn't see anything directly in the web UI, but there is an
>> "optimize" method in the Solr/J client. If I can fire-off a
>> fire-and-forget "optimize" request via e.g. curl or similar tool
>> rather than writing a Java client, that would be slightly more
>> convenient for me.
>
> Removal of the optimize button fr
Periodic on-line index optimization
All,

I've got a single-core Solr instance with something like 1M small documents in it. It contains user information for fast-lookups, and it gets updated any time relevant user-info changes.

Here's the basic info from the Core Dashboard:

Last Modified: less than a minute ago
Num Docs: 1011023
Max Doc: 1095364
Heap Memory Usage: -1
Deleted Docs: 84341
Version: 2582476
Segment Count: 15
Current: Ø

Replication (Master)
                    Version        Gen     Size
Master (Searching)  1543329227929  491727  277.23 MB

Each document add/update operation has an immediate explicit "commit" operation, which may be unnecessary, but it's there in case it makes any difference for this question.

I'm wondering how often it makes sense to "optimize" my index, because there is plenty of turnover of existing documents. That is, plenty of existing users update their info and therefore the Lucene index is being updated as well -- causing a document-delete and document-add operation to occur. My understanding is that leaves a lot of dead space over time, and I'm assuming that it might even slow things down as the ratio of useful data to total data is reduced.

Presumably, optimizing more often will reduce the time to perform a single optimization operation, yes?

Anyhow, I'd like to know a few things:

1. Is manually-triggered optimization even worth doing at all?

2. If so, how often? Or, maybe not "how often [in hours/days/months]" but maybe "how often [in deletes, etc.]"?

3. During the optimization operation, can clients still issue (read) queries? If so, will they wait until the optimization operation has completed?

4. During the optimization operation, can clients still issue writes? If so, will they wait until the optimization operation has completed?

5. Is it possible to abort an optimization operation if it's taking too long, and simply discard the new data -- basically, fall-back to the previously-existing index data?

6. What's a good way to trigger an optimization operation? I didn't see anything directly in the web UI, but there is an "optimize" method in the Solr/J client. If I can fire-off a fire-and-forget "optimize" request via e.g. curl or similar tool rather than writing a Java client, that would be slightly more convenient for me.

Thanks,
-chris
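For readers following along, here is a minimal sketch of the arithmetic behind the "dead space" question and of the kind of URL a curl-style optimize request would target. It assumes the standard update handler accepts an optimize=true request parameter; the base URL (http://localhost:8983/solr), the core name ("users"), and the class and method names are illustrative assumptions, not taken from the thread. With the dashboard numbers above, the deleted documents work out to about 7.7% of Max Doc.

```java
import java.net.URI;

// Sketch only: the host, port, and core name used in main() are hypothetical.
final class OptimizeSketch {

    /** Fraction of maxDoc made up of deleted-but-not-yet-merged-away documents. */
    static double deletedRatio(long numDocs, long maxDoc) {
        if (maxDoc <= 0) {
            return 0.0;
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    /** URL that a tool such as curl could request to ask the core to optimize. */
    static URI optimizeUri(String baseUrl, String core) {
        return URI.create(baseUrl + "/" + core + "/update?optimize=true");
    }

    public static void main(String[] args) {
        long numDocs = 1_011_023L; // "Num Docs" from the Core Dashboard
        long maxDoc = 1_095_364L;  // "Max Doc" from the Core Dashboard

        System.out.println("Deleted docs: " + (maxDoc - numDocs));
        System.out.println("Deleted ratio: " + deletedRatio(numDocs, maxDoc));
        System.out.println("Optimize URL: " + optimizeUri("http://localhost:8983/solr", "users"));
    }
}
```

Tracking the ratio rather than a fixed schedule is one way to express "how often" in deletes instead of hours; what threshold is worth acting on is a judgment call that depends on the measured cost of the optimize itself.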
Re: Solr JVM Memory settings
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hendrik, On 10/12/18 02:36, Hendrik Haddorp wrote: > Those constraints can be easily set if you are using Docker. The > problem is however that at least up to Oracle Java 8, and I believe > quite a bit further, the JVM is not at all aware about those > limits. That's why when running Solr in Docker you really need to > make sure that you set the memory limits lower. I usually set the > heap and metaspace size. How you set them depends again a bit on > your Solr configuration. I prefer the JVM to crash due to memory > limits rather then the Linux OOM Killer killing the JVM as the > OutOfMemoryError from the JVM does at least state what memory was > out. Limiting the native memory used by attempting to limit the heap is not actually limiting the native memory used. It's just an attempt to do so. If you limit the native memory using OS limits (or, using Docker, simply make it look like there is less system memory) then you haven't actually achieved anything. You could have done that simply by lowering heap values and avoided the complexity of Docker, etc. - -chris > On 11.10.2018 16:45, Christopher Schultz wrote: Shawn, > > On 10/11/18 12:54 AM, Shawn Heisey wrote: >>>> On 10/10/2018 10:08 PM, Sourav Moitra wrote: >>>>> We have a Solr server with 8gb of memory. We are using solr >>>>> in cloud mode, solr version is 7.5, Java version is Oracle >>>>> Java 9 and settings for Xmx and Xms value is 2g but we are >>>>> observing that the RAM getting used to 98% when doing >>>>> indexing. >>>>> >>>>> How can I ensure that SolrCloud doesn't use more than N GB >>>>> of memory ? >>>> Where precisely are you seeing the 98% usage? It is >>>> completely normal for a modern operating system to report >>>> that almost all the system memory is in use, at least after >>>> the system has been shuffling a lot of data. 
All modern >>>> operating systems will use memory that has not been >>>> specifically allocated to programs for disk caching purposes, >>>> and system information tools will generally indicate that >>>> this memory is in use, even though it can be instantly >>>> claimed by any program that requests it. >>>> >>>> https://en.wikipedia.org/wiki/Page_cache >>>> >>>> If you tell a Java program that it is limited to a 2GB heap, >>>> then that program will never use more than 2GB, plus a little >>>> extra for the java runtime itself. I cannot give you an >>>> exact figure for that little bit extra. But every bit of >>>> data on disk that Solr accesses will end up (at least >>>> temporarily) in the operating system's disk cache -- using >>>> that unallocated memory. >>>> >>>> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM > To be fair, the JVM can use *much more* memory than you have > specified for your Java heap. It's just that the Java heap itself > wont exceed those values. > > The JVM uses quite a bit of native memory which isn't counted in > the Java heap. There is only one way I know of to control that, and > it's to set a process-limit at the OS level on the amount of > memory allowed. I'm not sure how sensitive to those limits the JVM > actually is, so attempting to artificially constrain the JVM might > end up with a native OOM crash. 
> > -chris > -BEGIN PGP SIGNATURE- Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlvA4O0ACgkQHPApP6U8 pFiCHg/+P+/yoSrvMd2uMyDK16nMCOIdxAL1gdS++DqS+qPmch1BHJTA9nuHybF4 j6WElpCI7Q3HP/sgsGE8kHE6Kg+DFJNz7mGJqgXjnSkm90LzETRFMqa959fTgBo6 SILD4n4LnZI844VoaKb2gIVibr804hloxX5UDe0XYFp3EtcVi4QMC5Q2ovn8+RoJ S/LJx/VQi3AqtcCaEYAAKpYrKxO3OkoIKnN+oC55ag/16zh9StT2TUI03bBslcxn PkS5zdsSmsS7NydSR4Gn4C7wAGyL3hGoU6pD+GhvYE9EF29KxHXFSIe2FJQ6mdRf ikZvm17U8OFNwqlB4OOLziGvOkcmIgtqchnhUm80Qwtn0ZMbql2zwlIhOSPWbuPL lq3F09p1QBqPjbxJdrcmpoSFH8jvmIPdrPOl3BbPEmDzNdnF03sEGP5gDyJ9/INB AD/QhqvQEKUtMBPX+1/9dxOm+JyUDlARZQ7p4k1BeFjl2BI8imLUK/c6JlWJ757G QWk+0Ff3R02va+ITWNvGs5C1uOnu2g58eqAggREPWXmXAj9nqJ5EyPkNAaGJBheo NasGNSXVnjN+hk4QlMTAJ3C5u0Q5lW3HCOXj8Mufo7LE8M96OjRkM09o87NG9sGT EdX7V8Ypw758Jt9xcms6U9tC2TqekJ9AYu+VLsoGa4OZgy5hfDk= =Sq+f -END PGP SIGNATURE-
Re: Solr JVM Memory settings
Shawn,

On 10/11/18 12:54 AM, Shawn Heisey wrote:
> On 10/10/2018 10:08 PM, Sourav Moitra wrote:
>> We have a Solr server with 8gb of memory. We are using solr in cloud mode, solr version is 7.5, Java version is Oracle Java 9 and settings for Xmx and Xms value is 2g but we are observing that the RAM getting used to 98% when doing indexing.
>>
>> How can I ensure that SolrCloud doesn't use more than N GB of memory ?
>
> Where precisely are you seeing the 98% usage? It is completely normal for a modern operating system to report that almost all the system memory is in use, at least after the system has been shuffling a lot of data. All modern operating systems will use memory that has not been specifically allocated to programs for disk caching purposes, and system information tools will generally indicate that this memory is in use, even though it can be instantly claimed by any program that requests it.
>
> https://en.wikipedia.org/wiki/Page_cache
>
> If you tell a Java program that it is limited to a 2GB heap, then that program will never use more than 2GB, plus a little extra for the java runtime itself. I cannot give you an exact figure for that little bit extra. But every bit of data on disk that Solr accesses will end up (at least temporarily) in the operating system's disk cache -- using that unallocated memory.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

To be fair, the JVM can use *much more* memory than you have specified for your Java heap. It's just that the Java heap itself won't exceed those values.

The JVM uses quite a bit of native memory which isn't counted in the Java heap. There is only one way I know of to control that, and it's to set a process-limit at the OS level on the amount of memory allowed. I'm not sure how sensitive to those limits the JVM actually is, so attempting to artificially constrain the JVM might end up with a native OOM crash.

-chris
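To make the heap versus non-heap distinction concrete, here is a small sketch that asks the running JVM for both figures through the standard java.lang.management API. The class name and output format are made up for illustration. Note the assumption baked in: the "non-heap" number reported by the MXBean covers JVM-managed areas such as metaspace and the code cache, so it is only a lower bound on native memory; thread stacks, direct buffers, and other native allocations are not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch only: prints the heap and non-heap usage of the JVM it runs in.
final class JvmMemoryReport {

    private static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Formats one MemoryUsage snapshot; getMax() is -1 when no limit is defined. */
    static String describe(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : toMb(usage.getMax()) + " MB";
        return label + ": used=" + toMb(usage.getUsed()) + " MB, committed="
                + toMb(usage.getCommitted()) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // The heap is what -Xms/-Xmx control.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));

        // Non-heap memory is outside the -Xmx limit.
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Comparing these figures with the resident size the operating system reports for the process gives a rough idea of how much memory sits outside the heap.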
Re: Auto recovery of a failed Solr Cloud Node?
Shawn,

On 9/27/18 10:00, Shawn Heisey wrote:
> On 9/27/2018 7:24 AM, Kimber, Mike wrote:
>> I'm trying to determine if there is any health check available to determine the above and then if the issue happens then an automated mechanism in SolrCloud to restart the instance. Or is this something we have to code ourselves?
>
> As shipped by the project, Solr will never restart itself automatically. If it dies, it's dead until you start it again, unless you implement something to restart it automatically. This is intentional -- Solr almost never dies unless there's some kind of problem -- not enough memory, corrupt software, etc. If Solr *does* die, you need to figure out why and fix it, not rely on an automatic restart.

I thought someone recently mentioned (but I cannot find a reference, sorry) that Solr would automatically restart if an OutOfMemoryError was encountered.

Is that only for single-node Solr (i.e. non-cloud/ZK)?

-chris
Re: Java version 11 for solr 7.5?
Jeff,

On 9/26/18 11:35, Jeff Courtade wrote:
> My concern with using g1 is solely based on finding this. Does anyone have any information on this?
>
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
>
> "Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test suite fails with the G1 garbage collector on a regular basis, including bugs that cause index corruption. There is no person on this planet that seems to understand such bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so don't count on the situation changing soon. This information is not out of date, and don't think that the next oracle java release will fix the situation."

That language is 3 years old and likely just hasn't been updated after it was no longer relevant. Also, it isn't attributed to anyone in particular (it's anonymous), so ... maybe it was one person's opinion and not a project-initiated warning.

-chris

> On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood wrote:
>
>> We’ve been running G1 in prod for at least 18 months. Our biggest cluster is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our 4.10.4 master/slave cluster.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Sep 26, 2018, at 7:37 AM, Jeff Courtade wrote:
>>>
>>> Thanks for that... I am just starting to look at this I was unaware of the license debacle.
>>>
>>> Automated testing up to 10 is great.
>>>
>>> I am still curious about the GC1 being supported now...
>>>
>>> On Wed, Sep 26, 2018 at 10:25 AM Zisis T. wrote:
>>>
>>>> Jeff Courtade wrote
>>>>> Can we use GC1 garbage collection yet or do we still need to use CMS?
>>>>
>>>> I believe you should be safe to go with G1. We've applied it in a Solr 6.6 cluster with 10 shards, 3 replicas per shard and an index of about 500GB (1,5T counting all replicas) and it works extremely well (throughput > 99%). The use-case includes complex search queries and faceting. There is also this post you can use as a starting point
>>>>
>>>> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
>>>>
>>>> --
>>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>>
>>> --
>>>
>>> Jeff Courtade
>>> M: 240.507.6116
>>
>> --
>
> Jeff Courtade
> M: 240.507.6116
Re: [SolrJ Client] Error calling add: connection is still allocated
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 All, On 9/18/18 11:10, Christopher Schultz wrote: > All, > > Our single-instance Solr server is just getting its first taste of > production load, and I'm seeing this periodically: > > java.lang.IllegalStateException: Connection is still allocated > > The stack trace shows it's coming from HTTP Client as called from > within Solr. > > We are using SolrJ 7.2.1 and Solr (server) 7.4.0. > > Our code looks something like this: > > private HashMap CLIENT_REGISTRY = new > HashMap(); > > synchronized HttpSolrClient getSolrClient(String url) throws > ServiceException, SolrServerException, IOException, > GeneralSecurityException { HttpSolrClient solrClient = > CLIENT_REGISTRY.get(url); > > if(null == solrClient) { log.info("Creating new HttpSolrClient > connected to " + url); > > solrClient = new HttpSolrClient.Builder(url) > .withHttpClient(getHttpClient()) .build(); > > solrClient.ping(); > > CLIENT_REGISTRY.put(url, solrClient); } > > return solrClient; } > > > [here's the code that uses the above] > > SolrClient solr = getSolrRegistry().getSolrClient(url); > > SolrInputDocument doc = new SolrInputDocument(); > > // Add stuff to the document > > solr.add(doc); solr.commit(); > > That's it. > > Other than not really needing the "commit" at the end, is there > anything wrong with how we are using SolrJ client? Are instances > of SolrJClient not thread-safe? My assumption was that they were > threadsafe and that HTTP Client would manage the connection pool > under the covers. 
> > Here is the full stack trace: > > com.chadis.api.business.RegistrationProcessor- Error processing > registration request java.lang.IllegalStateException: Connection is > still allocated at > org.apache.http.util.Asserts.check(Asserts.java:34) at > org.apache.http.impl.conn.BasicHttpClientConnectionManager.getConnecti on > > (BasicHttpClientConnectionManager.java:251) > at > org.apache.http.impl.conn.BasicHttpClientConnectionManager$1.get(Basic Ht > > tpClientConnectionManager.java:202) > at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.j av > > a:191) > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java: 18 > > 5) > at > org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java: 11 > > 1) > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpC li > > ent.java:185) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC li > > ent.java:83) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC li > > ent.java:56) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSol rC > > lient.java:542) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClien t. > > java:255) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClien t. > > java:244) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) > > at > org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173) > at > org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138) > at > org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152) at > [my code, calling SolrClient.add()] > > Any ideas? > > Thanks, -chris > For those interested, it looks like I was naïvely using BasicHttpClientConnectionManager, which is totally inappropriate in a multi-user threaded environment. 
I switched to PoolingHttpClientConnectionManager and that seems to be working much better now. :) -chris
Re: Command Line Indexer
Dan,

On 9/18/18 2:51 PM, Dan Brown wrote:
> I've been working on this for a while and it's finally in a state where it's ready for public consumption.
>
> This is a command line indexer that will index CSV or JSON documents: https://github.com/likethecolor/solr-indexer
>
> There are quite a few parameters/options that can be set.
>
> One thing to note is that it will update individual fields. That is, unlike the Data Import Handler, it does not replace entire documents.
>
> Please check it out and let me know what you think.

How is this different from the bin/post tool that ships with Solr? Or is that what you meant when you said "this is unlike the Data Import Handler"?

AIUI, Solr doesn't support updating a single field in a document. The document is replaced no matter how hard you try to be surgical about updating a single field.

-chris
Re: [OT] 20180917-Need Apache SOLR support
Walter,

On 9/18/18 11:24, Walter Underwood wrote:
> It isn’t very clear from that page, but the two backup methods make a copy of the indexes in a commit-aware way. That is all. One method copies them to a new server, the other to files in the data directory.
>
> Database backups generally have a separate backup format which is independent of the database version. For example, mysqldump generates a backup as SQL statements.
>
> The Solr backup is version-locked, because it is just a copy of the index files. People who are used to database backups might be very surprised when they could not load a Solr backup into a server with a different version or on a different architecture.
>
> The only version-independent restore in Solr is to reload the data from the source repository.

Thanks for the explanation.

We recently re-built from source and it took about 10 minutes. If we can get better performance for a restore starting with a "backup" (which is likely), we'll probably go ahead and do that, with the understanding that the ultimate fallback is reload-from-source.

When upgrading to a new version of Solr, what are the rules for when you have to discard your whole index and reload from source? We have been in the 7.x line since we began development and testing and have not had any reason to reload from source so far. (Well, except when we had to make schema changes.)

Thanks,
-chris

>> On Sep 18, 2018, at 8:15 AM, Christopher Schultz wrote:
>>
>> Walter,
>>
>> On 9/17/18 11:39, Walter Underwood wrote:
>>> Do not use Solr as a database. It was never designed to be a database. It is missing a lot of features that are normal in databases.
>>>
>>> [...] * no real backups (Solr backup is a cold server, not a dump/load)
>>
>> I'm just curious... if Solr has "no real backups", why is there a complete client API for performing backups and restores?
>>
>> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html
>>
>> Thanks,
>> -chris
Re: [OT] 20180917-Need Apache SOLR support
Walter,

On 9/17/18 11:39, Walter Underwood wrote:
> Do not use Solr as a database. It was never designed to be a database. It is missing a lot of features that are normal in databases.
>
> [...] * no real backups (Solr backup is a cold server, not a dump/load)

I'm just curious... if Solr has "no real backups", why is there a complete client API for performing backups and restores?

https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html

Thanks,
-chris
[SolrJ Client] Error calling add: connection is still allocated
All,

Our single-instance Solr server is just getting its first taste of production load, and I'm seeing this periodically:

java.lang.IllegalStateException: Connection is still allocated

The stack trace shows it's coming from HTTP Client as called from within Solr.

We are using SolrJ 7.2.1 and Solr (server) 7.4.0.

Our code looks something like this:

    private HashMap<String, HttpSolrClient> CLIENT_REGISTRY = new HashMap<String, HttpSolrClient>();

    synchronized HttpSolrClient getSolrClient(String url)
        throws ServiceException, SolrServerException, IOException, GeneralSecurityException
    {
        HttpSolrClient solrClient = CLIENT_REGISTRY.get(url);

        if(null == solrClient) {
            log.info("Creating new HttpSolrClient connected to " + url);

            solrClient = new HttpSolrClient.Builder(url)
                .withHttpClient(getHttpClient())
                .build();

            solrClient.ping();

            CLIENT_REGISTRY.put(url, solrClient);
        }

        return solrClient;
    }

[here's the code that uses the above]

    SolrClient solr = getSolrRegistry().getSolrClient(url);

    SolrInputDocument doc = new SolrInputDocument();

    // Add stuff to the document

    solr.add(doc);
    solr.commit();

That's it.

Other than not really needing the "commit" at the end, is there anything wrong with how we are using SolrJ client? Are instances of SolrJClient not thread-safe? My assumption was that they were threadsafe and that HTTP Client would manage the connection pool under the covers.

Here is the full stack trace:

com.chadis.api.business.RegistrationProcessor- Error processing registration request
java.lang.IllegalStateException: Connection is still allocated
    at org.apache.http.util.Asserts.check(Asserts.java:34)
    at org.apache.http.impl.conn.BasicHttpClientConnectionManager.getConnection(BasicHttpClientConnectionManager.java:251)
    at org.apache.http.impl.conn.BasicHttpClientConnectionManager$1.get(BasicHttpClientConnectionManager.java:202)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
    at [my code, calling SolrClient.add()]

Any ideas?

Thanks,
-chris
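As a side note on the registry pattern in this message, here is a hedged sketch of the same "one shared client per URL" idea using ConcurrentHashMap.computeIfAbsent instead of a synchronized method around a plain HashMap. This is not the fix for the exception (the follow-up in this thread traces that to the HTTP connection manager); it only illustrates the caching step. ClientRegistry and its factory parameter are hypothetical names, and a stand-in object takes the place of HttpSolrClient so the sketch has no dependencies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch only: C stands in for whatever client type is being cached per URL.
final class ClientRegistry<C> {

    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    ClientRegistry(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached client for the URL, creating it at most once per URL. */
    C get(String url) {
        return clients.computeIfAbsent(url, factory);
    }

    int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        // A String stands in for a real client object in this demo.
        ClientRegistry<String> registry = new ClientRegistry<>(url -> "client for " + url);

        String first = registry.get("http://localhost:8983/solr/users");
        String second = registry.get("http://localhost:8983/solr/users");

        System.out.println(first == second); // same cached instance both times
        System.out.println(registry.size());
    }
}
```

One design caveat: the function passed to computeIfAbsent cannot throw checked exceptions, so a factory that pings the server on creation, as the original method does, would need to wrap or defer that call.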
Re: Solr standalone health checks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Shawn, On 9/17/18 17:21, Shawn Heisey wrote: > On 9/17/2018 3:01 PM, Christopher Schultz wrote: >> The basic questions I'd like to have answered on a regular basis >> are: >> >> 1. Is the JVM up (this can be done with a ping, of course) 2. Is >> the heap healthy? Any OOMEs? 3. Will a sample query return in a >> reasonable amount of time? >> >> 1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2 >> is trickier. I can do this via JMX, but I'd prefer to avoid >> spinning-up a whole JVM just to probe Solr for one or two >> values. > > If your Solr version is at least 5.5.1 and you're NOT on Windows, > number 2 can also be verified by a ping request. Interesting. I did mention 7.4.0 but not my OS. I'm on Debian Linux, and I'm running Solr using the Solr-supplied init.d scripts (via solr install). > With a new enough version on the correct operating system, Solr is > started with an option that will kill the process should an > OutOfMemoryError occur. When that happens, it won't be able to > answer a ping request. > > Here's the issue that fixes a problem with the startup on 5.5.1 or > later: > > https://issues.apache.org/jira/browse/SOLR-8145 Given that, I'll go ahead and set things up to do a simple /solr/[c]/ping request for health-monitoring. 
Thanks, - -chris -BEGIN PGP SIGNATURE- Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlugIGMACgkQHPApP6U8 pFi8YQ//dwJdJG1VtqUgbFI437HzUMhuI+9SBOf0nateQFqQbfoqkLhC/z3dwjvj qqhqcT68D2x1bYYk/5we7KD9I6PZ50mL5sZlU34NYC9AFMB5QEdTtWljlqGM/Xoe elvsKYJVmZn9kvc6iwqyLU71clcRX27NhEDAFrPrCmhgZKRTpNqtgYyEOsIJZ/CL muMml4hV5eNIc+VOle+jcqwTrWY4xtaf6Fmo6NLCsUvC2CB5/QI7JoYzvnLvVVMD IVn6AnsLd/wIVSJiPyVYDA58/pVj1w6Jb36L8eg0fxfoO+eAkObUU3s71QglZlIx m9Qkd8lGQ7qNxUDOMSgPNW/j7tZcxn39FRsM9b3z7kWJGriBcz/S5jX9QSNcArmh pyHIf48y8wOgl/wQsmsGgXsHtdlwJu+84B3sFGjUKQU/2JPO88XJEo+pKluaMFDO E2yZGdTvfRbXLTqe/XCGN89yKyIOKJAX2ZXP9EU0PmFSFbeod6oqbT/MKO3+DzCm PpkUV10vlmqnsJ+5edj89hmM5gJOKcwQTDZ2E/U5tvs4DJHZTG578hnZp1coDU/c m7M80m5SyE/5ycYBODp6oyJNAkEf6suJ+BIyQkr61t9/L7yvwSm80nFheFpVMIMX N/lRL9ar4U/lLDL00aVhDecyNSFOvDjSUBlIlQ4hUb80bZiz3xY= =lOp1 -END PGP SIGNATURE-
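A ping-based check along these lines could be sketched as follows, using the JDK's built-in HTTP client (Java 11 or later). The base URL, core name, two-second budget, and the SolrPingCheck class are assumptions for illustration; the path follows the /solr/[collection]/admin/ping form mentioned earlier in the thread. The check treats anything other than an HTTP 200 within the time budget as unhealthy, which also covers the case where the process has been killed and cannot answer at all.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch only: host, port, core name, and time budget are hypothetical.
final class SolrPingCheck {

    /** Builds the ping handler URL for a core. */
    static URI pingUri(String baseUrl, String core) {
        return URI.create(baseUrl + "/" + core + "/admin/ping");
    }

    /** Healthy means: HTTP 200, answered within the allowed time. */
    static boolean isHealthy(int statusCode, long elapsedMillis, long maxMillis) {
        return statusCode == 200 && elapsedMillis <= maxMillis;
    }

    /** Sends one ping; maxMillis must be positive. Any I/O failure counts as unhealthy. */
    static boolean check(String baseUrl, String core, long maxMillis) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(maxMillis))
                .build();
        HttpRequest request = HttpRequest.newBuilder(pingUri(baseUrl, core))
                .timeout(Duration.ofMillis(maxMillis))
                .GET()
                .build();

        long start = System.nanoTime();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            return isHealthy(response.statusCode(), elapsedMillis, maxMillis);
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures all land here.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean healthy = check("http://localhost:8983/solr", "users", 2000L);
        System.out.println(healthy ? "OK" : "UNHEALTHY");
    }
}
```

This does launch a JVM per probe if run standalone, so it fits best inside an existing Java monitoring process; for a cron-style probe, the same request can be issued with curl.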
Solr standalone health checks
All,

I can see three possibilities for monitoring a Solr (7.4.0) deployment:

1. bin/solr healthcheck
2. curl /solr/[collection]/admin/ping
3. JMX

Option #1 isn't available unless ZK is in use, and I'm not using ZK in my case. Option #2 issues a very simple query and essentially returns a "service is up" response. Option #3 requires a JVM to be launched in order to check to see if things are working well.

I have read about the Prometheus/Grafana reporting, but that includes much more information about the performance of Solr than I'm currently interested in.

The basic questions I'd like to have answered on a regular basis are:

1. Is the JVM up (this can be done with a ping, of course)
2. Is the heap healthy? Any OOMEs?
3. Will a sample query return in a reasonable amount of time?

1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2 is trickier. I can do this via JMX, but I'd prefer to avoid spinning-up a whole JVM just to probe Solr for one or two values.

Are there any other options for monitoring Solr that I am missing?

-chris
Re: How secure is Zookeeper digest auth?
Jan,

On 9/16/18 16:22, Jan Høydahl wrote:
> We plan to enable (digest) authentication and ACL with Zookeeper to improve security.

Can you be more explicit? There is HTTP DIGEST auth and then there are "digested" (hashed) passwords for the user-database. The former is secure on the wire and the other one is wire-agnostic.

> However, we have not been able to answer the question of how secure such a setup will be, given that ZK 3.4.x TCP communication is unencrypted.
>
> So, do anyone know if ZK sends the password in cleartext over the network, so that anyone who can sniff the network can also pick up the password, and connect and read/write nodes in ZK?
>
> We'll of course add all the firewall and IP filtering we can. Do you have any other tricks you use to increase ZK security?

I'm not using ZK (yet) so this may be supremely ignorant since I don't know what protocol it uses to communicate:

I would recommend using mutual-TLS authentication everywhere. I have just deployed such a system (single-node, no cluster/ZK) and all of the communication for both admin and querying is over client-authenticated TLS. Even if an attacker gets onto the box where Solr is running, they cannot attack it without also breaking filesystem privileges or exploiting the users who have access to the Solr client key stores.

(I just did a little Googling and it looks like only ZK 3.5+ has TLS available. At any rate, that should be your target for the future if you really want a secure environment.)

-chris
Re: solr, multiple ports
David,

On 9/12/18 12:21 PM, David Hastings wrote:
>> On Sep 12, 2018, at 12:15 PM, Christopher Schultz <ch...@christopherschultz.net> wrote:
>>
>> David,
>>
>> On 9/12/18 11:03 AM, David Hastings wrote:
>>> is there a way to start the default solr installation on more than one port? Only thing I could find was adding another connector to Jetty, via
>>> https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-listen-to-multiple-ports
>>> however the default solr start command takes the -p parameter, can this start listening on multiple ports?
>>
>> What's your use-case?
>
> Use case is we are upgrading our servers, and have been running solr 5 and 7 side by side on the same machines to make sure we got 7 to reflect the results of our current install. However to finally make the switch, it would require changing many many scripts and servers that have already been modified to use both servers

Can you configure your servers to redirect port X -> port Y? This is trivial using iptables, but you didn't mention your environment. What OS, etc. are you using?

-chris
Re: solr, multiple ports
David,

On 9/12/18 11:03 AM, David Hastings wrote:
> is there a way to start the default solr installation on more than one port? Only thing I could find was adding another connector to Jetty, via
> https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-listen-to-multiple-ports
>
> however the default solr start command takes the -p parameter, can this start listening on multiple ports?

What's your use-case?

-chris
Re: Error while creating a new solr core
Shalvak,

On 9/11/18 01:51, Shalvak Mittal (UST, ) wrote:
> I have recently installed solr 7.2.1 in my ubuntu 16.04 system. While creating a new core, the solr logging shows an error saying
>
> " Caused by: org.apache.solr.common.SolrException: fips module was not loaded."
>
> I have downloaded the necessary jar files like cryptoj.jar and copied them in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/ but the error still persists.
>
> I have also updated the java.security file with security.provider.x=com.rsa.jsafe.provider.JsafeJCE

Does JsafeJCE provide a FIPS-compliant JSSE back-end? If so, it looks like it's not configured properly. Does Solr work as expected when you are using the built-in JSSE (Sun) provider?

> Can you please suggest a solution to the FIPS module problem. Are there any files I am missing while creating the solr core?

You'll have to talk to your security module vendor about fixing this issue... it's got nothing to do with Solr.

-chris
Re: “solr.data.dir” can only config a single directory
Shawn,

On 8/27/18 22:37, Shawn Heisey wrote:
> On 8/27/2018 8:29 PM, zhenyuan wei wrote:
>> I found the "solr.data.dir" can only config a single directory. I think it is necessary to be config multi dirs, such as "solr.data.dir:/mnt/disk1,/mnt/disk2,/mnt/disk3", due to one disk overload or capacity limitation. Any reason to support why not do so?
>
> Nobody has written the code to support it. It would very likely not be easy code to write. Supporting one directory for that setting is pretty easy ... it would require changing a LOT of existing code to support more than one.

Also, there are better ways to do this:

- multi-node Solr with sharding
- LVM or similar with multi-disk volumes
- ZFS surely has something for this
- buy a bigger disk (disk is cheap!)
- etc.

-chris
Re: Data Import from Command Line
Adam,

On 8/20/18 1:45 PM, Adam Blank wrote:
> I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way to import the index from the command line instead of using the admin console? I don't have the ability to use a HTTP client such as cURL to connect to the console.

I'm not sure when it was added, but there is a program called "post" which comes with later versions of Solr that can be used to load data into an index.

-chris
Re: Searching by dates
Shawn,

On 8/16/18 10:37 AM, Shawn Heisey wrote:
> On 8/16/2018 7:48 AM, Christopher Schultz wrote:
>> I haven't actually tried this, yet, but from the docs I'm guessing that I can't search for a DOB using e.g. 2018-08-16 but instead I need to search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.
>>
>> No user is ever going to do that.
>
> If you use the field class called DateRangeField, instead of the trie or point classes, you can get what you're after.
>
> It allows both searching and indexing dates as vague as "2018".
>
> https://lucene.apache.org/solr/guide/7_4/working-with-dates.html

Hmm. I could have sworn the documentation I read in the past (maybe as long as 3-4 months ago) indicated that date+timestamp was necessary. Maybe that was just for the index, while the searches can be partial.

As long as users don't have to enter timestamps to search, I think all is well in terms of index/search for me.

As for i18n, is there a way to have the query analyzer convert strings like "mm/dd/yyyy" into "yyyy-mm-dd"?

I'm sure we can take the query (before handing-off to Solr), look for anything that looks like a date and convert it into ISO-8601 for searching, but if Solr already provides a facility to do that, I'd rather not complicate my code in order to get it working.

> For an existing index, you will have to change the schema and completely reindex.

That's okay. The index doesn't actually exist, yet :)

This is all just planning.

Thanks,
-chris
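The client-side fallback mentioned above (rewriting anything that looks like a date into ISO-8601 before the query reaches Solr) might be sketched as follows. This is a hypothetical helper, not a Solr facility: the class name `DateQueryNormalizer` is made up, and it assumes US-style month/day/year input with a four-digit year. Tokens that look like dates but are not valid calendar dates are left untouched.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateQueryNormalizer {

    // Assumption: users type month/day/year with a four-digit year, e.g. 08/16/2018.
    private static final Pattern US_DATE =
            Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{4})\\b");

    /** Rewrites every valid mm/dd/yyyy token in the query to yyyy-MM-dd. */
    static String normalize(String query) {
        Matcher m = US_DATE.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            try {
                LocalDate date = LocalDate.of(
                        Integer.parseInt(m.group(3)),   // year
                        Integer.parseInt(m.group(1)),   // month
                        Integer.parseInt(m.group(2)));  // day
                replacement = date.toString();          // ISO-8601, yyyy-MM-dd
            } catch (DateTimeException e) {
                replacement = m.group();                // not a real date: keep as typed
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("chris schultz 08/16/2018"));
    }
}
```

Other locales (day/month/year, two-digit years) would need their own patterns, which is one reason to prefer a built-in facility if one exists.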
Searching by dates
All,

My understanding is that Solr (really Lucene) only handles temporal data using full timestamps (date+time, always UTC). I have a use-case where I'd like to store and search for people by their birth dates, so the timestamp information is not relevant for me.

I haven't actually tried this, yet, but from the docs I'm guessing that I can't search for a DOB using e.g. 2018-08-16 but instead I need to search using 2018-08-16T00:00:00 plus maybe "Z" at the end for the TZ.

No user is ever going to do that.

I can also offer a separate form-field for "enter your DOB search here" and then correctly-format it for Solr/Lucene, but then users can't conveniently search for e.g. "chris schultz 2018-08-16" and have the DOB match anything useful.

Is there any standard way of handling dates, or any ideas people have come up with that kind of work for this use-case?

I could always convert dates to unparsed strings (so I don't get separate tokens like 2018, 08, and 16 in the document), but then I won't be able to do range queries against the index.

I would definitely want to be able to search for "chris [born in] august 2018" and find any matches.

Any ideas?

Thanks,
-chris
Re: [OT] Lucene/Solr bug list caused by JVM's implementations
Erick,

On 8/15/18 12:56 PM, Erick Erickson wrote:
> Also note that the OpenJDK devs regularly get to test very early (unreleased) Java versions, which flushes out a lot of issues long before a general release of Java

We (dev@tomcat) get emails from Oracle about pre-release versions of Java releases as well. I'm sure you guys could get on that list so solr-dev@lucene can get notifications of pre-release versions to test to make sure Solr is good-to-go on each forthcoming version.

-chris

> On Wed, Aug 15, 2018 at 5:25 AM, Shawn Heisey wrote:
>> On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:
>>>
>>> I am looking for Lucene/Solr's bug list caused by JVM's implementations. And I found the following, but it seems not to be updated.
>>> https://wiki.apache.org/lucene-java/JavaBugs
>>>
>>> Where can I check the latest one?
>>
>> That is the only such list that I'm aware of. There are not very many JVM bugs that affect Solr, and most of them have either been fixed or have a workaround. I don't know the state of the IBM bugs ... except to say we strongly recommend that you don't run IBM Java.
>>
>> Best course of action: Run the latest release of whatever Java version you have chosen, and only use Oracle or OpenJDK. For Java 8, the current Oracle release is 8u181. At this time, I wouldn't use Java 10 except in a development environment. It's still early days for that -- newest Oracle version is 10.0.2.
>>
>> If you use the latest Oracle/OpenJDK release of Java 8, Solr ought to work quite well.
>>
>> Thanks, Shawn
Re: Add Wildcard Certificate to Java Keystore
Kelly,

On 8/13/18 12:37 PM, Kelly Rusk wrote:
> All I have is the .p12 and password so it has already gone through the CSR process. How do I import this file into the keystore?

Java's keytool won't merge keystores. You'll have to export the certificates from the PKCS12 file you got from your CA and import each of them separately into your own keystore.

> On the Windows side, does it need to reside in the Personal Store or Trusted Root Store?

Umm... is this for a server certificate? If so, you definitely don't want to import any of those certificates into any system-wide or user-wide certificate trust stores.

Is this certificate signed by a real CA, or are you building your own, internal, private CA who is signing these certificates?

-chris

> -Original Message-
> From: Christopher Schultz
> Sent: Monday, August 13, 2018 12:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Add Wildcard Certificate to Java Keystore
>
> Kelly,
>
> On 8/13/18 11:55 AM, Kelly Rusk wrote:
>> I have imported a Wildcard Certificate to my Java Keystore and it displays, but when I pull up Internet Explorer and browse to my Solr site, it fails to load and presents TLS errors.
>
> What do you mean "it displays"?
>
> How did you import your signed certificate into your keystore? What was in the keystore before you performed the import?
>
>> Has anyone run into this, what commands do you run to import a Public CA into Solr?
>
> Generally, you want to generate a key+cert/CSR and send the CSR to a CA. The CA signs it and returns it, typically with one or more intermediate certificates to build a chain of trust between the CA's root cert (present in browser trust stores) and your server's certificate (which was signed by a subordinate certificate, not directly by the CA's root cert).
>
> Import them into your keystore in this order:
>
> 1. Highest (closest to the root) CA cert
> 2. [any other intermediate certs from the CA, in order]
> 3. Your server's cert
>
> Most server software needs a bounce to reload the keystore.
>
> -chris
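One programmatic way to move entries from a PKCS12 file into another keystore is the standard `java.security.KeyStore` API. The sketch below is only an illustration of that idea, not a procedure taken from the thread: the class name `KeystoreCopy` and the choice to reuse each source alias in the destination are assumptions. Key entries are copied together with their certificate chain so the chain of trust described above stays attached to the server key.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.Collections;

final class KeystoreCopy {

    /**
     * Copies every key and certificate entry from src into dst, keeping aliases.
     * Both keystores must already be loaded; returns the number of entries copied.
     */
    static int copyEntries(KeyStore src, char[] srcKeyPassword,
                           KeyStore dst, char[] dstKeyPassword)
            throws GeneralSecurityException {
        int copied = 0;
        for (String alias : Collections.list(src.aliases())) {
            if (src.isKeyEntry(alias)) {
                Key key = src.getKey(alias, srcKeyPassword);
                // The chain runs from the server cert up toward the CA root.
                Certificate[] chain = src.getCertificateChain(alias);
                dst.setKeyEntry(alias, key, dstKeyPassword, chain);
                copied++;
            } else if (src.isCertificateEntry(alias)) {
                dst.setCertificateEntry(alias, src.getCertificate(alias));
                copied++;
            }
        }
        return copied;
    }
}
```

A caller would typically load the source with `KeyStore.getInstance("PKCS12")` and `load(inputStream, password)`, run `copyEntries`, and then write the destination out with `store(outputStream, password)`; the server still needs a restart to pick up the new keystore.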
Re: Add Wildcard Certificate to Java Keystore
Kelly,

On 8/13/18 11:55 AM, Kelly Rusk wrote:
> I have imported a Wildcard Certificate to my Java Keystore and it displays, but when I pull up Internet Explorer and browse to my Solr site, it fails to load and presents TLS errors.

What do you mean "it displays"?

How did you import your signed certificate into your keystore? What was in the keystore before you performed the import?

> Has anyone run into this, what commands do you run to import a Public CA into Solr?

Generally, you want to generate a key+cert/CSR and send the CSR to a CA. The CA signs it and returns it, typically with one or more intermediate certificates to build a chain of trust between the CA's root cert (present in browser trust stores) and your server's certificate (which was signed by a subordinate certificate, not directly by the CA's root cert).

Import them into your keystore in this order:

1. Highest (closest to the root) CA cert
2. [any other intermediate certs from the CA, in order]
3. Your server's cert

Most server software needs a bounce to reload the keystore.

-chris
Re: 4 days and no solution - please help on Solr
Ravion,

What's wrong with "update request"? Updating a document that does not exist... will add it.

-chris

On 8/10/18 3:01 PM, ☼ R Nair wrote:
> Do you feel that this is only partially complete?
>
> Best, Ravion
>
> On Fri, Aug 10, 2018, 1:37 PM ☼ R Nair wrote:
>
>> I saw this. Please provide for add. My issue is with add. There is no "AddRequest". So how to do that, thanks
>>
>> Best Ravion
>>
>> On Fri, Aug 10, 2018, 12:58 PM Jason Gerlowski wrote:
>>
>>> The "setBasicAuthCredentials" method works on all SolrRequest implementations. There's a corresponding SolrRequest object for most common Solr APIs. As you mentioned, I used QueryRequest above, but the same approach works for any SolrRequest object.
>>>
>>> The specific one for indexing is "UpdateRequest". Here's a short example below:
>>>
>>> final List<SolrInputDocument> docsToIndex = new ArrayList<>();
>>> ...Prepare your docs for indexing
>>> final UpdateRequest update = new UpdateRequest();
>>> update.add(docsToIndex);
>>> update.setBasicAuthCredentials("solr", "solrRocks");
>>> update.process(client, "techproducts");
>>>
>>> On Fri, Aug 10, 2018 at 12:47 PM ☼ R Nair wrote:
>>>> Hi Jason,
>>>>
>>>> Thanks for replying.
>>>>
>>>> I am adding a document, not querying. I am using 7.3 apis. Adding a document is done via solrclient.add(). How to set authentication in this case? Seems I can't use SolrRequest.
>>>>
>>>> Thx, bye
>>>> RAVION
>>>>
>>>> On Fri, Aug 10, 2018, 10:46 AM Jason Gerlowski wrote:
>>>>> I'd tried to type my previous SolrJ example snippet from memory. That didn't work out so great. I've corrected it below:
>>>>>
>>>>> final List<String> zkUrls = new ArrayList<>();
>>>>> zkUrls.add("localhost:9983");
>>>>> final SolrClient client = new CloudSolrClient.Builder(zkUrls, Optional.empty()).build();
>>>>>
>>>>> final Map<String, String> queryParamMap = new HashMap<String, String>();
>>>>> queryParamMap.put("q", "*:*");
>>>>> final QueryRequest query = new QueryRequest(new MapSolrParams(queryParamMap));
>>>>> query.setBasicAuthCredentials("solr", "solrRocks");
>>>>>
>>>>> query.process(client, "techproducts"); // or, client.request(query)
>>>>>
>>>>> On Fri, Aug 10, 2018 at 10:12 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>>>>>>
>>>>>> I would also recommend removing the username/password from your Solr base URL. You might be able to get things working that way, but it's definitely less common, and it wouldn't surprise me if some parts of SolrJ mishandle a URL in that format. Though that's just a hunch on my part.
>>>>>>
>>>>>> On Fri, Aug 10, 2018 at 10:09 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Ravion,
>>>>>>>
>>>>>>> (Note: I'm not sure what Solr version you're using. My answer below assumes Solr 7 APIs. These APIs don't change often, but you might find them under slightly different names in your version of Solr.)
>>>>>>>
>>>>>>> SolrJ provides 2 ways (that I know of) to provide basic auth credentials.
>>>>>>>
>>>>>>> The first (and IMO simplest) way is to use the setBasicAuthCredentials method on each individual SolrRequest. You can see what this looks like in the example below:
>>>>>>>
>>>>>>> final SolrClient client = new CloudSolrClient.Builder(solrURLs).withHttpClient(myHttpClient).build();
>>>>>>> client.setDefaultCollection("collection1");
>>>>>>> SolrQuery req = new SolrQuery("*:*");
>>>>>>> req.setBasicAuthCredentials("yourUsername", "yourPassword");
>>>>>>> client.query(req);
>>>>>>>
>>>>>>> SolrJ also has a PreemptiveBasicAuthClientBuilderFactory, which reads the username/password from Java system properties, and is used to configure the HttpClient that SolrJ creates internally for sending requests. I find this second method a little more complex, and it looks like you're providing your own HttpClient anyways, so for both those reasons I'd recommend sticking with the first approach (at least while you're getting things up and running).
>>>>>>>
>>>>>>> Hope that helps.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jason
>>>>>>>
>>>>>>> On Thu, Aug 9, 2018 at 5:47 PM ☼ R Nair <ravishankar.n...@gmail.com> wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I have tried my best to do it - searched all Google. But I am unsuccessful. Kindly help.
>>>>>>>>
>>>>>>>> We have a solo environment. Its secured with userid and password.
>>>>>>>>
>>>>>>>> I used CloudSolrClient.Builder(solrURLs).withHttpClient(mycloseablehttpclient) method to access it. The url is of the form http:/userid:password@/ passionbytes.com/solr. I set defaultCollectionName later.
>>>>>>>> In mycloseablehttpclient, I set Basic Authentication with CredentialProvider and gave url, port, userid and password.
>>>>>>>> I have changed HTTPCLIENT to 4.4.1 version,
Re: Schema Change for Solr 7.4
Joe,

On 8/3/18 11:44 AM, Joe Lerner wrote:
> OK--yes, I can see how that would work. But it would require some quick infrastructure flexibility that, at least to this point, we don't really have.

The only thing that needs swapping is the URL that your application uses to connect to Solr, so you don't need anything terribly complicated to proxy it.

Something like Squid would work, and you'd only have a few seconds of downtime to set it up initially, and then another few seconds to swap later.

Heck, you can even remove the proxy after you are all done. It doesn't have to be a permanent fixture in your infrastructure.

-chris
Re: Schema Change for Solr 7.4
Joe,

On 8/3/18 11:09 AM, Joe Lerner wrote:
> We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3 zookeepers. We need to make a schema change. What I want to do is simply push the updated schema to Solr, and then re-index all the content to pick up the change. But I am being told that I need to:
>
> 1. Delete the collection that depends on this config-set.
> 2. Reload the config-set
> 3. Recreate the dependent collection
>
> It seems to me that between steps #1 and #3, users will not be able to search, which is not cool.
>
> Can I avoid the outage to my search capability?

I dunno about how to do any online-updates like this, but you could always instead:

0. place a proxy between your application and Solr
1. stand-up a new service
2. load the config-set
3. create the collection
4. load all the data from source
5. swap the service at the proxy to the newly-created service

-chris
Re: Search for a specific unicode char
To whom it may concern,

On 7/31/18 2:56 PM, tedsolr wrote:
> I'm having some trouble with non printable, but valid, UTF8 chars when exporting to Amazon Redshift. The export fails but I can't yet find this data in my Solr collection. How can I search, say from the admin console, for a particular character? I'm looking for U+001E and U+001F

Try copy/pasting from e.g. https://www.fileformat.info/info/unicode/char/001e/browsertest.htm

Or url-decode this string (%1e) here: https://meyerweb.com/eric/tools/dencoder/ and paste it into your search box.

Do you have the source-data for the index? Maybe it's easier to locate the character in the source-data than in the index.

-chris
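If the source data is available, scanning it directly for the two separator characters may be the quickest route. The following is a minimal sketch of that idea; the class `ControlCharFinder` and its method names are made up for illustration. It also shows the same URL-decoding step as the online tool, producing the raw U+001E character from `%1e`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ControlCharFinder {

    /** Returns the offsets of every U+001E or U+001F character in the text. */
    static List<Integer> findSeparators(String text) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 0x1E || c == 0x1F) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Turns a percent-encoded string such as "%1e" into the raw character(s). */
    static String decode(String percentEncoded) {
        return URLDecoder.decode(percentEncoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "field-a" + decode("%1e") + "field-b";
        System.out.println("separator offsets: " + findSeparators(sample));
    }
}
```

Running `findSeparators` over each source record gives the exact offsets, which can then be mapped back to document ids before searching the index.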
Re: Solr Server crashes when requesting a result with too large resultRows
Georg,

On 7/31/18 12:33 PM, Georg Fette wrote:
> Yes it is only one of the processors that is at maximum capacity.

Ok.

> How do I do something like a thread-dump of a single thread ?

Here's how to get a thread dump of the whole JVM:

https://wiki.apache.org/tomcat/HowTo#How_do_I_obtain_a_thread_dump_of_my_running_webapp_.3F

The "nid" field of each thread is usually the same as the process-id from a "top" or "ps" listing, except it's often shown in hex instead of decimal.

Have a look at this for some guidance:

http://javadrama.blogspot.com/2012/02/why-is-java-eating-my-cpu.html

Some tools dump the thread id in hex, others in decimal. It's frustrating sometimes.

> We run the Solr from the command line out-of-the-box and not in a code development environment. Are there parameters that can be configured so that the server creates dumps ?

You don't want this to happen automatically. Instead, you'll want to trigger a dump manually for debugging purposes.

-chris

> On 31.07.2018 at 15:07, Christopher Schultz wrote:
>> Georg,
>>
>> On 7/31/18 4:39 AM, Georg Fette wrote:
>>> We run the server version 7.3.1. on a machine with 32GB RAM in a mode having -10g.
>>>
>>> When requesting a query with
>>>
>>> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)=string_field_type:catalog_entry=2147483647
>>>
>>> the server takes all available memory up to 10GB and is then no longer accessible with one processor at 100%.
>>
>> Is it a single thread which takes the CPU or more than one? Can you identify that thread and take a thread dump to get a backtrace for that thread?
>>
>>> When we reduce the rows parameter to 1000 the query works. The query returns only 581 results.
>>>
>>> The documentation at https://wiki.apache.org/solr/CommonQueryParameters states that as the "rows" parameter a "ridiculously large value" may be used, but this could pose a problem. The number we used was Int.max from Java.
>>
>> Interesting. I wonder if Solr attempts to pre-allocate a result buffer. Requesting 2147483647 rows can have an adverse effect on most pre-allocated data structures.
>>
>> -chris
Re: Solr Server crashes when requesting a result with too large resultRows
Georg,

On 7/31/18 4:39 AM, Georg Fette wrote:
> We run the server version 7.3.1 on a machine with 32GB RAM in a
> mode having -10g.
>
> When requesting a query with
>
> q={!boost b=sv_int_catalog_count_document}string_catalog_aliases:(*2*)=string_field_type:catalog_entry=2147483647
>
> the server takes all available memory up to 10GB and is then no
> longer accessible with one processor at 100%.

Is it a single thread which takes the CPU or more than one? Can you
identify that thread and take a thread dump to get a backtrace for that
thread?

> When we reduce the rows parameter to 1000 the query works. The
> query returns only 581 results.
>
> The documentation at
> https://wiki.apache.org/solr/CommonQueryParameters states that as
> the "rows" parameter a "ridiculously large value" may be used, but
> this could pose a problem. The number we used was Int.max from
> Java.

Interesting. I wonder if Solr attempts to pre-allocate a result buffer.
Requesting 2147483647 rows can have an adverse effect on most
pre-allocated data structures.

-chris
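To put a rough number on the pre-allocation concern: the sketch below estimates how much memory a single array with one object reference per requested row would need. Whether Solr actually allocates such a structure is not confirmed in this thread; the reference sizes (4 bytes with compressed references, 8 bytes without) are assumptions about a typical 64-bit JVM, and the class name is invented for the example.

```java
public class RowsPreallocation {
    /** Bytes needed for an array holding one object reference per requested row. */
    public static long referenceArrayBytes(long rows, int bytesPerReference) {
        return rows * bytesPerReference;
    }

    public static void main(String[] args) {
        long rows = Integer.MAX_VALUE; // 2147483647, the rows value from the request
        long compressed = referenceArrayBytes(rows, 4); // assumed 4-byte compressed references
        long plain = referenceArrayBytes(rows, 8);      // assumed 8-byte references
        double gib = 1024.0 * 1024 * 1024;
        System.out.printf("4-byte references: %.1f GiB%n", compressed / gib);
        System.out.printf("8-byte references: %.1f GiB%n", plain / gib);
    }
}
```

Under those assumptions the array alone would be on the order of 8 to 16 GiB, which is in the same range as the 10GB heap mentioned in the report, even though the query only matches 581 documents.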
Re: Upgrading SOLR (not clustered)
Phil,

On 7/25/18 4:38 PM, Staley, Phil R - DCF wrote:
> Christopher,
>
> Testing an upgrade from version 7.2.1 to 7.4.0 on SUSE Linux 12
>
> From the /etc/init.d/solr file:
>
> SOLR_INSTALL_DIR="/opt/solr"
>
> From the /etc/default/solr.in.sh file (and these are my data
> and indexing core locations):
>
> SOLR_PID_DIR="/var/solr"
> SOLR_HOME="/var/solr/data"
> LOG4J_PROPS="/var/solr/log4j.properties"
> SOLR_LOGS_DIR="/var/solr/logs"
> SOLR_PORT="8983"

I would expect your process to work. Did it?

-chris

> -----Original Message-----
> From: Christopher Schultz
> Sent: Wednesday, July 25, 2018 3:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrading SOLR (not clustered)
>
> Phil,
>
> On 7/25/18 12:38 PM, Staley, Phil R - DCF wrote:
>> What are the steps for upgrading a non-clustered SOLR version?
>> Here's what I thought should work:
>>
>> 1. Open a bash window and ssh login to desired server with
>>    your Linux admin credentials
>> 2. Change directories: cd /opt
>> 3. Download the latest Linux/OSX version direct to server:
>>    sudo wget http://apache.claz.org/lucene/solr/x.x.x/solr-x.x.x.tgz
>>    (replace x.x.x with the latest version number)
>>    a. Additional download mirror servers are available @
>>       http://www.apache.org/dyn/closer.lua/lucene/solr/7.3.1 if the
>>       http://apache.claz.org site is slow.
>> 4. Login as root user: sudo -i and enter your admin password
>> 5. Unzip the .tgz file: tar zxf solr-x.x.x.tgz
>> 6. Change directories: cd /
>> 7. Stop SOLR service: service solr stop
>> 8. Confirm that SOLR is stopped: service solr status
>> 9. Change directories to your user home directory:
>>    cd /home/myadminlogonid
>> 10. Create new solr symbolic link in your user home folder that
>>     points to new SOLR version: ln -s /opt/solr-x.x.x solr
>> 11. Move/replace current symbolic link: mv solr /opt
>
> What version are you going from/to?
>
> What OS is this?
>
> Do you have an /etc/init.d/solr file? If so, where does
> SOLR_INSTALL_DIR point?
>
> Do you have an /etc/default/solr.in.sh file? If it points to all of
> your data-locations, then you should be okay.
>
> -chris
Re: Upgrading SOLR (not clustered)
Phil,

On 7/25/18 12:38 PM, Staley, Phil R - DCF wrote:
> What are the steps for upgrading a non-clustered SOLR version?
> Here's what I thought should work:
>
> 1. Open a bash window and ssh login to desired server with
>    your Linux admin credentials
> 2. Change directories: cd /opt
> 3. Download the latest Linux/OSX version direct to server:
>    sudo wget http://apache.claz.org/lucene/solr/x.x.x/solr-x.x.x.tgz
>    (replace x.x.x with the latest version number)
>    a. Additional download mirror servers are available @
>       http://www.apache.org/dyn/closer.lua/lucene/solr/7.3.1 if the
>       http://apache.claz.org site is slow.
> 4. Login as root user: sudo -i and enter your admin password
> 5. Unzip the .tgz file: tar zxf solr-x.x.x.tgz
> 6. Change directories: cd /
> 7. Stop SOLR service: service solr stop
> 8. Confirm that SOLR is stopped: service solr status
> 9. Change directories to your user home directory:
>    cd /home/myadminlogonid
> 10. Create new solr symbolic link in your user home folder that
>     points to new SOLR version: ln -s /opt/solr-x.x.x solr
> 11. Move/replace current symbolic link: mv solr /opt

What version are you going from/to?

What OS is this?

Do you have an /etc/init.d/solr file? If so, where does
SOLR_INSTALL_DIR point?

Do you have an /etc/default/solr.in.sh file? If it points to all of
your data-locations, then you should be okay.

-chris
Re: Possible to define a field so that substring-search is always used?
Chris,

On 7/24/18 4:46 PM, Chris Hostetter wrote:
>
> : We are using Solr as a user index, and users have email addresses.
> :
> : Our old search behavior used a SQL substring match for any search
> : terms entered, and so users are used to being able to search for e.g.
> : "chr" and finding my email address ("ch...@christopherschultz.net").
> :
> : By default, Solr doesn't perform substring matches, and it might be
> : difficult to re-train users to use *chr* to find email addresses by
> : substring.
>
> In the past, were you really doing arbitrary substring matching, or
> just prefix matching? ie would a search for "sto" match
> "ch...@christopherschultz.net"

Yes. Searching for "sto" would result in a SQL query with a
" WHERE ... LIKE '%sto%'" clause. So it was slow as hell, of course.

> Personally, if you know you have an email field, I would suggest
> using a custom tokenizer that splits on "@" and "." (and maybe
> other punctuation characters like "-") and then take your raw user
> input and feed it to the prefix parser (instead of requiring your
> users to add the "*")...
>
> q={!prefix f=email v=$user_input}&user_input=chr
>
> ...which would match ch...@gmail.com, f...@chris.com, f...@bar.chr
> etc.
>
> (this wouldn't help you though if you *really* want arbitrary
> substring matching -- as Erick suggested, ngrams is pretty much your
> best bet for something like that)
>
> Bear in mind, you can combine that "forced prefix" query against
> the (tokenized) email field with other queries that could parse
> your input in other ways...
>
> user_input=...
> q=({!prefix f=email v=$user_input} OR {!dismax qf="first_name last_name" ..etc.. v=$user_input})
>
> so if your user input is "chris" you'll get term matches on the
> first_name field, or the last_name field as well as prefix matches
> on the email field.

The problem is that our users (admins) sometimes need to locate users
by their email address, and people often forget the exact spelling. So
they'll call and say "I can't get in" and we have to search for "chris
schultz" and then "chris" and then it turns out that their email
address was actually sexylove...@yahoo.com, so they often have to try a
bunch of searches before finding the right user record.

Having to search for "sexylover42", a complete-match word, isn't going
to work for their use-case. They need to be able to search for "lover"
and have it work.

I think n-grams sounds like the only way to get this done. I'll have to
play around with it a little bit to see how it behaves.

Thanks,
-chris
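To make the n-gram idea concrete, here is a minimal sketch in plain Java of why indexing n-grams lets "lover" match "sexylover42": every substring within a length range becomes its own indexed term, so a mid-word fragment is found by an ordinary term lookup. This only illustrates the principle; it is not Solr's implementation, and the class name and the gram sizes (3 to 5) are arbitrary choices for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NGramDemo {
    /** Returns every substring of text whose length is between minGram and maxGram. */
    public static Set<String> ngrams(String text, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Terms that would be indexed for the local part of the address.
        Set<String> indexed = ngrams("sexylover42", 3, 5);
        System.out.println("lover indexed: " + indexed.contains("lover"));
        System.out.println("chris indexed: " + indexed.contains("chris"));
        System.out.println("number of terms: " + indexed.size());
    }
}
```

The trade-off is index size: each value produces many terms, and a search fragment longer than the largest gram is not present as a single term, so the gram range has to be chosen with the expected queries in mind.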
Re: Alias field names when searching (not for results)
Chris,

On 7/24/18 1:40 PM, Chris Hostetter wrote:
>
> : So if I want to alias the "first_name" field to "first" and the
> : "last_name" field to "last", then I would ... do what, exactly?
>
> See the last example here...
>
> https://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html#examples-of-edismax-queries
>
> defType=edismax
> q=sysadmin name:Mike
> qf=title text last_name first_name

Aside: I'm curious about the use of "qf", here. Since I didn't want my
users to have to specify any particular field to search, I created an
"all" field and dumped everything into it. It seems like it would be
better to change that so that I don't have an "all" field at all and
instead I mention all of the fields I would normally have packed into
the "all" field in the "qf" parameter.

That would reduce my index size and also help with another question I
had today (subject: Possible to define a field so that substring-search
is always used?).

Does that sound like a better approach than packing-together an "all"
field during indexing?

> f.name.qf=last_name first_name
>
> the "f.name.qf" has created an "alias" so that when the "q"
> contains "name:Mike" it searches for "Mike" in both the last_name
> and first_name fields. if it were "f.name.qf=last_name
> first_name^2" then there would be a boost on matches in the
> first_name field.
>
> For your usecase you want something like...
>
> defType=edismax
> q=sysadmin first:Mike last:Smith
> qf=title text last_name first_name
> f.first.qf=first_name
> f.last.qf=last_name
>
> : I'm using SolrJ as the client.
>
> ...the examples above all show the request params, so "f.last.qf"
> is a param name, "last_name" is the corresponding param value.

Awesome. I didn't realize that "f.alias.qf" was the name of the actual
parameter to send. I was staring at the Solr Dashboard's selection of
edismax parameters and not seeing anything that seemed correct. That's
because it's a new parameter!

Makes sense, now.

Thanks a bunch,
-chris
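For the SolrJ side of this, a minimal sketch of the request parameters described in this message, built as a plain map so that it runs without any Solr dependency. The parameter names and values come from the example in the thread; the class name is invented, and handing the map to SolrJ (for instance by wrapping it in a MapSolrParams) is an assumption about how the client code would use it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EdismaxAliasParams {
    /** Builds edismax request parameters that alias "first" and "last" to the real field names. */
    public static Map<String, String> build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        params.put("qf", "title text last_name first_name");
        // Each alias is its own parameter: the name is f.<alias>.qf, the value is the target field(s).
        params.put("f.first.qf", "first_name");
        params.put("f.last.qf", "last_name");
        return params;
    }

    public static void main(String[] args) {
        build("sysadmin first:Mike last:Smith")
            .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The point of the sketch is that "f.first.qf" is a parameter name in its own right, not a string packed into the value of some other parameter.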
Re: Alias field names when searching (not for results)
Emir,

On 3/6/18 2:42 AM, Emir Arnautović wrote:
> I did not try it, but the first thing that came to my mind is to
> use edismax’s ability to define field aliases, something like
> f.f1.fq=field_1. Note that it is not recommended to have field
> name starting with number so not sure if it will work with “1”.

So if I want to alias the "first_name" field to "first" and the
"last_name" field to "last", then I would ... do what, exactly?

I'm using SolrJ as the client.

queryParamMap.put("defType", "edismax");
queryParamMap.put([??], "f.first.fq=first_name f.last.fq=last_name");

??

Thanks,
-chris

>> On 5 Mar 2018, at 17:51, Christopher Schultz wrote:
>>
>> All,
>>
>> I'd like for users to be able to search a field by multiple names
>> without performing a "copy-field" when analyzing a document. Is
>> that possible? Whenever I search for "solr alias field" I get
>> results about how to re-name fields in the results.
>>
>> Here's what I'd like to do. Let's say I have a document:
>>
>> { id: 1234, field_1: valueA, field_2: valueB, field_3: valueC }
>>
>> I'd like users to be able to find this document using any of the
>> following queries:
>>
>> field_1:valueA
>> f1:valueA
>> 1:valueA
>>
>> I just want the query parser to say "oh, 'f1' is an alias for
>> 'field_1'" and substitute that when performing the search. Is that
>> possible?
>>
>> -chris
Re: Alias field names when searching (not for results)
Rick,

On 3/6/18 6:39 PM, Rick Leir wrote:
> The first thing that came to mind is that you are planning not to
> have an app in front of Solr. Without a web app, you will need to
> trust whoever can get access to Solr. Maybe you are on an
> intranet.

Nope, we have a web application between the user and Solr. But I would
rather not parse the user's query string and re-write it so that the
search field-names are canonicalized.

Thanks,
-chris

> On March 6, 2018 2:42:26 AM EST, "Emir Arnautović" wrote:
>> Hi, I did not try it, but the first thing that came to my mind is
>> to use edismax’s ability to define field aliases, something like
>> f.f1.fq=field_1. Note that it is not recommended to have field
>> name starting with number so not sure if it will work with “1”.
>>
>> HTH, Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>>
>>> On 5 Mar 2018, at 17:51, Christopher Schultz wrote:
>>>
>>> All,
>>>
>>> I'd like for users to be able to search a field by multiple names
>>> without performing a "copy-field" when analyzing a document. Is
>>> that possible? Whenever I search for "solr alias field" I get
>>> results about how to re-name fields in the results.
>>>
>>> Here's what I'd like to do. Let's say I have a document:
>>>
>>> { id: 1234, field_1: valueA, field_2: valueB, field_3: valueC }
>>>
>>> I'd like users to be able to find this document using any of the
>>> following queries:
>>>
>>> field_1:valueA
>>> f1:valueA
>>> 1:valueA
>>>
>>> I just want the query parser to say "oh, 'f1' is an alias for
>>> 'field_1'" and substitute that when performing the search. Is that
>>> possible?
>>>
>>> -chris
Possible to define a field so that substring-search is always used?
All,

We are using Solr as a user index, and users have email addresses.

Our old search behavior used a SQL substring match for any search
terms entered, and so users are used to being able to search for e.g.
"chr" and finding my email address ("ch...@christopherschultz.net").

By default, Solr doesn't perform substring matches, and it might be
difficult to re-train users to use *chr* to find email addresses by
substring.

Is there a way to define the field such that searches are always done
as a substring? While we are at it, I'd like to define the field to
avoid tokenization because it's never useful to search for
"m...@gmail.com" and find a few million search results because many
users use @gmail.com email addresses.

Here is the current field definition from our create-schema script:

"add-field":{
  "name":"email_address",
  "type":"text_general",
  "multiValued" : false,
  "stored":true
},

Later, we add the email address to the "all" field (which aggregates
everything from all useful fields into the field used as the
default-field):

"add-copy-field":{
  "source":"email_address",
  "dest":"all"
},

Is there a way to define these fields such that:

1. The email_address field is always searched using a substring
2. The email_address field is not tokenized
3. The copied-email-address is not tokenized in the "all" field

Thanks,
-chris
Re: solr basic authentication
Dinesh,

On 6/21/18 11:40 AM, Dinesh Sundaram wrote:
> is there any way to disable basic authentication for particular domain. i
> have proxy pass from a domain to solr which is always asking credentials so
> wanted to disable basic auth only for that domain. is there any way?

I wouldn't recommend this, in general, because it's not really all that
secure, but since you have a reverse-proxy in between the client and
Solr, why not have the proxy provide the HTTP BASIC authentication
information to Solr? That may be a more straightforward solution.

-chris
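For reference, what the proxy would have to add to each forwarded request is a standard HTTP Basic "Authorization" header: the word "Basic" followed by the Base64 encoding of "user:password". The sketch below only computes that header value; the class name and the credentials are placeholders, and how the value is injected depends on the proxy in use, which this thread does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    /** Builds the value of the HTTP "Authorization" header for Basic authentication. */
    public static String value(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; a real setup would use the account configured in Solr.
        System.out.println("Authorization: " + value("user", "pass"));
    }
}
```

Base64 is an encoding, not encryption, so the hop between proxy and Solr should itself be trusted or protected with TLS.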
Re: Solr Suggest Component and OOM
Ratnadeep,

On 6/11/18 12:25 PM, Ratnadeep Rakshit wrote:
> I am using the Solr Suggester component in Solr 5.5 with a lot of address
> data. My Machine has allotted 20GB RAM for solr and the machine has 32GB
> RAM in total.
>
> I have an address book core with the following vitals -
>
> "numDocs"=153242074
> "segmentCount"=34
> "size"=30.29 GB
>
> My solrconfig.xml looks something like this -
>
> mySuggester1
> FuzzyLookupFactory
> suggester_fuzzy_dir
> DocumentDictionaryFactory
> site_address
> suggestType
> property_metadata
> false
> false
>
> mySuggester2
> AnalyzingInfixLookupFactory
> suggester_infix_dir
> DocumentDictionaryFactory
> site_address_other
> suggestType
> property_metadata
> false
> false
>
> The handler is defined like so -
>
> true
> 10
> mySuggester1
> mySuggester2
> false
> explicit
> suggest
>
> *Problem Statement*
>
> Every time I try to build the suggest index using the suggest.build=true
> url parameter, I end up with an OutOfMemory error. I have no clue how I can
> make this work with the current setup. Can anyone explain why this is
> happening? And how can I fix this issue?
>
> *StackOverflow:*
> https://stackoverflow.com/questions/50802122/solr-suggest-component-and-outofmemory-error

Can you explain the nature of the OOM? Not all OOMs are due to heap
exhaustion...

-chris
Re: Collections unable to load after setting up SSL
Edwin,

On 6/10/18 10:22 PM, Zheng Lin Edwin Yeo wrote:
> I have found that we can't set it this way either, as we will get the below
> error on "no valid keystore".
>
> set SOLR_SSL_KEY_STORE=/etc/solr-ssl.keystore.jks
> set SOLR_SSL_TRUST_STORE=/etc/solr-ssl.keystore.jks
>
> Error:
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at org.eclipse.jetty.start.Main.invokeMain(Main.java:221)
>         at org.eclipse.jetty.start.Main.start(Main.java:504)
>         at org.eclipse.jetty.start.Main.main(Main.java:78)
> Caused by: java.lang.IllegalStateException: no valid keystore
>
> Are there any other ways that we can set or generate the keystore?

File permissions on /etc/solr-*?

Effective user-id of the process trying to connect to Solr?

If you use relative paths, do you have any idea what the paths are
relative TO?

-chris

> On 9 June 2018 at 21:30, Zheng Lin Edwin Yeo wrote:
>
>> Hi Chris,
>>
>> I have deployed these files on the {SolrHome}\server\etc folder.
>>
>> Currently this is the setting of the path in edm.in.cmd.
>>
>> set SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
>> set SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
>>
>> Regarding your point that absolute paths actually start with a slash:
>> does that mean we have to set it like this?
>>
>> set SOLR_SSL_KEY_STORE=/etc/solr-ssl.keystore.jks
>> set SOLR_SSL_TRUST_STORE=/etc/solr-ssl.keystore.jks
>>
>> Regards,
>> Edwin
>>
>> On 9 June 2018 at 00:15, Christopher Schultz wrote:
>>
>>> Edwin,
>>>
>>> On 6/8/18 12:02 PM, Zheng Lin Edwin Yeo wrote:
>>>> I followed the steps from
>>>> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html.
>>>>
>>>> 1)
>>>>
>>>> keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass
>>>> secret -storepass secret -validity -keystore
>>>> solr-ssl.keystore.jks -ext
>>>> SAN=DNS:localhost,IP:192.168.1.3,IP:127.0.0.1 -dname "CN=localhost,
>>>> OU=Organizational Unit, O=Organization, L=Location, ST=State,
>>>> C=Country"
>>>>
>>>> 2)
>>>>
>>>> keytool -importkeystore -srckeystore solr-ssl.keystore.jks
>>>> -destkeystore solr-ssl.keystore.p12 -srcstoretype jks -deststoretype
>>>> pkcs12
>>>>
>>>> 3)
>>>>
>>>> openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem
>>>>
>>>> I have also set these in solr.in.cmd:
>>>>
>>>> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
>>>> SOLR_SSL_KEY_STORE_PASSWORD=secret
>>>> SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
>>>> SOLR_SSL_TRUST_STORE_PASSWORD=secret
>>>> # Require clients to authenticate
>>>> SOLR_SSL_NEED_CLIENT_AUTH=false
>>>> # Enable clients to authenticate (but not require)
>>>> SOLR_SSL_WANT_CLIENT_AUTH=false
>>>> # Define Key Store type if necessary
>>>> SOLR_SSL_KEY_STORE_TYPE=JKS
>>>> SOLR_SSL_TRUST_STORE_TYPE=JKS
>>>
>>> You didn't describe how you have deployed each of these files on each of
>>> your servers.
>>>
>>> You might want to make sure that all your (attempted) absolute paths
>>> actually start with a slash, though.
>>>
>>> -chris
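One way to answer the "relative to what?" question is to print how a configured path resolves and whether the file is visible to the current user. The sketch below resolves against the working directory of the JVM that runs it, which is an assumption: Solr's start script or Jetty may resolve relative paths against a different base directory, so it is most informative when run from the directory Solr is started in. The class name is made up for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class KeystorePathCheck {
    /** Resolves a configured path against the current working directory if it is not absolute. */
    public static Path resolve(String configured) {
        return Paths.get(configured).toAbsolutePath().normalize();
    }

    /** Describes where the configured path points and whether the file can be read. */
    public static String describe(String configured) {
        Path resolved = resolve(configured);
        return configured + " -> " + resolved
            + " (absolute as written: " + Paths.get(configured).isAbsolute()
            + ", exists: " + Files.exists(resolved)
            + ", readable: " + Files.isReadable(resolved) + ")";
    }

    public static void main(String[] args) {
        // The two values tried in this thread.
        System.out.println(describe("etc/solr-ssl.keystore.jks"));
        System.out.println(describe("/etc/solr-ssl.keystore.jks"));
    }
}
```

On Windows, Java does not treat a path that starts with a slash but has no drive letter as absolute, so "/etc/solr-ssl.keystore.jks" would resolve to an "etc" folder at the root of the current drive rather than to the server\etc folder where the files were deployed.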
Re: Collections unable to load after setting up SSL
Edwin,

On 6/8/18 12:02 PM, Zheng Lin Edwin Yeo wrote:
> I followed the steps from
> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html.
>
> 1)
>
> keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass
> secret -storepass secret -validity -keystore
> solr-ssl.keystore.jks -ext
> SAN=DNS:localhost,IP:192.168.1.3,IP:127.0.0.1 -dname "CN=localhost,
> OU=Organizational Unit, O=Organization, L=Location, ST=State,
> C=Country"
>
> 2)
>
> keytool -importkeystore -srckeystore solr-ssl.keystore.jks
> -destkeystore solr-ssl.keystore.p12 -srcstoretype jks -deststoretype
> pkcs12
>
> 3)
>
> openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem
>
> I have also set these in solr.in.cmd:
>
> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=secret
> SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=secret
> # Require clients to authenticate
> SOLR_SSL_NEED_CLIENT_AUTH=false
> # Enable clients to authenticate (but not require)
> SOLR_SSL_WANT_CLIENT_AUTH=false
> # Define Key Store type if necessary
> SOLR_SSL_KEY_STORE_TYPE=JKS
> SOLR_SSL_TRUST_STORE_TYPE=JKS

You didn't describe how you have deployed each of these files on each of
your servers.

You might want to make sure that all your (attempted) absolute paths
actually start with a slash, though.

-chris
Re: Collections unable to load after setting up SSL
Edwin,

On 6/7/18 11:11 PM, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> I am running SolrCloud on Solr 7.3.1 on External ZooKeeper 3.4.11, and I am
> setting up the security aspect of Solr.
>
> After setting up the SSL based on the steps from
> https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html, the collections
> that have 2 replicas can no longer be loaded.
>
> What could be causing the issue?
>
> I remember that there wasn't this problem when I tried the same thing in
> Solr 6 and even Solr 7.1.

I've fought a bit to get Solr running on a single instance with SSL, so
I can imagine that ZK might be an issue for you.

Can you describe how each server's truststores and keystores are
configured? Are you using client-validated servers (e.g. one-way TLS
like you would with most public web sites) or are you using
mutual-authentication where the server is also checking the client's
certificate?

-chris
Re: Windows monitoring software for Solr recommendation
TK,

On 6/5/18 1:12 PM, TK Solr wrote:
> My client's Solr 6.6 running on a Windows server is mysteriously
> crashing without any JVM crash log. No unusual activities recorded in
> solr.log. GC log does not indicate the OOM situation. It's a simple
> single-core, single node deployment (no SolrCloud). It has very light
> load. No indexing activities were running near the crash time.
>
> After exhausting all possibilities (suggestions are welcome), I'd like
> to recommend installing some monitoring software, but I couldn't find one
> that works on Windows for Java-based software. (Some I found can
> monitor only EXEs. Since all Java software shares the same EXE,
> java.EXE, those won't work.) Can anyone recommend some? They don't need
> to be free but can't be very expensive since it's a very lightly used
> Solr system. Perhaps less than $500?

How about Apache procrun/commons-daemon?

https://commons.apache.org/proper/commons-daemon/procrun.html

I don't know how much of a pain it would be to set it up to run Solr,
but it runs Apache Tomcat quite well and has its own logs for things
like "process died, restarted" which might give you some insight.

-chris
Re: Self Signed Certificate for Load Balancer and Solr Nodes
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Kelly, On 6/1/18 5:41 PM, Kelly Rusk wrote: > I can directly connect to either node without issue, it is only > when the Load Balancer routes to either solr1 or solr2 that I get > the security error (ex. https://solrlb.com:8983/solr). The Load > Balancer is not managing HTTPS but just acting as a pure TCP proxy. > Nothing more complex than sending traffic to either solr1 or > solr2... however, the URL will be displayed as solrlb.com as it > hides the real address of what is being routed to. > > In this case, do we need a certificate for solrlb.com installed on > both solr1 and solr2? That's exactly what you need. It would be best to: 1. Create a certificate for solrlb.com 2. Install the same key + certificate on both Solr nodes 3. Always use solrlb.com for any links and redirects you generate Optionally, you could add SANs for that certificate for both solr1 and solr2 just in case you want to be able to connect directly to either back-end node without getting hostname mismatch complaints. > In our previous environments we used the same load balancer setup, > but that worked since the Solr nodes were serving over http and > not https. You probably never noticed that redirects were occurring that were sending users to a particular node instead of always using the lb's hostname because there was never anything double-checking the hostname. In your previous message, you mentioned that you got an error message including the hostname "b-win-solr-01.azure-dfa.com" which probably isn't your load-balancer's hostname. That suggests to me that some kind of redirect (or similar) is occurring and that the redirect doesn't understand that there is a reverse-proxy/lb out in front of the node. 
Hope that helps, - -chris > -Original Message- From: Shawn Heisey > Sent: Friday, June 1, 2018 5:25 PM To: > solr-user@lucene.apache.org Subject: Re: Self Signed Certificate > for Load Balancer and Solr Nodes > > On 6/1/2018 2:01 PM, Kelly Rusk wrote: >> We have solr1.com and solr2.com self-signed certs that correspond >> to the two servers. We also have a load balancer with an address >> named solrlb.com. When we hit the load balancer it gives us an >> SSL error, as it is passing us back to either solr1.com or >> solr2.com, but since these two Solr servers only have each >> other's self-signed cert installed in their Keystore, it doesn't >> resolve when it comes in through the load balanced address of >> solrlb.com. >> >> We tried a san certificate that has all 3 addresses, but when we >> do this, we get the following error: >> >> This page can't be displayed Turn on TLS 1.0, TLS 1.1, and TLS >> 1.2 in Advanced settings and try connecting to >> https://b-win-solr-01.azure-dfa.com:8983 again. If this error >> persists, it is possible that this site uses an unsupported >> protocol or cipher suite such as RC4 (link for the details), >> which is not considered secure. Please contact your site >> administrator. > > One really important question is whether the load balancer acts as > a pure TCP proxy, or whether the load balancer is configured with a > certificate and handles HTTPS itself. > > If the load balancer is handling HTTPS, it's very likely that the > load balancer either cannot use modern TLS protocols and/or > ciphers, or that it has the modern protocols/ciphers turned off. > There's probably nothing that we can do to help you in this > situation. You will need to find support for your load balancer. > > If the load balancer is just a TCP proxy and lets the back end > server handle HTTPS, then you may need to ensure that you're > running a very recent version of Java 8. 
You may also need to
> install the JCE policy files for unlimited strength encryption into
> your Java. I see from other messages on the list that you're
> running Solr 6.6.2, so it would not be a good idea for you to use
> Java 9 or Java 10. If you need them, the JCE policy files for Java
> 8 can be found here:
>
> http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
>
> One thing you didn't explicitly mention is whether the connection
> works when talking directly to one of the Solr servers instead of
> the load balancer. If that works, then your Java version is
> probably fine, and it's even more evidence that the problem is on
> the load balancer.
>
> Thanks, Shawn
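The hostname-mismatch behaviour discussed in this message can be
illustrated with a small sketch. The function below is hypothetical and
deliberately simplified (exact, case-insensitive comparison only, no
wildcard handling), so it is not how any particular TLS library performs
the check. It only shows why a certificate that names a single back-end
node is rejected by a client that asked for solrlb.com, while a
certificate whose SAN list covers all three names is accepted. The name
lists are assumptions based on the hostnames mentioned in the thread.

```python
def hostname_matches(cert_names, requested_host):
    """Return True if requested_host equals one of the certificate's names.

    Simplified on purpose: exact, case-insensitive comparison only.
    Real TLS clients also handle wildcards and IP address entries.
    """
    wanted = requested_host.lower().rstrip(".")
    return any(name.lower().rstrip(".") == wanted for name in cert_names)


# Hypothetical name lists, based on the hostnames mentioned in the thread.
NODE_ONLY_CERT = ["solr1.com"]
SAN_CERT = ["solrlb.com", "solr1.com", "solr2.com"]

# A browser that was sent to the load balancer checks the lb's hostname.
print(hostname_matches(NODE_ONLY_CERT, "solrlb.com"))
print(hostname_matches(SAN_CERT, "solrlb.com"))
```

The same check explains the redirect symptom: if the browser ends up on
a node hostname that is not in the certificate, the comparison fails
even though the certificate itself is fine.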
Re: [OT] Self Signed Certificate for Load Balancer and Solr Nodes
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Shawn, On 6/1/18 5:25 PM, Shawn Heisey wrote: > On 6/1/2018 2:01 PM, Kelly Rusk wrote: >> We have solr1.com and solr2.com self-signed certs that correspond >> to the two servers. We also have a load balancer with an address >> named solrlb.com. When we hit the load balancer it gives us an >> SSL error, as it is passing us back to either solr1.com or >> solr2.com, but since these two Solr servers only have each >> other's self-signed cert installed in their Keystore, it doesn't >> resolve when it comes in through the load balanced address of >> solrlb.com. >> >> We tried a san certificate that has all 3 addresses, but when we >> do this, we get the following error: >> >> This page can't be displayed Turn on TLS 1.0, TLS 1.1, and TLS >> 1.2 in Advanced settings and try connecting to >> https://b-win-solr-01.azure-dfa.com:8983 again. If this error >> persists, it is possible that this site uses an unsupported >> protocol or cipher suite such as RC4 (link for the details), >> which is not considered secure. Please contact your site >> administrator. > > One really important question is whether the load balancer acts as > a pure TCP proxy, or whether the load balancer is configured with > a certificate and handles HTTPS itself. > > If the load balancer is handling HTTPS, it's very likely that the > load balancer either cannot use modern TLS protocols and/or > ciphers, or that it has the modern protocols/ciphers turned off. > There's probably nothing that we can do to help you in this > situation. You will need to find support for your load balancer. > > If the load balancer is just a TCP proxy and lets the back end > server handle HTTPS, then you may need to ensure that you're > running a very recent version of Java 8. You may also need to > install the JCE policy files for unlimited strength encryption into > your Java. 
I see from other messages on the list that you're
> running Solr 6.6.2, so it would not be a good idea for you to use
> Java 9 or Java 10. If you need them, the JCE policy files for Java
> 8 can be found here:
>
> http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html

Starting with Oracle Java 8u151, the "unlimited strength jurisdiction
policy files" are included in the default build, so you no longer have
to manually install them. Nice to see that Java finally got out of the
1990s mindset when it comes to cryptography.

Unfortunately, Java 8 is close to EOL[2] so it's time to look at newer
versions of Java, which likely means newer versions of Solr if you want
to be safe and secure. I say "close to EOL" even though it's 7 months
away because it can take a looong time to plan and execute an upgrade of
both Solr and Java.

-chris

[1] https://golb.hplar.ch/2017/10/JCE-policy-changes-in-Java-SE-8u151-and-8u152.html
[2] http://www.oracle.com/technetwork/java/javase/eol-135779.html
Re: CURL command problem on Solr
Roee,

On 5/30/18 3:38 AM, Roee T wrote:
> Thank you so much all of you the following worked for me!
>
> curl -X PUT -H "Content-Type: application/json" -d
> "@Myfeatures.json"
> "http://localhost:8983/solr/techproducts/schema/feature-store"

Curl assumes that the URL is the last argument to the program, so it
stops reading options (left-to-right) when it gets to the URL. So if you
put options after the URL they will be ignored.

-chris
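For anyone scripting the same upload rather than typing a curl command,
here is a minimal sketch using only Python's standard library. It
assumes the URL and file name from the command quoted in this message;
the helper names are hypothetical. Whatever went wrong on the command
line, the server-side requirement visible in this thread is that the
request must carry "Content-Type: application/json" (curl's -d and
--data-binary send application/x-www-form-urlencoded unless a header
overrides it), so the sketch sets that header explicitly.

```python
import json
import urllib.request

# URL taken from the command quoted in this message.
FEATURE_STORE_URL = "http://localhost:8983/solr/techproducts/schema/feature-store"


def build_feature_store_request(features, url=FEATURE_STORE_URL):
    """Build (but do not send) a PUT request with a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def upload_features(path, url=FEATURE_STORE_URL):
    """Read a features file and PUT it to Solr; returns the HTTP status."""
    with open(path, encoding="utf-8") as handle:
        request = build_feature_store_request(json.load(handle), url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling upload_features("Myfeatures.json") would perform the actual
request against a running Solr; building the request separately makes
it easy to inspect the method and headers before anything is sent.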
Re: CURL command problem on Solr
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Roee, On 5/29/18 11:02 AM, Roee Tarab wrote: > I am having some troubles with pushing a features file to solr > while building an LTR model. I'm trying to upload a JSON file on > windows cmd executable from an already installed CURL folder, with > the command: > > curl -XPUT > 'http://localhost:8983/solr/techproducts/schema/feature-store' > --data-binary "@/path/myFeatures.json" -H > 'Content-type:application/json'. > > I am receiving the following error massage: > > { "responseHeader":{ "status":500, "QTime":7}, "error":{ "msg":"Bad > Request", "trace":"Bad Request (400) - Invalid content type > application/x-www-form-urlencoded; only application/json is > supported.\r\n\tat > org.apache.solr.rest.RestManager$ManagedEndpoint. > parseJsonFromRequestBody(RestManager.java:407)\r\n\tat > org.apache.solr.rest. > RestManager$ManagedEndpoint.put(RestManager.java:340) > > This is definitely a technical issue, and I have not been able to > overcome it for 2 days. > > Is there another option of uploading the file to our core? Is > there something we are missing in our command? What happens if you put the URL as the very last command-line option, instead of the second one? 
-chris
Re: Question regarding TLS version for solr
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Anchal, On 5/24/18 6:02 AM, Anchal Sharma2 wrote: > Thanks a lot for sharing the steps . I tried few of them .Actually > we already have been using solr in our application since an year or > so .We just want to encrypt it to use secure solr now .So ,I > followed the steps where you have created the certificates ,etc > .But when I go to start the solr back ,it doesnt start . We are > using zookeeper .Following is the error I get ,on running solr > start command. > > Command:./solr -c -m 1g -p 8984 -z :2181 -s folder containing data> > > Error: > > lsof 4.55 (latest revision at > ftp://vic.cc.purdue.edu/pub/tools/unix/lsof) usage: > [-?abhlnNoOPRstUvVX] [-c c] [+|-d s] [+|-D D] [+|-f[cfgGn]] [-F > [f]] [-g [s]] [-i [i]] [+|-L [l]] [-m m] [+|-M] [-o [o]] [-p s] > [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [--] [names] Use the > ``-h'' option to get more help information. Still not seeing Solr > listening on 8984 after 30 seconds! at > java.security.KeyStore.load(KeyStore.java:1456) at > org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(Certifica teUtils.java:55) > > at org.eclipse.jetty.util.ssl.SslContextFactory.loadKeyStore(SslContextFact ory.java:871) > at > org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory .java:273) > > at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc le.java:68) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLif eCycle.java:132) > > at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLif eCycle.java:114) > at > org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFac tory.java:64) > > at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc le.java:68) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLif eCycle.java:132) > > at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLif eCycle.java:114) > at > 
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.j ava:256) > > at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetwor kConnector.java:81) > at > org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java: 236) > > at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCyc le.java:68) > at org.eclipse.jetty.server.Server.doStart(Server.java:366) at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeC ycle.java:68) > > at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:12 55) > at > java.security.AccessController.doPrivileged(AccessController.java:594) > > at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:117 4) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j ava:90) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:55) > at java.lang.reflect.Method.invoke(Method.java:508) at > org.eclipse.jetty.start.Main.invokeMain(Main.java:321) at > org.eclipse.jetty.start.Main.start(Main.java:817) at > org.eclipse.jetty.start.Main.main(Main.java:112) 2018-05-24 > 09:05:16.714 INFO > (zkCallback-3-thread-1-processing-n:9.109.122.113:8984_solr) [ ] > o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent > state:SyncConnected type:NodeDataChanged path:/clusterstate.json, > has occurred - updating... (live nodes size: 1) 2018-05-24 > 09:05:17.018 INFO > (zkCallback-3-thread-1-processing-n:9.109.122.113:8984_solr) [ ] > o.a.s.c.c.ZkStateReader Updated cluster state version to 9702 > 2018-05-24 09:05:17.153 INFO > (coreLoadExecutor-7-thread-2-processing-n:9.109.122.113:8984_solr) > [c:document r:core_node1 x:document] o.a.s.u.SolrIndexConfig > IndexWriter infoStream solr logging is enabled [\] sleep: bad > character in argument What does the solr.log file say? 
The above stack trace isn't terribly helpful, and it's incomplete. - -chris > -Christopher Schultz <ch...@christopherschultz.net> wrote: > - To: solr-user@lucene.apache.org From: Christopher Schultz > <ch...@christopherschultz.net> Date: 05/23/2018 07:29PM Subject: > Re: Question regarding TLS version for solr > > Anchal, > > On 5/23/18 2:38 AM, Anchal Sharma2 wrote: >> Thank you for replying .But ,I checked the java version solr >> using ,and it is already version 1.8. > >> @Christopher ,can you let me know what steps you followed for >> TLS authentication on solr version 7.3.0. > > Sure. Here are my deployment notes. You may have to adjust them > slightly for your environment. Note that we are using standalone
Re: Question regarding TLS version for solr
Anchal,

On 5/23/18 2:38 AM, Anchal Sharma2 wrote:
> Thank you for replying .But ,I checked the java version solr using
> ,and it is already version 1.8.
>
> @Christopher ,can you let me know what steps you followed for TLS
> authentication on solr version 7.3.0.

Sure. Here are my deployment notes. You may have to adjust them slightly
for your environment. Note that we are using standalone Solr without any
Zookeeper, clustering, etc. This is just about configuring a single
instance.

Also, this guide says 7.3.0, but 7.3.1 would be better as it contains a
fix for a CVE.

=== CUT ===

Instructions for installing Solr and working with Cores

Installation
------------

Installing Solr is fairly simple. One can simply untar the distribution
tarball and work from that directory, but it is better to install it in
a somewhat more centralized place with a separate data directory to
facilitate upgrades, etc.

1. Obtain the distribution tarball

   Go to https://lucene.apache.org/solr/mirrors-solr-latest-redir.html
   and obtain the latest supported version of Solr (7.3.0 as of this
   writing).

2. Untar the archive

   $ tar xzf solr-x.y.z.tgz

3. Install Solr

   $ cd solr-x.y.z
   $ sudo bin/install_solr_service.sh ../solr-x.y.z.tgz \
       -i /usr/local \
       -d /mnt/securefs/solr \
       -n

   (that last -n says "don't start Solr")

4. Configure Solr Settings

   Edit the file /etc/default/solr.in.sh

   Settings you may want to explicitly set:

   SOLR_JAVA_HOME=(java home)
   SOLR_HEAP="1024M"

5. Configure Solr for TLS

   Create a server key and certificate:

   $ sudo mkdir /etc/solr
   $ sudo keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize 256 -validity 730 \
       -alias 'solr-ssl' -keystore /etc/solr/solr.p12 -storetype PKCS12 \
       -ext san=dns:localhost,ip:192.168.10.20

   Use the following information for the certificate:

   First and Last name: 192.168.10.20 (or "localhost", or your IP address)
   Org unit: [whatever]
   Everything else should be obvious

   Now, export the public key from the keystore.

   $ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore /etc/solr/solr.p12 -storetype PKCS12 -alias solr-ssl

   Copy that certificate and paste it into this command's stdin:

   $ sudo keytool -importcert -keystore /etc/solr/solr-server.p12 -storetype PKCS12 -alias 'solr-ssl'

   Now, fix the ownership and permissions on these files:

   $ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
   $ sudo chmod 0640 /etc/solr/solr.p12

   Edit the file /etc/default/solr.in.sh

   Set the following settings:

   SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
   SOLR_SSL_KEY_STORE_TYPE=PKCS12
   SOLR_SSL_KEY_STORE_PASSWORD=whatever

   # You MUST set the trust store for some reason.
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

   Then, patch the file bin/post; you are going to need this, later.

   --- bin/post 2017-09-03 13:29:15.0 -0400
   +++ /usr/local/solr/bin/post 2018-04-11 20:08:17.0 -0400
   @@ -231,8 +231,8 @@
    PROPS+=('-Drecursive=yes')
    fi

   -echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
   -"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
   +echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
   +"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"

6. Configure Solr to Require Client TLS Certificates

   On each client, create a client key and certificate:

   $ keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize 256 \
       -validity 730 -alias 'solr-client-ssl'

   Now dump the certificate for the next step:

   $ keytool -exportcert -keystore [client-key-store] -storetype PKCS12 \
       -alias 'solr-client-ssl'

   Don't forget that you might want to generate your own client
   certificate to use from your own web browser if you want to be able to
   connect to the server's dashboard.

   Use the output of that command on each client to put the cert(s) into
   this trust store on the server:

   $ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
       -storetype PKCS12 -alias '[client key alias]'

   Edit /etc/default/solr.in.sh and add the following entries:

   SOLR_SSL_NEED_CLIENT_AUTH=true
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
-----------------------------

solr-client.p12
  Client keystore. Contains client key and certificate. Used by clients to
Re: Question regarding TLS version for solr
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Shawn, On 5/17/18 4:23 AM, Shawn Heisey wrote: > On 5/17/2018 1:53 AM, Anchal Sharma2 wrote: >> We are using solr version 5.3.0 and have been trying to enable >> security on our solr .We followed steps mentioned on site >> -https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html .But >> by default it picks ,TLS version 1.0,which is causing an issue >> as our application uses TLSv 1.2.We tried using online resources >> ,but could not find anything regarding TLS enablement for solr . >> >> It will be a huge help if anyone can provide some suggestions as >> to how we can enable TLS v 1.2 for solr. > > The choice of ciphers and encryption protocols is mostly made by > Java. The servlet container might influence it as well. The only > servlet container that is supported since Solr 5.0 is the Jetty > that is bundled in the Solr download. > > TLS 1.2 was added in Java 7, and it became default in Java 8. If > you can install the latest version of Java 8 and make sure that it > has the policy files for unlimited crypto strength installed, > support for TLS 1.2 might happen automatically. There is no "default" TLS version for either the client or the server: the two endpoints always negotiate the highest mutual version they both support. The key agreement, authentication, and cipher suites are the items that are negotiated during the handshake. > Solr 5.3.0 is running a fairly old version of Jetty -- 9.2.11. > Information for 9.2.x versions is hard to find, so although I think > it probably CAN do TLS 1.2 if the Java version supports it, I can't > be absolutely sure. You'll need to upgrade Solr to get an upgraded > Jetty. I would be shocked if Jetty ships with its own crypto libraries; it should be using JSSE. Anchal, Java 1.7 or later is an absolute requirement if you want to use TLSv1.2 (and you SHOULD want to use it). 
I have recently spent a lot of time getting Solr 7.3.0 running with TLS
mutual-authentication, but I haven't worked with the 5.3.x line. I can
tell you how I've done things for my version, but they may need some
adjustments for yours.

-chris
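The "highest mutual version" point made in this message can be sketched
as a toy model. This is an illustration only, not the actual TLS
handshake and not anything Solr, Jetty, or JSSE exposes: the function
name and the version lists are assumptions. It shows why a connection
settles on TLSv1 when one side offers nothing newer, and why it fails
outright when the other side insists on TLSv1.2.

```python
# Protocol versions in increasing order of preference (toy model).
TLS_ORDER = ["TLSv1", "TLSv1.1", "TLSv1.2"]


def negotiate(client_versions, server_versions):
    """Return the highest version both sides support, or None if none."""
    common = set(client_versions) & set(server_versions)
    if not common:
        return None
    return max(common, key=TLS_ORDER.index)


# A server limited to TLSv1 drags a flexible client down to TLSv1.
print(negotiate(["TLSv1", "TLSv1.1", "TLSv1.2"], ["TLSv1"]))

# A client that only accepts TLSv1.2 cannot connect to that server.
print(negotiate(["TLSv1.2"], ["TLSv1"]))
```

In this model the fix is always on the side with the shorter list: once
the server's runtime supports TLSv1.2, the same client ends up on
TLSv1.2 without any "default version" setting.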
Re: Using Solr / Lucene with OpenJDK
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Shawn, On 4/24/18 11:23 AM, Shawn Heisey wrote: > On 4/24/2018 8:50 AM, Steven White wrote: >> Does anyone use Solr, any version, with OpenJDK? If so, what >> has been you experience? Also, what platforms have you used it >> on? > > I've used it on Linux. It worked without issues. What version of OpenJDK did you happen to run? My preference would be for Java 8, but on my Debian Wheezy install only Java 7 is available. In general, I prefer package-managed versions of everything whenever possible, but I have found that on Debian the openjdk package requires many dependencies that I might not otherwise want to install (at least not globally)[1], so I tend to go with the tarball from Oracle. I'm still on the fence for a production deployment. - -chris [1] Here's what Debian Wheezy currently says it wants to install when I tell it to install the "default-jre" package: ca-certificates-java dbus default-jre default-jre-headless fontconfig fontconfig-config hicolor-icon-theme java-common libasound2 libasyncns0 libatk-wrapper-java libatk-wrapper-java-jni libatk1.0-0 libatk1.0-data libavahi-client3 libavahi-common-data libavahi-common3 libcairo2 libcups2 libdatrie1 libdbus-1-3 libdrm-intel1 libdrm-nouveau1a libdrm-radeon1 libdrm2 libffi5 libflac8 libfontconfig1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgif4 libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglib2.0-0 libglib2.0-data libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libice6 libjasper1 libjbig0 libjpeg8 libjson0 liblcms2-2 libnspr4 libnss3 libogg0 libpango1.0-0 libpciaccess0 libpcsclite1 libpixman-1-0 libpng12-0 libpulse0 libsctp1 libsm6 libsndfile1 libsystemd-login0 libthai-data libthai0 libtiff4 libvorbis0a libvorbisenc2 libx11-6 libx11-data libx11-xcb1 libxau6 libxcb-glx0 libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxft2 libxi6 libxinerama1 libxrandr2 libxrender1 libxtst6 libxxf86vm1 lksctp-tools openjdk-7-jre 
openjdk-7-jre-headless shared-mime-info ttf-dejavu-core ttf-dejavu-extra tzdata-java x11-common
Sorting using "packed" fields?
All,

I have documents that need to appear to have different attributes
depending upon which user is trying to search them. One of the fields I
currently have in the document is called "latest_submission" and it's a
multi-valued text field that contains fields packed with a numeric
identifier prefix and then the real data. Something like this:

101:2018-04-16T16:41:00Z
102:2017-01-25T22:08:17Z
103:2018-11-19T02:52:28Z

When searching, I will know which prefixes are valid for a certain
user, so I know I can search by *other* fields and then pull-out the
values that are appropriate for a particular user.

But if I want Solr/Lucene to search/sort by the "latest submission", I
need to be able to tell Solr/Lucene which values are appropriate to use
for that user.

Is this kind of thing possible? I'd like to be able to issue a search
that says e.g.:

find documents matching name:foo
sort by latest_submission starting with ("102:" or "103:")

I'm just starting out with this data set, so I can completely change the
organization of the data within the index if necessary.

Does anyone have any suggestions?

I've seen some questions on the list about "child documents", and it
seems like that might be relevant. Right now, my input data looks like
this:

{
  "name" : "document name",
  "latest_submission" : [
    "prefix:date",
    "prefix:date",
    etc.
  ]
}

But that could easily be changed to be:

{
  "name" : "document name",
  "latest_submission" : [
    { "prefix" : "101", "date" : "[date]" },
    { "prefix" : "103", "date" : "[date]" }
  ]
}

Thanks,
-chris
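To pin down the semantics being asked for, here is a small client-side
sketch. It does not answer the question of how to get Solr itself to
sort this way; it only shows, outside Solr, what "sort by
latest_submission starting with (102: or 103:)" would mean for one
document. The function name is hypothetical, and picking the newest of
the allowed values is an assumption, since the message does not say how
several matching prefixes should be combined.

```python
from datetime import datetime, timezone


def latest_submission_for(packed_values, allowed_prefixes):
    """Return the newest timestamp among values with an allowed prefix.

    Each packed value looks like "101:2018-04-16T16:41:00Z".
    Returns None when no value carries an allowed prefix.
    """
    best = None
    for packed in packed_values:
        # Split at the first colon only; the timestamp contains colons too.
        prefix, _, stamp = packed.partition(":")
        if prefix not in allowed_prefixes:
            continue
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        when = when.replace(tzinfo=timezone.utc)
        if best is None or when > best:
            best = when
    return best


values = [
    "101:2018-04-16T16:41:00Z",
    "102:2017-01-25T22:08:17Z",
    "103:2018-11-19T02:52:28Z",
]
print(latest_submission_for(values, {"102", "103"}))
```

A per-user sort key computed like this is what the index would need to
expose, whether through child documents or some other restructuring of
the field.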
Re: Appropriate field type for date-without-time
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Shawn, On 4/15/18 4:49 PM, Shawn Heisey wrote: > On 4/15/2018 2:31 PM, Christopher Schultz wrote: >> I'd usually call this a "date", but Solr's documentation says >> that a "date" is what I would call a timestamp (including time >> zone). > > That is correct. Lucene dates are accurate to the millisecond. > They don't actually handle timezones the way you might be thinking > -- the information is UTC. When using date rounding (NOW/WEEK, > NOW/DAY, etc) you can tell Solr what the timezone is so that the > boundaries are correct, but the information in the index is UTC. > >> https://lucene.apache.org/solr/guide/7_3/field-types-included-with-so lr. >> >> html >> >> [ I remember reading but cannot currently seem to find a >> reference page with the actual pre-defined field types Solr ships >> with. That page above lists the class names, but not the aliases >> used by a real Solr installation. > > That info is what you need to define the fieldType in the schema. > So you would put something like "solr.DatePointField" as the > class. What about the "standard" aliases for existing fieldTypes? I remember reading a page where "int" versus "pint" were compared, but I can't seem to find that, now. >> Is there an existing appropriate field type for >> "date-without-time"? > > The answer to this question is not yes, but it's also not no. All > date types in Solr have millisecond precision. Okay, so if I want to have a date-without-timestamp, I'll either need to set all timestamps to 00:00:00 or invent something like pint-encoded-date, right? > But if you use DateRangeField, you can deal with larger time > periods. A query like "2018" actually works. At both query and > index time, the less precise syntax is translated internally to a > *range* before the query or indexing happens. Sounds like wasting a little space with 00:00:00 timestamps is probably the way to go. 
Even if using pint would be equivalent (and perhaps even a little more
efficient), I think using a "real" date field is more appropriate.

-chris
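The two encodings weighed in this thread can be compared with a short
sketch. The function names are hypothetical; the first produces the
midnight-UTC timestamp form that a Solr date field accepts, and the
second produces the packed integer (e.g. 20180415) that would go into a
"pint" field. Both preserve chronological order, which is what matters
for sorting.

```python
from datetime import date


def to_solr_midnight(day):
    """Format a calendar date as a Solr-style timestamp at 00:00:00 UTC."""
    return f"{day.year:04d}-{day.month:02d}-{day.day:02d}T00:00:00Z"


def to_packed_int(day):
    """Encode a calendar date as a YYYYMMDD integer."""
    return day.year * 10000 + day.month * 100 + day.day


today = date(2018, 4, 15)
print(to_solr_midnight(today))  # 2018-04-15T00:00:00Z
print(to_packed_int(today))     # 20180415
```

The timestamp form costs a few redundant characters per value but stays
recognisable as a date to anyone browsing the collection, which is the
argument made above for the "real" date field.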
[OT] Re: What is the correct URL for POSTing new data?
Shawn,

On 4/15/18 4:33 PM, Shawn Heisey wrote:
> On 4/15/2018 2:24 PM, Christopher Schultz wrote:
>> No, it wouldn't have. It doesn't read any configuration files
>> and guesses its way through everything. Simply adding HTTPS
>> support required me to modify the script and manually-specify the
>> URL. That's why I went through the trouble of explaining so in my
>> initial post.
>
> Gotcha. I haven't used SSL with Solr myself. Nobody can get
> directly to the Solr servers, so we don't need it. If somebody is
> able to penetrate our systems to the point where they can sniff
> Solr traffic, they will already have full access to things far more
> sensitive than our search index.

Not necessarily, but that depends entirely upon your environment. We
have a policy of "no privileged network positions" so we don't even
trust our "private networks". Someone at the data center could
inadvertently configure a switch port to suddenly join our VLAN or a
network plug might be incorrectly assigned, etc. So we don't want our
data flying around in a way that can be intercepted.

> I'll see what I can do about the documentation to make it clear
> that the URL given to the post tool needs the request handler
> path.

That would be great. Even poking around in the Solr web UI doesn't
reveal that path because of all the JavaScript magic in the interface.

It's unreasonable to expect everyone to read source code in order to
learn how to use tools that don't require direct programming.

Let me take a step back and say that Solr in fact has great
documentation. There are evidently some things it lacks for the
uninitiated.

Thanks,
-chris
Appropriate field type for date-without-time
All,

I'd usually call this a "date", but Solr's documentation says that a
"date" is what I would call a timestamp (including time zone).

https://lucene.apache.org/solr/guide/7_3/field-types-included-with-solr.html

[ I remember reading but cannot currently seem to find a reference page
with the actual pre-defined field types Solr ships with. That page above
lists the class names, but not the aliases used by a real Solr
installation. For example, if I want to store an integral numeric value,
I know I want to use "pint", but can't actually find the reference for
that. ]

I have dates that have no timestamps on them, and I'd like to store
them and probably sort by them. I'm not sure whether we would care to
search for documents whose date fields are within a certain range, etc.
at this point.

I could convert the date into a number, e.g. 20180415 for today, and
simply store it as a "pint", but that might, ahem, surprise someone
looking at documents in the collection who expects an obvious
"date"-oriented field to be a date and finds that it is in fact an int.
Also, the year 1 bug will rear its ugly head many generations from now.

Is there an existing appropriate field type for "date-without-time"?

Thanks,
-chris
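
One possible approach to the question above, sketched in Java: keep a regular Solr date field and always write date-only values as midnight UTC, since Solr date fields accept ISO-8601 UTC timestamps of the form YYYY-MM-DDThh:mm:ssZ. The midnight-UTC convention and the class and method names (DateOnlyField, toSolrDate, fromSolrDate) are assumptions made for illustration; nothing in this thread confirms that this is the recommended answer.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DateOnlyField {
    /** Formats a date-only value as midnight UTC (assumed convention). */
    public static String toSolrDate(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
    }

    /** Recovers the date-only value from a UTC timestamp string. */
    public static LocalDate fromSolrDate(String value) {
        return Instant.parse(value).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        String stored = toSolrDate(LocalDate.of(2018, 4, 15));
        System.out.println(stored);
        System.out.println(fromSolrDate(stored));
    }
}
```

Sorting on such a field orders documents by calendar date, and the stored value still looks like a date to anyone browsing the collection.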
Re: What is the correct URL for POSTing new data?
Shawn,

On 4/13/18 6:02 PM, Shawn Heisey wrote:
> On 4/13/2018 7:49 AM, Christopher Schultz wrote:
>> $ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
>> -Djavax.net.ssl.trustStorePassword=whatevs
>> -Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post
>> -c new_core https://localhost:8983/solr/new_core
>>
>> [time passes while bin/post uploads a very large file]
>>
>> SimplePostTool version 5.0.0
>> Posting files to [base] url https://localhost:8983/solr/new_core...
>> Entering auto mode. File endings considered are
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
>> POSTing file new_core.json (application/json) to [base]/json/docs
>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found)
>> for url: https://localhost:8983/solr/new_core/json/docs
>
> The URL path (beyond the core name) it's ending up with is
> /json/docs, when it should be /update/json/docs.

Looks like that worked. I could find that nowhere in the documentation.

> If you hadn't given the command a specific URL, it probably would
> have figured out the correct URL on its own.

No, it wouldn't have. It doesn't read any configuration files and
guesses its way through everything. Simply adding HTTPS support
required me to modify the script and manually-specify the URL. That's
why I went through the trouble of explaining so in my initial post.

> The base URL for the post tool normally includes the /update path,
> which is different than the base URL for something like
> HttpSolrClient (in the SolrJ library). Changing the handler path
> is done differently in SolrJ than it is with the post tool.
>
> I know, we've violated that principle again. :)

;)

I don't mind all surprises. It's the ones that have zero documentation
that are the most surprising.

> The bin/post tool is a *simple* tool. The java class that it calls
> is even named "SimplePostTool". It is expected that most users
> will outgrow its functionality quickly and write their own indexing
> software that does whatever custom processing they require. The
> tool doesn't get a lot of improvements because we don't intend it
> to be used as a production indexing mechanism.

I'm using it as a bulk-loading operation. I have no need in production
to completely bootstrap a document collection unless the existing one
has been trashed for some reason.

Why bother writing my own client that does the equivalent of "SELECT *
FROM table" and then loops over the ResultSet calling SolrJ's
add-document method? The SimplePostTool should be able to handle that
for me, and if it did, I'd have less code to babysit in perpetuity.

> If it does what you need, there's nothing wrong with production
> usage, but you need to be aware that it doesn't have robust error
> handling, which is usually pretty important for production.

I'm okay with terse error messages.

-chris
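
To make the URL fix in this message concrete, here is a small Java sketch of how the pieces fit together: the base URL handed to the post tool carries the /update handler path, and the tool then appends /json/docs for JSON files. The helper names (PostUrls, join, postToolBase, jsonDocsUrl) are invented for this illustration and are not part of Solr or SimplePostTool.

```java
public class PostUrls {
    /** Appends a path segment, tolerating a trailing slash on the base. */
    public static String join(String base, String segment) {
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "/" + segment;
    }

    /** Base URL to hand to the post tool: the core URL plus the /update handler. */
    public static String postToolBase(String coreUrl) {
        return join(coreUrl, "update");
    }

    /** Where a JSON file ends up once the tool appends /json/docs to its base. */
    public static String jsonDocsUrl(String coreUrl) {
        return join(postToolBase(coreUrl), "json/docs");
    }

    public static void main(String[] args) {
        String core = "https://localhost:8983/solr/new_core";
        System.out.println(postToolBase(core));
        System.out.println(jsonDocsUrl(core));
    }
}
```

With the core URL from this thread, the second line printed is the /update/json/docs path that Shawn identified as the correct target.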
What is the correct URL for POSTing new data?
All,

I've recently been encountering some frustrations with Solr 7.3 after
configuring TLS; since the command-line tools (which are a breeze to
use when you have a "toy" Solr installation) stop working when TLS is
enabled, I'm finding myself having to perform the following tasks in
order to get bin/post to work:

1. patch bin/post:

234,235c234,235
< echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
< "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
---
> echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
> "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"

2. Run the command with lots of manual options:

$ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
-Djavax.net.ssl.trustStorePassword=whatevs
-Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post -c
new_core https://localhost:8983/solr/new_core

[time passes while bin/post uploads a very large file]

SimplePostTool version 5.0.0
Posting files to [base] url https://localhost:8983/solr/new_core...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file new_core.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core/json/docs
SimplePostTool: WARNING: Response: Error 404 Not Found HTTP ERROR 404
Problem accessing /solr/new_core/json/docs. Reason: Not Found
SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
https://localhost:8983/solr/new_core/json/docs
1 files indexed.
COMMITting Solr index changes to https://localhost:8983/solr/new_core...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core?commit=true
SimplePostTool: WARNING: Response: Error 404 Not Found HTTP ERROR 404
Problem accessing /solr/new_core. Reason: Not Found
Time spent: 0:00:04.710

I'm guessing that I just don't know what the URL is supposed to be for
that core.

When browsing the web UI, I can examine the core here:
https://localhost:8983/solr/#/~cores/new_core

Solr reports:

startTime: a day ago
instanceDir: /var/solr/data/new_core
dataDir: /var/solr/data/new_core/data/

Index
lastModified: -
version: 2
numDocs: 0
maxDoc: 0
deletedDocs: 0
current: [check-mark]

So the core is there. I suspect I'm simply not addressing it correctly.

How should I modify the URL I pass on the command-line so that bin/post
can inject a new batch of data?

Thanks,
-chris
Re: Confusing error when creating a new core with TLS, service enabled
Shawn,

On 4/10/18 10:16 AM, Shawn Heisey wrote:
> On 4/10/2018 7:32 AM, Christopher Schultz wrote:
>>> What happened is that the new core directory was created as root,
>>> owned by root.
>> Was it? If my server is running as solr, how can it create directories
>> as root?
>
> Unless you run Solr in cloud mode (which means using zookeeper), the
> server cannot create the core directories itself. When running in
> standalone mode, the core directory is created by the bin/solr program
> doing the "create" -- which was running as root.

That is ... surprising.[1]

> I know that because
> you needed the "-force" option. So the core directory and its "conf"
> subdirectory (with the config) are created by the script, then Solr is
> asked (using the CoreAdmin API via http) to add that core. It can't,
> because the new directory was created by root, and Solr can't write the
> core.properties file that defines the core for Solr.

Okay, then that makes sense. I'll try running bin/solr as "solr" via
sudo instead of merely as root. I was under the mistaken impression
that the server kept its own files in order. It also means that one
cannot remote-admin a Solr server. :(

> When running Solr in cloud mode, the configs are in zookeeper, so the
> create command on the script doesn't have to make the core directory in
> order for Solr to find the configuration. It can simply upload the
> config to zookeeper and then tell Solr to create the collection, and
> Solr will do so, locating the configuration in ZooKeeper.

Good to know, though I'm not at the stage where I'm using ZK.

> You might be wondering why Solr can't create the core directories itself
> using the CoreAdmin API except in cloud mode. This is because the
> CoreAdmin API is *OLD* and its functionality has not really changed
> since it was created. Historically, it was only designed to add a core
> that had already been created.

*snapping sounds from inside brain*

> We probably need to "fix" this ... but
> it has never been a priority. There are bigger problems and features to
> work on. Cloud mode is much newer, and although the Collections API
> does utilize the CoreAdmin API behind the scenes, the user typically
> doesn't use CoreAdmin directly in cloud mode.
>
>> The client may be running as root, but the server is running as 'solr'.
>> And the error occurs on the server, not the client. So, what's really
>> going on, here?
>
> I hope I've explained that clearly above.

You have. Running bin/solr as user 'solr' was able to create the core.

The way the installer and server work together is very unfortunate.
bin/solr knows the euid of the server and, if running under root/sudo
could easily mkdir/chown without crapping itself. Having installed a
"service" using the Solr installer practically requires you to run
bin/solr using sudo, and then it doesn't work.

Is there a JIRA ticket already in existence where I can leave a comment?

Thanks,
-chris

[1] https://en.wikipedia.org/wiki/Principle_of_least_astonishment
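
The failure mode discussed in this message (a core directory created by one user that the server, running as another user, cannot write into) can be checked up front. The following Java sketch compares a directory's owner with the account the server is expected to run as. The class and method names (CoreDirCheck, ownerName, ownedBy) and the default user name "solr" are assumptions for illustration; bin/solr does not perform this check as far as this thread shows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoreDirCheck {
    /** Returns the name of the account that owns the given path. */
    public static String ownerName(Path dir) {
        try {
            return Files.getOwner(dir).getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when the directory is owned by the expected account. */
    public static boolean ownedBy(Path dir, String expectedUser) {
        return ownerName(dir).equals(expectedUser);
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the core directory and the server's user as arguments.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        String expected = args.length > 1 ? args[1] : "solr";
        if (!ownedBy(dir, expected)) {
            System.out.println(dir + " is owned by " + ownerName(dir) + ", not " + expected
                    + "; the server may be unable to write core.properties there");
        }
    }
}
```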
Re: Confusing error when creating a new core with TLS, service enabled
Shawn,

On 4/9/18 8:04 PM, Shawn Heisey wrote:
> On 4/9/2018 12:58 PM, Christopher Schultz wrote:
>> After playing-around with a Solr 7.2.1 instance launched from the
>> extracted tarball, I decided to go ahead and create a "real service" on
>> my Debian-based server.
>>
>> I've run the 7.3.0 install script, configured Solr for TLS, and moved my
>> existing configuration into the data directory, here:
>
> What was the *precise* command you used to install Solr?

$ sudo bin/install_solr_service.sh ../solr-7.3.0.tgz -i /usr/local/

> Looking for
> all the options you used, so I know where things are. There shouldn't
> be anything sensitive in that command, so I don't think you need to
> redact it at all. Also, what exactly did you add to
> /etc/default/solr.in.sh? Redact any passwords you put there if you need to.

# Set by installer
SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j.properties"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

# Set by me
SOLR_JAVA_HOME=/usr/local/java-8
SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
SOLR_SSL_KEY_STORE_PASSWORD=xxx
SOLR_SSL_KEY_STORE_TYPE=PKCS12
SOLR_SSL_TRUST_STORE=/etc/solr/solr-client.p12
SOLR_SSL_TRUST_STORE_PASSWORD=xxx
SOLR_SSL_TRUST_STORE_TYPE=PKCS12

>> When trying to create a new core, I get an NPE running:
>>
>> $ /usr/local/solr/bin/solr create -V -c new_core
>>
>> WARNING: Using _default configset with data driven schema functionality.
>> NOT RECOMMENDED for production use.
>> To turn off: bin/solr config -c new_core -p 8983 -property
>> update.autoCreateFields -value false
>> Exception in thread "main" java.lang.NullPointerException
>> at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
>> at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
>> at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
>> at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
>> at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)
>
> Due to the way the code is written there in version 7.3, the exact
> nature of the problem is lost and it's not possible to see it without a
> change to the source code. If you want to build a patched version of
> 7.3, you could re-run it to see exactly what happened. Here's an issue
> for the NPE problem:
>
> https://issues.apache.org/jira/browse/SOLR-12206

Thanks.

> Best guess about the error that it got: When you ran the create
> command, I think that Java was not able to validate the SSL certificate
> from the Solr server. This would be consistent with what I saw in the
> source code.

This particular scenario was that the solr client was trying to use
HTTP on port 8983 (because solr.in.sh could not be read with the TLS
hints) and getting a (broken) TLS handshake response. So it wasn't even
an HTTP response, which is probably why the client was (very) confused.

> For the problem you had later with "-force" ... this is *exactly* why
> you shouldn't run bin/solr as root.

Not running as root. I'm on the Tomcat security team. I'm obviously not
wanting to run the server as root.

$ ps aux | grep -e 'PID\|solr'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
solr 18309 0.0 3.3 2148524 257164 ? Sl Apr09 0:22 [cmd]

File permissions make sense, too:

$ sudo ls -ld /var/solr/data
drwxr-x--- 3 solr solr 4096 Apr 9 15:06 /var/solr/data
$ sudo ls -l /var/solr/data
total 12
drwxr-xr-x 4 solr solr 4096 Mar 5 15:12 test_core
-rw-r----- 1 solr solr 2117 Apr 9 09:49 solr.xml
-rw-r----- 1 solr solr 975 Apr 9 09:49 zoo.cfg

> What happened is that the new core directory was created as root,
> owned by root.

Was it? If my server is running as solr, how can it create directories
as root?

> But then when Solr tried to add the core, it needed to write a
> core.properties file to that directory, but was not able to do so,
> probably because it's running as "solr" and has no write permission
> in a directory owned by root.

That makes absolutely no sense whatsoever. The server is running under
a single egid, and it's 'solr', not 'root'. Also, there is no new
directory in /var/solr/data (owned by either solr OR root) and if Solr
was able to create that directory, it should be able to write to it.

The client may be running as root, but the server is running as 'solr'.
And the error occurs on the server, not the client. So, what's really
going on, here?

> The error in the message from the command with "-force" seems to have
> schizophrenia.

I absolutely edited the log and failed to do so completely.

-chris
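
The mismatch described in this message (a plain-HTTP client receiving TLS bytes instead of an HTTP response) can be recognized from the first few bytes of the reply. The Java sketch below relies on two facts: an HTTP response starts with the ASCII text "HTTP/", and a TLS record starts with a content-type byte in the range 0x14 to 0x17 followed by the major version byte 0x03. The class and method names (ResponseSniffer, classify) are made up for illustration; SolrCLI does not do this, as far as this thread shows.

```java
import java.nio.charset.StandardCharsets;

public class ResponseSniffer {
    public enum Kind { HTTP, TLS_RECORD, UNKNOWN }

    /** Classifies a reply by inspecting the first bytes read from the socket. */
    public static Kind classify(byte[] first) {
        if (first.length >= 5
                && new String(first, 0, 5, StandardCharsets.US_ASCII).equals("HTTP/")) {
            return Kind.HTTP;
        }
        if (first.length >= 2) {
            int contentType = first[0] & 0xFF;
            // 0x14..0x17 are the TLS record content types; 0x03 is the major version.
            if (contentType >= 0x14 && contentType <= 0x17 && first[1] == 0x03) {
                return Kind.TLS_RECORD;
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] tlsAlert = {0x15, 0x03, 0x03, 0x00, 0x02};
        if (classify(tlsAlert) == Kind.TLS_RECORD) {
            System.out.println("Server answered with TLS; try https:// instead of http://");
        }
    }
}
```

A client that ran such a check could report "wrong protocol" instead of failing with a NullPointerException while parsing a non-existent JSON body.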
Re: Confusing error when creating a new core with TLS, service enabled
All,

On 4/9/18 2:58 PM, Christopher Schultz wrote:
> All,
>
> After playing-around with a Solr 7.2.1 instance launched from the
> extracted tarball, I decided to go ahead and create a "real service" on
> my Debian-based server.
>
> I've run the 7.3.0 install script, configured Solr for TLS, and moved my
> existing configuration into the data directory, here:
>
> $ sudo ls -l /var/solr/data
> total 12
> drwxr-xr-x 4 solr solr 4096 Mar 5 15:12 test_core
> -rw-r----- 1 solr solr 2117 Apr 9 09:49 solr.xml
> -rw-r----- 1 solr solr 975 Apr 9 09:49 zoo.cfg
>
> I have a single node, no ZK.
>
> When trying to create a new core, I get an NPE running:
>
> $ /usr/local/solr/bin/solr create -V -c new_core
>
> WARNING: Using _default configset with data driven schema functionality.
> NOT RECOMMENDED for production use.
> To turn off: bin/solr config -c new_core -p 8983 -property
> update.autoCreateFields -value false
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
> at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
> at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
> at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
> at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)
>
> There is nothing being printed in the log files.
>
> I thought it might be because I enabled TLS.
>
> My /etc/default/solr.in.sh (which was created during installation)
> contains the minor configuration required for TLS, among other obvious
> things such as where my data resides.
>
> I checked the /usr/local/solr/bin/solr script, and I can see that
> /etc/default/solr.in.sh is indeed checked and run if readable.
>
> Readable.
>
> The Solr installer (reasonably) makes all scripts, etc. readable only by
> the Solr user, and I'm never logged-in as Solr, so I can't read this
> file normally. I therefore ended up having to run the command like this:
>
> $ sudo /usr/local/solr/bin/solr create -V -c new_core

Actually, then I got this error:

WARNING: Creating cores as the root user can cause Solr to fail and is
not advisable. Exiting.
If you started Solr as root (not advisable either), force core creation
by adding argument -force

When adding "-force" to the command-line, I get an error about not
being able to persist core properties to a directory on the disk, with
not much detail:

2018-04-09 19:03:14.796 ERROR (qtp2114889273-17) [ ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'cschultz_patients': Couldn't persist core properties to /var/solr/data/new_core/core.properties : /var/solr/data/new_core/core.properties
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:989)
 at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:90)
 at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:358)
 at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:389)
 at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
 at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
 at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
Confusing error when creating a new core with TLS, service enabled
All,

After playing-around with a Solr 7.2.1 instance launched from the
extracted tarball, I decided to go ahead and create a "real service" on
my Debian-based server.

I've run the 7.3.0 install script, configured Solr for TLS, and moved my
existing configuration into the data directory, here:

$ sudo ls -l /var/solr/data
total 12
drwxr-xr-x 4 solr solr 4096 Mar 5 15:12 test_core
-rw-r----- 1 solr solr 2117 Apr 9 09:49 solr.xml
-rw-r----- 1 solr solr 975 Apr 9 09:49 zoo.cfg

I have a single node, no ZK.

When trying to create a new core, I get an NPE running:

$ /usr/local/solr/bin/solr create -V -c new_core

WARNING: Using _default configset with data driven schema functionality.
NOT RECOMMENDED for production use.
To turn off: bin/solr config -c new_core -p 8983 -property
update.autoCreateFields -value false
Exception in thread "main" java.lang.NullPointerException
 at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:731)
 at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:642)
 at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:1773)
 at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:176)
 at org.apache.solr.util.SolrCLI.main(SolrCLI.java:282)

There is nothing being printed in the log files.

I thought it might be because I enabled TLS.

My /etc/default/solr.in.sh (which was created during installation)
contains the minor configuration required for TLS, among other obvious
things such as where my data resides.

I checked the /usr/local/solr/bin/solr script, and I can see that
/etc/default/solr.in.sh is indeed checked and run if readable.

Readable.

The Solr installer (reasonably) makes all scripts, etc. readable only by
the Solr user, and I'm never logged-in as Solr, so I can't read this
file normally. I therefore ended up having to run the command like this:

$ sudo /usr/local/solr/bin/solr create -V -c new_core

This was unexpected, because "everything goes through the web service."
Well, everything except for figuring out how to connect to the web
service, of course.

I think the bin/solr script should dump a message saying "Can't read
file $configfile ; might not be able to connect to Solr" or something.
It would have saved me a ton of time.

Thanks,
-chris
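
The warning proposed at the end of this message can be sketched in a few lines. bin/solr is a shell script, so this Java version only illustrates the logic: if the include file is not readable by the current user, say so instead of silently falling back to defaults. The class and method names (ConfigCheck, warningFor) are hypothetical, and the message text is the one suggested above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ConfigCheck {
    /** Returns a warning when the include file cannot be read, otherwise empty. */
    public static Optional<String> warningFor(Path configFile) {
        if (Files.isReadable(configFile)) {
            return Optional.empty();
        }
        return Optional.of("Can't read file " + configFile
                + " ; might not be able to connect to Solr");
    }

    public static void main(String[] args) {
        // Path taken from this thread; pass a different one as the first argument.
        Path include = Paths.get(args.length > 0 ? args[0] : "/etc/default/solr.in.sh");
        warningFor(include).ifPresent(System.err::println);
    }
}
```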
Re: Apache commons fileupload migration
Shawn,

On 3/20/18 9:13 AM, Shawn Heisey wrote:
> On 3/15/2018 6:40 AM, padmanabhan1616 wrote:
>> Hi Team, We are using Apache SOLR-5.2.1 as index engine for our data
>> analytics application. As part of this SOLR uses
>> commons-fileupload-1.2.1.jar for file manipulation. There is security
>> Vulnerability identified in commons-fileupload library:
>> *CVE-2016-131 Apache Commons FileUpload: DiskFileItem file
>> manipulation* As per official notice from apache software
>> foundations this issue has been addressed in
>> commons-fileupload-1.3.3.jar and available for all the dependency
>> vendors. *Is this good to upgrade commons-fileupload from 1.2.1 to
>> 1.3.3 version directly?*
>
> Solr previously addressed two other vulnerabilities in
> commons-fileupload, both of them after the version you're running.
>
> https://issues.apache.org/jira/browse/SOLR-9819
> https://issues.apache.org/jira/browse/SOLR-9053
>
> One of these fixes just did a jar upgrade, but the other also included
> code changes. So it looks like just replacing the jar with 1.3.3 MIGHT
> cause problems. The commons-fileupload dependency is only used in one
> place in Solr -- the multipart request parser. I cannot tell what
> actually uses this functionality, though. I suspect that whatever it is
> is not something really common.
>
> Looking at the way that Solr uses DiskFileItem and related classes, I
> don't see any evidence that it actually uses serialization or
> deserialization, so I don't think Solr is vulnerable to the problem
> fixed in 1.3.3, but there are two other vulnerabilities that the version
> you're running has. I haven't assessed whether Solr is vulnerable to
> either of those problems.

I think you are misunderstanding the attack vector. It doesn't matter
how Solr uses DiskFileItem. It matters how a running JVM will behave if
it is tricked into deserializing such an object.

Let me give an example:

1. Solr is running on a system in read-only mode (by whatever
definition) and therefore is not firewalled or anything like that.

2. No "normal" users have access to Solr due to process/file
permissions.

3. JMX is enabled on the Solr instance, but only for the 127.0.0.1
interface.

This environment might seem to be "secure" because of the isolation
provided by the OS for files and processes, and the restriction of JMX
to the localhost interface. But JMX uses RMI which uses serialization
to marshal objects over the wire to the JMX server.

So an attacker can construct a malicious serialized object (say, an
Object[] which contains a DiskFileItem somewhere in there) and merely
the presence of the vulnerable commons-fileupload library can be used
to trick the JVM into executing arbitrary code. The fact that Solr
itself doesn't use DiskFileItem is irrelevant.

> FYI: If only trusted admins and applications can reach the Solr server,
> then any remote vulnerability Solr has cannot be exploited unless
> somebody first breaches the security on something else that DOES have
> access to Solr. If they manage to do that, they probably have access
> that's far more damaging than access to Solr would be.

This is absolutely true. But it's easy to overlook things that are
outside of the bounds of the application (like JMX/RMI) or in
little-used corners where deserialization is occurring for whatever
reason. It's those edge-cases that can get you into trouble.

The proper mitigation is to upgrade to the latest version of the
library. I suggested that the OP read the changelog because it
describes all the changes and whether or not they are
backward-compatible.

-chris
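
The point of this message, that a class on the classpath can run code during deserialization even when the application never calls it, can be shown with a harmless Java sketch. Gadget is a made-up stand-in, not DiskFileItem, and its readObject only increments a counter; all names here (DeserializationDemo, Gadget, sideEffects) are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class DeserializationDemo {
    /** Counts how often Gadget code ran as a side effect of deserialization. */
    public static int sideEffects = 0;

    /** Stand-in for any library class with logic in readObject. */
    public static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffects++; // harmless placeholder for whatever the class does here
        }
    }

    public static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The receiver only asks for "an Object"; it never touches Gadget itself.
        byte[] wire = serialize(new Object[] {"harmless", new Gadget()});
        deserialize(wire);
        System.out.println("Gadget code ran " + sideEffects + " time(s) during deserialization");
    }
}
```

The receiving code never names Gadget, yet its readObject runs. That is why the way Solr itself uses a class matters less than whether the class is present on the classpath of a JVM that deserializes untrusted bytes.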
Re: Question liste solr
Mariano,

On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> Hello
>
> We have an index Solr with 3 nodes, 1 shard and 2 replicas.
>
> Our goal is to index 42 millions rows. Indexing time is important.
> The data source is an oracle database.
>
> Our indexing strategy is :
>
> * Reading from Oracle to a big CSV file.
>
> * Reading from 4 files (big file chunked) and injection via
> ConcurrentUpdateSolrClient
>
> Is it the optimal way of injecting such mass of data into Solr ?
>
> For information, estimated time for our solution is 6h.

How big are the CSV files? If most of the time is taken performing the
various SELECT operations, then it's probably a good strategy.

However, you may find that using the disk as a buffer slows everything
down because disk-writes can be very slow.

Why not perform your SELECT(s) and write directly to Solr using one of
the APIs (either a language-specific API, or through the HTTP API)?

Hope that helps,
-chris
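
The "SELECT and write directly to Solr" suggestion in this message can be sketched as a batching loop that streams rows to an indexing sink without an intermediate CSV file. In the Java sketch below the row source is any Iterator and the sink is a plain Consumer. In a real loader the iterator would wrap a JDBC ResultSet and the sink would call a Solr client; both are left out as assumptions, and the names (BatchingIndexer, indexInBatches) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchingIndexer {
    /**
     * Streams rows to the sink in fixed-size batches and returns the number
     * of batches sent. Only one batch is held in memory at a time.
     */
    public static <T> int indexInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize);
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in data; a real run would iterate over database rows.
        Iterator<String> rows = Arrays.asList("row1", "row2", "row3", "row4", "row5").iterator();
        int sent = indexInBatches(rows, 2, batch -> System.out.println("sending " + batch));
        System.out.println(sent + " batches sent");
    }
}
```

Whether this beats the CSV approach depends on where the time actually goes, which is the question asked above.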
Recommendations for non-narrative data
All,

I'm using Solr to index and search a database of user data (username,
email, first and last name), so there aren't really "terms" in the data
to search for, like you might search for words that describe products
in a catalog, for example.

I have set up my schema to include plain-old text fields for each of
the data mentioned above, plus I have a copy-field called "all" which
includes everything all together, plus I have a first + last field
which uses a phonetic index and query analyzer.

Since I don't need things such as term-replacement (spanner == wrench),
stemming (first name 'chris' -> 'chri'), and possibly other features
that I don't know about, I'm wondering what might be a recommended set
of tokenizer(s), analyzer(s), etc. for such data.

We will definitely want to be able to search by substring (to find
'cschultz' as a username with 'schultz' as input) but some substrings
are probably useless (such as @gmail.com for email addresses) and don't
need to be supported.

What are some good options to look at for this type of data?

In production, we have fewer than 5M records to handle, so this is more
of an academic exercise than an actual performance requirement (since
Solr is at least an order of magnitude faster than our current
RDBMS-searching implementation).

If it makes any difference, we are trying to keep the index up-to-date
with all user changes made in real time (okay, maybe delayed by a few
seconds, but basically realtime). We have a few hundred new-user
registrations per day and probably half as many changes to user records
as that, so perhaps 2 document-updates per minute on average (during
~12 business hours in the US on weekdays).

Thanks for any advice anyone may have,
-chris
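
One common way to support the substring requirement described above is n-gram indexing: each value is indexed as all of its substrings within a length range, so a query term matches if it equals one of them. Solr ships n-gram analysis components for this (for example NGramFilterFactory with minGramSize and maxGramSize); whether they are the right choice for this schema is not settled in this thread. The Java sketch below only illustrates the idea, with made-up names (NGrams, ngrams) and an arbitrarily chosen length range.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    /** Returns every substring of token whose length is between min and max. */
    public static List<String> ngrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("cschultz", 3, 7);
        // A query for "schultz" matches because that gram was indexed.
        System.out.println(grams.contains("schultz"));
        System.out.println(grams.size() + " grams indexed for one 8-character username");
    }
}
```

The trade-off is index size: every value expands into many grams, which matters little at a few million short records but is worth measuring.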
Re: Apache commons fileupload migration
To whom it may concern,

On 3/15/18 8:40 AM, padmanabhan1616 wrote:
> Hi Team, We are using Apache SOLR-5.2.1 as index engine for our data
> analytics application. As part of this SOLR uses
> commons-fileupload-1.2.1.jar for file manipulation. There is security
> Vulnerability identified in commons-fileupload library:
> *CVE-2016-131 Apache Commons FileUpload: DiskFileItem file
> manipulation* As per official notice from apache software
> foundations this issue has been addressed in
> commons-fileupload-1.3.3.jar and available for all the dependency
> vendors. *Is this good to upgrade commons-fileupload from 1.2.1 to
> 1.3.3 version directly?* Please suggest us best way to handle this.
> Note - *Currently we don't have any requirements to upgrade solr, So
> please suggest best way to handle this vulnerability without upgrade
> entire SOLR.* Thanks, Padmanabhan

Have you read the changelog?[1]

-chris

[1] https://commons.apache.org/proper/commons-fileupload/changes-report.html
Re: Including a filtered-field in the default-field
Erick,

(Sorry... hit send inadvertently before completion...)

On 3/12/18 2:50 PM, Erick Erickson wrote:
> Something like:
>
> solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 phonetic

Interesting. Looks like the "qf=all phonetic" would take the place of my existing "df=all" parameter.

> The point of edismax is to take whatever the input is and
> distribute it among one or more fields defined by the "qf"
> parameter.

That's an entirely lucid explanation. That's not evident from reading the official documentation :)

> In this case, it'll look for "chris" and "shultz" in both the
> "all" and "phonetic" fields. It would boost matches in the "all"
> field by 10, giving you an easy knob to tweak for "this field is
> more important than this other one".

Cool, like "if I spell it exactly right, I want that result to float to the top"?

> You can combine "fielded" searches, something like:
>
> solr/collection/query?q=firstName:chris shultz&defType=edismax&qf=all phonetic
>
> would search for "shultz" in the "all" and "phonetic" fields while
> searching for "chris" only in the "firstName" field.

Perfect.

> As you have noticed, there are a _lot_ of knobs to tweak when it
> comes to edismax, and the result of adding &debug=query to the URL
> can be...bewildering. But edismax was created exactly to spread the
> input out across multiple fields automatically.
>
> You can also put these as defaults in your requesthandler in
> solrconfig.xml. The "browse" handler in some of the examples will
> give you a template, I'd copy/paste from the "browse" handler to
> your main handler (usually "select"), as the "browse" handler is
> tied into the Velocity templating engine

Since I'm taking the query from the user in my backend Java to convert it into a Solr call, I'm comfortable doing everything in the Java code itself. I'd actually rather not have too much automated stuff, because then I think I'll confuse myself when using the Solr dashboard for debugging, etc.

> To start, since there are a lot of parameters to tweak, I'd just
> start with the "qf" field (plus some boosts perhaps). Then move on
> to pf, pf2, pf3. mm will take a while to get your head around all
> by itself. I think once you see the basic operation, then the rest
> of the parameters will be easier to understand.
>
> And I urge you to take it a little at a time, just use two fields
> and two terms and look at the result of &debug=query, the parsed
> query bits, 'cause each new thing you add adds a further
> complication. Fortunately you can just put different parameters on
> the URL and see the results for rapidly iterating.

Exactly :)

Thanks for the hints,
-chris
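Since the plan above is to build the Solr call in backend Java, here is a minimal sketch of assembling the edismax parameters Erick describes. It uses only the JDK; the class name, the method name, and the "/solr/users/query" path printed in main are assumptions for illustration, while the field names "all" and "phonetic" and the ^10 boost come from the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: assembles the query string for an edismax search.
final class EdismaxQuerySketch {

    /** Builds an URL-encoded query string that spreads the input over "all" (boosted) and "phonetic". */
    static String buildQueryString(String userInput) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userInput);             // raw user input, parsed by edismax
        params.put("defType", "edismax");       // select the edismax query parser
        params.put("qf", "all^10 phonetic");    // fields to search; "all" matches count 10x
        params.put("debug", "query");           // show the parsed query while experimenting
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // The core name "users" is an assumption; substitute your own collection.
        System.out.println("/solr/users/query?" + buildQueryString("chris shultz"));
    }
}
```

Dropping the "debug" entry once the parsed query looks right keeps production responses small; the same parameters could equally be set as handler defaults in solrconfig.xml, as suggested in the quoted text.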
Re: Including a filtered-field in the default-field
Erick,

On 3/12/18 2:50 PM, Erick Erickson wrote:
> Something like:
>
> solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 phonetic

Interesting. Looks like the "qf=all phonetic" would take the place of my existing "df=all" parameter.

> The point of edismax is to take whatever the input is and
> distribute it among one or more fields defined by the "qf"
> parameter. In this case, it'll look for "chris" and "shultz" in
> both the "all" and "phonetic" fields. It would boost matches in the
> "all" field by 10, giving you an easy knob to tweak for "this field
> is more important than this other one".
>
> You can combine "fielded" searches, something like:
>
> solr/collection/query?q=firstName:chris shultz&defType=edismax&qf=all phonetic
>
> would search for "shultz" in the "all" and "phonetic" fields while
> searching for "chris" only in the "firstName" field.
>
> As you have noticed, there are a _lot_ of knobs to tweak when it
> comes to edismax, and the result of adding &debug=query to the URL
> can be...bewildering. But edismax was created exactly to spread the
> input out across multiple fields automatically.
>
> You can also put these as defaults in your requesthandler in
> solrconfig.xml. The "browse" handler in some of the examples will
> give you a template, I'd copy/paste from the "browse" handler to
> your main handler (usually "select"), as the "browse" handler is
> tied into the Velocity templating engine
>
> To start, since there are a lot of parameters to tweak, I'd just
> start with the "qf" field (plus some boosts perhaps). Then move on
> to pf, pf2, pf3. mm will take a while to get your head around all
> by itself. I think once you see the basic operation, then the rest
> of the parameters will be easier to understand.
>
> And I urge you to take it a little at a time, just use two fields
> and two terms and look at the result of &debug=query, the parsed
> query bits, 'cause each new thing you add adds a further
> complication. Fortunately you can just put different parameters on
> the URL and see the results for rapidly iterating.
>
> Best, Erick
>
>
> On Mon, Mar 12, 2018 at 11:30 AM, Christopher Schultz
> <ch...@christopherschultz.net> wrote:
> Erick,
>
> On 3/12/18 1:36 PM, Erick Erickson wrote:
>>>> Did you try edismax?
>
> Err no, and I must admit that it's a lot to take in. Did you
> have a particular suggestion for how to use it?
>
> Thanks, -chris
>
>>>> On Mon, Mar 12, 2018 at 10:20 AM, Christopher Schultz
>>>> <ch...@christopherschultz.net> wrote: All,
>>>>
>>>> I have a Solr index containing application user information
>>>> (username, first/last, etc.). I have created an "all" field
>>>> for the purpose of using it as a default. It contains most
>>>> but not all fields.
>>>>
>>>> I recently added phonetic searching for the first and last
>>>> names (together in a single field) but it will only work if
>>>> the query specifies that field like this:
>>>>
>>>> chris or phonetic:schultz
>>>>
>>>> Is there a way to add the phonetic field to the "all" field
>>>> and have it searched phonetically alongside the non-phonetic
>>>> fields/terms? I see I cannot have multiple "default fields"
>>>> :)
>>>>
>>>> I know on the back-end I can construct a query like this:
>>>>
>>>> all:[query] phonetic:[query]
>>>>
>>>> ...but I'd prefer to do as little massaging of the query as
>>>> possible.
>>>>
>>>> Thanks, -chris
Re: Including a filtered-field in the default-field
Erick,

On 3/12/18 1:36 PM, Erick Erickson wrote:
> Did you try edismax?

Err no, and I must admit that it's a lot to take in. Did you have a particular suggestion for how to use it?

Thanks,
-chris

> On Mon, Mar 12, 2018 at 10:20 AM, Christopher Schultz
> <ch...@christopherschultz.net> wrote:
> All,
>
> I have a Solr index containing application user information
> (username, first/last, etc.). I have created an "all" field for the
> purpose of using it as a default. It contains most but not all
> fields.
>
> I recently added phonetic searching for the first and last names
> (together in a single field) but it will only work if the query
> specifies that field like this:
>
> chris or phonetic:schultz
>
> Is there a way to add the phonetic field to the "all" field and
> have it searched phonetically alongside the non-phonetic
> fields/terms? I see I cannot have multiple "default fields" :)
>
> I know on the back-end I can construct a query like this:
>
> all:[query] phonetic:[query]
>
> ...but I'd prefer to do as little massaging of the query as
> possible.
>
> Thanks, -chris
Including a filtered-field in the default-field
All,

I have a Solr index containing application user information (username, first/last, etc.). I have created an "all" field for the purpose of using it as a default. It contains most but not all fields.

I recently added phonetic searching for the first and last names (together in a single field) but it will only work if the query specifies that field like this:

    chris or phonetic:schultz

Is there a way to add the phonetic field to the "all" field and have it searched phonetically alongside the non-phonetic fields/terms? I see I cannot have multiple "default fields" :)

I know on the back-end I can construct a query like this:

    all:[query] phonetic:[query]

...but I'd prefer to do as little massaging of the query as possible.

Thanks,
-chris
Re: Defining a phonetic analyzer and searcher via the schema API
Erick,

On 3/12/18 1:00 PM, Erick Erickson wrote:
> bq: which you aren't supposed to edit directly.
>
> Well, kind of. Here's why it's "discouraged":
> https://lucene.apache.org/solr/guide/6_6/schema-api.html.
>
> But as long as you don't mix-and-match hand-editing with using the
> schema API you can hand edit it freely. You're then in charge of
> pushing it to ZK and reloading your collections that use it
> yourself however.

No Zookeeper (yet), but I suspect I'll end up there. I'm mostly toying around with it right now, but it won't be long before I'll want to go live with it and having a single Solr instance isn't going to help me sleep well at night. I'm sure I'll end up with two instances to begin with, which requires ZK, right?

> As a side note, even if I _never_ hand-edited it I'd make it a
> practice to regularly pull it from ZK and put it in some VCS system
> ;)

Actually, I have the script that builds the schema in VCS, so it's roughly the same.

As for the schema modifications... did I get those right?

Thanks,
-chris

> On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz
> <ch...@christopherschultz.net> wrote:
> All,
>
> I'd like to add a new synthesized field that uses a phonetic
> analyzer such as Beider-Morse. I'm using Solr 7.2.
>
> When I request the current schema via the schema API, I get a list
> of existing fields, dynamic fields, and analyzers, none of which
> appear to be what I'm looking for.
>
> Conceptually, I think I'd like to do something like this:
>
>     add-field: { name: phoneticname, type: phonetic, multiValued: true }
>
> ... but how do I define what type of data "phonetic" should be?
>
> I can see the example XML definition in this document:
> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter
>
> But I'm not sure how to add an analyzer to the schema using the
> schema API:
> https://lucene.apache.org/solr/guide/7_2/schema-api.html
>
> Under "Add a new field type", it says that new analyzers can be
> defined, but I'm not entirely sure how to do that ... the API docs
> refer to the field type definitions page[1] which just shows what
> XML you'd have to put into your schema XML -- which you aren't
> supposed to edit directly.
>
> When looking at the JSON version of my schema, I can see for
> example this:
>
>     "fieldTypes":[{
>       "name":"ancestor_path",
>       "class":"solr.TextField",
>       "indexAnalyzer":{
>         "tokenizer":{ "class":"solr.KeywordTokenizerFactory"}},
>       "queryAnalyzer":{
>         "tokenizer":{
>           "class":"solr.PathHierarchyTokenizerFactory",
>           "delimiter":"/"}}},
>
> So should I create a new field type like this?
>
>     "add-field-type" : {
>       "name" : "phonetic",
>       "class" : "solr.TextField",
>       "analyzer" : {
>         "tokenizer": { "class" : "solr.StandardTokenizerFactory" },
>         "filters" : [{
>           "class": "solr.BeiderMorseFilterFactory",
>           "nameType": "GENERIC",
>           "ruleType": "APPROX",
>           "concat": "true",
>           "languageSet": "auto"
>         }]
>       }
>     }
>
> Then, use copy-field as "usual":
>
>     "add-field":{ "name":"phonetic", "type":"phonetic", "multiValued":true, "stored":false },
>     "add-copy-field":{ "source":"first_name", "dest":"phonetic" },
>     "add-copy-field":{ "source":"last_name", "dest":"phonetic" },
>
> This seems to work but I wanted to know if I was doing it the right
> way.
>
> Thanks, -chris
>
> [1] https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties
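For anyone scripting the same change from Java rather than curl, the add-field-type command quoted above can be sketched as a plain HTTP POST to the collection's schema endpoint. This is only an illustration under assumptions: the host, port, and core name "users" in SCHEMA_URL are made up, the class and method names are hypothetical, and the request is built but deliberately not sent. Text blocks need Java 15 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) a Schema API request.
final class PhoneticFieldTypeRequest {

    // Assumption: single local Solr node with a core named "users".
    static final String SCHEMA_URL = "http://localhost:8983/solr/users/schema";

    // The add-field-type command from the message, written as strict JSON.
    static final String PAYLOAD = """
            {
              "add-field-type": {
                "name": "phonetic",
                "class": "solr.TextField",
                "analyzer": {
                  "tokenizer": { "class": "solr.StandardTokenizerFactory" },
                  "filters": [{
                    "class": "solr.BeiderMorseFilterFactory",
                    "nameType": "GENERIC",
                    "ruleType": "APPROX",
                    "concat": "true",
                    "languageSet": "auto"
                  }]
                }
              }
            }
            """;

    /** Creates the POST request carrying the JSON command. */
    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(SCHEMA_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(PAYLOAD))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build();
        System.out.println(request.method() + " " + request.uri());
        System.out.print(PAYLOAD);
        // To send it against a running server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The add-field and add-copy-field commands from the message would go to the same endpoint; they are left out here to keep the sketch short.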
Defining a phonetic analyzer and searcher via the schema API
All,

I'd like to add a new synthesized field that uses a phonetic analyzer such as Beider-Morse. I'm using Solr 7.2.

When I request the current schema via the schema API, I get a list of existing fields, dynamic fields, and analyzers, none of which appear to be what I'm looking for.

Conceptually, I think I'd like to do something like this:

    add-field: { name: phoneticname, type: phonetic, multiValued: true }

... but how do I define what type of data "phonetic" should be?

I can see the example XML definition in this document:
https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter

But I'm not sure how to add an analyzer to the schema using the schema API:
https://lucene.apache.org/solr/guide/7_2/schema-api.html

Under "Add a new field type", it says that new analyzers can be defined, but I'm not entirely sure how to do that ... the API docs refer to the field type definitions page[1] which just shows what XML you'd have to put into your schema XML -- which you aren't supposed to edit directly.

When looking at the JSON version of my schema, I can see for example this:

    "fieldTypes":[{
      "name":"ancestor_path",
      "class":"solr.TextField",
      "indexAnalyzer":{
        "tokenizer":{ "class":"solr.KeywordTokenizerFactory"}},
      "queryAnalyzer":{
        "tokenizer":{
          "class":"solr.PathHierarchyTokenizerFactory",
          "delimiter":"/"}}},

So should I create a new field type like this?

    "add-field-type" : {
      "name" : "phonetic",
      "class" : "solr.TextField",
      "analyzer" : {
        "tokenizer": { "class" : "solr.StandardTokenizerFactory" },
        "filters" : [{
          "class": "solr.BeiderMorseFilterFactory",
          "nameType": "GENERIC",
          "ruleType": "APPROX",
          "concat": "true",
          "languageSet": "auto"
        }]
      }
    }

Then, use copy-field as "usual":

    "add-field":{ "name":"phonetic", "type":"phonetic", "multiValued":true, "stored":false },
    "add-copy-field":{ "source":"first_name", "dest":"phonetic" },
    "add-copy-field":{ "source":"last_name", "dest":"phonetic" },

This seems to work but I wanted to know if I was doing it the right way.

Thanks,
-chris

[1] https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties
Re: Solr Read-Only?
Terry,

On 3/6/18 4:55 PM, Terry Steichen wrote:
> Chris,
>
> Thanks for your suggestion. Restarting solr after an in-memory
> corruption is, of course, trivial (compared to rebuilding the
> indexes).
>
> Are there any solr directories that MUST be read/write (even with
> a pre-built index)? Would it suffice (for my purposes) to make
> only the data/index directory R-O?

I installed Solr for the first time 2 weeks ago, so I'm not a great resource here. But I've used Lucene in the past and the on-disk storage is basically the same AFAICT.

When starting with an expand-the-tarball-and-just-go-for-it deployment model, I'd probably make sure that the server/solr directory and everything below it was non-writable by the Solr user.

Obviously, once you have set this up in a test lab, just try to break it and see what happens :)

-chris

> On 03/06/2018 04:20 PM, Christopher Schultz wrote:
>> Terry,
>>
>> On 3/6/18 4:08 PM, Terry Steichen wrote:
>>> Is it possible to run solr in a read-only directory?
>>
>>> I'm running it just fine on a ubuntu server which is
>>> accessible only through SSH tunneling. At the platform level,
>>> this is fine: only authorized users can access it (via a
>>> browser on their machine accessing a forwarded port).
>>
>>> The problem is that it's an all-or-nothing situation so
>>> everyone who's authorized access to the platform has, in
>>> effect, administrator privileges on solr. I understand that
>>> authentication is coming, but that it isn't here yet. (Or, to
>>> add complexity, I had to downgrade from 7.2.1 to 6.4.2 to
>>> overcome a new bug concerning indexing of eml files, and 6.4.2
>>> definitely doesn't have authentication.)
>>
>>> Anyway, what I was wondering is if it might be possible to run
>>> solr not as me (the administrator), but as a user with lesser
>>> privileges so that no one who came through the SSH tunnel could
>>> (inadvertently or otherwise) screw up the indexes.
>>
>> With shell access, the only protection you could provide would
>> be through file-permissions. But of course Solr will need to be
>> read-write in order to build the index in the first place. So
>> you'd probably have to run read-write at first, build the index
>> (perhaps that's already been done in the past), then (possibly)
>> restart in read-only mode.
>>
>> Read-only can be achieved by simply revoking write-access to the
>> data directories from the euid of the Solr process.
>> Theoretically, you could switch from being read-write to
>> read-only merely by changing file-permissions... no Solr restarts
>> required.
>>
>> I'm not sure if it matters to you very much, but a user can still
>> do some damage to the index even if the "server" is read-only
>> (through file-permissions): they can issue a batch of DELETE or
>> ADD requests that will affect the in-memory copies of the index.
>> It might be temporary, but it might require that you restart the
>> Solr instance to get back to a sane state.
>>
>> Hope that helps, -chris
Re: Solr Read-Only?
Terry,

On 3/6/18 4:08 PM, Terry Steichen wrote:
> Is it possible to run solr in a read-only directory?
>
> I'm running it just fine on a ubuntu server which is accessible
> only through SSH tunneling. At the platform level, this is fine:
> only authorized users can access it (via a browser on their machine
> accessing a forwarded port).
>
> The problem is that it's an all-or-nothing situation so everyone
> who's authorized access to the platform has, in effect,
> administrator privileges on solr. I understand that authentication
> is coming, but that it isn't here yet. (Or, to add complexity, I
> had to downgrade from 7.2.1 to 6.4.2 to overcome a new bug
> concerning indexing of eml files, and 6.4.2 definitely doesn't have
> authentication.)
>
> Anyway, what I was wondering is if it might be possible to run solr
> not as me (the administrator), but as a user with lesser privileges
> so that no one who came through the SSH tunnel could (inadvertently
> or otherwise) screw up the indexes.

With shell access, the only protection you could provide would be through file-permissions. But of course Solr will need to be read-write in order to build the index in the first place. So you'd probably have to run read-write at first, build the index (perhaps that's already been done in the past), then (possibly) restart in read-only mode.

Read-only can be achieved by simply revoking write-access to the data directories from the euid of the Solr process. Theoretically, you could switch from being read-write to read-only merely by changing file-permissions... no Solr restarts required.

I'm not sure if it matters to you very much, but a user can still do some damage to the index even if the "server" is read-only (through file-permissions): they can issue a batch of DELETE or ADD requests that will affect the in-memory copies of the index. It might be temporary, but it might require that you restart the Solr instance to get back to a sane state.

Hope that helps,
-chris
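As a rough illustration of the "revoke write access" idea above, the following sketch strips every write bit from a directory tree using only the JDK. It is an assumption-laden example, not a tested Solr procedure: the class and method names are hypothetical, it removes write permission for everyone rather than for one specific user, it only works on POSIX file systems, and it must run as the owner of the files (or root). How Solr itself behaves on top of a read-only index directory is not confirmed anywhere in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: makes a directory tree read-only at the file-permission level.
final class ReadOnlyTree {

    private static final Set<PosixFilePermission> WRITE_BITS = EnumSet.of(
            PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.OTHERS_WRITE);

    /** Removes owner, group, and other write bits from root and everything below it. */
    static void revokeWrite(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Collect first so permissions are not changed while the walk is in progress.
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
            perms.addAll(Files.getPosixFilePermissions(p));
            perms.removeAll(WRITE_BITS);
            Files.setPosixFilePermissions(p, perms);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadOnlyTree.java <directory>");
            return;
        }
        revokeWrite(Path.of(args[0]));
    }
}
```

Trying this on a copy of the data directory in a test lab first is the safer route, in line with the "just try to break it and see what happens" advice elsewhere in the thread.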
Alias field names when searching (not for results)
All,

I'd like for users to be able to search a field by multiple names without performing a "copy-field" when analyzing a document. Is that possible? Whenever I search for "solr alias field" I get results about how to re-name fields in the results.

Here's what I'd like to do. Let's say I have a document:

    { id: 1234, field_1: valueA, field_2: valueB, field_3: valueC }

I'd like users to be able to find this document using any of the following queries:

    field_1:valueA
    f1:valueA
    1:valueA

I just want the query parser to say "oh, 'f1' is an alias for 'field_1'" and substitute that when performing the search.

Is that possible?

-chris
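One way to get the substitution described above without touching the schema is to rewrite the query in the application before it is sent to Solr. The sketch below is a deliberately naive, client-side illustration and not a Solr feature: the class name and the alias table are hypothetical, and the regular expression does not understand quoted phrases or escaping, so something like "meet at 1:30" inside quotes would be rewritten too.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands field aliases in a query string before it reaches Solr.
final class FieldAliasRewriter {

    // Assumed alias table: both "f1" and "1" stand for "field_1".
    private static final Map<String, String> ALIASES = Map.of(
            "f1", "field_1",
            "1", "field_1");

    // A run of word characters followed by ':' that does not continue an earlier word or ':'.
    private static final Pattern FIELD_PREFIX = Pattern.compile("(?<![\\w:])(\\w+):");

    /** Replaces known alias prefixes with the real field name; unknown prefixes are left alone. */
    static String rewrite(String query) {
        Matcher m = FIELD_PREFIX.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String target = ALIASES.getOrDefault(name, name);
            m.appendReplacement(out, Matcher.quoteReplacement(target + ":"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"field_1:valueA", "f1:valueA", "1:valueA"}) {
            System.out.println(q + " -> " + rewrite(q));
        }
    }
}
```

Keeping the alias table in application code means the index never sees the short names, at the cost of having to handle query syntax edge cases yourself.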
Re: Updating documents and commit/rollback
Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>>
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
>
> Correct. Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
>
> Solr doesn't have this capability. The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
>
> If updates are happening concurrently from multiple sources, then
> there's no way to have any kind of meaningful rollback.
>
> I see two solutions:
>
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update. Then rolling back becomes possible,
> because there is only one source for updates. The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly. Also, it can be a single point of
> failure. If the rate of updates is low, then the bottleneck may
> not be a problem.
>
> 2) Have your updating software revert the changes "manually" in
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made any sense. Serializing updates is probably more trouble than it's worth.

In an environment where I'd probably expect to have maybe 50 - 100 "writes" daily to a Solr core, how do you recommend commits be done? The documents are quite small (user metadata like username, first/last and email).

Can I add/commit simultaneously? There seems to be no reason to perform separate add/commit steps in this scenario.

-chris
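Technique #2 above (revert the index change "manually" when the SQL side rolls back) can be sketched as follows. Everything here is a stand-in chosen for illustration: UserIndex, SqlWork, InMemoryIndex, and updateBoth are hypothetical names, the index is a plain in-memory map rather than a Solr client, and documents are simplified to strings. The point is only the control flow: remember the previous state, apply the index write, and undo it if the database work throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a compensating ("manual revert") index update.
final class CompensatingIndexUpdate {

    /** Stand-in for the search index; a real implementation would wrap a Solr client. */
    interface UserIndex {
        Optional<String> get(String id);
        void put(String id, String doc);
        void delete(String id);
    }

    /** Stand-in for the SQL UPDATE + COMMIT; throws if the transaction rolls back. */
    interface SqlWork {
        void run() throws Exception;
    }

    /** Writes newDoc to the index, then runs the SQL work; reverts the index write on failure. */
    static void updateBoth(UserIndex index, String id, String newDoc, SqlWork sql) throws Exception {
        Optional<String> previous = index.get(id); // what the index held before this update
        index.put(id, newDoc);
        try {
            sql.run();
        } catch (Exception e) {
            // Compensate: restore the old document, or delete the one that was just added.
            if (previous.isPresent()) {
                index.put(id, previous.get());
            } else {
                index.delete(id);
            }
            throw e;
        }
    }

    /** Trivial in-memory index so the sketch can run without a server. */
    static final class InMemoryIndex implements UserIndex {
        private final Map<String, String> docs = new HashMap<>();
        public Optional<String> get(String id) { return Optional.ofNullable(docs.get(id)); }
        public void put(String id, String doc) { docs.put(id, doc); }
        public void delete(String id) { docs.remove(id); }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.put("42", "chris (old)");
        try {
            updateBoth(index, "42", "chris (new)", () -> { throw new IllegalStateException("SQL rolled back"); });
        } catch (Exception e) {
            System.out.println("SQL failed, index now holds: " + index.get("42").orElse("<nothing>"));
        }
    }
}
```

A simpler alternative, where the application flow allows it, is to touch the index only after the SQL COMMIT has succeeded, which avoids the need for a revert at the cost of a short window in which the index lags the database.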
Updating documents and commit/rollback
Hey, folks.

I've been a long-time Lucene user (running a hilariously-old 1.9.1 version forever), but I'm only just now getting into using Solr.

My particular use-case is storing information about web-application users so they can be found more quickly than our current RDBMS-based search (SELECT ... FROM user WHERE username LIKE '%foo%' OR email_address LIKE '%foo%' OR last_name LIKE '%foo%'...).

I've set up my Solr (very basic... just untar, bin/solr start), created a core/collection (I'm running single-server for now, no cloudy zookeeper stuff ATM), customized my schema (using the Schema API, since hand-editing is discouraged) and loaded my data.

I can search just fine through the Solr dashboard.

I've also used solr-solrj to perform searches from within my application, replacing the previous JDBC-based search with the Solr-based one. All is well.

Now I'm trying to figure out the best way to update users in the index when their information (e.g. first/last names) changes. I have used solr-solrj quite simply like this:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", user.getId());
    doc.addField("username", user.getUsername());
    doc.addField("first_name", user.getFirstName());
    doc.addField("last_name", user.getLastName());
    ...
    solrClient.add("users", doc);
    solrClient.commit();

I'm having a problem, though, and I'd like to know what the "right" solution is.

The problem is that I'm updating the index after my SQL UPDATE(s) have run, but before my SQL COMMIT occurs. I have had a problem where the SQL fails and rolls-back, but the solrClient is not rolled-back.

I'm a little wary of rolling-back Solr because, as I understand it, the client itself doesn't carry any transactional information. That is, it should be a shared-resource (within the web application) and indeed, other clients could be connecting from other places (like other app servers running the same application). Performing either commit() or rollback() on the Solr client will commit/rollback *all* writes since the last commit, right?

That means that there is no meaningful way that I can say to Solr "oops, I actually need you to NOT add that document I just told you about". Instead, I have to either commit the document I don't want (and, I dunno, delete it later or whatever) or risk rolling-back other writes that other clients have performed.

Do I have that right?

So... what's the best way to do this kind of thing? Can I ask Solr to add-and-commit at the same time? If so, how? Is there a meaningful "rollback this one addition" that I can perform? If so, how?

Thanks for a great product,
-chris
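One way to sidestep the "oops, don't add that document" case is to not touch the index until the SQL COMMIT has succeeded. The sketch below is an assumption-laden illustration, not something the thread prescribes: DeferredIndexWrites and IndexAction are hypothetical names, and each queued action stands in for whatever index call the application would make (for example an add followed by a commit).

```java
import java.util.ArrayList;
import java.util.List;

final class DeferredIndexWrites {

    /** Hypothetical stand-in for one index operation; NOT a SolrJ type. */
    interface IndexAction {
        void apply();
    }

    private final List<IndexAction> pending = new ArrayList<>();

    /** Record an index write while the SQL transaction is still open. */
    void enqueue(IndexAction action) {
        pending.add(action);
    }

    /** Call once the SQL COMMIT has succeeded: run the queued writes in order. */
    void afterCommit() {
        for (IndexAction action : pending) {
            action.apply();
        }
        pending.clear();
    }

    /** Call after a SQL rollback: nothing reached the index, so just forget the queue. */
    void afterRollback() {
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        DeferredIndexWrites tx = new DeferredIndexWrites();
        tx.enqueue(() -> indexed.add("user-1"));
        tx.afterRollback(); // SQL failed, so the index is never touched
        tx.enqueue(() -> indexed.add("user-2"));
        tx.afterCommit();   // SQL succeeded, so the write is applied now
        System.out.println(indexed);
    }
}
```

The trade-off is the opposite failure mode: if the index write fails after the SQL COMMIT, the index is stale until the write is retried, so some form of retry or periodic re-sync would still be needed.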