Re: FW: Vulnerabilities in SOLR 8.6.2
As far as I can tell only your first and 5th emails went through. Either way,
Cassandra responded on 2020-09-29, ~15 hours after your first message:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202009.mbox/%3Cbe447e96-60ed-4a40-88dd-9e0c28be6c71%40Spark%3E

Kevin Risden

On Fri, Nov 13, 2020 at 11:35 AM Narayanan, Lakshmi wrote:
> This is my 5th attempt in the last 60 days
>
> Is there anyone looking at these mails?
>
> Does anyone care??
>
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken, NJ 07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
> *From:* Narayanan, Lakshmi
> *Sent:* Thursday, October 22, 2020 1:06 PM
> *To:* solr-user@lucene.apache.org
> *Subject:* FW: Vulnerabilities in SOLR 8.6.2
>
> This is my 4th attempt to contact.
> Please advise if there is a build that fixes these vulnerabilities.
>
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken, NJ 07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
> *From:* Narayanan, Lakshmi
> *Sent:* Sunday, October 18, 2020 4:01 PM
> *To:* solr-user@lucene.apache.org
> *Subject:* FW: Vulnerabilities in SOLR 8.6.2
>
> SOLR-User Support team,
> Is there anyone who can answer my question, or who can point to someone
> who can help? I have not had any response for the past 3 weeks!
> Please advise.
>
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken, NJ 07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
> *From:* Narayanan, Lakshmi
> *Sent:* Sunday, October 04, 2020 2:11 PM
> *To:* solr-user@lucene.apache.org
> *Cc:* Chattopadhyay, Salil; Mutnuri, Vishnu D; Pathak, Omkar;
> Shenouda, Nasir B
> *Subject:* RE: Vulnerabilities in SOLR 8.6.2
>
> Hello Solr-User Support team,
> Please advise or provide further guidance on the request below.
>
> Thank you!
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken, NJ 07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
> *From:* Narayanan, Lakshmi
> *Sent:* Monday, September 28, 2020 1:52 PM
> *To:* solr-user@lucene.apache.org
> *Cc:* Chattopadhyay, Salil; Mutnuri, Vishnu D; Pathak, Omkar;
> Shenouda, Nasir B
> *Subject:* Vulnerabilities in SOLR 8.6.2
> *Importance:* High
>
> Hello Solr-User Support team,
>
> We have installed the SOLR 8.6.2 package into a docker container in our
> DEV environment. Prior to using it, our security team scanned the docker
> image using Sysdig and found a lot of Critical/High/Medium
> vulnerabilities. The full list is in the attached spreadsheet.
>
> Scan Summary: 30 STOPS, 190 WARNS, 188 Vulnerabilities
>
> Please advise, or point us to how/where to get a package that has been
> patched for the Critical/High/Medium vulnerabilities in the attached
> spreadsheet.
>
> Your help will be gratefully received.
>
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken, NJ 07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
> This e-mail, including any attachments that accompany it, may contain
> information that is confidential or privileged. This e-mail is intended
> solely for the use of the individual(s) to whom it was intended to be
> addressed. If you have received this e-mail and are not an intended
> recipient, any disclosure, distribution, copying or other use or
> retention of this email or information contained within it are
> prohibited. If you have received this email in error, please immediately
> reply to the sender via e-mail and also permanently delete all copies of
> the original message together with any of its attachments from your
> computer or device.
Re: can't connect to SOLR with JDBC url
> > start (without option: bin/solr start)

Solr SQL/JDBC requires SolrCloud (running with ZooKeeper), since streaming
expressions (which back Solr SQL) require it. You should be able to start
Solr this way to get Solr in cloud mode:

bin/solr start -c

If you use the above to start Solr, the embedded ZK is on localhost:9983,
so the JDBC connection string should be:

jdbc:solr://localhost:9983?collection=test

assuming your collection name is test.

Kevin Risden

On Fri, Nov 6, 2020 at 11:31 AM Vincent Bossuet wrote:
> Hi all :)
>
> I'm trying to connect to Solr with JDBC, but I always get
> "java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
> localhost:9983/ within 15000 ms" (or another port, depending on which
> JDBC URL I test).
>
> Here is what I did:
>
> - I installed Solr 7.7.2 (I followed the install doc here
>   <https://lucene.apache.org/solr/guide/7_2/installing-solr.html>), i.e.
>   download, extract, start (without options: bin/solr start). This
>   version of Solr is the one I have at work, so I installed the same to
>   test on localhost first.
> - I added a 'test' collection and the example XML documents; I can see
>   them at this URL <http://localhost:8983/solr/test/select?q=*%3A*>.
> - Then I installed DbVisualizer and added the driver and a connection,
>   as explained here
>   <https://lucene.apache.org/solr/guide/7_2/solr-jdbc-dbvisualizer.html>.
>   The only differences I saw from the documentation: on the screenshot
>   showing the jars to import, the versions are different and there is
>   one more jar in the Solr archive (commons-math3-3.6.1.jar). Also, the
>   JDBC URL is shown both with and without a '/' in the middle, i.e.
>   jdbc:solr://localhost:9983?collection=test or
>   jdbc:solr://localhost:9983/?collection=test. I don't know if that is
>   important...
> - And I tried both on an Ubuntu VM and on Windows 10.
>
> So, everything seems to be installed correctly, as in the documentation,
> but when I click on 'connect' I always get a timeout. Every website
> where I found some info talks about a URL with port 9983; I tried other
> possibilities (just in case) but no success...
>
> - jdbc:solr://localhost:9983?collection=test
> - jdbc:solr://127.0.0.1:9983?collection=test
> - jdbc:solr://localhost:9983/?collection=test
> - jdbc:solr://localhost:9983/solr?collection=test
> - jdbc:solr://localhost:8983/?collection=test
> - jdbc:solr://localhost:8983?collection=test
> - jdbc:solr://localhost:8983/solr?collection=test
> - jdbc:solr://localhost:2181?collection=test
> - jdbc:solr://localhost:2181/?collection=test
> - jdbc:solr://localhost:2181/solr?collection=test
>
> If you have an idea, thanks for your help!
>
> Vincent
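To put the advice above in runnable form: the commands below assume a local
install, the default ports, and a collection named "test" (from the original
message). The direct call to the /sql handler is a way to confirm that the
SQL/streaming machinery works before involving DbVisualizer or any JDBC
client at all.

```shell
# 1. Start Solr in cloud mode; the embedded ZooKeeper listens on 9983
#    (Solr's own port + 1000).
bin/solr start -c

# 2. Create the collection the JDBC URL will reference ("test" here).
bin/solr create -c test

# 3. Verify SQL works directly over HTTP. Note this goes to the Solr
#    port (8983), not the ZooKeeper port used by the JDBC URL.
curl --data-urlencode 'stmt=SELECT id FROM test LIMIT 5' \
  'http://localhost:8983/solr/test/sql'

# 4. Only then point JDBC at ZooKeeper:
#    jdbc:solr://localhost:9983?collection=test
```

If step 3 fails while a plain /select query succeeds, the problem is on the
SolrCloud/streaming side rather than in the JDBC driver setup.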
Re: Solr 8.6.2 - Admin UI Issue
Since the image didn't come through - it could be https://issues.apache.org/jira/browse/SOLR-14549 Definitely make sure to clear cache to ensure that JS files aren't cached, but if that doesn't fix it see if SOLR-14549 is related. Kevin Risden On Thu, Oct 8, 2020 at 9:38 AM Eric Pugh wrote: > I’ve seen this behavior as well jumping between versions of Solr. > Typically in the browser console I see some sort of very opaque Javascript > error. > > > On Oct 8, 2020, at 5:54 AM, Colvin Cowie > wrote: > > > > Images won't be included on the mailing list. You need to put them > > somewhere else and link to them. > > > > With that said, if you're switching between versions, maybe your browser > > has the old UI cached? Try clearing the cache / viewing it in a private > > window and see if it's any different. > > > > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput <mailto:vinayrajput4...@gmail.com>> wrote: > > > >> Hi All, > >> > >> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade. > >> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all > >> necessary configs, I noticed one issue in admin UI. > >> > >> If I select a collection and go to files, it shows the content tree > having > >> all files and folders present in that collection. In Solr 8.6.2, it is > >> somehow not showing the folders correctly. In my screenshot, you can see > >> that velocity and xslt are the folders and we have some config files > inside > >> these two folders. Because of this issue, I can't click on folder nodes > and > >> see children nodes. I checked the network calls and it looks like we are > >> getting the correct data from Solr. So, it looks like an Admin UI issue > to > >> me. > >> > >> Does anyone know if this is a* known issue* or I am missing something > >> here? Has anyone noticed the similar issue? I can confirm that It works > >> fine with Solr 7.3.1. 
> >> > >> [image: image.png][image: image.png] > >> > >> Left image is for 8.6.2 and right image is for 7.3.1 > >> > >> Thanks, > >> Vinay > > ___ > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | > http://www.opensourceconnections.com < > http://www.opensourceconnections.com/> | My Free/Busy < > http://tinyurl.com/eric-cal> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed < > https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw> > > This e-mail and all contents, including attachments, is considered to be > Company Confidential unless explicitly stated otherwise, regardless of > whether attachments are marked as such. > >
Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8
You need to remove the references from bin/solr or bin/solr.cmd to
SOLR_SSL_CLIENT_KEY_STORE and "-Djavax.net.ssl.keyStore". This is different
from solr.in.sh. The way the bin/solr script is written, it falls back to
whatever is provided as SOLR_SSL_KEY_STORE for the client keystore, which
is causing the issues.

Kevin Risden

On Wed, Jul 15, 2020 at 3:45 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:
> Thank you for your reply. I looked at solr.in.sh and I see that
> SOLR_SSL_CLIENT_KEY_STORE is already commented out by default. But you
> are right: looking at the running Solr, I see the option
> -Djavax.net.ssl.keyStore pointing to solr-ssl.keystore.p12, and I am not
> sure how it is getting that value. Let me dig more. Thanks for the
> pointer. Also, if you have a pointer to how it gets populated other than
> via the SOLR_SSL_CLIENT_KEY_STORE config in solr.in.sh, please let me
> know.
>
> #SOLR_SSL_CLIENT_KEY_STORE=
> #SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=
> #SOLR_SSL_CLIENT_KEY_STORE_TYPE=
> #SOLR_SSL_CLIENT_TRUST_STORE=
> #SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=
> #SOLR_SSL_CLIENT_TRUST_STORE_TYPE=
>
> Yes, we are not using Solr client auth.
>
> Thanks,
> Rajeswari
>
> On 7/14/20, 5:55 PM, "Kevin Risden" wrote:
>
>     Hmmm so I looked closer - it looks like a side effect of the default
>     passthrough of the keystore being passed to the client keystore.
>
>     https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L229
>
>     Can you remove or comment out the entire SOLR_SSL_CLIENT_KEY_STORE
>     section from bin/solr or bin/solr.cmd depending on which version you
>     are using? The key being to make sure to not set
>     "-Djavax.net.ssl.keyStore".
>
>     This assumes that you aren't using Solr client auth (which based on
>     your config you aren't) and you aren't trying to use Solr to connect
>     to anything that is secured via clientAuth (most likely you aren't).
>
>     If you can try this and report back that would be awesome.
I think this > will fix the issue and it would be possible to make client auth opt in > instead of default fall back. > Kevin Risden > > > > On Tue, Jul 14, 2020 at 1:46 AM Natarajan, Rajeswari < > rajeswari.natara...@sap.com> wrote: > > > Thank you so much for the response. Below are the configs I have in > > solr.in.sh and I followed > > https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html > documentation > > > > # Enables HTTPS. It is implicitly true if you set > SOLR_SSL_KEY_STORE. Use > > this config > > # to enable https module with custom jetty configuration. > > SOLR_SSL_ENABLED=true > > # Uncomment to set SSL-related system properties > > # Be sure to update the paths to the correct keystore for your > environment > > SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12 > > SOLR_SSL_KEY_STORE_PASSWORD=secret > > SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12 > > SOLR_SSL_TRUST_STORE_PASSWORD=secret > > # Require clients to authenticate > > SOLR_SSL_NEED_CLIENT_AUTH=false > > # Enable clients to authenticate (but not require) > > SOLR_SSL_WANT_CLIENT_AUTH=false > > # SSL Certificates contain host/ip "peer name" information that is > > validated by default. 
Setting > > # this to false can be useful to disable these checks when re-using a > > certificate on many hosts > > SOLR_SSL_CHECK_PEER_NAME=true > > > > In local , with the below certificate it works > > --- > > > > keytool -list -keystore solr-ssl.keystore.p12 > > Enter keystore password: > > Keystore type: PKCS12 > > Keystore provider: SUN > > > > Your keystore contains 1 entry > > > > solr-18, Jun 26, 2020, PrivateKeyEntry, > > Certificate fingerprint (SHA1): > > AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50 > > C02W48C6HTD6:solr-8.5.1 i843100$ keytool -list -v -keystore > > solr-ssl.keystore.p12 > > Enter keystore password: > > Keystore type: PKCS12 > > Keystore provider: SUN > > > > Your keystore contains 1 entry > > > > Alias name: solr-18 > > Creation date: Jun 26, 2020 > > Entry type: PrivateKeyEntry > > Certificate chain length: 1 > > Certificate[1]: > > Owner: CN=localhost, OU=Organizational Unit, O=Organization, > L=Location, > > ST=State, C=Country > > Issuer: CN=localhost, OU=Organizational
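A quick way to confirm whether the fallback described in this thread is
still in play is to look for the client-keystore wiring in the start script
and then check what the running JVM actually received. This is a diagnostic
sketch, not part of Solr itself; the paths assume a default install layout.

```shell
# Show where bin/solr wires SOLR_SSL_KEY_STORE into the client keystore
# fallback; these are the lines the thread suggests commenting out.
grep -n 'SOLR_SSL_CLIENT_KEY_STORE\|javax.net.ssl.keyStore' bin/solr

# After editing and restarting, confirm the JVM no longer receives the
# client keystore system property.
ps -ef | grep '[s]olr' | grep -o '\-Djavax.net.ssl.keyStore=[^ ]*'
```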
Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8
Hmmm so I looked closer - it looks like a side effect of the default
passthrough of the keystore being passed to the client keystore.

https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L229

Can you remove or comment out the entire SOLR_SSL_CLIENT_KEY_STORE section
from bin/solr or bin/solr.cmd, depending on which you are using? The key
being to make sure to not set "-Djavax.net.ssl.keyStore".

This assumes that you aren't using Solr client auth (which based on your
config you aren't) and you aren't trying to use Solr to connect to anything
that is secured via clientAuth (most likely you aren't).

If you can try this and report back that would be awesome. I think this
will fix the issue, and it would be possible to make client auth opt-in
instead of a default fallback.

Kevin Risden

On Tue, Jul 14, 2020 at 1:46 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:
> Thank you so much for the response. Below are the configs I have in
> solr.in.sh, and I followed the
> https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html documentation.
>
> # Enables HTTPS. It is implicitly true if you set SOLR_SSL_KEY_STORE. Use
> this config
> # to enable https module with custom jetty configuration.
> SOLR_SSL_ENABLED=true
> # Uncomment to set SSL-related system properties
> # Be sure to update the paths to the correct keystore for your environment
> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12
> SOLR_SSL_KEY_STORE_PASSWORD=secret
> SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12
> SOLR_SSL_TRUST_STORE_PASSWORD=secret
> # Require clients to authenticate
> SOLR_SSL_NEED_CLIENT_AUTH=false
> # Enable clients to authenticate (but not require)
> SOLR_SSL_WANT_CLIENT_AUTH=false
> # SSL Certificates contain host/ip "peer name" information that is
> validated by default.
Setting > # this to false can be useful to disable these checks when re-using a > certificate on many hosts > SOLR_SSL_CHECK_PEER_NAME=true > > In local , with the below certificate it works > --- > > keytool -list -keystore solr-ssl.keystore.p12 > Enter keystore password: > Keystore type: PKCS12 > Keystore provider: SUN > > Your keystore contains 1 entry > > solr-18, Jun 26, 2020, PrivateKeyEntry, > Certificate fingerprint (SHA1): > AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50 > C02W48C6HTD6:solr-8.5.1 i843100$ keytool -list -v -keystore > solr-ssl.keystore.p12 > Enter keystore password: > Keystore type: PKCS12 > Keystore provider: SUN > > Your keystore contains 1 entry > > Alias name: solr-18 > Creation date: Jun 26, 2020 > Entry type: PrivateKeyEntry > Certificate chain length: 1 > Certificate[1]: > Owner: CN=localhost, OU=Organizational Unit, O=Organization, L=Location, > ST=State, C=Country > Issuer: CN=localhost, OU=Organizational Unit, O=Organization, L=Location, > ST=State, C=Country > Serial number: 45a822c8 > Valid from: Fri Jun 26 00:13:03 PDT 2020 until: Sun Nov 10 23:13:03 PST > 2047 > Certificate fingerprints: > MD5: 0B:80:54:89:44:65:93:07:1F:81:88:8D:EC:BD:38:41 > SHA1: AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50 > SHA256: > 9D:65:A6:55:D7:22:B2:72:C2:20:55:66:F8:0C:9C:48:B1:F6:48:40:A4:FB:CB:26:77:DE:C4:97:34:69:25:42 > Signature algorithm name: SHA256withRSA > Subject Public Key Algorithm: 2048-bit RSA key > Version: 3 > > Extensions: > > #1: ObjectId: 2.5.29.17 Criticality=false > SubjectAlternativeName [ > DNSName: localhost > IPAddress: 172.20.10.4 > IPAddress: 127.0.0.1 > ] > > #2: ObjectId: 2.5.29.14 Criticality=false > SubjectKeyIdentifier [ > KeyIdentifier [ > : 1B 6F BB 65 A4 3C 6A F4 C9 05 08 89 88 0E 9E 76 .o.e. 0010: A1 B7 28 BE..(. > ] > > / > In a cluster env , where the deployment , keystore everything is > automated (used by multiple teams) keystore generated is as below. 
As you > can see the keystore has 2 certificates , in which case I get the > exception below. > > java.lang.UnsupportedOperationException: X509ExtendedKeyManager only > > supported on Server > > at > > > org.apache.solr.client.solrj.impl.Http2SolrClient.createHttpClient(Http2SolrClient.java:223) > > > > In both cases , the config is same except the keystore certificates . In > the JIRA (https://issues.apache.org/jira/browse/SOLR-14105) , I see the > fix says it supports multiple DNS and multiple certificates. So I thought > it should be ok. Please let me know . > > keytool -list -keystore /etc/nginx/certs/sidecar.p12 > Picked up JAVA_TOOL_OPTIONS: -Dfile
Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8
> > In local with just certificate and one domain name the SSL
> > communication worked. With multiple DNS and 2 certificates SSL fails
> > with below exception.

A client keystore by definition can only have a single certificate. A
server keystore can have multiple certificates. The reason is that a
client can only be identified by a single certificate.

Can you share more details about what your solr.in.sh configs look like
related to keystore/truststore, and which files they point at?
Specifically, highlight which files have multiple certificates in them.

It looks like, for the Solr internal HTTP client, the client keystore has
more than one certificate in it, and the error is correct. This is more
strict with recent versions of Jetty 9.4.x. Previously this would silently
fail, but it was still incorrect. Now the error is bubbled up so that there
are no silent misconfigurations.

Kevin Risden

On Mon, Jul 13, 2020 at 4:54 PM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:
> I looked at the patch mentioned in the JIRA
> https://issues.apache.org/jira/browse/SOLR-14105 reporting the below
> issue. I looked at the Solr 8.5.1 code base, and I see the patch is
> applied. But I am still seeing the same exception with a different stack
> trace. The initial exception stack trace was at
>
> org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:245)
>
> Now the exception we encounter is at HttpSolrClient creation:
>
> Caused by: java.lang.RuntimeException:
> java.lang.UnsupportedOperationException: X509ExtendedKeyManager only
> supported on Server
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.createHttpClient(Http2SolrClient.java:223)
>
> I commented on the JIRA also. Let me know if this is still an issue.
>
> Thanks,
> Rajeswari
>
> On 7/13/20, 2:03 AM, "Natarajan, Rajeswari" wrote:
>
>     Re-sending to see if anyone who has had this combination has
>     encountered this issue.
In local with just certificate and one domain name > the SSL communication worked. With multiple DNS and 2 certificates SSL > fails with below exception. Below JIRA says it is fixed for > Http2SolrClient , wondering if this is fixed for http1 solr client as we > pass -Dsolr.http1=true . > > Thanks, > Rajeswari > > https://issues.apache.org/jira/browse/SOLR-14105 > > On 7/6/20, 10:02 PM, "Natarajan, Rajeswari" < > rajeswari.natara...@sap.com> wrote: > > Hi, > > We are using Solr 8.5.1 in cloud mode with Java 8. We are > enabling TLS with http1 (as we get a warning java 8 + solr 8.5 SSL can’t > be enabled) and we get below exception > > > > 2020-07-07 03:58:53.078 ERROR (main) [ ] o.a.s.c.SolrCore > null:org.apache.solr.common.SolrException: Error instantiating > shardHandlerFactory class [HttpShardHandlerFactory]: > java.lang.UnsupportedOperationException: X509ExtendedKeyManager only > supported on Server > at > org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56) > at > org.apache.solr.core.CoreContainer.load(CoreContainer.java:647) > at > org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183) > at > org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134) > at > org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751) > at > java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) > at > java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) > at > java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744) > at > org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360) > at > 
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445) > at > org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409) > at > org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822) > at > org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275) > at > org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:52
Re: using S3 as the Directory for Solr
Solr's use of the HdfsDirectory may work over S3 directly if you use the Hadoop AWS binding - s3a [1]. The idea is to replace hdfs:// with s3a://bucket/. Since S3 is eventually consistent, the Hadoop AWS s3a project has s3guard to help with consistent listing. If you are only doing queries (no indexing) with Solr you may not need to worry about the eventual consistency. There was some previous exploration in this area with Solr 6.x/7.x, but it should be much better with Solr 8.x due to the upgraded Hadoop 3.x dependency. I haven't done any stress testing of this, but I made sure it at least in theory would connect. I could index and query some small datasets stored via s3a. Using the HdfsDirectory with s3a will most likely be slower as already pointed out. You might get reasonable performance depending on the nodes used and tuning the HdfsDirectory block cache. [1] https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html Kevin Risden On Fri, Apr 24, 2020 at 1:19 PM dhurandar S wrote: > Its 10 PB of source data, But we do have indexes on most of the attributes. > 80% or so > We have a need to support such large data and we have use cases of finding > a needle in the haystack kinda scenario. > Most of our users are used to Search query language or Solr in addition to > SQL. So we would have both the interfaces. > > We store the actual data in S3 in Parquet and have Presto query it using > SQL (Presto is similar to Hive but much much faster). > > We also now want to store the indexes in S3 we have leeway in query > interactivity performance, the key thing here is support finding the > needle in the haystack pattern and supporting really long-range data in a > cheaper fashion > > regards, > Rahul > > > On Thu, Apr 23, 2020 at 7:41 PM Walter Underwood > wrote: > > > It will be a lot more than 2X or 3X slower. Years ago, I accidentally put > > Solr indexes on an NFS mounted filesystem and it was 100X slower. S3 > would > > be a lot slower than that. 
> > > > Are you doing relevance-ranked searches on all that data? That is the > only > > reason to use Solr instead of some other solution. > > > > I’d use Apache Hive, or whatever has replaced it. That is what Facebook > > wrote to do searches on their multi-petabyte logs. > > > > https://hive.apache.org > > > > More options. > > > > https://jethro.io/hadoop-hive > > https://mapr.com/why-hadoop/sql-hadoop/sql-hadoop-details/ > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > On Apr 23, 2020, at 7:29 PM, Christopher Schultz < > > ch...@christopherschultz.net> wrote: > > > > > > -BEGIN PGP SIGNED MESSAGE- > > > Hash: SHA256 > > > > > > Rahul, > > > > > > On 4/23/20 21:49, dhurandar S wrote: > > >> Thank you for your reply. The reason we are looking for S3 is since > > >> the volume is close to 10 Petabytes. We are okay to have higher > > >> latency of say twice or thrice that of placing data on the local > > >> disk. But we have a requirement to have long-range data and > > >> providing Seach capability on that. Every other storage apart from > > >> S3 turned out to be very expensive at that scale. > > >> > > >> Basically I want to replace > > >> > > >> -Dsolr.directoryFactory=HdfsDirectoryFactory \ > > >> > > >> with S3 based implementation. > > > > > > Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of > > > /index data/? > > > > > > You can theoretically store your source data anywhere, of course. 10 > > > PiB sounds like a truly enormous index. > > > > > > - -chris > > > > > >> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl > > >> wrote: > > >> > > >>> Hi, > > >>> > > >>> Is your data so partitioned that it makes sense to consider > > >>> splitting up in multiple collections and make some arrangement > > >>> that will keep only a few collections live at a time, loading > > >>> index files from S3 on demand? 
> > >>> > > >>> I cannot see how an S3 directory would be able to effectively > > >>> cache files in S3 and what units the index files would be stored > > >>> as? > > >>> > > >>> Have you investigated EFS as an alternative? That would look like > > >>> a normal filesystem to Solr but might be ch
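To make the s3a substitution discussed above concrete: the sketch below
starts Solr with the HDFS directory factory pointed at a bucket. The bucket
name and Hadoop config directory are placeholders, the hadoop-aws and AWS
SDK jars plus credentials must be available to Solr, and, as noted in the
thread, this path is only lightly tested, so treat it as an experiment
rather than a supported setup.

```shell
# Hypothetical: run Solr 8.x with its HdfsDirectoryFactory over s3a,
# replacing an hdfs:// home with an s3a:// bucket path.
bin/solr start -c \
  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.hdfs.home=s3a://my-bucket/solr \
  -Dsolr.hdfs.confdir=/etc/hadoop/conf \
  -Dsolr.hdfs.blockcache.enabled=true \
  -Dsolr.hdfs.blockcache.slab.count=4
```

Tuning the block cache (the last two properties) is the main lever for
clawing back some of the latency that remote object storage adds.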
Re: Schema Browser API
The Luke request handler may do what you are asking for already? This is coming directly from Lucene and doesn't rely on what Solr has in the schema information. /admin/luke https://lucene.apache.org/solr/guide/7_7/implicit-requesthandlers.html https://cwiki.apache.org/confluence/display/SOLR/LukeRequestHandler PS - There is also the ability to run Luke standalone over Lucene indices. Kevin Risden On Thu, Apr 9, 2020 at 3:34 PM Webster Homer wrote: > > I was just looking at the Schema Browser for one of our collections. It's > pretty handy. I was thinking that it would be useful to create a tool that > would create a report about what fields were indexed had docValues, were > multivalued etc... > > Has someone built such a tool? I want it to aid in estimating memory > requirements for our collections. > > I'm currently running solr 7.7.2 > > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to any > other person. If you have received this transmission in error, please notify > the sender immediately and delete the message and any attachment from your > system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not > accept liability for any omissions or errors in this message which may arise > as a result of E-Mail-transmission or for damages resulting from any > unauthorized changes of the content of this message and any attachment > thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not > guarantee that this message is free of viruses and does not accept liability > for any damages caused by any virus transmitted therewith. > > > > Click http://www.merckgroup.com/disclaimer to access the German, French, > Spanish and Portuguese versions of this disclaimer.
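For the report described above, the Luke handler output can be fetched and
then summarized per field. The collection name "products" is a placeholder;
`numTerms`, `show=schema`, and `wt` are standard Luke handler parameters.

```shell
# Dump per-field index details (type, doc counts, flags such as
# docValues and multiValued) for a collection named "products".
curl 'http://localhost:8983/solr/products/admin/luke?numTerms=0&wt=json'

# Schema-oriented view of the same handler, closer to what the Schema
# Browser UI itself renders.
curl 'http://localhost:8983/solr/products/admin/luke?show=schema&wt=json'
```

The JSON from either call can be fed into a small script to produce the
indexed/docValues/multiValued report without scraping the Admin UI.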
Re: CVEs (vulnerabilities) that apply to Solr 8.4.1
https://lucene.apache.org/solr/security.html The security page on the Solr website has details about how to report security items. It also has a link to the wiki page with details about some of these that are false positives. Each version of Solr has dependency updates and addresses different dependency CVEs as they are reported and detected. I haven't looked through what was shared specifically but Solr 8.5 which is under vote addresses at least a few dependency upgrades. Kevin Risden On Fri, Mar 20, 2020 at 10:23 AM Ahlberg, Christopher C. wrote: > Our TRM team (Technology Risk Management) has provided us with the > attached vulnerabilities analysis for Solr 8.4.1, (security issues > extracted below.) > > > > Has anyone out there in the Solr community done anything to document > workarounds or mitigations for any of these identified vulnerabilities in > Solr 8.4.1? Does anyone know if work to address these issues is happening > for subsequent releases? > > > > Any and all comments will be greatly appreciated! 
> From their analysis:
>
> Security Issues
>
> Threat Level | Problem Code | Component | Status
>
> 9 | sonatype-2019-0115 | jQuery : 1.7.1 | Open
> 9 | sonatype-2019-0115 | com.carrotsearch.randomizedtesting : junit4-ant : 2.7.2 | Open
> 9 | CVE-2015-1832 | org.apache.derby : derby : 10.9.1.0 | Open
> 9 | CVE-2015-1832 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | CVE-2017-1000190 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | sonatype-2019-0115 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | sonatype-2019-0494 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 8 | CVE-2019-10088 | org.apache.tika : tika-core : 1.19.1 | Open
> 8 | CVE-2019-10088 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2012-0881 | apache-xerces : xercesImpl : 2.9.1 | Open
> 7 | CVE-2013-4002 | apache-xerces : xercesImpl : 2.9.1 | Open
> 7 | CVE-2019-14262 | com.drewnoakes : metadata-extractor : 2.11.0 | Open
> 7 | CVE-2019-12402 | org.apache.commons : commons-compress : 1.18 | Open
> 7 | CVE-2019-10094 | org.apache.tika : tika-core : 1.19.1 | Open
> 7 | CVE-2012-0881 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2013-4002 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2014-0114 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-10094 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-12086 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-12402 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-14262 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
Re: Oracle OpenJDK to Amazon Corretto OpenJDK
user.language = en user.name = root user.timezone = openjdk version "11.0.6" 2020-01-14 LTS OpenJDK Runtime Environment Corretto-11.0.6.10.1 (build 11.0.6+10-LTS) OpenJDK 64-Bit Server VM Corretto-11.0.6.10.1 (build 11.0.6+10-LTS, mixed mode) Kevin Risden On Fri, Jan 31, 2020 at 1:25 PM Kevin Risden wrote: > Whoops forgot to share the same output from latest. The docker images are > clearly building from AdoptOpenJDK so specification vendor is potentially > misleading? > > ➜ ~ docker pull solr > Using default tag: latest > latest: Pulling from library/solr > Digest: > sha256:ef1f2241c1aa51746aa3ad05570123eef128d98e91bc07336c37f2a1b37df7a9 > Status: Image is up to date for solr:latest > docker.io/library/solr:latest > ➜ ~ docker run --rm -it solr bash -c "java -XshowSettings:properties > -version" > Property settings: > awt.toolkit = sun.awt.X11.XToolkit > file.encoding = UTF-8 > file.separator = / > java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment > java.awt.printerjob = sun.print.PSPrinterJob > java.class.path = > java.class.version = 55.0 > java.home = /usr/local/openjdk-11 > java.io.tmpdir = /tmp > java.library.path = /usr/java/packages/lib > /usr/lib64 > /lib64 > /lib > /usr/lib > java.runtime.name = OpenJDK Runtime Environment > java.runtime.version = 11.0.6+10 > java.specification.name = Java Platform API Specification > java.specification.vendor = Oracle Corporation > java.specification.version = 11 > java.vendor = Oracle Corporation > java.vendor.url = http://java.oracle.com/ > java.vendor.url.bug = http://bugreport.java.com/bugreport/ > java.vendor.version = 18.9 > java.version = 11.0.6 > java.version.date = 2020-01-14 > java.vm.compressedOopsMode = 32-bit > java.vm.info = mixed mode > java.vm.name = OpenJDK 64-Bit Server VM > java.vm.specification.name = Java Virtual Machine Specification > java.vm.specification.vendor = Oracle Corporation > java.vm.specification.version = 11 > java.vm.vendor = Oracle Corporation > java.vm.version = 11.0.6+10 > jdk.debug = 
release > line.separator = \n > os.arch = amd64 > os.name = Linux > os.version = 4.19.76-linuxkit > path.separator = : > sun.arch.data.model = 64 > sun.boot.library.path = /usr/local/openjdk-11/lib > sun.cpu.endian = little > sun.cpu.isalist = > sun.io.unicode.encoding = UnicodeLittle > sun.java.launcher = SUN_STANDARD > sun.jnu.encoding = UTF-8 > sun.management.compiler = HotSpot 64-Bit Tiered Compilers > sun.os.patch.level = unknown > user.dir = /opt/solr-8.4.1 > user.home = /home/solr > user.language = en > user.name = solr > user.timezone = > > openjdk version "11.0.6" 2020-01-14 > OpenJDK Runtime Environment 18.9 (build 11.0.6+10) > OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode) > > Kevin Risden > > > On Fri, Jan 31, 2020 at 1:22 PM Kevin Risden wrote: > >> What specific Solr tag are you using? That looks like JDK 1.8 and an >> older version. >> >> Just picking the current latest as an example: >> >> >> https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile >> >> which uses openjdk:11-stretch >> >> and looking up that is >> >> >> https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36 >> >> This is JDK 11 and not JDK 1.8. >> >> Even openjdk:8-stretch >> >> >> https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36 >> >> So maybe you have an older Solr docker tag? >> >> Kevin Risden >> >> >> On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood >> wrote: >> >>> Maybe you can give them an estimate of how much work it will be. See if >>> legal will put it on their budget. Free software isn’t free, especially the >>> “free kittens” kind. >>> >>> This guy offers consulting for custom Docker images. 
>>> >>> https://pythonspeed.com/about/ >>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> > On Jan 31, 2020, at 9:45 AM, Arnold Bronley >>> wrote: >>> > >>> > Thanks for the helpful information. It is a no-go because even though >>> it is >>> > OpenJDK and free, vendor is Oracle and legal dept. at our company is >>> trying >>> > to get away from anything Oracle.
Re: Oracle OpenJDK to Amazon Corretto OpenJDK
Whoops forgot to share the same output from latest. The docker images are clearly building from AdoptOpenJDK so specification vendor is potentially misleading? ➜ ~ docker pull solr Using default tag: latest latest: Pulling from library/solr Digest: sha256:ef1f2241c1aa51746aa3ad05570123eef128d98e91bc07336c37f2a1b37df7a9 Status: Image is up to date for solr:latest docker.io/library/solr:latest ➜ ~ docker run --rm -it solr bash -c "java -XshowSettings:properties -version" Property settings: awt.toolkit = sun.awt.X11.XToolkit file.encoding = UTF-8 file.separator = / java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment java.awt.printerjob = sun.print.PSPrinterJob java.class.path = java.class.version = 55.0 java.home = /usr/local/openjdk-11 java.io.tmpdir = /tmp java.library.path = /usr/java/packages/lib /usr/lib64 /lib64 /lib /usr/lib java.runtime.name = OpenJDK Runtime Environment java.runtime.version = 11.0.6+10 java.specification.name = Java Platform API Specification java.specification.vendor = Oracle Corporation java.specification.version = 11 java.vendor = Oracle Corporation java.vendor.url = http://java.oracle.com/ java.vendor.url.bug = http://bugreport.java.com/bugreport/ java.vendor.version = 18.9 java.version = 11.0.6 java.version.date = 2020-01-14 java.vm.compressedOopsMode = 32-bit java.vm.info = mixed mode java.vm.name = OpenJDK 64-Bit Server VM java.vm.specification.name = Java Virtual Machine Specification java.vm.specification.vendor = Oracle Corporation java.vm.specification.version = 11 java.vm.vendor = Oracle Corporation java.vm.version = 11.0.6+10 jdk.debug = release line.separator = \n os.arch = amd64 os.name = Linux os.version = 4.19.76-linuxkit path.separator = : sun.arch.data.model = 64 sun.boot.library.path = /usr/local/openjdk-11/lib sun.cpu.endian = little sun.cpu.isalist = sun.io.unicode.encoding = UnicodeLittle sun.java.launcher = SUN_STANDARD sun.jnu.encoding = UTF-8 sun.management.compiler = HotSpot 64-Bit Tiered Compilers 
sun.os.patch.level = unknown user.dir = /opt/solr-8.4.1 user.home = /home/solr user.language = en user.name = solr user.timezone = openjdk version "11.0.6" 2020-01-14 OpenJDK Runtime Environment 18.9 (build 11.0.6+10) OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode) Kevin Risden On Fri, Jan 31, 2020 at 1:22 PM Kevin Risden wrote: > What specific Solr tag are you using? That looks like JDK 1.8 and an older > version. > > Just picking the current latest as an example: > > > https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile > > which uses openjdk:11-stretch > > and looking up that is > > > https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36 > > This is JDK 11 and not JDK 1.8. > > Even openjdk:8-stretch > > > https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36 > > So maybe you have an older Solr docker tag? > > Kevin Risden > > > On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood > wrote: > >> Maybe you can give them an estimate of how much work it will be. See if >> legal will put it on their budget. Free software isn’t free, especially the >> “free kittens” kind. >> >> This guy offers consulting for custom Docker images. >> >> https://pythonspeed.com/about/ >> >> wunder >> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >> > On Jan 31, 2020, at 9:45 AM, Arnold Bronley >> wrote: >> > >> > Thanks for the helpful information. It is a no-go because even though >> it is >> > OpenJDK and free, vendor is Oracle and legal dept. at our company is >> trying >> > to get away from anything Oracle. >> > It is little paranoid reaction, I agree. >> > >> > See the java.vendor property in following output. 
>> > >> > $ java -XshowSettings:properties -version >> > Property settings: >> >awt.toolkit = sun.awt.X11.XToolkit >> >file.encoding = UTF-8 >> >file.encoding.pkg = sun.io >> >file.separator = / >> >java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment >> >java.awt.printerjob = sun.print.PSPrinterJob >> >java.class.path = . >> >java.class.version = 52.0 >> >java.endorsed.dirs = >> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed >> >java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib
Re: Oracle OpenJDK to Amazon Corretto OpenJDK
What specific Solr tag are you using? That looks like JDK 1.8 and an older version. Just picking the current latest as an example: https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile which uses openjdk:11-stretch and looking up that is https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36 This is JDK 11 and not JDK 1.8. Even openjdk:8-stretch https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36 So maybe you have an older Solr docker tag? Kevin Risden On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood wrote: > Maybe you can give them an estimate of how much work it will be. See if > legal will put it on their budget. Free software isn’t free, especially the > “free kittens” kind. > > This guy offers consulting for custom Docker images. > > https://pythonspeed.com/about/ > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Jan 31, 2020, at 9:45 AM, Arnold Bronley > wrote: > > > > Thanks for the helpful information. It is a no-go because even though it > is > > OpenJDK and free, vendor is Oracle and legal dept. at our company is > trying > > to get away from anything Oracle. > > It is little paranoid reaction, I agree. > > > > See the java.vendor property in following output. > > > > $ java -XshowSettings:properties -version > > Property settings: > >awt.toolkit = sun.awt.X11.XToolkit > >file.encoding = UTF-8 > >file.encoding.pkg = sun.io > >file.separator = / > >java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment > >java.awt.printerjob = sun.print.PSPrinterJob > >java.class.path = . 
> >java.class.version = 52.0 > >java.endorsed.dirs = > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed > >java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext > >/usr/java/packages/lib/ext > >java.home = /usr/lib/jvm/java-8-openjdk-amd64/jre > >java.io.tmpdir = /tmp > >java.library.path = /usr/java/packages/lib/amd64 > >/usr/lib/x86_64-linux-gnu/jni > >/lib/x86_64-linux-gnu > >/usr/lib/x86_64-linux-gnu > >/usr/lib/jni > >/lib > >/usr/lib > >java.runtime.name = OpenJDK Runtime Environment > >java.runtime.version = 1.8.0_181-8u181-b13-1~deb9u1-b13 > >java.specification.name = Java Platform API Specification > >java.specification.vendor = Oracle Corporation > >java.specification.version = 1.8 > >java.vendor = Oracle Corporation > >java.vendor.url = http://java.oracle.com/ > >java.vendor.url.bug = http://bugreport.sun.com/bugreport/ > >java.version = 1.8.0_181 > >java.vm.info = mixed mode > >java.vm.name = OpenJDK 64-Bit Server VM > >java.vm.specification.name = Java Virtual Machine Specification > >java.vm.specification.vendor = Oracle Corporation > >java.vm.specification.version = 1.8 > >java.vm.vendor = Oracle Corporation > >java.vm.version = 25.181-b13 > >line.separator = \n > >os.arch = amd64 > >os.name = Linux > >os.version = 4.9.0-8-amd64 > >path.separator = : > >sun.arch.data.model = 64 > >sun.boot.class.path = > > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar > >/usr/lib/jvm/java-8-openjdk-amd64/jre/classes > >sun.boot.library.path = > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64 > >sun.cpu.endian = little > >sun.cpu.isalist = > >sun.io.unicode.encoding = UnicodeLittle > >sun.java.launcher = SUN_STANDARD > 
>sun.jnu.encoding = UTF-8 > >sun.management.compiler = HotSpot 64-Bit Tiered Compilers > >sun.os.patch.level = unknown > >user.country = US > >user.dir = /opt/solr > >user.home = /home/solr > >user.language = en > >user.name = solr > >user.timezone = > > > > openjdk version "1.8.0_181" > > OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-1~deb9u1-b13) > > OpenJDK 64-Bit Server VM (build
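The `java.vendor` confusion in this thread can also be checked from code rather than by eyeballing `java -XshowSettings:properties` output. A minimal sketch (the class name is mine, not from the thread): `java.vendor` and `java.vm.vendor` identify who built the JDK, while `java.specification.vendor` names the owner of the Java specification itself, which is why it reads "Oracle Corporation" even on non-Oracle builds.

```java
public class VendorCheck {
    public static void main(String[] args) {
        // Who built this JDK (e.g. "Amazon.com Inc." on Corretto builds).
        System.out.println("java.vendor = " + System.getProperty("java.vendor"));
        System.out.println("java.vm.vendor = " + System.getProperty("java.vm.vendor"));
        // Who owns the Java spec; this is "Oracle Corporation" on every
        // conformant JDK, so it cannot distinguish distributions.
        System.out.println("java.specification.vendor = "
                + System.getProperty("java.specification.vendor"));
    }
}
```

Running this inside the Docker image in question (`docker run --rm solr java ...` style) would show directly whether the legal concern applies to the builder or only to the spec owner.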
Re: SQL selectable fields
So I haven't looked at this in a few years, but the columns should be registered in the SQL catalog so you should be able to ask via SQL for all the columns. describe table or using the JDBC metadata should work. There may be some edge cases where depending on sharding you get into a case where the columns aren't registered since we look at Luke to determine what fields are really there for type information. Kevin Risden On Fri, Jan 24, 2020 at 9:48 AM Joel Bernstein wrote: > Does "_nest_path_" come back in a normal search? I would expect that the > fields that are returned by normal searches would also work in SQL. If that > turns out to be the case you could derive the fields from performing a > search and seeing what fields are returned. > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Thu, Jan 23, 2020 at 3:02 PM Nick Vercammen > > wrote: > > > Hey All, > > > > is there a way to get a list of all fields in a collection that can be > used > > in an SQL query? Currently I retrieve a list of fields through the schema > > api: GET col/schema/fields. > > > > This returns all fields in a collection. But when I do a select on all > > fields I get an exception because apparently _nest_path_ is no column in > > the collection table: > > > > Failed to execute sqlQuery 'SELECT films._text_ AS text, > films._nest_path_ > > FROM films LIMIT 2000' against JDBC connection 'jdbc:calcitesolr:'. > > Error while executing SQL "SELECT films._text_ AS text, > films._nest_path_ > > FROM films LIMIT 2000": From line 1, column 37 to line 1, column 47: > Column > > '_nest_path_' not found in table 'films' > > > > Can I determine which fields can be used in a SQL query? By means of the > > type? > > > > kind regards, > > > > Nick > > >
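Kevin's suggestion to ask the driver's SQL catalog via JDBC metadata can be sketched as below. The ZooKeeper address and collection name are placeholders, the live-cluster part assumes solr-solrj on the classpath, and whether the Solr driver fully implements `getColumns` is exactly what Kevin hedges on above; only the URL helper runs without a cluster.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SolrSqlColumns {
    // Pure helper: builds the Solr JDBC URL (testable without a cluster).
    static String jdbcUrl(String zkHost, String collection) {
        return "jdbc:solr://" + zkHost + "?collection=" + collection;
    }

    // Asks the driver's catalog which columns it has registered for a table,
    // i.e. the set of fields actually selectable in SQL.
    static List<String> listColumns(Connection conn, String table) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (ResultSet rs = conn.getMetaData().getColumns(null, null, table, null)) {
            while (rs.next()) {
                columns.add(rs.getString("COLUMN_NAME"));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost:9983", "films"));
        // Against a live cluster (hypothetical usage):
        //   try (Connection conn = java.sql.DriverManager.getConnection(jdbcUrl(zk, "films"))) {
        //       System.out.println(listColumns(conn, "films"));
        //   }
    }
}
```

Comparing that list against the Schema API's `GET col/schema/fields` output would show which schema fields (like `_nest_path_`) never made it into the catalog.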
Re: ConnectionImpl.isValid() does not behave as described in Connection javadocs
Nick - Feel free to open a Jira and PR. I think the disconnect is the meaning of timeout=0 between JDBC and the Solr client. Kevin Risden On Sun, Jan 19, 2020 at 3:34 PM Nick Vercammen wrote: > I think so as the ConnectionImpl in Solr is not in line with the > description of the Java Connection interface > > > On 19 Jan 2020, at 21:23, Erick Erickson > wrote: > > > > Is this a Solr issue? > > > >> On Sun, Jan 19, 2020, 14:24 Nick Vercammen > >> wrote: > >> > >> Hello, > >> > >> I'm trying to write a solr driver for metabase. Internally metabase > uses a > >> C3P0 connection pool. Upon checkout of the connection from the pool the > >> library does a call to isValid(0) (timeout = 0) > >> > >> According to the javadocs ( > >> > >> > https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Connection.html#isValid(int) > >> ) > >> a > >> timeout = 0 means no timeout. In the current implementation a timeout = > 0 > >> means that the connection is always invalid. > >> > >> I can provide a PR for this. > >> > >> Nick > >> > >> -- > >> [image: Zeticon] > >> Nick Vercammen > >> CTO > >> +32 9 275 31 31 > >> +32 471 39 77 36 > >> nick.vercam...@zeticon.com > >> <https://www.facebook.com/MediaHaven-1536452166583533/> > >> <https://www.linkedin.com/company/zeticon/> < > >> https://twitter.com/mediahaven> > >> www.zeticon.com > >> >
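For reference, the contract Nick is pointing at: `java.sql.Connection.isValid(int)` takes a timeout in seconds, where 0 means "no timeout" (wait as long as needed) and negative values must raise `SQLException`. A small sketch of timeout handling that follows that contract (the names are mine, not Solr's actual implementation):

```java
import java.sql.SQLException;

public class IsValidContract {
    // Converts the isValid(int) argument into an effective wait in milliseconds.
    // Per java.sql.Connection#isValid: 0 means no timeout, negative is an error.
    static long effectiveTimeoutMillis(int timeoutSeconds) throws SQLException {
        if (timeoutSeconds < 0) {
            throw new SQLException("timeout must be >= 0, got " + timeoutSeconds);
        }
        // 0 = wait indefinitely; model that as "no deadline",
        // not as "already expired" (the behavior the pool tripped over).
        return timeoutSeconds == 0 ? Long.MAX_VALUE : timeoutSeconds * 1000L;
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(effectiveTimeoutMillis(0)); // no deadline
        System.out.println(effectiveTimeoutMillis(5)); // 5000 ms
    }
}
```

This is why C3P0's `isValid(0)` checkout probe breaks against a driver that treats 0 as "already timed out".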
Re: Solr 8.4.0 Cloud Graph is not shown due to CSP
So this is caused by SOLR-13982 [1] and specifically SOLR-13987 [2]. Can you open a new Jira specifically for this? It would be great if you could capture from Chrome Dev Tools (or Firefox) the error message around what specifically CSP is complaining about. The other thing to ensure is that you force refresh the UI to make sure nothing is cached. Idk if that is in play here but doesn't hurt. [1] https://issues.apache.org/jira/browse/SOLR-13982 [2] https://issues.apache.org/jira/browse/SOLR-13987 Kevin Risden On Tue, Jan 7, 2020, 11:15 Jörn Franke wrote: > Dear all, > > I noted that in Solr Cloud 8.4.0 the graph is not shown due to > Content-Security-Policy. Apparently it violates unsafe-eval. > It is a minor UI thing, but should I create an issue to that one? Maybe it > is rather easy to avoid in the source code of the admin page? > > Thank you. > > Best regards > > >
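For anyone reproducing this: the failure mode is a `Content-Security-Policy` header whose `script-src` lacks `'unsafe-eval'`, so `eval()`-based rendering in the graph view is blocked. A rough, simplified illustration of the check a browser performs (real CSP matching follows the full CSP algorithm, and the header text below is a made-up example, not Solr's actual policy):

```java
public class CspEvalCheck {
    // Very simplified: does the policy's script-src directive (falling back to
    // default-src) include 'unsafe-eval'? Browsers apply the full CSP spec.
    static boolean allowsEval(String policy) {
        String scriptSrc = null;
        String defaultSrc = null;
        for (String directive : policy.split(";")) {
            String d = directive.trim();
            if (d.startsWith("script-src ")) scriptSrc = d;
            else if (d.startsWith("default-src ")) defaultSrc = d;
        }
        String effective = scriptSrc != null ? scriptSrc : defaultSrc;
        return effective != null && effective.contains("'unsafe-eval'");
    }

    public static void main(String[] args) {
        String example = "default-src 'none'; script-src 'self'; style-src 'self'";
        System.out.println(allowsEval(example)); // false: eval() would be blocked
    }
}
```

The exact directive the browser rejects is what the Dev Tools console message mentioned above would capture for the Jira.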
Re: Client Cert Broken in Solr 8.2.0 because of a Jetty Issue (workaround included)
Thanks for the report Ryan. It looks like this fell through the cracks and was reported a second time in Jira. https://issues.apache.org/jira/browse/SOLR-14106 I have a patch up there that should help with some comments about multiple clientAuth certificates. Kevin Risden On Fri, Sep 27, 2019 at 1:04 PM Ryan Rockenbaugh wrote: > All, > If you are using client authentication with SSL in Solr > (SOLR_SSL_NEED_CLIENT_AUTH=true or SOLR_SSL_WANT_CLIENT_AUTH=true), be > advised that Jetty made a change that will break Solr 8.2.0 > The version of Jetty packaged with Solr 8.2.0 changed to 9.4.19.v20190610 > (see > https://lucene.apache.org/solr/8_2_0/changes/Changes.html#v8.2.0.versions_of_major_components > ) > The official Jetty issue is here: > https://github.com/eclipse/jetty.project/issues/3554 > The stated fix is: > Set endpointIdentificationAlgorithm=null or better yet use > SslContextFactory.Server instead of a plain SslContextFactory. > I found I couldn't change the class from SslContextFactory to > SslContextFactory.Server > My workaround was to update the file server/etc/jetty-ssl.xml, adding the > following entry to the element: > > > Thanks, > Ryan Rockenbaugh > > > > > > "Do all the good you can, By all the means you can, In all the ways > you can, In all the places you can, At all the times you can, To all > the people you can, As long as ever you can." > > - John Wesley
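For anyone applying Ryan's workaround: based on the stated fix (clearing `endpointIdentificationAlgorithm`), an entry along these lines inside the `SslContextFactory` configuration in `server/etc/jetty-ssl.xml` is the likely shape. This is a hedged reconstruction from the Jetty issue, not the exact entry from the original message; verify the setter name against your Jetty version before relying on it.

```xml
<!-- Inside the <New class="org.eclipse.jetty.util.ssl.SslContextFactory"> -->
<!-- element of server/etc/jetty-ssl.xml: clear the endpoint identification -->
<!-- algorithm so client certificates are not subject to hostname checks.   -->
<!-- (Reconstructed sketch, not the original poster's exact entry.)         -->
<Set name="EndpointIdentificationAlgorithm"></Set>
```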
Re: CVE-2017-7525 fix for Solr 7.7.x
There are no specific plans for any 7.x branch releases that I'm aware of. Specifically for SOLR-13110, that required upgrading Hadoop 2.x to 3.x for specifically jackson-mapper-asl and there are no plans to backport that to 7.x even if there was a future 7.x release. Kevin Risden On Wed, Dec 18, 2019 at 8:44 AM Mehai, Lotfi wrote: > Hello; > > We are using Solr 7.7.0. The CVE-2017-7525 have been fixed for Solr 8.x. > https://issues.apache.org/jira/browse/SOLR-13110 > > When the fix will be available for Solr 7.7.x > > Lotfi >
Re: Active directory integration in Solr
So I wrote the blog more of an experiment above. I don't know if it is fully operating other than on a single node. That being said, the Hadoop authentication plugin doesn't require running on HDFS. It just uses the Hadoop code to do authentication. I will echo what Jorn said though - I wouldn't expose Solr to the internet or directly without some sort of API. Whether you do authentication/authorization at the API is a separate question. Kevin Risden On Wed, Nov 20, 2019 at 1:54 PM Jörn Franke wrote: > I would not give users directly access to Solr - even with LDAP plugin. > Build a rest interface or web interface that does the authentication and > authorization and security sanitization. Then you can also manage better > excessive queries or explicitly forbid certain type of queries (eg specific > streaming expressions - I would not expose all of them to users). > > > Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. : > > > > Thanks Charlie. > > > > We are already using Basic authentication in our existing clusters, > however it's getting difficult to maintain number of users as we are > getting too many requests for readonly access from support teams. So we > desperately looking for active directory solution. Just wondering if > someone might have same requirement need. > > > > > > Regards, > > Vinodh > > > > -Original Message- > > From: Charlie Hull > > Sent: Tuesday, November 19, 2019 2:55 PM > > To: solr-user@lucene.apache.org > > Subject: Re: Active directory integration in Solr > > > > ATTENTION! This email originated outside of DTCC; exercise caution. 
> > > > Not out of the box, there are a few authentication plugins bundled but > not for AD > > > https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.apache.org%2Fsolr%2Fguide%2F7_2%2Fauthentication-and-authorization-plugins.htmldata=02%7C01%7Cvkommu%40dtcc.com%7C2e17e1feef78432502e008d76cd26635%7C0465519d7f554d47998b55e2a86f04a8%7C0%7C0%7C637097523245309858sdata=fkahJ62aWFYh7QxcyFQbJV9u8OsTYSWp6pv0MNdzjps%3Dreserved=0 > > - there's also some useful stuff in Apache ManifoldCF > > > https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.francelabs.com%2Fblog%2Ftutorial-on-authorizations-for-manifold-cf-and-solr%2Fdata=02%7C01%7Cvkommu%40dtcc.com%7C2e17e1feef78432502e008d76cd26635%7C0465519d7f554d47998b55e2a86f04a8%7C0%7C0%7C637097523245319858sdata=iYiKRDJKYBZaxUd%2F%2BIddFBwxB2RhSqih2KZc26aZlRU%3Dreserved=0 > > > > > > Best > > > > Charlie > > > >> On 18/11/2019 15:08, Kommu, Vinodh K. wrote: > >> Hi, > >> > >> Does anyone know that Solr has any out of the box capability to > integrate Active directory (using LDAP) when security is enabled? Instead > of creating users in security.json file, planning to use users who already > exists in active directory so they can use their individual credentials > rather than defining in Solr. Did anyone came across similar requirement? > If so was there any working solution? > >> > >> > >> Thanks, > >> Vinodh > >> > >> DTCC DISCLAIMER: This email and any files transmitted with it are > confidential and intended solely for the use of the individual or entity to > whom they are addressed. If you have received this email in error, please > notify us immediately and delete the email and any attachments from your > system. The recipient should check this email and any attachments for the > presence of viruses. The company accepts no liability for any damage caused > by any virus transmitted by this email. 
> >> > > > > -- > > Charlie Hull > > Flax - Open Source Enterprise Search > > > > tel/fax: +44 (0)8700 118334 > > mobile: +44 (0)7767 825828 > > web: > https://nam02.safelinks.protection.outlook.com/?url=www.flax.co.ukdata=02%7C01%7Cvkommu%40dtcc.com%7C2e17e1feef78432502e008d76cd26635%7C0465519d7f554d47998b55e2a86f04a8%7C0%7C0%7C637097523245319858sdata=YNGIg%2FVgL2w82i3JWsBkBTJeefHMjSxbjLaQyOdJVt0%3Dreserved=0 > > > > DTCC DISCLAIMER: This email and any files transmitted with it are > confidential and intended solely for the use of the individual or entity to > whom they are addressed. If you have received this email in error, please > notify us immediately and delete the email and any attachments from your > system. The recipient should check this email and any attachments for the > presence of viruses. The company accepts no liability for any damage caused > by any virus transmitted by this email. > > >
Re: Clustering error in Solr 8.2.0
According to the stack trace: java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils at lingo3g.s.hashCode(Unknown Source) It looks like the problem is in lingo3g - lingo3g isn't on Maven Central and appears to require a license to download. You would have to contact them to see if it still uses commons-lang. You could also copy in the commons-lang dependency. Kevin Risden On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo wrote: > Hi Erick, > > Thanks for your reply. > > My clustering code is taken as-is from the Solr package; only the code > related to lingo3g is taken from a previous version. > > Below are the 3 files that I have taken from the previous version: > - lingo3g-1.15.0 > - morfologik-fsa-2.1.1 > - morfologik-stemming-2.1.1 > > Could any of these have caused the error? > > Regards, > Edwin > > On Thu, 8 Aug 2019 at 19:56, Erick Erickson > wrote: > > > This dependency was removed as part of > > https://issues.apache.org/jira/browse/SOLR-9079, so my guess is you’re > > pointing to an old version of the clustering code. > > > > Best, > > Erick > > > > > On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo > > wrote: > > > > > > ObjectUtils > > > > >
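On the "copy in the commons-lang dependency" option: Solr can load extra jars declared in `solrconfig.xml`. A hedged sketch follows; the directory and regex are assumptions, so point them at wherever you place a commons-lang 2.x jar alongside the other clustering libraries.

```xml
<!-- In solrconfig.xml, next to the existing clustering <lib> entries:   -->
<!-- load a locally supplied commons-lang 2.x jar so lingo3g can resolve -->
<!-- org.apache.commons.lang.ObjectUtils. The path here is an assumption. -->
<lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/"
     regex="commons-lang-.*\.jar" />
```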
Re: Solr on HDFS
> > If you think about it, having a shard with 3 replicas on top of a file system that does 3x replication seems a little excessive! https://issues.apache.org/jira/browse/SOLR-6305 should help here. I can take a look at merging the patch since looks like it has been helpful to others. Kevin Risden On Fri, Aug 2, 2019 at 10:09 AM Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Hi Kyle - Thank you. > > Our current index is split across 3 solr collections; our largest > collection is 26.8TBytes (80.5TBytes when 3x replicated in HDFS) across > 100 shards. There are 40 machines hosting this cluster. We've found > that when dealing with large collections having no replicas (but lots of > shards) ends up being more reliable since there is a much smaller > recovery time. We keep another 30 day index (1.4TBytes) that does have > replicas (40 shards, 3 replicas each), and if a node goes down, we > manually delete lock files and then bring it back up and yes - lots of > network IO, but it usually recovers OK. > > Having a large collection like this with no replicas seems like a recipe > for disaster. So, we've been experimenting with the latest version > (8.2) and our index process to split up the data into many solr > collections that do have replicas, and then build the list of > collections to search at query time. Our searches are date based, so we > can define what collections we want to query at query time. As a test, > we ran just two machines, HDFS, and 500 collections. One server ran out > of memory and crashed. We had over 1,600 lock files to delete. > > If you think about it, having a shard with 3 replicas on top of a file > system that does 3x replication seems a little excessive! I'd love to > see Solr take more advantage of a shared FS. Perhaps an idea is to use > HDFS but with an NFS gateway. Seems like that may be slow. 
> Architecturally, I love only having one large file system to manage > instead of lots of individual file systems across many machines. HDFS > makes this easy. > > -Joe > > On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote: > > Hi Joe, > > > > We fought with Solr on HDFS for quite some time, and faced similar issues > > as you're seeing. (See this thread, for example:" > > > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e > > ) > > > > The Solr lock files on HDFS get deleted if the Solr server gets shut down > > gracefully, but we couldn't always guarantee that in our environment so > we > > ended up writing a custom startup script to search for lock files on HDFS > > and delete them before solr startup. > > > > However, the issue that you mention of the Solr server rebuilding its > whole > > index from replicas on startup was enough of a show-stopper for us that > we > > switched away from HDFS to local disk. It literally made the difference > > between 24+ hours of recovery time after an unexpected outage to less > than > > a minute... > > > > If you do end up finding a solution to this issue, please post it to this > > mailing list, because there are others out there (like us!) who would > most > > definitely make use it. > > > > Thanks > > > > Kyle > > > > On Fri, 2 Aug 2019 at 08:58, Joe Obernberger < > joseph.obernber...@gmail.com> > > wrote: > > > >> Thank you. No, while the cluster is using Cloudera for HDFS, we do not > >> use Cloudera to manager the solr cluster. If it is a > >> configuration/architecture issue, what can I do to fix it? I'd like a > >> system where servers can come and go, but the indexes stay available and > >> recover automatically. Is that possible with HDFS? 
> >> While adding an alias to other collections would be an option, if that > >> collection is the only collection, or one that is currently needed, in a > >> live system, we can't bring it down, re-create it, and re-index when > >> that process may take weeks to do. > >> > >> Any ideas? > >> > >> -Joe > >> > >> On 8/1/2019 6:15 PM, Angie Rabelero wrote: > >>> I don’t think you’re using claudera or ambari, but ambari has an option > >> to delete the locks. This seems more a configuration/architecture isssue > >> than a realibility issue. You may want to spin up an alias while you > bring > >> down, clear locks and directories, recreate and index the affected > >> collection, while you work your other isues. > >>> On Aug 1, 2019, at 16:40, Joe Obernberger < > joseph.obernb
Re: How to use Parallel SQL Interface when basic auth is enabled on Solr cluster
Pretty sure you are running into https://issues.apache.org/jira/browse/SOLR-8213 Always looking for patches to help improve things :) Kevin Risden On Wed, Jul 24, 2019 at 4:50 AM Suril Shah wrote: > Hi, > I am using Solr Version 7.6.0 where Basic Authentication is enabled. I am > trying to use Parallel SQL to run some SQL queries. > > This is the code snippet that I am using to connect to Solr and run some > SQL queries on it. This works when authentication is not enabled on the > Solr cluster. > >public Connection getSolrSqlJDBCConnection(String aggregationMode) > throws SQLException > { > > String solrZkConnString = ":2181"; > > String collection = "customer"; > > String numWorkers = "2"; > > Connection solrSqlClientConn = null; > > try { > > solrSqlClientConn = DriverManager.*getConnection*("jdbc:solr://" + > solrZkConnString + "?collection=" > > + collection + "=" + aggregationMode + > "=" + numWorkers); > > > > } catch (SQLException e) { > > throw e; > > } > > return solrSqlClientConn; > > } > > > > public ResultSet executeSolrSqlStatement(String querySqlString, String > aggregationMode) throwsSQLException { > > try { > > Connection solrSqlJdbcConn = > getSolrSqlJDBCConnection(aggregationMode); > > Statement stmt = solrSqlJdbcConn.createStatement(); > > ResultSet rs = stmt.executeQuery(querySqlString); > > solrSqlJdbcConn.close(); > > return rs; > > } catch (SQLException e) { > > throw e; > > } > > } > > > >public void testSelectOnIndex() throws SQLException { > > String owner_id = "3cfc7734-e4b4-4c9b-b91e-44c8c5943fb0"; > > String solrSqlString = "select customer_id_s, customer_name_s, > country_s, city_s, postal_code_s, address_s from customer where owner_id_s > = '"+owner_id+"'"; > > System.*out*.println("solrSqlString = "+solrSqlString); > > try { > > ResultSet sqlResultSet = executeSolrSqlStatement(solrSqlString, > "map_reduce"); > > while (sqlResultSet.next()) { > > System.*out*.println("--- customer_id ---" + > sqlResultSet.getString("customer_id_s")); 
> > System.*out*.println("--- customer_name ---" + > sqlResultSet.getString("customer_name_s")); > > System.*out*.println("--- country ---" + > sqlResultSet.getString("country_s")); > > System.*out*.println("--- city ---" + > sqlResultSet.getString("city_s")); > > System.*out*.println("--- postalcode ---" + > sqlResultSet.getString("postal_code_s")); > > System.*out*.println("--- address ---" + > sqlResultSet.getString("address_s")); > > } > > } catch (SQLException e) { > > e.printStackTrace(); > > } > > } > > > When authentication is enabled I tried adding the username and password > JDBC connection string. > > Replaced one line in the getSolrSqlJDBCConnection() method: > >solrSqlClientConn = DriverManager.*getConnection*("jdbc:solr://" + > solrZkConnString + "?collection=" + collection + "=" + > aggregationMode + "=" + numWorkers,"",""); > > > The and here will be the username password for Solr. > > > On making the above change, we are getting the following error: > > > > java.sql.SQLException: java.sql.SQLException: java.io.IOException: --> > http://:8983/solr/customer_shard1_replica_n2/: An exception has > occurred on the server, refer to server log for details. > > at > io.strati.libs.forklift.org.apache.solr.client.solrj.io > .sql.StatementImpl.executeQueryImpl(StatementImpl.java:74) > > at > io.strati.libs.forklift.org.apache.solr.client.solrj.io > .sql.StatementImpl.executeQuery(StatementImpl.java:111) > > at io.strati.search.Test.executeSolrSqlStatement(Test.java:54) > > at io.strati.search.Test.main(Test.java:20) > > Caused by: java.sql.SQLException: java.io.IOException: --> http:// > :8983/solr/customer_shard1_replica_n2/: An exception has occurred > on the server, refer to server log for details. > > at > io.strati.libs.forklift.org.apache.solr.client.solrj.io > .sql.ResultSetImpl.(ResultSetImpl.java:83) > > at > io.strati.libs.forklift.org.apache.solr.client.solrj.io > .sql.StatementImpl.executeQueryImpl(StatementImpl.java:
Re: Solr Cloud Kerberos cookie rejected spnego
I don't think a Kerberos ticket without the hostname makes sense. You almost always need a valid hostname and DNS for Kerberos to work successfully. Kevin Risden On Sun, Jun 23, 2019 at 10:54 AM Rakesh Enjala wrote: > Hi Team, > > Enabled solrcloud-7.4.0 with kerberos. While creating a collection getting > below error > > org.apache.http.impl.auth.HttpAuthenticator; NEGOTIATE authentication > error: No valid credentials provided (Mechanism level: No valid credentials > provided (Mechanism level: Server not found in Kerberos database (7))) > org.apache.http.client.protocol.ResponseProcessCookies; Cookie rejected > [hadoop.auth="", version:0, domain:xxx.xxx.com, path:/, expiry: > Illegal > domain attribute "". Domain of origin: "localhost" > > enabled krb5 debug true and am able to find the actual problem is that > sname is HTTP/localh...@realm.com, it should be HTTP/@DOMAIN1.COM not the > localhost > > solr.in.sh > > SOLR_AUTH_TYPE="kerberos" > > SOLR_AUTHENTICATION_OPTS="-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin > -Djava.security.auth.login.config=/solr/jaas.conf > -Dsun.security.krb5.debug=true -Dsolr.kerberos.cookie.domain= > -Dsolr.kerberos.name.rules=DEFAULT -Dsolr.kerberos.principal=HTTP/@ > DOMAIN1.COM -Dsolr.kerberos.keytab=/solr/HTTP.keytab" > > Please help me out! > *Regards,* > *Rakesh Enjala* >
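To make Kevin's point concrete — the service principal must carry the node's resolvable FQDN, not localhost — a corrected solr.in.sh fragment might look like the following. The hostname, realm, and file paths are hypothetical placeholders; the property names match those in the original post:

```shell
# Hypothetical solr.in.sh fragment: the principal uses the node's FQDN
# (solr1.example.com) so the KDC finds HTTP/solr1.example.com@DOMAIN1.COM
# rather than HTTP/localhost@..., and the cookie domain matches that host.
SOLR_AUTH_TYPE="kerberos"
SOLR_AUTHENTICATION_OPTS="-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin \
 -Djava.security.auth.login.config=/solr/jaas.conf \
 -Dsolr.kerberos.cookie.domain=solr1.example.com \
 -Dsolr.kerberos.name.rules=DEFAULT \
 -Dsolr.kerberos.principal=HTTP/solr1.example.com@DOMAIN1.COM \
 -Dsolr.kerberos.keytab=/solr/HTTP.keytab"
# Setting SOLR_HOST makes Solr register under the FQDN instead of localhost:
SOLR_HOST="solr1.example.com"
```

The keytab would also need an entry for the FQDN-based principal, and DNS forward/reverse lookups for the host must resolve consistently.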
Re: Odd error with Solr 8 log / ingestion
Do you see a message about idle timeout? There is a jetty bug with HTTP/2 and idle timeout that causes some stream closing. The jira below says test error, but I'm pretty sure it could come up in real usage. * https://issues.apache.org/jira/browse/SOLR-13413 * https://github.com/eclipse/jetty.project/issues/3605 Kevin Risden On Thu, Jun 6, 2019 at 2:38 PM Erick Erickson wrote: > Probably your packet size is too big for the Solr<->Solr default settings. > Quick test would be to try sending 10 docs per packet, then 100, then 1,000 > etc. > > There’s not much to be gained efficiency-wise once you get past 100 > docs/shard, see: > https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/ > > Second, you’ll get improved throughput if you use SolrJ rather than a > straight HTTP connection, but your setup may not be amenable to that > alternative. > > Best, > Erick > > > On Jun 6, 2019, at 11:23 AM, Erie Data Systems > wrote: > > > > Hello everyone, > > > > I recently set up Solr 8 in SolrCloud mode; previously I was using > > standalone mode and was able to easily push 10,000 records in per HTTP > call > > with autocommit. Ingestion occurs when server A pushes (HTTPS) payload to > > server B (SolrCloud) on LAN network. > > > > However, once converted to SolrCloud (1 node, 3 shards, 1 replica) I am > > seeing the following error : > > > > ConcurrentUpdateHttp2SolrClient > > Error consuming and closing http response stream. > > > > I'm wondering what the possible causes could be; I'm not seeing much > > documentation online specific to Solr. > > > > Thanks in advance for any assistance, > > Craig > >
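Erick's suggestion — send the documents in fixed-size batches (try 10, then 100, then 1,000 per request) instead of one large payload — can be sketched as plain Java. The SolrJ call site is shown only as a comment, since the actual client setup depends on the cluster:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of batched ingestion: split the full document list into batches
// of at most batchSize, sending one request per batch. The SolrJ calls
// are indicated in comments; the partitioning itself is plain Java.
public class BatchedIndexing {

    // Split the document list into consecutive batches of at most batchSize.
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) docs.add(i);
        for (List<Integer> batch : partition(docs, 1000)) {
            // With SolrJ, roughly:
            //   client.add(collection, toSolrInputDocuments(batch));
            System.out.println("would send batch of " + batch.size());
        }
        // client.commit(collection);  // or rely on autoCommit, as above
    }
}
```

Comparing throughput at batch sizes of 10, 100, and 1,000 then shows whether the payload size is what trips the stream-closing error.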
Re: SolrJ, CloudSolrClient and basic authentication
Chris - not sure if what you are seeing is related to basic auth credentials not being sent until a 401. There was report of this behavior with Apache Knox in front of Solr. https://issues.apache.org/jira/browse/KNOX-1066 The jira above has an example of how to preemptively send basic auth instead of waiting for the 401 from the server. Kevin Risden On Fri, May 31, 2019 at 4:28 PM Christopher Schultz < ch...@christopherschultz.net> wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Dimitris, > > On 6/1/18 02:46, Dimitris Kardarakos wrote: > > Thanks a lot Shawn. I had tried with the documented approach, but > > since I use SolrClient.add to add documents to the index, I could > > not "port" the documented approach to my case (probably I do miss > > something). > > > > The custom HttpClient suggestion worked as expected! > > Can you please explain how you did this? > > I'm facing a problem where the simplest possible solution is giving > the error "org.apache.http.client.NonRepeatableRequestException: > Cannot retry request with a non-repeatable request entity.". > > It seems that SolrClient is using something like BasicHttpEntity which > isn't "repeatable" when using HTTP Basic auth (where the server is > supposed to challenge the client and the client only then sends the > credentials). I need to either make the client data repeatable (which > is in SolrClient, which I'd prefer to avoid) or I need to make > HttpClient use an "expectant" credential-sending technique, or I need > to just stuff things into a header manually. > > What did you do to solve this problem? It seems like this should > really probably come up more often than it does. Maybe nobody bothers > to lock-down their Solr instances? > > Thanks, > - -chris > > > On 31/05/2018 06:16 μμ, Shawn Heisey wrote: > >> On 5/31/2018 8:03 AM, Dimitris Kardarakos wrote: > >>> Following the feedback in the "Index protected zip" thread, I > >>> am trying to add documents to the index using SolrJ API. 
> >>> > >>> The server is in SolrCloud mode with BasicAuthPlugin for > >>> authentication. > >>> > >>> I have not managed to figure out how to pass username/password > >>> to my client. > >> There are two ways to approach this. > >> > >> One approach is to build a custom HttpClient object that uses > >> credentials by default, and then use that custom HttpClient > >> object to build your CloudSolrClient. Exactly how to correctly > >> build the HttpClient object will depend on exactly which > >> HttpClient version you've included into your program. If you go > >> with SolrJ dependency defaults, then the HttpClient version will > >> depend on the SolrJ version. > >> > >> The other approach is the method described in the documentation, > >> where credentials are added to each request object: > >> > >> https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin. > html#BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ > <https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin.html#BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ> > >> > >> > >> > >> > There are several different kinds of request objects. A few examples: > >> UpdateRequest, QueryRequest, CollectionAdminRequest. 
> >>
> >> Thanks, Shawn
> >>
>
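The KNOX-1066 approach Kevin references is preemptive Basic auth: compute the Authorization header up front and attach it to every request, so the server never issues the 401 challenge that triggers the NonRepeatableRequestException. The header math is standard (RFC 7617); wiring it into SolrJ — for example via an HttpRequestInterceptor on the HttpClient passed to CloudSolrClient — is an assumption sketched only in comments here:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of preemptive HTTP Basic auth. Instead of waiting for a 401
// challenge, the client computes the Authorization header itself and
// sends it with the first request. Attaching it to SolrJ's HttpClient
// (e.g. request.setHeader("Authorization", value) in an interceptor)
// is the assumed wiring, per the KNOX-1066 example.
public class PreemptiveBasicAuth {

    // RFC 7617: "Basic " + base64(username ":" password)
    public static String authorizationHeader(String user, String password) {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("user", "pass"));
    }
}
```

Because the credentials accompany the first attempt, the request entity is sent exactly once and never needs to be repeatable.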
Re: Status of solR / HDFS-v3 compatibility
For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency. Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3, but the libraries are Hadoop 3. The reference guide for 8.0+ hasn't been released yet, but also don't think it was updated. Kevin Risden On Thu, May 2, 2019 at 9:32 AM Nicolas Paris wrote: > Hi > > solr doc [1] says it's only compatible with hdfs 2.x > is that true ? > > > [1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html > > -- > nicolas >
Re: solr 7.x sql query returns null
Do you have multiple shards (including replicas) on the same host for the collection in question? Do the number of shards per host change on the export/index? Kevin Risden On Thu, Apr 18, 2019, 20:50 Joel Bernstein wrote: > That stack trace points here: > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/handler/sql/SolrSchema.java#L103 > > So the Sql Schema is not initializing properly for this dataset. I'd be > interested in understanding why. > > If you want to create a jira ticket and attach your schema we can track > this down. I'll probably attach a special binary to the ticket which has > additional logging so we can can find out what field is causing the > problem. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Thu, Apr 18, 2019 at 1:38 PM David Barnett wrote: > > > Hi Joel, besides the solr log is there anywhere else i need to go ? > > anything I need to set to get more detail ? > > > > On Thu, 18 Apr 2019 at 10:46, Joel Bernstein wrote: > > > > > This let's make sure the jdbc URL is correct. > > > > > > Reloading the collection shouldn't effect much unless the schema is > > > different. > > > > > > But as Shawn mentioned the stack trace is not coming from Solr. Is > there > > > more in the logs beyond the Calcite exception? > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > > > > On Thu, Apr 18, 2019 at 11:04 AM Shawn Heisey > > wrote: > > > > > > > On 4/18/2019 1:47 AM, David Barnett wrote: > > > > > I have a large solr 7.3 collection 400m + documents. > > > > > > > > > > I’m trying to use the Solr JDBC driver to query the data but I get > a > > > > > > > > > > java.io.IOException: Failed to execute sqlQuery 'select id from > > > document > > > > limit 10' against JDBC connection 'jdbc:calcitesolr:'. 
> > > > > Error while executing SQL "select id from document limit 10": null > > > > > > > > > > > > > > > > By the way, either that JDBC url is extremely incomplete or you nuked > > it > > > > from the log before sharing. Seeing the construction of the full URL > > > > might be helpful. If you need to redact it in some way for privacy > > > > concerns, do so in a way so that we can still tell what the URL was - > > > > change a real password to PASSWORD, change things like host names to > > > > something like HOST_NAME, etc. > > > > > > > > > Caused by: java.lang.NullPointerException > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.validate(VolcanoPlanner.java:891 > > > > > at > > > > > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:866) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101) > > > > > at > > > > > > > > > > org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:863) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101) > > > > > at > > > > > > > > > > org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336) > > > > > at > > > > > > > > > > org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496) > > > > > at > > > > > > > > > > org.apache.c
Re: All replicas created on the same node
Might be https://issues.apache.org/jira/browse/SOLR-13248 From the upgrade notes 7.7: SOLR-13248: The default replica placement strategy used in Solr has been reverted to the 'legacy' policy used by Solr 7.4 and previous versions. This is due to multiple bugs in the autoscaling based replica placement strategy that was made default in Solr 7.5 which causes multiple replicas of the same shard to be placed on the same node in addition to the maxShardsPerNode and createNodeSet parameters being ignored. Although the default has changed, autoscaling will continue to be used if a cluster policy or preference is specified or a collection level policy is in use. The default replica placement strategy can be changed to use autoscaling again by setting a cluster property: curl -X POST -H 'Content-type:application/json' --data-binary ' { "set-obj-property": { "defaults" : { "cluster": { "useLegacyReplicaAssignment":false } } } }' http://$SOLR_HOST:$SOLR_PORT/api/cluster Kevin Risden On Fri, Mar 8, 2019 at 3:04 PM levtannen wrote: > Hi community, > I have solr 7.6 running on three nodes with about 400 collections with one > shard and 3 replicas per collection. I want replicas to be spread between > all 3 nodes so that for every collection I have one replica per collection > on each node. > I create collections via the SolrJ code. > for (String collectionName: Names>){ > create = > CollectionAdminRequest.createCollection(collectionName, source, > 1, 3); > result = solrClient.request(create); > } > In solr 7.4 it worked fine, but in solr 7.6 created replicas are not spread > equally between nodes. In some collections all 3 replicas are created just > on one node, in some 2 replicas are created in one node and 1 in another > and some collections are created correctly: 1 replica per node. > Could anyone give me advice on why it happened and how to fix it? > > Thank you. > Lev Tannen > > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: solr reads whole index on startup
Kyle - Thanks so much for the followup on this. Rarely do we get to see results compared with detail. Can you share the Solr HDFS configuration settings that you tested with? Blockcache and direct memory size? I'd be curious just as a reference point. Kevin Risden On Thu, Dec 20, 2018 at 10:31 AM lstusr 5u93n4 wrote: > > Hi All, > > To close this off, I'm sad to report that we've come to a end with Solr on > HDFS. > > Here's what we finally did: > - created two brand-new identical Solr cloud clusters, one on HDFS and one > on local disk. > - 1 replica per node. Each node 16GB ram. > - Added documents. > - Compared start-up times for a single node after a graceful shutdown. > > What we observe: > - on startup, the replica will transition from "Gone" to "Down" fairly > quickly. (a few seconds) > - The replica then spends some time in the "Down" state before > transitioning to "Recovering" > - The replica stays in "Recovering" for some time, before transitioning to > "Active" > > Results for 75M docs in the replica, replica size 28.5GB: > > - HDFS > - Time in "Down": 4m 49s > - Time in "Recovering": 2m 30s > - Total time to restart: 7m 9s > > - Local Disk > - Time in "Down": 0m 5s > - Time in "Recovering": 0m 8s > - Total time to restart: 0m 13s > > > Results for 100M docs in the replica, replica size 37GB: > >- HDFS > - Time in "Down": 8m 30s > - Time in "Recovering": 5m 19s > - Total time to restart: 13m 49s > > - Local Disk > - Time in "Down": 0m 4s > - Time in "Recovering": 0m 10s > - Total time to restart: 0m 14s > > > Conclusions: > - As the index size grows, Solr on HDFS has a trend towards increasing > restart times that's not seen on local disk. > > Notes: > - HDFS in our environment is FINE. The network is FINE. We have hbase > servers running on the same ESXi hosts as Solr, they access the same HDFS > filesystem, and hbase bandwidth regularly exceeds 2GB/s. All latencies are > sub-millisecond. > - The values reported above are averages. 
There's some variance to the > results, but the averages are representative of the times we're seeing. > > Thanks for reading! > > Kyle > > > > On Mon, 10 Dec 2018 at 14:14, lstusr 5u93n4 wrote: > > > Hi Guys, > > > > > What OS is it on? > > CentOS 7 > > > > > With your indexes in HDFS, the HDFS software running > > > inside Solr also needs heap memory to operate, and is probably going to > > > set aside part of the heap for caching purposes. > > We still have the solr.hdfs.blockcache.slab.count parameter set to the > > default of 1, but we're going to tune this a bit and see what happens. > > > > > but for this setup, I'd definitely want a LOT more than 16GB.GB > > So where would you start? We can easily double the number of servers to 6, > > and put one replica on each (probably going to do this anyways.) Would you > > go bigger than 6 x 16GB ? Keeping in mind, even with our little 3 x 16GB we > > haven't had performance problems... This thread kind of diverged that way, > > but really the initial issue was just that the whole index seems to be read > > on startup. (Which I fully understand may be resource related, but I have > > yet to try reproduce on a smaller scale to confirm/deny.) > > > > > As Solr runs, it writes a GC log. Can you share all of the GC log files > > > that Solr has created? There should not be any proprietary information > > > in those files. > > > > This I can do. Actually, I've collected a lot of things, redacted any > > private info, and collected here into a series of logs / screenshots. > > > > So what I did: > > - 16:49 GMT -- stopped solr on one node (node 4) using bin/solr stop, and > > keeping the others alive.. Captured the solr log as it was stopping, and > > uploaded here: > > - https://pastebin.com/raw/UhSTdb1h > > > > - 17:00 GMT - restarted solr on the same node (other two stayed up the > > whole time) and let it run for an hour. 
Captured the solr logs since the > > startup here: > > - https://pastebin.com/raw/S4Z9XVrG > > > > - Observed the outbound network traffic from HDFS to this particular solr > > instance during this time, screenshotted it, and put the image here: (times > > are in EST for that screenshot) > > - https://imagebin.ca/v/4PY63LAMSVV1 > > > > - Screenshotted the resource usage on the
Re: solr reads whole index on startup
Do you have logs right before the following? "we notice that the nodes go into "Recovering" state for about 10-12 hours before finally coming alive." Is there a peersync failure or something else in the logs indicating why there is a full recovery? Kevin Risden On Wed, Dec 5, 2018 at 12:53 PM lstusr 5u93n4 wrote: > Hi All, > > We have a collection: > - solr 7.5 > - 3 shards, replication factor 2 for a total of 6 NRT replicas > - 3 servers, 16GB ram each > - 2 billion documents > - autoAddReplicas: false > - 2.1 TB on-disk index size > - index stored on hdfs on separate servers. > > If we (gracefully) shut down solr on all 3 servers, when we re-launch solr > we notice that the nodes go into "Recovering" state for about 10-12 hours > before finally coming alive. > > During this recovery time, we notice high network traffic outbound from our > HDFS servers to our solr servers. The sum total of which is roughly > equivalent to the index size on disk. > > So it seems to us that on startup, solr has to re-read the entire index > before coming back alive. > > 1. is this assumption correct? > 2. is there any way to mitigate this, so that solr can launch faster? > > Thanks! > > Kyle >
Re: SolrCloud Replication Failure
Erick Erickson - I don't have much time to chase this down. Do you think this a blocker for 7.6? It seems pretty serious. Jeremy - This would be a good JIRA to create - we can move the conversation there to try to get the right people involved. Kevin Risden On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith wrote: > Hi Susheel, > > Yes, it appears that under certain conditions, if a follower is down > when the leader gets an update, the follower will not receive that update > when it comes back (or maybe it receives the update and it's then > overwritten by its own transaction logs, I'm not sure). Furthermore, if > that follower then becomes the leader, it will replicate its own out of > date value back to the former leader, even though the version number is > lower. > > >-Jeremy > > > From: Susheel Kumar > Sent: Thursday, November 1, 2018 2:57:00 PM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud Replication Failure > > Are we saying it has to do something with stop and restarting replica's > otherwise I haven't seen/heard any issues with document updates and > forwarding to replica's... > > Thanks, > Susheel > > On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson > wrote: > > > So this seems like it absolutely needs a JIRA > > On Thu, Nov 1, 2018 at 9:39 AM > Kevin Risden > wrote: > > > > > > I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5 > > locally > > > without docker. I still see the same behavior where the latest updates > > > aren't on the replicas. I still don't know what is happening but it > > happens > > > without Docker :( > > > > > > > > > https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches > > > > > > Kevin Risden > > > > > > > > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden > wrote: > > > > > > > Erick - Yea thats a fair point. Would be interesting to see if this > > fails > > > > without Docker. 
> > > > > > > > Kevin Risden > > > > > > > > > > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson < > > erickerick...@gmail.com> > > > > wrote: > > > > > > > >> Kevin: > > > >> > > > >> You're also using Docker, right? Docker is not "officially" > supported > > > >> although there's some movement in that direction and if this is only > > > >> reproducible in Docker than it's a clue where to look > > > >> > > > >> Erick > > > >> On Wed, Oct 31, 2018 at 7:24 PM > > > >> Kevin Risden > > > >> wrote: > > > >> > > > > >> > I haven't dug into why this is happening but it definitely > > reproduces. I > > > >> > removed the local requirements (port mapping and such) from the > > gist you > > > >> > posted (very helpful). I confirmed this fails locally and on > Travis > > CI. > > > >> > > > > >> > > https://github.com/risdenk/test-solr-start-stop-replica-consistency > > > >> > > > > >> > I don't even see the first update getting applied from num 10 -> > 20. > > > >> After > > > >> > the first update there is no more change. > > > >> > > > > >> > Kevin Risden > > > >> > > > > >> > > > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith > > > > >> wrote: > > > >> > > > > >> > > Thanks Erick, this is 7.5.0. > > > >> > > > > > >> > > From: Erick Erickson > > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM > > > >> > > To: solr-user > > > >> > > Subject: Re: SolrCloud Replication Failure > > > >> > > > > > >> > > What version of solr? This code was pretty much rewriten in 7.3 > > IIRC > > > >> > > > > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > wrote: > > > >> > > > > > >> > > > Hi all, > > > >> > > > > > > >> > > > We are currently running a moderately large instance of > > > >> standalone > > > >> > > > solr and are preparing to switch to solr cloud
Re: solr cloud - hdfs folder structure best practice
I prefer a single HDFS home since it definitely simplifies things. No need to create folders for each node or anything like that if you add nodes to the cluster. The replicas underneath will get their own folders. I don't know if there are issues with autoAddReplicas or other types of failovers if there are different home folders. I've run Solr on HDFS with the same basic configs as listed here: https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html Kevin Risden On Fri, Nov 2, 2018 at 1:19 PM lstusr 5u93n4 wrote: > Hi All, > > Here's a question that I can't find an answer to in the documentation: > > When configuring solr cloud with HDFS, is it best to: > a) provide a unique hdfs folder for each solr cloud instance > or > b) provide the same hdfs folder to all solr cloud instances. > > So for example, if I have two solr cloud nodes, I can configure them either > with: > >node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1 >node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2 > > Or I could configure both nodes with: > > -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr > > In the second option, all solr cloud nodes can "see" all index files from > all other solr cloud nodes. Are there pros or cons to allowing the all of > the solr nodes to see all files in the collection? > > Thanks, > > Kyle >
Re: SolrCloud Replication Failure
I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5 locally without docker. I still see the same behavior where the latest updates aren't on the replicas. I still don't know what is happening but it happens without Docker :( https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches Kevin Risden On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden wrote: > Erick - Yea thats a fair point. Would be interesting to see if this fails > without Docker. > > Kevin Risden > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson > wrote: > >> Kevin: >> >> You're also using Docker, right? Docker is not "officially" supported >> although there's some movement in that direction and if this is only >> reproducible in Docker than it's a clue where to look >> >> Erick >> On Wed, Oct 31, 2018 at 7:24 PM >> Kevin Risden >> wrote: >> > >> > I haven't dug into why this is happening but it definitely reproduces. I >> > removed the local requirements (port mapping and such) from the gist you >> > posted (very helpful). I confirmed this fails locally and on Travis CI. >> > >> > https://github.com/risdenk/test-solr-start-stop-replica-consistency >> > >> > I don't even see the first update getting applied from num 10 -> 20. >> After >> > the first update there is no more change. >> > >> > Kevin Risden >> > >> > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith >> wrote: >> > >> > > Thanks Erick, this is 7.5.0. >> > > >> > > From: Erick Erickson >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM >> > > To: solr-user >> > > Subject: Re: SolrCloud Replication Failure >> > > >> > > What version of solr? This code was pretty much rewriten in 7.3 IIRC >> > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > > > >> > > > Hi all, >> > > > >> > > > We are currently running a moderately large instance of >> standalone >> > > > solr and are preparing to switch to solr cloud to help us scale >> up. 
I >> > > have >> > > > been running a number of tests using docker locally and ran into an >> issue >> > > > where replication is consistently failing. I have pared down the >> test >> > > case >> > > > as minimally as I could. Here's a link for the docker-compose.yml >> (I put >> > > > it in a directory called solrcloud_simple) and a script to run the >> test: >> > > > >> > > > >> > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489 >> > > > >> > > > >> > > > Here's the basic idea behind the test: >> > > > >> > > > >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2 >> > > > replicas (each node gets a replica). Just use the default schema, >> > > although >> > > > I've also tried our schema and got the same result. >> > > > >> > > > >> > > > 2) Shut down solr-2 >> > > > >> > > > >> > > > 3) Add 100 simple docs, just id and a field called num. >> > > > >> > > > >> > > > 4) Start solr-2 and check that it received the documents. It did! >> > > > >> > > > >> > > > 5) Update a document, commit, and check that solr-2 received the >> update. >> > > > It did! >> > > > >> > > > >> > > > 6) Stop solr-2, update the same document, start solr-2, and make >> sure >> > > that >> > > > it received the update. It did! >> > > > >> > > > >> > > > 7) Repeat step 6 with a new value. This time solr-2 reverts back >> to what >> > > > it had in step 5. >> > > > >> > > > >> > > > I believe the main issue comes from this in the logs: >> > > > >> > > > >> > > > solr-2_1 | 2018-10-31 17:04:26.135 INFO >> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr >> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test >> s:shard1 >> > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync: >> > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr Our >> versions >> > > are >> > > > newer. 
ourHighThreshold=1615861330901729280 >> > > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280 >> > > > otherHighest=1615861335081353216 >> > > > >> > > > PeerSync thinks the versions on solr-2 are newer for some reason, >> so it >> > > > doesn't try to sync from solr-1. In the final state, solr-2 will >> always >> > > > have a lower version for the updated doc than solr-1. I've tried >> this >> > > with >> > > > different commit strategies, both auto and manual, and it doesn't >> seem to >> > > > make any difference. >> > > > >> > > > Is this a bug with solr, an issue with using docker, or am I just >> > > > expecting too much from solr? >> > > > >> > > > Thanks for any insights you may have, >> > > > >> > > > Jeremy >> > > > >> > > > >> > > > >> > > >> >
Re: SolrCloud Replication Failure
Erick - Yea thats a fair point. Would be interesting to see if this fails without Docker. Kevin Risden On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson wrote: > Kevin: > > You're also using Docker, right? Docker is not "officially" supported > although there's some movement in that direction and if this is only > reproducible in Docker than it's a clue where to look > > Erick > On Wed, Oct 31, 2018 at 7:24 PM > Kevin Risden > wrote: > > > > I haven't dug into why this is happening but it definitely reproduces. I > > removed the local requirements (port mapping and such) from the gist you > > posted (very helpful). I confirmed this fails locally and on Travis CI. > > > > https://github.com/risdenk/test-solr-start-stop-replica-consistency > > > > I don't even see the first update getting applied from num 10 -> 20. > After > > the first update there is no more change. > > > > Kevin Risden > > > > > > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith > wrote: > > > > > Thanks Erick, this is 7.5.0. > > > > > > From: Erick Erickson > > > Sent: Wednesday, October 31, 2018 8:20:18 PM > > > To: solr-user > > > Subject: Re: SolrCloud Replication Failure > > > > > > What version of solr? This code was pretty much rewriten in 7.3 IIRC > > > > > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > > > > > > Hi all, > > > > > > > > We are currently running a moderately large instance of > standalone > > > > solr and are preparing to switch to solr cloud to help us scale up. > I > > > have > > > > been running a number of tests using docker locally and ran into an > issue > > > > where replication is consistently failing. I have pared down the > test > > > case > > > > as minimally as I could. 
Here's a link for the docker-compose.yml > (I put > > > > it in a directory called solrcloud_simple) and a script to run the > test: > > > > > > > > > > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489 > > > > > > > > > > > > Here's the basic idea behind the test: > > > > > > > > > > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2 > > > > replicas (each node gets a replica). Just use the default schema, > > > although > > > > I've also tried our schema and got the same result. > > > > > > > > > > > > 2) Shut down solr-2 > > > > > > > > > > > > 3) Add 100 simple docs, just id and a field called num. > > > > > > > > > > > > 4) Start solr-2 and check that it received the documents. It did! > > > > > > > > > > > > 5) Update a document, commit, and check that solr-2 received the > update. > > > > It did! > > > > > > > > > > > > 6) Stop solr-2, update the same document, start solr-2, and make sure > > > that > > > > it received the update. It did! > > > > > > > > > > > > 7) Repeat step 6 with a new value. This time solr-2 reverts back to > what > > > > it had in step 5. > > > > > > > > > > > > I believe the main issue comes from this in the logs: > > > > > > > > > > > > solr-2_1 | 2018-10-31 17:04:26.135 INFO > > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr > > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test > s:shard1 > > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync: > > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr Our > versions > > > are > > > > newer. ourHighThreshold=1615861330901729280 > > > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280 > > > > otherHighest=1615861335081353216 > > > > > > > > PeerSync thinks the versions on solr-2 are newer for some reason, so > it > > > > doesn't try to sync from solr-1. In the final state, solr-2 will > always > > > > have a lower version for the updated doc than solr-1. 
I've tried > this > > > with > > > > different commit strategies, both auto and manual, and it doesn't > seem to > > > > make any difference. > > > > > > > > Is this a bug with solr, an issue with using docker, or am I just > > > > expecting too much from solr? > > > > > > > > Thanks for any insights you may have, > > > > > > > > Jeremy > > > > > > > > > > > > > > > >
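For readers parsing that PeerSync log line, the four logged numbers themselves show the contradiction. A rough reading (assuming only that version numbers increase with time — this is not a restatement of Solr's actual PeerSync algorithm):

```python
# The four values from the solr-2 PeerSync log line above. This is only
# an illustration of why the decision looks wrong -- not Solr's code.
our_high_threshold  = 1615861330901729280
other_low_threshold = 1615861314086764545
our_highest         = 1615861330901729280
other_highest       = 1615861335081353216

# solr-2's newest update version sits above solr-1's low threshold, so
# the replicas' recent histories overlap and solr-2 declares itself
# "newer" ...
print(our_highest >= other_low_threshold)   # True
# ... even though solr-1's newest version is actually higher:
print(other_highest > our_highest)          # True
```

That mismatch is exactly what step 7 observes: solr-2 skips syncing and keeps the stale value while solr-1 holds the higher-versioned update.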
Re: SolrCloud Replication Failure
So I just added PRs 5.5, 6.6, 7.1, 7.2, 7.3, 7.4, and 7.5. They all seem to have the exact same behavior... I don't have much more insight here but it doesn't seem to be correct. Kevin Risden On Thu, Nov 1, 2018 at 9:45 AM Kevin Risden wrote: > Ahhh your PR triggered an idea. I'll open a few PRs adjusting the Solr > version from latest back to earlier 7.x versions. See which version the > problem was introduced in. > > Kevin Risden > > > On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith wrote: > >> Thanks so much for looking into this and cleaning up my code. >> >> >> I added a pull request to show some additional strange behavior. If we >> restart solr-1, making solr-2 the leader, the out of date value of [10] >> gets propagated back to solr-1. Perhaps this will give a hint as to what >> is going on. >>
Re: SolrCloud Replication Failure
Ahhh your PR triggered an idea. I'll open a few PRs adjusting the Solr version from latest back to earlier 7.x versions. See which version the problem was introduced in. Kevin Risden On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith wrote: > Thanks so much for looking into this and cleaning up my code. > > > I added a pull request to show some additional strange behavior. If we > restart solr-1, making solr-2 the leader, the out of date value of [10] > gets propagated back to solr-1. Perhaps this will give a hint as to what > is going on. >
Re: SolrCloud Replication Failure
I haven't dug into why this is happening but it definitely reproduces. I removed the local requirements (port mapping and such) from the gist you posted (very helpful). I confirmed this fails locally and on Travis CI. https://github.com/risdenk/test-solr-start-stop-replica-consistency I don't even see the first update getting applied from num 10 -> 20. After the first update there is no more change. Kevin Risden On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith wrote: > Thanks Erick, this is 7.5.0. > > From: Erick Erickson > Sent: Wednesday, October 31, 2018 8:20:18 PM > To: solr-user > Subject: Re: SolrCloud Replication Failure > > What version of solr? This code was pretty much rewriten in 7.3 IIRC > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > > Hi all, > > > > We are currently running a moderately large instance of standalone > > solr and are preparing to switch to solr cloud to help us scale up. I > have > > been running a number of tests using docker locally and ran into an issue > > where replication is consistently failing. I have pared down the test > case > > as minimally as I could. Here's a link for the docker-compose.yml (I put > > it in a directory called solrcloud_simple) and a script to run the test: > > > > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489 > > > > > > Here's the basic idea behind the test: > > > > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2 > > replicas (each node gets a replica). Just use the default schema, > although > > I've also tried our schema and got the same result. > > > > > > 2) Shut down solr-2 > > > > > > 3) Add 100 simple docs, just id and a field called num. > > > > > > 4) Start solr-2 and check that it received the documents. It did! > > > > > > 5) Update a document, commit, and check that solr-2 received the update. > > It did! > > > > > > 6) Stop solr-2, update the same document, start solr-2, and make sure > that > > it received the update. It did! 
> > > > > > 7) Repeat step 6 with a new value. This time solr-2 reverts back to what > > it had in step 5. > > > > > > I believe the main issue comes from this in the logs: > > > > > > solr-2_1 | 2018-10-31 17:04:26.135 INFO > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1 > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync: > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr Our versions > are > > newer. ourHighThreshold=1615861330901729280 > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280 > > otherHighest=1615861335081353216 > > > > PeerSync thinks the versions on solr-2 are newer for some reason, so it > > doesn't try to sync from solr-1. In the final state, solr-2 will always > > have a lower version for the updated doc than solr-1. I've tried this > with > > different commit strategies, both auto and manual, and it doesn't seem to > > make any difference. > > > > Is this a bug with solr, an issue with using docker, or am I just > > expecting too much from solr? > > > > Thanks for any insights you may have, > > > > Jeremy > > > > > > >
Re: hdfs - documents missing after hard poweroff
Also do you have auto add replicas turned on for these collections over HDFS? Kevin Risden On Wed, Oct 31, 2018 at 8:20 PM Kevin Risden wrote: > So I'm definitely curious what is going on here. > > Are you still able to reproduce this? Can you check if files have been > modified on HDFS? I'd be curious if tlogs or the index is changing > underneath for the different restarts. Since there is no new indexing I > would guess not but something to check. > > Can you run check index on the index to make sure its not corrupt when you > don't get the full result set. > > Kevin Risden > > > On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham > wrote: > >> Hi, >> >> Sometimes after a full poweroff of the solr cloud nodes, we see missing >> documents from the index. Is there anything about our setup or our >> recovery >> procedure that could cause this? Details are below: >> >> We see the following (somewhat random) behaviour: >> >> - add 10 documents to index. Commit. >> - query for all documents - 10 documents returned. >> - restart all solr nodes and reset the collection (procedure is below). >> - query for all documents 10 documents returned. >> - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned. >> >> To summarize, after a full reboot of all the solr nodes, we are finding >> that (sometimes) not all documents are in the index. This situation >> doesn't >> remedy itself by waiting. Restarting all will sometimes re-add them, >> sometimes not. >> >> Our procedure for recovering from a hard poweroff is: >> - manually delete all *.lock files from the index folders on hdfs. >> - fully delete the znode from zookeeper. >> - re-add an empty znode in zookeeper. >> - start up all solr nodes. >> - re-add the configset. >> - re-issue the collection create command. >> >> After doing the above, we find that we are able to see all of the files in >> the index about 60% of the time. Other times, we are missing some >> documents. 
>> >> Some other things about our environment: >> - we're doing this test with 1 collection that has 18 shards distributed >> across 3 solr cloud nodes. >> - solr version 7.5.0 >> - hdfs is not running on the solr nodes, and is not being restarted. >> >> Any thoughts or tips are greatly appreciated, >> >> Kyle >> >> -- >> CONFIDENTIALITY NOTICE: The information contained in this email is >> privileged and confidential and intended only for the use of the >> individual >> or entity to whom it is addressed. If you receive this message in >> error, >> please notify the sender immediately at 613-729-1100 and destroy the >> original message and all copies. Thank you. >> >
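Kevin's "run check index" suggestion refers to Lucene's CheckIndex tool. A sketch of invoking it — the jar and index paths below are assumptions for a typical Solr 7.5.0 install, and for an index stored on HDFS you would run the tool against a local copy of the shard's index directory:

```python
# Sketch: build the Lucene CheckIndex command line. Paths are assumptions
# for a typical Solr 7.5.0 layout -- adjust to your install.
import subprocess

lucene_core = "/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-7.5.0.jar"
index_dir = "/tmp/copied-replica/data/index"   # a LOCAL COPY of the shard index

cmd = ["java", "-cp", lucene_core,
       "org.apache.lucene.index.CheckIndex", index_dir]
print(" ".join(cmd))                 # inspect before running
# subprocess.run(cmd, check=True)    # uncomment to actually run the check
```

CheckIndex also has an `-exorcise` mode that "repairs" by dropping unreadable segments (losing their documents), so only ever run that against a copy.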
Re: hdfs - documents missing after hard poweroff
So I'm definitely curious what is going on here. Are you still able to reproduce this? Can you check if files have been modified on HDFS? I'd be curious if tlogs or the index is changing underneath for the different restarts. Since there is no new indexing I would guess not but something to check. Can you run check index on the index to make sure its not corrupt when you don't get the full result set. Kevin Risden On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham wrote: > Hi, > > Sometimes after a full poweroff of the solr cloud nodes, we see missing > documents from the index. Is there anything about our setup or our recovery > procedure that could cause this? Details are below: > > We see the following (somewhat random) behaviour: > > - add 10 documents to index. Commit. > - query for all documents - 10 documents returned. > - restart all solr nodes and reset the collection (procedure is below). > - query for all documents 10 documents returned. > - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned. > > To summarize, after a full reboot of all the solr nodes, we are finding > that (sometimes) not all documents are in the index. This situation doesn't > remedy itself by waiting. Restarting all will sometimes re-add them, > sometimes not. > > Our procedure for recovering from a hard poweroff is: > - manually delete all *.lock files from the index folders on hdfs. > - fully delete the znode from zookeeper. > - re-add an empty znode in zookeeper. > - start up all solr nodes. > - re-add the configset. > - re-issue the collection create command. > > After doing the above, we find that we are able to see all of the files in > the index about 60% of the time. Other times, we are missing some > documents. > > Some other things about our environment: > - we're doing this test with 1 collection that has 18 shards distributed > across 3 solr cloud nodes. > - solr version 7.5.0 > - hdfs is not running on the solr nodes, and is not being restarted. 
> > Any thoughts or tips are greatly appreciated, > > Kyle >
Re: Extracting top level URL when indexing document
Looks like stop words (in, and, on) are what is breaking it. The regex looks like it is correct. Kevin Risden On Tue, Jun 12, 2018, 18:02 Hanjan, Harinder wrote: > Hello! > > I am indexing web documents and have a need to extract their top-level URL > to be stored in a different field. I have had some success with the > PatternTokenizerFactory (relevant schema bits at the bottom) but the > behavior appears to be inconsistent. Most of the time, the top-level URL > is extracted just fine, but for some documents it is being cut off. > > Examples:
> URL | Extracted URL | Comment
> http://www.calgaryarb.ca/eCourtPublic/15M2018.pdf | http://www.calgaryarb.ca | Success
> http://www.calgarymlc.ca/about-cmlc/ | http://www.calgarymlc.ca | Success
> http://www.calgarypolicecommission.ca/reports.php | http://www.calgarypolicecommissio | Fail
> https://attainyourhome.com/ | https://attai | Fail
> https://liveandplay.calgary.ca/DROPIN/page/dropin | https://livea | Fail
> > Relevant schema: > > > multiValued="false"/> > > sortMissingLast="true"> > > > class="solr.PatternTokenizerFactory" > > pattern="^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)" > group="0"/> > > > > > I have tested the regex and it is matching things fine. Please see > https://regex101.com/r/wN6cZ7/358. > So it appears that I have a gap in my understanding of how Solr > PatternTokenizerFactory works. I would appreciate any insight on the issue. > The hostname field will be used in facet queries. > > Thank you! > Harinder > > > NOTICE - > This communication is intended ONLY for the use of the person or entity > named above and may contain information that is confidential or legally > privileged.
If you are not the intended recipient named above or a person > responsible for delivering messages or communications to the intended > recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying > of this communication or any of the information contained in it is strictly > prohibited. If you have received this communication in error, please notify > us immediately by telephone and then destroy or delete this communication, > or return it to us by mail if requested by us. The City of Calgary thanks > you for your attention and co-operation. >
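One detail consistent with Kevin's observation about in/and/on: every failing example is cut immediately before a letter "n" (calgarypolicecommissio|n, attai|n, livea|nd). In the pattern as posted, the character classes `[^@/n]` and `[^:/n]` contain a literal `n` (the backslash of an intended `\n` may have been lost in transit), so the tokenizer stops at the first "n" in the hostname. A quick check outside Solr — the "fixed" pattern is a suggested variant, not from the thread:

```python
import re

# Pattern exactly as posted in the schema above: the character classes
# exclude the literal letter "n", which truncates hosts at their first "n".
posted = re.compile(r"^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)")

# Hypothetical fix (an assumption, not from the thread): drop the stray
# "n" from the classes and escape the dot after www.
fixed = re.compile(r"^https?://(?:[^@/]+@)?(?:www\.)?([^:/]+)")

url = "http://www.calgarypolicecommission.ca/reports.php"
print(posted.match(url).group(0))  # http://www.calgarypolicecommissio
print(fixed.match(url).group(0))   # http://www.calgarypolicecommission.ca
```

The posted pattern reproduces all three "Fail" rows exactly (and the "Success" rows, whose hosts happen to contain no "n" before the truncation point), which supports this diagnosis over a tokenizer quirk.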
Re: Solr OOM Crashes / JVM tuning advice
I'm going to share how I've debugged a similar OOM crash and solving it had nothing to do with increasing heap. https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html This is specifically for Apache Ranger and how to fix it but you can treat it just like any application using Solr. There were a few things that caused issues "out of the blue": - Document TTL - The documents getting deleted after some time would trigger OOM (due to caches taking up too much heap) - Extra query load - caches again taking up too much memory - Extra inserts - too many commits refreshing caches and again going OOM Many of these can be reduced by using docvalues for fields that you typically sort/filter on. Kevin Risden On Wed, Apr 11, 2018 at 6:01 PM, Deepak Goel <deic...@gmail.com> wrote: > A few observations: > > 1. The Old Gen Heap on 9th April is about 6GB occupied which then runs up > to 9+GB on 10th April (It steadily increases throughout the day) > 2. The Old Gen GC is never able to reclaim any free memory > > > > Deepak > "Please stop cruelty to Animals, help by becoming a Vegan" > +91 73500 12833 > deic...@gmail.com > > Facebook: https://www.facebook.com/deicool > LinkedIn: www.linkedin.com/in/deicool > > "Plant a Tree, Go Green" > > On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller < > aharrison-ful...@mintel.com> wrote: > > > In addition, here is the GC log leading up to the crash. > > > > https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_ > > 20180410_1009.zip?dl=0 > > > > Thanks! > > > > Adam > > > > On 11 April 2018 at 16:18, Adam Harrison-Fuller < > > aharrison-ful...@mintel.com > > > wrote: > > > > > Thanks for the advice so far. > > > > > > The directoryFactory is set to ${solr.directoryFactory:solr. > > NRTCachingDirectoryFactory}. > > > > > > > > > The servers workload is predominantly queries with updates taking place > > > once a day. It seems the servers are more likely to go down whilst the > > > servers are indexing but not exclusively so. 
> > > > > > I'm having issues locating the actual out of memory exception. I can > > tell > > > that it has ran out of memory as its called the oom_killer script which > > as > > > left a log file in the logs directory. I cannot find the actual > > exception > > > in the solr.log or our solr_gc.log, any suggestions? > > > > > > Cheers, > > > Adam > > > > > > > > > On 11 April 2018 at 15:49, Walter Underwood <wun...@wunderwood.org> > > wrote: > > > > > >> For readability, I’d use -Xmx12G instead of > -XX:MaxHeapSize=12884901888. > > >> Also, I always use a start size the same as the max size, since > servers > > >> will eventually grow to the max size. So: > > >> > > >> -Xmx12G -Xms12G > > >> > > >> wunder > > >> Walter Underwood > > >> wun...@wunderwood.org > > >> http://observer.wunderwood.org/ (my blog) > > >> > > >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar < > sujaybawas...@gmail.com> > > >> wrote: > > >> > > > >> > What is directory factory defined in solrconfig.xml? Your JVM heap > > >> should > > >> > be tuned up with respect to that. > > >> > How solr is being use, is it more updates and less query or less > > >> updates > > >> > more queries? > > >> > What is OOM error? Is it frequent GC or Error 12? > > >> > > > >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller < > > >> > aharrison-ful...@mintel.com> wrote: > > >> > > > >> >> Hey Jesus, > > >> >> > > >> >> Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to > > >> them. > > >> >> > > >> >> Cheers! > > >> >> Adam > > >> >> > > >> >> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> > > >> wrote: > > >> >> > > >> >>> Hi Adam, > > >> >>> > > >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of > physical > > >> RAM, > > >> >>> your JVM can afford more RAM without threading penalties due to > > >> outside > > >> >>> heap RAM la
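Kevin's docValues advice at the top of the thread can be sketched concretely. A hypothetical schema fragment (field names invented for illustration, not from the thread): with docValues enabled, sorting, faceting, and filtering are served from on-disk column-oriented structures instead of heap-resident caches, which directly addresses the cache-driven OOM pattern described above.

```xml
<!-- Hypothetical fields for illustration: enable docValues on anything
     you routinely sort, facet, or filter on. -->
<field name="created_at" type="pdate"  indexed="true" stored="true" docValues="true"/>
<field name="status"     type="string" indexed="true" stored="true" docValues="true"/>
```

Note that changing docValues on an existing field requires reindexing for existing documents to benefit.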
Re: solr 5.2->7.2, suggester failure
It looks like there were changes in Lucene 7.0 that limited the size of the automaton to prevent overflowing the stack. https://issues.apache.org/jira/browse/LUCENE-7914 The commit being: https://github.com/apache/lucene-solr/commit/7dde798473d1a8640edafb41f28ad25d17f25a2d Kevin Risden On Tue, Apr 3, 2018 at 1:45 PM, David Hastings <hastings.recurs...@gmail.com > wrote: > For data, its primarily a lot of garbage, around 200k titles, varying > length. im actually looking through my application now to see if I even > still use it or if it was an early experiment. I am just finding it odd > thats its failing in 7 but does fine on 5 > > On Tue, Apr 3, 2018 at 2:41 PM, Erick Erickson <erickerick...@gmail.com> > wrote: > > > What kinds of things go into your title field? On first blush that's a > > bit odd for a multi-word title field since it treats the entire input > > as a single string. The code is trying to build a large FST to hold > > all of this data. Would AnalyzingInfixLookupFactory or similar make > > more sense? > > > > buildOnStartup and buildOnOptimize are other red flags. This means > > that every time you start up, the data for the title field is read > > from disk and the FST is built (or index if you use a different impl). > > On a large corpus this may take many minutes. > > > > Best, > > Erick > > > > On Tue, Apr 3, 2018 at 11:28 AM, David Hastings > > <hastings.recurs...@gmail.com> wrote: > > > Hey all, I recently got a 7.2 instance up and running, and it seems to > be > > > going well however, I have ran into this when creating one of my > indexes, > > > and was wondering if anyone had a quick idea right off the top of their > > > head. 
> > > > > > solrconfig: > > > > > > > > > > > > fixspell > > > FuzzyLookupFactory > > > > > > string > > > > > > DocumentDictionaryFactory > > > title > > > true > > > true > > > > > > > > > > > > received error: > > > > > > > > > ERROR true > > > SuggestComponent > > > Exception in building suggester index for: fixspell > > > java.lang.IllegalArgumentException: input automaton is too large: 1001 > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse( > > Operations.java:1298) > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse( > > Operations.java:1306) > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse( > > Operations.java:1306) > > > > > > . > > > > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse( > > Operations.java:1306) > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse( > > Operations.java:1306) > > > at > > > org.apache.lucene.util.automaton.Operations.topoSortStates(Operations. > > java:1275) > > > at > > > org.apache.lucene.search.suggest.analyzing. > > AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:292) > > > at > > > org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester. > > toAutomaton(AnalyzingSuggester.java:854) > > > at > > > org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build( > > AnalyzingSuggester.java:430) > > > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190) > > > at > > > org.apache.solr.spelling.suggest.SolrSuggester.build( > > SolrSuggester.java:181) > > > at > > > org.apache.solr.handler.component.SuggestComponent$SuggesterListener. > > buildSuggesterIndex(SuggestComponent.java:529) > > > at > > > org.apache.solr.handler.component.SuggestComponent$ > > SuggesterListener.newSearcher(SuggestComponent.java:511) > > > at org.apache.solr.core.SolrCore.lambda$getSearcher$17( > > SolrCore.java:2275) > > >
Re: Ingestion not scaling horizontally as I add more cores to Solr
When you say "multiple machines", was these all local machines or vms or something else? I worked with a group once that used laptops to benchmark a service and it was a WiFi network limit that caused weird results. LAN connections or even better a dedicated client machine would help push more documents. Kevin Risden On Thu, Jan 11, 2018 at 11:39 AM, Shashank Pedamallu <spedama...@vmware.com> wrote: > Thank you very much for the reply Shawn. Is the jmeter running on a > different machine from Solr or on the same machine? > Solr is running on a dedicated VM. And I’ve tried to split the client > requests from multiple machines but the result was not different. So, I > don’t think the bottleneck is with the client side. > > Thanks, > Shashank > > > On 1/10/18, 10:54 PM, "Shawn Heisey" <apa...@elyograg.org> wrote: > > On 1/10/2018 12:58 PM, Shashank Pedamallu wrote: > > As you can see, the number of documents being ingested per core is > not scaling horizontally as I'm adding more cores. Rather the total number > of documents getting ingested for Solr JVM is being topped around 90k > documents per second. > > I would call 90K documents per second a very respectable speed. I > can't > get my indexing to happen at anywhere near that rate. My indexing is > not multi-threaded, though. > > > From the iostats and top commands, I do not see any bottlenecks > with the iops or cpu respectively, CPU usaeg is around 65% and a sample of > iostats is below: > > > > avg-cpu: %user %nice %system %iowait %steal %idle > > > >55.320.002.331.640.00 40.71 > > > > Device:tpskB_read/skB_wrtn/skB_read > kB_wrtn > > > > sda5 2523.00 45812.00298312.00 45812 > 298312 > > Nearly 300 megabytes per second write speed? That's a LOT of data. > This storage must be quite a bit better than a single spinning disk. > You won't get that kind of sustained transfer speed out of standard > spinning disks unless they are using something like RAID10 or RAID0. 
> This transfer speed is also well beyond the capabilities of Gigabit > Ethernet. > > When Gus asked whether you were sending documents to the cloud from > your > local machine, I don't think he was referring to a public cloud. I > think he assumed you were running SolrCloud, so "cloud" was probably > referring to your Solr installation, not a public cloud service. If I > had to guess, I think the intent was to find out what caliber of > machine > you're using to send the indexing requests. > > I don't know if the bottleneck is on the client side or the server > side. > But I would imagine that with everything on a single machine, you may > not be able to get the ingestion rate to go much higher. > > Is the jmeter running on a different machine from Solr or on the same > machine? > > Thanks, > Shawn > > >
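Shawn's bandwidth observation is easy to verify with back-of-envelope arithmetic on the iostat sample — nothing Solr-specific, but it supports his reading that the storage is local and fast, and that the ~90k docs/sec ceiling is a shared-resource limit on the single machine rather than a per-core one:

```python
# Back-of-envelope check of the iostat sample quoted above.
kb_wrtn_per_s = 298312                      # sda5 write rate from iostat
mib_per_s = kb_wrtn_per_s / 1024            # ~291 MiB/s sustained writes
gigabit_mib_per_s = 1000**3 / 8 / 2**20     # ~119 MiB/s, ignoring overhead

print(round(mib_per_s, 1))                  # 291.3
print(mib_per_s > gigabit_mib_per_s)        # True: faster than 1 GbE can carry
```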
Re: Recovery Issue - Solr 6.6.1 and HDFS
Thanks for the detailed answers Joe. Definitely sounds like you covered most of the easy HDFS performance items. Kevin Risden On Wed, Nov 22, 2017 at 7:44 AM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Hi Kevin - > * HDFS is part of Cloudera 5.12.0. > * Solr is co-located in most cases. We do have several nodes that run on > servers that are not data nodes, but most do. Unfortunately, our nodes are > not the same size. Some nodes have 8TBytes of disk, while our largest > nodes are 64TBytes. This results in a lot of data that needs to go over > the network. > > * Command is: > /usr/lib/jvm/jre-1.8.0/bin/java -server -Xms12g -Xmx16g -Xss2m > -XX:+UseG1GC -XX:MaxDirectMemorySize=11g -XX:+PerfDisableSharedMem > -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=16m > -XX:MaxGCPauseMillis=300 -XX:InitiatingHeapOccupancyPercent=75 > -XX:+UseLargePages -XX:ParallelGCThreads=16 -XX:-ResizePLAB > -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -Xloggc:/opt/solr6/server/logs/solr_gc.log -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M -DzkClientTimeout=30 > -DzkHost=frodo.querymasters.com:2181,bilbo.querymasters.com:2181, > gandalf.querymasters.com:2181,cordelia.querymasters.com:2181,cressida. 
> querymasters.com:2181/solr6.6.0 -Dsolr.log.dir=/opt/solr6/server/logs > -Djetty.port=9100 -DSTOP.PORT=8100 -DSTOP.KEY=solrrocks -Dhost=tarvos > -Duser.timezone=UTC -Djetty.home=/opt/solr6/server > -Dsolr.solr.home=/opt/solr6/server/solr -Dsolr.install.dir=/opt/solr6 > -Dsolr.clustering.enabled=true -Dsolr.lock.type=hdfs > -Dsolr.autoSoftCommit.maxTime=12 -Dsolr.autoCommit.maxTime=180 > -Dsolr.solr.home=/etc/solr6 -Djava.library.path=/opt/cloud > era/parcels/CDH/lib/hadoop/lib/native -Xss256k -Dsolr.log.muteconsole > -XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100 > /opt/solr6/server/logs -jar start.jar --module=http > > * We have enabled short circuit reads. > > Right now, we have a relatively small block cache due to the requirements > that the servers run other software. We tried to find the best balance > between block cache size, and RAM for programs, while still giving enough > for local FS cache. This came out to be 84 128M blocks - or about 10G for > the cache per node (45 nodes total). > > class="solr.HdfsDirectoryFactory"> > true > true > 84 > true bool> > 16384 > true > true > 128 > 1024 > hdfs://nameservice1:8020/solr6.6.0 r> > /etc/hadoop/conf.cloudera.hdfs1 r> > > > Thanks for reviewing! > > -Joe > > > > On 11/22/2017 8:20 AM, Kevin Risden wrote: > >> Joe, >> >> I have a few questions about your Solr and HDFS setup that could help >> improve the recovery performance. >> >> * Is HDFS part of a distribution from Hortonworks, Cloudera, etc? >> * Is Solr colocated with HDFS data nodes? >> * What is the output of "ps aux | grep solr"? (specifically looking for >> the >> Java arguments that are being set.) >> >> Depending on how Solr on HDFS was setup, there are some potentially simple >> settings that can help significantly improve performance. >> >> 1) Short circuit reads >> >> If Solr is colocated with an HDFS datanode, short circuit reads can >> improve >> read performance since it skips a network hop if the data is local to that >> node. 
This requires HDFS native libraries to be added to Solr. >> >> 2) HDFS block cache in Solr >> >> Solr without HDFS uses the OS page cache to handle caching data for >> queries. With HDFS, Solr has a special HDFS block cache which allows for >> caching HDFS blocks. This significantly helps query performance. There are >> a few configuration parameters that can help here. >> >> Kevin Risden >> >> On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net >> > >> wrote: >> >> Hi Joe, >>> >>> sorry, I have not seen that problem. I would normally not delete a >>> replica >>> if the shard is down but only if there is an active shard. Without an >>> active leader the replica should not be able to recover. I also just had >>> a >>> case where all replicas of a shard stayed in down state and restarts >>> didn't >>> help. This was however also caused by lock files. Once I cleaned them up >>> and restarted all Solr instances that had a
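Short-circuit reads as discussed above also need to be enabled on the HDFS side. A minimal sketch of the relevant hdfs-site.xml settings — the socket path is an illustrative assumption, not taken from this thread; use whatever path your datanodes are configured with:

```xml
<!-- Sketch: enable HDFS short-circuit local reads (hdfs-site.xml).
     The domain socket path below is an assumption for illustration. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
```

Both the client (Solr, via the HDFS native libraries) and the datanodes must see these settings for the network hop to be skipped.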
Re: Recovery Issue - Solr 6.6.1 and HDFS
Joe, I have a few questions about your Solr and HDFS setup that could help improve the recovery performance. * Is HDFS part of a distribution from Hortonworks, Cloudera, etc? * Is Solr colocated with HDFS data nodes? * What is the output of "ps aux | grep solr"? (specifically looking for the Java arguments that are being set.) Depending on how Solr on HDFS was setup, there are some potentially simple settings that can help significantly improve performance. 1) Short circuit reads If Solr is colocated with an HDFS datanode, short circuit reads can improve read performance since it skips a network hop if the data is local to that node. This requires HDFS native libraries to be added to Solr. 2) HDFS block cache in Solr Solr without HDFS uses the OS page cache to handle caching data for queries. With HDFS, Solr has a special HDFS block cache which allows for caching HDFS blocks. This significantly helps query performance. There are a few configuration parameters that can help here. Kevin Risden On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote: > Hi Joe, > > sorry, I have not seen that problem. I would normally not delete a replica > if the shard is down but only if there is an active shard. Without an > active leader the replica should not be able to recover. I also just had a > case where all replicas of a shard stayed in down state and restarts didn't > help. This was however also caused by lock files. Once I cleaned them up > and restarted all Solr instances that had a replica they recovered. > > For the lock files I discovered that the index is not always in the > "index" folder but can also be in an index.<timestamp> folder. There can be > an "index.properties" file in the "data" directory in HDFS and this > contains the correct index folder name. > > If you are really desperate you could also delete all but one replica so > that the leader election is quite trivial.
But this does of course increase > the risk of finally losing the data quite a bit. So I would try looking > into the code and figure out what the problem is here and maybe compare the > state in HDFS and ZK with a shard that works. > > regards, > Hendrik > > > On 21.11.2017 23:57, Joe Obernberger wrote: > >> Hi Hendrik - the shards in question have three replicas. I tried >> restarting each one (one by one) - no luck. No leader is found. I deleted >> one of the replicas and added a new one, and the new one also shows as >> 'down'. I also tried the FORCELEADER call, but that had no effect. I >> checked the OVERSEERSTATUS, but there is nothing unusual there. I don't >> see anything useful in the logs except the error: >> >> org.apache.solr.common.SolrException: Error getting leader from zk for >> shard shard21 >> at org.apache.solr.cloud.ZkController.getLeader(ZkController. >> java:996) >> at org.apache.solr.cloud.ZkController.register(ZkController.java:902) >> at org.apache.solr.cloud.ZkController.register(ZkController.java:846) >> at org.apache.solr.core.ZkContainer.lambda$registerInZk$0( >> ZkContainer.java:181) >> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE >> xecutor.lambda$execute$0(ExecutorUtil.java:229) >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >> Executor.java:1149) >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >> lExecutor.java:624) >> at java.lang.Thread.run(Thread.java:748) >> Caused by: org.apache.solr.common.SolrException: Could not get leader >> props >> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll >> er.java:1043) >> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll >> er.java:1007) >> at org.apache.solr.cloud.ZkController.getLeader(ZkController. >> java:963) >> ...
7 more >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: >> KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader >> at org.apache.zookeeper.KeeperException.create(KeeperException. >> java:111) >> at org.apache.zookeeper.KeeperException.create(KeeperException. >> java:51) >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) >> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl >> ient.java:357) >> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl >> ient.java:354) >> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk >> CmdExecutor.java:60) >> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClie >> nt.java:354) >> at org.apache.solr.cloud.ZkController.g
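Hendrik's note about index.properties can be checked directly in HDFS. A hedged sketch — the path and the timestamp suffix below are made up for illustration; adjust to your solr.hdfs.home and core:

```shell
# Sketch: locate the active index directory for a core whose data lives
# in HDFS. The path is an illustrative assumption.
hdfs dfs -cat /solr6.6.0/UNCLASS/core_node1/data/index.properties
# When present, the file names the real index folder, e.g. a line like:
#   index=index.20171121120000000
# That folder (not necessarily "index") is where stray lock files live.
```

If no index.properties exists, the plain "index" folder is the active one.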
Re: Solr7: Bad query throughput around commit time
> One machine runs with a 3TB drive, running 3 solr processes (each with one core as described above). How much total memory on the machine? Kevin Risden On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote: > Thanks for a quick and detailed response, Erick! > > Unfortunately i don't have a proof, but our servers with solr 4.5 are > running really nicely with the above config. I had assumed that same or > similar settings will also perform well with Solr 7, but that assumption > didn't hold. As, a lot has changed in 3 major releases. > I have tweaked the cache values as you suggested but increasing or > decreasing doesn't seem to do any noticeable improvement. > > At the moment, my one core has 800GB index, ~450 Million documents, 48 G > Xmx. GC pauses haven't been an issue though. One machine runs with a 3TB > drive, running 3 solr processes (each with one core as described above). I > agree that it is a very atypical system so i should probably try different > parameters with a fresh eye to find the solution. > > > I tried with autocommits (commit with opensearcher=false every half minute; > and softcommit every 5 minutes). That supported the hypothesis that the > query throughput decreases after opening a new searcher and **not** after > committing the index. Cache hit ratios are all in 80+% (even when i > decreased the filterCache to 128, so i will keep it at this lower value). > Document cache hit ratio is really bad, it drops to around 40% after > newSearcher. But i guess that is expected, since it cannot be warmed up > anyway. > > > Thanks > Nawab > > > > On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <erickerick...@gmail.com> > wrote: > > > What evidence do you have that the changes you've made to your configs > > are useful? There's lots of things in here that are suspect: > > > > 1 > > > > First, this is useless unless you are forceMerging/optimizing. Which > > you shouldn't be doing under most circumstances.
And you're going to > > be rewriting a lot of data every time. See: > > > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ > > > > A filterCache size of "10240" is far in excess of what we usually > > recommend. Each entry can be up to maxDoc/8 and you have 10K of them. > > Why did you choose this? On the theory that "more is better?" If > > you're using NOW then you may not be using the filterCache well, see: > > > > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/ > > > > autowarmCount="1024" > > > > Every time you commit you're firing off 1024 queries which is going to > > spike the CPU a lot. Again, this is super-excessive. I usually start > > with 16 or so. > > > > Why are you committing from a cron job? Why not just set your > > autocommit settings and forget about it? That's what they're for. > > > > Your queryResultCache is likewise kind of large, but it takes up much > > less space than the filterCache per entry so it's probably OK. I'd > > still shrink it and set the autowarm to 16 or so to start, unless > > you're seeing a pretty high hit ratio, which is pretty unusual but > > does happen. > > > > 48G of memory is just asking for long GC pauses. How many docs do you > > have in each core anyway? If you're really using this much heap, then > > it'd be good to see what you can do to shrink it. Enabling docValues > > for all fields you facet, sort or group on will help that a lot if you > > haven't already. > > > > How much memory on your entire machine? And how much is used by _all_ > > the JVMs you're running on a particular machine? MMapDirectory needs as > > much OS memory space as it can get, see: > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > > > Lately we've seen some structures that consume memory until a commit > > happens (either soft or hard). I'd shrink my autocommit down to 60 > > seconds or even less (openSearcher=false).
> > > > In short, I'd go back mostly to the default settings and build _up_ as > > you can demonstrate improvements. You've changed enough things here > > that untangling which one is the culprit will be hard. You want the > > JVM to have as little memory as possible, unfortunately that's > > something you figure out by experimentation. > > > > Best, > > Erick > > > > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> > > wrote: > > > Hi, > > > > > > I am committing every
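Erick's commit advice maps onto a small solrconfig.xml fragment. A sketch using the times discussed in the thread (60s hard commit without opening a searcher, 5-minute soft commit) rather than cron-driven commits — tune the values for your own install:

```xml
<!-- Sketch: autocommit instead of cron. Hard commit every 60s flushes
     to disk without opening a searcher; soft commit every 5 minutes
     makes new documents visible (and triggers cache warming). -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>
```

Keeping openSearcher=false on the hard commit is what decouples durability from the searcher-opening cost that Nawab observed hurting query throughput.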
Re: Parallel SQL: GROUP BY throws exception
Calcite might support this in 0.14. I know group by support was improved lately. It might be as simple as upgrading the dependency? A test case showing the NPE would be helpful. We are using MySQL dialect under the hood with Calcite. Kevin Risden On Tue, Oct 17, 2017 at 8:09 AM, Joel Bernstein <joels...@gmail.com> wrote: > This would be a good jira to create at ( > https://issues.apache.org/jira/projects/SOLR) > > Interesting that the query works in MySQL. I'm assuming MySQL automatically > adds the group by field to the field list. We can look at doing this as > well. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Oct 17, 2017 at 6:48 AM, Dmitry Gerasimov < > dgerasi...@kommunion.com> > wrote: > > > Joel, > > > > Thanks for the tip. That worked. I was confused since this query works > > just fine in MySQL. > > It would of course be very helpful if SOLR was responding with a > > proper error. What’s the process here? Where do I post this request? > > > > Dmitry > > > > > > > > > > > -- Forwarded message -- > > > From: Joel Bernstein <joels...@gmail.com> > > > To: solr-user@lucene.apache.org > > > Cc: > > > Bcc: > > > Date: Mon, 16 Oct 2017 11:16:28 -0400 > > > Subject: Re: Parallel SQL: GROUP BY throws exception > > > Ok, I just read the query again. > > > > > > Try the failing query like this: > > > > > > SELECT people_person_id, sum(amount) as total FROM donation GROUP BY > > > people_person_id > > > > > > That is the correct syntax for the SQL group by aggregation. > > > > > > It looks like you found a null pointer though where a proper error > > message > > > is needed. > > > > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein <joels...@gmail.com> > > wrote: > > > > > > > Also what version are you using?
> > > > > > > > Joel Bernstein > > > > http://joelsolr.blogspot.com/ > > > > > > > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein <joels...@gmail.com> > > > > wrote: > > > > > > > >> Can you provide the stack trace? > > > >> > > > >> Are you in SolrCloud mode? > > > >> > > > >> > > > >> > > > >> Joel Bernstein > > > >> http://joelsolr.blogspot.com/ > > > >> > > > >> On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov < > > > >> dgerasi...@kommunion.com> wrote: > > > >> > > > >>> Hi all! > > > >>> > > > >>> This query works as expected: > > > >>> SELECT sum(amount) as total FROM donation > > > >>> > > > >>> Adding GROUP BY: > > > >>> SELECT sum(amount) as total FROM donation GROUP BY people_person_id > > > >>> > > > >>> Now I get response: > > > >>> { > > > >>> "result-set":{ > > > >>> "docs":[{ > > > >>> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount) > > as > > > >>> total FROM donation GROUP BY people_person_id' against JDBC > > connection > > > >>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT > sum(amount) > > as > > > >>> total FROM donation GROUP BY people_person_id\": null", > > > >>> "EOF":true, > > > >>> "RESPONSE_TIME":279}]} > > > >>> } > > > >>> > > > >>> Any ideas on what is causing this? Or how to debug? > > > >>> > > > >>> > > > >>> Here is the collection structure: > > > >>> > > > >>> > > >>> required="true" > > > >>> multiValued="false"/> > > > >>> > stored="true" > > > >>> required="true" multiValued="false" docValues="true"/> > > > >>> > > >>> required="true" multiValued="false"/> > > > >>> > > >>> multiValued="false" docValues="true"/> > > > >>> > > > >>> > > > >>> Thanks! > > > >>> > > > >> > > > >> > > > > > > > > > >
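For reference, the corrected query from Joel's reply can be sent straight to the SQL handler. A sketch assuming a single local node and a collection named donation (host, port, and aggregation mode are assumptions, not from the thread):

```shell
# Sketch: run the corrected GROUP BY query against Solr's /sql handler.
# localhost:8983 and the "donation" collection are illustrative.
curl --data-urlencode 'stmt=SELECT people_person_id, sum(amount) AS total FROM donation GROUP BY people_person_id' \
  'http://localhost:8983/solr/donation/sql?aggregationMode=facet'
```

Note the grouping field appears in the SELECT list, which is the syntax Joel points out is required.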
Re: Solr uses lots of shared memory!
I haven't looked at reproducing this locally, but since it seems like there haven't been any new ideas, I decided to share this in case it helps: I noticed in Travis CI [1] they are adding the environment variable MALLOC_ARENA_MAX=2 and so I googled what that configuration did. To my surprise, I came across a stackoverflow post [2] about how glibc could actually be the cause and report memory differently. I then found a Hadoop issue HADOOP-7154 [3] about setting this as well to reduce virtual memory usage. I found some more cases where this has helped as well: [4], [5], and [6]. [1] https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added [2] https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit [3] https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792 [4] https://github.com/cloudfoundry/java-buildpack/issues/320 [5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior [6] https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en Kevin Risden On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma <markus.jel...@openindex.io> wrote: > Hello Bernd, > > According to the man page, i should get a list of stuff in shared memory if i > invoke it with just a PID. Which shows a list of libraries that together > account for about 25 MB's shared memory usage. According to ps and top, the > JVM uses 2800 MB shared memory (not virtual), that leaves 2775 MB unaccounted > for. Any ideas? Anyone else to reproduce it on a freshly restarted node?
> > Thanks, > Markus > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 18901 markus 20 0 14,778g 4,965g 2,987g S 891,1 31,7 20:21.63 java > > 0x55b9a17f1000 6K /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java > 0x7fdf1d314000 182K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so > 0x7fdf1e548000 38K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so > 0x7fdf1e78e000 94K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so > 0x7fdf1e9a6000 75K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so > 0x7fdf5cd6e000 34K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so > 0x7fdf5cf77000 46K /lib/x86_64-linux-gnu/libnss_files-2.24.so > 0x7fdf5d189000 46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so > 0x7fdf5d395000 90K /lib/x86_64-linux-gnu/libnsl-2.24.so > 0x7fdf5d5ae000 34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so > 0x7fdf5d7b7000 187K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so > 0x7fdf5d9e6000 70K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so > 0x7fdf5dbf8000 30K /lib/x86_64-linux-gnu/librt-2.24.so > 0x7fdf5de0 90K /lib/x86_64-linux-gnu/libgcc_s.so.1 > 0x7fdf5e017000 1063K /lib/x86_64-linux-gnu/libm-2.24.so > 0x7fdf5e32 1553K /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22 > 0x7fdf5e6a8000 15936K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so > 0x7fdf5f5ed000 139K /lib/x86_64-linux-gnu/libpthread-2.24.so > 0x7fdf5f80b000 14K /lib/x86_64-linux-gnu/libdl-2.24.so > 0x7fdf5fa0f000 110K /lib/x86_64-linux-gnu/libz.so.1.2.11 > 0x7fdf5fc2b000 1813K /lib/x86_64-linux-gnu/libc-2.24.so > 0x7fdf5fff2000 58K > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so > 0x7fdf60201000 158K /lib/x86_64-linux-gnu/ld-2.24.so > > -Original message- >> From: Bernd Fehling <bernd.fehl...@uni-bielefeld.de> >> Sent: Thursday 24th August 2017 15:39 >> To: solr-user@lucene.apache.org >> Subject: Re: Solr uses lots of shared memory!
>> >> Just an idea, how about taking a dump with jmap and using >> MemoryAnalyzerTool to see what is going on? >> >> Regards >> Bernd >> >> >> Am 24.08.2017 um 11:49 schrieb Markus Jelsma: >> > Hello Shalin, >> > >> > Yes, the main search index has DocValues on just a few fields, they are >> > used for facetting and function queries, we started using DocValues when >> > 6.0 was released. Most fields are content fields for many languages. I >> > don't think it is going to be DocValues because the max shared memory >> > consumption is reduced my searching on fields fewer languages, and by >> > disabling highlighting, both not using DocValues. >> > >> > But it tried the option regardless, and because i didn't kno
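For anyone wanting to try the glibc angle from the links above, the variable goes wherever Solr's environment is set up. A sketch for solr.in.sh — the value 2 is the one Travis CI uses, not a tuned recommendation:

```shell
# Sketch: cap the number of glibc malloc arenas, which several of the
# linked reports found reduces apparent virtual/shared memory usage.
# Add to solr.in.sh (or the service environment) and restart Solr.
MALLOC_ARENA_MAX=2
export MALLOC_ARENA_MAX
```

Verifying is straightforward: compare RES/SHR in top for the Solr PID before and after a restart with the variable set.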
Re: Possible regression in Parallel SQL in 6.5.1?
Well didn't take as long as I thought: https://issues.apache.org/jira/browse/CALCITE-1306 Once Calcite 1.13 is released we should upgrade and get support for this again. Kevin Risden On Tue, May 16, 2017 at 7:23 PM, Kevin Risden <compuwizard...@gmail.com> wrote: > Yea this came up on the calcite mailing list. Not sure if aliases in the > having clause were going to be added. I'll have to see if I can find that > discussion or JIRA. > > Kevin Risden > > On May 16, 2017 18:54, "Joel Bernstein" <joels...@gmail.com> wrote: > >> Yeah, Calcite doesn't support field aliases in the having clause. The >> query >> should work if you use count(*). We could consider this a regression, but >> I >> think this will be a won't fix. >> >> Joel Bernstein >> http://joelsolr.blogspot.com/ >> >> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter <thelabd...@gmail.com> >> wrote: >> >> > This SQL used to work pre-calcite: >> > >> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM >> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC >> > LIMIT 10 >> > >> > Now I get: >> > Caused by: java.io.IOException: --> >> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to >> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings, >> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING >> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC >> > connection 'jdbc:calcitesolr:'. 
>> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings, >> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING >> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column >> > 103 to line 1, column 113: Column 'num_ratings' not found in any table >> > at org.apache.solr.client.solrj.io.stream.SolrStream.read( >> > SolrStream.java:235) >> > at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTupl >> e( >> > TupleStreamIterator.java:82) >> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext( >> > TupleStreamIterator.java:47) >> > ... 31 more >> > >> >
Re: Possible regression in Parallel SQL in 6.5.1?
Yea this came up on the calcite mailing list. Not sure if aliases in the having clause were going to be added. I'll have to see if I can find that discussion or JIRA. Kevin Risden On May 16, 2017 18:54, "Joel Bernstein" <joels...@gmail.com> wrote: > Yeah, Calcite doesn't support field aliases in the having clause. The query > should work if you use count(*). We could consider this a regression, but I > think this will be a won't fix. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, May 16, 2017 at 12:51 PM, Timothy Potter <thelabd...@gmail.com> > wrote: > > > This SQL used to work pre-calcite: > > > > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM > > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC > > LIMIT 10 > > > > Now I get: > > Caused by: java.io.IOException: --> > > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to > > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings, > > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING > > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC > > connection 'jdbc:calcitesolr:'. > > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings, > > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING > > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column > > 103 to line 1, column 113: Column 'num_ratings' not found in any table > > at org.apache.solr.client.solrj.io.stream.SolrStream.read( > > SolrStream.java:235) > > at com.lucidworks.spark.query.TupleStreamIterator. > fetchNextTuple( > > TupleStreamIterator.java:82) > > at com.lucidworks.spark.query.TupleStreamIterator.hasNext( > > TupleStreamIterator.java:47) > > ... 31 more > > >
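Until a Calcite upgrade restores alias support, the query can be rewritten per Joel's suggestion to avoid referencing the alias in HAVING. A sketch:

```sql
-- Sketch: repeat the aggregate expression instead of the num_ratings
-- alias in the HAVING clause, which Calcite rejects.
SELECT movie_id, COUNT(*) AS num_ratings, avg(rating) AS aggAvg
FROM ratings
GROUP BY movie_id
HAVING COUNT(*) > 100
ORDER BY aggAvg ASC
LIMIT 10
```

The aliases remain usable in ORDER BY and in the result set; only the HAVING reference needs the rewrite.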
Re: Solr with HDFS on AWS S3 - Server restart fails to load the core
> > Thank you for the response. Setting “loadOnStartup=true“ results in showing > the connection timeout on clicking 'Core Admin' on Solr UI. Also, reload > does not work as the core is not loaded at all. Can you clarify what you mean by this? Does the core get loaded after you restart Solr? The initial description was the core wasn't loaded after Solr was restarted. What you are describing now is different I think. Kevin Risden On Fri, Apr 7, 2017 at 6:31 PM, Amarnath palavalli <pamarn...@gmail.com> wrote: > Hi Trey, > > Thank you for the response. Setting “loadOnStartup=true“ results in showing > the connection timeout on clicking 'Core Admin' on Solr UI. Also, reload > does not work as the core is not loaded at all. > > I suspect, something to do with HTTP connection idle time, probably the > connection is closed before the data is pulled from S3. I see that the ' > maxUpdateConnectionIdleTime' is 40 seconds by default. However, don't know > how to change it. > > Thanks, > Amar > > > > On Fri, Apr 7, 2017 at 12:47 PM, Cahill, Trey <trey.cah...@siemens.com> > wrote: > > > Hi Amarnath, > > > > It looks like you’ve set the core to not load on startup via the > > “loadOnStartup=false“ property. Your response also shows that the core > is > > not loaded, “false“. > > > > I’m not really sure how to load cores after a restart, but possibly using > > the Core Admin Reload would do it (https://cwiki.apache.org/ > > confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD). > > > > Best of luck, > > > > Trey > > > > From: Amarnath palavalli [mailto:pamarn...@gmail.com] > > Sent: Friday, April 07, 2017 3:20 PM > > To: solr-user@lucene.apache.org > > Subject: Solr with HDFS on AWS S3 - Server restart fails to load the core > > > > Hello, > > > > I configured Solr to use HDFS, which in turn configured to use S3N. 
I > used > > the information from this issue to configure: > > https://issues.apache.org/jira/browse/SOLR-9952 > > > > Here is the command I have used to start the Solr with HDFS: > > bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory > > -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=s3n://amar-hdfs/solr > > -Dsolr.hdfs.confdir=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop > > -DXX:MaxDirectMemorySize=2g > > > > I am able to create a core, with the following properties: > > #Written by CorePropertiesLocator > > #Thu Apr 06 23:08:57 UTC 2017 > > name=amar-s3 > > loadOnStartup=false > > transient=true > > configSet=base-config > > > > I am able to ingest messages into Solr and also query the content. > > Everything seems to be fine until this stage and I can see the data dir > on > > S3. > > > > However, the problem is when I restart the Solr server, that is when I > see > > the core not loaded even when accessed/queried against it. Here is the > > admin API to get all cores gives: > > > > > > 0 > > 617 > > > > > > > > ... > > > > amar-s3 > > > > /Users/apalavalli/solr/solr-deployment/server/solr/amar-s3 > > > > data/ > > solrconfig.xml > > schema.xml > > false > > > > > > > > > > I don't see any issues reported in the log as well, but see this error > > from the UI: > > > > [Inline image 1] > > > > > > Not sure about the problem. This is happening when I ingest more than 40K > > messages in core before restarting Solr server. > > > > I am using Hadoop 2.7.3 with S3N FS. Please help me on resolving this > > issue. > > > > Thanks, > > Regards, > > Amar > > > > > > > > > > >
Re: Searchable archive of this mailing list
Google usually does a pretty good job of indexing this mailing list. The other place I'll usually go is here: http://search-lucene.com/?project=Solr Kevin Risden On Fri, Mar 31, 2017 at 4:18 PM, OTH <omer.t@gmail.com> wrote: > Hi all, > > Is there a searchable archive of this mailing list? > > I'm asking just so I don't have to post a question in the future which may > have been answered before already. > > Thanks >
Re: Add fieldType from Solr API
As Alex said there is no Admin UI support. The API is called the Schema API: https://cwiki.apache.org/confluence/display/solr/Schema+API That allows you to modify the schema programmatically. You will have to reload the collection either way. Kevin Risden On Sun, Feb 26, 2017 at 1:33 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > You can hand edit it, just make sure to reload the collection after. > > Otherwise, I believe, there is API. > > Not the Admin UI yet, unfortunately. > > Regards, > Alex > > On 26 Feb 2017 1:50 PM, "OTH" <omer.t@gmail.com> wrote: > > Hello, > > I am new to Solr, and am using Solr v. 6.4.1. > > I need to add a new "fieldType" to my schema. My version of Solr is using > the "managed-schema" XML file, which I gather one is not supposed to modify > directly. Is it possible to add a new fieldType using the Solr Admin via > the browser? The "schema" page doesn't seem to provide this option, at > least from what I can tell. > > Thanks
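A hedged example of the Schema API call Kevin describes — the host, collection name, fieldType name, and analyzer chain below are illustrative assumptions, not from the thread:

```shell
# Sketch: add a fieldType via the Schema API (Solr 6.x managed schema).
# Collection name "mycollection" and the analyzer chain are assumptions.
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "text_lower",
    "class": "solr.TextField",
    "analyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    }
  }
}' 'http://localhost:8983/solr/mycollection/schema'
```

After this, reload the collection (Collections API RELOAD) so existing cores pick up the change.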
JSON Facet API - Range Query - Missing field parameter NPE
One of my colleagues ran into this testing the JSON Facet API. A malformed JSON Facet API range query seems to get a NPE and then devolves into saying no live servers to handle the request. It looks like the FacetRangeProcessor should check the inputs before trying to getField. Does this seem reasonable? The problematic query: json.facet={price:{type:range,start:0,end:600,gap:50}} The fixed query: json.facet={prices:{field:price,type:range,start:0,end:600,gap:50}} The stack trace: INFO - 2017-02-24 20:54:52.217; [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica1] org.apache.solr.core.SolrCore; [gettingstarted_shard1_replica1] webapp=/solr path=/select params={df=_text_=false&_facet_={}=id= score=1048580=0=true=htt p://localhost:8983/solr/gettingstarted_shard1_replica1/| http://localhost:7574/solr/gettingstarted_shard1_replica2/=10; version=2=*:*={price:{type:range,start:0,end:600,gap:50}}= 1487969692214=true=javabin} hits=2328 status=500 QTime=1 ERROR - 2017-02-24 20:54:52.218; [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException; null:java.lang.NullPointerException at org.apache.solr.schema.IndexSchema$DynamicReplacement$ DynamicPattern$NameEndsWith.matches(IndexSchema.java:1043) at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches( IndexSchema.java:1057) at org.apache.solr.schema.IndexSchema.getFieldOrNull(IndexSchema.java:1213) at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1230) at org.apache.solr.search.facet.FacetRangeProcessor.process( FacetRange.java:96) at org.apache.solr.search.facet.FacetProcessor.processSubs( FacetProcessor.java:439) at org.apache.solr.search.facet.FacetProcessor.fillBucket( FacetProcessor.java:396) at org.apache.solr.search.facet.FacetQueryProcessor.process( FacetQuery.java:60) at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:96) at org.apache.solr.handler.component.SearchHandler.handleRequestBody( 
SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest( RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306) Kevin Risden
Re: SSL using signed client certificate not working
It sounds like Edge, Firefox, and Chrome aren't setup on your computer to do client authentication. You can set need client authentication to false and use want client authentication in solr.in.sh. This will allow browsers that don't present a client certificate to work. Otherwise you need to configure your browsers. Client authentication is an extra part of SSL and not usually required. Kevin Risden On Feb 15, 2017 4:43 AM, "Espen Rise Halstensen" <e...@dips.no> wrote: > > Hi, > > I have some problems with client certificates. By the look of it, it works > with > curl and safari prompts for and accepts my certificate. Does not work with > Edge, > Firefox or Chrome. The certificates are requested from our CA. > > When requesting https://s02/solr in the browser, it doesn't > prompt for certificate and I get the following error message in Chrome: > >This site can't provide a secure connection > >s02 didn't accept your login certificate, or one may not have been > provided. > >Try contacting the system admin. > > When debugging with wireshark I can see the s01t9 certificate in the > "certificate request"-part of the handshake, but the browser answers > without certificate. 
> > > Setup as follows: > > solr.in.sh: > SOLR_SSL_KEY_STORE=etc/keystore.jks > SOLR_SSL_KEY_STORE_PASSWORD=secret > SOLR_SSL_TRUST_STORE=etc/truststore.jks > SOLR_SSL_TRUST_STORE_PASSWORD=secret > SOLR_SSL_NEED_CLIENT_AUTH=true > SOLR_SSL_WANT_CLIENT_AUTH=false > > Content of truststore.jks: > [solruser@s02 etc]# keytool -list -keystore > /opt/solr-6.4.0/server/etc/truststore.jks > -storepass secret > > Keystore type: JKS > Keystore provider: SUN > > Your keystore contains 1 entry > > s01t9, 15.feb.2017, trustedCertEntry, > Certificate fingerprint (SHA1): CF:BD:02:71:64:F0:BA:65:71:10: > A1:23:42:34:E0:3C:37:75:E1:BF > > > > Curl(returns html of admin page with -L option): > > curl -v -E s01t9.pem:secret --cacert rootca.pem 'https://vs02/solr' > * Hostname was NOT found in DNS cache > * Trying 10.0.121.132... > * Connected to s02 (10.0.121.132) port 443 (#0) > * successfully set certificate verify locations: > * CAfile: rootca.pem > CApath: /etc/ssl/certs > * SSLv3, TLS handshake, Client hello (1): > * SSLv3, TLS handshake, Server hello (2): > * SSLv3, TLS handshake, CERT (11): > * SSLv3, TLS handshake, Request CERT (13): > * SSLv3, TLS handshake, Server finished (14): > * SSLv3, TLS handshake, CERT (11): > * SSLv3, TLS handshake, Client key exchange (16): > * SSLv3, TLS handshake, CERT verify (15): > * SSLv3, TLS change cipher, Client hello (1): > * SSLv3, TLS handshake, Finished (20): > * SSLv3, TLS change cipher, Client hello (1): > * SSLv3, TLS handshake, Finished (20): > * SSL connection using AES256-SHA256 > * Server certificate: > *subject: CN=s01t9 > *start date: 2017-01-09 11:31:49 GMT > *expire date: 2022-01-08 11:31:49 GMT > *subjectAltName: s02 matched > *issuer: DC=local; DC=com; CN=Root CA > *SSL certificate verify ok. > > GET /solr HTTP/1.1 > > User-Agent: curl/7.35.0 > > Host: s02 > > Accept: */* > > > < HTTP/1.1 302 Found > < Location: https://s02 /solr/ > < Content-Length: 0 > < > * Connection #0 to host s02 left intact > > Thanks, > Espen >
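The want-vs-need change Kevin describes is a two-line edit to the solr.in.sh shown above; a sketch:

```shell
# Sketch: accept clients that present no certificate while still
# requesting one from browsers that are configured for client auth.
# Edit solr.in.sh and restart Solr.
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=true
```

With NEED set to true, the TLS handshake fails outright for any browser that does not send a certificate, which matches the Chrome error in this thread.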
Re: 回复: bin/post and self-signed SSL
I expect that the commands work the same or very close from 5.5.x through 6.4.x. There has been some cleanup of the bin/solr and bin/post commands, but not many security changes. If you find differently then please let us know. Kevin Risden On Feb 5, 2017 21:02, "alias" <524839...@qq.com> wrote: > You mean this can only be used in this version 5.5.x? Other versions > invalid? > > > > > -- Original message -- > From: "Kevin Risden";<compuwizard...@gmail.com>; > Sent: Monday, Feb 6, 2017, 9:44 AM > To: "solr-user"<solr-user@lucene.apache.org>; > > Subject: Re: bin/post and self-signed SSL > > > > Originally formatted as MarkDown. This was tested against Solr 5.5.x > packaged as Lucidworks HDP Search. It would be the same as Solr 5.5.x. > > # Using Solr > * https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference > * https://cwiki.apache.org/confluence/display/solr/Running+Solr > * https://cwiki.apache.org/confluence/display/solr/Collections+API > > ## Create collection (w/o Kerberos) > ```bash > /opt/lucidworks-hdpsearch/solr/bin/solr create -c test > ``` > > ## Upload configuration directory (w/ SSL and Kerberos) > ```bash > /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh > -zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir > /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf > ``` > > ## Create Collection (w/ SSL and Kerberos) > ```bash > curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config" > ``` > > ## Delete collection (w/o Kerberos) > ```bash > /opt/lucidworks-hdpsearch/solr/bin/solr delete -c test > ``` > > ## Delete Collection (w/ SSL and Kerberos) > ```bash > curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection" > ``` > > ## Adding some test docs (w/o SSL) > ```bash > /opt/lucidworks-hdpsearch/solr/bin/post -c test /opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml > ``` > > ## Adding documents (w/ SSL and Kerberos) > ```bash > curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/newCollection/update?commit=true" -H > "Content-Type: application/json" --data-binary > @/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json > ``` > > ## List Collections (w/ SSL and Kerberos) > ```bash > curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=LIST" > ``` > > Kevin Risden > > On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden <compuwizard...@gmail.com> > wrote: > > > Last time I looked at this, there was no way to pass any Java properties > > to the bin/post command. This made it impossible to even set the SSL > > properties manually. I checked master just now and still there is no place > > to enter Java properties that would make it to the Java command. > > > > I came up with a chart of commands previously that worked with standard > > (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard > > solr setup worked for the bin/solr and bin/post commands. Errors popped up > > that I couldn't work around. I've been meaning to get back to it just > > haven't had a chance. > > > > I'll try to share that info when I get back to my laptop. > > > > Kevin Risden > > > > On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote: > > > >> Hi, > >> > >> I’m trying to post a document to Solr using bin/post after enabling SSL > >> with self signed certificate. 
Result is: > >> > >> $ post -url https://localhost:8983/solr/sslColl *.html > >> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath > >> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl= > >> https://localhost:8983/solr/sslColl -Dc= -Ddata=files > >> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html > >> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html > >> SimplePostTool version 5.0.0 > >> Posting files to [base] url https://localhost:8983/solr/sslColl... > >> Entering auto mode. File endings considered are > >> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp, > >> ods,ott,otp,ots,rtf,htm,html,txt,log > >> POSTing file lab-index.html (text/html) to [base]/extract > >> SimplePostTool: FATAL: Connection error (is Solr running at > >> https://localhost:8983/solr/sslColl ?): javax.net.ssl. > SSLHandshakeException: > >> sun.security.validator.ValidatorException: PKIX path building failed: > >> sun.security.provider.certpath.SunCertPathBuilderException: unable to > >> find valid certification path to requested target > >> > >> > >> Do anyone know a workaround for letting bin/post accept self-signed > cert? > >> Have not tested it against a CA signed Solr... > >> > >> -- > >> Jan Høydahl, search solution architect > >> Cominvent AS - www.cominvent.com > >> > >>
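Since bin/post does not forward arbitrary Java properties, one untested workaround sketch is to invoke SimplePostTool directly (the class and classpath come from the command output above) and pass the standard JSSE trust-store system properties, pointing at a keystore that contains the self-signed certificate. The keystore path and password below are placeholders:

```shell
# Untested sketch: bypass bin/post and run SimplePostTool directly so the
# standard JSSE trust-store properties can be supplied on the java command
# line. The keystore used to enable SSL on Solr already holds the
# self-signed certificate, so it can double as a trust store.
java -Djavax.net.ssl.trustStore=/opt/solr/server/etc/solr-ssl.keystore.jks \
     -Djavax.net.ssl.trustStorePassword=secret \
     -classpath /opt/solr/dist/solr-core-6.4.0.jar \
     -Dauto=yes -Durl=https://localhost:8983/solr/sslColl \
     org.apache.solr.util.SimplePostTool *.html
```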
Re: bin/post and self-signed SSL
Originally formatted as MarkDown. This was tested against Solr 5.5.x packaged as Lucidworks HDP Search. It would be the same as Solr 5.5.x. # Using Solr * https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference * https://cwiki.apache.org/confluence/display/solr/Running+Solr * https://cwiki.apache.org/confluence/display/solr/Collections+API ## Create collection (w/o Kerberos) ```bash /opt/lucidworks-hdpsearch/solr/bin/solr create -c test ``` ## Upload configuration directory (w/ SSL and Kerberos) ```bash /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf ``` ## Create Collection (w/ SSL and Kerberos) ```bash curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config" ``` ## Delete collection (w/o Kerberos) ```bash /opt/lucidworks-hdpsearch/solr/bin/solr delete -c test ``` ## Delete Collection (w/ SSL and Kerberos) ```bash curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection" ``` ## Adding some test docs (w/o SSL) ```bash /opt/lucidworks-hdpsearch/solr/bin/post -c test /opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml ``` ## Adding documents (w/ SSL and Kerberos) ```bash curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/newCollection/update?commit=true" -H "Content-Type: application/json" --data-binary @/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json ``` ## List Collections (w/ SSL and Kerberos) ```bash curl -k --negotiate -u : "https://SOLR_HOST:8983/solr/admin/collections?action=LIST" ``` Kevin Risden On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden <compuwizard...@gmail.com> wrote: > Last time I looked at this, there was no way to pass any Java properties > to the bin/post command. This made it impossible to even set the SSL > properties manually. 
I checked master just now and still there is no place > to enter Java properties that would make it to the Java command. > > I came up with a chart of commands previously that worked with standard > (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard > solr setup worked for the bin/solr and bin/post commands. Errors popped up > that I couldn't work around. I've been meaning to get back to it just > haven't had a chance. > > I'll try to share that info when I get back to my laptop. > > Kevin Risden > > On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote: > >> Hi, >> >> I’m trying to post a document to Solr using bin/post after enabling SSL >> with self signed certificate. Result is: >> >> $ post -url https://localhost:8983/solr/sslColl *.html >> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath >> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl= >> https://localhost:8983/solr/sslColl -Dc= -Ddata=files >> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html >> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html >> SimplePostTool version 5.0.0 >> Posting files to [base] url https://localhost:8983/solr/sslColl... >> Entering auto mode. File endings considered are >> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp, >> ods,ott,otp,ots,rtf,htm,html,txt,log >> POSTing file lab-index.html (text/html) to [base]/extract >> SimplePostTool: FATAL: Connection error (is Solr running at >> https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException: >> sun.security.validator.ValidatorException: PKIX path building failed: >> sun.security.provider.certpath.SunCertPathBuilderException: unable to >> find valid certification path to requested target >> >> >> Do anyone know a workaround for letting bin/post accept self-signed cert? >> Have not tested it against a CA signed Solr... >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >>
Re: bin/post and self-signed SSL
Last time I looked at this, there was no way to pass any Java properties to the bin/post command. This made it impossible to even set the SSL properties manually. I checked master just now and still there is no place to enter Java properties that would make it to the Java command. I came up with a chart of commands previously that worked with standard (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard solr setup worked for the bin/solr and bin/post commands. Errors popped up that I couldn't work around. I've been meaning to get back to it just haven't had a chance. I'll try to share that info when I get back to my laptop. Kevin Risden On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote: > Hi, > > I’m trying to post a document to Solr using bin/post after enabling SSL > with self signed certificate. Result is: > > $ post -url https://localhost:8983/solr/sslColl *.html > /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath > /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl= > https://localhost:8983/solr/sslColl -Dc= -Ddata=files > org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html > lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html > SimplePostTool version 5.0.0 > Posting files to [base] url https://localhost:8983/solr/sslColl... > Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc, > docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log > POSTing file lab-index.html (text/html) to [base]/extract > SimplePostTool: FATAL: Connection error (is Solr running at > https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to > find valid certification path to requested target > > > Do anyone know a workaround for letting bin/post accept self-signed cert? > Have not tested it against a CA signed Solr... 
> > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > >
Re: How long for autoAddReplica?
> > so migrating by replacing nodes is going to be a bother. Not sure what you mean by migrating and replacing nodes, but these two new actions on the Collections API as of Solr 6.2 may be of use: - https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-REPLACENODE:MoveAllReplicasinaNodetoAnother - https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETENODE:DeleteReplicasinaNode Kevin Risden On Thu, Feb 2, 2017 at 11:46 AM, Erick Erickson <erickerick...@gmail.com> wrote: > bq: I don’t see a way to add replicas through the UI, so migrating by > replacing nodes is going to be a bother > > There's a lot of improvements in the admin UI for SolrCloud that I'd > love to see. Drag/drop replicas would be really cool for instance. > > At present though using > ADDREPLICA/wait-for-new-replica-to-be-active/DELETEREPLICA through the > collections API is what's available. > > Best, > Erick > > On Thu, Feb 2, 2017 at 8:37 AM, Walter Underwood <wun...@wunderwood.org> > wrote: > > Oh, missed that limitation. > > > > Seems like something that would be very handy in all installations. I > don’t see a way to add replicas through the UI, so migrating by replacing > nodes is going to be a bother. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > >> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net> > wrote: > >> > >> Hi, > >> > >> are you using HDFS? According to the documentation the feature should > be only available if you are using HDFS. For me it did however also fail on > that. See the thread "Solr on HDFS: AutoAddReplica does not add a replica" > from about two weeks ago. > >> > >> regards, > >> Hendrik > >> > >> On 02.02.2017 07:21, Walter Underwood wrote: > >>> I added a new node an shut down a node with a shard replica on it. It > has been an hour and I don’t see any activity toward making a new replica. 
> >>> > >>> The new node and the one I shut down are both 6.4. The rest of the > 16-node cluster is 6.2.1. > >>> > >>> wunder > >>> Walter Underwood > >>> wun...@wunderwood.org > >>> http://observer.wunderwood.org/ (my blog) > >>> > >>> > >>> > >> > > >
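The two Collections API actions linked above, written out as curl sketches (hosts and node names are placeholders in the host:port_solr form; parameter names per the Solr 6.2 Collections API documentation):

```shell
# Move every replica off one node onto another (Solr 6.2+).
curl "http://localhost:8983/solr/admin/collections?action=REPLACENODE&source=oldhost:8983_solr&target=newhost:8983_solr"

# Remove all replicas hosted on a node, e.g. before decommissioning it.
curl "http://localhost:8983/solr/admin/collections?action=DELETENODE&node=oldhost:8983_solr"
```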
Re: 6.4 in a 6.2.1 cluster?
Just my two cents: I wouldn't trust that it completely works to be honest. It works for the very small test case that was put together (select q=*:*). I would love to add more tests to it. If there are any ideas of things that you think should be tested that would be great to comment on the JIRA (ideally everything but prioritizing some examples would be nice). Kevin Risden On Tue, Jan 31, 2017 at 11:19 AM, Walter Underwood <wun...@wunderwood.org> wrote: > I’m sure people need to do this, so I’ll share it worked for me. > > I just noticed that there is a new integration test being written to > verify that this works. Great! > > https://issues.apache.org/jira/browse/SOLR-8581 < > https://issues.apache.org/jira/browse/SOLR-8581> > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Jan 25, 2017, at 11:18 AM, Walter Underwood <wun...@wunderwood.org> > wrote: > > > > Has anybody done this? Not for long term use of course, but does it work > well enough > > for a rolling upgrade? > > > > wunder > > Walter Underwood > > wun...@wunderwood.org <mailto:wun...@wunderwood.org> > > http://observer.wunderwood.org/ (my blog) > > > > > >
Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.java#L407 Based on that it looks like this would definitely trigger additional commits. Specifically with openSearcher being true. Not sure the best way around this. Kevin Risden On Fri, Dec 9, 2016 at 5:15 PM, Brent <brent.pear...@gmail.com> wrote: > I'm using Solr Cloud 6.1.0, and my client application is using SolrJ 6.1.0. > > Using this Solr config, I get none of the dreaded "PERFORMANCE WARNING: > Overlapping onDeckSearchers=2" log messages: > https://dl.dropboxusercontent.com/u/49733981/solrconfig-no_warnings.xml > > However, I start getting them frequently after I add an expiration update > processor to the update request processor chain, as seen in this config (at > the bottom): > https://dl.dropboxusercontent.com/u/49733981/solrconfig-warnings.xml > > Do I have something configured wrong in the way I've tried to add the > function of expiring documents? My client application sets the "expire_at" > field with the date to remove the document being added, so I don't need > anything on the Solr Cloud side to calculate the expiration date using a > TTL. I've confirmed that the documents are getting removed as expected > after > the TTL duration. > > Is it possible that the expiration processor is triggering additional > commits? Seems like the warning is usually the result of commits happening > too frequently. If the commit spacing is fine without the expiration > processor, but not okay when I add it, it seems like maybe each update is > now triggering a (soft?) commit. Although, that'd actually be crazy and I'm > sure I'd see a lot more errors if that were the case... is it triggering a > commit every 30 seconds, because that's what I have the > autoDeletePeriodSeconds set to? Maybe if I try to offset that a bit from > the > 10 second auto soft commit I'm using? 
Seems like it'd be better (if that is > the case) if the processor simple didn't have to do a commit when it > expires > documents, and instead let the auto commit settings handle that. > > Do I still need the line: > name="/update"> > when I have the > default="true"> > element? > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Adding- > DocExpirationUpdateProcessorFactory-causes-Overlapping- > onDeckSearchers-warnings-tp4309155.html > Sent from the Solr - User mailing list archive at Nabble.com. >
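A sketch of the processor chain under discussion, as a solrconfig.xml fragment (the field name comes from the thread; offsetting `autoDeletePeriodSeconds` from the 10-second soft commit is the idea being proposed in the thread, not a verified fix):

```xml
<updateRequestProcessorChain name="expire" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- the client sets expire_at directly, so no ttlFieldName is needed -->
    <str name="expirationFieldName">expire_at</str>
    <!-- 33s instead of 30s: offset the delete sweep (which commits with
         openSearcher=true) from the 10s autoSoftCommit interval to reduce
         the chance of overlapping searcher warm-ups -->
    <int name="autoDeletePeriodSeconds">33</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```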
Re: Highlighting, offsets -- external doc store
For #2 you might be able to get away with the following: https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component The Term Vector component can return offsets and positions. Not sure how useful they would be to you, but it's at least a starting point. I'm assuming this requires only termVectors, termPositions, and termOffsets, and won't require stored to be true. Kevin Risden On Tue, Nov 29, 2016 at 12:00 PM, Kevin Risden <compuwizard...@gmail.com> wrote: > For #3 specifically, I've always found this page useful: > > https://cwiki.apache.org/confluence/display/solr/Field+ > Properties+by+Use+Case > > It lists out what properties are necessary on each field based on a use > case. > > Kevin Risden > > On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson <erickerick...@gmail.com> > wrote: > >> (1) No that I have readily at hand. And to make it >> worse, there's the UnifiedHighlighter coming out soon >> >> I don't think there's a good way for (2). >> >> for (3) at least yes. The reason is simple. For analyzed text, >> the only thing in the index is what's made it through the >> analysis chains. So stopwords are missing. Stemming >> has been done. You could even have put a phonetic filter >> in there and have terms like ARDT KNTR which would >> be...er...not very useful to show the end user so the original >> text must be available. >> >> >> >> >> Not much help... >> Erick >> >> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff >> <j...@johnbickerstaff.com> wrote: >> > All, >> > >> > One of the questions I've been asked to answer / prove out is around the >> > question of highlighting query matches in responses. >> > >> > BTW - One assumption I'm making is that highlighting is basically a >> > function of storing offsets for terms / tokens at index time. If that's >> > not right, I'd be grateful for pointers in the right direction. >> > >> > My underlying need is to get highlighting on search term matches for >> > returned documents. 
I need to choose between doing this in Solr and >> using >> > an external document store, so I'm interested in whether Solr can >> provide >> > the doc store with the information necessary to identify which >> section(s) >> > of the doc to highlight in a query response... >> > >> > A few questions: >> > >> > 1. This page doesn't say a lot about how things work - is there >> somewhere >> > with more information on dealing with offsets and highlighting? On >> offsets >> > and how they're handled? >> > https://cwiki.apache.org/confluence/display/solr/Highlighting >> > >> > 2. Can I return offset information with a query response or is that >> > internal only? If yes, can I return offset info if I have NOT stored >> the >> > data in Solr but indexed only? >> > >> > (Explanation: Currently my project is considering indexing only and >> storing >> > the entire text elsewhere -- using Solr to return only doc ID's for >> > searches. If Solr could also return offsets, these could be used in >> > processing the text stored elsewhere to provide highlighting) >> > >> > 3. Do I assume correctly that in order for Solr highlighting to work >> > correctly, the text MUST also be stored in Solr (I.E. not indexed only, >> but >> > stored=true) >> > >> > Many thanks... >> > >
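A sketch of trying the Term Vector component mentioned above (collection and field names are placeholders; this assumes the field was indexed with termVectors="true", termPositions="true", and termOffsets="true", and that the sample /tvrh request handler from the default configs is registered):

```shell
# Ask the Term Vector component for positions and character offsets of the
# terms in the "content" field of documents matching the query.
curl "http://localhost:8983/solr/collection1/tvrh?q=content:foo&tv=true&tv.fl=content&tv.positions=true&tv.offsets=true"
```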
Re: Highlighting, offsets -- external doc store
For #3 specifically, I've always found this page useful: https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+Case It lists out what properties are necessary on each field based on a use case. Kevin Risden On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson <erickerick...@gmail.com> wrote: > (1) No that I have readily at hand. And to make it > worse, there's the UnifiedHighlighter coming out soon > > I don't think there's a good way for (2). > > for (3) at least yes. The reason is simple. For analyzed text, > the only thing in the index is what's made it through the > analysis chains. So stopwords are missing. Stemming > has been done. You could even have put a phonetic filter > in there and have terms like ARDT KNTR which would > be...er...not very useful to show the end user so the original > text must be available. > > > > > Not much help... > Erick > > On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff > <j...@johnbickerstaff.com> wrote: > > All, > > > > One of the questions I've been asked to answer / prove out is around the > > question of highlighting query matches in responses. > > > > BTW - One assumption I'm making is that highlighting is basically a > > function of storing offsets for terms / tokens at index time. If that's > > not right, I'd be grateful for pointers in the right direction. > > > > My underlying need is to get highlighting on search term matches for > > returned documents. I need to choose between doing this in Solr and > using > > an external document store, so I'm interested in whether Solr can provide > > the doc store with the information necessary to identify which section(s) > > of the doc to highlight in a query response... > > > > A few questions: > > > > 1. This page doesn't say a lot about how things work - is there somewhere > > with more information on dealing with offsets and highlighting? On > offsets > > and how they're handled? > > https://cwiki.apache.org/confluence/display/solr/Highlighting > > > > 2. 
Can I return offset information with a query response or is that > > internal only? If yes, can I return offset info if I have NOT stored the > > data in Solr but indexed only? > > > > (Explanation: Currently my project is considering indexing only and > storing > > the entire text elsewhere -- using Solr to return only doc ID's for > > searches. If Solr could also return offsets, these could be used in > > processing the text stored elsewhere to provide highlighting) > > > > 3. Do I assume correctly that in order for Solr highlighting to work > > correctly, the text MUST also be stored in Solr (I.E. not indexed only, > but > > stored=true) > > > > Many thanks... >
Re: Documentation of Zookeeper's specific roles and functions in Solr Cloud?
If you use CloudSolrClient or another ZooKeeper-aware client, the client asks ZooKeeper for the cluster state to determine the live nodes. When indexing, CloudSolrClient can find the shard leader and send documents directly there. The same state is used to route queries to the correct nodes. ZooKeeper does not forward requests to Solr: the client reads cluster state from ZooKeeper and then talks to Solr directly. Kevin Risden On Tue, Nov 29, 2016 at 10:49 AM, John Bickerstaff <j...@johnbickerstaff.com > wrote: > All, > > I've thought I understood that Solr search requests are made to the Solr > servers and NOT Zookeeper directly. (I.E. Zookeeper doesn't decide which > Solr server responds to requests and requests are made directly to Solr) > > My new place tells me they're sending requests to Zookeeper - and those are > getting sent on to Solr by Zookeeper - -- this is news to me if it's > true... > > Is there any documentation of exactly the role(s) played by Zookeeper in a > SolrCloud setup? >
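The state a ZooKeeper-aware client reads can be inspected directly, which makes the division of labor easy to demonstrate (hosts are placeholders):

```shell
# The cluster state that ZooKeeper-aware clients consume, exposed through
# Solr's own Collections API:
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"

# Or list the same znodes straight from ZooKeeper with the zkcli.sh that
# ships with Solr -- note this only reads state; no search request passes
# through ZooKeeper:
/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd list
```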
Re: Solr 6.3.0 SQL question
Is there a longer error/stack trace in your Solr server logs? I wonder if the real error is being masked. Kevin Risden On Mon, Nov 28, 2016 at 3:24 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > I'm running this query: > > curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS' > http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce > > The error that I get back is: > > {"result-set":{"docs":[ > {"EXCEPTION":"org.apache.solr.common.SolrException: Collection not found: > unclass","EOF":true,"RESPONSE_TIME":2}]}} > > TextSize is defined as: > indexed="true" stored="true"/> > > This query works fine: > curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS' > http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce > > Any idea what I'm doing wrong? > Thank you! > > -Joe > >
Re: Basic Auth for Solr Streaming Expressions
Thanks Sandeep! Kevin Risden On Wed, Nov 16, 2016 at 3:33 PM, sandeep mukherjee < wiredcit...@yahoo.com.invalid> wrote: > [SOLR-9779] Basic auth in not supported in Streaming Expressions - ASF JIRA > > I have created the above jira ticket for the base auth support in solr > streaming expressions. > ThanksSandeep > > On Wednesday, November 16, 2016 8:22 AM, sandeep mukherjee > <wiredcit...@yahoo.com.INVALID> wrote: > > > Nope never got past the login screen. > Will create one today. > > > Sent from Yahoo Mail for iPhone > > > On Wednesday, November 16, 2016, 8:17 AM, Kevin Risden < > compuwizard...@gmail.com> wrote: > > Was a JIRA ever created for this? I couldn't find it searching. > > One that is semi related is SOLR-8213 for SolrJ JDBC auth. > > Kevin Risden > > On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein <joels...@gmail.com> wrote: > > > Thanks for digging into this, let's create a jira ticket for this. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee < > > wiredcit...@yahoo.com.invalid> wrote: > > > > > I have more progress since my last mail. I figured out that in the > > > StreamContext object there is a way to set the SolrClientCache object > > which > > > keep reference to all the CloudSolrClient where I can set a reference > to > > > HttpClient which sets the Basic Auth header. However the problem is, > > inside > > > the SolrClientCache there is no way to set your own version of > > > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache > > has > > > no set method which takes a CloudSolrClient object. 
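The gap Sandeep describes can be shown in miniature: the stream is built from a StreamFactory and a StreamContext, and the only place an authenticated HttpClient could be injected is the SolrClientCache, which (as of this thread) has no setter accepting a preconfigured CloudSolrClient. A hedged Java sketch of the workaround shape being requested (class and method names from SolrJ 6.x; the `setBasicAuthClient` call is hypothetical — it is exactly the missing API the JIRA asks for):

```java
// Sketch only: illustrates where Basic Auth would need to plug in.
StreamFactory factory = new StreamFactory()
    .withDefaultZkHost(zkHost)
    .withFunctionName("gatherNodes", GatherNodesStream.class);

StreamContext context = new StreamContext();
SolrClientCache cache = new SolrClientCache();
// Hypothetical API (does not exist yet, per SOLR-9779):
// cache.setBasicAuthClient(zkHost, cloudSolrClientWithBasicAuth);
context.setSolrClientCache(cache);
```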
> > > So long story short we need an API in SolrClientCache to > > > accept CloudSolrClient object from user. > > > Please let me know if there is a better way to enable Basic Auth when > > > using StreamFactory as mentioned in my previous email. > > > Thanks much,Sandeep > > > > > >On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee > > > <wiredcit...@yahoo.com.INVALID> wrote: > > > > > > > > > Hello everyone, > > > I trying to find the documentation for Basic Auth plugin for Solr > > > Streaming expressions. But I'm not able to find it in the documentation > > > anywhere. Could you please point me in right direction of how to enable > > > Basic auth for Solr Streams? > > > I'm creating StreamFactory as follows: I wonder how and where can I > > > specify Basic Auth username and password > > > @Bean > > > public StreamFactory streamFactory() { > > >SolrConfig solrConfig = ConfigManager.getNamedConfig("solr", > > > SolrConfig.class); > > > > > >return new StreamFactory().withDefaultZkHost(solrConfig. > > > getConnectString()) > > >.withFunctionName("gatherNodes", GatherNodesStream.class); > > > } > > > > > > > > > > > > > > > > >
Re: Hardware size in solrcloud
First question: is your initial sizing correct? 7GB/1 billion = 7 bytes per document? That would be basically 7 characters? Anyway there are lots of variables regarding sizing. The typical response is: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Kevin Risden On Wed, Nov 16, 2016 at 1:12 PM, Mugeesh Husain <muge...@gmail.com> wrote: > I have lots of document i dont know now how much it would be in future. for > the inilial stage, I am looking for hardware details(assumption). > > I are looking forward to setting up a billion document(1 billion approx) > solr index and the size is 7GB. > > Can you please suggest the hardware details as per experience. > 1. OS(32/64bit): > 2. Processor: > 3. RAM: > 4. No of physical servers/systems : > > > Thanks > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Hardware-size-in-solrcloud-tp4306169.html > Sent from the Solr - User mailing list archive at Nabble.com. >
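The arithmetic behind the first question above, spelled out:

```shell
# A 7 GB index across 1 billion documents works out to about 7 bytes per
# document -- roughly 7 characters -- which is why the stated size looks
# suspicious before any hardware sizing can begin.
BYTES=$((7 * 1024 * 1024 * 1024))
DOCS=1000000000
echo "$((BYTES / DOCS)) bytes per document"   # prints "7 bytes per document"
```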
Re: Basic Auth for Solr Streaming Expressions
Was a JIRA ever created for this? I couldn't find it searching. One that is semi related is SOLR-8213 for SolrJ JDBC auth. Kevin Risden On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein <joels...@gmail.com> wrote: > Thanks for digging into this, let's create a jira ticket for this. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee < > wiredcit...@yahoo.com.invalid> wrote: > > > I have more progress since my last mail. I figured out that in the > > StreamContext object there is a way to set the SolrClientCache object > which > > keep reference to all the CloudSolrClient where I can set a reference to > > HttpClient which sets the Basic Auth header. However the problem is, > inside > > the SolrClientCache there is no way to set your own version of > > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache > has > > no set method which takes a CloudSolrClient object. > > So long story short we need an API in SolrClientCache to > > accept CloudSolrClient object from user. > > Please let me know if there is a better way to enable Basic Auth when > > using StreamFactory as mentioned in my previous email. > > Thanks much,Sandeep > > > > On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee > > <wiredcit...@yahoo.com.INVALID> wrote: > > > > > > Hello everyone, > > I trying to find the documentation for Basic Auth plugin for Solr > > Streaming expressions. But I'm not able to find it in the documentation > > anywhere. Could you please point me in right direction of how to enable > > Basic auth for Solr Streams? > > I'm creating StreamFactory as follows: I wonder how and where can I > > specify Basic Auth username and password > > @Bean > > public StreamFactory streamFactory() { > > SolrConfig solrConfig = ConfigManager.getNamedConfig("solr", > > SolrConfig.class); > > > > return new StreamFactory().withDefaultZkHost(solrConfig. 
> > getConnectString()) > > .withFunctionName("gatherNodes", GatherNodesStream.class); > > } > > > > > > >
Re: Solr shards: very sensitive to swap space usage!?
Agreed with what Shawn and Erick said. If you don't see anything in the Solr logs and your servers are swapping a lot, this could mean the Linux OOM killer is killing the Solr process (and maybe others). There is usually a log of this depending on your Linux distribution. Kevin Risden On Thu, Nov 10, 2016 at 6:42 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 11/10/2016 3:20 PM, Chetas Joshi wrote: > > I have a SolrCloud (Solr 5.5.0) of 50 nodes. The JVM heap memory usage > > of my solr shards is never more than 50% of the total heap. However, > > the hosts on which my solr shards are deployed often run into 99% swap > > space issue. This causes the solr shards go down. Why solr shards are > > so sensitive to the swap space usage? The JVM heap is more than enough > > so the shards should never require the swap space. What could be the > > reason? Where can find the reason why the solr shards go down. I don't > > see anything on the solr logs. > > If the machine that Solr is installed on is using swap, that means > you're having serious problems, and your performance will be TERRIBLE. > This kind of problem cannot be caused by Solr if it is properly > configured for the machine it's running on. > > Solr is a Java program. That means its memory usage is limited to the > Java heap, plus a little bit for Java itself, and absolutely cannot go > any higher. If the Java heap is set too large, then the operating > system might utilize swap to meet Java's memory demands. The solution > is to set your Java heap to a value that's significantly smaller than > the amount of available physical memory. Setting the heap to a value > that's close to (or more than) the amount of physical memory, is a > recipe for very bad performance. > > You need to also limit the memory usage of other software installed on > the machine, or you might run into a situation where swap is required > that is not Solr's fault. > > Thanks, > Shawn > >
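Ways to check for the OOM-killer and swap behavior described above (log file names vary by distribution, and some of these commands may need root):

```shell
# Did the kernel OOM killer terminate a process (such as the Solr JVM)?
dmesg | grep -i "killed process"
grep -i "out of memory" /var/log/syslog /var/log/messages 2>/dev/null

# How aggressively this host swaps; a low value is commonly recommended
# for hosts running large JVM heaps like Solr.
cat /proc/sys/vm/swappiness
```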
Re: How to subtract numeric value stored in 2 documents related by correlation id one-to-one
The Parallel SQL support for what you are asking for doesn't exist quite yet. The use case you described is close to what I was envisioning for the Solr SQL support. This would allow full-text searches and then some analytics on top of it (like call duration). I'm not sure if subtracting fields (c2.time-c1.time) is supported in streaming expressions yet. The leftOuterJoin is, but I'm not sure about arbitrary math expressions. The Parallel SQL side has an issue w/ 1!=0 right now, so I'm guessing adding/subtracting is also out for now. The ticket you will want to follow is SOLR-8593 (https://issues.apache.org/jira/browse/SOLR-8593). This is the Calcite integration and should enable a lot more SQL syntax as a result. Kevin Risden Apache Lucene/Solr Committer Hadoop and Search Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult> - This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.
On Wed, Oct 19, 2016 at 8:23 AM, <ka...@kahle.cz> wrote: > Hello, > I have 2 documents recorded at request or response of a service call : > Entity Request > { > "type":"REQ", > "reqid":"MES0", >"service":"service0", >"time":1, > } > Entity response > { > "type":"RES", > "reqid":"MES0", >"time":10, > } > > I need to create following statistics: > Total service call duration for each call (reqid is unique for each > service call) : > similar to query : > select c1.reqid,c1.service,c1.time as REQTime, c2.time as RESTime , > c2.time - c1.time as TotalTime from collection c1 left join collection c2 > on c1.reqid = c2.reqid and c2.type = 'RES' > > { >"reqid":"MES0", >"service":service0, >"REQTime":1, >"RESTime":10, >"TotalTime":9 > } > > Average service call duration : > similar to query : > select c1.service, avg(c2.time - c1.time) as AvgTime, count(*) from > collection c1 left join collection c2 on c1.reqid = c2.reqid and c2.type = > 'RES' group by c1.service > > { >"service":service0, >"AvgTime":9, >"Count": 1 > } > > I Tried to find solution in archives, I experimented with !join, > subquery, _query_ etc. but not succeeded.. > I can probably use streaming and leftOuterJoin, but in my understanding > this functionality is not ready for production. > Is SOLR capable to fulfill these use cases? What are the key functions to > focus on ? > > Thanks' Pavel > > > > > > > > >
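For reference, the leftOuterJoin Kevin mentions would pair up the request/response tuples roughly like this. This is an untested sketch against Pavel's field names; the `select` wrapper renames the response `time` to avoid a field-name clash, both streams must be sorted on the join key, and (as Kevin notes) the subtraction itself would still have to happen client-side:

```
leftOuterJoin(
  search(collection, q="type:REQ", fl="reqid,service,time", sort="reqid asc"),
  select(
    search(collection, q="type:RES", fl="reqid,time", sort="reqid asc"),
    reqid as reqid,
    time as resTime
  ),
  on="reqid"
)
```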
Re: Problem with Password Decryption in Data Import Handler
I haven't tried this but is it possible there is a new line at the end in the file? If you did something like echo "" > file.txt then there would be a new line. Use echo -n "" > file.txt Also you should be able to check how many characters are in the file. Kevin Risden On Wed, Oct 5, 2016 at 5:00 PM, Jamie Jackson <jamieja...@gmail.com> wrote: > Hi Folks, > > (Using Solr 5.5.3.) > > As far as I know, the only place where encrypted password use is documented > is in > https://cwiki.apache.org/confluence/display/solr/ > Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler, > under the "Configuring the DIH Configuration File", in a comment in the > sample XML file: > > > > Anyway, I can encrypt just fine: > > $ openssl enc -aes-128-cbc -a -salt -in stgps.txt > enter aes-128-cbc encryption password: > Verifying - enter aes-128-cbc encryption password: > U2FsdGVkX1+VtVoQtmEREvB5qZjn3131+N4jRXmjyIY= > > > I can also decrypt just fine from the command line. > > However, if I use the encrypted password and encryptKeyFile in the config > file, I end up with an error: "String length must be a multiple of four." > > https://gist.github.com/jamiejackson/3852dacb03432328ea187d43ade5e4d9 > > How do I get this working? > > Thanks, > Jamie >
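Kevin's hypothesis is easy to verify: `echo` appends a newline that becomes part of the key material, while `echo -n` (a bash-ism; `printf '%s'` is the portable form) does not. File names below are illustrative:

```shell
echo "s3cret"    > key_with_newline.txt   # writes "s3cret\n": 7 bytes
echo -n "s3cret" > key_clean.txt          # writes "s3cret":   6 bytes
wc -c key_with_newline.txt key_clean.txt  # byte counts reveal the stray newline
```

If the first file was used for encryption and the clean string is typed at decryption time, the key material differs and decryption fails.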
Re: CheckHdfsIndex with Kerberos not working
You need to have the hadoop pieces on the classpath. Like core-site.xml and hdfs-site.xml. There is an hdfs classpath command that would help but it may have too many pieces. You may just need core-site and hdfs-site so you don't get conflicting jars. Something like this may work for you: java -cp "$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar" -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex hdfs://:8020/apps/solr/data/ExampleCollection/ core_node1/data/index Kevin Risden On Mon, Oct 3, 2016 at 1:38 PM, Rishabh Patel < rishabh.mahendra.pa...@gmail.com> wrote: > Hello, > > My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test > fails to run. However, without Kerberos, I am able to run the test with no > issues. > > I ran the following command: > > java -cp > "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ > ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar" > -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex > hdfs://:8020/apps/solr/data/ExampleCollection/ > core_node1/data/index > > The error is: > > ERROR: could not open hdfs directory " > hdfs://:8020/apps/solr/data/ExampleCollection/ > core_node1/data/index > "; > exiting org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security. > AccessControlException): > SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] > Does this error message imply that the test cannot run with Kerberos > enabled? > > For reference, I followed this blog > http://yonik.com/solr-5-5/ > > -- > Regards, > *Rishabh Patel* >
Re: Unable to connect to correct port in solr 6.2.0
Jan - the issue you are hitting is Docker and /proc/version is getting the underlying OS kernel and not what you would expect from the Docker container. The errors for update-rc.d and service are because the docker image you are using is trimmed down. Kevin Risden On Mon, Sep 12, 2016 at 3:19 PM, Jan Høydahl <jan@cominvent.com> wrote: > I tried it on a Docker RHEL system (gidikern/rhel-oracle-jre) and the > install failed with errors > > ./install_solr_service.sh: line 322: update-rc.d: command not found > ./install_solr_service.sh: line 326: service: command not found > ./install_solr_service.sh: line 328: service: command not found > > Turns out that /proc/version returns “Ubuntu” this on the system: > Linux version 4.4.19-moby (root@3934ed318998) (gcc version 5.4.0 20160609 > (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #1 SMP Thu Sep 1 09:44:30 UTC 2016 > There is also a /etc/redhat-release file: > Red Hat Enterprise Linux Server release 7.1 (Maipo) > > So the install of rc.d failed completely because of this. Don’t know if > this is common on RHEL systems, perhaps we need to improve distro detection > in installer? > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 12. sep. 2016 kl. 21.31 skrev Shalin Shekhar Mangar < > shalinman...@gmail.com>: > > > > I just tried this out on ubuntu (sorry I don't have access to a red hat > > system) and it works fine. > > > > One thing that you have to take care of is that if you install the > service > > on the default 8983 port then, trying to upgrade with the same tar to a > > different port does not work. So please ensure that you hadn't already > > installed the service before already. > > > > On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar < > > shalinman...@gmail.com> wrote: > > > >> Which version of red hat? Is lsof installed on this system? 
> >> > >> On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat <preeti.b...@shoregrp.com> > >> wrote: > >> > >>> HI All, > >>> > >>> I am trying to setup the solr in Redhat Linux, using the > >>> install_solr_service.sh script of solr.6.2.0 tgz. The script runs and > >>> starts the solr on port 8983 even when the port is specifically > specified > >>> as 2016. > >>> > >>> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u > root > >>> -s solr -p 2016 > >>> > >>> Is this correct way to setup solr in linux? Also, I have observed that > if > >>> I go to the /bin/solr and start with the port number its working as > >>> expected but not as service. > >>> > >>> I would like to setup the SOLR in SOLRCloud mode with external > zookeepers. > >>> > >>> Could someone please advise on this? > >>> > >>> > >>> > >>> NOTICE TO RECIPIENTS: This communication may contain confidential > and/or > >>> privileged information. If you are not the intended recipient (or have > >>> received this communication in error) please notify the sender and > >>> it-supp...@shoregrp.com immediately, and destroy this communication. > Any > >>> unauthorized copying, disclosure or distribution of the material in > this > >>> communication is strictly forbidden. Any views or opinions presented in > >>> this email are solely those of the author and do not necessarily > represent > >>> those of the company. Finally, the recipient should check this email > and > >>> any attachments for the presence of viruses. The company accepts no > >>> liability for any damage caused by any virus transmitted by this email. > >>> > >>> > >>> > >> > >> > >> -- > >> Regards, > >> Shalin Shekhar Mangar. > >> > > > > > > > > -- > > Regards, > > Shalin Shekhar Mangar. > >
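As Kevin explains, /proc/version inside a container reports the host kernel, so it is unreliable for distro detection. A sketch of the kind of more robust check Jan suggests for the installer, preferring /etc/os-release with a RHEL fallback (output format here is illustrative):

```shell
# /etc/os-release describes the userland actually in the image, even inside
# Docker, where /proc/version reflects the host kernel instead.
if [ -f /etc/os-release ]; then
  . /etc/os-release
  echo "distro: ${ID:-unknown}"
elif [ -f /etc/redhat-release ]; then
  # Older RHEL/CentOS images may lack os-release
  echo "distro: rhel"
else
  echo "distro: unknown"
fi
```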
Re: NoNode error on -downconfig when node does exist?
Just a quick guess: do you have a period (.) in your zk connection string chroot when you meant an underscore (_)? When you do the ls you use /solr6_1/configs, but you have /solr6.1 in your zk connection string chroot. Kevin Risden On Mon, Aug 8, 2016 at 4:44 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote: > First, the caveat: I understand this is technically a zookeeper error. It > is an error that occurs when trying to deal with Solr however, so I'm > hoping someone on the list may have some insight. Also, I'm getting the > error via the zkcli.sh tool that comes with Solr... > > I have created a collection in SolrCloud (6.1) giving the "techproducts" > sample directory as the location of the conf files. > > I then wanted to download those files from zookeeper to the local machine > via the -cmd downconfig command, so I issue this command: > > sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig > -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1 > > Instead of the files, I get a stacktrace / error back which says : > > exception in thread "main" java.io.IOException: Error downloading files > from zookeeper path /configs/statdx to /home/john/conf > at > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK( > ZkConfigManager.java:117) > at > org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir( > ZkConfigManager.java:153) > at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237) > *Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /configs/statdx* > at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) > at > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:331) > at > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:328) > at > 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation( > ZkCmdExecutor.java:60) > at > org.apache.solr.common.cloud.SolrZkClient.getChildren( > SolrZkClient.java:328) > at > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK( > ZkConfigManager.java:101) > ... 2 more > > However, when I actually look in Zookeeper, I find that the "directory" > does exist and that inside it are listed all the files. > > Here is the output from zookeeper: > > [zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs* > [statdx] > > and... > > [zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx* > [mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt, > synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json, > velocity, admin-extra.html, update-script.js, > _schema_analysis_stopwords_english.json, solrconfig.xml, > admin-extra.menu-top.html, elevate.xml, clustering, xslt, > _rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang, > admin-extra.menu-bottom.html] > > I've rebooted all my zookeeper nodes and restarted them - just in case... > Same deal. > > Has anyone seen anything like this? >
Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple
> > java.sql.SQLException: java.lang.RuntimeException: First tuple is not a > metadata tuple > That is a client-side error message meaning that the statement couldn't be handled. There should be better error handling around this, but it's not in place currently. And on Solr side, the logs seem okay: The logs you shared don't seem to be the full logs. There will be a related exception on the Solr server side. The exception on the Solr server side will explain the cause of the problem. Kevin Risden On Wed, May 4, 2016 at 2:57 AM, deniz <denizdurmu...@gmail.com> wrote: > I am trying to go through the steps here > <https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/> > to start playing with the new api, but I am getting: > > java.sql.SQLException: java.lang.RuntimeException: First tuple is not a > metadata tuple > at > > org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:70) > at com.sematext.blog.App.main(App.java:28) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) > Caused by: java.lang.RuntimeException: First tuple is not a metadata tuple > at > > org.apache.solr.client.solrj.io.sql.ResultSetImpl.<init>(ResultSetImpl.java:75) > at > > org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:67) > ... 6 more > > > > My code is > > import java.sql.Connection; > import java.sql.DriverManager; > import java.sql.ResultSet; > import java.sql.SQLException; > import java.sql.Statement; > > > /** > * Hello world!
> * > */ > public class App > { > public static void main( String[] args ) > { > > > Connection connection = null; > Statement statement = null; > ResultSet resultSet = null; > > try{ > String connectionString = > > "jdbc:solr://zkhost:port?collection=test=map_reduce=1"; > connection = DriverManager.getConnection(connectionString); > statement = connection.createStatement(); > resultSet = statement.executeQuery("select id, text from test > where tits=1 limit 5"); > while(resultSet.next()){ > String id = resultSet.getString("id"); > String nickname = resultSet.getString("text"); > > System.out.println(id + " : " + nickname); > } > }catch(Exception e){ > e.printStackTrace(); > }finally{ > if (resultSet != null) { > try { > resultSet.close(); > } catch (Exception ex) { > } > } > if (statement != null) { > try { > statement.close(); > } catch (Exception ex) { > } > } > if (connection != null) { > try { > connection.close(); > } catch (Exception ex) { > } > } > } > > > } > } > > > I tried to figure out what is happening, but there is no more logs other > than the one above. And on Solr side, the logs seem okay: > > 2016-05-04 15:52:30.364 INFO (qtp1634198-41) [c:test s:shard1 r:core_node1 > x:test] o.a.s.c.S.Request [test] webapp=/solr path=/sql > > params={includeMetadata=true=1=json=2.2=select+id,+text+from+test+where+tits%3D1+limit+5=map_reduce} > status=0 QTime=3 > 2016-05-04 15:52:30.382 INFO (qtp1634198-46) [c:test s:shard1 r:core_node1 > x:test] o.a.s.c.S.Request [test] webapp=/solr path=/select > > params={q=(tits:"1")=false=id,text,score=score+desc=5=json=2.2} > hits=5624 status=0 QTime=1 > > > The error is happening because of some missing handlers on errors on the > code or because of some strict checks on IDE(Ideaj)? Anyone had similar > issues while using sql with solrj? > > > Thanks > > Deniz > > > > - > Zeki ama calismiyor... Calissa yapar... 
> -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection
What I think is happening is that since the CloudSolrClient is from the SolrCache and the collection was reloaded. zkStateReader is actually null since there was no cloudSolrClient.connect() call after the reload. I think that would cause the NPE on anything that uses the zkStateReader like getClusterState(). ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader(); ClusterState clusterState = zkStateReader.getClusterState(); Kevin Risden Apache Lucene/Solr Committer Hadoop and Search Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult> - This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. On Mon, May 2, 2016 at 9:58 PM, Joel Bernstein <joels...@gmail.com> wrote: > Looks like the loop below is throwing a Null pointer. I suspect the > collection has not yet come back online. In theory this should be self > healing and when the collection comes back online it should start working > again. If not then that would be a bug. > > for(String col : clusterState.getCollections()) { > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, May 2, 2016 at 10:06 PM, Ryan Yacyshyn <ryan.yacys...@gmail.com> > wrote: > > > Yes stack trace can be found here: > > > > http://pastie.org/10821638 > > > > > > > > On Mon, 2 May 2016 at 01:05 Joel Bernstein <joels...@gmail.com> wrote: > > > > > Can you post your stack trace? I suspect this has to do with how the > > > Streaming API is interacting with SolrCloud. We can probably also > create > > a > > > jira ticket for this. 
> > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn <ryan.yacys...@gmail.com > > > > > wrote: > > > > > > > Hi all, > > > > > > > > I'm exploring with parallel SQL queries and found something strange > > after > > > > reloading the collection: the same query will return a > > > > java.lang.NullPointerException error. Here are my steps on a fresh > > > install > > > > of Solr 6.0.0. > > > > > > > > *Start Solr in cloud mode with example* > > > > bin/solr -e cloud -noprompt > > > > > > > > *Index some data* > > > > bin/post -c gettingstarted example/exampledocs/*.xml > > > > > > > > *Send query, which works* > > > > curl --data-urlencode 'stmt=select id,name from gettingstarted where > > > > inStock = true limit 2' > http://localhost:8983/solr/gettingstarted/sql > > > > > > > > *Reload the collection* > > > > curl ' > > > > > > > > > > > > > > http://localhost:8983/solr/admin/collections?action=RELOAD=gettingstarted > > > > ' > > > > > > > > After reloading, running the exact query above will return the null > > > pointer > > > > exception error. Any idea why? > > > > > > > > If I stop all Solr severs and restart, then it's fine. > > > > > > > > *java -version* > > > > java version "1.8.0_25" > > > > Java(TM) SE Runtime Environment (build 1.8.0_25-b17) > > > > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) > > > > > > > > Thanks, > > > > Ryan > > > > > > > > > >
Re: Question on Solr JDBC driver with SQL client like DB Visualizer
> > Page 11, the screenshot specifies to select a > "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into > "solr-solrj-6.0.0.jar" shipped with released version, correct? > Correct the PDF was generated before 6.0.0 was released. The documentation from SOLR-8521 is being migrated to here: https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools > When I try adding that jar, it doesn't show up driver class, DBVisualizer > still shows "No new driver class". Does it mean the class is not added to > this jar yet? > I checked the Solr 6.0.0 release and the driver is there. I was testing it yesterday for a blog series that I'm putting together. Just for reference here is the output for the Solr 6 release: tar -tvf solr-solrj-6.0.0.jar | grep sql drwxrwxrwx 0 0 0 0 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/ -rwxrwxrwx 0 0 0 842 Apr 1 14:40 META-INF/services/java.sql.Driver -rwxrwxrwx 0 0 0 10124 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/ConnectionImpl.class -rwxrwxrwx 0 0 0 23557 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class -rwxrwxrwx 0 0 04459 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/DriverImpl.class -rwxrwxrwx 0 0 0 28333 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/ResultSetImpl.class -rwxrwxrwx 0 0 05167 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class -rwxrwxrwx 0 0 0 10451 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/StatementImpl.class -rwxrwxrwx 0 0 0 141 Apr 1 14:40 org/apache/solr/client/solrj/io/sql/package-info.class Kevin Risden Apache Lucene/Solr Committer Hadoop and Search Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult> - This message (including any attachments) contains confidential information 
intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.
Re: Which line is solr following in terms of a BI Tool?
For Solr 6, Parallel SQL and the Solr JDBC driver are going to see continued development, as are JSON facets. The Solr JDBC driver that is in Solr 6 contains SOLR-8502. There are further improvements coming in SOLR-8659 that didn't make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some cases uses JSON facets under the hood. The Solr JDBC driver should enable BI tools to connect to Solr and use the language of SQL. This is also a familiar interface for many Java developers. Just a note: Solr is not an RDBMS and shouldn't be treated like one even with a JDBC driver. The Solr JDBC driver is more of a convenience for querying. Kevin Risden On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson <erickerick...@gmail.com> wrote: > The unsatisfactory answer is that they have different characteristics. > > The analytics contrib does not work in distributed mode. It's not > receiving a lot of love at this point. > > The JSON facets are estimations. Generally very close but are not > guaranteed to be 100% accurate. The variance, as I understand it, > is something on the order of < 1% in most cases. > > The pivot facets are accurate, but more expensive than the JSON > facets. > > And, to make matters worse, the ParallelSQL way of doing some > aggregations is going to give yet another approach. > > Best, > Erick > > On Tue, Apr 12, 2016 at 7:15 AM, Pablo <anzorena.f...@gmail.com> wrote: > > Hello, > > I think this topic is important for solr users that are planning to use > solr > > as a BI Tool. > > Speaking about facets, nowadays there are three majors way of doing > (more or > > less) the same in solr. > > First, you have the pivot facets, on the other hand you have the > Analytics > > component and finally you have the JSON Facet Api. > > So, which line is Solr following? Which of these component is going to > be in > > constant development and which one is going to be deprecated sooner.
> > In Yonik page, there are some test that shows how JSON Facet Api performs > > better than legacy facets, also the Api was way simpler than the pivot > > facets, so in my case that was enough to base my solution around the JSON > > Api. But I would like to know what are the thoughts of the solr > developers. > > > > Thanks! > > > > > > > > -- > > View this message in context: > http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html > > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: NoSuchFileException errors common on version 5.5.0
This sounds related to SOLR-8587 and there is a fix in SOLR-8793 that isn't out in a release since it was fixed after 5.5 went out. Kevin Risden Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult> - This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. On Thu, Mar 10, 2016 at 11:02 AM, Shawn Heisey <apa...@elyograg.org> wrote: > I have a dev system running 5.5.0. I am seeing a lot of > NoSuchFileException errors (for segments_XXXfilenames). > > Here's a log excerpt: > > 2016-03-10 09:52:00.054 INFO (qtp1012570586-821) [ x:inclive] > org.apache.solr.core.SolrCore.Request [inclive] webapp=/solr > path=/admin/luke > params={qt=/admin/luke=schema=javabin=2} status=500 QTime=1 > 2016-03-10 09:52:00.055 ERROR (qtp1012570586-821) [ x:inclive] > org.apache.solr.servlet.HttpSolrCall > null:java.nio.file.NoSuchFileException: > /index/solr5/data/data/inc_0/index/segments_ias > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > > sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > > sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) > at > > sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) > at java.nio.file.Files.readAttributes(Files.java:1737) > at java.nio.file.Files.size(Files.java:2332) > at > 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209) > > > I did not include the full stacktrace, only up to the first Lucene/Solr > class. > > Most of the error logs are preceded by a request to the /admin/luke > handler, like you see above, but there are also entries where a failed > request is not logged right before the error. My index maintenance > program calls /admin/luke to programmatically determine the uniqueKey > for the index. > > These errors do not seem to actually interfere with Solr operation, but > they do concern me. > > Thanks, > Shawn > >
[ANNOUNCE] YCSB 0.7.0 Release
On behalf of the development community, I am pleased to announce the release of YCSB 0.7.0.

Highlights:
* GemFire binding replaced with the Apache Geode (incubating) binding
* Apache Solr binding added
* OrientDB binding improvements
* HBase Kerberos support and use of a single connection
* Accumulo improvements
* JDBC improvements
* Couchbase scan implementation
* MongoDB improvements
* Elasticsearch version increased to 2.1.1

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.7.0

This release covers changes from the last month.
Re: CloudSolrClient query /admin/info/system
Created https://issues.apache.org/jira/browse/SOLR-8216 Kevin Risden Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult> - This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. On Tue, Oct 27, 2015 at 5:11 AM, Alan Woodward <a...@flax.co.uk> wrote: > Hi Kevin, > > This looks like a bug in CSC - could you raise an issue? > > Alan Woodward > www.flax.co.uk > > > On 26 Oct 2015, at 22:21, Kevin Risden wrote: > > > I am trying to use CloudSolrClient to query information about the Solr > > server including version information. I found /admin/info/system and it > > seems to provide the information I am looking for. However, it looks like > > CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is > not > > part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly > > missed as part of SOLR-4943 [3]? > > > > Is this an issue or is there a better way to query this information? > > > > As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure > what > > issues that could cause. Is there a reason that ADMIN_PATHS in > > CloudSolrClient would be different than the paths in CommonParams [1]? 
> > > > [1] > > > https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168 > > [2] > > > https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808 > > [3] https://issues.apache.org/jira/browse/SOLR-4943 > > > > Kevin Risden > > Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/ > > > > M: 732 213 8417 > > LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | > Google+ > > <http://www.google.com/+AvalonConsultingLLC> | Twitter > > <https://twitter.com/avalonconsult> > >
CloudSolrClient query /admin/info/system
I am trying to use CloudSolrClient to query information about the Solr server including version information. I found /admin/info/system and it seems to provide the information I am looking for. However, it looks like CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is not part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly missed as part of SOLR-4943 [3]? Is this an issue or is there a better way to query this information? As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure what issues that could cause. Is there a reason that ADMIN_PATHS in CloudSolrClient would be different than the paths in CommonParams [1]? [1] https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168 [2] https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808 [3] https://issues.apache.org/jira/browse/SOLR-4943 Kevin Risden Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/> M: 732 213 8417 LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+ <http://www.google.com/+AvalonConsultingLLC> | Twitter <https://twitter.com/avalonconsult>
Lucene/Solr Git Mirrors 5 day lag behind SVN?
It looks like both Apache Git mirror (git://git.apache.org/lucene-solr.git) and GitHub mirror (https://github.com/apache/lucene-solr.git) are 5 days behind SVN. This seems to have happened before: https://issues.apache.org/jira/browse/INFRA-9182 Is this a known issue? Kevin Risden