Re: FW: Vulnerabilities in SOLR 8.6.2

2020-11-13 Thread Kevin Risden
As far as I can tell only your first and 5th emails went through. Either
way, Cassandra responded on 20200929 - ~15 hrs after your first message:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202009.mbox/%3Cbe447e96-60ed-4a40-88dd-9e0c28be6c71%40Spark%3E

Kevin Risden


On Fri, Nov 13, 2020 at 11:35 AM Narayanan, Lakshmi
 wrote:

> This is my 5th attempt in the last 60 days
>
> Is there anyone looking at these mails?
>
> Does anyone care??
>
>
>
>
>
> Lakshmi Narayanan
>
> Marsh & McLennan Companies
>
> 121 River Street, Hoboken,NJ-07030
>
> 201-284-3345
>
> M: 845-300-3809
>
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>
> *From:* Narayanan, Lakshmi 
> *Sent:* Thursday, October 22, 2020 1:06 PM
> *To:* solr-user@lucene.apache.org
> *Subject:* FW: Vulnerabilities in SOLR 8.6.2
>
>
>
> This is my 4th attempt to contact
>
> Please advise, if there is a build that fixes these vulnerabilities
>
>
>
> Lakshmi Narayanan
>
> Marsh & McLennan Companies
>
> 121 River Street, Hoboken,NJ-07030
>
> 201-284-3345
>
> M: 845-300-3809
>
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>
> *From:* Narayanan, Lakshmi 
> *Sent:* Sunday, October 18, 2020 4:01 PM
> *To:* solr-user@lucene.apache.org
> *Subject:* FW: Vulnerabilities in SOLR 8.6.2
>
>
>
> SOLR-User Support team
>
> Is there anyone who can answer my question or can point to someone who can
> help
>
> I have not had any response for the past 3 weeks !?
>
> Please advise
>
>
>
>
>
> Lakshmi Narayanan
>
> Marsh & McLennan Companies
>
> 121 River Street, Hoboken,NJ-07030
>
> 201-284-3345
>
> M: 845-300-3809
>
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>
> *From:* Narayanan, Lakshmi 
> *Sent:* Sunday, October 04, 2020 2:11 PM
> *To:* solr-user@lucene.apache.org
> *Cc:* Chattopadhyay, Salil ; Mutnuri, Vishnu
> D ; Pathak, Omkar ;
> Shenouda, Nasir B 
> *Subject:* RE: Vulnerabilities in SOLR 8.6.2
>
>
>
> Hello Solr-User Support team
>
> Please advise or provide further guidance on the request below
>
>
>
> Thank you!
>
>
>
> Lakshmi Narayanan
>
> Marsh & McLennan Companies
>
> 121 River Street, Hoboken,NJ-07030
>
> 201-284-3345
>
> M: 845-300-3809
>
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>
> *From:* Narayanan, Lakshmi 
> *Sent:* Monday, September 28, 2020 1:52 PM
> *To:* solr-user@lucene.apache.org
> *Cc:* Chattopadhyay, Salil ; Mutnuri, Vishnu
> D ; Pathak, Omkar ;
> Shenouda, Nasir B 
> *Subject:* Vulnerabilities in SOLR 8.6.2
> *Importance:* High
>
>
>
> Hello Solr-User Support team
>
> We have installed the SOLR 8.6.2 package into a Docker container in our DEV
> environment. Prior to using it, our security team scanned the Docker image
> using Sysdig and found a lot of Critical/High/Medium vulnerabilities. The
> full list is in the attached spreadsheet.
>
>
>
> Scan Summary
>
> 30 STOPS | 190 WARNS | 188 Vulnerabilities
>
>
>
> Please advise or point us to how/where to get a package that has been
> patched for the Critical/High/Medium vulnerabilities in the attached
> spreadsheet
>
> Your help will be gratefully received
>
>
>
>
>
> Lakshmi Narayanan
>
> Marsh & McLennan Companies
>
> 121 River Street, Hoboken,NJ-07030
>
> 201-284-3345
>
> M: 845-300-3809
>
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>
> --
>
>
> **
> This e-mail, including any attachments that accompany it, may contain
> information that is confidential or privileged. This e-mail is
> intended solely for the use of the individual(s) to whom it was intended
> to be
> addressed. If you have received this e-mail and are not an intended
> recipient,
> any disclosure, distribution, copying or other use or
> retention of this email or information contained within it are prohibited.
> If you have received this email in error, please immediately
> reply to the sender via e-mail and also permanently
> delete all copies of the original message together with any of its
> attachments
> from your computer or device.
> **
>


Re: can't connect to SOLR with JDBC url

2020-11-09 Thread Kevin Risden
>
> start (without option : bin/solr start)


Solr SQL/JDBC requires SolrCloud (running with ZooKeeper), since streaming
expressions (which back Solr SQL) require it.

You should be able to start Solr this way to get Solr in cloud mode.

bin/solr start -c

If you use the above to start Solr, the embedded ZK is on localhost:9983 so
the JDBC connection string should be:

jdbc:solr://localhost:9983?collection=test

Assuming your collection name is test.
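
For reference, a minimal JDBC sketch in Java (assuming solr-solrj and its
dependencies are on the classpath, the collection is named test, and the
embedded ZK is on localhost:9983; the Solr JDBC driver registers itself via
the standard java.sql service loader):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcExample {
    public static void main(String[] args) throws Exception {
        // The JDBC URL points at ZooKeeper (the embedded ZK from "bin/solr start -c"),
        // not at the Solr HTTP port 8983.
        String url = "jdbc:solr://localhost:9983?collection=test";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             // "test" is also the table name on the SQL side; adjust to your collection.
             ResultSet rs = stmt.executeQuery("SELECT id FROM test LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("id"));
            }
        }
    }
}

DbVisualizer ends up using the same URL; the only difference is that the
driver jars are registered through its driver manager UI instead of your
classpath.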

Kevin Risden


On Fri, Nov 6, 2020 at 11:31 AM Vincent Bossuet  wrote:

> Hi all :)
>
> I'm trying to connect to Solr with JDBC, but I always get
> "java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
> localhost:9983/ within 15000 ms" (or another port, depending on which JDBC URL I
> test).
>
> Here what I did :
>
>    - I installed Solr 7.7.2 (I followed the install doc here
>      <https://lucene.apache.org/solr/guide/7_2/installing-solr.html>), i.e.
>      download, extract, start (without options: bin/solr start). This version of
>      Solr is the one I have at work, so I installed the same to test on
>      localhost first.
>    - I added a 'test' collection and the example XML documents; I can see
>      them at this URL <http://localhost:8983/solr/test/select?q=*%3A*>.
>    - Then I installed DbVisualizer and added the driver and a connection, as
>      explained here
>      <https://lucene.apache.org/solr/guide/7_2/solr-jdbc-dbvisualizer.html>.
>      The only differences I saw from the documentation are that the jar versions
>      in the screenshot of the jars to import are different, and there is one more
>      jar in the Solr archive (commons-math3-3.6.1.jar). Also, the JDBC URL is
>      shown both with and without a '/' in the middle (see here
>      <http://jdbc:solr//localhost:9983?collection=test>), i.e.
>      jdbc:solr://localhost:9983?collection=test or
>      jdbc:solr://localhost:9983/?collection=test. I don't know if it is
>      important...
>    - And I tried both on an Ubuntu VM and Windows 10.
>
> So, everything seems to be installed correctly, as in the documentation, but when I
> click on 'connect', I always get a timeout. Every website where I found some
> info talks about a URL with port 9983; I tried other possibilities (just in
> case) but no success...
>
>- jdbc:solr://localhost:9983?collection=test
>- jdbc:solr://127.0.0.1:9983?collection=test
>- jdbc:solr://localhost:9983/?collection=test
>- jdbc:solr://localhost:9983/solr?collection=test
>- jdbc:solr://localhost:8983/?collection=test
>- jdbc:solr://localhost:8983?collection=test
>- jdbc:solr://localhost:8983/solr?collection=test
>- jdbc:solr://localhost:2181?collection=test
>- jdbc:solr://localhost:2181/?collection=test
>- jdbc:solr://localhost:2181/solr?collection=test
>
> If you have an idea, thanks for help !
>
> Vincent
>


Re: Solr 8.6.2 - Admin UI Issue

2020-10-08 Thread Kevin Risden
Since the image didn't come through - it could be
https://issues.apache.org/jira/browse/SOLR-14549

Definitely make sure to clear the browser cache to ensure that JS files aren't cached,
but if that doesn't fix it, see if SOLR-14549 is related.
Kevin Risden



On Thu, Oct 8, 2020 at 9:38 AM Eric Pugh 
wrote:

> I’ve seen this behavior as well jumping between versions of Solr.
> Typically in the browser console I see some sort of very opaque Javascript
> error.
>
> > On Oct 8, 2020, at 5:54 AM, Colvin Cowie 
> wrote:
> >
> > Images won't be included on the mailing list. You need to put them
> > somewhere else and link to them.
> >
> > With that said, if you're switching between versions, maybe your browser
> > has the old UI cached? Try clearing the cache / viewing it in a private
> > window and see if it's any different.
> >
> > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput  wrote:
> >
> >> Hi All,
> >>
> >> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade.
> >> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all
> >> necessary configs, I noticed one issue in admin UI.
> >>
> >> If I select a collection and go to files, it shows the content tree
> having
> >> all files and folders present in that collection. In Solr 8.6.2, it is
> >> somehow not showing the folders correctly. In my screenshot, you can see
> >> that velocity and xslt are the folders and we have some config files
> inside
> >> these two folders. Because of this issue, I can't click on folder nodes
> and
> >> see children nodes. I checked the network calls and it looks like we are
> >> getting the correct data from Solr. So, it looks like an Admin UI issue
> to
> >> me.
> >>
> >> Does anyone know if this is a *known issue* or am I missing something
> >> here? Has anyone noticed a similar issue?  I can confirm that it works
> >> fine with Solr 7.3.1.
> >>
> >> [image: image.png][image: image.png]
> >>
> >> Left image is for 8.6.2 and right image is for 7.3.1
> >>
> >> Thanks,
> >> Vinay
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>


Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8

2020-07-15 Thread Kevin Risden
You need to remove the references from bin/solr or bin/solr.cmd to
SOLR_SSL_CLIENT_KEY_STORE and "-Djavax.net.ssl.keyStore". This is different
from solr.in.sh.

The way the bin/solr script is written, it falls back to whatever is
provided as SOLR_SSL_KEY_STORE for the client keystore, which is causing
issues.

Kevin Risden



On Wed, Jul 15, 2020 at 3:45 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thank you for your reply. I looked at solr.in.sh and I see that
> SOLR_SSL_CLIENT_KEY_STORE is already commented out by default. But you are
> right: I looked at the running Solr and I see the option
> -Djavax.net.ssl.keyStore pointing to solr-ssl.keystore.p12, not sure how
> it is getting that value. Let me dig more. Thanks for the pointer. Also, if
> you have a pointer to how it gets populated other than via the
> SOLR_SSL_CLIENT_KEY_STORE config in solr.in.sh, please let me know.
>
> #SOLR_SSL_CLIENT_KEY_STORE=
> #SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=
> #SOLR_SSL_CLIENT_KEY_STORE_TYPE=
> #SOLR_SSL_CLIENT_TRUST_STORE=
> #SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=
> #SOLR_SSL_CLIENT_TRUST_STORE_TYPE=
>
> Yes we are not using Solr client auth.
>
> Thanks,
> Rajeswari
>
> On 7/14/20, 5:55 PM, "Kevin Risden"  wrote:
>
> Hmmm so I looked closer - it looks like a side effect of the default
> passthrough of the keystore being passed to the client keystore.
>
> https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L229
>
> Can you remove or comment out the entire SOLR_SSL_CLIENT_KEY_STORE section
> from
> bin/solr or bin/solr.cmd depending on which version you are using? The
> key
> being to make sure to not set "-Djavax.net.ssl.keyStore".
>
> This assumes that you aren't using Solr client auth (which based on
> your
> config you aren't) and you aren't trying to use Solr to connect to
> anything
> that is secured via clientAuth (most likely you aren't).
>
> If you can try this and report back that would be awesome. I think this
> will fix the issue and it would be possible to make client auth opt in
> instead of default fall back.
> Kevin Risden
>
>
>
> On Tue, Jul 14, 2020 at 1:46 AM Natarajan, Rajeswari <
> rajeswari.natara...@sap.com> wrote:
>
> > Thank you so much for the response.  Below are the configs I have in
> > solr.in.sh and I followed
> > https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html
> documentation
> >
> > # Enables HTTPS. It is implicitly true if you set
> SOLR_SSL_KEY_STORE. Use
> > this config
> > # to enable https module with custom jetty configuration.
> > SOLR_SSL_ENABLED=true
> > # Uncomment to set SSL-related system properties
> > # Be sure to update the paths to the correct keystore for your
> environment
> > SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12
> > SOLR_SSL_KEY_STORE_PASSWORD=secret
> > SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12
> > SOLR_SSL_TRUST_STORE_PASSWORD=secret
> > # Require clients to authenticate
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> > # Enable clients to authenticate (but not require)
> > SOLR_SSL_WANT_CLIENT_AUTH=false
> > # SSL Certificates contain host/ip "peer name" information that is
> > validated by default. Setting
> > # this to false can be useful to disable these checks when re-using a
> > certificate on many hosts
> > SOLR_SSL_CHECK_PEER_NAME=true
> >
> > In local , with the below certificate it works
> > ---
> >
> > keytool -list -keystore solr-ssl.keystore.p12
> > Enter keystore password:
> > Keystore type: PKCS12
> > Keystore provider: SUN
> >
> > Your keystore contains 1 entry
> >
> > solr-18, Jun 26, 2020, PrivateKeyEntry,
> > Certificate fingerprint (SHA1):
> > AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50
> > C02W48C6HTD6:solr-8.5.1 i843100$ keytool -list -v -keystore
> > solr-ssl.keystore.p12
> > Enter keystore password:
> > Keystore type: PKCS12
> > Keystore provider: SUN
> >
> > Your keystore contains 1 entry
> >
> > Alias name: solr-18
> > Creation date: Jun 26, 2020
> > Entry type: PrivateKeyEntry
> > Certificate chain length: 1
> > Certificate[1]:
> > Owner: CN=localhost, OU=Organizational Unit, O=Organization,
> L=Location,
> > ST=State, C=Country
> > Issuer: CN=localhost, OU=Organizational 

Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8

2020-07-14 Thread Kevin Risden
Hmmm so I looked closer - it looks like a side effect of the default
passthrough of the keystore being passed to the client keystore.

https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L229

Can you remove or comment out the entire SOLR_SSL_CLIENT_KEY_STORE section from
bin/solr or bin/solr.cmd depending on which version you are using? The key
being to make sure to not set "-Djavax.net.ssl.keyStore".

This assumes that you aren't using Solr client auth (which based on your
config you aren't) and you aren't trying to use Solr to connect to anything
that is secured via clientAuth (most likely you aren't).

If you can try this and report back that would be awesome. I think this
will fix the issue and it would be possible to make client auth opt in
instead of default fall back.
Kevin Risden



On Tue, Jul 14, 2020 at 1:46 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thank you so much for the response.  Below are the configs I have in
> solr.in.sh and I followed
> https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html documentation
>
> # Enables HTTPS. It is implicitly true if you set SOLR_SSL_KEY_STORE. Use
> this config
> # to enable https module with custom jetty configuration.
> SOLR_SSL_ENABLED=true
> # Uncomment to set SSL-related system properties
> # Be sure to update the paths to the correct keystore for your environment
> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12
> SOLR_SSL_KEY_STORE_PASSWORD=secret
> SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12
> SOLR_SSL_TRUST_STORE_PASSWORD=secret
> # Require clients to authenticate
> SOLR_SSL_NEED_CLIENT_AUTH=false
> # Enable clients to authenticate (but not require)
> SOLR_SSL_WANT_CLIENT_AUTH=false
> # SSL Certificates contain host/ip "peer name" information that is
> validated by default. Setting
> # this to false can be useful to disable these checks when re-using a
> certificate on many hosts
> SOLR_SSL_CHECK_PEER_NAME=true
>
> In local , with the below certificate it works
> ---
>
> keytool -list -keystore solr-ssl.keystore.p12
> Enter keystore password:
> Keystore type: PKCS12
> Keystore provider: SUN
>
> Your keystore contains 1 entry
>
> solr-18, Jun 26, 2020, PrivateKeyEntry,
> Certificate fingerprint (SHA1):
> AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50
> C02W48C6HTD6:solr-8.5.1 i843100$ keytool -list -v -keystore
> solr-ssl.keystore.p12
> Enter keystore password:
> Keystore type: PKCS12
> Keystore provider: SUN
>
> Your keystore contains 1 entry
>
> Alias name: solr-18
> Creation date: Jun 26, 2020
> Entry type: PrivateKeyEntry
> Certificate chain length: 1
> Certificate[1]:
> Owner: CN=localhost, OU=Organizational Unit, O=Organization, L=Location,
> ST=State, C=Country
> Issuer: CN=localhost, OU=Organizational Unit, O=Organization, L=Location,
> ST=State, C=Country
> Serial number: 45a822c8
> Valid from: Fri Jun 26 00:13:03 PDT 2020 until: Sun Nov 10 23:13:03 PST
> 2047
> Certificate fingerprints:
>  MD5:  0B:80:54:89:44:65:93:07:1F:81:88:8D:EC:BD:38:41
>  SHA1: AB:F2:C8:84:E8:E7:A2:BF:2D:0D:2F:D3:95:4A:98:5B:2A:88:81:50
>  SHA256:
> 9D:65:A6:55:D7:22:B2:72:C2:20:55:66:F8:0C:9C:48:B1:F6:48:40:A4:FB:CB:26:77:DE:C4:97:34:69:25:42
> Signature algorithm name: SHA256withRSA
> Subject Public Key Algorithm: 2048-bit RSA key
> Version: 3
>
> Extensions:
>
> #1: ObjectId: 2.5.29.17 Criticality=false
> SubjectAlternativeName [
>   DNSName: localhost
>   IPAddress: 172.20.10.4
>   IPAddress: 127.0.0.1
> ]
>
> #2: ObjectId: 2.5.29.14 Criticality=false
> SubjectKeyIdentifier [
> KeyIdentifier [
> : 1B 6F BB 65 A4 3C 6A F4   C9 05 08 89 88 0E 9E 76  .o.e. 0010: A1 B7 28 BE..(.
> ]
>
> /
> In a cluster environment, where the deployment and keystore are all
> automated (used by multiple teams), the generated keystore is as below. As you
> can see, the keystore has 2 certificates, in which case I get the
> exception below.
>
> java.lang.UnsupportedOperationException: X509ExtendedKeyManager only
> > supported on Server
> >   at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.createHttpClient(Http2SolrClient.java:223)
> >
>
> In both cases, the config is the same except for the keystore certificates. In
> the JIRA (https://issues.apache.org/jira/browse/SOLR-14105), I see the
> fix says it supports multiple DNS names and multiple certificates. So I thought
> it should be ok. Please let me know.
>
> keytool -list -keystore  /etc/nginx/certs/sidecar.p12
> Picked up JAVA_TOOL_OPTIONS: -Dfile

Re: [CAUTION] SSL + Solr 8.5.1 in cloud mode + Java 8

2020-07-13 Thread Kevin Risden
>
> In local, with just one certificate and one domain name, the SSL communication
> worked. With multiple DNS names and 2 certificates, SSL fails with the below exception.
>

A client keystore by definition can only have a single certificate. A
server keystore can have multiple certificates. The reason is that a
client can only be identified by a single certificate.

Can you share more details about what your solr.in.sh configs
look like related to keystore/truststore and which files? Specifically,
highlight which files have multiple certificates in them.

It looks like for the Solr internal HTTP client, the client keystore has
more than one certificate in it and the error is correct. This is more
strict with recent versions of Jetty 9.4.x. Previously this would silently
fail, but was still incorrect. Now the error is bubbled up so that there are
no silent misconfigurations.

Kevin Risden


On Mon, Jul 13, 2020 at 4:54 PM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> I looked at the patch mentioned in the JIRA
> https://issues.apache.org/jira/browse/SOLR-14105 reporting the below
> issue. I looked at the Solr 8.5.1 code base and I see the patch is applied.
> But I am still seeing the same exception with a different stack trace. The
> initial exception stack trace was at
>
> at
> org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:245)
>
>
> Now the exception we encounter is at httpsolrclient creation
>
>
> Caused by: java.lang.RuntimeException:
> java.lang.UnsupportedOperationException: X509ExtendedKeyManager only
> supported on Server
>   at
> org.apache.solr.client.solrj.impl.Http2SolrClient.createHttpClient(Http2SolrClient.java:223)
>
> I commented the JIRA also. Let me know if this is still an issue.
>
> Thanks,
> Rajeswari
>
> On 7/13/20, 2:03 AM, "Natarajan, Rajeswari" 
> wrote:
>
> Re-sending to see if anyone has had this combination and
> encountered this issue. Locally, with just one certificate and one domain name,
> the SSL communication worked. With multiple DNS names and 2 certificates, SSL
> fails with the below exception. The below JIRA says it is fixed for
> Http2SolrClient; wondering if this is fixed for the http1 Solr client as we
> pass -Dsolr.http1=true.
>
> Thanks,
> Rajeswari
>
> https://issues.apache.org/jira/browse/SOLR-14105
>
> On 7/6/20, 10:02 PM, "Natarajan, Rajeswari" <
> rajeswari.natara...@sap.com> wrote:
>
> Hi,
>
> We are using Solr 8.5.1 in cloud mode  with Java 8. We are
> enabling  TLS  with http1  (as we get a warning java 8 + solr 8.5 SSL can’t
> be enabled) and we get below exception
>
>
>
> 2020-07-07 03:58:53.078 ERROR (main) [   ] o.a.s.c.SolrCore
> null:org.apache.solr.common.SolrException: Error instantiating
> shardHandlerFactory class [HttpShardHandlerFactory]:
> java.lang.UnsupportedOperationException: X509ExtendedKeyManager only
> supported on Server
>   at
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
>   at
> org.apache.solr.core.CoreContainer.load(CoreContainer.java:647)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183)
>   at
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
>   at
> org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
>   at
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>   at
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
>   at
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>   at
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
>   at
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
>   at
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
>   at
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
>   at
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
>   at
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
>   at
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:52

Re: using S3 as the Directory for Solr

2020-04-25 Thread Kevin Risden
Solr's use of the HdfsDirectory may work over S3 directly if you use the
Hadoop AWS binding - s3a [1]. The idea is to replace hdfs:// with
s3a://bucket/. Since S3 is eventually consistent, the Hadoop AWS s3a
project has s3guard to help with consistent listing. If you are only doing
queries (no indexing) with Solr you may not need to worry about the
eventual consistency.

There was some previous exploration in this area with Solr 6.x/7.x, but it
should be much better with Solr 8.x due to the upgraded Hadoop 3.x
dependency. I haven't done any stress testing of this, but I made sure it
at least in theory would connect. I could index and query some small
datasets stored via s3a.

Using the HdfsDirectory with s3a will most likely be slower as already
pointed out. You might get reasonable performance depending on the nodes
used and tuning the HdfsDirectory block cache.

[1]
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html

Kevin Risden


On Fri, Apr 24, 2020 at 1:19 PM dhurandar S  wrote:

> It's 10 PB of source data, but we do have indexes on most of the attributes,
> 80% or so.
> We have a need to support such large data and we have use cases of finding
> a needle in the haystack kinda scenario.
> Most of our users are used to Search query language or Solr in addition to
> SQL. So we would have both the interfaces.
>
> We store the actual data in S3 in Parquet and have Presto query it using
> SQL (Presto is similar to Hive but much much faster).
>
> We also now want to store the indexes in S3 we have leeway in query
> interactivity  performance, the key thing here is support finding the
> needle in the haystack pattern and supporting really long-range data in a
> cheaper fashion
>
> regards,
> Rahul
>
>
> On Thu, Apr 23, 2020 at 7:41 PM Walter Underwood 
> wrote:
>
> > It will be a lot more than 2X or 3X slower. Years ago, I accidentally put
> > Solr indexes on an NFS mounted filesystem and it was 100X slower. S3
> would
> > be a lot slower than that.
> >
> > Are you doing relevance-ranked searches on all that data? That is the
> only
> > reason to use Solr instead of some other solution.
> >
> > I’d use Apache Hive, or whatever has replaced it. That is what Facebook
> > wrote to do searches on their multi-petabyte logs.
> >
> > https://hive.apache.org
> >
> > More options.
> >
> > https://jethro.io/hadoop-hive
> > https://mapr.com/why-hadoop/sql-hadoop/sql-hadoop-details/
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Apr 23, 2020, at 7:29 PM, Christopher Schultz <
> > ch...@christopherschultz.net> wrote:
> > >
> > >
> > > Rahul,
> > >
> > > On 4/23/20 21:49, dhurandar S wrote:
> > >> Thank you for your reply. The reason we are looking for S3 is since
> > >> the volume is close to 10 Petabytes. We are okay to have higher
> > >> latency of say twice or thrice that of placing data on the local
> > >> disk. But we have a requirement to have long-range data and
> > >> providing Seach capability on that.  Every other storage apart from
> > >> S3 turned out to be very expensive at that scale.
> > >>
> > >> Basically I want to replace
> > >>
> > >> -Dsolr.directoryFactory=HdfsDirectoryFactory \
> > >>
> > >> with S3 based implementation.
> > >
> > > Can you clarify whether you have 10 PiB of /source data/ or 10 PiB of
> > > /index data/?
> > >
> > > You can theoretically store your source data anywhere, of course. 10
> > > PiB sounds like a truly enormous index.
> > >
> > > -chris
> > >
> > >> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl 
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Is your data so partitioned that it makes sense to consider
> > >>> splitting up in multiple collections and make some arrangement
> > >>> that will keep only a few collections live at a time, loading
> > >>> index files from S3 on demand?
> > >>>
> > >>> I cannot see how an S3 directory would be able to effectively
> > >>> cache files in S3 and what units the index files would be stored
> > >>> as?
> > >>>
> > >>> Have you investigated EFS as an alternative? That would look like
> > >>> a normal filesystem to Solr but might be ch

Re: Schema Browser API

2020-04-09 Thread Kevin Risden
The Luke request handler may do what you are asking for already? This
is coming directly from Lucene and doesn't rely on what Solr has in
the schema information.

/admin/luke

https://lucene.apache.org/solr/guide/7_7/implicit-requesthandlers.html
https://cwiki.apache.org/confluence/display/SOLR/LukeRequestHandler
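
If you want to turn that into a report programmatically, here is a rough
SolrJ sketch (untested; the base URL and the collection name "mycollection"
are placeholders):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class LukeFieldReport {
    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8983/solr/mycollection";
        try (HttpSolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("show", "schema");  // per-field flags: indexed, docValues, multiValued, ...
            SimpleSolrResponse rsp = new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/luke", params).process(client);
            // Dump the raw response; a real report would walk the "fields" section.
            System.out.println(rsp.getResponse());
        }
    }
}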

PS - There is also the ability to run Luke standalone over Lucene indices.

Kevin Risden

On Thu, Apr 9, 2020 at 3:34 PM Webster Homer
 wrote:
>
> I was just looking at the Schema Browser for one of our collections. It's 
> pretty handy. I was thinking that it would be useful to create a tool that 
> would create a report about what fields were indexed had docValues, were 
> multivalued etc...
>
> Has someone built such a tool? I want it to aid in estimating memory 
> requirements for our collections.
>
> I'm currently running solr 7.7.2
>
>
>
> This message and any attachment are confidential and may be privileged or 
> otherwise protected from disclosure. If you are not the intended recipient, 
> you must not copy this message or attachment or disclose the contents to any 
> other person. If you have received this transmission in error, please notify 
> the sender immediately and delete the message and any attachment from your 
> system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not 
> accept liability for any omissions or errors in this message which may arise 
> as a result of E-Mail-transmission or for damages resulting from any 
> unauthorized changes of the content of this message and any attachment 
> thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not 
> guarantee that this message is free of viruses and does not accept liability 
> for any damages caused by any virus transmitted therewith.
>
>
>
> Click http://www.merckgroup.com/disclaimer to access the German, French, 
> Spanish and Portuguese versions of this disclaimer.


Re: CVEs (vulnerabilities) that apply to Solr 8.4.1

2020-03-20 Thread Kevin Risden
https://lucene.apache.org/solr/security.html

The security page on the Solr website has details about how to report
security items. It also has a link to the wiki page with details about some
of these that are false positives.

Each version of Solr has dependency updates and addresses different
dependency CVEs as they are reported and detected. I haven't looked through
what was shared specifically, but Solr 8.5, which is under vote, addresses at
least a few dependency upgrades.

Kevin Risden


On Fri, Mar 20, 2020 at 10:23 AM Ahlberg, Christopher C. 
wrote:

> Our TRM team (Technology Risk Management) has provided us with the
> attached vulnerabilities analysis for Solr 8.4.1, (security issues
> extracted below.)
>
>
>
> Has anyone out there in the Solr community done anything to document
> workarounds or mitigations for any of these identified vulnerabilities in
> Solr 8.4.1?  Does anyone know if work to address these issues is happening
> for subsequent releases?
>
>
>
> Any and all comments will be greatly appreciated!
>
>
>
> From their analysis:
>
> Security Issues
>
> Threat Level | Problem Code | Component | Status
>
> 9 | sonatype-2019-0115 | jQuery 1.7.1 | Open
> 9 | sonatype-2019-0115 | com.carrotsearch.randomizedtesting : junit4-ant : 2.7.2 | Open
> 9 | CVE-2015-1832 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-1832> | org.apache.derby : derby : 10.9.1.0 | Open
> 9 | CVE-2015-1832 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-1832> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | CVE-2017-1000190 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-1000190> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | sonatype-2019-0115 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 9 | sonatype-2019-0494 | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
>
> 8 | CVE-2019-10088 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10088> | org.apache.tika : tika-core : 1.19.1 | Open
> 8 | CVE-2019-10088 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10088> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
>
> 7 | CVE-2012-0881 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-0881> | apache-xerces : xercesImpl : 2.9.1 | Open
> 7 | CVE-2013-4002 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-4002> | apache-xerces : xercesImpl : 2.9.1 | Open
> 7 | CVE-2019-14262 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14262> | com.drewnoakes : metadata-extractor : 2.11.0 | Open
> 7 | CVE-2019-12402 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-12402> | org.apache.commons : commons-compress : 1.18 | Open
> 7 | CVE-2019-10094 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10094> | org.apache.tika : tika-core : 1.19.1 | Open
> 7 | CVE-2012-0881 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-0881> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2013-4002 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-4002> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2014-0114 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0114> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-10094 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10094> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-12086 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-12086> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-12402 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-12402> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open
> 7 | CVE-2019-14262 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14262> | org.ikasan : ikasan-solr-distribution : zip : 3.0.0 | Open

Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Kevin Risden
user.language = en
user.name = root
user.timezone =

openjdk version "11.0.6" 2020-01-14 LTS
OpenJDK Runtime Environment Corretto-11.0.6.10.1 (build 11.0.6+10-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.6.10.1 (build 11.0.6+10-LTS, mixed
mode)


Kevin Risden


On Fri, Jan 31, 2020 at 1:25 PM Kevin Risden  wrote:

> Whoops forgot to share the same output from latest. The docker images are
> clearly building from AdoptOpenJDK so specification vendor is potentially
> misleading?
>
> ➜  ~ docker pull solr
> Using default tag: latest
> latest: Pulling from library/solr
> Digest:
> sha256:ef1f2241c1aa51746aa3ad05570123eef128d98e91bc07336c37f2a1b37df7a9
> Status: Image is up to date for solr:latest
> docker.io/library/solr:latest
> ➜  ~ docker run --rm -it solr bash -c "java -XshowSettings:properties
> -version"
> Property settings:
> awt.toolkit = sun.awt.X11.XToolkit
> file.encoding = UTF-8
> file.separator = /
> java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
> java.awt.printerjob = sun.print.PSPrinterJob
> java.class.path =
> java.class.version = 55.0
> java.home = /usr/local/openjdk-11
> java.io.tmpdir = /tmp
> java.library.path = /usr/java/packages/lib
> /usr/lib64
> /lib64
> /lib
> /usr/lib
> java.runtime.name = OpenJDK Runtime Environment
> java.runtime.version = 11.0.6+10
> java.specification.name = Java Platform API Specification
> java.specification.vendor = Oracle Corporation
> java.specification.version = 11
> java.vendor = Oracle Corporation
> java.vendor.url = http://java.oracle.com/
> java.vendor.url.bug = http://bugreport.java.com/bugreport/
> java.vendor.version = 18.9
> java.version = 11.0.6
> java.version.date = 2020-01-14
> java.vm.compressedOopsMode = 32-bit
> java.vm.info = mixed mode
> java.vm.name = OpenJDK 64-Bit Server VM
> java.vm.specification.name = Java Virtual Machine Specification
> java.vm.specification.vendor = Oracle Corporation
> java.vm.specification.version = 11
> java.vm.vendor = Oracle Corporation
> java.vm.version = 11.0.6+10
> jdk.debug = release
> line.separator = \n
> os.arch = amd64
> os.name = Linux
> os.version = 4.19.76-linuxkit
> path.separator = :
> sun.arch.data.model = 64
> sun.boot.library.path = /usr/local/openjdk-11/lib
> sun.cpu.endian = little
> sun.cpu.isalist =
> sun.io.unicode.encoding = UnicodeLittle
> sun.java.launcher = SUN_STANDARD
> sun.jnu.encoding = UTF-8
> sun.management.compiler = HotSpot 64-Bit Tiered Compilers
> sun.os.patch.level = unknown
> user.dir = /opt/solr-8.4.1
> user.home = /home/solr
>     user.language = en
> user.name = solr
> user.timezone =
>
> openjdk version "11.0.6" 2020-01-14
> OpenJDK Runtime Environment 18.9 (build 11.0.6+10)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
>
> Kevin Risden
>
>
> On Fri, Jan 31, 2020 at 1:22 PM Kevin Risden  wrote:
>
>> What specific Solr tag are you using? That looks like JDK 1.8 and an
>> older version.
>>
>> Just picking the current latest as an example:
>>
>>
>> https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile
>>
>> which uses openjdk:11-stretch
>>
>> and looking up that is
>>
>>
>> https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36
>>
>> This is JDK 11 and not JDK 1.8.
>>
>> Even openjdk:8-stretch
>>
>>
>> https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36
>>
>> So maybe you have an older Solr docker tag?
>>
>> Kevin Risden
>>
>>
>> On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood 
>> wrote:
>>
>>> Maybe you can give them an estimate of how much work it will be. See if
>>> legal will put it on their budget. Free software isn’t free, especially the
>>> “free kittens” kind.
>>>
>>> This guy offers consulting for custom Docker images.
>>>
>>> https://pythonspeed.com/about/
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>> > On Jan 31, 2020, at 9:45 AM, Arnold Bronley 
>>> wrote:
>>> >
>>> > Thanks for the helpful information. It is a no-go because even though
>>> it is
>>> > OpenJDK and free, vendor is Orac

Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Kevin Risden
Whoops forgot to share the same output from latest. The docker images are
clearly building from AdoptOpenJDK so specification vendor is potentially
misleading?

➜  ~ docker pull solr
Using default tag: latest
latest: Pulling from library/solr
Digest:
sha256:ef1f2241c1aa51746aa3ad05570123eef128d98e91bc07336c37f2a1b37df7a9
Status: Image is up to date for solr:latest
docker.io/library/solr:latest
➜  ~ docker run --rm -it solr bash -c "java -XshowSettings:properties
-version"
Property settings:
awt.toolkit = sun.awt.X11.XToolkit
file.encoding = UTF-8
file.separator = /
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.awt.printerjob = sun.print.PSPrinterJob
java.class.path =
java.class.version = 55.0
java.home = /usr/local/openjdk-11
java.io.tmpdir = /tmp
java.library.path = /usr/java/packages/lib
/usr/lib64
/lib64
/lib
/usr/lib
java.runtime.name = OpenJDK Runtime Environment
java.runtime.version = 11.0.6+10
java.specification.name = Java Platform API Specification
java.specification.vendor = Oracle Corporation
java.specification.version = 11
java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vendor.url.bug = http://bugreport.java.com/bugreport/
java.vendor.version = 18.9
java.version = 11.0.6
java.version.date = 2020-01-14
java.vm.compressedOopsMode = 32-bit
java.vm.info = mixed mode
java.vm.name = OpenJDK 64-Bit Server VM
java.vm.specification.name = Java Virtual Machine Specification
java.vm.specification.vendor = Oracle Corporation
java.vm.specification.version = 11
java.vm.vendor = Oracle Corporation
java.vm.version = 11.0.6+10
jdk.debug = release
line.separator = \n
os.arch = amd64
os.name = Linux
os.version = 4.19.76-linuxkit
path.separator = :
sun.arch.data.model = 64
sun.boot.library.path = /usr/local/openjdk-11/lib
sun.cpu.endian = little
sun.cpu.isalist =
sun.io.unicode.encoding = UnicodeLittle
sun.java.launcher = SUN_STANDARD
sun.jnu.encoding = UTF-8
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
sun.os.patch.level = unknown
user.dir = /opt/solr-8.4.1
user.home = /home/solr
user.language = en
user.name = solr
user.timezone =

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment 18.9 (build 11.0.6+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
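
If the decision really hinges on the vendor strings, a tiny standalone check
of the standard system properties (nothing Solr- or Docker-specific) prints
the values in question:

public class PrintJavaVendor {
    public static void main(String[] args) {
        // Same properties shown by "java -XshowSettings:properties -version" above.
        // Note: java.specification.vendor reads "Oracle Corporation" on any conforming JVM,
        // and some OpenJDK builds also report java.vendor = "Oracle Corporation", so
        // java.runtime.name / java.runtime.version are often the clearer hints of the build.
        String[] keys = {
                "java.vendor", "java.vm.vendor", "java.vendor.version",
                "java.runtime.name", "java.runtime.version", "java.specification.vendor"
        };
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
    }
}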

Kevin Risden


On Fri, Jan 31, 2020 at 1:22 PM Kevin Risden  wrote:

> What specific Solr tag are you using? That looks like JDK 1.8 and an older
> version.
>
> Just picking the current latest as an example:
>
>
> https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile
>
> which uses openjdk:11-stretch
>
> and looking up that is
>
>
> https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36
>
> This is JDK 11 and not JDK 1.8.
>
> Even openjdk:8-stretch
>
>
> https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36
>
> So maybe you have an older Solr docker tag?
>
> Kevin Risden
>
>
> On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood 
> wrote:
>
>> Maybe you can give them an estimate of how much work it will be. See if
>> legal will put it on their budget. Free software isn’t free, especially the
>> “free kittens” kind.
>>
>> This guy offers consulting for custom Docker images.
>>
>> https://pythonspeed.com/about/
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Jan 31, 2020, at 9:45 AM, Arnold Bronley 
>> wrote:
>> >
>> > Thanks for the helpful information. It is a no-go because even though
>> it is
>> > OpenJDK and free, vendor is Oracle and legal dept. at our company is
>> trying
>> > to get away from anything Oracle.
>> > It is a little paranoid reaction, I agree.
>> >
>> > See the java.vendor property in following output.
>> >
>> > $ java -XshowSettings:properties -version
>> > Property settings:
>> >awt.toolkit = sun.awt.X11.XToolkit
>> >file.encoding = UTF-8
>> >file.encoding.pkg = sun.io
>> >file.separator = /
>> >java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
>> >java.awt.printerjob = sun.print.PSPrinterJob
>> >java.class.path = .
>> >java.class.version = 52.0
>> >java.endorsed.dirs =
>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
>> >java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib

Re: Oracle OpenJDK to Amazon Corretto OpenJDK

2020-01-31 Thread Kevin Risden
What specific Solr tag are you using? That looks like JDK 1.8 and an older
version.

Just picking the current latest as an example:

https://github.com/docker-solr/docker-solr/blob/394ead2fa128d90afb072284bce5f1715345c53c/8.4/Dockerfile

which uses openjdk:11-stretch

and looking up that is

https://github.com/docker-library/openjdk/blob/1b6e2ef66a086f47315f5d05ecf7de3dae7413f2/11/jdk/Dockerfile#L36

This is JDK 11 and not JDK 1.8.

Even openjdk:8-stretch

https://github.com/docker-library/openjdk/blob/a886db8d5ea96b7bc0104b2f55fabd44bcb5e7c0/8/jdk/Dockerfile#L36

So maybe you have an older Solr docker tag?

Kevin Risden


On Fri, Jan 31, 2020 at 1:13 PM Walter Underwood 
wrote:

> Maybe you can give them an estimate of how much work it will be. See if
> legal will put it on their budget. Free software isn’t free, especially the
> “free kittens” kind.
>
> This guy offers consulting for custom Docker images.
>
> https://pythonspeed.com/about/
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jan 31, 2020, at 9:45 AM, Arnold Bronley 
> wrote:
> >
> > Thanks for the helpful information. It is a no-go because even though it
> is
> > OpenJDK and free, vendor is Oracle and legal dept. at our company is
> trying
> > to get away from anything Oracle.
> > It is a little paranoid reaction, I agree.
> >
> > See the java.vendor property in following output.
> >
> > $ java -XshowSettings:properties -version
> > Property settings:
> >awt.toolkit = sun.awt.X11.XToolkit
> >file.encoding = UTF-8
> >file.encoding.pkg = sun.io
> >file.separator = /
> >java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
> >java.awt.printerjob = sun.print.PSPrinterJob
> >java.class.path = .
> >java.class.version = 52.0
> >java.endorsed.dirs =
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
> >java.ext.dirs = /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext
> >/usr/java/packages/lib/ext
> >java.home = /usr/lib/jvm/java-8-openjdk-amd64/jre
> >java.io.tmpdir = /tmp
> >java.library.path = /usr/java/packages/lib/amd64
> >/usr/lib/x86_64-linux-gnu/jni
> >/lib/x86_64-linux-gnu
> >/usr/lib/x86_64-linux-gnu
> >/usr/lib/jni
> >/lib
> >/usr/lib
> >java.runtime.name = OpenJDK Runtime Environment
> >java.runtime.version = 1.8.0_181-8u181-b13-1~deb9u1-b13
> >java.specification.name = Java Platform API Specification
> >java.specification.vendor = Oracle Corporation
> >java.specification.version = 1.8
> >java.vendor = Oracle Corporation
> >java.vendor.url = http://java.oracle.com/
> >java.vendor.url.bug = http://bugreport.sun.com/bugreport/
> >java.version = 1.8.0_181
> >java.vm.info = mixed mode
> >java.vm.name = OpenJDK 64-Bit Server VM
> >java.vm.specification.name = Java Virtual Machine Specification
> >java.vm.specification.vendor = Oracle Corporation
> >java.vm.specification.version = 1.8
> >java.vm.vendor = Oracle Corporation
> >java.vm.version = 25.181-b13
> >line.separator = \n
> >os.arch = amd64
> >os.name = Linux
> >os.version = 4.9.0-8-amd64
> >path.separator = :
> >sun.arch.data.model = 64
> >sun.boot.class.path =
> > /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar
> >/usr/lib/jvm/java-8-openjdk-amd64/jre/classes
> >sun.boot.library.path =
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64
> >sun.cpu.endian = little
> >sun.cpu.isalist =
> >sun.io.unicode.encoding = UnicodeLittle
> >sun.java.launcher = SUN_STANDARD
> >sun.jnu.encoding = UTF-8
> >sun.management.compiler = HotSpot 64-Bit Tiered Compilers
> >sun.os.patch.level = unknown
> >user.country = US
> >user.dir = /opt/solr
> >user.home = /home/solr
> >user.language = en
> >user.name = solr
> >user.timezone =
> >
> > openjdk version "1.8.0_181"
> > OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-1~deb9u1-b13)
> > OpenJDK 64-Bit Server VM (build

Re: SQL selectable fields

2020-01-24 Thread Kevin Risden
So I haven't looked at this in a few years, but the columns should be
registered in the SQL catalog so you should be able to ask via SQL for all
the columns.

describe table or using the JDBC metadata should work.

There may be some edge cases where depending on sharding you get into a
case where the columns aren't registered since we look at Luke to determine
what fields are really there for type information.
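
A sketch of the JDBC metadata route, assuming the driver's DatabaseMetaData
implementation supports getColumns on your Solr version (worth verifying),
and using "films" as the collection/table name:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListSolrSqlColumns {
    public static void main(String[] args) throws Exception {
        // ZK host/port and collection name are placeholders.
        String url = "jdbc:solr://localhost:9983?collection=films";
        try (Connection con = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = con.getMetaData();
            // Table name = collection name; "%" asks for every registered column.
            try (ResultSet cols = meta.getColumns(null, null, "films", "%")) {
                while (cols.next()) {
                    System.out.println(cols.getString("COLUMN_NAME")
                            + " : " + cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}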

Kevin Risden


On Fri, Jan 24, 2020 at 9:48 AM Joel Bernstein  wrote:

> Does "_nest_path_" come back in a normal search? I would expect that the
> fields that are returned by normal searches would also work in SQL. If that
> turns out to be the case you could derive the fields from performing a
> search and seeing what fields are returned.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Jan 23, 2020 at 3:02 PM Nick Vercammen  >
> wrote:
>
> > Hey All,
> >
> > is there a way to get a list of all fields in a collection that can be
> used
> > in an SQL query? Currently I retrieve a list of fields through the schema
> > api: GET col/schema/fields.
> >
> > This returns all fields in a collection. But when I do a select on all
> > fields I get an exception because apparently _nest_path_ is no column in
> > the collection table:
> >
> > Failed to execute sqlQuery 'SELECT  films._text_ AS text,
> films._nest_path_
> > FROM films LIMIT 2000' against JDBC connection 'jdbc:calcitesolr:'.
> > Error while executing SQL "SELECT  films._text_ AS text,
> films._nest_path_
> > FROM films LIMIT 2000": From line 1, column 37 to line 1, column 47:
> Column
> > '_nest_path_' not found in table 'films'
> >
> > Can I determine which fields can be used in a SQL query? By means of the
> > type?
> >
> > kind regards,
> >
> > Nick
> >
>


Re: ConnectionImpl.isValid() does not behave as described in Connection javadocs

2020-01-21 Thread Kevin Risden
Nick - Feel free to open a Jira and PR. I think the disconnect is the
meaning of timeout=0 between JDBC and the Solr client.
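
For reference, the contract the pool expects (timeout = 0 means "no timeout",
not "always invalid") looks roughly like the sketch below. This is written
against plain java.sql, not Solr's actual ConnectionImpl, and probe() is a
hypothetical stand-in for whatever cheap liveness check a driver uses:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

public class ConnectionValidity {
    static boolean isValid(Connection con, int timeoutSeconds) throws SQLException {
        if (timeoutSeconds < 0) {
            throw new SQLException("timeout must be >= 0");
        }
        if (con.isClosed()) {
            return false;
        }
        // timeout == 0 means "wait without a time limit" per the Connection javadocs.
        long budgetMs = (timeoutSeconds == 0)
                ? Long.MAX_VALUE
                : TimeUnit.SECONDS.toMillis(timeoutSeconds);
        return probe(con, budgetMs);
    }

    // Placeholder: a real driver would issue a lightweight request bounded by budgetMs.
    private static boolean probe(Connection con, long budgetMs) {
        return true;
    }
}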

Kevin Risden


On Sun, Jan 19, 2020 at 3:34 PM Nick Vercammen 
wrote:

> I think so, as the ConnectionImpl in Solr is not in line with the
> description of the java.sql.Connection interface.
>
> > Op 19 jan. 2020 om 21:23 heeft Erick Erickson 
> het volgende geschreven:
> >
> > Is this a Solr issue?
> >
> >> On Sun, Jan 19, 2020, 14:24 Nick Vercammen 
> >> wrote:
> >>
> >> Hello,
> >>
> >> I'm trying to write a solr driver for metabase. Internally metabase
> uses a
> >> C3P0 connection pool. Upon checkout of the connection from the pool the
> >> library does a call to isValid(0) (timeout = 0)
> >>
> >> According to the javadocs (
> >>
> >>
> https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Connection.html#isValid(int)
> >> )
> >> a
> >> timeout = 0 means no timeout. In the current implementation a timeout =
> 0
> >> means that the connection is always invalid.
> >>
> >> I can provide a PR for this.
> >>
> >> Nick
> >>
> >> --
> >> [image: Zeticon]
> >> Nick Vercammen
> >> CTO
> >> +32 9 275 31 31
> >> +32 471 39 77 36
> >> nick.vercam...@zeticon.com
> >> <https://www.facebook.com/MediaHaven-1536452166583533/>
> >> <https://www.linkedin.com/company/zeticon/> <
> >> https://twitter.com/mediahaven>
> >> www.zeticon.com
> >>
>


Re: Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-07 Thread Kevin Risden
So this is caused by SOLR-13982 [1] and specifically SOLR-13987 [2]. Can
you open a new Jira specifically for this? It would be great if you could
capture from Chrome Dev Tools (or Firefox) the error message around what
specifically CSP is complaining about.

The other thing to ensure is that you force refresh the UI to make sure
nothing is cached. Idk if that is in play here but doesn't hurt.

[1] https://issues.apache.org/jira/browse/SOLR-13982
[2] https://issues.apache.org/jira/browse/SOLR-13987

Kevin Risden

On Tue, Jan 7, 2020, 11:15 Jörn Franke  wrote:

> Dear all,
>
> I noted that in Solr Cloud 8.4.0 the graph is not shown due to
> Content-Security-Policy. Apparently it violates unsafe-eval.
> It is a minor UI thing, but should I create an issue to that one? Maybe it
> is rather easy to avoid in the source code of the admin page?
>
> Thank you.
>
> Best regards
>
>
>


Re: Client Cert Broken in Solr 8.2.0 because of a Jetty Issue (workaround included)

2019-12-19 Thread Kevin Risden
Thanks for the report Ryan. It looks like this fell through the cracks and
was reported a second time in Jira.

https://issues.apache.org/jira/browse/SOLR-14106

I have a patch up there that should help with some comments about multiple
clientAuth certificates.

Kevin Risden


On Fri, Sep 27, 2019 at 1:04 PM Ryan Rockenbaugh
 wrote:

> All,
> If you are using client authentication with SSL in Solr
> (SOLR_SSL_NEED_CLIENT_AUTH=true or  SOLR_SSL_WANT_CLIENT_AUTH=true), be
> advised that Jetty made a change that will break Solr 8.2.0
> The version of Jetty packaged with Solr 8.2.0 changed to 9.4.19.v20190610
> (see
> https://lucene.apache.org/solr/8_2_0/changes/Changes.html#v8.2.0.versions_of_major_components
> )
> The official Jetty issue is here:
> https://github.com/eclipse/jetty.project/issues/3554
> The stated fix is:
> Set endpointIdentificationAlgorithm=null or better yet use
> SslContextFactory.Server instead of a plain SslContextFactory.
> I found I couldn't change the class from SslContextFactory to
> SslContextFactory.Server
> My workaround was to update the file server/etc/jetty-ssl.xml, adding the
> following entry to the  element:
>
> 
> Thanks,
> Ryan Rockenbaugh
>
>
>
>
>
> "Do all the good you can, By all the means you can, In all the ways
> you can, In all the places you can, At all the times you can, To all
> the people you can, As long as ever you can."
>
>  - John Wesley


Re: CVE-2017-7525 fix for Solr 7.7.x

2019-12-18 Thread Kevin Risden
There are no specific plans for any 7.x branch releases that I'm aware of.
SOLR-13110 in particular required upgrading Hadoop 2.x to 3.x, specifically
for jackson-mapper-asl, and there are no plans to backport that to
7.x even if there were a future 7.x release.

Kevin Risden


On Wed, Dec 18, 2019 at 8:44 AM Mehai, Lotfi 
wrote:

> Hello;
>
> We are using Solr 7.7.0. The CVE-2017-7525 have been fixed for Solr 8.x.
> https://issues.apache.org/jira/browse/SOLR-13110
>
> When the fix will be available for Solr 7.7.x
>
> Lotfi
>


Re: Active directory integration in Solr

2019-11-20 Thread Kevin Risden
So I wrote the blog above more as an experiment. I don't know if it is
fully operational beyond a single node. That being said, the Hadoop
authentication plugin doesn't require running on HDFS. It just uses the
Hadoop code to do authentication.

I will echo what Jorn said though - I wouldn't expose Solr to the internet
or directly without some sort of API. Whether you do
authentication/authorization at the API is a separate question.

Kevin Risden


On Wed, Nov 20, 2019 at 1:54 PM Jörn Franke  wrote:

> I would not give users directly access to Solr - even with LDAP plugin.
> Build a rest interface or web interface that does the authentication and
> authorization and security sanitization. Then you can also manage better
> excessive queries or explicitly forbid certain type of queries (eg specific
> streaming expressions - I would not expose all of them to users).
>
> > Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. :
> >
> > Thanks Charlie.
> >
> > We are already using Basic authentication in our existing clusters,
> however it's getting difficult to maintain number of users as we are
> getting too many requests for readonly access from support teams. So we
> desperately looking for active directory solution. Just wondering if
> someone might have same requirement need.
> >
> >
> > Regards,
> > Vinodh
> >
> > -Original Message-
> > From: Charlie Hull 
> > Sent: Tuesday, November 19, 2019 2:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Active directory integration in Solr
> >
> > ATTENTION! This email originated outside of DTCC; exercise caution.
> >
> > Not out of the box, there are a few authentication plugins bundled but
> not for AD
> >
> https://lucene.apache.org/solr/guide/7_2/authentication-and-authorization-plugins.html
> > - there's also some useful stuff in Apache ManifoldCF
> >
> https://www.francelabs.com/blog/tutorial-on-authorizations-for-manifold-cf-and-solr/
> >
> >
> > Best
> >
> > Charlie
> >
> >> On 18/11/2019 15:08, Kommu, Vinodh K. wrote:
> >> Hi,
> >>
> >> Does anyone know that Solr has any out of the box capability to
> integrate Active directory (using LDAP) when security is enabled? Instead
> of creating users in security.json file, planning to use users who already
> exists in active directory so they can use their individual credentials
> rather than defining in Solr. Did anyone came across similar requirement?
> If so was there any working solution?
> >>
> >>
> >> Thanks,
> >> Vinodh
> >>
> >> DTCC DISCLAIMER: This email and any files transmitted with it are
> confidential and intended solely for the use of the individual or entity to
> whom they are addressed. If you have received this email in error, please
> notify us immediately and delete the email and any attachments from your
> system. The recipient should check this email and any attachments for the
> presence of viruses. The company accepts no liability for any damage caused
> by any virus transmitted by this email.
> >>
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web: www.flax.co.uk
> >
> > DTCC DISCLAIMER: This email and any files transmitted with it are
> confidential and intended solely for the use of the individual or entity to
> whom they are addressed. If you have received this email in error, please
> notify us immediately and delete the email and any attachments from your
> system. The recipient should check this email and any attachments for the
> presence of viruses. The company accepts no liability for any damage caused
> by any virus transmitted by this email.
> >
>


Re: Clustering error in Solr 8.2.0

2019-08-08 Thread Kevin Risden
According to the stack trace:

java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
at lingo3g.s.hashCode(Unknown Source)

It looks like lingo3g is the culprit - lingo3g isn't on Maven Central and appears to
require a license to download. You would have to contact the vendor to see if it
still uses commons-lang. You could also copy in the commons-lang dependency yourself.

Kevin Risden


On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo 
wrote:

> Hi Erick,
>
> Thanks for your reply.
>
> My clustering code is taken as-is from the Solr package; only the code
> related to lingo3g is taken from the previous version.
>
> Below are the 3 files that I have taken from the previous version:
> - lingo3g-1.15.0
> - morfologik-fsa-2.1.1
> - morfologik-stemming-2.1.1
>
> Could any one of these have caused the error?
>
> Regards,
> Edwin
>
> On Thu, 8 Aug 2019 at 19:56, Erick Erickson 
> wrote:
>
> > This dependency was removed as part of
> > https://issues.apache.org/jira/browse/SOLR-9079, so my guess is you’re
> > pointing to an old version of the clustering code.
> >
> > Best,
> > Erick
> >
> > > On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo 
> > wrote:
> > >
> > > ObjectUtils
> >
> >
>


Re: Solr on HDFS

2019-08-02 Thread Kevin Risden
>
> If you think about it, having a shard with 3 replicas on top of a file
> system that does 3x replication seems a little excessive!


https://issues.apache.org/jira/browse/SOLR-6305 should help here. I can
take a look at merging the patch since it looks like it has been helpful to
others.


Kevin Risden


On Fri, Aug 2, 2019 at 10:09 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi Kyle - Thank you.
>
> Our current index is split across 3 solr collections; our largest
> collection is 26.8TBytes (80.5TBytes when 3x replicated in HDFS) across
> 100 shards.  There are 40 machines hosting this cluster. We've found
> that when dealing with large collections having no replicas (but lots of
> shards) ends up being more reliable since there is a much smaller
> recovery time.  We keep another 30 day index (1.4TBytes) that does have
> replicas (40 shards, 3 replicas each), and if a node goes down, we
> manually delete lock files and then bring it back up and yes - lots of
> network IO, but it usually recovers OK.
>
> Having a large collection like this with no replicas seems like a recipe
> for disaster.  So, we've been experimenting with the latest version
> (8.2) and our index process to split up the data into many solr
> collections that do have replicas, and then build the list of
> collections to search at query time.  Our searches are date based, so we
> can define what collections we want to query at query time. As a test,
> we ran just two machines, HDFS, and 500 collections. One server ran out
> of memory and crashed.  We had over 1,600 lock files to delete.
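
A minimal SolrJ sketch of the query-time collection list mentioned above,
with hypothetical ZooKeeper hosts and date-based collection names; the
"collection" request parameter accepts a comma-separated list, so a single
query can fan out over several collections:

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DateScopedQuery {
      public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper ensemble and collection names.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
          SolrQuery query = new SolrQuery("*:*");
          // Fan the request out over the collections covering the date range.
          query.set("collection", "events_2019_07,events_2019_08");
          QueryResponse rsp = client.query("events_2019_07", query);
          System.out.println("Hits across both collections: "
              + rsp.getResults().getNumFound());
        }
      }
    }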
>
> If you think about it, having a shard with 3 replicas on top of a file
> system that does 3x replication seems a little excessive! I'd love to
> see Solr take more advantage of a shared FS.  Perhaps an idea is to use
> HDFS but with an NFS gateway.  Seems like that may be slow.
> Architecturally, I love only having one large file system to manage
> instead of lots of individual file systems across many machines.  HDFS
> makes this easy.
>
> -Joe
>
> On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:
> > Hi Joe,
> >
> > We fought with Solr on HDFS for quite some time, and faced similar issues
> > as you're seeing. (See this thread, for example:"
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e
> >   )
> >
> > The Solr lock files on HDFS get deleted if the Solr server gets shut down
> > gracefully, but we couldn't always guarantee that in our environment so
> we
> > ended up writing a custom startup script to search for lock files on HDFS
> > and delete them before solr startup.
> >
> > However, the issue that you mention of the Solr server rebuilding its
> whole
> > index from replicas on startup was enough of a show-stopper for us that
> we
> > switched away from HDFS to local disk. It literally made the difference
> > between 24+ hours of recovery time after an unexpected outage to less
> than
> > a minute...
> >
> > If you do end up finding a solution to this issue, please post it to this
> > mailing list, because there are others out there (like us!) who would
> most
> > definitely make use it.
> >
> > Thanks
> >
> > Kyle
> >
> > On Fri, 2 Aug 2019 at 08:58, Joe Obernberger <
> joseph.obernber...@gmail.com>
> > wrote:
> >
> >> Thank you.  No, while the cluster is using Cloudera for HDFS, we do not
> >> use Cloudera to manager the solr cluster.  If it is a
> >> configuration/architecture issue, what can I do to fix it?  I'd like a
> >> system where servers can come and go, but the indexes stay available and
> >> recover automatically.  Is that possible with HDFS?
> >> While adding an alias to other collections would be an option, if that
> >> collection is the only collection, or one that is currently needed, in a
> >> live system, we can't bring it down, re-create it, and re-index when
> >> that process may take weeks to do.
> >>
> >> Any ideas?
> >>
> >> -Joe
> >>
> >> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> >>> I don’t think you’re using Cloudera or Ambari, but Ambari has an option
> >> to delete the locks. This seems more a configuration/architecture issue
> >> than a reliability issue. You may want to spin up an alias while you bring
> >> down, clear locks and directories, recreate and re-index the affected
> >> collection, while you work your other issues.
> >>> On Aug 1, 2019, at 16:40, Joe Obernberger <
> joseph.obernb

Re: How to use Parallel SQL Interface when basic auth is enabled on Solr cluster

2019-07-24 Thread Kevin Risden
Pretty sure you are running into
https://issues.apache.org/jira/browse/SOLR-8213

Always looking for patches to help improve things :)

Kevin Risden


On Wed, Jul 24, 2019 at 4:50 AM Suril Shah  wrote:

> Hi,
> I am using Solr Version 7.6.0 where Basic Authentication is enabled. I am
> trying to use Parallel SQL to run some SQL queries.
>
> This is the code snippet that I am using to connect to Solr and run some
> SQL queries on it. This works when authentication is not enabled on the
> Solr cluster.
>
>public Connection getSolrSqlJDBCConnection(String aggregationMode)
> throws SQLException
> {
>
> String solrZkConnString = ":2181";
>
> String collection = "customer";
>
> String numWorkers = "2";
>
> Connection solrSqlClientConn = null;
>
> try {
>
>   solrSqlClientConn = DriverManager.getConnection("jdbc:solr://" +
> solrZkConnString + "?collection=" + collection
>   + "&aggregationMode=" + aggregationMode
>   + "&numWorkers=" + numWorkers);
>
>
>
> } catch (SQLException e) {
>
>   throw e;
>
> }
>
> return solrSqlClientConn;
>
>   }
>
>
>
>   public ResultSet executeSolrSqlStatement(String querySqlString, String
> aggregationMode) throws SQLException {
>
> try {
>
>   Connection solrSqlJdbcConn =
> getSolrSqlJDBCConnection(aggregationMode);
>
>   Statement stmt = solrSqlJdbcConn.createStatement();
>
>   ResultSet rs = stmt.executeQuery(querySqlString);
>
>   solrSqlJdbcConn.close();
>
>   return rs;
>
> } catch (SQLException e) {
>
>   throw e;
>
> }
>
>   }
>
>
>
>public void testSelectOnIndex() throws SQLException {
>
>  String owner_id = "3cfc7734-e4b4-4c9b-b91e-44c8c5943fb0";
>
> String solrSqlString = "select customer_id_s, customer_name_s,
> country_s, city_s, postal_code_s, address_s from customer where owner_id_s
> = '"+owner_id+"'";
>
> System.*out*.println("solrSqlString = "+solrSqlString);
>
> try {
>
>   ResultSet sqlResultSet = executeSolrSqlStatement(solrSqlString,
> "map_reduce");
>
>   while (sqlResultSet.next()) {
>
> System.*out*.println("--- customer_id ---" +
> sqlResultSet.getString("customer_id_s"));
>
> System.*out*.println("--- customer_name ---" +
> sqlResultSet.getString("customer_name_s"));
>
> System.*out*.println("--- country ---" +
> sqlResultSet.getString("country_s"));
>
> System.*out*.println("--- city ---" +
> sqlResultSet.getString("city_s"));
>
> System.*out*.println("--- postalcode ---" +
> sqlResultSet.getString("postal_code_s"));
>
> System.*out*.println("--- address ---" +
> sqlResultSet.getString("address_s"));
>
>   }
>
> } catch (SQLException e) {
>
>   e.printStackTrace();
>
> }
>
>   }
>
>
> When authentication is enabled, I tried adding the username and password to
> the JDBC connection string.
>
> Replaced one line in the getSolrSqlJDBCConnection() method:
>
>    solrSqlClientConn = DriverManager.getConnection("jdbc:solr://" +
> solrZkConnString + "?collection=" + collection + "&aggregationMode=" +
> aggregationMode + "&numWorkers=" + numWorkers, "", "");
>
>
> The last two (currently empty) arguments are the username and password for Solr.
>
>
> On making the above change, we are getting the following error:
>
>
>
> java.sql.SQLException: java.sql.SQLException: java.io.IOException: -->
> http://:8983/solr/customer_shard1_replica_n2/: An exception has
> occurred on the server, refer to server log for details.
>
> at
> io.strati.libs.forklift.org.apache.solr.client.solrj.io
> .sql.StatementImpl.executeQueryImpl(StatementImpl.java:74)
>
> at
> io.strati.libs.forklift.org.apache.solr.client.solrj.io
> .sql.StatementImpl.executeQuery(StatementImpl.java:111)
>
> at io.strati.search.Test.executeSolrSqlStatement(Test.java:54)
>
> at io.strati.search.Test.main(Test.java:20)
>
> Caused by: java.sql.SQLException: java.io.IOException: --> http://
> :8983/solr/customer_shard1_replica_n2/: An exception has occurred
> on the server, refer to server log for details.
>
> at
> io.strati.libs.forklift.org.apache.solr.client.solrj.io
> .sql.ResultSetImpl.(ResultSetImpl.java:83)
>
> at
> io.strati.libs.forklift.org.apache.solr.client.solrj.io
> .sql.StatementImpl.executeQueryImpl(StatementImpl.java:

Re: Solr Cloud Kerberos cookie rejected spnego

2019-06-23 Thread Kevin Risden
I don't think a Kerberos ticket without the hostname makes sense. You
almost always need a valid hostname and DNS for Kerberos to work
successfully.

Kevin Risden


On Sun, Jun 23, 2019 at 10:54 AM Rakesh Enjala
 wrote:

> Hi Team,
>
> Enabled SolrCloud 7.4.0 with Kerberos. While creating a collection I get the
> error below:
>
> org.apache.http.impl.auth.HttpAuthenticator; NEGOTIATE authentication
> error: No valid credentials provided (Mechanism level: No valid credentials
> provided (Mechanism level: Server not found in Kerberos database (7)))
> org.apache.http.client.protocol.ResponseProcessCookies; Cookie rejected
> [hadoop.auth="", version:0, domain:xxx.xxx.com, path:/, expiry:
> Illegal
> domain attribute "". Domain of origin: "localhost"
>
> I enabled krb5 debug and found that the actual problem is that the
> sname is HTTP/localh...@realm.com; it should be HTTP/@DOMAIN1.COM, not
> localhost.
>
> solr.in.sh
>
> SOLR_AUTH_TYPE="kerberos"
>
> SOLR_AUTHENTICATION_OPTS="-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin
> -Djava.security.auth.login.config=/solr/jaas.conf
> -Dsun.security.krb5.debug=true -Dsolr.kerberos.cookie.domain=
> -Dsolr.kerberos.name.rules=DEFAULT -Dsolr.kerberos.principal=HTTP/@
> DOMAIN1.COM -Dsolr.kerberos.keytab=/solr/HTTP.keytab"
>
> Please help me out!
> *Regards,*
> *Rakesh Enjala*
>


Re: Odd error with Solr 8 log / ingestion

2019-06-06 Thread Kevin Risden
Do you see a message about idle timeout? There is a jetty bug with HTTP/2
and idle timeout that causes some stream closing. The jira below says test
error, but I'm pretty sure it could come up in real usage.

* https://issues.apache.org/jira/browse/SOLR-13413
* https://github.com/eclipse/jetty.project/issues/3605

Kevin Risden


On Thu, Jun 6, 2019 at 2:38 PM Erick Erickson 
wrote:

> Probably your packet size is too big for the Solr<->Solr default settings.
> Quick test would be to try sending 10 docs per packet, then 100, then 1,000
> etc.
>
> There’s not much to be gained efficiency-wise once you get past 100
> docs/shard, see:
> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
>
> Second, you’ll get improved throughput if you use SolrJ rather than a
> straight HTTP connection, but your setup may not be amenable to that
> alternative.
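
A minimal SolrJ sketch of that batching (collection name and batch size are
placeholders; try 10, 100, and 1,000 documents per request as suggested above):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexer {
      public static void index(CloudSolrClient client, String collection,
                               List<SolrInputDocument> docs) throws Exception {
        final int batchSize = 100; // one HTTP request per batch of this size
        List<SolrInputDocument> batch = new ArrayList<>(batchSize);
        for (SolrInputDocument doc : docs) {
          batch.add(doc);
          if (batch.size() >= batchSize) {
            client.add(collection, batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          client.add(collection, batch);
        }
        // rely on autoCommit, or commit explicitly here
      }
    }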
>
> Best,
> Erick
>
> > On Jun 6, 2019, at 11:23 AM, Erie Data Systems 
> wrote:
> >
> > Hello everyone,
> >
> > I recently setup Solr 8 in SolrCloud mode, previously I was using
> > standalone mode and was able to easily push 10,000 records in per HTTP
> call
> > with autocommit. Ingestion occurs when server A pushes (HTTPS) payload to
> > server B (SolrCloud) on LAN network.
> >
> > However, once converted to SolrCloud (1 node, 3 shards, 1 replica) I am
> > seeing the following error :
> >
> > ConcurrentUpdateHttp2SolrClient
> > Error consuming and closing http response stream.
> >
> > I'm wondering what the possible causes could be; I'm not seeing much
> > documentation online specific to Solr.
> >
> > Thanks in advance for any assistance,
> > Craig
>
>


Re: SolrJ, CloudSolrClient and basic authentication

2019-06-03 Thread Kevin Risden
Chris - not sure if what you are seeing is related to basic auth
credentials not being sent until a 401. There was report of this behavior
with Apache Knox in front of Solr.

https://issues.apache.org/jira/browse/KNOX-1066

The jira above has an example of how to preemptively send basic auth
instead of waiting for the 401 from the server.
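
For reference, a minimal sketch along those lines (not the exact KNOX-1066
code): the Authorization header is added preemptively by an HttpClient
interceptor and wired into a SolrJ 7.x-style CloudSolrClient builder. The
ZooKeeper host and credentials are placeholders.

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.Collections;
    import java.util.Optional;
    import org.apache.http.HttpRequestInterceptor;
    import org.apache.http.client.HttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class PreemptiveBasicAuthClient {
      public static CloudSolrClient build(String zkHost, String user, String password) {
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // Send the Authorization header up front so the client never waits for a
        // 401 challenge (which fails for non-repeatable request entities).
        HttpRequestInterceptor preemptiveAuth = (request, context) ->
            request.setHeader("Authorization", "Basic " + token);
        HttpClient httpClient = HttpClientBuilder.create()
            .addInterceptorFirst(preemptiveAuth)
            .build();
        return new CloudSolrClient.Builder(Collections.singletonList(zkHost), Optional.empty())
            .withHttpClient(httpClient)
            .build();
      }
    }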

Kevin Risden


On Fri, May 31, 2019 at 4:28 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:

>
> Dimitris,
>
> On 6/1/18 02:46, Dimitris Kardarakos wrote:
> > Thanks a lot Shawn. I had tried with the documented approach, but
> > since I use SolrClient.add to add documents to the index, I could
> > not "port" the documented approach to my case (probably I do miss
> > something).
> >
> > The custom HttpClient suggestion worked as expected!
>
> Can you please explain how you did this?
>
> I'm facing a problem where the simplest possible solution is giving
> the error "org.apache.http.client.NonRepeatableRequestException:
> Cannot retry request with a non-repeatable request entity.".
>
> It seems that SolrClient is using something like BasicHttpEntity which
> isn't "repeatable" when using HTTP Basic auth (where the server is
> supposed to challenge the client and the client only then sends the
> credentials). I need to either make the client data repeatable (which
> is in SolrClient, which I'd prefer to avoid) or I need to make
> HttpClient use an "expectant" credential-sending technique, or I need
> to just stuff things into a header manually.
>
> What did you do to solve this problem? It seems like this should
> really probably come up more often than it does. Maybe nobody bothers
> to lock-down their Solr instances?
>
> Thanks,
> - -chris
>
> > On 31/05/2018 06:16 μμ, Shawn Heisey wrote:
> >> On 5/31/2018 8:03 AM, Dimitris Kardarakos wrote:
> >>> Following the feedback in the "Index protected zip" thread, I
> >>> am trying to add documents to the index using SolrJ API.
> >>>
> >>> The server is in SolrCloud mode with BasicAuthPlugin for
> >>> authentication.
> >>>
> >>> I have not managed to figure out how to pass username/password
> >>> to my client.
> >> There are two ways to approach this.
> >>
> >> One approach is to build a custom HttpClient object that uses
> >> credentials by default, and then use that custom HttpClient
> >> object to build your CloudSolrClient.  Exactly how to correctly
> >> build the HttpClient object will depend on exactly which
> >> HttpClient version you've included into your program.  If you go
> >> with SolrJ dependency defaults, then the HttpClient version will
> >> depend on the SolrJ version.
> >>
> >> The other approach is the method described in the documentation,
> >> where credentials are added to each request object:
> >>
> >> https://lucene.apache.org/solr/guide/6_6/basic-authentication-plugin.html#BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ
> >>
> >>
> >>
> >>
> There are several different kinds of request objects.  A few examples:
> >> UpdateRequest, QueryRequest, CollectionAdminRequest.
> >>
> >> Thanks, Shawn
> >>
> >
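
A minimal sketch of the per-request approach Shawn describes above, with
placeholder credentials; each request object carries its own basic-auth
credentials, so no custom HttpClient is needed:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class PerRequestBasicAuth {
      public static void addAndQuery(CloudSolrClient client, String collection,
                                     SolrInputDocument doc,
                                     String user, String password) throws Exception {
        UpdateRequest update = new UpdateRequest();
        update.add(doc);
        update.setBasicAuthCredentials(user, password); // attached per request
        update.process(client, collection);
        // commit separately (also authenticated) or rely on autoCommit

        QueryRequest query = new QueryRequest(new SolrQuery("*:*"));
        query.setBasicAuthCredentials(user, password);
        System.out.println(query.process(client, collection).getResults().getNumFound());
      }
    }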


Re: Status of solR / HDFS-v3 compatibility

2019-05-02 Thread Kevin Risden
For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency.
Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some
testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3,
but the libraries are Hadoop 3.

The reference guide for 8.0+ hasn't been released yet, but I also don't think
it has been updated.

Kevin Risden


On Thu, May 2, 2019 at 9:32 AM Nicolas Paris 
wrote:

> Hi
>
> solr doc [1] says it's only compatible with hdfs 2.x
> is that true ?
>
>
> [1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html
>
> --
> nicolas
>


Re: solr 7.x sql query returns null

2019-04-18 Thread Kevin Risden
Do you have multiple shards (including replicas) on the same host for the
collection in question? Does the number of shards per host change on the
export/index?

Kevin Risden

On Thu, Apr 18, 2019, 20:50 Joel Bernstein  wrote:

> That stack trace points here:
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/handler/sql/SolrSchema.java#L103
>
> So the Sql Schema is not initializing properly for this dataset. I'd be
> interested in understanding why.
>
> If you want to create a jira ticket and attach your schema we can track
> this down. I'll probably attach a special binary to the ticket which has
> additional logging so we can can find out what field is causing the
> problem.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Apr 18, 2019 at 1:38 PM David Barnett  wrote:
>
> > Hi Joel, besides the solr log is there anywhere else i need to go ?
> > anything I need to set to get more detail ?
> >
> > On Thu, 18 Apr 2019 at 10:46, Joel Bernstein  wrote:
> >
> > > This is just to make sure the JDBC URL is correct.
> > >
> > > Reloading the collection shouldn't effect much unless the schema is
> > > different.
> > >
> > > But as Shawn mentioned the stack trace is not coming from Solr. Is
> there
> > > more in the logs beyond the Calcite exception?
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Thu, Apr 18, 2019 at 11:04 AM Shawn Heisey 
> > wrote:
> > >
> > > > On 4/18/2019 1:47 AM, David Barnett wrote:
> > > > > I have a large Solr 7.3 collection with 400M+ documents.
> > > > >
> > > > > I’m trying to use the Solr JDBC driver to query the data but I get
> a
> > > > >
> > > > > java.io.IOException: Failed to execute sqlQuery 'select id from
> > > document
> > > > limit 10' against JDBC connection 'jdbc:calcitesolr:'.
> > > > > Error while executing SQL "select id from document limit 10": null
> > > >
> > > > 
> > > >
> > > > By the way, either that JDBC url is extremely incomplete or you nuked
> > it
> > > > from the log before sharing.  Seeing the construction of the full URL
> > > > might be helpful.  If you need to redact it in some way for privacy
> > > > concerns, do so in a way so that we can still tell what the URL was -
> > > > change a real password to PASSWORD, change things like host names to
> > > > something like HOST_NAME, etc.
> > > >
> > > > > Caused by: java.lang.NullPointerException
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.validate(VolcanoPlanner.java:891
> > > > >  at
> > > >
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:866)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:863)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:883)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:101)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:336)
> > > > >  at
> > > >
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1496)
> > > > >  at
> > > >
> > >
> >
> org.apache.c

Re: All replicas created on the same node

2019-03-08 Thread Kevin Risden
Might be https://issues.apache.org/jira/browse/SOLR-13248

From the 7.7 upgrade notes:
SOLR-13248: The default replica placement strategy used in Solr has been
reverted to the 'legacy' policy used by Solr
 7.4 and previous versions. This is due to multiple bugs in the autoscaling
based replica placement strategy that was
 made default in Solr 7.5 which causes multiple replicas of the same shard
to be placed on the same node in addition
 to the maxShardsPerNode and createNodeSet parameters being ignored.
Although the default has changed, autoscaling
 will continue to be used if a cluster policy or preference is specified or
a collection level policy is in use.
 The default replica placement strategy can be changed to use autoscaling
again by setting a cluster property:
 curl -X POST -H 'Content-type:application/json' --data-binary '
 {
   "set-obj-property": {
 "defaults" : {
   "cluster": {
 "useLegacyReplicaAssignment":false
   }
 }
   }
 }' http://$SOLR_HOST:$SOLR_PORT/api/cluster

Kevin Risden


On Fri, Mar 8, 2019 at 3:04 PM levtannen  wrote:

> Hi community,
> I have Solr 7.6 running on three nodes with about 400 collections, each with
> one shard and 3 replicas. I want replicas to be spread across
> all 3 nodes so that every collection has one replica
> on each node.
> I create collections via the SolrJ code.
>  for (String collectionName : collectionNames) {
>      create = CollectionAdminRequest.createCollection(collectionName, source, 1, 3);
>      result = solrClient.request(create);
>  }
> In Solr 7.4 it worked fine, but in Solr 7.6 the created replicas are not spread
> equally between nodes. In some collections all 3 replicas are created just
> on one node, in some 2 replicas are created on one node and 1 on another,
> and some collections are created correctly: 1 replica per node.
> Could anyone  give me advice on why it happened and how to fix it?
>
> Thank you.
> Lev Tannen
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: solr reads whole index on startup

2018-12-20 Thread Kevin Risden
Kyle - Thanks so much for the followup on this. Rarely do we get to
see results compared in this much detail.

Can you share the Solr HDFS configuration settings that you tested
with? Blockcache and direct memory size? I'd be curious just as a
reference point.

Kevin Risden

On Thu, Dec 20, 2018 at 10:31 AM lstusr 5u93n4  wrote:
>
> Hi All,
>
> To close this off, I'm sad to report that we've come to an end with Solr on
> HDFS.
>
> Here's what we finally did:
>  - created two brand-new identical Solr cloud clusters, one on HDFS and one
> on local disk.
> - 1 replica per node. Each node 16GB ram.
>  - Added documents.
>  - Compared start-up times for a single node after a graceful shutdown.
>
> What we observe:
>  - on startup, the replica will transition from "Gone" to "Down" fairly
> quickly. (a few seconds)
>  - The replica then spends some time in the "Down" state before
> transitioning to "Recovering"
>  - The replica stays in "Recovering" for some time, before transitioning to
> "Active"
>
> Results for 75M docs in the replica, replica size 28.5GB:
>
>   - HDFS
>  - Time in "Down": 4m 49s
>  - Time in "Recovering": 2m 30s
>  - Total time to restart: 7m 9s
>
>   - Local Disk
>  - Time in "Down": 0m 5s
>  - Time in "Recovering": 0m 8s
>  - Total time to restart: 0m 13s
>
>
> Results for 100M docs in the replica, replica size 37GB:
>
>- HDFS
> - Time in "Down": 8m 30s
>  - Time in "Recovering": 5m 19s
>  - Total time to restart: 13m 49s
>
>   - Local Disk
>  - Time in "Down": 0m 4s
>  - Time in "Recovering": 0m 10s
>  - Total time to restart: 0m 14s
>
>
> Conclusions:
>  - As the index size grows, Solr on HDFS has a trend towards increasing
> restart times that's not seen on local disk.
>
> Notes:
>  - HDFS in our environment is FINE. The network is FINE. We have hbase
> servers running on the same ESXi hosts as Solr, they access the same HDFS
> filesystem, and hbase bandwidth regularly exceeds 2GB/s. All latencies are
> sub-millisecond.
>  - The values reported above are averages. There's some variance to the
> results, but the averages are representative of the times we're seeing.
>
> Thanks for reading!
>
> Kyle
>
>
>
> On Mon, 10 Dec 2018 at 14:14, lstusr 5u93n4  wrote:
>
> > Hi Guys,
> >
> > >  What OS is it on?
> > CentOS 7
> >
> > >  With your indexes in HDFS, the HDFS software running
> > > inside Solr also needs heap memory to operate, and is probably going to
> > > set aside part of the heap for caching purposes.
> > We still have the solr.hdfs.blockcache.slab.count parameter set to the
> > default of 1, but we're going to tune this a bit and see what happens.
> >
> > > but for this setup, I'd definitely want a LOT more than 16GB.
> > So where would you start? We can easily double the number of servers to 6,
> > and put one replica on each (probably going to do this anyways.)  Would you
> > go bigger than 6 x 16GB ? Keeping in mind, even with our little 3 x 16GB we
> > haven't had performance problems... This thread kind of diverged that way,
> > but really the initial issue was just that the whole index seems to be read
> > on startup. (Which I fully understand may be resource related, but I have
> > yet to try reproduce on a smaller scale to confirm/deny.)
> >
> > > As Solr runs, it writes a GC log.  Can you share all of the GC log files
> > > that Solr has created?  There should not be any proprietary information
> > > in those files.
> >
> > This I can do. Actually, I've collected a lot of things, redacted any
> > private info, and collected here into a series of logs / screenshots.
> >
> > So what I did:
> >  - 16:49 GMT -- stopped solr on one node (node 4) using bin/solr stop, and
> > keeping the others alive.. Captured the solr log as it was stopping, and
> > uploaded here:
> >  - https://pastebin.com/raw/UhSTdb1h
> >
> > - 17:00 GMT  - restarted solr on the same node (other two stayed up the
> > whole time) and let it run for an hour. Captured the solr logs since the
> > startup here:
> > - https://pastebin.com/raw/S4Z9XVrG
> >
> >  - Observed the outbound network traffic from HDFS to this particular solr
> > instance during this time, screenshotted it, and put the image here: (times
> > are in EST for that screenshot)
> > - https://imagebin.ca/v/4PY63LAMSVV1
> >
> >  - Screenshotted the resource usage on the 

Re: solr reads whole index on startup

2018-12-05 Thread Kevin Risden
Do you have logs right before the following?

"we notice that the nodes go into "Recovering" state for about 10-12 hours
before finally coming alive."

Is there a peersync failure or something else in the logs indicating why
there is a full recovery?

Kevin Risden


On Wed, Dec 5, 2018 at 12:53 PM lstusr 5u93n4  wrote:

> Hi All,
>
> We have a collection:
>   - solr 7.5
>   - 3 shards, replication factor 2 for a total of 6 NRT replicas
>   - 3 servers, 16GB ram each
>   - 2 billion documents
>   - autoAddReplicas: false
>   - 2.1 TB on-disk index size
>   - index stored on hdfs on separate servers.
>
> If we (gracefully) shut down solr on all 3 servers, when we re-launch solr
> we notice that the nodes go into "Recovering" state for about 10-12 hours
> before finally coming alive.
>
> During this recovery time, we notice high network traffic outbound from our
> HDFS servers to our solr servers. The sum total of which is roughly
> equivalent to the index size on disk.
>
> So it seems to us that on startup, solr has to re-read the entire index
> before coming back alive.
>
> 1. is this assumption correct?
> 2. is there any way to mitigate this, so that solr can launch faster?
>
> Thanks!
>
> Kyle
>


Re: SolrCloud Replication Failure

2018-11-06 Thread Kevin Risden
Erick Erickson - I don't have much time to chase this down. Do you think
this is a blocker for 7.6? It seems pretty serious.

Jeremy - This would be a good JIRA to create - we can move the conversation
there to try to get the right people involved.

Kevin Risden


On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith  wrote:

> Hi Susheel,
>
>  Yes, it appears that under certain conditions, if a follower is down
> when the leader gets an update, the follower will not receive that update
> when it comes back (or maybe it receives the update and it's then
> overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> that follower then becomes the leader, it will replicate its own out of
> date value back to the former leader, even though the version number is
> lower.
>
>
>-Jeremy
>
> 
> From: Susheel Kumar 
> Sent: Thursday, November 1, 2018 2:57:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud Replication Failure
>
> Are we saying it has something to do with stopping and restarting replicas?
> Otherwise I haven't seen/heard any issues with document updates and
> forwarding to replicas...
>
> Thanks,
> Susheel
>
> On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
> wrote:
>
> > So  this seems like it absolutely needs a JIRA
> > On Thu, Nov 1, 2018 at 9:39 AM
> Kevin Risden
>  wrote:
> > >
> > > I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5
> > locally
> > > without docker. I still see the same behavior where the latest updates
> > > aren't on the replicas. I still don't know what is happening but it
> > happens
> > > without Docker :(
> > >
> > >
> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden 
> wrote:
> > >
> > > > Erick - Yea thats a fair point. Would be interesting to see if this
> > fails
> > > > without Docker.
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> Kevin:
> > > >>
> > > >> You're also using Docker, right? Docker is not "officially"
> supported
> > > >> although there's some movement in that direction and if this is only
> > > >> reproducible in Docker than it's a clue where to look
> > > >>
> > > >> Erick
> > > >> On Wed, Oct 31, 2018 at 7:24 PM
> > > >> Kevin Risden
> > > >>  wrote:
> > > >> >
> > > >> > I haven't dug into why this is happening but it definitely
> > reproduces. I
> > > >> > removed the local requirements (port mapping and such) from the
> > gist you
> > > >> > posted (very helpful). I confirmed this fails locally and on
> Travis
> > CI.
> > > >> >
> > > >> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > >> >
> > > >> > I don't even see the first update getting applied from num 10 ->
> 20.
> > > >> After
> > > >> > the first update there is no more change.
> > > >> >
> > > >> > Kevin Risden
> > > >> >
> > > >> >
> > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  >
> > > >> wrote:
> > > >> >
> > > >> > > Thanks Erick, this is 7.5.0.
> > > >> > > 
> > > >> > > From: Erick Erickson 
> > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > >> > > To: solr-user
> > > >> > > Subject: Re: SolrCloud Replication Failure
> > > >> > >
> > > >> > > What version of solr? This code was pretty much rewritten in 7.3
> > IIRC
> > > >> > >
> > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  > wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > >  We are currently running a moderately large instance of
> > > >> standalone
> > > >> > > > solr and are preparing to switch to solr cloud 

Re: solr cloud - hdfs folder structure best practice

2018-11-02 Thread Kevin Risden
I prefer a single HDFS home since it definitely simplifies things. No need
to create folders for each node or anything like that if you add nodes to
the cluster. The replicas underneath will get their own folders. I don't
know if there are issues with autoAddReplicas or other types of failovers
if there are different home folders.

I've run Solr on HDFS with the same basic configs as listed here:
https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html

Kevin Risden


On Fri, Nov 2, 2018 at 1:19 PM lstusr 5u93n4  wrote:

> Hi All,
>
> Here's a question that I can't find an answer to in the documentation:
>
> When configuring solr cloud with HDFS, is it best to:
>   a) provide a unique hdfs folder for each solr cloud instance
> or
>   b) provide the same hdfs folder to all solr cloud instances.
>
> So for example, if I have two solr cloud nodes, I can configure them either
> with:
>
>node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
>node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2
>
> Or I could configure both nodes with:
>
> -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr
>
> In the second option, all solr cloud nodes can "see" all index files from
> all other solr cloud nodes. Are there pros or cons to allowing all of
> the solr nodes to see all files in the collection?
>
> Thanks,
>
> Kyle
>


Re: SolrCloud Replication Failure

2018-11-01 Thread Kevin Risden
I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5 locally
without docker. I still see the same behavior where the latest updates
aren't on the replicas. I still don't know what is happening but it happens
without Docker :(

https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches

Kevin Risden


On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden  wrote:

> Erick - Yea thats a fair point. Would be interesting to see if this fails
> without Docker.
>
> Kevin Risden
>
>
> On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson 
> wrote:
>
>> Kevin:
>>
>> You're also using Docker, right? Docker is not "officially" supported
>> although there's some movement in that direction and if this is only
>> reproducible in Docker than it's a clue where to look
>>
>> Erick
>> On Wed, Oct 31, 2018 at 7:24 PM
>> Kevin Risden
>>  wrote:
>> >
>> > I haven't dug into why this is happening but it definitely reproduces. I
>> > removed the local requirements (port mapping and such) from the gist you
>> > posted (very helpful). I confirmed this fails locally and on Travis CI.
>> >
>> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
>> >
>> > I don't even see the first update getting applied from num 10 -> 20.
>> After
>> > the first update there is no more change.
>> >
>> > Kevin Risden
>> >
>> >
>> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith 
>> wrote:
>> >
>> > > Thanks Erick, this is 7.5.0.
>> > > 
>> > > From: Erick Erickson 
>> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
>> > > To: solr-user
>> > > Subject: Re: SolrCloud Replication Failure
>> > >
> > > What version of solr? This code was pretty much rewritten in 7.3 IIRC
>> > >
>> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > > >
>> > > > Hi all,
>> > > >
>> > > >  We are currently running a moderately large instance of
>> standalone
>> > > > solr and are preparing to switch to solr cloud to help us scale
>> up.  I
>> > > have
>> > > > been running a number of tests using docker locally and ran into an
>> issue
>> > > > where replication is consistently failing.  I have pared down the
>> test
>> > > case
>> > > > as minimally as I could.  Here's a link for the docker-compose.yml
>> (I put
>> > > > it in a directory called solrcloud_simple) and a script to run the
>> test:
>> > > >
>> > > >
>> > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>> > > >
>> > > >
>> > > > Here's the basic idea behind the test:
>> > > >
>> > > >
>> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
>> > > > replicas (each node gets a replica).  Just use the default schema,
>> > > although
>> > > > I've also tried our schema and got the same result.
>> > > >
>> > > >
>> > > > 2) Shut down solr-2
>> > > >
>> > > >
>> > > > 3) Add 100 simple docs, just id and a field called num.
>> > > >
>> > > >
>> > > > 4) Start solr-2 and check that it received the documents.  It did!
>> > > >
>> > > >
>> > > > 5) Update a document, commit, and check that solr-2 received the
>> update.
>> > > > It did!
>> > > >
>> > > >
>> > > > 6) Stop solr-2, update the same document, start solr-2, and make
>> sure
>> > > that
>> > > > it received the update.  It did!
>> > > >
>> > > >
>> > > > 7) Repeat step 6 with a new value.  This time solr-2 reverts back
>> to what
>> > > > it had in step 5.
>> > > >
>> > > >
>> > > > I believe the main issue comes from this in the logs:
>> > > >
>> > > >
>> > > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
>> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
>> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
>> s:shard1
>> > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
>> > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our
>> versions
>> > > are
>> > > > newer. ourHighThreshold=1615861330901729280
>> > > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
>> > > > otherHighest=1615861335081353216
>> > > >
>> > > > PeerSync thinks the versions on solr-2 are newer for some reason,
>> so it
>> > > > doesn't try to sync from solr-1.  In the final state, solr-2 will
>> always
>> > > > have a lower version for the updated doc than solr-1.  I've tried
>> this
>> > > with
>> > > > different commit strategies, both auto and manual, and it doesn't
>> seem to
>> > > > make any difference.
>> > > >
>> > > > Is this a bug with solr, an issue with using docker, or am I just
>> > > > expecting too much from solr?
>> > > >
>> > > > Thanks for any insights you may have,
>> > > >
>> > > > Jeremy
>> > > >
>> > > >
>> > > >
>> > >
>>
>


Re: SolrCloud Replication Failure

2018-11-01 Thread Kevin Risden
Erick - Yea thats a fair point. Would be interesting to see if this fails
without Docker.

Kevin Risden


On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson 
wrote:

> Kevin:
>
> You're also using Docker, right? Docker is not "officially" supported
> although there's some movement in that direction and if this is only
> reproducible in Docker than it's a clue where to look
>
> Erick
> On Wed, Oct 31, 2018 at 7:24 PM
> Kevin Risden
>  wrote:
> >
> > I haven't dug into why this is happening but it definitely reproduces. I
> > removed the local requirements (port mapping and such) from the gist you
> > posted (very helpful). I confirmed this fails locally and on Travis CI.
> >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> >
> > I don't even see the first update getting applied from num 10 -> 20.
> After
> > the first update there is no more change.
> >
> > Kevin Risden
> >
> >
> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith 
> wrote:
> >
> > > Thanks Erick, this is 7.5.0.
> > > 
> > > From: Erick Erickson 
> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > To: solr-user
> > > Subject: Re: SolrCloud Replication Failure
> > >
> > > What version of solr? This code was pretty much rewritten in 7.3 IIRC
> > >
> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  > >
> > > > Hi all,
> > > >
> > > >  We are currently running a moderately large instance of
> standalone
> > > > solr and are preparing to switch to solr cloud to help us scale up.
> I
> > > have
> > > > been running a number of tests using docker locally and ran into an
> issue
> > > > where replication is consistently failing.  I have pared down the
> test
> > > case
> > > > as minimally as I could.  Here's a link for the docker-compose.yml
> (I put
> > > > it in a directory called solrcloud_simple) and a script to run the
> test:
> > > >
> > > >
> > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > > >
> > > >
> > > > Here's the basic idea behind the test:
> > > >
> > > >
> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> > > > replicas (each node gets a replica).  Just use the default schema,
> > > although
> > > > I've also tried our schema and got the same result.
> > > >
> > > >
> > > > 2) Shut down solr-2
> > > >
> > > >
> > > > 3) Add 100 simple docs, just id and a field called num.
> > > >
> > > >
> > > > 4) Start solr-2 and check that it received the documents.  It did!
> > > >
> > > >
> > > > 5) Update a document, commit, and check that solr-2 received the
> update.
> > > > It did!
> > > >
> > > >
> > > > 6) Stop solr-2, update the same document, start solr-2, and make sure
> > > that
> > > > it received the update.  It did!
> > > >
> > > >
> > > > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to
> what
> > > > it had in step 5.
> > > >
> > > >
> > > > I believe the main issue comes from this in the logs:
> > > >
> > > >
> > > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
> s:shard1
> > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our
> versions
> > > are
> > > > newer. ourHighThreshold=1615861330901729280
> > > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> > > > otherHighest=1615861335081353216
> > > >
> > > > PeerSync thinks the versions on solr-2 are newer for some reason, so
> it
> > > > doesn't try to sync from solr-1.  In the final state, solr-2 will
> always
> > > > have a lower version for the updated doc than solr-1.  I've tried
> this
> > > with
> > > > different commit strategies, both auto and manual, and it doesn't
> seem to
> > > > make any difference.
> > > >
> > > > Is this a bug with solr, an issue with using docker, or am I just
> > > > expecting too much from solr?
> > > >
> > > > Thanks for any insights you may have,
> > > >
> > > > Jeremy
> > > >
> > > >
> > > >
> > >
>


Re: SolrCloud Replication Failure

2018-11-01 Thread Kevin Risden
So I just added PRs for 5.5, 6.6, 7.1, 7.2, 7.3, 7.4, and 7.5. They all seem to
have the exact same behavior... I don't have much more insight here, but it
doesn't seem to be correct.

Kevin Risden


On Thu, Nov 1, 2018 at 9:45 AM Kevin Risden  wrote:

> Ahhh your PR triggered an idea. I'll open a few PRs adjusting the Solr
> version from latest back to  earlier 7.x versions. See which version the
> problem was introduced in.
>
> Kevin Risden
>
>
> On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith  wrote:
>
>> Thanks so much for looking into this and cleaning up my code.
>>
>>
>> I added a pull request to show some additional strange behavior.  If we
>> restart solr-1, making solr-2 the leader, the out of date value of [10]
>> gets propagated back to solr-1.  Perhaps this will give a hint as to what
>> is going on.
>>
>> 
>> From:
>> Kevin Risden
>> 
>> Sent: Wednesday, October 31, 2018 10:24:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud Replication Failure
>>
>> I haven't dug into why this is happening but it definitely reproduces. I
>> removed the local requirements (port mapping and such) from the gist you
>> posted (very helpful). I confirmed this fails locally and on Travis CI.
>>
>> https://github.com/risdenk/test-solr-start-stop-replica-consistency
>>
>> I don't even see the first update getting applied from num 10 -> 20. After
>> the first update there is no more change.
>>
>> Kevin Risden
>>
>>
>> On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  wrote:
>>
>> > Thanks Erick, this is 7.5.0.
>> > 
>> > From: Erick Erickson 
>> > Sent: Wednesday, October 31, 2018 8:20:18 PM
>> > To: solr-user
>> > Subject: Re: SolrCloud Replication Failure
>> >
>> > What version of solr? This code was pretty much rewritten in 7.3 IIRC
>> >
>> > On Wed, Oct 31, 2018, 10:47 Jeremy Smith > >
>> > > Hi all,
>> > >
>> > >  We are currently running a moderately large instance of
>> standalone
>> > > solr and are preparing to switch to solr cloud to help us scale up.  I
>> > have
>> > > been running a number of tests using docker locally and ran into an
>> issue
>> > > where replication is consistently failing.  I have pared down the test
>> > case
>> > > as minimally as I could.  Here's a link for the docker-compose.yml (I
>> put
>> > > it in a directory called solrcloud_simple) and a script to run the
>> test:
>> > >
>> > >
>> > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>> > >
>> > >
>> > > Here's the basic idea behind the test:
>> > >
>> > >
>> > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
>> > > replicas (each node gets a replica).  Just use the default schema,
>> > although
>> > > I've also tried our schema and got the same result.
>> > >
>> > >
>> > > 2) Shut down solr-2
>> > >
>> > >
>> > > 3) Add 100 simple docs, just id and a field called num.
>> > >
>> > >
>> > > 4) Start solr-2 and check that it received the documents.  It did!
>> > >
>> > >
>> > > 5) Update a document, commit, and check that solr-2 received the
>> update.
>> > > It did!
>> > >
>> > >
>> > > 6) Stop solr-2, update the same document, start solr-2, and make sure
>> > that
>> > > it received the update.  It did!
>> > >
>> > >
>> > > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to
>> what
>> > > it had in step 5.
>> > >
>> > >
>> > > I believe the main issue comes from this in the logs:
>> > >
>> > >
>> > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
>> > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
>> > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
>> s:shard1
>> > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
>> > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions
>> > are
>> > > newer. ourHighThreshold=1615861330901729280
>> > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
>> > > otherHighest=1615861335081353216
>> > >
>> > > PeerSync thinks the versions on solr-2 are newer for some reason, so
>> it
>> > > doesn't try to sync from solr-1.  In the final state, solr-2 will
>> always
>> > > have a lower version for the updated doc than solr-1.  I've tried this
>> > with
>> > > different commit strategies, both auto and manual, and it doesn't
>> seem to
>> > > make any difference.
>> > >
>> > > Is this a bug with solr, an issue with using docker, or am I just
>> > > expecting too much from solr?
>> > >
>> > > Thanks for any insights you may have,
>> > >
>> > > Jeremy
>> > >
>> > >
>> > >
>> >
>>
>


Re: SolrCloud Replication Failure

2018-11-01 Thread Kevin Risden
Ahhh your PR triggered an idea. I'll open a few PRs adjusting the Solr
version from latest back to  earlier 7.x versions. See which version the
problem was introduced in.

Kevin Risden


On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith  wrote:

> Thanks so much for looking into this and cleaning up my code.
>
>
> I added a pull request to show some additional strange behavior.  If we
> restart solr-1, making solr-2 the leader, the out of date value of [10]
> gets propagated back to solr-1.  Perhaps this will give a hint as to what
> is going on.
>
> ____
> From:
> Kevin Risden
> 
> Sent: Wednesday, October 31, 2018 10:24:24 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud Replication Failure
>
> I haven't dug into why this is happening but it definitely reproduces. I
> removed the local requirements (port mapping and such) from the gist you
> posted (very helpful). I confirmed this fails locally and on Travis CI.
>
> https://github.com/risdenk/test-solr-start-stop-replica-consistency
>
> I don't even see the first update getting applied from num 10 -> 20. After
> the first update there is no more change.
>
> Kevin Risden
>
>
> On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  wrote:
>
> > Thanks Erick, this is 7.5.0.
> > 
> > From: Erick Erickson 
> > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > To: solr-user
> > Subject: Re: SolrCloud Replication Failure
> >
> > What version of solr? This code was pretty much rewritten in 7.3 IIRC
> >
> > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  >
> > > Hi all,
> > >
> > >  We are currently running a moderately large instance of standalone
> > > solr and are preparing to switch to solr cloud to help us scale up.  I
> > have
> > > been running a number of tests using docker locally and ran into an
> issue
> > > where replication is consistently failing.  I have pared down the test
> > case
> > > as minimally as I could.  Here's a link for the docker-compose.yml (I
> put
> > > it in a directory called solrcloud_simple) and a script to run the
> test:
> > >
> > >
> > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > >
> > >
> > > Here's the basic idea behind the test:
> > >
> > >
> > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> > > replicas (each node gets a replica).  Just use the default schema,
> > although
> > > I've also tried our schema and got the same result.
> > >
> > >
> > > 2) Shut down solr-2
> > >
> > >
> > > 3) Add 100 simple docs, just id and a field called num.
> > >
> > >
> > > 4) Start solr-2 and check that it received the documents.  It did!
> > >
> > >
> > > 5) Update a document, commit, and check that solr-2 received the
> update.
> > > It did!
> > >
> > >
> > > 6) Stop solr-2, update the same document, start solr-2, and make sure
> > that
> > > it received the update.  It did!
> > >
> > >
> > > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to
> what
> > > it had in step 5.
> > >
> > >
> > > I believe the main issue comes from this in the logs:
> > >
> > >
> > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions
> > are
> > > newer. ourHighThreshold=1615861330901729280
> > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> > > otherHighest=1615861335081353216
> > >
> > > PeerSync thinks the versions on solr-2 are newer for some reason, so it
> > > doesn't try to sync from solr-1.  In the final state, solr-2 will
> always
> > > have a lower version for the updated doc than solr-1.  I've tried this
> > with
> > > different commit strategies, both auto and manual, and it doesn't seem
> to
> > > make any difference.
> > >
> > > Is this a bug with solr, an issue with using docker, or am I just
> > > expecting too much from solr?
> > >
> > > Thanks for any insights you may have,
> > >
> > > Jeremy
> > >
> > >
> > >
> >
>


Re: SolrCloud Replication Failure

2018-10-31 Thread Kevin Risden
I haven't dug into why this is happening but it definitely reproduces. I
removed the local requirements (port mapping and such) from the gist you
posted (very helpful). I confirmed this fails locally and on Travis CI.

https://github.com/risdenk/test-solr-start-stop-replica-consistency

I don't even see the first update getting applied from num 10 -> 20. After
the first update there is no more change.
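
One way to see this directly is to ask each core for its copy of the document
with distrib=false and compare the stored _version_ values. A minimal SolrJ
sketch; the core URLs, document id, and the "num" field name are placeholders
taken from the test script:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class ReplicaVersionCheck {
      public static void main(String[] args) throws Exception {
        // Hypothetical core URLs for the two replicas of the test collection.
        String[] coreUrls = {
            "http://solr-1:8081/solr/test_shard1_replica_n1",
            "http://solr-2:8082/solr/test_shard1_replica_n2"
        };
        for (String url : coreUrls) {
          try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
            SolrQuery q = new SolrQuery("id:1");
            q.setFields("id", "num", "_version_");
            q.set("distrib", "false"); // ask only this core, not the whole collection
            SolrDocumentList hits = client.query(q).getResults();
            if (hits.isEmpty()) {
              System.out.println(url + " -> document not found");
            } else {
              SolrDocument doc = hits.get(0);
              System.out.println(url + " -> num=" + doc.getFieldValue("num")
                  + " _version_=" + doc.getFieldValue("_version_"));
            }
          }
        }
      }
    }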

Kevin Risden


On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  wrote:

> Thanks Erick, this is 7.5.0.
> 
> From: Erick Erickson 
> Sent: Wednesday, October 31, 2018 8:20:18 PM
> To: solr-user
> Subject: Re: SolrCloud Replication Failure
>
> What version of solr? This code was pretty much rewritten in 7.3 IIRC
>
> On Wed, Oct 31, 2018, 10:47 Jeremy Smith 
> > Hi all,
> >
> >  We are currently running a moderately large instance of standalone
> > solr and are preparing to switch to solr cloud to help us scale up.  I
> have
> > been running a number of tests using docker locally and ran into an issue
> > where replication is consistently failing.  I have pared down the test
> case
> > as minimally as I could.  Here's a link for the docker-compose.yml (I put
> > it in a directory called solrcloud_simple) and a script to run the test:
> >
> >
> > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> >
> >
> > Here's the basic idea behind the test:
> >
> >
> > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> > replicas (each node gets a replica).  Just use the default schema,
> although
> > I've also tried our schema and got the same result.
> >
> >
> > 2) Shut down solr-2
> >
> >
> > 3) Add 100 simple docs, just id and a field called num.
> >
> >
> > 4) Start solr-2 and check that it received the documents.  It did!
> >
> >
> > 5) Update a document, commit, and check that solr-2 received the update.
> > It did!
> >
> >
> > 6) Stop solr-2, update the same document, start solr-2, and make sure
> that
> > it received the update.  It did!
> >
> >
> > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
> > it had in step 5.
> >
> >
> > I believe the main issue comes from this in the logs:
> >
> >
> > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions
> are
> > newer. ourHighThreshold=1615861330901729280
> > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> > otherHighest=1615861335081353216
> >
> > PeerSync thinks the versions on solr-2 are newer for some reason, so it
> > doesn't try to sync from solr-1.  In the final state, solr-2 will always
> > have a lower version for the updated doc than solr-1.  I've tried this
> with
> > different commit strategies, both auto and manual, and it doesn't seem to
> > make any difference.
> >
> > Is this a bug with solr, an issue with using docker, or am I just
> > expecting too much from solr?
> >
> > Thanks for any insights you may have,
> >
> > Jeremy
> >
> >
> >
>


Re: hdfs - documents missing after hard poweroff

2018-10-31 Thread Kevin Risden
Also, do you have autoAddReplicas turned on for these collections over
HDFS?

Kevin Risden


On Wed, Oct 31, 2018 at 8:20 PM Kevin Risden  wrote:

> So I'm definitely curious what is going on here.
>
> Are you still able to reproduce this? Can you check if files have been
> modified on HDFS? I'd be curious if tlogs or the index is changing
> underneath for the different restarts. Since there is no new indexing I
> would guess not but something to check.
>
> Can you run CheckIndex on the index to make sure it's not corrupt when you
> don't get the full result set?
>
> Kevin Risden
>
>
> On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham 
> wrote:
>
>> Hi,
>>
>> Sometimes after a full poweroff of the solr cloud nodes, we see missing
>> documents from the index. Is there anything about our setup or our
>> recovery
>> procedure that could cause this? Details are below:
>>
>> We see the following (somewhat random) behaviour:
>>
>>  - add 10 documents to index. Commit.
>>  - query for all documents - 10 documents returned.
>>  - restart all solr nodes and reset the collection (procedure is below).
>>  - query for all documents - 10 documents returned.
>>  - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned.
>>
>> To summarize, after a full reboot of all the solr nodes, we are finding
>> that (sometimes) not all documents are in the index. This situation
>> doesn't
>> remedy itself by waiting. Restarting all will sometimes re-add them,
>> sometimes not.
>>
>> Our procedure for recovering from a hard poweroff is:
>>  - manually delete all *.lock files from the index folders on hdfs.
>>  - fully delete the znode from zookeeper.
>>  - re-add an empty znode in zookeeper.
>>  - start up all solr nodes.
>>  - re-add the configset.
>>  - re-issue the collection create command.
>>
>> After doing the above, we find that we are able to see all of the files in
>> the index about 60% of the time. Other times, we are missing some
>> documents.
>>
>> Some other things about our environment:
>>  - we're doing this test with 1 collection that has 18 shards distributed
>> across 3 solr cloud nodes.
>>  - solr version 7.5.0
>>  - hdfs is not running on the solr nodes, and is not being restarted.
>>
>> Any thoughts or tips are greatly appreciated,
>>
>> Kyle
>>
>> --
>> CONFIDENTIALITY NOTICE: The information contained in this email is
>> privileged and confidential and intended only for the use of the
>> individual
>> or entity to whom it is addressed.   If you receive this message in
>> error,
>> please notify the sender immediately at 613-729-1100 and destroy the
>> original message and all copies. Thank you.
>>
>


Re: hdfs - documents missing after hard poweroff

2018-10-31 Thread Kevin Risden
So I'm definitely curious what is going on here.

Are you still able to reproduce this? Can you check if files have been
modified on HDFS? I'd be curious if tlogs or the index is changing
underneath for the different restarts. Since there is no new indexing I
would guess not but something to check.

Can you run check index on the index to make sure it's not corrupt when you
don't get the full result set?
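
A rough sketch of both checks (paths and the lucene-core version are
placeholders, not taken from your setup):

# 1) look at modification times of the index and tlog files on HDFS
hdfs dfs -ls -R /solr/myCollection/core_node1/data

# 2) copy the index locally and run Lucene's CheckIndex against it
hdfs dfs -copyToLocal /solr/myCollection/core_node1/data/index /tmp/index-copy
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-7.5.0.jar \
  org.apache.lucene.index.CheckIndex /tmp/index-copy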

Kevin Risden


On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham 
wrote:

> Hi,
>
> Sometimes after a full poweroff of the solr cloud nodes, we see missing
> documents from the index. Is there anything about our setup or our recovery
> procedure that could cause this? Details are below:
>
> We see the following (somewhat random) behaviour:
>
>  - add 10 documents to index. Commit.
>  - query for all documents - 10 documents returned.
>  - restart all solr nodes and reset the collection (procedure is below).
>  - query for all documents - 10 documents returned.
>  - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned.
>
> To summarize, after a full reboot of all the solr nodes, we are finding
> that (sometimes) not all documents are in the index. This situation doesn't
> remedy itself by waiting. Restarting all will sometimes re-add them,
> sometimes not.
>
> Our procedure for recovering from a hard poweroff is:
>  - manually delete all *.lock files from the index folders on hdfs.
>  - fully delete the znode from zookeeper.
>  - re-add an empty znode in zookeeper.
>  - start up all solr nodes.
>  - re-add the configset.
>  - re-issue the collection create command.
>
> After doing the above, we find that we are able to see all of the files in
> the index about 60% of the time. Other times, we are missing some
> documents.
>
> Some other things about our environment:
>  - we're doing this test with 1 collection that has 18 shards distributed
> across 3 solr cloud nodes.
>  - solr version 7.5.0
>  - hdfs is not running on the solr nodes, and is not being restarted.
>
> Any thoughts or tips are greatly appreciated,
>
> Kyle
>
> --
> CONFIDENTIALITY NOTICE: The information contained in this email is
> privileged and confidential and intended only for the use of the
> individual
> or entity to whom it is addressed.   If you receive this message in error,
> please notify the sender immediately at 613-729-1100 and destroy the
> original message and all copies. Thank you.
>


Re: Extracting top level URL when indexing document

2018-06-12 Thread Kevin Risden
Looks like stop words (in, and, on) are what is breaking. The regex itself
looks correct.
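
To see exactly which tokens that tokenizer emits for one of the failing URLs,
the field analysis handler is handy (a sketch; host, collection, and field
name are placeholders):

curl "http://localhost:8983/solr/mycollection/analysis/field" \
  --data-urlencode "analysis.fieldname=hostname" \
  --data-urlencode "analysis.fieldvalue=https://attainyourhome.com/"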

Kevin Risden

On Tue, Jun 12, 2018, 18:02 Hanjan, Harinder 
wrote:

> Hello!
>
> I am indexing web documents and have a need to extract their top-level URL
> to be stored in a different field. I have had some success with the
> PatternTokenizerFactory (relevant schema bits at the bottom) but the
> behavior appears to be inconsistent.  Most of the time, the top-level URL
> is extracted just fine but for some documents, it is being cut off.
>
> Examples:
> URL                                                 Extracted URL                      Comment
> http://www.calgaryarb.ca/eCourtPublic/15M2018.pdf   http://www.calgaryarb.ca           Success
> http://www.calgarymlc.ca/about-cmlc/                http://www.calgarymlc.ca           Success
> http://www.calgarypolicecommission.ca/reports.php   http://www.calgarypolicecommissio  Fail
> https://attainyourhome.com/                         https://attai                      Fail
> https://liveandplay.calgary.ca/DROPIN/page/dropin   https://livea                      Fail
>
>
>
>
> Relevant schema:
>
> <field name="hostname" ... multiValued="false"/>
>
> <fieldType name="..." ... sortMissingLast="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory"
>         pattern="^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)"
>         group="0"/>
>   </analyzer>
> </fieldType>
>
>
> I have tested the Regex and it is matching things fine. Please see
> https://regex101.com/r/wN6cZ7/358.
> So it appears that I have a gap in my understanding of how Solr
> PatternTokenizerFactory works. I would appreciate any insight on the issue.
> hostname field will be used in facet queries.
>
> Thank you!
> Harinder
>
> 
> NOTICE -
> This communication is intended ONLY for the use of the person or entity
> named above and may contain information that is confidential or legally
> privileged. If you are not the intended recipient named above or a person
> responsible for delivering messages or communications to the intended
> recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying
> of this communication or any of the information contained in it is strictly
> prohibited. If you have received this communication in error, please notify
> us immediately by telephone and then destroy or delete this communication,
> or return it to us by mail if requested by us. The City of Calgary thanks
> you for your attention and co-operation.
>


Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Kevin Risden
I'm going to share how I've debugged a similar OOM crash and solving it had
nothing to do with increasing heap.

https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html

This is specifically for Apache Ranger and how to fix it but you can treat
it just like any application using Solr.

There were a few things that caused issues "out of the blue":

   - Document TTL
      - The documents getting deleted after some time would trigger OOM
        (due to caches taking up too much heap)
   - Extra query load
      - caches again taking up too much memory
   - Extra inserts
      - too many commits refreshing caches and again going OOM

Many of these can be reduced by using docvalues for fields that you
typically sort/filter on.
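
As a sketch, docValues can be enabled through the Schema API (the field name
and type here are made-up examples, and the field needs to be reindexed for
the change to take effect):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"replace-field":{"name":"created_at","type":"pdate","stored":true,"docValues":true}}' \
  "http://localhost:8983/solr/mycollection/schema"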

Kevin Risden

On Wed, Apr 11, 2018 at 6:01 PM, Deepak Goel <deic...@gmail.com> wrote:

> A few observations:
>
> 1. The Old Gen Heap on 9th April is about 6GB occupied which then runs up
> to 9+GB on 10th April (It steadily increases throughout the day)
> 2. The Old Gen GC is never able to reclaim any free memory
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller <
> aharrison-ful...@mintel.com> wrote:
>
> > In addition, here is the GC log leading up to the crash.
> >
> > https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_
> > 20180410_1009.zip?dl=0
> >
> > Thanks!
> >
> > Adam
> >
> > On 11 April 2018 at 16:18, Adam Harrison-Fuller <
> > aharrison-ful...@mintel.com
> > > wrote:
> >
> > > Thanks for the advice so far.
> > >
> > > The directoryFactory is set to ${solr.directoryFactory:solr.
> > NRTCachingDirectoryFactory}.
> > >
> > >
> > > The servers workload is predominantly queries with updates taking place
> > > once a day.  It seems the servers are more likely to go down whilst the
> > > servers are indexing but not exclusively so.
> > >
> > > I'm having issues locating the actual out of memory exception.  I can tell
> > > that it has run out of memory as it's called the oom_killer script which has
> > > left a log file in the logs directory.  I cannot find the actual exception
> > > in the solr.log or our solr_gc.log, any suggestions?
> > >
> > > Cheers,
> > > Adam
> > >
> > >
> > > On 11 April 2018 at 15:49, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> > >
> > >> For readability, I’d use -Xmx12G instead of
> -XX:MaxHeapSize=12884901888.
> > >> Also, I always use a start size the same as the max size, since
> servers
> > >> will eventually grow to the max size. So:
> > >>
> > >> -Xmx12G -Xms12G
> > >>
> > >> wunder
> > >> Walter Underwood
> > >> wun...@wunderwood.org
> > >> http://observer.wunderwood.org/  (my blog)
> > >>
> > >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> > >> wrote:
> > >> >
> > >> > What is directory factory defined in solrconfig.xml? Your JVM heap
> > >> should
> > >> > be tuned up with respect to that.
> > >> > How solr is being use,  is it more updates and less query or less
> > >> updates
> > >> > more queries?
> > >> > What is OOM error? Is it frequent GC or Error 12?
> > >> >
> > >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
> > >> > aharrison-ful...@mintel.com> wrote:
> > >> >
> > >> >> Hey Jesus,
> > >> >>
> > >> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
> > >> them.
> > >> >>
> > >> >> Cheers!
> > >> >> Adam
> > >> >>
> > >> >> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com>
> > >> wrote:
> > >> >>
> > >> >>> Hi Adam,
> > >> >>>
> > >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of
> physical
> > >> RAM,
> > >> >>> your JVM can afford more RAM without threading penalties due to
> > >> outside
> > >> >>> heap RAM la

Re: solr 5.2->7.2, suggester failure

2018-04-03 Thread Kevin Risden
It looks like there were changes in Lucene 7.0 that limited the size of the
automaton to prevent overflowing the stack.

https://issues.apache.org/jira/browse/LUCENE-7914

The commit being:
https://github.com/apache/lucene-solr/commit/7dde798473d1a8640edafb41f28ad25d17f25a2d

Kevin Risden

On Tue, Apr 3, 2018 at 1:45 PM, David Hastings <hastings.recurs...@gmail.com
> wrote:

> For data, it's primarily a lot of garbage, around 200k titles, varying
> length.  I'm actually looking through my application now to see if I even
> still use it or if it was an early experiment.  I am just finding it odd
> that it's failing in 7 but does fine on 5
>
> On Tue, Apr 3, 2018 at 2:41 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > What kinds of things go into your title field? On first blush that's a
> > bit odd for a multi-word title field since it treats the entire input
> > as a single string. The code is trying to build a large FST to hold
> > all of this data. Would AnalyzingInfixLookupFactory or similar make
> > more sense?
> >
> > buildOnStartup and buildOnOptimize are other red flags. This means
> > that every time you start up, the data for the title field is read
> > from disk and the FST is built (or index if you use a different impl).
> > On a large corpus this may take many minutes.
> >
> > Best,
> > Erick
> >
> > On Tue, Apr 3, 2018 at 11:28 AM, David Hastings
> > <hastings.recurs...@gmail.com> wrote:
> > > Hey all, I recently got a 7.2 instance up and running, and it seems to
> be
> > > going well however, I have ran into this when creating one of my
> indexes,
> > > and was wondering if anyone had a quick idea right off the top of their
> > > head.
> > >
> > > solrconfig:
> > >
> > > <searchComponent name="..." class="solr.SuggestComponent">
> > >   <lst name="suggester">
> > >     <str name="name">fixspell</str>
> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
> > >     <str name="suggestAnalyzerFieldType">string</str>
> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> > >     <str name="field">title</str>
> > >     <str name="buildOnStartup">true</str>
> > >     <str name="buildOnOptimize">true</str>
> > >   </lst>
> > > </searchComponent>
> > >
> > >
> > > received error:
> > >
> > >
> > > ERROR true
> > > SuggestComponent
> > > Exception in building suggester index for: fixspell
> > > java.lang.IllegalArgumentException: input automaton is too large: 1001
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(
> > Operations.java:1298)
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(
> > Operations.java:1306)
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(
> > Operations.java:1306)
> > >
> > > .
> > >
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(
> > Operations.java:1306)
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(
> > Operations.java:1306)
> > > at
> > > org.apache.lucene.util.automaton.Operations.topoSortStates(Operations.
> > java:1275)
> > > at
> > > org.apache.lucene.search.suggest.analyzing.
> > AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:292)
> > > at
> > > org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.
> > toAutomaton(AnalyzingSuggester.java:854)
> > > at
> > > org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(
> > AnalyzingSuggester.java:430)
> > > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190)
> > > at
> > > org.apache.solr.spelling.suggest.SolrSuggester.build(
> > SolrSuggester.java:181)
> > > at
> > > org.apache.solr.handler.component.SuggestComponent$SuggesterListener.
> > buildSuggesterIndex(SuggestComponent.java:529)
> > > at
> > > org.apache.solr.handler.component.SuggestComponent$
> > SuggesterListener.newSearcher(SuggestComponent.java:511)
> > > at org.apache.solr.core.SolrCore.lambda$getSearcher$17(
> > SolrCore.java:2275)
> >
>


Re: Ingestion not scaling horizontally as I add more cores to Solr

2018-01-11 Thread Kevin Risden
When you say "multiple machines", were these all local machines or VMs or
something else? I worked with a group once that used laptops to benchmark a
service and it was a WiFi network limit that caused weird results. LAN
connections or even better a dedicated client machine would help push more
documents.

Kevin Risden

On Thu, Jan 11, 2018 at 11:39 AM, Shashank Pedamallu <spedama...@vmware.com>
wrote:

> Thank you very much for the reply Shawn. Is the jmeter running on a
> different machine from Solr or on the same machine?
> Solr is running on a dedicated VM. And I’ve tried to split the client
> requests from multiple machines but the result was not different. So, I
> don’t think the bottleneck is with the client side.
>
> Thanks,
> Shashank
>
>
> On 1/10/18, 10:54 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:
>
> On 1/10/2018 12:58 PM, Shashank Pedamallu wrote:
> > As you can see, the number of documents being ingested per core is
> not scaling horizontally as I'm adding more cores. Rather the total number
> of documents getting ingested for Solr JVM is being topped around 90k
> documents per second.
>
> I would call 90K documents per second a very respectable speed.  I
> can't
> get my indexing to happen at anywhere near that rate.  My indexing is
> not multi-threaded, though.
>
> >  From the iostats and top commands, I do not see any bottlenecks
> with the iops or cpu respectively, CPU usaeg is around 65% and a sample of
> iostats is below:
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >
> >55.320.002.331.640.00   40.71
> >
> > Device:tpskB_read/skB_wrtn/skB_read
> kB_wrtn
> >
> > sda5   2523.00 45812.00298312.00  45812
>  298312
>
> Nearly 300 megabytes per second write speed?  That's a LOT of data.
> This storage must be quite a bit better than a single spinning disk.
> You won't get that kind of sustained transfer speed out of standard
> spinning disks unless they are using something like RAID10 or RAID0.
> This transfer speed is also well beyond the capabilities of Gigabit
> Ethernet.
>
> When Gus asked whether you were sending documents to the cloud from
> your
> local machine, I don't think he was referring to a public cloud.  I
> think he assumed you were running SolrCloud, so "cloud" was probably
> referring to your Solr installation, not a public cloud service.  If I
> had to guess, I think the intent was to find out what caliber of
> machine
> you're using to send the indexing requests.
>
> I don't know if the bottleneck is on the client side or the server
> side.
>   But I would imagine that with everything on a single machine, you may
> not be able to get the ingestion rate to go much higher.
>
> Is the jmeter running on a different machine from Solr or on the same
> machine?
>
> Thanks,
> Shawn
>
>
>


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-22 Thread Kevin Risden
Thanks for the detailed answers Joe. Definitely sounds like you covered
most of the easy HDFS performance items.

Kevin Risden

On Wed, Nov 22, 2017 at 7:44 AM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi Kevin -
> * HDFS is part of Cloudera 5.12.0.
> * Solr is co-located in most cases.  We do have several nodes that run on
> servers that are not data nodes, but most do. Unfortunately, our nodes are
> not the same size.  Some nodes have 8TBytes of disk, while our largest
> nodes are 64TBytes.  This results in a lot of data that needs to go over
> the network.
>
> * Command is:
> /usr/lib/jvm/jre-1.8.0/bin/java -server -Xms12g -Xmx16g -Xss2m
> -XX:+UseG1GC -XX:MaxDirectMemorySize=11g -XX:+PerfDisableSharedMem
> -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=16m
> -XX:MaxGCPauseMillis=300 -XX:InitiatingHeapOccupancyPercent=75
> -XX:+UseLargePages -XX:ParallelGCThreads=16 -XX:-ResizePLAB
> -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/opt/solr6/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M -DzkClientTimeout=30
> -DzkHost=frodo.querymasters.com:2181,bilbo.querymasters.com:2181,
> gandalf.querymasters.com:2181,cordelia.querymasters.com:2181,cressida.
> querymasters.com:2181/solr6.6.0 -Dsolr.log.dir=/opt/solr6/server/logs
> -Djetty.port=9100 -DSTOP.PORT=8100 -DSTOP.KEY=solrrocks -Dhost=tarvos
> -Duser.timezone=UTC -Djetty.home=/opt/solr6/server
> -Dsolr.solr.home=/opt/solr6/server/solr -Dsolr.install.dir=/opt/solr6
> -Dsolr.clustering.enabled=true -Dsolr.lock.type=hdfs
> -Dsolr.autoSoftCommit.maxTime=12 -Dsolr.autoCommit.maxTime=180
> -Dsolr.solr.home=/etc/solr6 -Djava.library.path=/opt/cloud
> era/parcels/CDH/lib/hadoop/lib/native -Xss256k -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
> /opt/solr6/server/logs -jar start.jar --module=http
>
> * We have enabled short circuit reads.
>
> Right now, we have a relatively small block cache due to the requirements
> that the servers run other software.  We tried to find the best balance
> between block cache size, and RAM for programs, while still giving enough
> for local FS cache.  This came out to be 84 128M blocks - or about 10G for
> the cache per node (45 nodes total).
>
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <bool name="solr.hdfs.blockcache.enabled">true</bool>
>   <bool name="solr.hdfs.blockcache.global">true</bool>
>   <int name="solr.hdfs.blockcache.slab.count">84</int>
>   <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>   <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>   <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>   <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>   <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">128</int>
>   <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">1024</int>
>   <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr6.6.0</str>
>   <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
> </directoryFactory>
>
> Thanks for reviewing!
>
> -Joe
>
>
>
> On 11/22/2017 8:20 AM, Kevin Risden wrote:
>
>> Joe,
>>
>> I have a few questions about your Solr and HDFS setup that could help
>> improve the recovery performance.
>>
>> * Is HDFS part of a distribution from Hortonworks, Cloudera, etc?
>> * Is Solr colocated with HDFS data nodes?
>> * What is the output of "ps aux | grep solr"? (specifically looking for
>> the
>> Java arguments that are being set.)
>>
>> Depending on how Solr on HDFS was setup, there are some potentially simple
>> settings that can help significantly improve performance.
>>
>> 1) Short circuit reads
>>
>> If Solr is colocated with an HDFS datanode, short circuit reads can
>> improve
>> read performance since it skips a network hop if the data is local to that
>> node. This requires HDFS native libraries to be added to Solr.
>>
>> 2) HDFS block cache in Solr
>>
>> Solr without HDFS uses the OS page cache to handle caching data for
>> queries. With HDFS, Solr has a special HDFS block cache which allows for
>> caching HDFS blocks. This significantly helps query performance. There are
>> a few configuration parameters that can help here.
>>
>> Kevin Risden
>>
>> On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net
>> >
>> wrote:
>>
>> Hi Joe,
>>>
>>> sorry, I have not seen that problem. I would normally not delete a
>>> replica
>>> if the shard is down but only if there is an active shard. Without an
>>> active leader the replica should not be able to recover. I also just had
>>> a
>>> case where all replicas of a shard stayed in down state and restarts
>>> didn't
>>> help. This was however also caused by lock files. Once I cleaned them up
>>> and restarted all Solr instances that had a 

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-22 Thread Kevin Risden
Joe,

I have a few questions about your Solr and HDFS setup that could help
improve the recovery performance.

* Is HDFS part of a distribution from Hortonworks, Cloudera, etc?
* Is Solr colocated with HDFS data nodes?
* What is the output of "ps aux | grep solr"? (specifically looking for the
Java arguments that are being set.)

Depending on how Solr on HDFS was setup, there are some potentially simple
settings that can help significantly improve performance.

1) Short circuit reads

If Solr is colocated with an HDFS datanode, short circuit reads can improve
read performance since it skips a network hop if the data is local to that
node. This requires HDFS native libraries to be added to Solr.
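
A sketch of what that usually involves (property values and paths are examples,
not from this cluster):

# hdfs-site.xml used by the Solr HDFS client should have short-circuit reads on:
#   dfs.client.read.shortcircuit = true
#   dfs.domain.socket.path       = /var/run/hdfs-sockets/dn
# then point Solr at that config and at the Hadoop native libraries, e.g. in solr.in.sh:
SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.confdir=/etc/hadoop/conf"
SOLR_OPTS="$SOLR_OPTS -Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native"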

2) HDFS block cache in Solr

Solr without HDFS uses the OS page cache to handle caching data for
queries. With HDFS, Solr has a special HDFS block cache which allows for
caching HDFS blocks. This significantly helps query performance. There are
a few configuration parameters that can help here.
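
As a rough sizing sketch (the numbers are only an example): each slab is
blocksperbank * 8KB blocks, so 16384 blocks is roughly 128MB per slab, and
slab.count * 128MB of total cache has to fit inside the JVM direct memory limit:

# directoryFactory settings in solrconfig.xml:
#   solr.hdfs.blockcache.slab.count=84, solr.hdfs.blockcache.blocksperbank=16384
# matching JVM setting, e.g. via SOLR_OPTS in solr.in.sh:
SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=12g"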

Kevin Risden

On Wed, Nov 22, 2017 at 4:20 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:

> Hi Joe,
>
> sorry, I have not seen that problem. I would normally not delete a replica
> if the shard is down but only if there is an active shard. Without an
> active leader the replica should not be able to recover. I also just had a
> case where all replicas of a shard stayed in down state and restarts didn't
> help. This was however also caused by lock files. Once I cleaned them up
> and restarted all Solr instances that had a replica they recovered.
>
> For the lock files I discovered that the index is not always in the
> "index" folder but can also be in an index. folder. There can be
> an "index.properties" file in the "data" directory in HDFS and this
> contains the correct index folder name.
>
> If you are really desperate you could also delete all but one replica so
> that the leader election is quite trivial. But this does of course increase
> the risk of finally losing the data quite a bit. So I would try looking
> into the code and figure out what the problem is here and maybe compare the
> state in HDFS and ZK with a shard that works.
>
> regards,
> Hendrik
>
>
> On 21.11.2017 23:57, Joe Obernberger wrote:
>
>> Hi Hendrick - the shards in question have three replicas.  I tried
>> restarting each one (one by one) - no luck.  No leader is found. I deleted
>> one of the replicas and added a new one, and the new one also shows as
>> 'down'.  I also tried the FORCELEADER call, but that had no effect.  I
>> checked the OVERSEERSTATUS, but there is nothing unusual there.  I don't
>> see anything useful in the logs except the error:
>>
>> org.apache.solr.common.SolrException: Error getting leader from zk for
>> shard shard21
>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.
>> java:996)
>> at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
>> at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
>> at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(
>> ZkContainer.java:181)
>> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.solr.common.SolrException: Could not get leader
>> props
>> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll
>> er.java:1043)
>> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkControll
>> er.java:1007)
>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.
>> java:963)
>> ... 7 more
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
>> at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:111)
>> at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:51)
>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl
>> ient.java:357)
>> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkCl
>> ient.java:354)
>> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
>> CmdExecutor.java:60)
>> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClie
>> nt.java:354)
>> at org.apache.solr.cloud.ZkController.g

Re: Solr7: Bad query throughput around commit time

2017-11-11 Thread Kevin Risden
> One machine runs with a 3TB drive, running 3 solr processes (each with
one core as described above).

How much total memory on the machine?

Kevin Risden

On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
wrote:

> Thanks for a quick and detailed response, Erick!
>
> Unfortunately i don't have a proof, but our servers with solr 4.5 are
> running really nicely with the above config. I had assumed that same  or
> similar settings will also perform well with Solr 7, but that assumption
> didn't hold. As, a lot has changed in 3 major releases.
> I have tweaked the cache values as you suggested but increasing or
> decreasing doesn't seem to do any noticeable improvement.
>
> At the moment, my one core has 800GB index, ~450 Million documents, 48 G
> Xmx. GC pauses haven't been an issue though.  One machine runs with a 3TB
> drive, running 3 solr processes (each with one core as described above).  I
> agree that it is a very atypical system so i should probably try different
> parameters with a fresh eye to find the solution.
>
>
> I tried with autocommits (commit with openSearcher=false every half minute;
> and softcommit every 5 minutes). That supported the hypothesis that the
> query throughput decreases after opening a new searcher and **not** after
> committing the index. Cache hit ratios are all in 80+% (even when I
> decreased the filterCache to 128, so I will keep it at this lower value).
> Document cache hit ratio is really bad; it drops to around 40% after
> newSearcher. But I guess that is expected, since it cannot be warmed up
> anyway.
>
>
> Thanks
> Nawab
>
>
>
> On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > What evidence to you have that the changes you've made to your configs
> > are useful? There's lots of things in here that are suspect:
> >
> >   1
> >
> > First, this is useless unless you are forceMerging/optimizing. Which
> > you shouldn't be doing under most circumstances. And you're going to
> > be rewriting a lot of data every time See:
> >
> > https://lucidworks.com/2017/10/13/segment-merging-deleted-
> > documents-optimize-may-bad/
> >
> > filterCache size of size="10240" is far in excess of what we usually
> > recommend. Each entry can be up to maxDoc/8 and you have 10K of them.
> > Why did you choose this? On the theory that "more is better?" If
> > you're using NOW then you may not be using the filterCache well, see:
> >
> > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> >
> > autowarmCount="1024"
> >
> > Every time you commit you're firing off 1024 queries which is going to
> > spike the CPU a lot. Again, this is super-excessive. I usually start
> > with 16 or so.
> >
> > Why are you committing from a cron job? Why not just set your
> > autocommit settings and forget about it? That's what they're for.
> >
> > Your queryResultCache is likewise kind of large, but it takes up much
> > less space than the filterCache per entry so it's probably OK. I'd
> > still shrink it and set the autowarm to 16 or so to start, unless
> > you're seeing a pretty high hit ratio, which is pretty unusual but
> > does happen.
> >
> > 48G of memory is just asking for long GC pauses. How many docs do you
> > have in each core anyway? If you're really using this much heap, then
> > it'd be good to see what you can do to shrink it. Enabling docValues
> > for all fields you facet, sort or group on will help that a lot if you
> > haven't already.
> >
> > How much memory on your entire machine? And how much is used by _all_
> > the JVMs you running on a particular machine? MMapDirectory needs as
> > much OS memory space as it can get, see:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Lately we've seen some structures that consume memory until a commit
> > happens (either soft or hard). I'd shrink my autocommit down to 60
> > seconds or even less (openSearcher=false).
> >
> > In short, I'd go back mostly to the default settings and build _up_ as
> > you can demonstrate improvements. You've changed enough things here
> > that untangling which one is the culprit will be hard. You want the
> > JVM to have as little memory as possible, unfortunately that's
> > something you figure out by experimentation.
> >
> > Best,
> > Erick
> >
> > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > I am committing every

Re: Parallel SQL: GROUP BY throws exception

2017-10-17 Thread Kevin Risden
Calcite might support this in 0.14. I know group by support was improved
lately. It might be as simple as upgrading the dependency? A test case
showing the NPE would be helpful. We are using MySQL dialect under the hood
with Calcite.
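
For reference, the corrected query (with the GROUP BY field added to the select
list, as Joel suggests below) can be sent straight to the /sql handler like
this (a sketch; host and aggregation mode are placeholders):

curl --data-urlencode "stmt=SELECT people_person_id, sum(amount) AS total FROM donation GROUP BY people_person_id" \
  "http://localhost:8983/solr/donation/sql?aggregationMode=facet"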

Kevin Risden

On Tue, Oct 17, 2017 at 8:09 AM, Joel Bernstein <joels...@gmail.com> wrote:

> This would be a good jira to create at (
> https://issues.apache.org/jira/projects/SOLR)
>
> Interesting that the query works in MySQL. I'm assuming MySQL automatically
> adds the group by field to the field list. We can look at doing this as
> well.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Oct 17, 2017 at 6:48 AM, Dmitry Gerasimov <
> dgerasi...@kommunion.com>
> wrote:
>
> > Joel,
> >
> > Thanks for the tip. That worked. I was confused since this query works
> > just fine in MySQL.
> > It would of course be very helpful if SOLR was responding with a
> > proper error. What’s the process here? Where do I post this request?
> >
> > Dmitry
> >
> >
> >
> >
> > > -- Forwarded message --
> > > From: Joel Bernstein <joels...@gmail.com>
> > > To: solr-user@lucene.apache.org
> > > Cc:
> > > Bcc:
> > > Date: Mon, 16 Oct 2017 11:16:28 -0400
> > > Subject: Re: Parallel SQL: GROUP BY throws exception
> > > Ok, I just the read the query again.
> > >
> > > Try the failing query like this:
> > >
> > > SELECT people_person_id, sum(amount) as total FROM donation GROUP BY
> > > people_person_id
> > >
> > > That is the correct syntax for the SQL group by aggregation.
> > >
> > > It looks like you found a null pointer though where a proper error
> > message
> > > is needed.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> > >
> > > > Also what version are you using?
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Oct 16, 2017 at 9:49 AM, Joel Bernstein <joels...@gmail.com>
> > > > wrote:
> > > >
> > > >> Can you provide the stack trace?
> > > >>
> > > >> Are you in SolrCloud mode?
> > > >>
> > > >>
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Mon, Oct 16, 2017 at 9:20 AM, Dmitry Gerasimov <
> > > >> dgerasi...@kommunion.com> wrote:
> > > >>
> > > >>> Hi all!
> > > >>>
> > > >>> This query works as expected:
> > > >>> SELECT sum(amount) as total FROM donation
> > > >>>
> > > >>> Adding GROUP BY:
> > > >>> SELECT sum(amount) as total FROM donation GROUP BY people_person_id
> > > >>>
> > > >>> Now I get response:
> > > >>> {
> > > >>>   "result-set":{
> > > >>> "docs":[{
> > > >>> "EXCEPTION":"Failed to execute sqlQuery 'SELECT sum(amount)
> > as
> > > >>> total  FROM donation GROUP BY people_person_id' against JDBC
> > connection
> > > >>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
> sum(amount)
> > as
> > > >>> total  FROM donation GROUP BY people_person_id\": null",
> > > >>> "EOF":true,
> > > >>> "RESPONSE_TIME":279}]}
> > > >>> }
> > > >>>
> > > >>> Any ideas on what is causing this? Or how to debug?
> > > >>>
> > > >>>
> > > >>> Here is the collection structure:
> > > >>>
> > > >>> <field name="..." ... required="true" multiValued="false"/>
> > > >>> <field name="..." ... stored="true" required="true" multiValued="false" docValues="true"/>
> > > >>> <field name="..." ... required="true" multiValued="false"/>
> > > >>> <field name="..." ... multiValued="false" docValues="true"/>
> > > >>>
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>


Re: Solr uses lots of shared memory!

2017-09-02 Thread Kevin Risden
I haven't looked at reproducing this locally, but since it seems like
there haven't been any new ideas, I decided to share this in case it helps:
helps:

I noticed in Travis CI [1] they are adding the environment variable
MALLOC_ARENA_MAX=2 and so I googled what that configuration did. To my
surprise, I came across a stackoverflow post [2] about how glibc could
actually be the cause and report memory differently. I then found a
Hadoop issue HADOOP-7154 [3] about setting this as well to reduce
virtual memory usage. I found some more cases where this has helped as
well [4], [5], and [6]

[1] https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added
[2] 
https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit
[3] https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792
[4] https://github.com/cloudfoundry/java-buildpack/issues/320
[5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
[6] 
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
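
If someone wants to try it with Solr, a minimal sketch is to export the
variable in the environment that starts Solr (for example from solr.in.sh) and
restart the node:

MALLOC_ARENA_MAX=2
export MALLOC_ARENA_MAX
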
Kevin Risden


On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> Hello Bernd,
>
> According to the man page, I should get a list of stuff in shared memory if I
> invoke it with just a PID, which shows a list of libraries that together
> account for about 25 MB of shared memory usage. According to ps and top, the
> JVM uses 2800 MB shared memory (not virtual), which leaves 2775 MB unaccounted
> for. Any ideas? Can anyone else reproduce it on a freshly restarted node?
>
> Thanks,
> Markus
>
>
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 18901 markus20   0 14,778g 4,965g 2,987g S 891,1 31,7  20:21.63 java
>
> 0x55b9a17f1000  6K  /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
> 0x7fdf1d314000  182K
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
> 0x7fdf1e548000  38K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
> 0x7fdf1e78e000  94K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
> 0x7fdf1e9a6000  75K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
> 0x7fdf5cd6e000  34K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
> 0x7fdf5cf77000  46K /lib/x86_64-linux-gnu/libnss_files-2.24.so
> 0x7fdf5d189000  46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so
> 0x7fdf5d395000  90K /lib/x86_64-linux-gnu/libnsl-2.24.so
> 0x7fdf5d5ae000  34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so
> 0x7fdf5d7b7000  187K
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
> 0x7fdf5d9e6000  70K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
> 0x7fdf5dbf8000  30K /lib/x86_64-linux-gnu/librt-2.24.so
> 0x7fdf5de0  90K /lib/x86_64-linux-gnu/libgcc_s.so.1
> 0x7fdf5e017000  1063K   /lib/x86_64-linux-gnu/libm-2.24.so
> 0x7fdf5e32  1553K   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
> 0x7fdf5e6a8000  15936K  
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> 0x7fdf5f5ed000  139K/lib/x86_64-linux-gnu/libpthread-2.24.so
> 0x7fdf5f80b000  14K /lib/x86_64-linux-gnu/libdl-2.24.so
> 0x7fdf5fa0f000  110K/lib/x86_64-linux-gnu/libz.so.1.2.11
> 0x7fdf5fc2b000  1813K   /lib/x86_64-linux-gnu/libc-2.24.so
> 0x7fdf5fff2000  58K 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
> 0x7fdf60201000  158K/lib/x86_64-linux-gnu/ld-2.24.so
>
> -Original message-
>> From:Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>> Sent: Thursday 24th August 2017 15:39
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr uses lots of shared memory!
>>
>> Just an idea, how about taking a dump with jmap and using
>> MemoryAnalyzerTool to see what is going on?
>>
>> Regards
>> Bernd
>>
>>
>> Am 24.08.2017 um 11:49 schrieb Markus Jelsma:
>> > Hello Shalin,
>> >
>> > Yes, the main search index has DocValues on just a few fields, they are 
>> > used for facetting and function queries, we started using DocValues when 
>> > 6.0 was released. Most fields are content fields for many languages. I 
>> > don't think it is going to be DocValues because the max shared memory 
>> > consumption is reduced my searching on fields fewer languages, and by 
>> > disabling highlighting, both not using DocValues.
>> >
>> > But it tried the option regardless, and because i didn't kno

Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Kevin Risden
Well didn't take as long as I thought:
https://issues.apache.org/jira/browse/CALCITE-1306

Once Calcite 1.13 is released we should upgrade and get support for this
again.
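
Until then, a workaround sketch is to repeat the aggregate in the HAVING clause
instead of using the alias (host and collection path are placeholders):

curl --data-urlencode "stmt=SELECT movie_id, COUNT(*) AS num_ratings, avg(rating) AS aggAvg FROM ratings GROUP BY movie_id HAVING COUNT(*) > 100 ORDER BY aggAvg ASC LIMIT 10" \
  "http://localhost:8983/solr/ratings/sql?aggregationMode=facet"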

Kevin Risden

On Tue, May 16, 2017 at 7:23 PM, Kevin Risden <compuwizard...@gmail.com>
wrote:

> Yea this came up on the calcite mailing list. Not sure if aliases in the
> having clause were going to be added. I'll have to see if I can find that
> discussion or JIRA.
>
> Kevin Risden
>
> On May 16, 2017 18:54, "Joel Bernstein" <joels...@gmail.com> wrote:
>
>> Yeah, Calcite doesn't support field aliases in the having clause. The
>> query
>> should work if you use count(*). We could consider this a regression, but
>> I
>> think this will be a won't fix.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter <thelabd...@gmail.com>
>> wrote:
>>
>> > This SQL used to work pre-calcite:
>> >
>> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
>> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
>> > LIMIT 10
>> >
>> > Now I get:
>> > Caused by: java.io.IOException: -->
>> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
>> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
>> > connection 'jdbc:calcitesolr:'.
>> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
>> > 103 to line 1, column 113: Column 'num_ratings' not found in any table
>> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
>> > SolrStream.java:235)
>> > at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTupl
>> e(
>> > TupleStreamIterator.java:82)
>> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
>> > TupleStreamIterator.java:47)
>> > ... 31 more
>> >
>>
>


Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Kevin Risden
Yea this came up on the calcite mailing list. Not sure if aliases in the
having clause were going to be added. I'll have to see if I can find that
discussion or JIRA.

Kevin Risden

On May 16, 2017 18:54, "Joel Bernstein" <joels...@gmail.com> wrote:

> Yeah, Calcite doesn't support field aliases in the having clause. The query
> should work if you use count(*). We could consider this a regression, but I
> think this will be a won't fix.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter <thelabd...@gmail.com>
> wrote:
>
> > This SQL used to work pre-calcite:
> >
> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
> > LIMIT 10
> >
> > Now I get:
> > Caused by: java.io.IOException: -->
> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
> > connection 'jdbc:calcitesolr:'.
> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
> > 103 to line 1, column 113: Column 'num_ratings' not found in any table
> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > SolrStream.java:235)
> > at com.lucidworks.spark.query.TupleStreamIterator.
> fetchNextTuple(
> > TupleStreamIterator.java:82)
> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
> > TupleStreamIterator.java:47)
> > ... 31 more
> >
>


Re: Solr with HDFS on AWS S3 - Server restart fails to load the core

2017-04-07 Thread Kevin Risden
>
> Thank you for the response. Setting “loadOnStartup=true“ results in showing
> the connection timeout on clicking 'Core Admin' on Solr UI. Also, reload
> does not work as the core is not loaded at all.


Can you clarify what you mean by this? Does the core get loaded after you
restart Solr?

The initial description was the core wasn't loaded after Solr was
restarted. What you are describing now is different I think.

Kevin Risden

On Fri, Apr 7, 2017 at 6:31 PM, Amarnath palavalli <pamarn...@gmail.com>
wrote:

> Hi Trey,
>
> Thank you for the response. Setting “loadOnStartup=true“ results in showing
> the connection timeout on clicking 'Core Admin' on Solr UI. Also, reload
> does not work as the core is not loaded at all.
>
> I suspect something to do with HTTP connection idle time; probably the
> connection is closed before the data is pulled from S3. I see that the '
> maxUpdateConnectionIdleTime' is 40 seconds by default. However, don't know
> how to change it.
>
> Thanks,
> Amar
>
>
>
> On Fri, Apr 7, 2017 at 12:47 PM, Cahill, Trey <trey.cah...@siemens.com>
> wrote:
>
> > Hi Amarnath,
> >
> > It looks like you’ve set the core to not load on startup via the
> > “loadOnStartup=false“ property.   Your response also shows that the core
> is
> > not loaded, “false“.
> >
> > I’m not really sure how to load cores after a restart, but possibly using
> > the Core Admin Reload would do it (https://cwiki.apache.org/
> > confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD).
> >
> > Best of luck,
> >
> > Trey
> >
> > From: Amarnath palavalli [mailto:pamarn...@gmail.com]
> > Sent: Friday, April 07, 2017 3:20 PM
> > To: solr-user@lucene.apache.org
> > Subject: Solr with HDFS on AWS S3 - Server restart fails to load the core
> >
> > Hello,
> >
> > I configured Solr to use HDFS, which in turn is configured to use S3N. I
> used
> > the information from this issue to configure:
> > https://issues.apache.org/jira/browse/SOLR-9952
> >
> > Here is the command I have used to start the Solr with HDFS:
> > bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory
> > -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=s3n://amar-hdfs/solr
> > -Dsolr.hdfs.confdir=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
> > -DXX:MaxDirectMemorySize=2g
> >
> > I am able to create a core, with the following properties:
> > #Written by CorePropertiesLocator
> > #Thu Apr 06 23:08:57 UTC 2017
> > name=amar-s3
> > loadOnStartup=false
> > transient=true
> > configSet=base-config
> >
> > I am able to ingest messages into Solr and also query the content.
> > Everything seems to be fine until this stage and I can see the data dir
> on
> > S3.
> >
> > However, the problem is when I restart the Solr server, that is when I
> see
> > the core not loaded even when accessed/queried against it. Here is the
> > admin API to get all cores returns:
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">617</int>
> >   </lst>
> >   <lst name="status">
> >     ...
> >     <lst name="amar-s3">
> >       <str name="name">amar-s3</str>
> >       <str name="instanceDir">/Users/apalavalli/solr/solr-deployment/server/solr/amar-s3</str>
> >       <str name="dataDir">data/</str>
> >       <str name="config">solrconfig.xml</str>
> >       <str name="schema">schema.xml</str>
> >       <str name="isLoaded">false</str>
> >     </lst>
> >   </lst>
> > </response>
> >
> > I don't see any issues reported in the log as well, but see this error
> > from the UI:
> >
> > [Inline image 1]
> >
> >
> > Not sure about the problem. This is happening when I ingest more than 40K
> > messages in core before restarting Solr server.
> >
> > I am using Hadoop 2.7.3 with S3N FS. Please help me on resolving this
> > issue.
> >
> > Thanks,
> > Regards,
> > Amar
> >
> >
> >
> >
> >
>


Re: Searchable archive of this mailing list

2017-03-31 Thread Kevin Risden
Google usually does a pretty good job of indexing this mailing list.

The other place I'll usually go is here:
http://search-lucene.com/?project=Solr

Kevin Risden

On Fri, Mar 31, 2017 at 4:18 PM, OTH <omer.t@gmail.com> wrote:

> Hi all,
>
> Is there a searchable archive of this mailing list?
>
> I'm asking just so I don't have to post a question in the future which may
> have been answered before already.
>
> Thanks
>


Re: Add fieldType from Solr API

2017-02-26 Thread Kevin Risden
As Alex said there is no Admin UI support. The API is called the Schema API:

https://cwiki.apache.org/confluence/display/solr/Schema+API

That allows you to modify the schema programmatically. You will have to
reload the collection either way.
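
A minimal sketch (the field type name, analyzer, and collection are made-up
examples; use the Core Admin RELOAD instead if you are not running SolrCloud):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field-type":{"name":"my_text","class":"solr.TextField","analyzer":{"tokenizer":{"class":"solr.StandardTokenizerFactory"}}}}' \
  "http://localhost:8983/solr/mycollection/schema"

curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"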

Kevin Risden

On Sun, Feb 26, 2017 at 1:33 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> You can hand edit it, just make sure to reload the collection after.
>
> Otherwise, I believe, there is API.
>
> Not the Admin UI yet, unfortunately.
>
> Regards,
> Alex
>
> On 26 Feb 2017 1:50 PM, "OTH" <omer.t@gmail.com> wrote:
>
> Hello,
>
> I am new to Solr, and am using Solr v. 6.4.1.
>
> I need to add a new "fieldType" to my schema.  My version of Solr is using
> the "managed-schema" XML file, which I gather one is not supposed to modify
> directly.  Is it possible to add a new fieldType using the Solr Admin via
> the browser?  The "schema" page doesn't seem to provide this option, at
> least from what I can tell.
>
> Thanks
>


JSON Facet API - Range Query - Missing field parameter NPE

2017-02-24 Thread Kevin Risden
One of my colleagues ran into this testing the JSON Facet API. A malformed
JSON Facet API range query seems to hit an NPE and then devolves into saying
no live servers to handle the request. It looks like the
FacetRangeProcessor should check the inputs before trying to getField. Does
this seem reasonable?

The problematic query:

json.facet={price:{type:range,start:0,end:600,gap:50}}

The fixed query:

json.facet={prices:{field:price,type:range,start:0,end:600,gap:50}}
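
For reference, the fixed request as a full curl call (host is a placeholder):

curl "http://localhost:8983/solr/gettingstarted/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "rows=0" \
  --data-urlencode 'json.facet={prices:{field:price,type:range,start:0,end:600,gap:50}}'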

The stack trace:

INFO  - 2017-02-24 20:54:52.217; [c:gettingstarted s:shard1 r:core_node2
x:gettingstarted_shard1_replica1] org.apache.solr.core.SolrCore;
[gettingstarted_shard1_replica1]  webapp=/solr path=/select
params={df=_text_=false&_facet_={}=id=
score=1048580=0=true=htt
p://localhost:8983/solr/gettingstarted_shard1_replica1/|
http://localhost:7574/solr/gettingstarted_shard1_replica2/=10;
version=2=*:*={price:{type:range,start:0,end:600,gap:50}}=
1487969692214=true=javabin} hits=2328 status=500 QTime=1
ERROR - 2017-02-24 20:54:52.218; [c:gettingstarted s:shard1 r:core_node2
x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException;
null:java.lang.NullPointerException
at org.apache.solr.schema.IndexSchema$DynamicReplacement$
DynamicPattern$NameEndsWith.matches(IndexSchema.java:1043)
at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(
IndexSchema.java:1057)
at org.apache.solr.schema.IndexSchema.getFieldOrNull(IndexSchema.java:1213)
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1230)
at org.apache.solr.search.facet.FacetRangeProcessor.process(
FacetRange.java:96)
at org.apache.solr.search.facet.FacetProcessor.processSubs(
FacetProcessor.java:439)
at org.apache.solr.search.facet.FacetProcessor.fillBucket(
FacetProcessor.java:396)
at org.apache.solr.search.facet.FacetQueryProcessor.process(
FacetQuery.java:60)
at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:96)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
SearchHandler.java:295)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)

Kevin Risden


Re: SSL using signed client certificate not working

2017-02-15 Thread Kevin Risden
It sounds like Edge, Firefox, and Chrome aren't set up on your computer to
do client authentication. You can set need client authentication to false
and use want client authentication in solr.in.sh. This will allow browsers
that don't present a client certificate to work. Otherwise you need to
configure your browsers.
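
In solr.in.sh that would look roughly like this (a sketch of just the two
settings, not a complete config):

SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=true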

Client authentication is an extra part of SSL and not usually required.

Kevin Risden

On Feb 15, 2017 4:43 AM, "Espen Rise Halstensen" <e...@dips.no> wrote:

>
> Hi,
>
> I have some problems with client certificates. By the look of it, it works
> with curl, and Safari prompts for and accepts my certificate. It does not
> work with Edge, Firefox or Chrome. The certificates are requested from our CA.
>
> When requesting https://s02/solr in the browser, it doesn't
> prompt for certificate and I get the following error message in Chrome:
> >This site can't provide a secure connection
> >s02 didn't accept your login certificate, or one may not have been
> provided.
> >Try contacting the system admin.
>
> When debugging with wireshark I can see the s01t9 certificate in the
> "certificate request"-part of the handshake, but the browser answers
> without certificate.
>
>
> Setup as follows:
>
> solr.in.sh:
> SOLR_SSL_KEY_STORE=etc/keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=secret
> SOLR_SSL_TRUST_STORE=etc/truststore.jks
> SOLR_SSL_TRUST_STORE_PASSWORD=secret
> SOLR_SSL_NEED_CLIENT_AUTH=true
> SOLR_SSL_WANT_CLIENT_AUTH=false
>
> Content of truststore.jks:
> [solruser@s02 etc]# keytool -list -keystore 
> /opt/solr-6.4.0/server/etc/truststore.jks
> -storepass secret
>
> Keystore type: JKS
> Keystore provider: SUN
>
> Your keystore contains 1 entry
>
> s01t9, 15.feb.2017, trustedCertEntry,
> Certificate fingerprint (SHA1): CF:BD:02:71:64:F0:BA:65:71:10:
> A1:23:42:34:E0:3C:37:75:E1:BF
>
>
>
> Curl(returns html of admin page with -L option):
>
> curl -v -E  s01t9.pem:secret --cacert  rootca.pem 'https://vs02/solr'
> * Hostname was NOT found in DNS cache
> *   Trying 10.0.121.132...
> * Connected to s02 (10.0.121.132) port 443 (#0)
> * successfully set certificate verify locations:
> *   CAfile: rootca.pem
>   CApath: /etc/ssl/certs
> * SSLv3, TLS handshake, Client hello (1):
> * SSLv3, TLS handshake, Server hello (2):
> * SSLv3, TLS handshake, CERT (11):
> * SSLv3, TLS handshake, Request CERT (13):
> * SSLv3, TLS handshake, Server finished (14):
> * SSLv3, TLS handshake, CERT (11):
> * SSLv3, TLS handshake, Client key exchange (16):
> * SSLv3, TLS handshake, CERT verify (15):
> * SSLv3, TLS change cipher, Client hello (1):
> * SSLv3, TLS handshake, Finished (20):
> * SSLv3, TLS change cipher, Client hello (1):
> * SSLv3, TLS handshake, Finished (20):
> * SSL connection using AES256-SHA256
> * Server certificate:
> *subject: CN=s01t9
> *start date: 2017-01-09 11:31:49 GMT
> *expire date: 2022-01-08 11:31:49 GMT
> *subjectAltName: s02 matched
> *issuer: DC=local; DC=com; CN=Root CA
> *SSL certificate verify ok.
> > GET /solr HTTP/1.1
> > User-Agent: curl/7.35.0
> > Host: s02
> > Accept: */*
> >
> < HTTP/1.1 302 Found
> < Location: https://s02 /solr/
> < Content-Length: 0
> <
> * Connection #0 to host s02 left intact
>
> Thanks,
> Espen
>


Re: Re: bin/post and self-signed SSL

2017-02-06 Thread Kevin Risden
I expect that the commands work the same or very close from 5.5.x through
6.4.x. There has been some cleanup of the bin/solr and bin/post
commands, but not many security changes. If you find differently then please
let us know.

Kevin Risden

On Feb 5, 2017 21:02, "alias" <524839...@qq.com> wrote:

> You mean this can only be used with version 5.5.x? Is it invalid for other
> versions?
>
>
>
>
> -- Original message --
> From: "Kevin Risden";<compuwizard...@gmail.com>;
> Sent: Monday, February 6, 2017, 9:44 AM
> To: "solr-user"<solr-user@lucene.apache.org>;
>
> Subject: Re: bin/post and self-signed SSL
>
>
>
> Originally formatted as MarkDown. This was tested against Solr 5.5.x
> packaged as Lucidworks HDP Search; it should behave the same as stock Solr 5.5.x.
>
> # Using Solr
> *
> https://cwiki.apache.org/confluence/display/solr/Solr+
> Start+Script+Reference
> * https://cwiki.apache.org/confluence/display/solr/Running+Solr
> * https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> ## Create collection (w/o Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/solr create -c test
> ```
>
> ## Upload configuration directory (w/ SSL and Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh
> -zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir
> /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf
> ```
>
> ## Create Collection (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config
> "
> ```
>
> ## Delete collection (w/o Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/solr delete -c test
> ```
>
> ## Delete Collection (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection
> "
> ```
>
> ## Adding some test docs (w/o SSL)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/post -c test
> /opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml
> ```
>
> ## Adding documents (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/newCollection/update?commit=true" -H
> "Content-Type: application/json" --data-binary
> @/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json
> ```
>
> ## List Collections (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=LIST"
> ```
>
> Kevin Risden
>
> On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden <compuwizard...@gmail.com>
> wrote:
>
> > Last time I looked at this, there was no way to pass any Java properties
> > to the bin/post command. This made it impossible to even set the SSL
> > properties manually. I checked master just now and still there is no
> place
> > to enter Java properties that would make it to the Java command.
> >
> > I came up with a chart of commands previously that worked with standard
> > (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard
> > solr setup worked for the bin/solr and bin/post commands. Errors popped
> up
> > that I couldn't work around. I've been meaning to get back to it just
> > haven't had a chance.
> >
> > I'll try to share that info when I get back to my laptop.
> >
> > Kevin Risden
> >
> > On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote:
> >
> >> Hi,
> >>
> >> I’m trying to post a document to Solr using bin/post after enabling SSL
> >> with self signed certificate. Result is:
> >>
> >> $ post -url https://localhost:8983/solr/sslColl *.html
> >> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
> >> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
> >> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
> >> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
> >> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
> >> SimplePostTool version 5.0.0
> >> Posting files to [base] url https://localhost:8983/solr/sslColl...
> >> Entering auto mode. File endings considered are
> >> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,
> >> ods,ott,otp,ots,rtf,htm,html,txt,log
> >> POSTing file lab-index.html (text/html) to [base]/extract
> >> SimplePostTool: FATAL: Connection error (is Solr running at
> >> https://localhost:8983/solr/sslColl ?): javax.net.ssl.
> SSLHandshakeException:
> >> sun.security.validator.ValidatorException: PKIX path building failed:
> >> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> >> find valid certification path to requested target
> >>
> >>
> >> Does anyone know a workaround for letting bin/post accept self-signed
> cert?
> >> Have not tested it against a CA signed Solr...
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>


Re: bin/post and self-signed SSL

2017-02-05 Thread Kevin Risden
Originally formatted as Markdown. This was tested against Solr 5.5.x
packaged as Lucidworks HDP Search; it should behave the same with stock Apache Solr 5.5.x.

# Using Solr
*
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
* https://cwiki.apache.org/confluence/display/solr/Running+Solr
* https://cwiki.apache.org/confluence/display/solr/Collections+API

## Create collection (w/o Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/bin/solr create -c test
```

## Upload configuration directory (w/ SSL and Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh
-zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir
/opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf
```

## Create Collection (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : \
"https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config"
```

## Delete collection (w/o Kerberos)
```bash
/opt/lucidworks-hdpsearch/solr/bin/solr delete -c test
```

## Delete Collection (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : \
"https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection"
```

## Adding some test docs (w/o SSL)
```bash
/opt/lucidworks-hdpsearch/solr/bin/post -c test
/opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml
```

## Adding documents (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : \
"https://SOLR_HOST:8983/solr/newCollection/update?commit=true" -H
"Content-Type: application/json" --data-binary
@/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json
```

## List Collections (w/ SSL and Kerberos)
```bash
curl -k --negotiate -u : \
"https://SOLR_HOST:8983/solr/admin/collections?action=LIST"
```

Kevin Risden

On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden <compuwizard...@gmail.com>
wrote:

> Last time I looked at this, there was no way to pass any Java properties
> to the bin/post command. This made it impossible to even set the SSL
> properties manually. I checked master just now and still there is no place
> to enter Java properties that would make it to the Java command.
>
> I came up with a chart of commands previously that worked with standard
> (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard
> solr setup worked for the bin/solr and bin/post commands. Errors popped up
> that I couldn't work around. I've been meaning to get back to it just
> haven't had a chance.
>
> I'll try to share that info when I get back to my laptop.
>
> Kevin Risden
>
> On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote:
>
>> Hi,
>>
>> I’m trying to post a document to Solr using bin/post after enabling SSL
>> with self signed certificate. Result is:
>>
>> $ post -url https://localhost:8983/solr/sslColl *.html
>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
>> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
>> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
>> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
>> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
>> SimplePostTool version 5.0.0
>> Posting files to [base] url https://localhost:8983/solr/sslColl...
>> Entering auto mode. File endings considered are
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,
>> ods,ott,otp,ots,rtf,htm,html,txt,log
>> POSTing file lab-index.html (text/html) to [base]/extract
>> SimplePostTool: FATAL: Connection error (is Solr running at
>> https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException:
>> sun.security.validator.ValidatorException: PKIX path building failed:
>> sun.security.provider.certpath.SunCertPathBuilderException: unable to
>> find valid certification path to requested target
>>
>>
> >> Does anyone know a workaround for letting bin/post accept self-signed cert?
>> Have not tested it against a CA signed Solr...
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>


Re: bin/post and self-signed SSL

2017-02-05 Thread Kevin Risden
Last time I looked at this, there was no way to pass any Java properties to
the bin/post command. This made it impossible to even set the SSL
properties manually. I checked master just now and still there is no place
to enter Java properties that would make it to the Java command.

I came up with a chart of commands previously that worked with standard (no
SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard solr
setup worked for the bin/solr and bin/post commands. Errors popped up that
I couldn't work around. I've been meaning to get back to it just haven't
had a chance.

I'll try to share that info when I get back to my laptop.

Kevin Risden
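
A minimal sketch of one workaround, assuming the self-signed certificate has already been imported into a JKS truststore: skip the bin/post wrapper and invoke SimplePostTool directly so the standard javax.net.ssl system properties can be passed. The truststore path, password, and jar version below are placeholders.

```bash
# Sketch of a workaround: call SimplePostTool directly instead of bin/post,
# adding the standard javax.net.ssl properties so the JVM trusts the
# self-signed certificate. Truststore path, password, and Solr version are
# placeholders.
java -Djavax.net.ssl.trustStore=/path/to/solr-ssl.keystore.jks \
     -Djavax.net.ssl.trustStorePassword=secret \
     -Dauto=yes -Durl=https://localhost:8983/solr/sslColl -Ddata=files \
     -classpath /opt/solr/dist/solr-core-6.4.0.jar \
     org.apache.solr.util.SimplePostTool *.html
```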

On Feb 5, 2017 12:31, "Jan Høydahl" <jan@cominvent.com> wrote:

> Hi,
>
> I’m trying to post a document to Solr using bin/post after enabling SSL
> with self signed certificate. Result is:
>
> $ post -url https://localhost:8983/solr/sslColl *.html
> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
> SimplePostTool version 5.0.0
> Posting files to [base] url https://localhost:8983/solr/sslColl...
> Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,
> docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file lab-index.html (text/html) to [base]/extract
> SimplePostTool: FATAL: Connection error (is Solr running at
> https://localhost:8983/solr/sslColl ?): javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
>
>
> Does anyone know a workaround for letting bin/post accept self-signed cert?
> Have not tested it against a CA signed Solr...
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>


Re: How long for autoAddReplica?

2017-02-02 Thread Kevin Risden
>
> so migrating by replacing nodes is going to be a bother.


Not sure what you mean by migrating and replacing nodes, but these two new
actions on the Collections API as of Solr 6.2 may be of use:

   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-REPLACENODE:MoveAllReplicasinaNodetoAnother
   -
   
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETENODE:DeleteReplicasinaNode



Kevin Risden
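
A hedged sketch of what those two calls can look like over HTTP. The host and node names are placeholders, and the exact parameter names (source/target, node) are assumptions that should be verified against the Collections API page linked above for the Solr version in use.

```bash
# Hedged sketch only: parameter names changed between releases, so verify
# them against the Collections API documentation for your version.
# Move all replicas from one node to another:
curl "http://localhost:8983/solr/admin/collections?action=REPLACENODE&source=oldhost:8983_solr&target=newhost:8983_solr"

# Remove the replica entries that still point at a node you have shut down:
curl "http://localhost:8983/solr/admin/collections?action=DELETENODE&node=oldhost:8983_solr"
```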

On Thu, Feb 2, 2017 at 11:46 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: I don’t see a way to add replicas through the UI, so migrating by
> replacing nodes is going to be a bother
>
> There's a lot of improvements in the admin UI for SolrCloud that I'd
> love to see. Drag/drop replicas would be really cool for instance.
>
> At present though using
> ADDREPLICA/wait-for-new-replica-to-be-active/DELETEREPLICA through the
> collections API is what's available.
>
> Best,
> Erick
>
> On Thu, Feb 2, 2017 at 8:37 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
> > Oh, missed that limitation.
> >
> > Seems like something that would be very handy in all installations. I
> don’t see a way to add replicas through the UI, so migrating by replacing
> nodes is going to be a bother.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Feb 2, 2017, at 12:25 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
> wrote:
> >>
> >> Hi,
> >>
> >> are you using HDFS? According to the documentation the feature should
> be only available if you are using HDFS. For me it did however also fail on
> that. See the thread "Solr on HDFS: AutoAddReplica does not add a replica"
> from about two weeks ago.
> >>
> >> regards,
> >> Hendrik
> >>
> >> On 02.02.2017 07:21, Walter Underwood wrote:
> >>> I added a new node and shut down a node with a shard replica on it. It
> has been an hour and I don’t see any activity toward making a new replica.
> >>>
> >>> The new node and the one I shut down are both 6.4. The rest of the
> 16-node cluster is 6.2.1.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>>
> >>
> >
>


Re: 6.4 in a 6.2.1 cluster?

2017-01-31 Thread Kevin Risden
Just my two cents: I wouldn't trust that it completely works to be honest.
It works for the very small test case that was put together (select q=*:*).
I would love to add more tests to it. If there are any ideas of things that
you think should be tested, it would be great to comment on the JIRA
(ideally everything, but prioritizing some examples would be nice).

Kevin Risden

On Tue, Jan 31, 2017 at 11:19 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I’m sure people need to do this, so I’ll share that it worked for me.
>
> I just noticed that there is a new integration test being written to
> verify that this works. Great!
>
> https://issues.apache.org/jira/browse/SOLR-8581 <
> https://issues.apache.org/jira/browse/SOLR-8581>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 25, 2017, at 11:18 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
> >
> > Has anybody done this? Not for long term use of course, but does it work
> well enough
> > for a rolling upgrade?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org <mailto:wun...@wunderwood.org>
> > http://observer.wunderwood.org/  (my blog)
> >
> >
>
>


Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings

2016-12-09 Thread Kevin Risden
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.java#L407

Based on that it looks like this would definitely trigger additional
commits. Specifically with openSearcher being true.

Not sure the best way around this.

Kevin Risden

On Fri, Dec 9, 2016 at 5:15 PM, Brent <brent.pear...@gmail.com> wrote:

> I'm using Solr Cloud 6.1.0, and my client application is using SolrJ 6.1.0.
>
> Using this Solr config, I get none of the dreaded "PERFORMANCE WARNING:
> Overlapping onDeckSearchers=2" log messages:
> https://dl.dropboxusercontent.com/u/49733981/solrconfig-no_warnings.xml
>
> However, I start getting them frequently after I add an expiration update
> processor to the update request processor chain, as seen in this config (at
> the bottom):
> https://dl.dropboxusercontent.com/u/49733981/solrconfig-warnings.xml
>
> Do I have something configured wrong in the way I've tried to add the
> function of expiring documents? My client application sets the "expire_at"
> field with the date to remove the document being added, so I don't need
> anything on the Solr Cloud side to calculate the expiration date using a
> TTL. I've confirmed that the documents are getting removed as expected
> after
> the TTL duration.
>
> Is it possible that the expiration processor is triggering additional
> commits? Seems like the warning is usually the result of commits happening
> too frequently. If the commit spacing is fine without the expiration
> processor, but not okay when I add it, it seems like maybe each update is
> now triggering a (soft?) commit. Although, that'd actually be crazy and I'm
> sure I'd see a lot more errors if that were the case... is it triggering a
> commit every 30 seconds, because that's what I have the
> autoDeletePeriodSeconds set to? Maybe if I try to offset that a bit from
> the
> 10 second auto soft commit I'm using? Seems like it'd be better (if that is
> the case) if the processor simply didn't have to do a commit when it
> expires
> documents, and instead let the auto commit settings handle that.
>
> Do I still need the line:
>  <requestHandler ... name="/update">
> when I have the
>  <updateRequestProcessorChain ... default="true">
> element?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Adding-
> DocExpirationUpdateProcessorFactory-causes-Overlapping-
> onDeckSearchers-warnings-tp4309155.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Highlighting, offsets -- external doc store

2016-11-29 Thread Kevin Risden
For #2 you might be able to get away with the following:

https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component

The Term Vector component can return offsets and positions. Not sure how
useful they would be to you, but at least it is a starting point. I'm assuming
this requires only termVectors and termPositions and won't require stored
to be true.

Kevin Risden
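
A minimal sketch of such a request, assuming a /tvrh handler wired to the TermVectorComponent and a field indexed with term vectors, positions, and offsets; the collection, field, and document id are placeholders.

```bash
# Minimal sketch: ask the TermVectorComponent for positions and offsets.
# Assumes a /tvrh handler is defined in solrconfig.xml and that the "text"
# field was indexed with termVectors, termPositions, and termOffsets enabled.
curl "http://localhost:8983/solr/mycollection/tvrh?q=id:doc1&tv=true&tv.fl=text&tv.positions=true&tv.offsets=true&wt=json"
```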

On Tue, Nov 29, 2016 at 12:00 PM, Kevin Risden <compuwizard...@gmail.com>
wrote:

> For #3 specifically, I've always found this page useful:
>
> https://cwiki.apache.org/confluence/display/solr/Field+
> Properties+by+Use+Case
>
> It lists out what properties are necessary on each field based on a use
> case.
>
> Kevin Risden
>
> On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> (1) Not that I have readily at hand. And to make it
>> worse, there's the UnifiedHighlighter coming out soon
>>
>> I don't think there's a good way for (2).
>>
>> for (3) at least yes. The reason is simple. For analyzed text,
>> the only thing in the index is what's made it through the
>> analysis chains. So stopwords are missing. Stemming
>> has been done. You could even have put a phonetic filter
>> in there and have terms like ARDT KNTR which would
>> be...er...not very useful to show the end user so the original
>> text must be available.
>>
>>
>>
>>
>> Not much help...
>> Erick
>>
>> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff
>> <j...@johnbickerstaff.com> wrote:
>> > All,
>> >
>> > One of the questions I've been asked to answer / prove out is around the
>> > question of highlighting query matches in responses.
>> >
>> > BTW - One assumption I'm making is that highlighting is basically a
>> > function of storing offsets for terms / tokens at index time.  If that's
>> > not right, I'd be grateful for pointers in the right direction.
>> >
>> > My underlying need is to get highlighting on search term matches for
>> > returned documents.  I need to choose between doing this in Solr and
>> using
>> > an external document store, so I'm interested in whether Solr can
>> provide
>> > the doc store with the information necessary to identify which
>> section(s)
>> > of the doc to highlight in a query response...
>> >
>> > A few questions:
>> >
>> > 1. This page doesn't say a lot about how things work - is there
>> somewhere
>> > with more information on dealing with offsets and highlighting? On
>> offsets
>> > and how they're handled?
>> > https://cwiki.apache.org/confluence/display/solr/Highlighting
>> >
>> > 2. Can I return offset information with a query response or is that
>> > internal only?  If yes, can I return offset info if I have NOT stored
>> the
>> > data in Solr but indexed only?
>> >
>> > (Explanation: Currently my project is considering indexing only and
>> storing
>> > the entire text elsewhere -- using Solr to return only doc ID's for
>> > searches.  If Solr could also return offsets, these could be used in
>> > processing the text stored elsewhere to provide highlighting)
>> >
>> > 3. Do I assume correctly that in order for Solr highlighting to work
>> > correctly, the text MUST also be stored in Solr (I.E. not indexed only,
>> but
>> > stored=true)
>> >
>> > Many thanks...
>>
>
>


Re: Highlighting, offsets -- external doc store

2016-11-29 Thread Kevin Risden
For #3 specifically, I've always found this page useful:

https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+Case

It lists out what properties are necessary on each field based on a use
case.

Kevin Risden

On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> (1) Not that I have readily at hand. And to make it
> worse, there's the UnifiedHighlighter coming out soon
>
> I don't think there's a good way for (2).
>
> for (3) at least yes. The reason is simple. For analyzed text,
> the only thing in the index is what's made it through the
> analysis chains. So stopwords are missing. Stemming
> has been done. You could even have put a phonetic filter
> in there and have terms like ARDT KNTR which would
> be...er...not very useful to show the end user so the original
> text must be available.
>
>
>
>
> Not much help...
> Erick
>
> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > All,
> >
> > One of the questions I've been asked to answer / prove out is around the
> > question of highlighting query matches in responses.
> >
> > BTW - One assumption I'm making is that highlighting is basically a
> > function of storing offsets for terms / tokens at index time.  If that's
> > not right, I'd be grateful for pointers in the right direction.
> >
> > My underlying need is to get highlighting on search term matches for
> > returned documents.  I need to choose between doing this in Solr and
> using
> > an external document store, so I'm interested in whether Solr can provide
> > the doc store with the information necessary to identify which section(s)
> > of the doc to highlight in a query response...
> >
> > A few questions:
> >
> > 1. This page doesn't say a lot about how things work - is there somewhere
> > with more information on dealing with offsets and highlighting? On
> offsets
> > and how they're handled?
> > https://cwiki.apache.org/confluence/display/solr/Highlighting
> >
> > 2. Can I return offset information with a query response or is that
> > internal only?  If yes, can I return offset info if I have NOT stored the
> > data in Solr but indexed only?
> >
> > (Explanation: Currently my project is considering indexing only and
> storing
> > the entire text elsewhere -- using Solr to return only doc ID's for
> > searches.  If Solr could also return offsets, these could be used in
> > processing the text stored elsewhere to provide highlighting)
> >
> > 3. Do I assume correctly that in order for Solr highlighting to work
> > correctly, the text MUST also be stored in Solr (I.E. not indexed only,
> but
> > stored=true)
> >
> > Many thanks...
>


Re: Documentation of Zookeeper's specific roles and functions in Solr Cloud?

2016-11-29 Thread Kevin Risden
If using CloudSolrClient or another zookeeper aware client, then a request
gets sent to Zookeeper to determine the live nodes. If indexing,
CloudSolrClient can find the leader and send documents directly there. The
client then uses that information to query the correct nodes directly.

Zookeeper is not forwarding requests to Solr. The client requests from
Zookeeper and then the client uses that information to query Solr directly.

Kevin Risden

On Tue, Nov 29, 2016 at 10:49 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> All,
>
> I've thought I understood that Solr search requests are made to the Solr
> servers and NOT Zookeeper directly.  (I.E. Zookeeper doesn't decide which
> Solr server responds to requests and requests are made directly to Solr)
>
> My new place tells me they're sending requests to Zookeeper - and those are
> getting sent on to Solr by Zookeeper -- this is news to me if it's
> true...
>
> Is there any documentation of exactly the role(s) played by Zookeeper in a
> SolrCloud setup?
>


Re: Solr 6.3.0 SQL question

2016-11-28 Thread Kevin Risden
Is there a longer error/stack trace in your Solr server logs? I wonder if
the real error is being masked.

Kevin Risden

On Mon, Nov 28, 2016 at 3:24 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I'm running this query:
>
> curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> The error that I get back is:
>
> {"result-set":{"docs":[
> {"EXCEPTION":"org.apache.solr.common.SolrException: Collection not found:
> unclass","EOF":true,"RESPONSE_TIME":2}]}}
>
> TextSize is defined as:
>  <field name="TextSize" ... indexed="true" stored="true"/>
>
> This query works fine:
> curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> Any idea what I'm doing wrong?
> Thank you!
>
> -Joe
>
>


Re: Basic Auth for Solr Streaming Expressions

2016-11-16 Thread Kevin Risden
Thanks Sandeep!

Kevin Risden

On Wed, Nov 16, 2016 at 3:33 PM, sandeep mukherjee <
wiredcit...@yahoo.com.invalid> wrote:

> [SOLR-9779] Basic auth in not supported in Streaming Expressions - ASF JIRA
>
> I have created the above JIRA ticket for the basic auth support in Solr
> streaming expressions.
> Thanks, Sandeep
>
> On Wednesday, November 16, 2016 8:22 AM, sandeep mukherjee
> <wiredcit...@yahoo.com.INVALID> wrote:
>
>
> Nope never got past the login screen.
> Will create one today.
>
>
> Sent from Yahoo Mail for iPhone
>
>
> On Wednesday, November 16, 2016, 8:17 AM, Kevin Risden <
> compuwizard...@gmail.com> wrote:
>
> Was a JIRA ever created for this? I couldn't find it searching.
>
> One that is semi related is SOLR-8213 for SolrJ JDBC auth.
>
> Kevin Risden
>
> On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Thanks for digging into this, let's create a jira ticket for this.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee <
> > wiredcit...@yahoo.com.invalid> wrote:
> >
> > > I have more progress since my last mail. I figured out that  in the
> > > StreamContext object there is a way to set the SolrClientCache object
> > which
> > > keep reference to all the CloudSolrClient where I can set a reference
> to
> > > HttpClient which sets the Basic Auth header. However the problem is,
> > inside
> > > the SolrClientCache there is no way to set your own version of
> > > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache
> > has
> > > no set method which takes a CloudSolrClient object.
> > > So long story short we need an API in SolrClientCache to
> > > accept CloudSolrClient object from user.
> > > Please let me know if there is a better way to enable Basic Auth when
> > > using StreamFactory as mentioned in my previous email.
> > > Thanks much,Sandeep
> > >
> > >On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee
> > > <wiredcit...@yahoo.com.INVALID> wrote:
> > >
> > >
> > >  Hello everyone,
> > > I am trying to find the documentation for the Basic Auth plugin for Solr
> > > Streaming expressions. But I'm not able to find it in the documentation
> > > anywhere. Could you please point me in right direction of how to enable
> > > Basic auth for Solr Streams?
> > > I'm creating StreamFactory as follows: I wonder how and where can I
> > > specify Basic Auth username and password
> > > @Bean
> > > public StreamFactory streamFactory() {
> > >SolrConfig solrConfig = ConfigManager.getNamedConfig("solr",
> > > SolrConfig.class);
> > >
> > >return new StreamFactory().withDefaultZkHost(solrConfig.
> > > getConnectString())
> > >.withFunctionName("gatherNodes", GatherNodesStream.class);
> > > }
> > >
> > >
> > >
> >
>
>
>
>
>
>


Re: Hardware size in solrcloud

2016-11-16 Thread Kevin Risden
First question: is your initial sizing correct?

7GB/1 billion = 7 bytes per document? That would be basically 7 characters?

Anyway there are lots of variables regarding sizing. The typical response
is:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Kevin Risden

On Wed, Nov 16, 2016 at 1:12 PM, Mugeesh Husain <muge...@gmail.com> wrote:

> I have lots of documents and I don't know how much it will grow in the
> future. For the initial stage, I am looking for hardware details (assumptions).
>
> I am looking forward to setting up a billion-document (1 billion approx.)
> Solr index, and the size is 7GB.
>
> Can you please suggest the hardware details as per experience.
> 1. OS(32/64bit):
> 2. Processor:
> 3. RAM:
> 4. No of physical servers/systems :
>
>
> Thanks
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Hardware-size-in-solrcloud-tp4306169.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Basic Auth for Solr Streaming Expressions

2016-11-16 Thread Kevin Risden
Was a JIRA ever created for this? I couldn't find it searching.

One that is semi related is SOLR-8213 for SolrJ JDBC auth.

Kevin Risden

On Wed, Nov 9, 2016 at 8:25 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Thanks for digging into this, let's create a jira ticket for this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee <
> wiredcit...@yahoo.com.invalid> wrote:
>
> > I have more progress since my last mail. I figured out that  in the
> > StreamContext object there is a way to set the SolrClientCache object
> which
> > keep reference to all the CloudSolrClient where I can set a reference to
> > HttpClient which sets the Basic Auth header. However the problem is,
> inside
> > the SolrClientCache there is no way to set your own version of
> > CloudSolrClient with BasicAuth enabled. Unfortunately, SolrClientCache
> has
> > no set method which takes a CloudSolrClient object.
> > So long story short we need an API in SolrClientCache to
> > accept CloudSolrClient object from user.
> > Please let me know if there is a better way to enable Basic Auth when
> > using StreamFactory as mentioned in my previous email.
> > Thanks much,Sandeep
> >
> > On Wednesday, November 9, 2016 11:44 AM, sandeep mukherjee
> > <wiredcit...@yahoo.com.INVALID> wrote:
> >
> >
> >  Hello everyone,
> > I am trying to find the documentation for the Basic Auth plugin for Solr
> > Streaming expressions. But I'm not able to find it in the documentation
> > anywhere. Could you please point me in right direction of how to enable
> > Basic auth for Solr Streams?
> > I'm creating StreamFactory as follows: I wonder how and where can I
> > specify Basic Auth username and password
> > @Bean
> > public StreamFactory streamFactory() {
> > SolrConfig solrConfig = ConfigManager.getNamedConfig("solr",
> > SolrConfig.class);
> >
> > return new StreamFactory().withDefaultZkHost(solrConfig.
> > getConnectString())
> > .withFunctionName("gatherNodes", GatherNodesStream.class);
> > }
> >
> >
> >
>


Re: Sorl shards: very sensitive to swap space usage !?

2016-11-10 Thread Kevin Risden
Agreed with what Shawn and Erick said.

If you don't see anything in the Solr logs and your servers are swapping a
lot, this could mean the Linux OOM killer is killing the Solr process (and
maybe others). There is usually a log of this depending on your Linux
distribution.

Kevin Risden
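
A few commands that often confirm whether the OOM killer fired and how aggressively the host swaps; the log paths are assumptions and vary by distribution.

```bash
# Quick checks (log locations vary by distribution): did the kernel OOM
# killer terminate the Solr process, and how aggressively does the host swap?
dmesg | grep -iE "out of memory|killed process"
grep -i "out of memory" /var/log/messages /var/log/syslog 2>/dev/null
cat /proc/sys/vm/swappiness
```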

On Thu, Nov 10, 2016 at 6:42 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/10/2016 3:20 PM, Chetas Joshi wrote:
> > I have a SolrCloud (Solr 5.5.0) of 50 nodes. The JVM heap memory usage
> > of my solr shards is never more than 50% of the total heap. However,
> > the hosts on which my solr shards are deployed often run into 99% swap
> > space issue. This causes the solr shards go down. Why solr shards are
> > so sensitive to the swap space usage? The JVM heap is more than enough
> > so the shards should never require the swap space. What could be the
> > reason? Where can find the reason why the solr shards go down. I don't
> > see anything on the solr logs.
>
> If the machine that Solr is installed on is using swap, that means
> you're having serious problems, and your performance will be TERRIBLE.
> This kind of problem cannot be caused by Solr if it is properly
> configured for the machine it's running on.
>
> Solr is a Java program.  That means its memory usage is limited to the
> Java heap, plus a little bit for Java itself, and absolutely cannot go
> any higher.  If the Java heap is set too large, then the operating
> system might utilize swap to meet Java's memory demands.  The solution
> is to set your Java heap to a value that's significantly smaller than
> the amount of available physical memory.  Setting the heap to a value
> that's close to (or more than) the amount of physical memory, is a
> recipe for very bad performance.
>
> You need to also limit the memory usage of other software installed on
> the machine, or you might run into a situation where swap is required
> that is not Solr's fault.
>
> Thanks,
> Shawn
>
>


Re: How to substract numeric value stored in 2 documents related by correlation id one-to-one

2016-10-19 Thread Kevin Risden
The Parallel SQL support for what you are asking for doesn't exist quite
yet. The use case you described is close to what I was envisioning for the
Solr SQL support. This would allow full text searches and then some
analytics on top of it (like call duration).

I'm not sure if subtracting fields (c2.time-c1.time) is supported in
streaming expressions yet. The leftOuterJoin is but not sure about
arbitrary math equations. The Parallel SQL side has an issue w/ 1!=0 right
now so I'm guessing adding/subtracting is also out for now.
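
A rough sketch of that leftOuterJoin approach against the /stream handler, using the field and collection names from the question. The exact select/rename syntax is an assumption to verify against the Streaming Expressions documentation for the version in use, and the time subtraction would still happen client-side.

```bash
# Rough sketch only: join REQ and RES documents on reqid via the /stream
# handler. Both streams must be sorted on the join key; computing
# RESTime - REQTime would still be done client-side.
curl --data-urlencode 'expr=leftOuterJoin(
  search(collection, q="type:REQ", fl="reqid,service,time", sort="reqid asc"),
  select(search(collection, q="type:RES", fl="reqid,time", sort="reqid asc"), reqid, time as restime),
  on="reqid")' \
  "http://localhost:8983/solr/collection/stream"
```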

The ticket you will want to follow is SOLR-8593 (
https://issues.apache.org/jira/browse/SOLR-8593) This is the Calcite
integration and should enable a lot more SQL syntax as a result.

Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Wed, Oct 19, 2016 at 8:23 AM, <ka...@kahle.cz> wrote:

> Hello,
> I have 2 documents recorded at request or response of a service call  :
> Entity Request
>  {
>   "type":"REQ",
>   "reqid":"MES0",
>"service":"service0",
>"time":1,
>  }
> Entity response
>  {
>   "type":"RES",
>   "reqid":"MES0",
>"time":10,
>  }
>
> I need to create following statistics:
> Total service call duration for each call (reqid is unique for each
> service call) :
> similar to query :
> select c1.reqid,c1.service,c1.time as REQTime, c2.time as RESTime ,
> c2.time - c1.time as TotalTime from collection c1 left join collection c2
> on c1.reqid = c2.reqid and c2.type = 'RES'
>
>  {
>"reqid":"MES0",
>"service":service0,
>"REQTime":1,
>"RESTime":10,
>"TotalTime":9
>  }
>
> Average service call duration :
> similar to query :
> select c1.service,  avg(c2.time - c1.time) as AvgTime, count(*) from
> collection c1 left join collection c2 on c1.reqid = c2.reqid and c2.type =
> 'RES' group by c1.service
>
>  {
>"service":service0,
>"AvgTime":9,
>"Count": 1
>  }
>
> I tried to find a solution in the archives; I experimented with !join,
> subquery, _query_, etc., but did not succeed.
> I can probably use streaming and leftOuterJoin, but in my understanding
> this functionality is not ready for production.
> Is SOLR capable to fulfill these use cases?  What are the key functions to
> focus on ?
>
> Thanks, Pavel
>
>
>
>
>
>
>
>
>


Re: Problem with Password Decryption in Data Import Handler

2016-10-06 Thread Kevin Risden
I haven't tried this but is it possible there is a new line at the end in
the file?

If you did something like echo "" > file.txt then there would be a new
line. Use echo -n "" > file.txt

Also you should be able to check how many characters are in the file.

Kevin Risden
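
A quick way to check, with placeholder file names: write the key without a trailing newline, confirm the byte count, and verify the decryption from the command line before wiring it into the DIH config.

```bash
# Write the key file without a trailing newline and confirm its size, then
# verify the encrypted password decrypts with that key file. File names and
# the key itself are placeholders.
echo -n "myEncryptionKey" > encrypt.key
wc -c encrypt.key
openssl enc -aes-128-cbc -d -a -in encrypted-password.txt -pass file:encrypt.key
```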

On Wed, Oct 5, 2016 at 5:00 PM, Jamie Jackson <jamieja...@gmail.com> wrote:

> Hi Folks,
>
> (Using Solr 5.5.3.)
>
> As far as I know, the only place where encrypted password use is documented
> is in
> https://cwiki.apache.org/confluence/display/solr/
> Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler,
> under the "Configuring the DIH Configuration File", in a comment in the
> sample XML file:
>
> 
>
> Anyway, I can encrypt just fine:
>
> $ openssl enc -aes-128-cbc -a -salt -in stgps.txt
> enter aes-128-cbc encryption password:
> Verifying - enter aes-128-cbc encryption password:
> U2FsdGVkX1+VtVoQtmEREvB5qZjn3131+N4jRXmjyIY=
>
>
> I can also decrypt just fine from the command line.
>
> However, if I use the encrypted password and encryptKeyFile in the config
> file, I end up with an error: "String length must be a multiple of four."
>
> https://gist.github.com/jamiejackson/3852dacb03432328ea187d43ade5e4d9
>
> How do I get this working?
>
> Thanks,
> Jamie
>


Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Kevin Risden
You need to have the hadoop pieces on the classpath. Like core-site.xml and
hdfs-site.xml. There is an hdfs classpath command that would help but it
may have too many pieces. You may just need core-site and hdfs-site so you
don't get conflicting jars.

Something like this may work for you:

java -cp
"$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
-ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
hdfs://:8020/apps/solr/data/ExampleCollection/
core_node1/data/index

Kevin Risden
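
A hedged variant for a kerberized cluster, with placeholder keytab, principal, hostnames, and paths: obtain a ticket first and put the Hadoop configuration directory on the classpath explicitly so the client picks up the Kerberos settings from core-site.xml and hdfs-site.xml.

```bash
# Hedged variant only: obtain a Kerberos ticket, then run CheckHdfsIndex with
# the Hadoop config directory on the classpath. Keytab, principal, and paths
# are placeholders.
kinit -kt /etc/security/keytabs/solr.service.keytab solr/$(hostname -f)@EXAMPLE.COM

java -cp "/etc/hadoop/conf:$(hdfs classpath):./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/ext/*" \
  -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex \
  hdfs://NAMENODE:8020/apps/solr/data/ExampleCollection/core_node1/data/index
```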

On Mon, Oct 3, 2016 at 1:38 PM, Rishabh Patel <
rishabh.mahendra.pa...@gmail.com> wrote:

> Hello,
>
> My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test
> fails to run. However, without Kerberos, I am able to run the test with no
> issues.
>
> I ran the following command:
>
> java -cp
> "./server/solr-webapp/webapp/WEB-INF/lib/*:./server/lib/
> ext/*:/hadoop/hadoop-client/lib/servlet-api-2.5.jar"
> -ea:org.apache.lucene... org.apache.solr.index.hdfs.CheckHdfsIndex
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
>
> The error is:
>
> ERROR: could not open hdfs directory "
> hdfs://:8020/apps/solr/data/ExampleCollection/
> core_node1/data/index
> ";
> exiting org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.
> AccessControlException):
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
> Does this error message imply that the test cannot run with Kerberos
> enabled?
>
> For reference, I followed this blog
> http://yonik.com/solr-5-5/
>
> --
> Regards,
> *Rishabh Patel*
>


Re: Unable to connect to correct port in solr 6.2.0

2016-09-12 Thread Kevin Risden
Jan - the issue you are hitting is Docker and /proc/version is getting the
underlying OS kernel and not what you would expect from the Docker
container. The errors for update-rc.d and service are because the docker
image you are using is trimmed down.

Kevin Risden

On Mon, Sep 12, 2016 at 3:19 PM, Jan Høydahl <jan@cominvent.com> wrote:

> I tried it on a Docker RHEL system (gidikern/rhel-oracle-jre) and the
> install failed with errors
>
> ./install_solr_service.sh: line 322: update-rc.d: command not found
> ./install_solr_service.sh: line 326: service: command not found
> ./install_solr_service.sh: line 328: service: command not found
>
> Turns out that /proc/version mentions “Ubuntu”; this is what it returns on the system:
> Linux version 4.4.19-moby (root@3934ed318998) (gcc version 5.4.0 20160609
> (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #1 SMP Thu Sep 1 09:44:30 UTC 2016
> There is also a /etc/redhat-release file:
> Red Hat Enterprise Linux Server release 7.1 (Maipo)
>
> So the install of rc.d failed completely because of this. Don’t know if
> this is common on RHEL systems, perhaps we need to improve distro detection
> in installer?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 12. sep. 2016 kl. 21.31 skrev Shalin Shekhar Mangar <
> shalinman...@gmail.com>:
> >
> > I just tried this out on ubuntu (sorry I don't have access to a red hat
> > system) and it works fine.
> >
> > One thing that you have to take care of is that if you install the
> service
> > on the default 8983 port then, trying to upgrade with the same tar to a
> > different port does not work. So please ensure that you hadn't already
> > installed the service before already.
> >
> > On Tue, Sep 13, 2016 at 12:53 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> Which version of red hat? Is lsof installed on this system?
> >>
> >> On Mon, Sep 12, 2016 at 4:30 PM, Preeti Bhat <preeti.b...@shoregrp.com>
> >> wrote:
> >>
> >>> HI All,
> >>>
> >>> I am trying to setup the solr in Redhat Linux, using the
> >>> install_solr_service.sh script of solr.6.2.0  tgz. The script runs and
> >>> starts the solr on port 8983 even when the port is specifically
> specified
> >>> as 2016.
> >>>
> >>> /root/install_solr_service.sh solr-6.2.0.tgz -i /opt -d /var/solr -u
> root
> >>> -s solr -p 2016
> >>>
> >>> Is this correct way to setup solr in linux? Also, I have observed that
> if
> >>> I go to the /bin/solr and start with the port number its working as
> >>> expected but not as service.
> >>>
> >>> I would like to setup the SOLR in SOLRCloud mode with external
> zookeepers.
> >>>
> >>> Could someone please advise on this?
> >>>
> >>>
> >>>
> >>> NOTICE TO RECIPIENTS: This communication may contain confidential
> and/or
> >>> privileged information. If you are not the intended recipient (or have
> >>> received this communication in error) please notify the sender and
> >>> it-supp...@shoregrp.com immediately, and destroy this communication.
> Any
> >>> unauthorized copying, disclosure or distribution of the material in
> this
> >>> communication is strictly forbidden. Any views or opinions presented in
> >>> this email are solely those of the author and do not necessarily
> represent
> >>> those of the company. Finally, the recipient should check this email
> and
> >>> any attachments for the presence of viruses. The company accepts no
> >>> liability for any damage caused by any virus transmitted by this email.
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>


Re: NoNode error on -downconfig when node does exist?

2016-08-08 Thread Kevin Risden
Just a quick guess: do you have a period (.) in your zk connection string
chroot when you meant an underscore (_)?

When you do the ls you use /solr6_1/configs, but you have /solr6.1 in your
zk connection string chroot.

Kevin Risden
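
With that correction, the command from the original message would look like this (everything else unchanged):

```bash
# Same command as posted, with the chroot matching what `ls` showed in
# ZooKeeper (/solr6_1 with an underscore, not /solr6.1):
sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig \
  -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6_1
```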

On Mon, Aug 8, 2016 at 4:44 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> First, the caveat:  I understand this is technically a zookeeper error.  It
> is an error that occurs when trying to deal with Solr however, so I'm
> hoping someone on the list may have some insight.  Also, I'm getting the
> error via the zkcli.sh tool that comes with Solr...
>
> I have created a collection in SolrCloud (6.1) giving the "techproducts"
> sample directory as the location of the conf files.
>
> I then wanted to download those files from zookeeper to the local machine
> via the -cmd downconfig command, so I issue this command:
>
> sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig
> -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1
>
> Instead of the files, I get a stacktrace / error back which says :
>
> exception in thread "main" java.io.IOException: Error downloading files
> from zookeeper path /configs/statdx to /home/john/conf
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> ZkConfigManager.java:117)
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(
> ZkConfigManager.java:153)
> at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237)
> *Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /configs/statdx*
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:331)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:328)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> ZkCmdExecutor.java:60)
> at
> org.apache.solr.common.cloud.SolrZkClient.getChildren(
> SolrZkClient.java:328)
> at
> org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> ZkConfigManager.java:101)
> ... 2 more
>
> However, when I actually look in Zookeeper, I find that the "directory"
> does exist and that inside it are listed all the files.
>
> Here is the output from zookeeper:
>
> [zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs*
> [statdx]
>
> and...
>
> [zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx*
> [mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt,
> synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json,
> velocity, admin-extra.html, update-script.js,
> _schema_analysis_stopwords_english.json, solrconfig.xml,
> admin-extra.menu-top.html, elevate.xml, clustering, xslt,
> _rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang,
> admin-extra.menu-bottom.html]
>
> I've rebooted all my zookeeper nodes and restarted them - just in case...
> Same deal.
>
> Has anyone seen anything like this?
>


Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-04 Thread Kevin Risden
>
> java.sql.SQLException: java.lang.RuntimeException: First tuple is not a
> metadata tuple
>

That is a client-side error message meaning that the statement couldn't be
handled. There should be better error handling around this, but it's not in
place currently.

And on Solr side, the logs seem okay:


The logs you shared don't seem to be the full logs. There will be a related
exception on the Solr server side, and that exception will explain the cause
of the problem.

Kevin Risden
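
One way to surface that server-side exception, assuming the default solr.log location under the Solr install directory:

```bash
# Look for the server-side exception that the client error is masking.
# Assumes the default log location; adjust the path as needed.
grep -A 40 ERROR server/logs/solr.log | less
```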

On Wed, May 4, 2016 at 2:57 AM, deniz <denizdurmu...@gmail.com> wrote:

> I am trying to go through the steps here
> <https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/>
> to start playing with the new api, but I am getting:
>
> java.sql.SQLException: java.lang.RuntimeException: First tuple is not a
> metadata tuple
> at
>
> org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:70)
> at com.sematext.blog.App.main(App.java:28)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: java.lang.RuntimeException: First tuple is not a metadata tuple
> at
>
> org.apache.solr.client.solrj.io.sql.ResultSetImpl.<init>(ResultSetImpl.java:75)
> at
>
> org.apache.solr.client.solrj.io.sql.StatementImpl.executeQuery(StatementImpl.java:67)
> ... 6 more
>
>
>
> My code is
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
>
>
> /**
>  * Hello world!
>  *
>  */
> public class App
> {
> public static void main( String[] args )
> {
>
>
> Connection connection = null;
> Statement statement = null;
> ResultSet resultSet = null;
>
> try{
> String connectionString =
>
> "jdbc:solr://zkhost:port?collection=test=map_reduce=1";
> connection = DriverManager.getConnection(connectionString);
> statement  = connection.createStatement();
> resultSet = statement.executeQuery("select id, text from test
> where tits=1 limit 5");
> while(resultSet.next()){
> String id = resultSet.getString("id");
> String nickname = resultSet.getString("text");
>
> System.out.println(id + " : " + nickname);
> }
> }catch(Exception e){
> e.printStackTrace();
> }finally{
> if (resultSet != null) {
> try {
> resultSet.close();
> } catch (Exception ex) {
> }
> }
> if (statement != null) {
> try {
> statement.close();
> } catch (Exception ex) {
> }
> }
> if (connection != null) {
> try {
> connection.close();
> } catch (Exception ex) {
> }
> }
> }
>
>
> }
> }
>
>
> I tried to figure out what is happening, but there is no more logs other
> than the one above. And on Solr side, the logs seem okay:
>
> 2016-05-04 15:52:30.364 INFO  (qtp1634198-41) [c:test s:shard1 r:core_node1
> x:test] o.a.s.c.S.Request [test]  webapp=/solr path=/sql
>
> params={includeMetadata=true=1=json=2.2=select+id,+text+from+test+where+tits%3D1+limit+5=map_reduce}
> status=0 QTime=3
> 2016-05-04 15:52:30.382 INFO  (qtp1634198-46) [c:test s:shard1 r:core_node1
> x:test] o.a.s.c.S.Request [test]  webapp=/solr path=/select
>
> params={q=(tits:"1")=false=id,text,score=score+desc=5=json=2.2}
> hits=5624 status=0 QTime=1
>
>
> Is the error happening because of some missing error handling in the code,
> or because of some strict checks in the IDE (Ideaj)? Has anyone had similar
> issues while using SQL with SolrJ?
>
>
> Thanks
>
> Deniz
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Parallel SQL Interface returns "java.lang.NullPointerException" after reloading collection

2016-05-03 Thread Kevin Risden
What I think is happening: since the CloudSolrClient comes from the
SolrCache and the collection was reloaded, zkStateReader is actually null
because there was no cloudSolrClient.connect() call after the reload. I think
that would cause the NPE on anything that uses the zkStateReader, like
getClusterState().

ZkStateReader zkStateReader = cloudSolrClient.getZkStateReader();
ClusterState clusterState = zkStateReader.getClusterState();


Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Mon, May 2, 2016 at 9:58 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Looks like the loop below is throwing a Null pointer. I suspect the
> collection has not yet come back online. In theory this should be self
> healing and when the collection comes back online it should start working
> again. If not then that would be a bug.
>
> for(String col : clusterState.getCollections()) {
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 2, 2016 at 10:06 PM, Ryan Yacyshyn <ryan.yacys...@gmail.com>
> wrote:
>
> > Yes stack trace can be found here:
> >
> > http://pastie.org/10821638
> >
> >
> >
> > On Mon, 2 May 2016 at 01:05 Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > Can you post your stack trace? I suspect this has to do with how the
> > > Streaming API is interacting with SolrCloud. We can probably also
> create
> > a
> > > jira ticket for this.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Sun, May 1, 2016 at 4:02 AM, Ryan Yacyshyn <ryan.yacys...@gmail.com
> >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm exploring with parallel SQL queries and found something strange
> > after
> > > > reloading the collection: the same query will return a
> > > > java.lang.NullPointerException error. Here are my steps on a fresh
> > > install
> > > > of Solr 6.0.0.
> > > >
> > > > *Start Solr in cloud mode with example*
> > > > bin/solr -e cloud -noprompt
> > > >
> > > > *Index some data*
> > > > bin/post -c gettingstarted example/exampledocs/*.xml
> > > >
> > > > *Send query, which works*
> > > > curl --data-urlencode 'stmt=select id,name from gettingstarted where
> > > > inStock = true limit 2'
> http://localhost:8983/solr/gettingstarted/sql
> > > >
> > > > *Reload the collection*
> > > > curl '
> > > >
> > > >
> > >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD=gettingstarted
> > > > '
> > > >
> > > > After reloading, running the exact query above will return the null
> > > pointer
> > > > exception error. Any idea why?
> > > >
> > > > If I stop all Solr severs and restart, then it's fine.
> > > >
> > > > *java -version*
> > > > java version "1.8.0_25"
> > > > Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> > > > Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> > > >
> > > > Thanks,
> > > > Ryan
> > > >
> > >
> >
>


Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Kevin Risden
>
> Page 11, the screenshot specifies to select a
> "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into
> "solr-solrj-6.0.0.jar" shipped with released version, correct?
>

Correct, the PDF was generated before 6.0.0 was released. The documentation
from SOLR-8521 is being migrated to here:

https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools


> When I try adding that jar, it doesn't show up driver class, DBVisualizer
> still shows "No new driver class". Does it mean the class is not added to
> this jar yet?
>

I checked the Solr 6.0.0 release and the driver is there. I was testing it
yesterday for a blog series that I'm putting together.

Just for reference here is the output for the Solr 6 release:

tar -tvf solr-solrj-6.0.0.jar | grep sql
drwxrwxrwx  0 0  0   0 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/
-rwxrwxrwx  0 0  0 842 Apr  1 14:40
META-INF/services/java.sql.Driver
-rwxrwxrwx  0 0  0   10124 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
-rwxrwxrwx  0 0  0   23557 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
-rwxrwxrwx  0 0  04459 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/DriverImpl.class
-rwxrwxrwx  0 0  0   28333 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
-rwxrwxrwx  0 0  05167 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
-rwxrwxrwx  0 0  0   10451 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/StatementImpl.class
-rwxrwxrwx  0 0  0 141 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/package-info.class


Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.


Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Kevin Risden
For Solr 6, ParallelSQL and the Solr JDBC driver are going to see further
development, as will JSON facets. The Solr JDBC driver that is in Solr 6 contains
SOLR-8502. There are further improvements coming in SOLR-8659 that didn't
make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some
cases uses JSON facets under the hood.

The Solr JDBC driver should enable BI tools to connect to Solr and use the
language of SQL. This is also a familiar interface for many Java developers.

Just a note: Solr is not an RDBMS and shouldn't be treated like one even
with a JDBC driver. The Solr JDBC driver is more of a convenience for
querying.

Kevin Risden

On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> The unsatisfactory answer is that they have different characteristics.
>
> The analytics contrib does not work in distributed mode. It's not
> receiving a lot of love at this point.
>
> The JSON facets are estimations. Generally very close but are not
> guaranteed to be 100% accurate. The variance, as I understand it,
> is something on the order of < 1% in most cases.
>
> The pivot facets are accurate, but more expensive than the JSON
> facets.
>
> And, to make matters worse, the ParallelSQL way of doing some
> aggregations is going to give yet another approach.
>
> Best,
> Erick
>
> On Tue, Apr 12, 2016 at 7:15 AM, Pablo <anzorena.f...@gmail.com> wrote:
> > Hello,
> > I think this topic is important for solr users that are planning to use
> solr
> > as a BI Tool.
> > Speaking about facets, nowadays there are three major ways of doing
> (more or
> > less) the same thing in Solr.
> > First, you have the pivot facets, on the other hand you have the
> Analytics
> > component and finally you have the JSON Facet Api.
> > So, which line is Solr following? Which of these components is going to
> be in
> > constant development, and which one is going to be deprecated sooner?
> > In Yonik page, there are some test that shows how JSON Facet Api performs
> > better than legacy facets, also the Api was way simpler than the pivot
> > facets, so in my case that was enough to base my solution around the JSON
> > Api. But I would like to know what are the thoughts of the solr
> developers.
> >
> > Thanks!
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Kevin Risden
This sounds related to SOLR-8587 and there is a fix in SOLR-8793 that isn't
out in a release since it was fixed after 5.5 went out.

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Thu, Mar 10, 2016 at 11:02 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> I have a dev system running 5.5.0.  I am seeing a lot of
> NoSuchFileException errors (for segments_XXXfilenames).
>
> Here's a log excerpt:
>
> 2016-03-10 09:52:00.054 INFO  (qtp1012570586-821) [   x:inclive]
> org.apache.solr.core.SolrCore.Request [inclive]  webapp=/solr
> path=/admin/luke
> params={qt=/admin/luke=schema=javabin=2} status=500 QTime=1
> 2016-03-10 09:52:00.055 ERROR (qtp1012570586-821) [   x:inclive]
> org.apache.solr.servlet.HttpSolrCall
> null:java.nio.file.NoSuchFileException:
> /index/solr5/data/data/inc_0/index/segments_ias
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
>
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
>
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at
>
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.Files.size(Files.java:2332)
> at
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)
> 
>
> I did not include the full stacktrace, only up to the first Lucene/Solr
> class.
>
> Most of the error logs are preceded by a request to the /admin/luke
> handler, like you see above, but there are also entries where a failed
> request is not logged right before the error.  My index maintenance
> program calls /admin/luke to programmatically determine the uniqueKey
> for the index.
>
> These errors do not seem to actually interfere with Solr operation, but
> they do concern me.
>
> Thanks,
> Shawn
>
>
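
As an aside, a minimal SolrJ sketch of the kind of uniqueKey lookup the
maintenance program does against /admin/luke might look like the following
(the base URL is a placeholder, and the response navigation assumes the
usual show=schema layout of the Luke response):

    import java.util.Collections;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class UniqueKeyLookup {
        public static void main(String[] args) throws Exception {
            // Placeholder base URL; "inclive" is the core from the log above.
            try (SolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/inclive").build()) {
                GenericSolrRequest luke = new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/luke",
                    new MapSolrParams(Collections.singletonMap("show", "schema")));
                NamedList<Object> rsp = client.request(luke);
                // With show=schema the response carries a "schema" section that
                // names the uniqueKey field.
                NamedList<?> schema = (NamedList<?>) rsp.get("schema");
                System.out.println("uniqueKey: " + schema.get("uniqueKeyField"));
            }
        }
    }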


[ANNOUNCE] YCSB 0.7.0 Release

2016-02-26 Thread Kevin Risden
On behalf of the development community, I am pleased to announce the
release of YCSB 0.7.0.

Highlights:

* GemFire binding replaced with Apache Geode (incubating) binding
* Apache Solr binding was added
* OrientDB binding improvements
* HBase Kerberos support and single-connection reuse
* Accumulo improvements
* JDBC improvements
* Couchbase scan implementation
* MongoDB improvements
* Elasticsearch version increase to 2.1.1

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.7.0

This release covers changes from the last month.


Re: CloudSolrClient query /admin/info/system

2015-10-27 Thread Kevin Risden
Created https://issues.apache.org/jira/browse/SOLR-8216

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>


On Tue, Oct 27, 2015 at 5:11 AM, Alan Woodward <a...@flax.co.uk> wrote:

> Hi Kevin,
>
> This looks like a bug in CSC - could you raise an issue?
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 26 Oct 2015, at 22:21, Kevin Risden wrote:
>
> > I am trying to use CloudSolrClient to query information about the Solr
> > server including version information. I found /admin/info/system and it
> > seems to provide the information I am looking for. However, it looks like
> > CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is
> not
> > part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
> > missed as part of SOLR-4943 [3]?
> >
> > Is this an issue or is there a better way to query this information?
> >
> > As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure
> what
> > issues that could cause. Is there a reason that ADMIN_PATHS in
> > CloudSolrClient would be different than the paths in CommonParams [1]?
> >
> > [1]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
> > [2]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
> > [3] https://issues.apache.org/jira/browse/SOLR-4943
> >
> > Kevin Risden
> > Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/
> >
> > M: 732 213 8417
> > LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> |
> Google+
> > <http://www.google.com/+AvalonConsultingLLC> | Twitter
> > <https://twitter.com/avalonconsult>
>
>


CloudSolrClient query /admin/info/system

2015-10-26 Thread Kevin Risden
I am trying to use CloudSolrClient to query information about the Solr
server including version information. I found /admin/info/system and it
seems to provide the information I am looking for. However, it looks like
CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is not
part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
missed as part of SOLR-4943 [3]?

Is this an issue or is there a better way to query this information?

As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure what
issues that could cause. Is there a reason that ADMIN_PATHS in
CloudSolrClient would be different than the paths in CommonParams [1]?

[1]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
[2]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
[3] https://issues.apache.org/jira/browse/SOLR-4943

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
M: 732 213 8417
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>
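
Until ADMIN_PATHS covers the info handler (see SOLR-8216 above), one possible
workaround is to send the request to a concrete node with HttpSolrClient
rather than through CloudSolrClient. A rough sketch, with the node URL as a
placeholder:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class SystemInfoLookup {
        public static void main(String[] args) throws Exception {
            // Placeholder node URL; any live Solr node will do.
            try (SolrClient node =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                GenericSolrRequest info = new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/info/system", new ModifiableSolrParams());
                NamedList<Object> rsp = node.request(info);
                // The "lucene" section holds solr-spec-version and lucene-spec-version.
                System.out.println(rsp.get("lucene"));
            }
        }
    }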


Lucene/Solr Git Mirrors 5 day lag behind SVN?

2015-10-23 Thread Kevin Risden
It looks like both the Apache Git mirror (git://git.apache.org/lucene-solr.git)
and the GitHub mirror (https://github.com/apache/lucene-solr.git) are 5 days
behind SVN. This seems to have happened before:
https://issues.apache.org/jira/browse/INFRA-9182

Is this a known issue?

Kevin Risden