Re: Regarding Solr UI authentication

2016-08-12 Thread Shawn Heisey
On 8/11/2016 11:12 PM, Pradeep Chandra wrote:
> I am running solr using the command *bin/solr start* in Ubuntu. Now I
> want to give UI authentication to secure my Solr. Can you tell me how
> to make Solr password-protected? I am not using Zookeeper/SolrCloud.

For what I would call "typical" authentication setups, the first step
will be to switch to SolrCloud and use zookeeper.  The basic
authentication support only works from zookeeper.

https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin

If you have an existing kerberos infrastructure, then you can enable
authenticating to your kerberos server without zookeeper.  This page
talks about how to enable it if you're running standalone mode rather
than SolrCloud:

https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin

Switching to SolrCloud is probably the best option.  It is likely that
more and more features will require it in the future.

Note: If you're adding authentication because untrustworthy people have
access to your Solr server ... you really should put it someplace where
those people can't reach it -- where it won't need authentication.  You
asked about authenticating the UI ... but it isn't actually the UI that
gets authenticated.  The UI is just a bunch of static html, css, and
javascript that can't do much of anything on its own.

Authentication happens for all the API calls that the UI uses, which are
the same API calls that are used to query/update Solr from your
search-enabled application.  Typically *all* clients that use Solr will
need to provide credentials once you enable authentication.

Thanks,
Shawn
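
For illustration, a minimal SolrJ sketch of that last point: once the Basic
Authentication Plugin is enabled, every client request must carry credentials.
This assumes Solr 5.3+/6.x SolrJ; the URL, collection name, user, and password
are placeholders, not values from this thread.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AuthenticatedQuery {
        public static void main(String[] args) throws Exception {
            // Placeholder base URL and collection name.
            HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/mycollection");
            QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
            // The UI's API calls, indexing clients, and query clients all
            // need to attach credentials like this once auth is enabled.
            req.setBasicAuthCredentials("solr", "SolrRocks");
            QueryResponse rsp = req.process(client);
            System.out.println("numFound: " + rsp.getResults().getNumFound());
            client.close();
        }
    }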



RE: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Alexandre Drouin
I removed everything related to Zookeeper or Solr between each of my tests, 
including the data directory.

Alexandre Drouin


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: August 12, 2016 3:24 PM
To: solr-user 
Subject: RE: Getting "collection already exists" when creating collection in 
admin UI
Importance: High

Reinstalling ZK wouldn't help if the data directory weren't purged.

On Aug 12, 2016 11:35, "Alexandre Drouin" 
wrote:

> Thanks for the offer however I think I have a different issue.  I 
> reinstalled my ZK and Solr servers between each test so I didn't have 
> any unwanted files.
>
>
> Alexandre Drouin
>
> -Original Message-
> From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> Sent: August 12, 2016 1:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Getting "collection already exists" when creating 
> collection in admin UI
> Importance: High
>
> I don't know if this helps, but I had trouble creating collections due 
> to a number of issues and I think I got this error (I was using the 
> command line, not the UI)
>
> As I recall, if it exists in Zookeeper, it will error out.  It was a 
> while ago, but I think the way I had to solve it was to go into 
> Zookeeper and delete the "node".
>
> This was easier for me because I was using "chroot" in Zookeeper such 
> that each collection was separate - so all I had to do was delete the 
> entire node and start over.
>
> Take me with a grain of salt - it was a while ago.
>
> If you want, I have linux command lines for most / all of this... let 
> me know.
>
> On Fri, Aug 12, 2016 at 11:10 AM, Alexandre Drouin < 
> alexandre.dro...@orckestra.com> wrote:
>
> > Hi Esther-Melaine,
> >
> > The collection exists in Zookeeper under the /collections node and I 
> > can see the shardX_replicaX folders under $SOLR_HOME/server/solr of 
> > both servers.
> >
> > I was not able to replicate the issue using the collection API.  
> > Here are the logs where I added the 'MyNewerNode'
> > https://gist.github.com/orck-
> > adrouin/4d074cbb60141cba90c0aae9c55360d4
> >
> > I took a closer look at the admin UI and here are my findings:
> >   - In Chrome's devtools I can see the first create request
> >   - After 10 seconds the request gets aborted and a second create 
> > request is sent to the server
> >   - In Fiddler I can see that the first request completes 
> > successfully without any issues.  The second request is sent a few 
> > seconds before the first one ends, so it looks like an admin UI issue.
> >
> > Is it possible that the admin UI has some kind of TTL for requests 
> > set to
> > 10 seconds?
> >
> > You mentioned something about the nodes going into recovery.  Any 
> > idea how I can fix this issue?
> >
> > My development environment (if it makes a difference):
> >   - OS: Windows
> >   - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the 
> > same server using different ports.
> >   - Zookeeper 3.4.8
> >
> > Alexandre Drouin
> >
> >
> > -Original Message-
> > From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com]
> > Sent: August 12, 2016 10:46 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Getting "collection already exists" when creating 
> > collection in admin UI
> > Importance: High
> >
> > Hi Alexandre,
> >
> > The question here is why the create action is called twice. You’re 
> > getting that “collection already exists” error after the second 
> > action
> is called.
> > Can you verify if MyNewNode exists in /collections in ZK or on the 
> > machines running Solr at $SOLR_HOME/server/solr/? Your logs show a 
> > lot of issues around the overseer and it looks like those nodes are 
> > going into recovery pretty frequently. Can you replicate this issue 
> > by creating a collection through the API (not through the UI):
> >
> > http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig
> >
> > Thanks,
> > Esther
> >
> >
> > > On Aug 12, 2016, at 10:05 AM, Alexandre Drouin <
> > alexandre.dro...@orckestra.com> wrote:
> > >
> > > Hello,
> > >
> > > I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic
> > > auth)
> > and with one Zookeeper node (for development purposes) and when I 
> > try to create a new collection in the admin UI with 'replicationFactor=2'
> > I get a "Connection to Solr lost" message and another message 
> > telling me "collection already exists: MyNewNode".  I made sure that a 
> > collection with the same name does not exist and the issue does not 
> > appear with a replication factor of 1.
> > >
> > > While debugging I saw that the create action is called twice with 
> > > the following parameters:
> > > /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> > >
> > > Can anyone replicate this issue?  I have not found it in JIRA.

RE: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Erick Erickson
Reinstalling ZK wouldn't help if the data directory weren't purged.

On Aug 12, 2016 11:35, "Alexandre Drouin" 
wrote:

> Thanks for the offer however I think I have a different issue.  I
> reinstalled my ZK and Solr servers between each test so I didn't have any
> unwanted files.
>
>
> Alexandre Drouin
>
> -Original Message-
> From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> Sent: August 12, 2016 1:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Getting "collection already exists" when creating collection
> in admin UI
> Importance: High
>
> I don't know if this helps, but I had trouble creating collections due to
> a number of issues and I think I got this error (I was using the command
> line, not the UI)
>
> As I recall, if it exists in Zookeeper, it will error out.  It was a while
> ago, but I think the way I had to solve it was to go into Zookeeper and
> delete the "node".
>
> This was easier for me because I was using "chroot" in Zookeeper such that
> each collection was separate - so all I had to do was delete the entire
> node and start over.
>
> Take me with a grain of salt - it was a while ago.
>
> If you want, I have linux command lines for most / all of this... let me
> know.
>
> On Fri, Aug 12, 2016 at 11:10 AM, Alexandre Drouin <
> alexandre.dro...@orckestra.com> wrote:
>
> > Hi Esther-Melaine,
> >
> > The collection exists in Zookeeper under the /collections node and I
> > can see the shardX_replicaX folders under $SOLR_HOME/server/solr of
> > both servers.
> >
> > I was not able to replicate the issue using the collection API.  Here
> > are the logs where I added the 'MyNewerNode'
> > https://gist.github.com/orck-
> > adrouin/4d074cbb60141cba90c0aae9c55360d4
> >
> > I took a closer look at the admin UI and here are my findings:
> >   - In Chrome's devtools I can see the first create request
> >   - After 10 seconds the request gets aborted and a second create
> > request is sent to the server
> >   - In Fiddler I can see that the first request completes successfully
> > without any issues.  The second request is sent a few seconds before
> > the first one ends, so it looks like an admin UI issue.
> >
> > Is it possible that the admin UI has some kind of TTL for requests set
> > to
> > 10 seconds?
> >
> > You mentioned something about the nodes going into recovery.  Any idea
> > how I can fix this issue?
> >
> > My development environment (if it makes a difference):
> >   - OS: Windows
> >   - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the
> > same server using different ports.
> >   - Zookeeper 3.4.8
> >
> > Alexandre Drouin
> >
> >
> > -Original Message-
> > From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com]
> > Sent: August 12, 2016 10:46 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Getting "collection already exists" when creating
> > collection in admin UI
> > Importance: High
> >
> > Hi Alexandre,
> >
> > The question here is why the create action is called twice. You’re
> > getting that “collection already exists” error after the second action
> is called.
> > Can you verify if MyNewNode exists in /collections in ZK or on the
> > machines running Solr at $SOLR_HOME/server/solr/? Your logs show a 
> > of issues around the overseer and it looks like those nodes are going
> > into recovery pretty frequently. Can you replicate this issue by
> > creating a collection through the API (not through the UI):
> >
> > http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig
> >
> > Thanks,
> > Esther
> >
> >
> > > On Aug 12, 2016, at 10:05 AM, Alexandre Drouin <
> > alexandre.dro...@orckestra.com> wrote:
> > >
> > > Hello,
> > >
> > > I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic
> > > auth)
> > and with one Zookeeper node (for development purposes) and when I try
> > to create a new collection in the admin UI with 'replicationFactor=2'
> > I get a "Connection to Solr lost" message and another message telling me
> > "collection already exists: MyNewNode".  I made sure that a collection
> > with the same name does not exist and the issue does not appear with
> > a replication factor of 1.
> > >
> > > While debugging I saw that the create action is called twice with
> > > the following parameters:
> > > /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> > >
> > > Can anyone replicate this issue?  I have not found it in JIRA.
> > >
> > >
> > > Below is the relevant log (if useful) and I posted the full logs
> > > here
> > > https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb377
> > > 1
> > >
> > > 63549 ERROR
> > > (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local
> > > :8444_solr)
> > [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode
> > operation: create failed:org.apache.solr.common.SolrException: 
> > collection already exists: MyNewNode

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks - I'll look at it...

On Fri, Aug 12, 2016 at 1:21 PM, Erick Erickson 
wrote:

> Maybe rerankqparserplugin?
>
> On Aug 12, 2016 11:54, "John Bickerstaff" 
> wrote:
>
> > @Hossman --  thanks again.
> >
> > I've made the following change and so far things look good.  I couldn't
> see
> > debug or find results for what I put in for $func, so I just removed it,
> > but making modifications as you suggested appears to be working.
> >
> > Including the actual line from my endpoint XML in case this thread helps
> > someone else...
> >
> > {!boost defType=synonym_edismax qf='title' synonyms='true'
> > synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> > v=$q}
> >
> > On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > > wrote:
> >
> > > Thanks!  I'll check it out.
> > >
> > > On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar  >
> > > wrote:
> > >
> > >> Not exactly sure what you are looking for from chaining the results, but
> > >> similar functionality is available in Streaming expressions, where the
> > >> result of inner expressions is passed to outer expressions and so on
> > >> https://cwiki.apache.org/confluence/display/solr/
> Streaming+Expressions
> > >>
> > >> HTH
> > >> Susheel
> > >>
> > >> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
> > >> j...@johnbickerstaff.com>
> > >> wrote:
> > >>
> > >> > Hossman - many thanks again for your comprehensive and very helpful
> > >> answer!
> > >> >
> > >> > All,
> > >> >
> > >> > I am (possibly mis-remembering) reading something about being able
> to
> > >> pass
> > >> > the results of one query to another query...  Essentially "chaining"
> > >> result
> > >> > sets.
> > >> >
> > >> > I have looked in docs and can't find anything on a quick search -- I
> > may
> > >> > have been reading about the Re-Ranking feature, which doesn't help
> me
> > (I
> > >> > know because I just tried and it seems to return all results anyway,
> > >> just
> > >> > re-ranking the number specified in the reRankDocs flag...)
> > >> >
> > >> > Is there a way to (cleanly) send the results of one query to another
> > >> query
> > >> > for further processing?  Essentially, pass ONLY the results
> (including
> > >> an
> > >> > empty set of results) to another query for processing?
> > >> >
> > >> > thanks...
> > >> >
> > >> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> > >> > j...@johnbickerstaff.com>
> > >> > wrote:
> > >> >
> > >> > > Thanks!
> > >> > >
> > >> > > To answer your questions, while I digest the rest of that
> > >> information...
> > >> > >
> > >> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > >> > > https://github.com/healthonnet/hon-lucene-synonyms
> > >> > >
> > >> > > The config looks like this - and IIRC, is simply a copy from the
> > >> > > recommended config on the site mentioned above.
> > >> > >
> > >> > > <queryParser name="synonym_edismax"
> > >> > >     class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > >> > >   <lst name="synonymAnalyzers">
> > >> > >     <lst name="myCoolAnalyzer">
> > >> > >       <lst name="tokenizer">
> > >> > >         <str name="class">solr.PatternTokenizerFactory</str>
> > >> > >         <str name="pattern"></str>
> > >> > >       </lst>
> > >> > >       <lst name="filter">
> > >> > >         <str name="class">solr.ShingleFilterFactory</str>
> > >> > >         <str name="outputUnigramsIfNoShingles">true</str>
> > >> > >         <str name="outputUnigrams">true</str>
> > >> > >         <str name="minShingleSize">2</str>
> > >> > >         <str name="maxShingleSize">4</str>
> > >> > >       </lst>
> > >> > >       <lst name="filter">
> > >> > >         <str name="class">solr.SynonymFilterFactory</str>
> > >> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> > >> > >         <str name="synonyms">example_synonym_file.txt</str>
> > >> > >         <str name="expand">true</str>
> > >> > >         <str name="ignoreCase">true</str>
> > >> > >       </lst>
> > >> > >     </lst>
> > >> > >   </lst>
> > >> > > </queryParser>
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
> > >> > hossman_luc...@fucit.org
> > >> > > > wrote:
> > >> > >
> > >> > >>
> > >> > >> : First let me say that this is very possibly the "x - y problem"
> > so
> > >> let
> > >> > >> me
> > >> > >> : state up front what my ultimate need is -- then I'll ask about
> > the
> > >> > >> thing I
> > >> > >> : imagine might help...  which, of course, is heavily biased in
> the
> > >> > >> direction
> > >> > >> : of my experience coding Java and writing SQL...
> > >> > >>
> > >> > >> Thank you so much for asking your question this way!
> > >> > >>
> > >> > >> Right off the bat, the background you've provided seems
> > suspicious...
> > >> > >>
> > >> > >> : I have a piece of a query that calculates a score based on a
> > >> > "weighting"
> > >> > >> ...
> > >> > >> : The specific line is this:
> > >> > >> : product(field(category_weight),20)
> > >> > >> :
> > >> > >> : What I just realized is that when I query Solr for a string
> that
> > >> has
> > >> > NO
> >> > >> : matches in the entire corpus, I still get a slew of results because
> >> > >> EVERY
> >> > >> : doc has the weighting value in the category_weight field - and
> >> > >> : therefore every doc gets some score.

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread Erick Erickson
Maybe rerankqparserplugin?

On Aug 12, 2016 11:54, "John Bickerstaff"  wrote:

> @Hossman --  thanks again.
>
> I've made the following change and so far things look good.  I couldn't see
> debug or find results for what I put in for $func, so I just removed it,
> but making modifications as you suggested appears to be working.
>
> Including the actual line from my endpoint XML in case this thread helps
> someone else...
>
> {!boost defType=synonym_edismax qf='title' synonyms='true'
> synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> v=$q}
>
> On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > Thanks!  I'll check it out.
> >
> > On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar 
> > wrote:
> >
> >> Not exactly sure what you are looking for from chaining the results, but
> >> similar functionality is available in Streaming expressions, where the
> >> result of inner expressions is passed to outer expressions and so on
> >> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> >>
> >> HTH
> >> Susheel
> >>
> >> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
> >> j...@johnbickerstaff.com>
> >> wrote:
> >>
> >> > Hossman - many thanks again for your comprehensive and very helpful
> >> answer!
> >> >
> >> > All,
> >> >
> >> > I am (possibly mis-remembering) reading something about being able to
> >> pass
> >> > the results of one query to another query...  Essentially "chaining"
> >> result
> >> > sets.
> >> >
> >> > I have looked in docs and can't find anything on a quick search -- I
> may
> >> > have been reading about the Re-Ranking feature, which doesn't help me
> (I
> >> > know because I just tried and it seems to return all results anyway,
> >> just
> >> > re-ranking the number specified in the reRankDocs flag...)
> >> >
> >> > Is there a way to (cleanly) send the results of one query to another
> >> query
> >> > for further processing?  Essentially, pass ONLY the results (including
> >> an
> >> > empty set of results) to another query for processing?
> >> >
> >> > thanks...
> >> >
> >> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> >> > j...@johnbickerstaff.com>
> >> > wrote:
> >> >
> >> > > Thanks!
> >> > >
> >> > > To answer your questions, while I digest the rest of that
> >> information...
> >> > >
> >> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> >> > > https://github.com/healthonnet/hon-lucene-synonyms
> >> > >
> >> > > The config looks like this - and IIRC, is simply a copy from the
> >> > > recommended config on the site mentioned above.
> >> > >
> >> > > <queryParser name="synonym_edismax"
> >> > >     class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >> > >   <lst name="synonymAnalyzers">
> >> > >     <lst name="myCoolAnalyzer">
> >> > >       <lst name="tokenizer">
> >> > >         <str name="class">solr.PatternTokenizerFactory</str>
> >> > >         <str name="pattern"></str>
> >> > >       </lst>
> >> > >       <lst name="filter">
> >> > >         <str name="class">solr.ShingleFilterFactory</str>
> >> > >         <str name="outputUnigramsIfNoShingles">true</str>
> >> > >         <str name="outputUnigrams">true</str>
> >> > >         <str name="minShingleSize">2</str>
> >> > >         <str name="maxShingleSize">4</str>
> >> > >       </lst>
> >> > >       <lst name="filter">
> >> > >         <str name="class">solr.SynonymFilterFactory</str>
> >> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> >> > >         <str name="synonyms">example_synonym_file.txt</str>
> >> > >         <str name="expand">true</str>
> >> > >         <str name="ignoreCase">true</str>
> >> > >       </lst>
> >> > >     </lst>
> >> > >   </lst>
> >> > > </queryParser>
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
> >> > hossman_luc...@fucit.org
> >> > > > wrote:
> >> > >
> >> > >>
> >> > >> : First let me say that this is very possibly the "x - y problem"
> so
> >> let
> >> > >> me
> >> > >> : state up front what my ultimate need is -- then I'll ask about
> the
> >> > >> thing I
> >> > >> : imagine might help...  which, of course, is heavily biased in the
> >> > >> direction
> >> > >> : of my experience coding Java and writing SQL...
> >> > >>
> >> > >> Thank you so much for asking your question this way!
> >> > >>
> >> > >> Right off the bat, the background you've provided seems
> suspicious...
> >> > >>
> >> > >> : I have a piece of a query that calculates a score based on a
> >> > "weighting"
> >> > >> ...
> >> > >> : The specific line is this:
> >> > >> : product(field(category_weight),20)
> >> > >> :
> >> > >> : What I just realized is that when I query Solr for a string that
> >> has
> >> > NO
> >> > >> : matches in the entire corpus, I still get a slew of results
> because
> >> > >> EVERY
> >> > >> : doc has the weighting value in the category_weight field - and
> >> > therefore
> >> > >> : every doc gets some score.
> >> > >>
> >> > >> ...that is *NOT* how dismax and edismax normally work.
> >> > >>
> >> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
> >> > >> implementation of that "additive boost" comes from adding new optional
> >> > >> clauses to the top level BooleanQuery that is executed, that only happens
> >> > >> after the "main" query (from your "q" param) is added to that top
> >> > >> level BooleanQuery as a "mandatory" clause.

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
@Hossman --  thanks again.

I've made the following change and so far things look good.  I couldn't see
debug or find results for what I put in for $func, so I just removed it,
but making modifications as you suggested appears to be working.

Including the actual line from my endpoint XML in case this thread helps
someone else...

{!boost defType=synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}
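
For archive readers, a hedged sketch of sending that same boost-wrapped query
from SolrJ. The $uq parameter name and the plain SolrQuery usage are
illustrative assumptions; the thread itself configures this line inside an
endpoint's XML, where v=$q dereferences the request's own q parameter.

    import org.apache.solr.client.solrj.SolrQuery;

    public class BoostWrapperQuery {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery();
            // The multiplicative {!boost} wrapper means documents can only
            // match via the wrapped synonym_edismax query, so a querystring
            // with no hits returns zero results instead of function-score
            // noise on every document.
            q.set("q", "{!boost defType=synonym_edismax qf='title' synonyms='true'"
                    + " synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1'"
                    + " bf='' bq='' v=$uq}");
            q.set("uq", "some user querystring");  // placeholder input
            System.out.println(q);
        }
    }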

On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff  wrote:

> Thanks!  I'll check it out.
>
> On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar 
> wrote:
>
>> Not exactly sure what you are looking for from chaining the results, but
>> similar functionality is available in Streaming expressions, where the
>> result of inner expressions is passed to outer expressions and so on
>> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>>
>> HTH
>> Susheel
>>
>> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > Hossman - many thanks again for your comprehensive and very helpful
>> answer!
>> >
>> > All,
>> >
>> > I am (possibly mis-remembering) reading something about being able to
>> pass
>> > the results of one query to another query...  Essentially "chaining"
>> result
>> > sets.
>> >
>> > I have looked in docs and can't find anything on a quick search -- I may
>> > have been reading about the Re-Ranking feature, which doesn't help me (I
>> > know because I just tried and it seems to return all results anyway,
>> just
>> > re-ranking the number specified in the reRankDocs flag...)
>> >
>> > Is there a way to (cleanly) send the results of one query to another
>> query
>> > for further processing?  Essentially, pass ONLY the results (including
>> an
>> > empty set of results) to another query for processing?
>> >
>> > thanks...
>> >
>> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
>> > j...@johnbickerstaff.com>
>> > wrote:
>> >
>> > > Thanks!
>> > >
>> > > To answer your questions, while I digest the rest of that
>> information...
>> > >
>> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
>> > > https://github.com/healthonnet/hon-lucene-synonyms
>> > >
>> > > The config looks like this - and IIRC, is simply a copy from the
>> > > recommended config on the site mentioned above.
>> > >
>> > > <queryParser name="synonym_edismax"
>> > >     class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>> > >   <lst name="synonymAnalyzers">
>> > >     <lst name="myCoolAnalyzer">
>> > >       <lst name="tokenizer">
>> > >         <str name="class">solr.PatternTokenizerFactory</str>
>> > >         <str name="pattern"></str>
>> > >       </lst>
>> > >       <lst name="filter">
>> > >         <str name="class">solr.ShingleFilterFactory</str>
>> > >         <str name="outputUnigramsIfNoShingles">true</str>
>> > >         <str name="outputUnigrams">true</str>
>> > >         <str name="minShingleSize">2</str>
>> > >         <str name="maxShingleSize">4</str>
>> > >       </lst>
>> > >       <lst name="filter">
>> > >         <str name="class">solr.SynonymFilterFactory</str>
>> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
>> > >         <str name="synonyms">example_synonym_file.txt</str>
>> > >         <str name="expand">true</str>
>> > >         <str name="ignoreCase">true</str>
>> > >       </lst>
>> > >     </lst>
>> > >   </lst>
>> > > </queryParser>
>> > >
>> > >
>> > >
>> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
>> > hossman_luc...@fucit.org
>> > > > wrote:
>> > >
>> > >>
>> > >> : First let me say that this is very possibly the "x - y problem" so
>> let
>> > >> me
>> > >> : state up front what my ultimate need is -- then I'll ask about the
>> > >> thing I
>> > >> : imagine might help...  which, of course, is heavily biased in the
>> > >> direction
>> > >> : of my experience coding Java and writing SQL...
>> > >>
>> > >> Thank you so much for asking your question this way!
>> > >>
>> > >> Right off the bat, the background you've provided seems suspicious...
>> > >>
>> > >> : I have a piece of a query that calculates a score based on a
>> > "weighting"
>> > >> ...
>> > >> : The specific line is this:
>> > >> : product(field(category_weight),20)
>> > >> :
>> > >> : What I just realized is that when I query Solr for a string that
>> has
>> > NO
>> > >> : matches in the entire corpus, I still get a slew of results because
>> > >> EVERY
>> > >> : doc has the weighting value in the category_weight field - and
>> > therefore
>> > >> : every doc gets some score.
>> > >>
>> > >> ...that is *NOT* how dismax and edismax normally work.
>> > >>
>> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
>> > >> implementation of that "additive boost" comes from adding new optional
>> > >> clauses to the top level BooleanQuery that is executed, that only happens
>> > >> after the "main" query (from your "q" param) is added to that top level
>> > >> BooleanQuery as a "mandatory" clause.
>> > >>
>> > >> So, for example, "bf=true()" and "bq=*:*" should match & boost every
>> > doc,
>> > >> but with the techproducts configs/data these requests still don't match
>> > >> anything...
>> > >>
>> > >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
>> > >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
>> > >>
>> > >> ...and if you look at the debug output, the parsed queries show that
>> > >> the "bogus" part of the query is mandatory...

Re: Effects of insert order on query performance

2016-08-12 Thread Jeff Wartes
Thanks Emir. I’m unfortunately already using a routing key that needs to be at 
the top level, since I’m collapsing on that field. 

Adding a sub-key won’t help much if my theory is correct, as even a single 
shard (distrib=false) showed serious performance degradation, and query latency 
is the max(shard latency). I’d need a routing scheme that assured that a given 
shard has *only* A’s, or *only* B’s.

Even if I could use “permissions” as the top-level routing key though, this is 
a very low cardinality field, so I’d expect to end up with very large 
differences between the sizes of the shards in that case. That’s fine from a 
SolrCloud query perspective of course, but it makes for more difficult resource 
provisioning.


On 8/12/16, 1:39 AM, "Emir Arnautovic"  wrote:

Hi Jeff,

I will not comment on your theory (I'll leave that to people more familiar 
with the Lucene code) but will point to one alternative solution: routing. 
You can use routing to split documents with different permissions to 
different shards, and use composite hash routing to split "A" (and maybe 
"B" as well) documents across multiple shards. That makes sure all docs 
with the same permission are on the same shard; at query time only those 
shards will be queried (fewer shards to query) and there is no need to 
include the terms query or filter query at all.

Here is a blog post explaining the benefits of composite hash routing: 
https://sematext.com/blog/2015/09/29/solrcloud-large-tenants-and-routing/

Regards,
Emir

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
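
To make the routing idea concrete, here is a rough SolrJ sketch of
composite-ID routing with the default compositeId router. The ids, field
names, collection, zkHost, and the choice of 2 bits for spreading the
dominant "A" tenant are illustrative assumptions, not taken from the thread.

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeRoutingExample {
        public static void main(String[] args) throws Exception {
            CloudSolrClient client = new CloudSolrClient("localhost:2181");
            client.setDefaultCollection("mycollection");

            // "A/2!" hashes on shard key "A" but keeps only 2 bits of that
            // hash, spreading the big "A" tenant over several shards instead
            // of concentrating it on one.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "A/2!doc42");
            doc.addField("permissions", "A");
            client.add(doc);
            client.commit();

            // At query time, adding _route_=A/2! restricts the request to
            // just the shards that can hold "A" documents.
            client.close();
        }
    }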

On 11.08.2016 19:39, Jeff Wartes wrote:
> This isn’t really a question, although some validation would be nice. 
It’s more of a warning.
>
> Tldr is that the insert order of documents in my collection appears to 
have had a huge effect on my query speed.
>
>
> I have a very large (sharded) SolrCloud 5.4 index. One aspect of this 
index is a multi-valued field (“permissions”) that for 90% of docs contains one 
particular value, (“A”) and for 10% of docs contains another distinct value. 
(“B”) It’s intended to represent something like permissions, so more values are 
possible in the future, but not present currently. In fact, the addition of 
docs with value B to this index was very recent, previously all docs had value 
“A”. All queries, in addition to various other Boolean-query type restrictions, 
have a terms query on this field, like {!terms f=permissions v=A} or {!terms 
f=permissions v=A,B}
>
> Last week, I tried to re-index the whole collection from scratch, using 
source data. Query performance on the resulting re-index proved to be abysmal, 
I could get barely 10% of my previous query throughput, and even that was at 
latencies that were orders of magnitude higher than what I had in production.
>
> I hooked up some CPU profiling to a server that had shards from both the 
old and new version of the collection, and eventually it looked like the 
significant difference in processing the two collections was coming from 
ConstantWeight.scorer()
> Specifically, this line
> 
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/solr/core/src/java/org/apache/solr/search/SolrConstantScoreQuery.java#L102
> was far more expensive in my re-indexed collection. From there, the call 
chain goes through an LRUQueryCache, down to a BulkScorer, and ends up with the 
extra work happening here:
> 
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/lucene/core/src/java/org/apache/lucene/search/Weight.java#L169
>
> I don’t pretend to understand all that code, but the difference in my 
re-index appears to have something to do either with that cache, or with the 
aggregate docIdSets that need weights generated being simply much bigger in my 
re-index.
>
>
> But the queries didn’t change, and the data is basically the same, what 
else could have changed?
>
> The documents with the “B” distinct value were added recently to the 
high-performance collection, but the A’s and the B’s were all mixed up in the 
source data dump I used to re-index. On a hunch, I manually ordered the docs 
such that the A’s were all first and re-indexed again, and performance is great!
>
> Here’s my theory: Using TieredMergePolicy, the vast quantity of the 
documents in an index are contained in the largest segments. I’m guessing 
there’s an optimization somewhere that says something like “This segment only 
has A’s”. By indexing all the A’s first, those biggest segments only contain 
A’s, and only the smallest, newest segments are unable to make use of that 
optimization.
>
> Here’s the scary part: Although my re-index is now performing well, if 
this theory is right, some random insert (or a 

RE: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Alexandre Drouin
Thanks for the offer however I think I have a different issue.  I reinstalled 
my ZK and Solr servers between each test so I didn't have any unwanted files. 


Alexandre Drouin

-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com] 
Sent: August 12, 2016 1:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Getting "collection already exists" when creating collection in 
admin UI
Importance: High

I don't know if this helps, but I had trouble creating collections due to a 
number of issues and I think I got this error (I was using the command line, 
not the UI)

As I recall, if it exists in Zookeeper, it will error out.  It was a while ago, 
but I think the way I had to solve it was to go into Zookeeper and delete the 
"node".

This was easier for me because I was using "chroot" in Zookeeper such that each 
collection was separate - so all I had to do was delete the entire node and 
start over.

Take me with a grain of salt - it was a while ago.

If you want, I have linux command lines for most / all of this... let me know.

On Fri, Aug 12, 2016 at 11:10 AM, Alexandre Drouin < 
alexandre.dro...@orckestra.com> wrote:

> Hi Esther-Melaine,
>
> The collection exists in Zookeeper under the /collections node and I 
> can see the shardX_replicaX folders under $SOLR_HOME/server/solr of 
> both servers.
>
> I was not able to replicate the issue using the collection API.  Here 
> are the logs where I added the 'MyNewerNode' 
> https://gist.github.com/orck-
> adrouin/4d074cbb60141cba90c0aae9c55360d4
>
> I took a closer look at the admin UI and here are my findings:
>   - In Chrome's devtools I can see the first create request
>   - After 10 seconds the request gets aborted and a second create 
> request is sent to the server
>   - In Fiddler I can see that the first request completes successfully 
> without any issues.  The second request is sent a few seconds before 
> the first one ends, so it looks like an admin UI issue.
>
> Is it possible that the admin UI has some kind of TTL for requests set 
> to
> 10 seconds?
>
> You mentioned something about the nodes going into recovery.  Any idea 
> how I can fix this issue?
>
> My development environment (if it makes a difference):
>   - OS: Windows
>   - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the 
> same server using different ports.
>   - Zookeeper 3.4.8
>
> Alexandre Drouin
>
>
> -Original Message-
> From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com]
> Sent: August 12, 2016 10:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Getting "collection already exists" when creating 
> collection in admin UI
> Importance: High
>
> Hi Alexandre,
>
> The question here is why the create action is called twice. You’re 
> getting that “collection already exists” error after the second action is 
> called.
> Can you verify if MyNewNode exists in /collections in ZK or on the 
> machines running Solr at $SOLR_HOME/server/solr/? Your logs show a lot 
> of issues around the overseer and it looks like those nodes are going 
> into recovery pretty frequently. Can you replicate this issue by 
> creating a collection through the API (not through the UI):
>
> http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig
>
> Thanks,
> Esther
>
>
> > On Aug 12, 2016, at 10:05 AM, Alexandre Drouin <
> alexandre.dro...@orckestra.com> wrote:
> >
> > Hello,
> >
> > I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic 
> > auth)
> and with one Zookeeper node (for development purposes) and when I try 
> to create a new collection in the admin UI with 'replicationFactor=2' 
> I get a "Connection to Solr lost" message and another message telling me 
> "collection already exists: MyNewNode".  I made sure that a collection 
> with the same name does not exist and the issue does not appear with 
> a replication factor of 1.
> >
> > While debugging I saw that the create action is called twice with 
> > the following parameters:
> > /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> >
> > Can anyone replicate this issue?  I have not found it in JIRA.
> >
> >
> > Below is the relevant log (if useful) and I posted the full logs 
> > here
> > https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb377
> > 1
> >
> > 63549 ERROR 
> > (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local
> > :8444_solr)
> [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode
> operation: create failed:org.apache.solr.common.SolrException: 
> collection already exists: MyNewNode
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandl
> er.createCollection(OverseerCollectionMessageHandler.java:1832)
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandl
> er.processMessage(OverseerCollectionMessageHandler.java:224)
> >   at 

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks!  I'll check it out.

On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar 
wrote:

> Not exactly sure what you are looking for from chaining the results, but similar
> functionality is available in Streaming expressions, where the result of inner
> expressions is passed to outer expressions and so on
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> HTH
> Susheel
>
> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Hossman - many thanks again for your comprehensive and very helpful
> answer!
> >
> > All,
> >
> > I am (possibly mis-remembering) reading something about being able to
> pass
> > the results of one query to another query...  Essentially "chaining"
> result
> > sets.
> >
> > I have looked in docs and can't find anything on a quick search -- I may
> > have been reading about the Re-Ranking feature, which doesn't help me (I
> > know because I just tried and it seems to return all results anyway, just
> > re-ranking the number specified in the reRankDocs flag...)
> >
> > Is there a way to (cleanly) send the results of one query to another
> query
> > for further processing?  Essentially, pass ONLY the results (including an
> > empty set of results) to another query for processing?
> >
> > thanks...
> >
> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > Thanks!
> > >
> > > To answer your questions, while I digest the rest of that
> information...
> > >
> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > > https://github.com/healthonnet/hon-lucene-synonyms
> > >
> > > The config looks like this - and IIRC, is simply a copy from the
> > > recommended config on the site mentioned above.
> > >
> > > <queryParser name="synonym_edismax"
> > >     class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > >   <lst name="synonymAnalyzers">
> > >     <lst name="myCoolAnalyzer">
> > >       <lst name="tokenizer">
> > >         <str name="class">solr.PatternTokenizerFactory</str>
> > >         <str name="pattern"></str>
> > >       </lst>
> > >       <lst name="filter">
> > >         <str name="class">solr.ShingleFilterFactory</str>
> > >         <str name="outputUnigramsIfNoShingles">true</str>
> > >         <str name="outputUnigrams">true</str>
> > >         <str name="minShingleSize">2</str>
> > >         <str name="maxShingleSize">4</str>
> > >       </lst>
> > >       <lst name="filter">
> > >         <str name="class">solr.SynonymFilterFactory</str>
> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> > >         <str name="synonyms">example_synonym_file.txt</str>
> > >         <str name="expand">true</str>
> > >         <str name="ignoreCase">true</str>
> > >       </lst>
> > >     </lst>
> > >   </lst>
> > > </queryParser>
> > >
> > >
> > >
> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
> > hossman_luc...@fucit.org
> > > > wrote:
> > >
> > >>
> > >> : First let me say that this is very possibly the "x - y problem" so
> let
> > >> me
> > >> : state up front what my ultimate need is -- then I'll ask about the
> > >> thing I
> > >> : imagine might help...  which, of course, is heavily biased in the
> > >> direction
> > >> : of my experience coding Java and writing SQL...
> > >>
> > >> Thank you so much for asking your question this way!
> > >>
> > >> Right off the bat, the background you've provided seems suspicious...
> > >>
> > >> : I have a piece of a query that calculates a score based on a
> > "weighting"
> > >> ...
> > >> : The specific line is this:
> > >> : product(field(category_weight),20)
> > >> :
> > >> : What I just realized is that when I query Solr for a string that has
> > NO
> > >> : matches in the entire corpus, I still get a slew of results because
> > >> EVERY
> > >> : doc has the weighting value in the category_weight field - and
> > therefore
> > >> : every doc gets some score.
> > >>
> > >> ...that is *NOT* how dismax and edismax normally work.
> > >>
> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
> > >> implementation of that "additive boost" comes from adding new optional
> > >> clauses to the top level BooleanQuery that is executed, that only happens
> > >> after the "main" query (from your "q" param) is added to that top level
> > >> BooleanQuery as a "mandatory" clause.
> > >>
> > >> So, for example, "bf=true()" and "bq=*:*" should match & boost every
> > doc,
> > >> but with the techproducts configs/data these requests still don't match
> > >> anything...
> > >>
> > >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> > >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
> > >>
> > >> ...and if you look at the debug output, the parsed queries show that the
> > >> "bogus" part of the query is mandatory...
> > >>
> > >> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
> > >> FunctionQuery(const(true))
> > >>
> > >> (i didn't use "pf" in that example, but the effect is the same, the
> "pf"
> > >> based clauses are optional, while the "qf" based clauses are
> mandatory)
> > >>
> > >> If you compare that example to your debug output, you'll notice a
> > >> difference in structure -- it's a bit hard to see in your example, but
> > if
> > >> you simplify your qf, pf, and q fields it should be more obvious, but
> > >> AFAICT the "main" parts of your query are getting wrapped in an extra
> > >> layer of parents (ie: an extra BooleanQuery) which is *not* mandatory in
> > >> the top level query ...

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread Susheel Kumar
Not exactly sure what you are looking for from chaining the results, but similar
functionality is available in Streaming expressions, where the result of inner
expressions is passed to outer expressions and so on
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

HTH
Susheel
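
As a small illustration of such chaining, a hedged SolrJ sketch of a nested
Streaming Expression: the inner search() produces tuples that the outer
unique() consumes. The collection, fields, and the unique() wrapper are
hypothetical, and note that the params type here is the SolrParams form used
by newer 6.x SolrJ (early 6.x SolrStream constructors took a Map instead).

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class NestedExpressionExample {
        public static void main(String[] args) throws Exception {
            String expr = "unique(search(books, q=\"*:*\", fl=\"id,author_s\","
                        + " sort=\"author_s asc\"), over=\"author_s\")";

            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("expr", expr);
            params.set("qt", "/stream");  // route to the stream handler

            SolrStream stream =
                new SolrStream("http://localhost:8983/solr/books", params);
            try {
                stream.open();
                while (true) {
                    Tuple t = stream.read();
                    if (t.EOF) break;  // final tuple marks end of stream
                    System.out.println(t.getString("author_s"));
                }
            } finally {
                stream.close();
            }
        }
    }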

On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff 
wrote:

> Hossman - many thanks again for your comprehensive and very helpful answer!
>
> All,
>
> I am (possibly mis-remembering) reading something about being able to pass
> the results of one query to another query...  Essentially "chaining" result
> sets.
>
> I have looked in docs and can't find anything on a quick search -- I may
> have been reading about the Re-Ranking feature, which doesn't help me (I
> know because I just tried and it seems to return all results anyway, just
> re-ranking the number specified in the reRankDocs flag...)
>
> Is there a way to (cleanly) send the results of one query to another query
> for further processing?  Essentially, pass ONLY the results (including an
> empty set of results) to another query for processing?
>
> thanks...
>
> On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Thanks!
> >
> > To answer your questions, while I digest the rest of that information...
> >
> > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > https://github.com/healthonnet/hon-lucene-synonyms
> >
> > The config looks like this - and IIRC, is simply a copy from the
> > recommended config on the site mentioned above.
> >
> > <queryParser name="synonym_edismax"
> >     class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >   <lst name="synonymAnalyzers">
> >     <lst name="myCoolAnalyzer">
> >       <lst name="tokenizer">
> >         <str name="class">solr.PatternTokenizerFactory</str>
> >         <str name="pattern"></str>
> >       </lst>
> >       <lst name="filter">
> >         <str name="class">solr.ShingleFilterFactory</str>
> >         <str name="outputUnigramsIfNoShingles">true</str>
> >         <str name="outputUnigrams">true</str>
> >         <str name="minShingleSize">2</str>
> >         <str name="maxShingleSize">4</str>
> >       </lst>
> >       <lst name="filter">
> >         <str name="class">solr.SynonymFilterFactory</str>
> >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> >         <str name="synonyms">example_synonym_file.txt</str>
> >         <str name="expand">true</str>
> >         <str name="ignoreCase">true</str>
> >       </lst>
> >     </lst>
> >   </lst>
> > </queryParser>
> >
> >
> >
> > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
> hossman_luc...@fucit.org
> > > wrote:
> >
> >>
> >> : First let me say that this is very possibly the "x - y problem" so let
> >> me
> >> : state up front what my ultimate need is -- then I'll ask about the
> >> thing I
> >> : imagine might help...  which, of course, is heavily biased in the
> >> direction
> >> : of my experience coding Java and writing SQL...
> >>
> >> Thank you so much for asking your question this way!
> >>
> >> Right off the bat, the background you've provided seems suspicious...
> >>
> >> : I have a piece of a query that calculates a score based on a
> "weighting"
> >> ...
> >> : The specific line is this:
> >> : product(field(category_weight),20)
> >> :
> >> : What I just realized is that when I query Solr for a string that has
> NO
> >> : matches in the entire corpus, I still get a slew of results because
> >> EVERY
> >> : doc has the weighting value in the category_weight field - and
> therefore
> >> : every doc gets some score.
> >>
> >> ...that is *NOT* how dismax and edismax normally work.
> >>
> >> While both the "bf" and "bq" params result in "additive" boosting, and the
> >> implementation of that "additive boost" comes from adding new optional
> >> clauses to the top level BooleanQuery that is executed, that only happens
> >> after the "main" query (from your "q" param) is added to that top level
> >> BooleanQuery as a "mandatory" clause.
> >>
> >> So, for example, "bf=true()" and "bq=*:*" should match & boost every
> doc,
> >> but with the techproducts configs/data these requests still don't match
> >> anything...
> >>
> >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
> >>
> >> ...and if you look at the debug output, the parsed queries show that the
> >> "bogus" part of the query is mandatory...
> >>
> >> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
> >> FunctionQuery(const(true))
> >>
> >> (i didn't use "pf" in that example, but the effect is the same, the "pf"
> >> based clauses are optional, while the "qf" based clauses are mandatory)
> >>
> >> If you compare that example to your debug output, you'll notice a
> >> difference in structure -- it's a bit hard to see in your example, but
> if
> >> you simplify your qf, pf, and q fields it should be more obvious, but
> >> AFAICT the "main" parts of your query are getting wrapped in an extra
> >> layer of parents (ie: an extra BooleanQuery) which is *not* mandatory in
> >> the top level query ... i don't see *any* mandatory clauses in your top
> >> level BooleanQuery, which is why any match on a bf or bq function is
> >> enough to cause a document to match.
> >>
> >> I suspect the reason your parsed query structure is so diff has to do
> with
> >> this...
> >>
> >> :synonym_edismax>
> >>
> >>
> >> 1) how exactly is 

Re: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread John Bickerstaff
I don't know if this helps, but I had trouble creating collections due to a
number of issues and I think I got this error (I was using the command
line, not the UI)

As I recall, if it exists in Zookeeper, it will error out.  It was a while
ago, but I think the way I had to solve it was to go into Zookeeper and
delete the "node".

This was easier for me because I was using "chroot" in Zookeeper such that
each collection was separate - so all I had to do was delete the entire
node and start over.

Take me with a grain of salt - it was a while ago.

If you want, I have linux command lines for most / all of this... let me
know.
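
For reference, a rough SolrJ sketch of the ZooKeeper cleanup described above.
It assumes direct access to the ensemble; the address, /solr chroot, and
collection name are placeholders. clean() deletes the znode recursively, so
treat this as a destructive last resort on a collection you want gone.

    import org.apache.solr.common.cloud.SolrZkClient;

    public class DeleteStaleCollectionNode {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper address with a /solr chroot.
            SolrZkClient zk = new SolrZkClient("localhost:2181/solr", 15000);
            try {
                String path = "/collections/MyNewNode";
                if (zk.exists(path, true)) {
                    zk.clean(path);  // recursively removes the node's subtree
                }
            } finally {
                zk.close();
            }
        }
    }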

On Fri, Aug 12, 2016 at 11:10 AM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Hi Esther-Melaine,
>
> The collection exists in Zookeeper under the /collections node and I can
> see the shardX_replicaX folders under $SOLR_HOME/server/solr of both
> servers.
>
> I was not able to replicate the issue using the collection API.  Here are
> the logs where I added the 'MyNewerNode' https://gist.github.com/orck-
> adrouin/4d074cbb60141cba90c0aae9c55360d4
>
> I took a closer look at the admin UI and here are my findings:
>   - In Chrome's devtools I can see the first create request
>   - After 10 seconds the request gets aborted and a second create
> request is sent to the server
>   - In Fiddler I can see that the first request completes successfully
> without any issues.  The second request is sent a few seconds before the
> first one ends, so it looks like an admin UI issue.
>
> Is it possible that the admin UI has some kind of TTL for requests set to
> 10 seconds?
>
> You mentioned something about the nodes going into recovery.  Any idea how
> I can fix this issue?
>
> My development environment (if it makes a difference):
>   - OS: Windows
>   - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the same
> server using different ports.
>   - Zookeeper 3.4.8
>
> Alexandre Drouin
>
>
> -Original Message-
> From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com]
> Sent: August 12, 2016 10:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Getting "collection already exists" when creating collection
> in admin UI
> Importance: High
>
> Hi Alexandre,
>
> The question here is why the create action is called twice. You’re getting
> that “collection already exists” error after the second action is called.
> Can you verify if MyNewNode exists in /collections in ZK or on the machines
> running Solr at $SOLR_HOME/server/solr/? Your logs show a lot of issues
> around the overseer and it looks like those nodes are going into recovery
> pretty frequently. Can you replicate this issue by creating a collection
> through the API (not through the UI):
>
> http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig
>
> Thanks,
> Esther
>
>
> > On Aug 12, 2016, at 10:05 AM, Alexandre Drouin <
> alexandre.dro...@orckestra.com> wrote:
> >
> > Hello,
> >
> > I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic auth)
> and with one Zookeeper node (for development purposes) and when I try to
> create a new collection in the admin UI with 'replicationFactor=2' I get a
> "Connection to Solr lost" message and another message telling me "
> collection already exists: MyNewNode".  I made sure that a collection with
> the same name does not exists and the issue does not appear with a
> replication factor of 1.
> >
> > While debugging I saw that the create action is called twice with the
> > following parameters:
> > /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> >
> > Can anyone replicate this issue?  I have not found it in JIRA.
> >
> >
> > Below is the relevant log (if useful) and I posted the full logs here
> > https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb3771
> >
> > 63549 ERROR 
> > (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local:8444_solr)
> [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode
> operation: create failed:org.apache.solr.common.SolrException: collection
> already exists: MyNewNode
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandl
> er.createCollection(OverseerCollectionMessageHandler.java:1832)
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandl
> er.processMessage(OverseerCollectionMessageHandler.java:224)
> >   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.
> run(OverseerTaskProcessor.java:463)
> >   at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> >   at java.lang.Thread.run(Thread.java:745)
> >
> > Thanks,
> > Alexandre Drouin
>
>
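
As a workaround sketch while the UI double-submit is open: the same CREATE
can be issued once from SolrJ instead of the admin UI. This assumes the
Solr 6.1-era setter-style Create request; the zkHost is a placeholder and
the names and counts mirror the URL Esther posted above.

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.client.solrj.response.CollectionAdminResponse;

    public class CreateCollectionOnce {
        public static void main(String[] args) throws Exception {
            CloudSolrClient client = new CloudSolrClient("localhost:2181");
            CollectionAdminRequest.Create create =
                new CollectionAdminRequest.Create()
                    .setCollectionName("MyNewerNode")
                    .setConfigName("DefaultConfig")
                    .setNumShards(1)
                    .setReplicationFactor(2)
                    .setMaxShardsPerNode(1);
            // One programmatic request, so there is no second aborted and
            // retried submit like the one the admin UI sends.
            CollectionAdminResponse rsp = create.process(client);
            System.out.println("success: " + rsp.isSuccess());
            client.close();
        }
    }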


ConcurrentModificationException due to high volume of requests

2016-08-12 Thread Katherine Mora
Hi All,

We are using Solr 5.2.1 in a production environment where we have a high volume 
of requests. We are not having any timeouts or performance issues. However, the 
logs are filled with this exception.
We've been researching and there isn't a lot of information about this problem 
or why it happens. We have tried to reproduce it by sending queries using 
JMeter and it only happens when multiple queries are sent at the same time.

Checking the code, we found that this exception is thrown within core Solr code, 
and part of the stack trace caught our attention because it is logging debug 
information even though we are using the INFO logging level 
(org.apache.solr.search.stats.LocalStatsCache.get(LocalStatsCache.java:40)).

Has anyone seen this exception before? Would it be OK to generate a patch? We 
were thinking about commenting out the debug line or adding try/catch statements.

Thank you!


null:java.util.ConcurrentModificationException
 at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
 at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:413)
 at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:412)
 at 
org.apache.solr.common.params.ModifiableSolrParams.toString(ModifiableSolrParams.java:201)
 at java.lang.String.valueOf(String.java:2849)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.solr.request.SolrQueryRequestBase.toString(SolrQueryRequestBase.java:165)
 at 
org.apache.solr.search.stats.LocalStatsCache.get(LocalStatsCache.java:40)
 at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:366)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at 
org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1129)
 at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:497)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Thread.java:745)


KATHERINE MORA
Senior Engineer
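
For archive readers: one sketch of what such a patch could look like,
assuming (based on the stack trace) that the debug call stringifies the live
request while another thread mutates its params. The method name and
surrounding class are hypothetical stand-ins, not the actual Solr code.

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.request.SolrQueryRequest;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class DefensiveLogging {
        private static final Logger log =
            LoggerFactory.getLogger(DefensiveLogging.class);

        // Hypothetical stand-in for the debug call in LocalStatsCache.get().
        static void logRequestParams(SolrQueryRequest req) {
            if (log.isDebugEnabled()) {
                // Copying the params gives toString() a private map to walk,
                // which narrows (though does not fully close) the race window
                // against concurrent writers; a complete fix would avoid
                // logging the live, mutable request object at all.
                log.debug("stats cache get: {}",
                          new ModifiableSolrParams(req.getParams()));
            }
        }
    }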



Re: Consume sql response using solrj

2016-08-12 Thread Pablo Anzorena
Thanks Joel, that works perfectly well. I checked some cases and the data
is consistent.
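
For readers of the archive, a minimal sketch of the SolrStream approach Joel
outlines below. The base URL, collection, and SQL statement are placeholders;
qt=/sql follows what the JDBC StatementImpl sends, and the SolrParams-based
constructor is the newer 6.x form (early 6.x took a Map).

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class SqlViaSolrStream {
        public static void main(String[] args) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("stmt",
                "select d1, count(*) from testcollection1 group by d1");
            params.set("aggregationMode", "facet");
            params.set("qt", "/sql");  // route to the SQL handler

            SolrStream stream = new SolrStream(
                "http://tywin:8983/solr/testcollection1", params);
            try {
                stream.open();
                while (true) {
                    Tuple tuple = stream.read();
                    if (tuple.EOF) break;  // {"EOF":true,...} ends the set
                    System.out.println(tuple.get("d1") + ": "
                                       + tuple.get("count(*)"));
                }
            } finally {
                stream.close();
            }
        }
    }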



2016-08-11 14:17 GMT-03:00 Joel Bernstein :

> Actually try this:
>
> select a from b where _query_='a:b'
>
> *This produces the query:*
>
> (_query_:"a:b")
>
> which should run.
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 11, 2016 at 1:04 PM, Joel Bernstein 
> wrote:
>
> > There are no test cases for this but you can try this syntax:
> >
> > select a from b where _query_=(a:c AND d:f)
> >
> > This should get translated to:
> >
> > _query_:(a:c AND d:f)
> >
> > This link describes the behavior of _query_ https://lucidworks.
> > com/blog/2009/03/31/nested-queries-in-solr/
> >
> > Just not positive how the SQL parser will treat the : in the query.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Aug 11, 2016 at 12:22 PM, Pablo Anzorena <
> anzorena.f...@gmail.com>
> > wrote:
> >
> >> Joel, one more thing.
> >>
> >> Is there any way to use the sql and the lucene query syntax? The thing is
> >> that my business application is tightly coupled with the lucene query
> >> syntax, so I need a way to use both the sql features (without the where
> >> clause) and the query syntax of lucene.
> >>
> >> Thanks.
> >>
> >> 2016-08-11 11:40 GMT-03:00 Pablo Anzorena :
> >>
> >> > Excellent!
> >> >
> >> > Thanks Joel
> >> >
> >> > 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
> >> >
> >> >> There are two ways to do this with SolrJ:
> >> >>
> >> >> 1) Use the JDBC driver.
> >> >>
> >> >> 2) Use the SolrStream to send the request and then read() the Tuples.
> >> This
> >> >> is what the JDBC driver does under the covers. The sample code can be
> >> >> found
> >> >> here:
> >> >> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
> >> >>
> >> >> The constructStream() method creates a SolrStream with the request.
> >> >>
> >> >> Joel Bernstein
> >> >> http://joelsolr.blogspot.com/
> >> >>
> >> >> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena <
> >> anzorena.f...@gmail.com
> >> >> >
> >> >> wrote:
> >> >>
> >> >> > Hey,
> >> >> >
> >> >> > I'm trying to get the response of solr via QueryResponse using
> >> >> > QueryResponse queryResponse = client.query(solrParams); (where
> client
> >> >> is a
> >> >> > CloudSolrClient)
> >> >> >
> >> >> > The error it throws is:
> >> >> >
> >> >> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> >> >> > Error from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
> >> >> > Expected mime type application/octet-stream but got text/plain.
> >> >> > {"result-set":{"docs":[
> >> >> > {"count(*)":5304,"d1":2},
> >> >> > {"count(*)":5160,"d1":1},
> >> >> > {"count(*)":5016,"d1":3},
> >> >> > {"count(*)":4893,"d1":4},
> >> >> > {"count(*)":4824,"d1":5},
> >> >> > {"EOF":true,"RESPONSE_TIME":11}]}}
> >> >> > at
> >> >> > org.apache.solr.client.solrj.impl.HttpSolrClient.
> >> >> > executeMethod(HttpSolrClient.java:558)
> >> >> >
> >> >> > Then I tried to implement a custom ResponseParser that overrides
> >> >> > getContentType() and returns "text/plain", but it returns another
> >> >> > error.
> >> >> >
> >> >> > So... Is there a way to get the SQL response via this method?
> >> >> >
> >> >> > I made it work via Connection and ResultSets, but I need to use the
> >> >> > other way (if possible).
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>


RE: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Alexandre Drouin
Hi Esther-Melaine,

The collection exists in Zookeeper under the /collections node and I can see 
the shardX_replicaX folders under $SOLR_HOME/server/solr of both servers.

I was not able to replicate the issue using the collection API.  Here are the 
logs where I added the 'MyNewerNode' 
https://gist.github.com/orck-adrouin/4d074cbb60141cba90c0aae9c55360d4

I took a closer look at the admin UI and here are my findings:
  - In Chrome's devtools I can see the first create request.
  - After 10 seconds the request gets aborted and a second create request is 
sent to the server.
  - In Fiddler I can see that the first request completes successfully without 
any issues.  The second request is sent a few seconds before the first one ends, 
so it looks like an admin UI issue.

Is it possible that the admin UI has some kind of TTL for requests set to 10 
seconds?

You mentioned something about the nodes going into recovery.  Any idea how I 
can fix this issue?  

My development environment (if it makes a difference):
  - OS: Windows
  - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the same server 
using different ports.
  - Zookeeper 3.4.8

Alexandre Drouin


-Original Message-
From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com] 
Sent: August 12, 2016 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Getting "collection already exists" when creating collection in 
admin UI
Importance: High

Hi Alexandre,

The question here is why the create action is called twice. You’re getting that 
“collection already exists” error after the second action is called. Can you 
verify whether MyNewNode exists under /collections in ZK, or on the machines running 
Solr at $SOLR_HOME/server/solr/? Your logs show a lot of issues around the 
overseer and it looks like those nodes are going into recovery pretty 
frequently. Can you replicate this issue by creating a collection through the 
API (not through the UI): 

http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig

Thanks,
Esther


> On Aug 12, 2016, at 10:05 AM, Alexandre Drouin 
>  wrote:
> 
> Hello,
> 
> I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic auth) and 
> with one Zookeeper node (for development purposes), and when I try to create a 
> new collection in the admin UI with 'replicationFactor=2' I get a 
> "Connection to Solr lost" message and another message telling me "collection 
> already exists: MyNewNode".  I made sure that a collection with the same name 
> does not exist, and the issue does not appear with a replication factor of 1. 
>  
> 
> While debugging I saw that the create action is called twice with the 
> following parameters: 
> /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> 
> Can anyone replicate this issue?  I have not found it in JIRA.
> 
> 
> Below is the relevant log (if useful) and I posted the full logs here 
> https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb3771
> 
> 63549 ERROR 
> (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local:8444_solr)
>  [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode 
> operation: create failed:org.apache.solr.common.SolrException: collection 
> already exists: MyNewNode
>   at 
> org.apache.solr.cloud.OverseerCollectionMessageHandler.createCollection(OverseerCollectionMessageHandler.java:1832)
>   at 
> org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:224)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:463)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 
> Thanks,
> Alexandre Drouin



Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Hossman - many thanks again for your comprehensive and very helpful answer!

All,

I remember (though possibly mis-remember) reading something about being able to pass
the results of one query to another query...  essentially "chaining" result
sets.

I have looked in docs and can't find anything on a quick search -- I may
have been reading about the Re-Ranking feature, which doesn't help me (I
know because I just tried and it seems to return all results anyway, just
re-ranking the number specified in the reRankDocs flag...)

Is there a way to (cleanly) send the results of one query to another query
for further processing?  Essentially, pass ONLY the results (including an
empty set of results) to another query for processing?

thanks...

On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff 
wrote:

> Thanks!
>
> To answer your questions, while I digest the rest of that information...
>
> I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> https://github.com/healthonnet/hon-lucene-synonyms
>
> The config looks like this - and IIRC, is simply a copy from the
> recommended cofig on the site mentioned above.
>
> [queryParser XML stripped by the list archiver; the surviving values show a
> solr.PatternTokenizerFactory tokenizer, a solr.ShingleFilterFactory
> configured with true, true, 2, 4, and a solr.SynonymFilterFactory configured
> with solr.KeywordTokenizerFactory, example_synonym_file.txt, true, true]
>
>
>
> On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter  > wrote:
>
>>
>> : First let me say that this is very possibly the "x - y problem" so let
>> me
>> : state up front what my ultimate need is -- then I'll ask about the
>> thing I
>> : imagine might help...  which, of course, is heavily biased in the
>> direction
>> : of my experience coding Java and writing SQL...
>>
>> Thank you so much for asking your question this way!
>>
>> Right off the bat, the background you've provided seems suspicious...
>>
>> : I have a piece of a query that calculates a score based on a "weighting"
>> ...
>> : The specific line is this:
>> : product(field(category_weight),20)
>> :
>> : What I just realized is that when I query Solr for a string that has NO
>> : matches in the entire corpus, I still get a slew of results because
>> EVERY
>> : doc has the weighting value in the category_weight field - and therefore
>> : every doc gets some score.
>>
>> ...that is *NOT* how dismax and edismax normally work.
>>
>> While both the "bf" and "bq" params result in "additive" boosting, and the
>> implementation of that "additive boost" comes from adding new optional
>> clauses to the top level BooleanQuery that is executed, that only happens
>> after the "main" query (from your "q" param) is added to that top level
>> BooleanQuery as a "mandatory" clause.
>>
>> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
>> but with the techproducts configs/data these requests still don't match
>> anything...
>>
>> /select?defType=edismax=bogus=true()=*:*=query
>> /select?defType=dismax=bogus=true()=*:*=query
>>
>> ...and if you look at the debug output, the parsed queries shows that the
>> "bogus" part of the query is mandatory...
>>
>> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
>> FunctionQuery(const(true))
>>
>> (i didn't use "pf" in that example, but the effect is the same, the "pf"
>> based clauses are optional, while the "qf" based clauses are mandatory)
>>
>> If you compare that example to your debug output, you'll notice a
>> difference in structure -- it's a bit hard to see in your example, but if
>> you simplify your qf, pf, and q fields it should be more obvious, but
>> AFAICT the "main" parts of your query are getting wrapped in an extra
>> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
>> the top level query ... i don't see *any* mandatory clauses in your top
>> level BooleanQuery, which is why any match on a bf or bq function is
>> enough to cause a document to match.
>>
>> I suspect the reason your parsed query structure is so diff has to do with
>> this...
>>
>> :synonym_edismax>
>>
>>
>> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
>> 2) what QParserPlugin are you using to implement that?
>>
>> I suspect whatever QParserPlugin you are using has a bug in it :)
>>
>>
>> If you can't fix the bug, one possibile workaround would be to abandon bf
>> and bq params completely, and instead wrap the query it produces in in a
>> {!boost} parser with whatever function you want (using functions like
>> sum() or prod() to combine multiple functions, and query() to incorporate
>> your current bq param).  Doing this will require changing how you specify
>> your input (example below) and it will result in *multiplicative* boosts --
>> so your 

Re: Need Help Resolving Unknown Shape Definition Error

2016-08-12 Thread Jennifer Coston

I figured out the solution to this and figured I would send it out in case
anyone else runs into this issue and stumbles across this thread. It turns out
that I was using an outdated version of the JTS jar. When I updated to version
1.14 it worked.
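
For anyone who lands here later, an equivalent SolrJ indexing call (a sketch;
only the two interesting fields shown, core name as in the thread) now works
with jts-1.14 on the classpath:

  HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/spaceknow");
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("observationId", "8e09f47f");
  // the WKT polygon that previously triggered "Unknown Shape definition"
  doc.addField("positionWkt", "POLYGON((-77.23 38.922, -77.23 38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))");
  client.add(doc);
  client.commit();
  client.close();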

Thanks,

Jennifer Coston



From:   Jennifer Coston 
To: solr-user@lucene.apache.org
Date:   08/12/2016 09:34 AM
Subject:Re: Need Help Resolving Unknown Shape Definition Error



Update: I get the same error when I try to use curl instead of the admin
console. I'm really stuck on this. Any help, tips, suggestions, etc. would
be greatly appreciated!

Curl Command:
curl -X POST -H 'Content-Type: application/json'
'http://localhost:8983/solr/spaceknow/update/json/docs' --data-binary
'{"observationId":"8e09f47f", "observationType":"spaceknow",
"startTime":"2015-09-19T21:03:51Z", "endTime":"2015-09-19T21:03:51Z",
"receiptTime":"2016-07-29T15:49:49.328Z", "locationLat":38.9225015078814,
"locationLon":-77.22900299194423,
"position":"38.9225015078814,-77.22900299194423", "positionWkt":"POLYGON
((-77.23 38.922, -77.23 38.923, -77.228 38.923, -77.228 38.922, -77.23
38.922))", "provider":"a"}'

Response:
curl -X POST -H 'Content-Type: application/json'
'http://localhost:8983/solr/spaceknow/update/json/docs' --data-binary
'{"observationId":"8e09f47f", "observationType":"spaceknow",
"startTime":"2015-09-19T21:03:51Z", "endTime":"2015-09-19T21:03:51Z",
"receiptTime":"2016-07-29T15:49:49.328Z", "locationLat":38.9225015078814,
"locationLon":-77.22900299194423,
"position":"38.9225015078814,-77.22900299194423", "positionWkt":"POLYGON
((-77.23 38.922, -77.23 38.923, -77.228 38.923, -77.228 38.922, -77.23
38.922))", "provider":"dg"}'

Full Schema.xml file:

[schema.xml contents stripped by the list archiver; the surviving uniqueKey is
observationId]

Thanks!
  Jennifer Coston




From: Jennifer Coston 
To: solr-user@lucene.apache.org
Date: 08/11/2016 04:04 PM
Subject: Need Help Resolving Unknown Shape Definition Error




Hello,

I am trying to setup a local solr core so that I can perform Spatial
searches on it. I am using version 5.2.1. I have updated my schema.xml file
to include the location-rpt fieldType:

[fieldType definition stripped by the list archiver]

And I have defined my field to use this type:

[field definition stripped by the list archiver]

I also added the jts-1.4.0.jar file to C:\solr-5.2.1\server\solr-webapp\webapp\WEB-INF\lib.

However when I try to add a document through the Solr Admin Console I am
seeing this response:

{
 "responseHeader": {
   "status": 400,
   "QTime": 6
 },
 "error": {
   "msg": "Unknown Shape definition [POLYGON((-77.23 38.922, -77.23
38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))]",
   "code": 400
 }
}

I can submit documents successfully if I remove the positionWkt field. Did
I miss a configuration step?

Here is the document I am trying to add:

{
   "observationId": "8e09f47f",
   "observationType": "image",
   "startTime": "2015-09-19T21:03:51Z",
   "endTime": "2015-09-19T21:03:51Z",
   "receiptTime": "2016-07-29T15:49:49.328Z",
   "locationLat": 38.9225015078814,
   "locationLon": -77.22900299194423,
   "position": "38.9225015078814,-77.22900299194423",
   "positionWkt": "POLYGON((-77.23 38.922, -77.23 38.923, -77.228
38.923, -77.228 38.922, -77.23 38.922))",
   "provider": "a"
}

Here are the fields I added to the schema.xml file (I started with the
template, please let me know if you need the whole thing):

[field definitions stripped by the list archiver; the uniqueKey is observationId]

Thank you!

Jennifer



Re: Solr 6: Use facet with Streaming Expressions- LeftOuterJoin

2016-08-12 Thread Joel Bernstein
What issue were you having with the rollup() function?
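
For reference, rollup() requires the underlying stream to be sorted on the
"over" fields; a sketch with placeholder collection and field names, run
through the /stream handler via SolrStream:

  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("qt", "/stream");
  params.set("expr",
      "rollup(search(collection1, q=\"*:*\", fl=\"fieldA,fieldB\", "
    + "sort=\"fieldA asc,fieldB asc\", qt=\"/export\"), "
    + "over=\"fieldA,fieldB\", count(*))");
  SolrStream stream = new SolrStream("http://localhost:8983/solr/collection1", params);

Note that over="fieldA,fieldB" groups on the combination of both fields, which
is different from two independent facet counts -- that may be what you are
running into.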

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 12, 2016 at 5:01 AM, vrindavda  wrote:

> Hey Joel,
>
> Thanks for your quick response, I was able to merge documents using
> outerHashJoin. But I am not able to use rollup() to get count(*) for
> multiple fields, as we get using facets.
>
> Please suggest if last option is to merge documents using atomic updates,
> and then use facets(or json.facet). Is there any other way to merge
> documents permanently ?
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-6-Use-facet-with-Streaming-Expressions-LeftOuterJoin-tp4290526p4291397.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Esther-Melaine Quansah
Hi Alexandre,

The question here is why the create action is called twice. You’re getting that 
“collection already exists” error after the second action is called. Can you 
verify if MyNewNode exists in /collections in ZK or on the machines running 
Solr at $SOLR_HOME/server/solr/
Your logs show a lot of issues around the overseer and it looks like those 
nodes are going into recovery pretty frequently. Can you replicate this issue 
by creating a collection through the API (not through the UI): 

http://localhost:8983/admin/collections?action=CREATE=MyNewerNode=1=2=1=DefaultConfig

Thanks,
Esther


> On Aug 12, 2016, at 10:05 AM, Alexandre Drouin 
>  wrote:
> 
> Hello,
> 
> I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic auth) and 
> with one Zookeeper node (for development purposes) and when I try to create a 
> new collection in the admin UI with 'replicationFactor=2' I get a  
> "Connection to Solr lost" message and another message telling me " collection 
> already exists: MyNewNode".  I made sure that a collection with the same name 
> does not exists and the issue does not appear with a replication factor of 1. 
>  
> 
> While debugging I saw that the create action is called twice with the 
> following parameters: 
> /solr/admin/collections?_=1471010473184=CREATE=DefaultConfig=1=aaa=1=2=compositeId=compositeId=json
> 
> Can anyone replicate this issue?  I have not found it in JIRA.
> 
> 
> Below is the relevant log (if useful) and I posted the full logs here 
> https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb3771
> 
> 63549 ERROR 
> (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local:8444_solr)
>  [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode 
> operation: create failed:org.apache.solr.common.SolrException: collection 
> already exists: MyNewNode
>   at 
> org.apache.solr.cloud.OverseerCollectionMessageHandler.createCollection(OverseerCollectionMessageHandler.java:1832)
>   at 
> org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:224)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:463)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 
> Thanks,
> Alexandre Drouin



Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread Alexandre Drouin
Hello,

I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic auth) and with 
one Zookeeper node (for development purposes) and when I try to create a new 
collection in the admin UI with 'replicationFactor=2' I get a  "Connection to 
Solr lost" message and another message telling me " collection already exists: 
MyNewNode".  I made sure that a collection with the same name does not exists 
and the issue does not appear with a replication factor of 1.  

While debugging I saw that the create action is called twice with the following 
parameters: 
/solr/admin/collections?_=1471010473184=CREATE=DefaultConfig=1=aaa=1=2=compositeId=compositeId=json

Can anyone replicate this issue?  I have not found it in JIRA.


Below is the relevant log (if useful) and I posted the full logs here 
https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb3771

63549 ERROR 
(OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local:8444_solr) 
[   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode operation: 
create failed:org.apache.solr.common.SolrException: collection already exists: 
MyNewNode
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.createCollection(OverseerCollectionMessageHandler.java:1832)
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:224)
at 
org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:463)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks,
Alexandre Drouin


Re: Need Help Resolving Unknown Shape Definition Error

2016-08-12 Thread Jennifer Coston

Update: I get the same error when I try to use curl instead of the admin
console. I'm really stuck on this. Any help, tips, suggestions, etc. would
be greatly appreciated!

Curl Command:
curl -X POST -H 'Content-Type: application/json'
'http://localhost:8983/solr/spaceknow/update/json/docs' --data-binary
'{"observationId":"8e09f47f", "observationType":"spaceknow",
"startTime":"2015-09-19T21:03:51Z", "endTime":"2015-09-19T21:03:51Z",
"receiptTime":"2016-07-29T15:49:49.328Z", "locationLat":38.9225015078814,
"locationLon":-77.22900299194423,
"position":"38.9225015078814,-77.22900299194423", "positionWkt":"POLYGON
((-77.23 38.922, -77.23 38.923, -77.228 38.923, -77.228 38.922, -77.23
38.922))", "provider":"a"}'

Response:
curl -X POST -H 'Content-Type: application/json'
'http://localhost:8983/solr/spaceknow/update/json/docs' --data-binary
'{"observationId":"8e09f47f", "observationType":"spaceknow",
"startTime":"2015-09-19T21:03:51Z", "endTime":"2015-09-19T21:03:51Z",
"receiptTime":"2016-07-29T15:49:49.328Z", "locationLat":38.9225015078814,
"locationLon":-77.22900299194423,
"position":"38.9225015078814,-77.22900299194423", "positionWkt":"POLYGON
((-77.23 38.922, -77.23 38.923, -77.228 38.923, -77.228 38.922, -77.23
38.922))", "provider":"dg"}'

Full Schema.xml file:





   
   
   

   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

observationId





























  

  


  



  
  




  


  






  
  







  


  






  
  







  


  








  


  




  
  




  


  




  


  


  














Thanks!
Jennifer Coston



From:   Jennifer Coston 
To: solr-user@lucene.apache.org
Date:   08/11/2016 04:04 PM
Subject:Need Help Resolving Unknown Shape Definition Error




Hello,

I am trying to setup a local solr core so that I can perform Spatial
searches on it. I am using version 5.2.1. I have updated my schema.xml file
to include the location-rpt fieldType:



And I have defined my field to use this type:



I also added the jts-1.4.0.jar file to C:\solr-5.2.1\server\solr-webapp
\webapp\WEB-INF\lib.

However when I try to add a document through the Solr Admin Console I am
seeing this response:

{
  "responseHeader": {
"status": 400,
"QTime": 6
  },
  "error": {
"msg": "Unknown Shape definition [POLYGON((-77.23 38.922, -77.23
38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))]",
"code": 400
  }
}

I can submit documents successfully if I remove the positionWkt field. Did
I miss a configuration step?

Here is the document I am trying to add:

{
"observationId": "8e09f47f",
"observationType": "image",
"startTime": "2015-09-19T21:03:51Z",
"endTime": "2015-09-19T21:03:51Z",
"receiptTime": "2016-07-29T15:49:49.328Z",
"locationLat": 38.9225015078814,
"locationLon": -77.22900299194423,
"position": "38.9225015078814,-77.22900299194423",
"positionWkt": "POLYGON((-77.23 38.922, -77.23 38.923, -77.228
38.923, -77.228 38.922, -77.23 38.922))",
"provider": "a"
}

Here are the fields I added to the schema.xml file (I started with the
template, please let me know if you need the whole thing):

observationId














Thank you!

Jennifer

Re: Getting dynamic fields using LukeRequest.

2016-08-12 Thread Pranaya Behera

Hi,
With SolrJ I am getting inconsistent results. Previously it was 
working great, but now it is not giving any expanded results, while 
running the same search in the Solr admin UI does give the expanded results. 
When I say previously, I mean right after the first release of 6.1.0. Now 
LukeRequest is not working: getExpandedResults() always gives me 0 docs. 
Expanded results work in the Solr admin UI but not through LukeRequest, and for 
LukeRequest I have to query each shard to get all the dynamic indexed 
fields.

Any solutions to this?  Please let me know if you need any more information.
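
For now the per-core workaround looks roughly like this (a sketch; the host
and core name are placeholders):

  // hit one core's /admin/luke directly instead of going through the collection
  HttpSolrClient coreClient = new HttpSolrClient("http://host1:8983/solr/product_shard1_replica1");
  LukeRequest lukeRequest = new LukeRequest();
  lukeRequest.setNumTerms(0);
  LukeResponse lukeResponse = lukeRequest.process(coreClient);
  Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
  coreClient.close();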

On 10/08/16 11:49, Pranaya Behera wrote:
Also, when I hit each individual shard using the /admin/luke endpoint I get 
results that are close to correct, but against the whole collection it doesn't 
even show the dynamic fields it has.


On 10/08/16 11:23, Pranaya Behera wrote:

Hi Steve,
I did look at the Schema API but it only gives the 
defined dynamic fields, not the indexed dynamic fields. For fields indexed 
against a defined dynamic-field rule, I guess LukeRequest 
is the only option. (Please correct me if I am wrong.)


Hence I am unable to fetch each and every indexed field with the 
defined dynamic field.


On 09/08/16 19:26, Steve Rowe wrote:
Not sure what the issue is with LukeRequest, but Solrj has Schema 
API support: 



You can see which options are supported here: 



--
Steve
www.lucidworks.com

On Aug 9, 2016, at 8:52 AM, Pranaya Behera  
wrote:


Hi,
 I have the following script to retrieve all the fields in the 
collection. I am using SolrCloud 6.1.0.

LukeRequest lukeRequest = new LukeRequest();
lukeRequest.setNumTerms(0);
lukeRequest.setShowSchema(false);
LukeResponse lukeResponse = lukeRequest.process(cloudSolrClient);
Map<String, LukeResponse.FieldInfo> fieldInfoMap = lukeResponse.getFieldInfo();
for (Map.Entry<String, LukeResponse.FieldInfo> entry : fieldInfoMap.entrySet()) {
  entry.getKey(); // Here fieldInfoMap sometimes has size 0, and sometimes it contains incomplete data.
}


Setting showSchema to true doesn't yield any result. Only setting it to 
false yields results, and even then the data is incomplete. I can see that the 
doc has more fields than the response says it has.


LukeRequest hits 
/solr/product/admin/luke?numTerms=0&wt=javabin&version=2 HTTP/1.1 .


How should it be configured for SolrCloud?
I have already mentioned

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

in the solrconfig.xml. It doesn't matter whether it is present in 
the solrconfig or not as I am requesting it from SolrJ.










AUTO: Brian J. Vanecek is out of the office (returning 08/15/2016)

2016-08-12 Thread Brian J. Vanecek


I am out of the office until 08/15/2016.




Note: This is an automated response to your message  "Re: commit it taking
1300 ms" sent on 8/11/2016 10:58:34 PM.

This is the only notification you will receive while this person is away.


Unable to upgrade from 5.4 to 5.5.2

2016-08-12 Thread Yago Riveiro
I'm trying to upgrade my Solr cluster from 5.4 to 5.5.2 in a rolling restart,
without success.

The first node upgraded to 5.5.2 without any issue; the problem arises
with the second. When the second node is restarted with the new version, the
heap of both nodes grows until the first node (I don't know why, but it is
always the first node) hits an OOM.

It's like something in the PeerSync process is consuming RAM (my index is huge;
I have replicas with 250G) until it hits the OOM.

https://issues.apache.org/jira/browse/SOLR-8586 was added in 5.5; this issue
changes how shard synchronization is done. Can it be related
to my problem?

  

--

/Yago Riveiro



Re: Effects of insert order on query performance

2016-08-12 Thread Emir Arnautovic

Hi Jeff,

I will not comment on your theory (I'll leave that to the guys more familiar 
with the Lucene code) but will point to one alternative solution: routing. 
You can use routing to send documents with different permissions to 
different shards, and use composite hash routing to spread the "A" (and maybe 
"B" as well) documents across multiple shards. That makes sure all docs 
with the same permission are on the same shards, and at query time only 
those shards are queried (fewer shards to query), so there is no need to 
include the terms query or a filter query at all.


Here is a blog post explaining the benefits of composite hash routing: 
https://sematext.com/blog/2015/09/29/solrcloud-large-tenants-and-routing/
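
For illustration, with the default compositeId router placement is driven by
the document id: a route-key prefix (and an optional /bits suffix that spreads
one key over part of the hash range) controls which shards a document can land
on. A minimal SolrJ sketch, with hypothetical ids and collection name:

  CloudSolrClient client = new CloudSolrClient("zkhost:2181");
  client.setDefaultCollection("permissions_demo"); // hypothetical collection

  SolrInputDocument doc = new SolrInputDocument();
  // "A/2!" routes by permission "A" but spreads its docs over 1/4 of the hash range
  doc.addField("id", "A/2!doc-123");
  doc.addField("permissions", "A");
  client.add(doc);
  client.commit();

At query time you would then pass _route_=A/2! so only the shards that can
hold "A" documents are consulted.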


Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 11.08.2016 19:39, Jeff Wartes wrote:

This isn’t really a question, although some validation would be nice. It’s more 
of a warning.

Tldr is that the insert order of documents in my collection appears to have had 
a huge effect on my query speed.


I have a very large (sharded) SolrCloud 5.4 index. One aspect of this index is 
a multi-valued field (“permissions”) that for 90% of docs contains one 
particular value, (“A”) and for 10% of docs contains another distinct value. 
(“B”) It’s intended to represent something like permissions, so more values are 
possible in the future, but not present currently. In fact, the addition of 
docs with value B to this index was very recent, previously all docs had value 
“A”. All queries, in addition to various other Boolean-query type restrictions, 
have a terms query on this field, like {!terms f=permissions v=A} or {!terms 
f=permissions v=A,B}

Last week, I tried to re-index the whole collection from scratch, using source 
data. Query performance on the resulting re-index proved to be abysmal, I could 
get barely 10% of my previous query throughput, and even that was at latencies 
that were orders of magnitude higher than what I had in production.

I hooked up some CPU profiling to a server that had shards from both the old 
and new version of the collection, and eventually it looked like the 
significant difference in processing the two collections was coming from 
ConstantWeight.scorer()
Specifically, this line
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/solr/core/src/java/org/apache/solr/search/SolrConstantScoreQuery.java#L102
was far more expensive in my re-indexed collection. From there, the call chain 
goes through an LRUQueryCache, down to a BulkScorer, and ends up with the extra 
work happening here:
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/lucene/core/src/java/org/apache/lucene/search/Weight.java#L169

I don’t pretend to understand all that code, but the difference in my re-index 
appears to have something to do either with that cache, or the aggregate 
docIdSets that need weights generated is simply much bigger in my re-index.


But the queries didn’t change, and the data is basically the same, what else 
could have changed?

The documents with the “B” distinct value were added recently to the 
high-performance collection, but the A’s and the B’s were all mixed up in the 
source data dump I used to re-index. On a hunch, I manually ordered the docs 
such that the A’s were all first and re-indexed again, and performance is great!

Here’s my theory: Using TieredMergePolicy, the vast quantity of the documents 
in an index are contained in the largest segments. I’m guessing there’s an 
optimization somewhere that says something like “This segment only has A’s”. By 
indexing all the A’s first, those biggest segments only contain A’s, and only 
the smallest, newest segments are unable to make use of that optimization.

Here’s the scary part: Although my re-index is now performing well, if this 
theory is right, some random insert (or a deliberate optimize) at some random 
point in the future could cascade a segment merge such that the largest 
segment(s) now contain both A’s and B’s, and performance suddenly goes over a 
cliff. I have no way to prevent this possibility except to stop doing inserts.

My current thinking is that I need to pull the terms-query part out of the 
query and do a filter query for it instead. Probably as a post-filter, since 
I’ve had bad luck with very large filter queries and the filter cache. I’d 
tested this originally (when I only had A’s), but found the performance was a 
bit worse than just leaving it in the query. I’ll take a bit worse and 
predictability over a bit better and a time bomb though, if those are my 
choices.
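
(Concretely, that change is just moving the terms clause out of q and into an
fq; a sketch with the field and values from above:

  SolrQuery query = new SolrQuery(mainQuery);
  // restrict by permission without letting it contribute to scoring
  query.addFilterQuery("{!terms f=permissions v=A,B}");

whether it ends up cached or run as a post-filter is then a separate tuning
question.)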


If anyone has any comments refuting or supporting this theory, I’d certainly 
like to hear it. This is the first time I’ve encountered anything about insert 
order mattering from a performance perspective, and it becomes a general-form 
question around how to handle low-cardinality fields.



RE: Wildcard search not working

2016-08-12 Thread Ribeaud, Christian (Ext)
Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into the Solr filters and 
tokenizers. I've just realized that my command of them is too limited.
Thanks a lot for the help so far, guys. Cheers and have a nice day,

christian

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Freitag, 12. August 2016 07:41
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

Please use the following filter before/above the stemmer:

[filter definition stripped by the list archiver]

Plus, you may want to add:

[analyzer definition stripped by the list archiver]

Ahmet



On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the 
corresponding field:

...
[field and analyzer definitions stripped by the list archiver]
...

What is wrong with this schema? Respectively, what should I change to be able 
to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Chiristian,

The query r?che may not return at least the same number of matches as roche 
depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries are 
executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering, for 
instance, searches with the term 'roche' in a specific core. Everything is fine; I am 
getting, for instance, two matches. I would expect at least the same number of 
matches with the term 'r?che'. However, this does NOT happen. I am getting zero 
matches. The same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


Re: commit it taking 1300 ms

2016-08-12 Thread Esther-Melaine Quansah
Midas,

I’d like further clarification as well. Are you sending commits along with each 
document that you’re POSTing to Solr? If so, you’re essentially either opening 
a new searcher or flushing to disk with each POST, which could explain the latency 
between requests.
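
If that is what's happening, one option is to drop the explicit commits and
rely on commitWithin or the autoCommit settings instead. A minimal SolrJ
sketch (URL and field names are placeholders):

  SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "doc-1");
  // make the doc searchable within 10s without a per-request commit
  client.add(doc, 10000);
  client.close();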

Thanks,

Esther
> On Aug 11, 2016, at 12:19 PM, Erick Erickson  wrote:
> 
> bq:  we post json documents through the curl it takes the time (same time i
> would like to say that we are not hard committing ). that curl takes time
> i.e. 1.3 sec.
> 
> OK, I'm really confused. _what_ is taking 1.3 seconds? When you said
> commit, I was thinking of Solr's commit operation, which is totally distinct
> from just adding a doc to the index. But I read the above statement
> as you're saying it takes 1.3 seconds just to send a doc to Solr.
> 
> Let's see the exact curl command you're using please?
> 
> Best,
> Erick
> 
> 
> On Thu, Aug 11, 2016 at 5:32 AM, Emir Arnautovic
>  wrote:
>> Hi Midas,
>> 
>> 1. How many indexing threads?
>> 2. Do you batch documents and what is your batch size?
>> 3. How frequently do you commit?
>> 
>> I would recommend:
>> 1. Move commits to Solr (set auto soft commit to max allowed time)
>> 2. Use batches (bulks)
>> 3. tune bulk size and number of threads to achieve max performance.
>> 
>> Thanks,
>> Emir
>> 
>> 
>> 
>> On 11.08.2016 08:21, Midas A wrote:
>>> 
>>> Emir,
>>> 
>>> other queries:
>>> 
>>> a) Solr cloud : NO
>>> b) [cache element name stripped by the archiver] size="5000" initialSize="5000" autowarmCount="10"/>
>>> c) [cache element name stripped by the archiver] size="1000" initialSize="1000" autowarmCount="10"/>
>>> d) [cache element name stripped by the archiver] size="1000" initialSize="1000" autowarmCount="10"/>
>>> e) we are using multi threaded system.
>>> 
>>> On Thu, Aug 11, 2016 at 11:48 AM, Midas A  wrote:
>>> 
 Emir,
 
 we post json documents through the curl it takes the time (same time i
 would like to say that we are not hard committing ). that curl takes time
 i.e. 1.3 sec.
 
 On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic <
 emir.arnauto...@sematext.com> wrote:
 
> Hi Midas,
> 
> According to your autocommit configuration and your worry about commit
> time I assume that you are doing explicit commits from client code and
> that
> 1.3s is client observed commit time. If that is the case, than it might
> be
> opening searcher that is taking time.
> 
> How do you index data - single threaded or multithreaded? How frequently
> do you commit from client? Can you let Solr do soft commits instead of
> explicitly committing? Do you have warmup queries? Is this SolrCloud?
> What
> is number of servers (what spec), shards, docs?
> 
> In any case monitoring can give you more info about server/Solr behavior
> and help you diagnose issues more easily/precisely. One such monitoring
> tool is our SPM .
> 
> Regards,
> Emir
> 
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> On 10.08.2016 05:20, Midas A wrote:
> 
>> Thanks for replying
>> 
>> index size:9GB
>> 2000 docs/sec.
>> 
>> Actually earlier it was taking less but suddenly it has increased .
>> 
>> Currently we do not have any monitoring  tool.
>> 
>> On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
>> emir.arnauto...@sematext.com> wrote:
>> 
>> Hi Midas,
>>> 
>>> Can you give us more details on your index: size, number of new docs
>>> between commits. Why do you think 1.3s for commit is to much and why
>>> do
>>> you
>>> need it to take less? Did you do any system/Solr monitoring?
>>> 
>>> Emir
>>> 
>>> 
>>> On 09.08.2016 14:10, Midas A wrote:
>>> 
>>> please reply it is urgent.
 
 On Tue, Aug 9, 2016 at 11:17 AM, Midas A 
 wrote:
 
 Hi ,
 
> commit is taking more than 1300 ms . what should i check on server.
> 
> below is my configuration .
> 
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
> 
> 
> 
> --
>>> 
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>> 
>>> 
>>> 
>> 
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>> 



Regarding Solr UI authentication

2016-08-12 Thread Pradeep Chandra
Hi

I am running solr using the command *bin/solr start *in Ubuntu. Now I want
to give UI authentication to secure my Solr. Can you tell me how to make
Solr password protected. I am not using Zookeeper/SolrCloud.


Thanks and Regards,
M Pradeep Chandra