Re: Need help with query syntax

2017-08-10 Thread Dave
Eric, are you going to Vegas next month?

> On Aug 10, 2017, at 7:38 PM, Erick Erickson  wrote:
> 
> Omer:
> 
> Solr does not implement pure boolean logic, see:
> https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.
> 
> With appropriate parentheses it can give the same results as you're
> discovering.
> 
> Best
> Erick
> 
>> On Thu, Aug 10, 2017 at 3:00 PM, OTH  wrote:
>> Thanks for the help!
>> That's resolved the issue.
>> 
>> On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>> 
>>> type:value AND (name:america^1+name:state^1+name:united^1)
>>> 
>>> but in reality what you want to do is use the fq parameter with type:value
>>> 
 On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:
 
 Hello,
 
 I have the following use case:
 
 I have two fields (among others); one is 'name' and the other is 'type'.
 'Name' is the field I need to search, whereas, with 'type', I need to
>>> make
 sure that it has a certain value, depending on the situation.  Often,
>>> when
 I search the 'name' field, the search query would have multiple tokens.
 Furthermore, each query token needs to have a scoring weight attached to
 it.
 
 However, I'm unable to figure out the syntax which would allow all these
 things to happen.
 
 For example, if I use the following query:
 select?q=type:value+AND+name:america^1+name:state^1+name:united^1
 It would only return documents where 'name' includes the token 'america'
 (and where type==value).  It will totally ignore
 "+name:state^1+name:united^1", it seems.
 
 This does not happen if I omit "type:value+AND+".  So, with the following
 query:
 select?q=name:america^1+name:state^1+name:united^1
 It returns all documents which contain any of the three tokens {america,
 state, united}; which is what I need.  However, it also returns documents
 where type != value; which I can't have.
 
 If I put "type:value" at the end of the query command, like so:
 select?q=name:america^1+name:state^1+name:united^1+AND+type:value
 In this case, it will only return documents which contain the "united"
 token in the name field (and where type==value).  Again, it will totally
 ignore "name:america^1+name:state^1", it seems.
 
 I tried putting an "AND" between everything, like so:
 select?q=type:value+AND+name:america^1+AND+name:state^1+
>>> AND+name:united^1
 But this, of course, would only return documents which contain all the
 tokens {america, state, united}; whereas I need all documents which
>>> contain
 any of those tokens.
 
 
 If anyone could help me out with how this could be done / what the
>>> correct
 syntax would be, that would be a huge help.
 
 Much thanks
 Omer
 
>>> 


Re: Issue with delta import

2017-08-10 Thread vrindavda
Refer to this:

http://lucene.472066.n3.nabble.com/Number-of-requests-spike-up-when-i-do-the-delta-Import-td4338162.html#a4339168



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-delta-import-tp4347680p4350157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with query syntax

2017-08-10 Thread Erick Erickson
Omer:

Solr does not implement pure boolean logic, see:
https://lucidworks.com/2011/12/28/why-not-and-or-and-not/.

With appropriate parentheses it can give the same results as you're
discovering.
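
A sketch of the grouped form, using the field names from this thread: without parentheses, the classic query parser attaches AND only to its immediate neighbors and leaves the remaining clauses as optional SHOULD clauses, which is why only the clause adjacent to AND appeared to matter. Grouping the OR-ed clauses restores the intent:

```
select?q=type:value AND (name:america^1 OR name:state^1 OR name:united^1)
```

(Shown unencoded for readability; spaces and ^ need URL-encoding in a real request.)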

Best
Erick

On Thu, Aug 10, 2017 at 3:00 PM, OTH  wrote:
> Thanks for the help!
> That's resolved the issue.
>
> On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> type:value AND (name:america^1+name:state^1+name:united^1)
>>
>> but in reality what you want to do is use the fq parameter with type:value
>>
>> On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:
>>
>> > Hello,
>> >
>> > I have the following use case:
>> >
>> > I have two fields (among others); one is 'name' and the other is 'type'.
>> >  'Name' is the field I need to search, whereas, with 'type', I need to
>> make
>> > sure that it has a certain value, depending on the situation.  Often,
>> when
>> > I search the 'name' field, the search query would have multiple tokens.
>> > Furthermore, each query token needs to have a scoring weight attached to
>> > it.
>> >
>> > However, I'm unable to figure out the syntax which would allow all these
>> > things to happen.
>> >
>> > For example, if I use the following query:
>> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
>> > It would only return documents where 'name' includes the token 'america'
>> > (and where type==value).  It will totally ignore
>> > "+name:state^1+name:united^1", it seems.
>> >
>> > This does not happen if I omit "type:value+AND+".  So, with the following
>> > query:
>> > select?q=name:america^1+name:state^1+name:united^1
>> > It returns all documents which contain any of the three tokens {america,
>> > state, united}; which is what I need.  However, it also returns documents
>> > where type != value; which I can't have.
>> >
>> > If I put "type:value" at the end of the query command, like so:
>> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
>> > In this case, it will only return documents which contain the "united"
>> > token in the name field (and where type==value).  Again, it will totally
>> > ignore "name:america^1+name:state^1", it seems.
>> >
>> > I tried putting an "AND" between everything, like so:
>> > select?q=type:value+AND+name:america^1+AND+name:state^1+
>> AND+name:united^1
>> > But this, of course, would only return documents which contain all the
>> > tokens {america, state, united}; whereas I need all documents which
>> contain
>> > any of those tokens.
>> >
>> >
>> > If anyone could help me out with how this could be done / what the
>> correct
>> > syntax would be, that would be a huge help.
>> >
>> > Much thanks
>> > Omer
>> >
>>


Re: Need help with query syntax

2017-08-10 Thread OTH
Thanks for the help!
That's resolved the issue.

On Fri, Aug 11, 2017 at 1:48 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> type:value AND (name:america^1+name:state^1+name:united^1)
>
> but in reality what you want to do is use the fq parameter with type:value
>
> On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:
>
> > Hello,
> >
> > I have the following use case:
> >
> > I have two fields (among others); one is 'name' and the other is 'type'.
> >  'Name' is the field I need to search, whereas, with 'type', I need to
> make
> > sure that it has a certain value, depending on the situation.  Often,
> when
> > I search the 'name' field, the search query would have multiple tokens.
> > Furthermore, each query token needs to have a scoring weight attached to
> > it.
> >
> > However, I'm unable to figure out the syntax which would allow all these
> > things to happen.
> >
> > For example, if I use the following query:
> > select?q=type:value+AND+name:america^1+name:state^1+name:united^1
> > It would only return documents where 'name' includes the token 'america'
> > (and where type==value).  It will totally ignore
> > "+name:state^1+name:united^1", it seems.
> >
> > This does not happen if I omit "type:value+AND+".  So, with the following
> > query:
> > select?q=name:america^1+name:state^1+name:united^1
> > It returns all documents which contain any of the three tokens {america,
> > state, united}; which is what I need.  However, it also returns documents
> > where type != value; which I can't have.
> >
> > If I put "type:value" at the end of the query command, like so:
> > select?q=name:america^1+name:state^1+name:united^1+AND+type:value
> > In this case, it will only return documents which contain the "united"
> > token in the name field (and where type==value).  Again, it will totally
> > ignore "name:america^1+name:state^1", it seems.
> >
> > I tried putting an "AND" between everything, like so:
> > select?q=type:value+AND+name:america^1+AND+name:state^1+
> AND+name:united^1
> > But this, of course, would only return documents which contain all the
> > tokens {america, state, united}; whereas I need all documents which
> contain
> > any of those tokens.
> >
> >
> > If anyone could help me out with how this could be done / what the
> correct
> > syntax would be, that would be a huge help.
> >
> > Much thanks
> > Omer
> >
>


Re: Problem loading custom similarity class via blob API

2017-08-10 Thread Webster Homer
The blob store API is indeed severely limited (nearly useless) by this:
https://issues.apache.org/jira/browse/SOLR-9175

On Thu, Aug 10, 2017 at 4:08 PM, Webster Homer 
wrote:

> I have a need to override the default behavior of the BM25Similarity class.
> It was trivial to create the class. My problem is that I cannot load it, at
> least via the blob api as described here:
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+
> SolrCloud+Mode
>
> I set enable.runtime.lib=true on my solr startup
>
> I created the .system collection and uploaded a jar containing my class
> following the instructions here:
> https://cwiki.apache.org/confluence/display/solr/Blob+Store+API
>
> I can see that the blob was created.
>
> I added the runtime lib to my collection, and I can see it in the
> configoverlay.json file.
> Finally I added this line to the schema.xml file:
> <similarity class="com.sial.similarity.SialBM25Similarity" runtimeLib="true"/>
>
> Uploaded it to ZooKeeper and reloaded the collection, only to get this
> error:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Could not load conf for core test-catalog-product-170724_shard1_replica1:
> Can't load schema schema.xml: Error loading class 'com.sial.similarity.
> SialBM25Similarity'
>
> What am I missing? Is this plugins API limited to components and request
> handlers? If so, that's a HUGE limitation that makes it just about useless.
> I need it for this similarity and some filter factories, and having a
> simple way to deploy code to solrcloud nodes that can be easily automated
> would be great.
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Problem loading custom similarity class via blob API

2017-08-10 Thread Webster Homer
I have a need to override the default behavior of the BM25Similarity class.
It was trivial to create the class. My problem is that I cannot load it, at
least via the blob api as described here:
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode

I set enable.runtime.lib=true on my solr startup

I created the .system collection and uploaded a jar containing my class
following the instructions here:
https://cwiki.apache.org/confluence/display/solr/Blob+Store+API

I can see that the blob was created.

I added the runtime lib to my collection, and I can see it in the
configoverlay.json file.
Finally I added this line to the schema.xml file:
<similarity class="com.sial.similarity.SialBM25Similarity" runtimeLib="true"/>
Uploaded it to ZooKeeper and reloaded the collection, only to get this
error:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core test-catalog-product-170724_shard1_replica1:
Can't load schema schema.xml: Error loading class
'com.sial.similarity.SialBM25Similarity'

What am I missing? Is this plugins API limited to components and request
handlers? If so, that's a HUGE limitation that makes it just about useless.
I need it for this similarity and some filter factories, and having a
simple way to deploy code to solrcloud nodes that can be easily automated
would be great.
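
For reference, the upload-and-register steps sketch out roughly as below (host, port, jar file, and blob name are illustrative; the collection name is taken from the error message). Per SOLR-9175, classes referenced from schema.xml, such as a similarity, still cannot be resolved from the blob store:

```
# Upload the jar to the .system collection's blob store
curl -X POST -H 'Content-Type: application/octet-stream' \
  --data-binary @sial-similarity.jar \
  "http://localhost:8983/solr/.system/blob/sialsimilarity"

# Register it as a runtime lib for the target collection
curl "http://localhost:8983/solr/test-catalog-product-170724/config" \
  -H 'Content-Type: application/json' \
  -d '{"add-runtimelib": {"name": "sialsimilarity", "version": 1}}'
```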



Re: Need help with query syntax

2017-08-10 Thread David Hastings
type:value AND (name:america^1+name:state^1+name:united^1)

but in reality what you want to do is use the fq parameter with type:value
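
A sketch of that fq variant, with the field names from this thread (shown unencoded for readability): the filter query restricts the match set without contributing to the score and is cached independently, leaving q free to score the weighted name clauses:

```
select?q=name:america^1 OR name:state^1 OR name:united^1&fq=type:value
```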

On Thu, Aug 10, 2017 at 4:36 PM, OTH  wrote:

> Hello,
>
> I have the following use case:
>
> I have two fields (among others); one is 'name' and the other is 'type'.
>  'Name' is the field I need to search, whereas, with 'type', I need to make
> sure that it has a certain value, depending on the situation.  Often, when
> I search the 'name' field, the search query would have multiple tokens.
> Furthermore, each query token needs to have a scoring weight attached to
> it.
>
> However, I'm unable to figure out the syntax which would allow all these
> things to happen.
>
> For example, if I use the following query:
> select?q=type:value+AND+name:america^1+name:state^1+name:united^1
> It would only return documents where 'name' includes the token 'america'
> (and where type==value).  It will totally ignore
> "+name:state^1+name:united^1", it seems.
>
> This does not happen if I omit "type:value+AND+".  So, with the following
> query:
> select?q=name:america^1+name:state^1+name:united^1
> It returns all documents which contain any of the three tokens {america,
> state, united}; which is what I need.  However, it also returns documents
> where type != value; which I can't have.
>
> If I put "type:value" at the end of the query command, like so:
> select?q=name:america^1+name:state^1+name:united^1+AND+type:value
> In this case, it will only return documents which contain the "united"
> token in the name field (and where type==value).  Again, it will totally
> ignore "name:america^1+name:state^1", it seems.
>
> I tried putting an "AND" between everything, like so:
> select?q=type:value+AND+name:america^1+AND+name:state^1+AND+name:united^1
> But this, of course, would only return documents which contain all the
> tokens {america, state, united}; whereas I need all documents which contain
> any of those tokens.
>
>
> If anyone could help me out with how this could be done / what the correct
> syntax would be, that would be a huge help.
>
> Much thanks
> Omer
>


Need help with query syntax

2017-08-10 Thread OTH
Hello,

I have the following use case:

I have two fields (among others); one is 'name' and the other is 'type'.
 'Name' is the field I need to search, whereas, with 'type', I need to make
sure that it has a certain value, depending on the situation.  Often, when
I search the 'name' field, the search query would have multiple tokens.
Furthermore, each query token needs to have a scoring weight attached to
it.

However, I'm unable to figure out the syntax which would allow all these
things to happen.

For example, if I use the following query:
select?q=type:value+AND+name:america^1+name:state^1+name:united^1
It would only return documents where 'name' includes the token 'america'
(and where type==value).  It will totally ignore
"+name:state^1+name:united^1", it seems.

This does not happen if I omit "type:value+AND+".  So, with the following
query:
select?q=name:america^1+name:state^1+name:united^1
It returns all documents which contain any of the three tokens {america,
state, united}; which is what I need.  However, it also returns documents
where type != value; which I can't have.

If I put "type:value" at the end of the query command, like so:
select?q=name:america^1+name:state^1+name:united^1+AND+type:value
In this case, it will only return documents which contain the "united"
token in the name field (and where type==value).  Again, it will totally
ignore "name:america^1+name:state^1", it seems.

I tried putting an "AND" between everything, like so:
select?q=type:value+AND+name:america^1+AND+name:state^1+AND+name:united^1
But this, of course, would only return documents which contain all the
tokens {america, state, united}; whereas I need all documents which contain
any of those tokens.


If anyone could help me out with how this could be done / what the correct
syntax would be, that would be a huge help.

Much thanks
Omer


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hello - Sorry, I obviously made a mistake here.

I said earlier that it seems to me that the word 'united' is being
lemmatized (to 'unite').  But it seems that's not the case.  It seems that
there isn't any lemmatization or stemming being done.  I had previously
assumed that the default 'text_general' fieldtype in Solr probably handles
this, but it seems that's not the case.

I realize that what is going on is something else.  I will start
another email thread for that.

Thanks.


On Thu, Aug 10, 2017 at 11:33 PM, OTH  wrote:

> Hi,
>
> Regarding 'analysis chain':
>
> I'm using Solr 6.4.1, and in the managed-schema file, I find the following:
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> Regarding the Admin UI >> Analysis page:  I just tried that, and to be
> honest, I can't seem to get much useful info out of it, especially in terms
> of lemmatization.
>
> For example, for any text I enter in it to "analyse", all it does is seem
> to tell me which analysers (if that's the right term?) are being used for
> the selected field / fieldtype, and for each of these analyzers, it would
> give some very basic info, like text, raw_bytes, etc.  Eg, for the input
> "united" in the "field value (index)" box, having "text_general" selected
> for fieldtype, all I get is this:
>
> Stage | text   | raw_bytes           | start | end | positionLength | type | position
> ST    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> SF    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> LCF   | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
> "org.apache.lucene.analysis.standard.StandardTokenizer",
> "org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.
>
>
> So - should 'states' not be lemmatized to 'state' using these settings?
>  (If not, then I would need to figure out how to use a different lemmatizer)
>
> Thanks
>
> On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
> wrote:
>
>> saying the field is "text_general" is not sufficient, please post the
>> analysis chain defined in your schema.
>>
>> Also the admin UI>>analysis page will help you figure out exactly what
>> part of the analysis chain does what.
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
>> > Hello,
>> >
>> > It seems to me that the token "states" is not getting lemmatized to
>> > "state" by Solr.
>> >
>> > Eg, I have a document with the value "united states of america".
>> > This document is not returned when the following query is issued:
>> > q=name:state^1+name:america^1+name:united^1
>> > However, all documents which contain the token "state" are indeed
>> returned,
>> > with the above query.
>> > The "united states of america" document is returned if I change "state"
>> in
>> > the query to "states"; so:
>> > q=name:states^1+name:america^1+name:united^1
>> >
>> > At first I thought maybe the lemmatization isn't working for some
>> reason.
>> > However, when I changed "united" in the query to "unite", then it did
>> still
>> > return the "united states of america" document:
>> > q=name:states^1+name:america^1+name:unite^1
>> > Which means that the lemmatization is working for the token "united",
>> but
>> > not for the token "states".
>> >
>> > The "name" field above is defined as "text_general".
>> >
>> > So it seems to me, that perhaps the default Solr lemmatizer does not
>> > lemmatize "states" to "state"?
>> > Can anyone confirm if this is indeed the expected behaviour?
>> > And what can I do to change it?
>> > If I need to put in a custom lemmatizer, then what would be the (best)
>> > way to do that?
>> >
>> > Much thanks
>> > Omer
>>
>
>


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread Erick Erickson
First, if you turn off the "verbose" checkbox, it'll reduce a lot of
the clutter. The important point is that when you hover over those
abbreviations, it tells you exactly which class did the associated
transformation in the analysis chain on the tokens. You'll note that
StandardTokenizer breaks the input up into tokens. "united" doesn't do
very much that's exciting; make some letters uppercase and you'll see
the obvious for LowerCaseFilter.

Why do you suppose lemmatization will be done for text_general?
There's nothing in the analysis chain that would perform any
lemmatization.
StandardTokenizerFactory will break the input up into tokens. Each
token is then sent through the filters, where:
StopFilterFactory will remove stopwords defined in stopwords.txt
LowerCaseFilterFactory will lowercase the token

That's all you've told Solr to do with the input at index time. And at
query time SynonymFilterFactory will substitute synonyms. There's
nothing here that has anything to do with lemmatization.

Here's a partial list of available filters that you can choose from:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions
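
To make the chain described above concrete, here is a rough trace of what each stage emits for a sample input (stage abbreviations as shown in the Analysis page):

```
Input: "United States"
StandardTokenizerFactory  -> [United] [States]   (split into tokens)
StopFilterFactory         -> [United] [States]   (neither token is in stopwords.txt)
LowerCaseFilterFactory    -> [united] [states]   (lowercased; no stemming, so "states" stays "states")
```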

Best,
Erick


On Thu, Aug 10, 2017 at 11:33 AM, OTH  wrote:
> Hi,
>
> Regarding 'analysis chain':
>
> I'm using Solr 6.4.1, and in the managed-schema file, I find the following:
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> Regarding the Admin UI >> Analysis page:  I just tried that, and to be
> honest, I can't seem to get much useful info out of it, especially in terms
> of lemmatization.
>
> For example, for any text I enter in it to "analyse", all it does is seem
> to tell me which analysers (if that's the right term?) are being used for
> the selected field / fieldtype, and for each of these analyzers, it would
> give some very basic info, like text, raw_bytes, etc.  Eg, for the input
> "united" in the "field value (index)" box, having "text_general" selected
> for fieldtype, all I get is this:
>
> Stage | text   | raw_bytes           | start | end | positionLength | type | position
> ST    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> SF    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> LCF   | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
> Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
> "org.apache.lucene.analysis.standard.StandardTokenizer",
> "org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.
>
>
> So - should 'states' not be lemmatized to 'state' using these settings?
>  (If not, then I would need to figure out how to use a different lemmatizer)
>
> Thanks
>
> On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
> wrote:
>
>> saying the field is "text_general" is not sufficient, please post the
>> analysis chain defined in your schema.
>>
>> Also the admin UI>>analysis page will help you figure out exactly what
>> part of the analysis chain does what.
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
>> > Hello,
>> >
>> > It seems to me that the token "states" is not getting lemmatized to
>> > "state" by Solr.
>> >
>> > Eg, I have a document with the value "united states of america".
>> > This document is not returned when the following query is issued:
>> > q=name:state^1+name:america^1+name:united^1
>> > However, all documents which contain the token "state" are indeed
>> returned,
>> > with the above query.
>> > The "united states of america" document is returned if I change "state"
>> in
>> > the query to "states"; so:
>> > q=name:states^1+name:america^1+name:united^1
>> >
>> > At first I thought maybe the lemmatization isn't working for some reason.
>> > However, when I changed "united" in the query to "unite", then it did
>> still
>> > return the "united states of america" document:
>> > q=name:states^1+name:america^1+name:unite^1
>> > Which means that the lemmatization is working for the token "united", but
>> > not for the token "states".
>> >
>> > The "name" field above is defined as "text_general".
>> >
>> > So it seems to me, that perhaps the default Solr lemmatizer does not
>> > lemmatize "states" to "state"?
>> > Can anyone confirm if this is indeed the expected behaviour?
>> > And what can I do to change it?
>> > If I need to put in a custom lemmatizer, then what would be the (best)
>> > way to do that?
>> >
>> > Much thanks
>> > Omer
>>


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread Ahmet Arslan
Hi Omer,
Your analysis chain does not include a stem filter (lemmatizer).
Assuming you are dealing with English text, you can use KStemFilterFactory or
SnowballPorterFilterFactory.
Ahmet
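
A sketch of what adding KStem to this text_general chain might look like, assuming the stock Solr 6.x analyzer layout (the filter should go in both the index and query analyzers so indexed and queried terms reduce the same way; a reindex is needed afterwards):

```
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory"/>
</analyzer>
```

KStemFilterFactory reduces plural forms such as "states" to "state".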


On Thursday, August 10, 2017, 9:33:08 PM GMT+3, OTH  
wrote:


Hi,

Regarding 'analysis chain':

I'm using Solr 6.4.1, and in the managed-schema file, I find the following:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Regarding the Admin UI >> Analysis page:  I just tried that, and to be
honest, I can't seem to get much useful info out of it, especially in terms
of lemmatization.

For example, for any text I enter in it to "analyse", all it does is seem
to tell me which analysers (if that's the right term?) are being used for
the selected field / fieldtype, and for each of these analyzers, it would
give some very basic info, like text, raw_bytes, etc.  Eg, for the input
"united" in the "field value (index)" box, having "text_general" selected
for fieldtype, all I get is this:

Stage | text   | raw_bytes           | start | end | positionLength | type | position
ST    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
SF    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
LCF   | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
"org.apache.lucene.analysis.standard.StandardTokenizer",
"org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.


So - should 'states' not be lemmatized to 'state' using these settings?
(If not, then I would need to figure out how to use a different lemmatizer)

Thanks

On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
wrote:

> saying the field is "text_general" is not sufficient, please post the
> analysis chain defined in your schema.
>
> Also the admin UI>>analysis page will help you figure out exactly what
> part of the analysis chain does what.
>
> Best,
> Erick
>
> On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
> > Hello,
> >
> > It seems to me that the token "states" is not getting lemmatized to
> > "state" by Solr.
> >
> > Eg, I have a document with the value "united states of america".
> > This document is not returned when the following query is issued:
> > q=name:state^1+name:america^1+name:united^1
> > However, all documents which contain the token "state" are indeed
> returned,
> > with the above query.
> > The "united states of america" document is returned if I change "state"
> in
> > the query to "states"; so:
> > q=name:states^1+name:america^1+name:united^1
> >
> > At first I thought maybe the lemmatization isn't working for some reason.
> > However, when I changed "united" in the query to "unite", then it did
> still
> > return the "united states of america" document:
> > q=name:states^1+name:america^1+name:unite^1
> > Which means that the lemmatization is working for the token "united", but
> > not for the token "states".
> >
> > The "name" field above is defined as "text_general".
> >
> > So it seems to me, that perhaps the default Solr lemmatizer does not
> > lemmatize "states" to "state"?
> > Can anyone confirm if this is indeed the expected behaviour?
> > And what can I do to change it?
> > If I need to put in a custom lemmatizer, then what would be the (best)
> > way to do that?
> >
> > Much thanks
> > Omer
>

Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hi,

Regarding 'analysis chain':

I'm using Solr 6.4.1, and in the managed-schema file, I find the following:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Regarding the Admin UI >> Analysis page:  I just tried that, and to be
honest, I can't seem to get much useful info out of it, especially in terms
of lemmatization.

For example, for any text I enter in it to "analyse", all it does is seem
to tell me which analysers (if that's the right term?) are being used for
the selected field / fieldtype, and for each of these analyzers, it would
give some very basic info, like text, raw_bytes, etc.  Eg, for the input
"united" in the "field value (index)" box, having "text_general" selected
for fieldtype, all I get is this:

Stage | text   | raw_bytes           | start | end | positionLength | type | position
ST    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
SF    | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
LCF   | united | [75 6e 69 74 65 64] | 0     | 6   | 1              |      | 1
Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
"org.apache.lucene.analysis.standard.StandardTokenizer",
"org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.


So - should 'states' not be lemmatized to 'state' using these settings?
 (If not, then I would need to figure out how to use a different lemmatizer)

Thanks

On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
wrote:

> saying the field is "text_general" is not sufficient, please post the
> analysis chain defined in your schema.
>
> Also the admin UI>>analysis page will help you figure out exactly what
> part of the analysis chain does what.
>
> Best,
> Erick
>
> On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
> > Hello,
> >
> > It seems to me that the token "states" is not getting lemmatized to
> > "state" by Solr.
> >
> > Eg, I have a document with the value "united states of america".
> > This document is not returned when the following query is issued:
> > q=name:state^1+name:america^1+name:united^1
> > However, all documents which contain the token "state" are indeed
> returned,
> > with the above query.
> > The "united states of america" document is returned if I change "state"
> in
> > the query to "states"; so:
> > q=name:states^1+name:america^1+name:united^1
> >
> > At first I thought maybe the lemmatization isn't working for some reason.
> > However, when I changed "united" in the query to "unite", then it did
> still
> > return the "united states of america" document:
> > q=name:states^1+name:america^1+name:unite^1
> > Which means that the lemmatization is working for the token "united", but
> > not for the token "states".
> >
> > The "name" field above is defined as "text_general".
> >
> > So it seems to me, that perhaps the default Solr lemmatizer does not
> > lemmatize "states" to "state"?
> > Can anyone confirm if this is indeed the expected behaviour?
> > And what can I do to change it?
> > If I need to put in a custom lemmatizer, then what would be the (best)
> > way to do that?
> >
> > Much thanks
> > Omer
>


which class is: o.a.s.c.S.Request

2017-08-10 Thread Nawab Zada Asad Iqbal
Hi

I see logs from this class 'o.a.s.c.S.Request',  and I am able to tune this
log by going to the logging webpage (Solr -> Request), but I cannot find
the full class name in code. What should I put in the log properties file
to disable this log?


Thanks
Nawab
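
For what it's worth, a hedged guess at the mapping: Solr abbreviates logger
package names in its output, so 'o.a.s.c.S.Request' should expand to
org.apache.solr.core.SolrCore.Request. Assuming a log4j.properties setup,
silencing it would look like:

```
# Assumed expansion of the abbreviated logger name:
#   o.a.s.c.S.Request -> org.apache.solr.core.SolrCore.Request
log4j.logger.org.apache.solr.core.SolrCore.Request=OFF
```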


Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread Erick Erickson
Saying the field is "text_general" is not sufficient; please post the
analysis chain defined in your schema.

Also the admin UI>>analysis page will help you figure out exactly what
part of the analysis chain does what.

Best,
Erick

On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
> Hello,
>
> It seems to me that the token "states" is not getting lemmatized to
> "state" by Solr.
>
> Eg, I have a document with the value "united states of america".
> This document is not returned when the following query is issued:
> q=name:state^1+name:america^1+name:united^1
> However, all documents which contain the token "state" are indeed returned,
> with the above query.
> The "united states of america" document is returned if I change "state" in
> the query to "states"; so:
> q=name:states^1+name:america^1+name:united^1
>
> At first I thought maybe the lemmatization isn't working for some reason.
> However, when I changed "united" in the query to "unite", then it did still
> return the "united states of america" document:
> q=name:states^1+name:america^1+name:unite^1
> Which means that the lemmatization is working for the token "united", but
> not for the token "states".
>
> The "name" field above is defined as "text_general".
>
> So it seems to me, that perhaps the default Solr lemmatizer does not
> lemmatize "states" to "state"?
> Can anyone confirm if this is indeed the expected behaviour?
> And what can I do to change it?
> If I need to put in a custom lemmatizer, then what would be the (best)
> way to do that?
>
> Much thanks
> Omer


Re: SolrDispatchFilter Request Consumption

2017-08-10 Thread Chris Ulicny
We're using version 6.3.0 with the jetty container provided. The full log
message is included below and is consistently the same. We only find this
log message when there is extended high update load on the solrcloud
(usually of small documents). Shorter bursts do not generate the same
messages. However, in either case we do not see any data loss.

All of our updates and queries come through various .NET processes.

The only information I've been able to find slightly related to the problem
might be this issue: https://github.com/eclipse/jetty.project/issues/277. I
believe it was fixed after the 9.3.8 version (and not backported) that was
packaged with 6.3.0. No idea if it is actually related though.

Thanks,
Chris

o.a.s.s.SolrDispatchFilter Could not consume full client request
java.io.IOException: Committed before 100 Continues
at
org.eclipse.jetty.server.HttpChannelOverHttp.continue100(HttpChannelOverHttp.java:206)
at org.eclipse.jetty.server.Request.getInputStream(Request.java:802)
at
org.apache.solr.servlet.SolrDispatchFilter.consumeInputFully(SolrDispatchFilter.java:329)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:320)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)


On Thu, Aug 10, 2017 at 10:44 AM Shawn Heisey  wrote:

> On 8/10/2017 7:31 AM, Chris Ulicny wrote:
> > I've noticed that there are quite a few INFO log entries of the form:
> >
> > "...o.a.s.s.SolrDispatchFilter Could not consume full client request"
> >
> > Should we be worried about these, or is there something wrong with the
> > requests we're making?
> >
> > We aren't noticing anything unexpected in what data makes it into solr,
> so
> > it doesn't seem problematic right now.
>
> Off the top of my head, I would say that this probably means that the
> client is incorrectly using the HTTP standard in some way.  What kind of
> software are you using to send the requests to Solr?  The place in the
> Solr code where the error happens is using the Servlet API, interfacing
> into a servlet container, which is most likely going to be Jetty.  This
> is the first time I've seen this, so it is not likely to be a bug on the
> server side.  I'm not ruling out a bug in Solr, but I think if there
> were one, the problem would be really common.  If you happen to be
> running Solr in a container other than the Jetty that it is shipped
> with, there might be a problem with the container.
>
> The Solr log will contain many more lines for this log message than you
> have already shared.  There will be potentially dozens of lines of a
> Java stacktrace, and it may contain multiple "Caused by" sections with
> more lines.  Can you share the entire thing, and let us know the precise
> Solr version 

Token "states" not getting lemmatized by Solr?

2017-08-10 Thread OTH
Hello,

It seems to me that the token "states" is not getting lemmatized to
"state" by Solr.

Eg, I have a document with the value "united states of america".
This document is not returned when the following query is issued:
q=name:state^1+name:america^1+name:united^1
However, all documents which contain the token "state" are indeed returned,
with the above query.
The "united states of america" document is returned if I change "state" in
the query to "states"; so:
q=name:states^1+name:america^1+name:united^1

At first I thought maybe the lemmatization isn't working for some reason.
However, when I changed "united" in the query to "unite", then it did still
return the "united states of america" document:
q=name:states^1+name:america^1+name:unite^1
Which means that the lemmatization is working for the token "united", but
not for the token "states".

The "name" field above is defined as "text_general".

So it seems to me, that perhaps the default Solr lemmatizer does not
lemmatize "states" to "state"?
Can anyone confirm if this is indeed the expected behaviour?
And what can I do to change it?
If I need to put in a custom lemmatizer, then what would be the (best)
way to do that?

Much thanks
Omer


Re: Solr LTR with high rerankDocs

2017-08-10 Thread Erick Erickson
I have to confess that I know very little about the mechanics of LTR, but
I can talk a little bit about compression.

When a stored value is retrieved for a document it is read from the
*.fdt file which is a compressed, verbatim copy of the field. DocValues
can bypass this stored data and read directly from the DV format.
There's a discussion of useDocValuesAsStored in solr/CHANGES.txt.

The restriction of docValues is that they can only be used for
primitive types, numerics, strings and the like, specifically _not_
fields with class="solr.TextField".

WARNING: I have no real clue whether LTR is built to leverage
docValues fields. If you add docValues="true" to the relevant
fields you'll have to re-index completely. In fact I'd use a new
collection.

And don't be put off by the fact that the index size on disk will grow
if you add docValues; the data is memory-mapped by the OS
and will actually _reduce_ your JVM heap requirements.
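
A minimal sketch of what that looks like in the schema (the field name and
type here are assumptions, and as noted above a complete reindex is needed):

```xml
<!-- Hedged sketch: docValues on a primitive field; the name and type
     are assumptions. Requires a complete reindex after the change. -->
<field name="popularity" type="int" indexed="true" stored="true" docValues="true"/>
```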

Best,
Erick



On Thu, Aug 10, 2017 at 6:57 AM, Sebastian Klemke
 wrote:
> Hi,
>
> we're currently experimenting with LTR reranking on large rerank
> windows (rerankDocs=1000+). On a >500M documents SolrCloud collection,
> we were only able to get sub-second response times with
> FieldValueFeature. Therefore we created a custom feature extractor that
> matches field values with constant strings to substitute simple
> SolrFeature usages. Apparently, the response time is now dominated by
> loading stored fields, more specifically by uncompressing chunks of
> stored field data.
>
> We're now wondering how many documents LTR can rerank in practice and
> what the bottlenecks are. Do you guys have any experience using it?
>
>
> Regards,
>
> Sebastian
>
>
> --
> Sebastian Klemke
> Senior Software Engineer
>
> ResearchGate GmbH
> Invalidenstr. 115, 10115 Berlin, Germany
>
> www.researchgate.net
>
> Registered Seat: Hannover, HR B 202837
> Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer VAT-ID: DE258434568
> A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754, San 
> Francisco, CA 94107
>


RE: Is there a way to determine fields available for faceting for a search without doing the faceting?

2017-08-10 Thread Markus Jelsma
solr/search/admin/luke?show=schema&wt=json&indent=true gives you schema 
information. Look for all fields that are string, int etc. and test whether 
they are either indexed or have docValues.
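
To make that concrete, a sketch in Python of filtering a Luke response down
to facet candidates. The response shape, the type names, and the schema flag
letters below are hand-written assumptions about what show=schema returns,
not verified output:

```python
# Hedged sketch: pick facet-capable fields out of a Luke schema response.
# The dict below is a hand-written assumption of the shape that
# /admin/luke?show=schema&wt=json returns; fetch the real one over HTTP.
luke_response = {
    "fields": {
        "name":  {"type": "text_general", "schema": "ITS--"},
        "type":  {"type": "string",       "schema": "ITS-D"},
        "price": {"type": "pint",         "schema": "--S-D"},
    }
}

# Primitive types that facet cheaply; extend as needed (assumption).
FACETABLE_TYPES = {"string", "pint", "plong", "pfloat", "pdouble", "pdate"}

def facetable(fields):
    """Keep fields whose type is primitive and which are indexed
    ('I' flag) or have docValues ('D' flag)."""
    return sorted(
        name for name, info in fields.items()
        if info["type"] in FACETABLE_TYPES
        and ("I" in info["schema"] or "D" in info["schema"])
    )

print(facetable(luke_response["fields"]))  # → ['price', 'type']
```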
 
 
-Original message-
> From:Michael Joyner 
> Sent: Thursday 10th August 2017 16:12
> To: solr-user@lucene.apache.org
> Cc: Jason Carter 
> Subject: Is there a way to determine fields available for faceting for a 
> search without doing the faceting?
> 
> Hey all!
> 
> Is there a way to determine fields available for faceting (those with 
> data) for a search without actually doing the faceting for the fields?
> 
> -Mike/NewsRx
> 
> 


Re: How to remove Scripts and Styles in content of SOLR Indexes[content field] while indexed through URL?

2017-08-10 Thread Steve Rowe
Hi Daniel,

HTMLStripCharFilterFactory in your index analyzer should do the trick: 


--
Steve
www.lucidworks.com
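
A hedged sketch of where the char filter sits in the chain (the type name is
an assumption; note that charFilters run before the tokenizer, so markup is
stripped before any tokens are produced):

```xml
<!-- Hedged sketch: strip HTML/script/style markup before tokenizing. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```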

> On Aug 10, 2017, at 4:13 AM, Daniel von der Helm 
>  wrote:
> 
> Hi,
> if a fetched HTML page (using SimplePostTool: -Ddata=web) contains 

Re: SolrDispatchFilter Request Consumption

2017-08-10 Thread Shawn Heisey
On 8/10/2017 7:31 AM, Chris Ulicny wrote:
> I've noticed that there are quite a few INFO log entries of the form:
>
> "...o.a.s.s.SolrDispatchFilter Could not consume full client request"
>
> Should we be worried about these, or is there something wrong with the
> requests we're making?
>
> We aren't noticing anything unexpected in what data makes it into solr, so
> it doesn't seem problematic right now.

Off the top of my head, I would say that this probably means that the
client is incorrectly using the HTTP standard in some way.  What kind of
software are you using to send the requests to Solr?  The place in the
Solr code where the error happens is using the Servlet API, interfacing
into a servlet container, which is most likely going to be Jetty.  This
is the first time I've seen this, so it is not likely to be a bug on the
server side.  I'm not ruling out a bug in Solr, but I think if there
were one, the problem would be really common.  If you happen to be
running Solr in a container other than the Jetty that it is shipped
with, there might be a problem with the container.

The Solr log will contain many more lines for this log message than you
have already shared.  There will be potentially dozens of lines of a
Java stacktrace, and it may contain multiple "Caused by" sections with
more lines.  Can you share the entire thing, and let us know the precise
Solr version you're running so we can check it against the source code? 
With that, it may be possible to give you more specific information
about what's going wrong.

Thanks,
Shawn



How to remove Scripts and Styles in content of SOLR Indexes[content field] while indexed through URL?

2017-08-10 Thread Daniel von der Helm
Hi,
if a fetched HTML page (using SimplePostTool: -Ddata=web) contains 

Is there a way to determine fields available for faceting for a search without doing the faceting?

2017-08-10 Thread Michael Joyner

Hey all!

Is there a way to determine fields available for faceting (those with 
data) for a search without actually doing the faceting for the fields?


-Mike/NewsRx



Solr LTR with high rerankDocs

2017-08-10 Thread Sebastian Klemke
Hi,

we're currently experimenting with LTR reranking on large rerank
windows (rerankDocs=1000+). On a >500M documents SolrCloud collection,
we were only able to get sub-second response times with
FieldValueFeature. Therefore we created a custom feature extractor that
matches field values with constant strings to substitute simple
SolrFeature usages. Apparently, the response time is now dominated by
loading stored fields, more specifically by uncompressing chunks of
stored field data.

We're now wondering how many documents LTR can rerank in practice and
what the bottlenecks are. Do you guys have any experience using it?


Regards,

Sebastian


-- 
Sebastian Klemke
Senior Software Engineer
  
ResearchGate GmbH
Invalidenstr. 115, 10115 Berlin, Germany
  
www.researchgate.net
  
Registered Seat: Hannover, HR B 202837
Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer VAT-ID: DE258434568
A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754, San 
Francisco, CA 94107



SolrDispatchFilter Request Consumption

2017-08-10 Thread Chris Ulicny
Hi all,

I've noticed that there are quite a few INFO log entries of the form:

"...o.a.s.s.SolrDispatchFilter Could not consume full client request"

Should we be worried about these, or is there something wrong with the
requests we're making?

We aren't noticing anything unexpected in what data makes it into solr, so
it doesn't seem problematic right now.

Thanks,
Chris


Re: Move index directory to another partition

2017-08-10 Thread Mahmoud Almokadem
Thanks all for your comments.

I followed Shawn steps (rsync) cause everything on that volume (ZooKeeper,
Solr home and data) and everything went great.

Thanks again,
Mahmoud


On Sun, Aug 6, 2017 at 12:47 AM, Erick Erickson 
wrote:

> bq: I was envisioning a scenario where the entire solr home is on the old
> volume that's going away.  If I were setting up a Solr install where the
> large/fast storage was a separate filesystem, I would put the solr home
> (or possibly even the entire install) under that mount point.  It would
> be a lot easier than setting dataDir in core.properties for every core,
> especially in a cloud install.
>
> Agreed. Nothing in what I said precludes this. If you don't specify
> dataDir,
> then the index for a new replica goes in the default place, i.e. under
> your install
> directory usually. In your case under your new mount point. I usually don't
> recommend trying to take control of where dataDir points, just let it
> default.
> I only mentioned it so you'd be aware it exists. So if your new install
> is associated with a bigger/better/larger EBS it's all automatic.
>
> bq: If the dataDir property is already in use to relocate index data, then
> ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
> expect most SolrCloud users to use that method.
>
> I really don't understand this. Each Solr replica has an associated
> dataDir whether you specified it or not (the default is relative to
> the core.properties file). ADDREPLICA creates a new replica in a new
> place, initially the data directory and index are empty. The new
> replica goes into recovery and uses the standard replication process
> to copy the index via HTTP from a healthy replica and write it to its
> data directory. Once that's done, the replica becomes live. There's
> nothing about dataDir already being in use here at all.
>
> When you start Solr there's the default place Solr expects to find the
> replicas. This is not necessarily where Solr is executing from, see
> the "-s" option in bin/solr start -s.
>
> If you're talking about using dataDir to point to an existing index,
> yes that would be a problem and not something I meant to imply at all.
>
> Why wouldn't most SolrCloud users use ADDREPLICA/DELETEREPLICA? It's
> commonly used to move replicas around a cluster.
>
> Best,
> Erick
>
> On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey  wrote:
> > On 8/2/2017 9:17 AM, Erick Erickson wrote:
> >> Not entirely sure about AWS intricacies, but getting a new replica to
> >> use a particular index directory in the general case is just
> >> specifying dataDir=some_directory on the ADDREPLICA command. The index
> >> just needs an HTTP connection (uses the old replication process) so
> >> nothing huge there. Then DELETEREPLICA for the old one. There's
> >> nothing that ZK has to know about to make this work, it's all local to
> >> the Solr instance.
> >
> > I was envisioning a scenario where the entire solr home is on the old
> > volume that's going away.  If I were setting up a Solr install where the
> > large/fast storage was a separate filesystem, I would put the solr home
> > (or possibly even the entire install) under that mount point.  It would
> > be a lot easier than setting dataDir in core.properties for every core,
> > especially in a cloud install.
> >
> > If the dataDir property is already in use to relocate index data, then
> > ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
> > expect most SolrCloud users to use that method.
> >
> > Thanks,
> > Shawn
> >
>


RE: 6.6.0 getNumShards() NPE?!

2017-08-10 Thread Markus Jelsma
I can now reproduce it on the two shard, two replica cluster.

It does NOT happen on the collection_shard1_replica1 and 
collection_shard2_replica1 nodes.

It happens consistently on the collection_shard1_replica2 and 
collection_shard2_replica2 nodes.

Any ideas?

-Original message-
> From:Markus Jelsma 
> Sent: Thursday 10th August 2017 12:34
> To: Solr-user 
> Subject: 6.6.0 getNumShards() NPE?!
> 
> Hello,
> 
> Having trouble, again, with CloudDescriptor and friend, getting the number of 
> shards of the collection. It sometimes returns 1 for a collection of two 
> shards. Having this code:
> 
>   cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
>   return cloudDescriptor.getNumShards();
> 
> In some cases cloudDescriptor.getNumShards() throws an NPE. Am I doing 
> something wrong?
> 
> Thanks,
> Markus
> 


6.6.0 getNumShards() NPE?!

2017-08-10 Thread Markus Jelsma
Hello,

Having trouble, again, with CloudDescriptor and friend, getting the number of 
shards of the collection. It sometimes returns 1 for a collection of two 
shards. Having this code:

  cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
  return cloudDescriptor.getNumShards();

In some cases cloudDescriptor.getNumShards() throws an NPE. Am I doing 
something wrong?

Thanks,
Markus


Getting solr source count without using search query every time

2017-08-10 Thread Selvam Raman
Hi All,

I am using a SolrCloud environment to search and index data.

Example:

Source_field_s: A, B, C, etc.

Expected result: A(100), B(200), C(50), etc.

The data is stored in Solr. Every second (or every 10 seconds) I need to get
a facet on source (A, B, C) to produce statistics. I do not want to disturb
the production Solr for this facet, as it is already serving end-user
requests and indexing data.

I read about the CDCR approach; I could use the CDCR target to get only the
statistics, but that is a waste of storage. I do not care about the data
itself, only the count for each source.

Is there any internal approach available to get stats or a facet on
source_field every second, or per batch, whenever there is an update to the
index (add, delete, update)?

Like CDCR, is there a way to push only the source counts (facet query
result) to a target, kept in sync with the source Solr?

Could you please suggest an approach to handle this problem?
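
Whichever transport ends up feeding the statistics, turning a facet response
into the A(100)-style output is mechanical; a sketch in Python, assuming the
classic facet_counts JSON shape (the response dict below is hand-written for
illustration, not from a live Solr):

```python
# Hedged sketch: format Solr facet counts as "A(100)"-style lines.
# The dict below is a hand-written assumption of the classic
# facet_counts shape from q=*:*&facet=true&facet.field=source_field_s.
response = {
    "facet_counts": {
        "facet_fields": {
            # Solr returns a flat [term, count, term, count, ...] list.
            "source_field_s": ["A", 100, "B", 200, "C", 50]
        }
    }
}

def facet_pairs(resp, field):
    """Pair up the flat term/count list Solr returns for a facet field."""
    flat = resp["facet_counts"]["facet_fields"][field]
    return list(zip(flat[::2], flat[1::2]))

for term, count in facet_pairs(response, "source_field_s"):
    print(f"{term}({count})")  # A(100), then B(200), then C(50)
```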

Thanks,

Selvam R