Re: Solr Json Facet

2018-05-08 Thread Erick Erickson
Follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You
must use the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.


On Tue, May 8, 2018 at 9:23 PM, Asher Shih  wrote:
> unsubscribe
>
> On Tue, May 8, 2018 at 9:19 PM, Kojo  wrote:
>> Everything is working now. The code is not that clean and I am rewriting it, so I
>> don't know exactly what was wrong, but something was malformed.
>>
>> I would like to ask another question regarding JSON facets.
>>
>> With the GET method, I used to send many fq parameters on the same query, each one
>> with its own tag. It worked wonderfully.
>>
>> With the POST method, posting more than one fq parameter is a little
>> complicated, so I am joining all queries into one fq with all the tags. When
>> I select the first facet everything seems to be OK, but when I select the
>> second facet it "clears" the first filter for the facets, so this second facet
>> shows all of its original values even though the result set is filtered as
>> expected. I will run more tests to understand the mechanics of this, but if
>> someone has some advice on this subject I would appreciate it a lot.
>>
>> Thank you,
>>
>>
>>
>>
>>
>> 2018-05-08 23:54 GMT-03:00 Yonik Seeley :
>>
>>> Looks like some sort of proxy server in between the python client and
>>> solr server.
>>> I would still check first whether the output from the python client is
>>> correctly escaped/encoded HTTP.
>>>
>>> One easy way is to use netcat to pretend to be a server:
>>> $ nc -l 8983
>>> Then point the python client at that and send the request.
>>>
>>> -Yonik
>>>
>>>
>>> On Tue, May 8, 2018 at 9:17 PM, Kojo  wrote:
>>> > Thank you all. I tried escaping but still not working
>>> >
>>> > Yonik, I am using Python Requests. It works if my fq is a single word,
>>> even
>>> > if I use double quotes on this single word without escaping.
>>> >
>>> > This is the HTTP response:
>>> >
>>> > response.content
>>> > 
>>> > '>> > 2.0//EN">\n\n400 Bad
>>> > Request\n\nBad Request\nYour browser
>>> sent
>>> > a request that this server could not understand.>> > />\n\n\nApache/2.2.15 (Oracle) Server at leydenh Port
>>> > 80\n\n'
>>> >
>>> >
>>> > Thank you,
>>> >
>>> >
>>> >
>>> > 2018-05-08 18:46 GMT-03:00 Yonik Seeley :
>>> >
>>> >> On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
>>> >> > If I tag the fq query and I query for a simple word it works fine too.
>>> >> But
>>> >> > if query a multi word with space in the middle it breaks:
>>> >>
>>> >> Most likely the full query is not getting to Solr because of an HTTP
>>> >> protocol error (i.e. the request is not encoded correctly).
>>> >> How are you sending your request to Solr (with curl, or with some other
>>> >> method?)
>>> >>
>>> >> -Yonik
>>> >>
>>>


Re: How to do multi-threading indexing on huge volume of JSON files?

2018-05-08 Thread Erick Erickson
I'd seriously consider a SolrJ program rather than posting files; posting
files is really intended as a simple way to get started, and when it comes
to indexing large volumes it's not very efficient.

As a comparison, I index 3-4K docs/second (Wikipedia dump) on my macbook pro.

Note that if each of your business flows has that many documents, you're
talking about 12 billion documents in total, so I hope you're sharding!

Here's some SolrJ to get you started. Note you'll pretty much throw
out the Tika and RDBMS in favor of constructing the SolrInputDocuments
from parsing your data with your favorite JSON parser.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Then you can rack N of these SolrJ programs (each presumably working
on a separate subset of the data) to get your indexing speed up to
what you need.

95% of the time slow indexing is because of the ETL pipeline. One key
is to check the CPU usage on your Solr server and see if it's running
hot or not. If not, then you aren't feeding docs fast enough to Solr.

Do batch docs together as in the program; I typically start with
batches of 1,000 docs.
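
If SolrJ isn't an option right away, here is a rough sketch of the same
batch-and-parallelize idea in plain Python against the JSON update handler.
The collection name, input path, worker count, and the one-JSON-document-per-line
file layout are assumptions for illustration, not part of the original question:

    import glob
    import json
    import requests
    from concurrent.futures import ThreadPoolExecutor

    UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commitWithin=60000"
    BATCH_SIZE = 1000
    WORKERS = 8

    def parse_file(path):
        # Placeholder parsing: assume one JSON document per line in each file.
        with open(path) as f:
            return [json.loads(line) for line in f if line.strip()]

    def send(docs):
        # POST a JSON array of documents to the update handler.
        r = requests.post(UPDATE_URL, data=json.dumps(docs),
                          headers={"Content-Type": "application/json"})
        r.raise_for_status()

    def index_files(paths):
        batch = []
        for path in paths:
            batch.extend(parse_file(path))
            if len(batch) >= BATCH_SIZE:
                send(batch)
                batch = []
        if batch:
            send(batch)

    paths = glob.glob("/data/json/*.json")               # placeholder input location
    chunks = [paths[i::WORKERS] for i in range(WORKERS)]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(index_files, chunks))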

Best,
Erick


On Tue, May 8, 2018 at 8:25 PM, Raymond Xie  wrote:
> I have a huge number of JSON files to be indexed in Solr. It takes me 22
> minutes to index 300,000 JSON files which were generated from a single bz2
> file, and this is only 0.25% of the total amount of data from the same business
> flow; there are 100+ business flows to be indexed.
>
> I absolutely need a good solution for this. At the moment I point post.jar
> at the folder and run it in a single thread.
>
> What is the best practice for multi-threaded indexing? Can
> anyone provide a detailed example?
>
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*


Re: Solr Json Facet

2018-05-08 Thread Asher Shih
unsubscribe

On Tue, May 8, 2018 at 9:19 PM, Kojo  wrote:
> Everything is working now. The code is not that clean and I am rewriting it, so I
> don't know exactly what was wrong, but something was malformed.
>
> I would like to ask another question regarding JSON facets.
>
> With the GET method, I used to send many fq parameters on the same query, each one
> with its own tag. It worked wonderfully.
>
> With the POST method, posting more than one fq parameter is a little
> complicated, so I am joining all queries into one fq with all the tags. When
> I select the first facet everything seems to be OK, but when I select the
> second facet it "clears" the first filter for the facets, so this second facet
> shows all of its original values even though the result set is filtered as
> expected. I will run more tests to understand the mechanics of this, but if
> someone has some advice on this subject I would appreciate it a lot.
>
> Thank you,
>
>
>
>
>
> 2018-05-08 23:54 GMT-03:00 Yonik Seeley :
>
>> Looks like some sort of proxy server in between the python client and
>> solr server.
>> I would still check first whether the output from the python client is
>> correctly escaped/encoded HTTP.
>>
>> One easy way is to use netcat to pretend to be a server:
>> $ nc -l 8983
>> Then point the python client at that and send the request.
>>
>> -Yonik
>>
>>
>> On Tue, May 8, 2018 at 9:17 PM, Kojo  wrote:
>> > Thank you all. I tried escaping but still not working
>> >
>> > Yonik, I am using Python Requests. It works if my fq is a single word,
>> even
>> > if I use double quotes on this single word without escaping.
>> >
>> > This is the HTTP response:
>> >
>> > response.content
>> > 
>> > '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>400 Bad
>> > Request</title>\n</head><body>\n<h1>Bad Request</h1>\n<p>Your browser sent
>> > a request that this server could not understand.<br />\n</p>\n<hr>\n<address>Apache/2.2.15 (Oracle) Server at leydenh Port
>> > 80</address>\n</body></html>\n'
>> >
>> >
>> > Thank you,
>> >
>> >
>> >
>> > 2018-05-08 18:46 GMT-03:00 Yonik Seeley :
>> >
>> >> On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
>> >> > If I tag the fq query and I query for a simple word it works fine too.
>> >> But
>> >> > if query a multi word with space in the middle it breaks:
>> >>
>> >> Most likely the full query is not getting to Solr because of an HTTP
>> >> protocol error (i.e. the request is not encoded correctly).
>> >> How are you sending your request to Solr (with curl, or with some other
>> >> method?)
>> >>
>> >> -Yonik
>> >>
>>


Re: Solr Json Facet

2018-05-08 Thread Kojo
Everything is working now. The code is not that clean and I am rewriting it, so I
don't know exactly what was wrong, but something was malformed.

I would like to ask another question regarding JSON facets.

With the GET method, I used to send many fq parameters on the same query, each one
with its own tag. It worked wonderfully.

With the POST method, posting more than one fq parameter is a little
complicated, so I am joining all queries into one fq with all the tags. When
I select the first facet everything seems to be OK, but when I select the
second facet it "clears" the first filter for the facets, so this second facet
shows all of its original values even though the result set is filtered as
expected. I will run more tests to understand the mechanics of this, but if
someone has some advice on this subject I would appreciate it a lot.

Thank you,





2018-05-08 23:54 GMT-03:00 Yonik Seeley :

> Looks like some sort of proxy server in between the python client and
> solr server.
> I would still check first whether the output from the python client is
> correctly escaped/encoded HTTP.
>
> One easy way is to use netcat to pretend to be a server:
> $ nc -l 8983
> Then point the python client at that and send the request.
>
> -Yonik
>
>
> On Tue, May 8, 2018 at 9:17 PM, Kojo  wrote:
> > Thank you all. I tried escaping but still not working
> >
> > Yonik, I am using Python Requests. It works if my fq is a single word,
> even
> > if I use double quotes on this single word without escaping.
> >
> > This is the HTTP response:
> >
> > response.content
> > 
> > '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>400 Bad
> > Request</title>\n</head><body>\n<h1>Bad Request</h1>\n<p>Your browser sent
> > a request that this server could not understand.<br />\n</p>\n<hr>\n<address>Apache/2.2.15 (Oracle) Server at leydenh Port
> > 80</address>\n</body></html>\n'
> >
> >
> > Thank you,
> >
> >
> >
> > 2018-05-08 18:46 GMT-03:00 Yonik Seeley :
> >
> >> On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
> >> > If I tag the fq query and I query for a simple word it works fine too.
> >> But
> >> > if query a multi word with space in the middle it breaks:
> >>
> >> Most likely the full query is not getting to Solr because of an HTTP
> >> protocol error (i.e. the request is not encoded correctly).
> >> How are you sending your request to Solr (with curl, or with some other
> >> method?)
> >>
> >> -Yonik
> >>
>


How to do multi-threading indexing on huge volume of JSON files?

2018-05-08 Thread Raymond Xie
I have a huge number of JSON files to be indexed in Solr. It takes me 22
minutes to index 300,000 JSON files which were generated from a single bz2
file, and this is only 0.25% of the total amount of data from the same business
flow; there are 100+ business flows to be indexed.

I absolutely need a good solution for this. At the moment I point post.jar
at the folder and run it in a single thread.

What is the best practice for multi-threaded indexing? Can
anyone provide a detailed example?



**
*Sincerely yours,*


*Raymond*


How to do indexing on remote location

2018-05-08 Thread Raymond Xie
Please don't take this as a joke! Any suggestion is welcome and appreciated.

I have data on a remote WORM drive in a cluster that includes 3 hosts; each
host contains the same copy of the data.

I have a Solr server on a different host and need to index the data on the
WORM drive.

It is said that indexing can only be done on the local host, or on HDFS if it
is in the same cluster.

I proposed creating a mapped drive/mount so the Solr server would see the
WORM drive as a local location.

The proposal was returned today by management, saying that a cross mount
potentially introduces risk, and I was asked to figure out a workaround to
do the indexing on the remote host without the cross mount.


Thank you very much.

**
*Sincerely yours,*


*Raymond*


Re: Solr Json Facet

2018-05-08 Thread Yonik Seeley
Looks like some sort of proxy server in between the python client and
solr server.
I would still check first whether the output from the python client is
correctly escaped/encoded HTTP.

One easy way is to use netcat to pretend to be a server:
$ nc -l 8983
Then point the python client at that and send the request.
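
For example, a rough sketch of that check with Python Requests (the collection
name below is a placeholder; the parameters are taken from the earlier
messages in this thread). Passing a list for "fq" also sends the filters as
repeated parameters, so each one keeps its own tag:

    import requests

    params = {
        "q": "*:*",
        "fl": "*",
        # more tagged filters can be appended to this list
        "fq": ['{!tag=city_colaboration_tag}city_colaboration:"College Station"'],
        "json.facet": '{city_colaboration:{type:terms, field:city_colaboration,'
                      'limit:5000, domain:{excludeTags:city_colaboration_tag}}}',
    }

    try:
        # With "nc -l 8983" running in place of Solr, the raw form-encoded body
        # shows up in the nc terminal; nc never answers, so a timeout is expected.
        requests.post("http://localhost:8983/solr/mycollection/select",
                      data=params, timeout=5)
    except requests.exceptions.RequestException:
        pass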

-Yonik


On Tue, May 8, 2018 at 9:17 PM, Kojo  wrote:
> Thank you all. I tried escaping but still not working
>
> Yonik, I am using Python Requests. It works if my fq is a single word, even
> if I use double quotes on this single word without escaping.
>
> This is the HTTP response:
>
> response.content
> 
> '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>400 Bad
> Request</title>\n</head><body>\n<h1>Bad Request</h1>\n<p>Your browser sent
> a request that this server could not understand.<br />\n</p>\n<hr>\n<address>Apache/2.2.15 (Oracle) Server at leydenh Port
> 80</address>\n</body></html>\n'
>
>
> Thank you,
>
>
>
> 2018-05-08 18:46 GMT-03:00 Yonik Seeley :
>
>> On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
>> > If I tag the fq query and I query for a simple word it works fine too.
>> But
>> > if query a multi word with space in the middle it breaks:
>>
>> Most likely the full query is not getting to Solr because of an HTTP
>> protocol error (i.e. the request is not encoded correctly).
>> How are you sending your request to Solr (with curl, or with some other
>> method?)
>>
>> -Yonik
>>


Re: Solr Json Facet

2018-05-08 Thread Kojo
Thank you all. I tried escaping, but it is still not working.

Yonik, I am using Python Requests. It works if my fq is a single word, even
if I use double quotes on this single word without escaping.

This is the HTTP response:

response.content

'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>400 Bad
Request</title>\n</head><body>\n<h1>Bad Request</h1>\n<p>Your browser sent
a request that this server could not understand.<br />\n</p>\n<hr>\n<address>Apache/2.2.15 (Oracle) Server at leydenh Port
80</address>\n</body></html>\n'


Thank you,



2018-05-08 18:46 GMT-03:00 Yonik Seeley :

> On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
> > If I tag the fq query and I query for a simple word it works fine too.
> But
> > if query a multi word with space in the middle it breaks:
>
> Most likely the full query is not getting to Solr because of an HTTP
> protocol error (i.e. the request is not encoded correctly).
> How are you sending your request to Solr (with curl, or with some other
> method?)
>
> -Yonik
>


Rule based replica placement solr cloud 6.2.1

2018-05-08 Thread Natarajan, Rajeswari
 Hi,

I would like to set up the rule below in Solr Cloud 6.2.1. I am not sure how to
model this with the default snitch. Any suggestions?

Don’t assign more than 1 replica of this collection to a host


Regards,
Rajeswari



Re: Solr Json Facet

2018-05-08 Thread Yonik Seeley
On Tue, May 8, 2018 at 1:36 PM, Kojo  wrote:
> If I tag the fq query and I query for a simple word it works fine too. But
> if query a multi word with space in the middle it breaks:

Most likely the full query is not getting to Solr because of an HTTP
protocol error (i.e. the request is not encoded correctly).
How are you sending your request to Solr (with curl, or with some other method?)

-Yonik


Re: Solr Json Facet

2018-05-08 Thread Shawn Heisey
On 5/8/2018 11:36 AM, Kojo wrote:
> If I tag the fq query and I query for a simple word it works fine too. But
> if query a multi word with space in the middle it breaks:
>
> {'q':'*:*', 'fl': '*',
> 'fq':'{!tag=city_colaboration_tag}city_colaboration:"College
> Station"', 'json.facet': '{city_colaboration:{type:terms, field:
> city_colaboration ,limit:5000, domain:{excludeTags:city_
> colaboration_tag}}}'}

Best guess is that this is happening because your JSON fails
validation.  One of the rules is that quotes must be escaped if you want
to use a literal quote.

Putting your JSON into a validator, it gets flagged with a BUNCH of errors.

https://jsonformatter.curiousconcept.com/

I think I managed to fix it.  Here's a new version that passes strict
validation.  The paste will expire one month from now:

https://apaste.info/M46c

I also fixed/validated the inner json in the json.facet parameter before
I escaped it.  As you can see, nested json is messy when it is correctly
formed.

This is the tool I used for the escaping:

https://codebeautify.org/json-escape-unescape

Development libraries for constructing JSON data would probably handle
the escaping automatically.
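
For instance, a small sketch of letting Python's json module do that work
(the field and tag names are the ones from the original question):

    import json

    json_facet = json.dumps({
        "city_colaboration": {
            "type": "terms",
            "field": "city_colaboration",
            "limit": 5000,
            "domain": {"excludeTags": "city_colaboration_tag"},
        }
    })

    params = {
        "q": "*:*",
        "fl": "*",
        "fq": '{!tag=city_colaboration_tag}city_colaboration:"College Station"',
        "json.facet": json_facet,
    }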

The JSON parser that Solr uses can handle some deviations from the
strict standard, but not ALL deviations.  Using data that passes strict
validation will make success more likely.  It's not what I would do, but
you could probably also get this working just by escaping the quotes
around the query text:
\"College Station\"

Thanks,
Shawn



Re: Must clause with filter queries

2018-05-08 Thread Shawn Heisey
On 5/8/2018 9:58 AM, root23 wrote:
> In case of frange query how do we specify the Must clause ?

Looking at how frange works, I'm pretty sure that all queries with
frange are going to be effectively single-clause.  So you don't need to
specify MUST -- it's implied.

> the reason we are using frange instead of the normal syntax is that we need
> to add a cost to this clause. Since this will return a lot of documents, we
> want to calculate at the end of all the clauses. That is why we are using
> frange with a cost of 200.

Ah, you want it to be a postFilter, which frange supports, but the
standard lucene parser doesn't.  FYI, to actually achieve a postFilter,
you need to set cache=false in addition to a cost of 100 or higher. 
It's not possible to cache postFilters because of how they work, so they
must be uncached.  Which also means you don't need to worry about using
NOW/DAY date rounding.
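
For example, the frange clause from the original question would become
something like this as an uncached post-filter (a sketch; cache=false plus a
cost of 100 or more is what makes it a post-filter):

    fq={!frange cache=false cost=200 l=NOW-179DAYS u=NOW/DAY+1DAY incl=true incu=false}date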

See the "Expensive Filters" section on this blog post for an example
with frange that includes cache=false and cost=200:

https://lucidworks.com/2012/02/10/advanced-filter-caching-in-solr/

The requirement for cache=false is not mentioned on the blog post
above.  It was this post that alerted me to that requirement:

https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/

> We have near real time requirements and that is the reason we are using 500
> ms in the autosoft commit.
> We have autowarmCount="60%" for filter cache.

What is the size of the filterCache?  Chances are very good that this
translates to a fairly high autowarmCount, and that it is making your
automatic soft commits take far longer than 500 milliseconds.  If the
warming is slow, then you're not getting the half-second latency anyway,
so configuring it is at best a waste of resources, and at worst a big
performance problem.

Achieving NRT indexing requires turning off all warming.  To see how
long it took to warm the searcher on the last commit, go to the admin
UI.  Choose your index from the dropdown, click on Plugins/Stats, click
on CORE, then open the "searcher" entry.  In the displayed information
will be "warmupTime", with a value in milliseconds.  I'm betting that
this number will be larger than 500.  If I'm wrong about that, then you
might not have anything to worry about.

You can also see warmup times for the individual caches with the CACHE
entry in Plugins/Stats.  Typically it's filterCache that takes the longest.

https://www.dropbox.com/s/izwad4h2vl1z752/solr-filtercache-stats.png?dl=0

A long time ago, I was having issues on my servers with commits taking a
minute or more.  I discovered that it was autowarming on the filterCache
that caused it.  So I reduced autowarmCount on that cache. Eventually I
got to an autowarmCount of *four*.  Not 4 percent, I am literally doing
warming from the top 4 cache entries.  Even with the count that low,
commits still sometimes take 10 seconds or more, and the vast majority
of that time is spent executing those four warming queries from the
filterCache.

Thanks,
Shawn



Re: Solr Json Facet

2018-05-08 Thread Mikhail Khludnev
Single backslash escaping works for me.

On Tue, May 8, 2018 at 8:36 PM, Kojo  wrote:

> Hello,
> recently I have changed the way I get facet data from Solr. I was using GET
> method on request but due to the limit of the query I changed to POST
> method.
>
> Below is a sample of the data I send to Solr in order to get facets. But
> there is something here that I don't understand.
>
> If I do not tag the fq query, it works fine:
> {'q':'*:*', 'fl': '*', 'fq':'city_colaboration:"College Station"',
> 'json.facet': '{city_colaboration:{type:terms, field: city_colaboration
> ,limit:5000}}'}
>
> If I tag the fq query and I query for a simple word it works fine too. But
> if query a multi word with space in the middle it breaks:
>
> {'q':'*:*', 'fl': '*',
> 'fq':'{!tag=city_colaboration_tag}city_colaboration:"College
> Station"', 'json.facet': '{city_colaboration:{type:terms, field:
> city_colaboration ,limit:5000, domain:{excludeTags:city_
> colaboration_tag}}}'}
>
>
> All of this works fine with the GET method, but breaks with the POST method.
>
>
> Below is the portion of the log. I really appreciate your help.
>
> Regards,
> Koji
>
>
>
> 01:49
> ERROR true
> RequestHandlerBase
> org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
> Cannot parse 'city_colaboration:"College': Lexical error at line 1,
> column 34. Encountered: <EOF> after : "\"College"
> org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
> Cannot parse 'cidade_colaboracao_exact:"College': Lexical error at line 1,
> column 34.  Encountered: <EOF> after : "\"College"
> at org.apache.solr.handler.component.QueryComponent.
> prepare(QueryComponent.java:219)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> SearchHandler.java:270)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:361)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:305)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> RewriteHandler.java:335)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceConsume(ExecuteProduceConsume.java:303)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(ExecuteProduceConsume.java:148)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProduceConsume.java:136)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:748)
>



-- 
Sincerely yours
Mikhail Khludnev


managed resources and SolrJ

2018-05-08 Thread Hendrik Haddorp

Hi,

we are looking into using managed resources for synonyms via the
ManagedSynonymGraphFilterFactory. It seems like there is no SolrJ API 
for that. I would be especially interested in one via the 
CloudSolrClient. I found 
http://lifelongprogrammer.blogspot.de/2017/01/build-rest-apis-to-update-solrs-managed-resources.html. 
Is there a better solution?
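
For reference, the managed synonyms REST endpoint can be driven directly in
the meantime; a minimal sketch, with the collection name "mycollection" and
the resource name "english" as placeholders:

    import json
    import requests

    base = ("http://localhost:8983/solr/mycollection"
            "/schema/analysis/synonyms/english")

    # Add or update synonym mappings.
    requests.put(base,
                 data=json.dumps({"mad": ["angry", "upset"]}),
                 headers={"Content-Type": "application/json"})

    # Read the current mappings back.
    print(requests.get(base).json())

    # A core/collection reload is still needed before new synonyms take effect.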


regards,
Hendrik


Solr Json Facet

2018-05-08 Thread Kojo
Hello,
recently I changed the way I get facet data from Solr. I was using the GET
method for the request, but due to the length limit of the query I changed to
the POST method.

Below is a sample of the data I send to Solr in order to get facets. But
there is something here that I don't understand.

If I do not tag the fq query, it works fine:
{'q':'*:*', 'fl': '*', 'fq':'city_colaboration:"College Station"',
'json.facet': '{city_colaboration:{type:terms, field: city_colaboration
,limit:5000}}'}

If I tag the fq query and I query for a simple word it works fine too. But
if query a multi word with space in the middle it breaks:

{'q':'*:*', 'fl': '*',
'fq':'{!tag=city_colaboration_tag}city_colaboration:"College
Station"', 'json.facet': '{city_colaboration:{type:terms, field:
city_colaboration ,limit:5000, domain:{excludeTags:city_
colaboration_tag}}}'}


All of this works fine with the GET method, but breaks with the POST method.


Below is the portion of the log. I really appreciate your help.

Regards,
Koji



01:49
ERROR true
RequestHandlerBase
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Cannot parse 'city_colaboration:"College': Lexical error at line 1,
column 34. Encountered: <EOF> after : "\"College"
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Cannot parse 'cidade_colaboracao_exact:"College': Lexical error at line 1,
column 34.  Encountered: <EOF> after : "\"College"
at org.apache.solr.handler.component.QueryComponent.
prepare(QueryComponent.java:219)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
SearchHandler.java:270)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(
ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(
SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.
doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.
doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.
doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.
doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.
handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(
HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)


Re: Filter Must/Must not clauses and parenthesis

2018-05-08 Thread Alfonso Noriega
Thanks Shawn!

I was not thinking of it as a subtraction, but it makes complete sense put
like that.

On 8 May 2018 at 17:55, Shawn Heisey  wrote:

> On 5/8/2018 4:02 AM, Alfonso Noriega wrote:
>
>>   I found solr 5.5.4 is doing some unexpected behavior (at least
>> unexpected
>> for me) when using Must and Must not operator and parenthesis for
>> filtering
>> and it would be great if someone can confirm if this is unexpected or not
>> and why.
>>
>
> 
>
> Do you have any idea why is this happening?
>>
>
> I'm surprised ANY of those examples are working.  While the bug that Erick
> mentioned could be a problem, I think this is happening because you've got
> a multi-clause pure negative query. All query clauses have NOT attached to
> them.  Purely negative queries do not actually work.
>
> The reason negative queries don't work is that if you start with nothing
> and then start subtracting things, you end up with nothing.
>
> To properly work, the first example would need to be written like this:
>
> *:* AND NOT(status:"DELETED") AND (*:* AND NOT(length:[186+TO+365])
> AND NOT(length:[366+TO+*]))
>
> I have added the all documents query as the starting point for both major
> clauses, so that the subtraction (AND NOT) has something to subtract from.
> Some of those parentheses are unnecessary, but I have preserved them in the
> rewritten query. Without unnecessary parentheses/quotes, the query would
> look like this:
>
> *:* AND NOT status:DELETED AND (*:* AND NOT length:[186+TO+365]
> AND NOT length:[366+TO+*])
>
> You might be wondering why something like "fq=-status:DELETED" will work
> even though it's a purely negative query. This works because with a
> super-simple query like that, Solr is able to detect the unworkable
> situation and automatically fix it by adding the all-docs starting point
> behind the scenes. The example you gave is too complicated for Solr's
> detection to work, so it doesn't get fixed.
>
> Thanks,
> Shawn
>
>


-- 
-- 
Alfonso Noriega
Software engineer
Redlink GmbH
e: alfonso.nori...@redlink.co 
w: http://redlink.co


Re: Must clause with filter queries

2018-05-08 Thread root23
Hi Shawn,
Thanks for the response. We have multiple clauses; I was just giving a
bare-bones example. Usually all our queries will have more than one clause.

In case of frange query how do we specify the Must clause ?

The reason we are using frange instead of the normal syntax is that we need
to add a cost to this clause. Since this clause will match a lot of documents,
we want to evaluate it after all the other clauses. That is why we are using
frange with a cost of 200.

We have near-real-time requirements, and that is the reason we are using 500
ms for the auto soft commit.
We have autowarmCount="60%" for the filter cache.

We are using solr 6.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Filter Must/Must not clauses and parenthesis

2018-05-08 Thread Shawn Heisey

On 5/8/2018 4:02 AM, Alfonso Noriega wrote:

  I found solr 5.5.4 is doing some unexpected behavior (at least unexpected
for me) when using Must and Must not operator and parenthesis for filtering
and it would be great if someone can confirm if this is unexpected or not
and why.





Do you have any idea why is this happening?


I'm surprised ANY of those examples are working.  While the bug that 
Erick mentioned could be a problem, I think this is happening because 
you've got a multi-clause pure negative query. All query clauses have 
NOT attached to them.  Purely negative queries do not actually work.


The reason negative queries don't work is that if you start with nothing 
and then start subtracting things, you end up with nothing.


To properly work, the first example would need to be written like this:

*:* AND NOT(status:"DELETED") AND (*:* AND 
NOT(length:[186+TO+365]) AND NOT(length:[366+TO+*]))


I have added the all documents query as the starting point for both 
major clauses, so that the subtraction (AND NOT) has something to 
subtract from. Some of those parentheses are unnecessary, but I have 
preserved them in the rewritten query. Without unnecessary
parentheses/quotes, the query would look like this:


*:* AND NOT status:DELETED AND (*:* AND NOT length:[186+TO+365] 
AND NOT length:[366+TO+*])


You might be wondering why something like "fq=-status:DELETED" will work 
even though it's a purely negative query. This works because with a 
super-simple query like that, Solr is able to detect the unworkable 
situation and automatically fix it by adding the all-docs starting point 
behind the scenes. The example you gave is too complicated for Solr's 
detection to work, so it doesn't get fixed.


Thanks,
Shawn



Re: Async exceptions during distributed update

2018-05-08 Thread Jay Potharaju
Hi Emir,
I was seeing this error as long as the indexing was running. Once I stopped
the indexing the errors also stopped.  Yes, we do monitor both hosts & solr
but have not seen anything out of the ordinary except for a small network
blip. In my experience solr generally recovers after a network blip and
there are a few errors for streaming solr client...but have never seen this
error before.

Thanks
Jay

Thanks
Jay Potharaju


On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> This is low ingestion rate. What is the size of your index? What is heap
> size? I am guessing that this is not a huge index, so  I am leaning toward
> what Shawn mentioned - some combination of DBQ/merge/commit/optimise that
> is blocking indexing. Though, it is strange that it is happening only on
> one node if you are sending updates randomly to both nodes. Do you monitor
> your hosts/Solr? Do you see anything different at the time when timeouts
> happen?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 8 May 2018, at 03:23, Jay Potharaju  wrote:
> >
> > I have about 3-5 updates per second.
> >
> >
> >> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
> >>
> >>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>> There are some deletes by query. I have not had any issues with DBQ,
> >>> currently have 5.3 running in production.
> >>
> >> Here's the big problem with DBQ.  Imagine this sequence of events with
> >> these timestamps:
> >>
> >> 13:00:00: A commit for change visibility happens.
> >> 13:00:00: A segment merge is triggered by the commit.
> >> (It's a big merge that takes exactly 3 minutes.)
> >> 13:00:05: A deleteByQuery is sent.
> >> 13:00:15: An update to the index is sent.
> >> 13:00:25: An update to the index is sent.
> >> 13:00:35: An update to the index is sent.
> >> 13:00:45: An update to the index is sent.
> >> 13:00:55: An update to the index is sent.
> >> 13:01:05: An update to the index is sent.
> >> 13:01:15: An update to the index is sent.
> >> 13:01:25: An update to the index is sent.
> >> {time passes, more updates might be sent}
> >> 13:03:00: The merge finishes.
> >>
> >> Here's what would happen in this scenario:  The DBQ and all of the
> >> update requests sent *after* the DBQ will block until the merge
> >> finishes.  That means that it's going to take up to three minutes for
> >> Solr to respond to those requests.  If the client that is sending the
> >> request is configured with a 60 second socket timeout, which inter-node
> >> requests made by Solr are by default, then it is going to experience a
> >> timeout error.  The request will probably complete successfully once the
> >> merge finishes, but the connection is gone, and the client has already
> >> received an error.
> >>
> >> Now imagine what happens if an optimize (forced merge of the entire
> >> index) is requested on an index that's 50GB.  That optimize may take 2-3
> >> hours, possibly longer.  A deleteByQuery started on that index after the
> >> optimize begins (and any updates requested after the DBQ) will pause
> >> until the optimize is done.  A pause of 2 hours or more is a BIG
> problem.
> >>
> >> This is why deleteByQuery is not recommended.
> >>
> >> If the deleteByQuery were changed into a two-step process involving a
> >> query to retrieve ID values and then one or more deleteById requests,
> >> then none of that blocking would occur.  The deleteById operation can
> >> run at the same time as a segment merge, so neither it nor subsequent
> >> update requests will have the significant pause.  From what I
> >> understand, you can even do commits in this scenario and have changes be
> >> visible before the merge completes.  I haven't verified that this is the
> >> case.
> >>
> >> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
> >> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Re: Solr Slave failed to initialize collection

2018-05-08 Thread Shawn Heisey

On 5/8/2018 4:32 AM, Aji Viswanadhan wrote:

Did this issue happen due to the size of the index? Are there any
recommendations to prevent it in the future? Please let me know.


I have no idea why it happened.  Running out of disk space could cause 
any number of problems.  Program operation becomes unpredictable if 
resources run out.


Thanks,
Shawn



Re: Filter Must/Must not clauses and parenthesis

2018-05-08 Thread Erick Erickson
Just skimmed, but perhaps related to :
https://issues.apache.org/jira/browse/SOLR-12212?

Best,
Erick

On Tue, May 8, 2018 at 3:02 AM, Alfonso Noriega
 wrote:
> Hi everyone,
>  I found solr 5.5.4 is doing some unexpected behavior (at least unexpected
> for me) when using Must and Must not operator and parenthesis for filtering
> and it would be great if someone can confirm if this is unexpected or not
> and why.
>
> To clarify I will write an example:
> The following problematic query should give results but it is actually not
> giving any:
> q=*:*&defType=edismax&fq=NOT(status:"DELETED")+AND+(NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*]))
> which is parsed as -dynamic_multi_stored_facet_string_static_status:DELETED
> +(-dynamic_multi_stored_facet_long_core_length:[186 TO 365]
> -dynamic_multi_stored_facet_long_core_length:[366 TO *])
>
> If I rewrite the query removing the enclosing parentheses as
> q=*:*&defType=edismax&fq=NOT(status:"DELETED")+AND+NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*])
> is parsed as -dynamic_multi_stored_facet_string_static_status:DELETED
> -dynamic_multi_stored_facet_long_core_length:[186 TO 365]
> -dynamic_multi_stored_facet_long_core_length:[366 TO *]
> and it gives the expected results.
>
> Again if the parenthesis enclosed condition is alone as
> q=*:*&defType=edismax&fq=(NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*]))
> it is parsed as (-dynamic_multi_stored_facet_long_core_length:[186 TO
> 365] -dynamic_multi_stored_facet_long_core_length:[366 TO *]) and
> giving more results.
>
> Do you have any idea why is this happening?
>
> Thanks for your help,
> Alfonso.
>
> --
> Alfonso Noriega
> Software engineer
> Redlink GmbH
> e: alfonso.nori...@redlink.co 
> w: http://redlink.co


Re:LTR performance issues

2018-05-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello ilayaraja, 

I think it would be good to move this discussion on the Jira item: 

https://issues.apache.org/jira/browse/SOLR-8776?attachmentOrder=asc

You can add your comments there, and also in the page I explained how it works. 
On the performance you are right: at the moment it is slow. 

We recently improved the performance a lot for the particular use case where 
you are interested only in one document per group (first part of the change
has been upstreamed in the las vegas patch [1]).

For the general case, my opinion is that we could speed up by allowing the user 
to rerank only the groups (without affecting the order of the documents 
**within** each group). 

1. How many top groups are actually re-ranked, is it exactly what we pass in
reRankDocs?

> rerankDocs will rerank the top $rerankDocs groups, so if your groups contain
> many documents you will end up reranking many more documents than that.

2. How many documents within each group is re-ranked? Can we control it with
group.limit or some other parameter?

> $rerankDocs documents  will be reranked inside each group - please double 
> check on the jira and add your comments there. 

Cheers,
Diego


[1] 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=16316605#comment-16316605


From: solr-user@lucene.apache.org At: 05/08/18 07:07:01 To:
solr-user@lucene.apache.org
Subject: LTR performance issues

LTR with grouping results in very high latency (3x) even while re-ranking 24
top groups.

How is re-ranking implemented in Solr? Is it expected that it would result
in 3x more query time?

Need clarifications on:
1. How many top groups are actually re-ranked, is it exactly what we pass in
reRankDocs?
2. How many documents within each group is re-ranked? Can we control it with
group.limit or some other parameter?

What causes LTR to take more time when grouping is performed? Is it scoring the
documents again, or merging the re-ranked docs with the rest of the docs?

Is there any way to optimize this?


-
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Howto disable PrintGCTimeStamps in Solr

2018-05-08 Thread Bernd Fehling
Hi Shawn,

the issue is that some GC viewers get confused if both DateStamps and TimeStamps
are present in the solr_gc.log file. It is _not_ about reducing the GC log size;
that would be stupid.
Now I have a Perl script which removes the TimeStamps (and leaves only the
DateStamps) so that solr_gc.log can be analyzed with those GC viewers.
Problem solved :-)
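
A rough Python equivalent of that clean-up script, assuming the usual
"2018-05-08T10:11:12.131+0200: 123.456: [GC ..." line prefix (adjust the
pattern if your log differs):

    import re
    import sys

    # Keep the DateStamp, drop the seconds-since-start TimeStamp that follows it.
    pattern = re.compile(
        r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}): +\d+\.\d+: ")

    for line in sys.stdin:
        sys.stdout.write(pattern.sub(r"\1: ", line))

It can be run as: python strip_gc_timestamps.py < solr_gc.log > solr_gc_clean.log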

Generally I can understand that DateStamps or TimeStamps are added by default
when logging to a file, but it should be possible to have only one type and
not both at once.

Thanks for filing the bug report, I missed that.

Regards
Bernd


Am 08.05.2018 um 11:32 schrieb Shawn Heisey:
> On 5/7/2018 8:22 AM, Bernd Fehling wrote:
>> thanks for asking, I figured it out this morning.
>> If setting -Xloggc= the option -XX:+PrintGCTimeStamps will be set
>> as default and can't be disabled. It's inside JAVA.
>>
>> Currently using Solr 6.4.2 with
>> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE 
>> (1.8.0_121-b13)
> 
> What is the end goal that has you trying to disable PrintGCTimeStamps? 
> Is it to reduce the size of the GC log by only including one timestamp,
> or something else?
> 
> Running java 1.8.0_144, I cannot seem to actually do it.  I tried
> removing the parameter from the start script, and I also tried
> *changing* the parameter to explicitly disable it:
> 
>  -XX:-PrintGCTimeStamps
> 
> Both times, I verified that the commandline had changed.  GC logging
> still includes both the full date stamp, which PrintGCDateStamps
> enables, and seconds since JVM start, which PrintGCTimeStamps enables.
> 
> For the attempt where I changed the parameter instead of removing it,
> this is the full commandline on the running java process that the start
> script executed:
> 
> "C:\Program Files\Java\jdk1.8.0_144\bin\java"  -server -Xms512m -Xmx512m
> -Duser.timezone=UTC -XX:NewRatio=3    -XX:SurvivorRatio=4   
> -XX:TargetSurvivorRatio=90    -XX:MaxTenuringThreshold=8   
> -XX:+UseConcMarkSweepGC    -XX:ConcGCThreads=4
> -XX:ParallelGCThreads=4    -XX:+CMSScavengeBeforeRemark   
> -XX:PretenureSizeThreshold=64m    -XX:+UseCMSInitiatingOccupancyOnly   
> -XX:CMSInitiatingOccupancyFraction=50   
> -XX:CMSMaxAbortablePrecleanTime=6000    -XX:+CMSParallelRemarkEnabled   
> -XX:+ParallelRefProcEnabled    -XX:-OmitStackTraceInFastThrow
> -verbose:gc  -XX:+PrintHeapAtGC  -XX:+PrintGCDetails 
> -XX:-PrintGCTimeStamps  -XX:+PrintGCDateStamps 
> -XX:+PrintTenuringDistribution  -XX:+PrintGCApplicationStoppedTime
> "-Xloggc:C:\Users\sheisey\Downloads\solr-7.3.0\server\logs\solr_gc.log"
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Xss256k 
> -Dsolr.log.dir="C:\Users\sheisey\Downloads\solr-7.3.0\server\logs"
> -Dlog4j.configuration="file:C:\Users\sheisey\Downloads\solr-7.3.0\server\resources\log4j.properties"
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Dsolr.log.muteconsole
> -Dsolr.solr.home="C:\Users\sheisey\Downloads\solr-7.3.0\server\solr"
> -Dsolr.install.dir="C:\Users\sheisey\Downloads\solr-7.3.0"
> -Dsolr.default.confdir="C:\Users\sheisey\Downloads\solr-7.3.0\server\solr\configsets\_default\conf"
> 
> -Djetty.host=0.0.0.0 -Djetty.port=8983
> -Djetty.home="C:\Users\sheisey\Downloads\solr-7.3.0\server"
> -Djava.io.tmpdir="C:\Users\sheisey\Downloads\solr-7.3.0\server\tmp" -jar
> start.jar "--module=http" ""
> 
> That change should have done it.  I think we're dealing with a Java
> bug/misfeature.
> 
> Solr 5.5.5 with Java 1.7.0_80, 1.7.0_45, and 1.7.0_04 behave the same as
> 7.3.0 with Java 8.  I have also verified that Solr 4.7.2 with Java
> 1.7.0_72 has the same issue.  I do not have any information for Java 6
> versions.  All java versions examined are from Sun/Oracle.
> 
> I filed a bug with Oracle.  They have accepted it and it is now visible
> publicly.
> 
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8202752
> 
> Thanks,
> Shawn
> 


Re: Solr Slave failed to initialize collection

2018-05-08 Thread Aji Viswanadhan
Hi Shawn ,

Thanks for the info!!

As I mentioned, the master index was fine; only one of the collections in the
slave index was corrupted. Yes, we fixed the issue by removing the corrupted
index and replicating again.

The error message I shared was received from the Solr Admin UI. The replication
strategy seems fine, as replication is happening properly from master to slave.

Did this issue happen due to the size of the index? Are there any
recommendations to prevent it in the future? Please let me know.

Regards,
Aji Viswanadhan



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Filter Must/Must not clauses and parenthesis

2018-05-08 Thread Alfonso Noriega
Hi everyone,
 I found solr 5.5.4 is doing some unexpected behavior (at least unexpected
for me) when using Must and Must not operator and parenthesis for filtering
and it would be great if someone can confirm if this is unexpected or not
and why.

To clarify I will write an example:
The following problematic query should give results but it is actually not
giving any:
q=*:*&defType=edismax&fq=NOT(status:"DELETED")+AND+(NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*]))
which is parsed as -dynamic_multi_stored_facet_string_static_status:DELETED
+(-dynamic_multi_stored_facet_long_core_length:[186 TO 365]
-dynamic_multi_stored_facet_long_core_length:[366 TO *])

If I rewrite the query removing the enclosing parentheses as
q=*:*&defType=edismax&fq=NOT(status:"DELETED")+AND+NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*])
is parsed as -dynamic_multi_stored_facet_string_static_status:DELETED
-dynamic_multi_stored_facet_long_core_length:[186 TO 365]
-dynamic_multi_stored_facet_long_core_length:[366 TO *]
and it gives the expected results.

Again if the parenthesis enclosed condition is alone as
q=*:*&defType=edismax&fq=(NOT(length:[186+TO+365])+AND+NOT(length:[366+TO+*]))
it is parsed as (-dynamic_multi_stored_facet_long_core_length:[186 TO
365] -dynamic_multi_stored_facet_long_core_length:[366 TO *]) and
giving more results.

Do you have any idea why is this happening?

Thanks for your help,
Alfonso.

-- 
Alfonso Noriega
Software engineer
Redlink GmbH
e: alfonso.nori...@redlink.co 
w: http://redlink.co


Re: Determine Solr Core Creation Timestamp

2018-05-08 Thread Atita Arora
Thank you, Shawn, for looking into this in such depth.
Let me try to find a way to grab this information and use it, and I may reach
back out to you or the list for further thoughts.

Thanks again,
Atita

On Tue, May 8, 2018, 3:11 PM Shawn Heisey  wrote:

> On 5/7/2018 3:50 PM, Atita Arora wrote:
> > I noticed the same and hence overruled the idea to use it.
> > Further , while exploring the V2 api (as we're currently in Solr 6.6 and
> > will soon be on Solr 7.X) ,I came across the shards API which has
> > "property.index.version": "1525453818563"
> >
> > Which is listed for each of the shards. I wonder if I should be
> leveraging
> > this as this seems to be the index version & I don't think this number
> should
> > vary on restart.
>
> The index version is a number that is milliseconds since the epoch --
> 1970-01-01 00:00:00 UTC.  This is how Java represents timestamps
> internally.  All Lucene indexes have this information.
>
> The index version value appears to update every time the index changes,
> probably when a new searcher is opened.
>
> For SolrCloud collections, this information is actually already
> available, although getting to it may not be obvious.  ZooKeeper itself
> keeps track of when all znodes are created, so the /collections/x
> znode creation time is effectively what you're after.  This can be seen
> in Cloud->Tree in the admin UI, which means that there is a way to
> obtain the information with an HTTP API.
>
> When cores are created or manipulated by API calls, the core.properties
> file will have a comment with a timestamp of the last time Solr
> wrote/changed the file.  CoreAdmin operations like CREATE, SWAP, RENAME,
> and others will update or create the timestamp in that comment, but if
> the properties file doesn't ever get changed by Solr, then the comment
> would reflect the creation time.  That makes it not entirely reliable.
> Also, I do not know of a way to access that information with any Solr
> API -- access to the filesystem would probably be required.
>
> The core.properties file could be a place to store a true creation time,
> using a new property that Solr doesn't need for any other purpose.  Solr
> could look for a creation time in that file when the core is started and
> update it to include the current time as the creation time if it is not
> present, and certain CoreAdmin operations could also write that
> property.  Retrieving the value would needed to be added to the
> CoreAdmin API.
>
> Thanks,
> Shawn
>
>


Re: Determine Solr Core Creation Timestamp

2018-05-08 Thread Shawn Heisey
On 5/7/2018 3:50 PM, Atita Arora wrote:
> I noticed the same and hence overruled the idea to use it.
> Further , while exploring the V2 api (as we're currently in Solr 6.6 and
> will soon be on Solr 7.X) ,I came across the shards API which has
> "property.index.version": "1525453818563"
>
> Which is listed for each of the shards. I wonder if I should be leveraging
> this as this seems to be the index version & I don't think this number should
> vary on restart.

The index version is a number that is milliseconds since the epoch --
1970-01-01 00:00:00 UTC.  This is how Java represents timestamps
internally.  All Lucene indexes have this information.
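
For example, a quick check in Python that the value quoted earlier in the
thread really is epoch milliseconds:

    from datetime import datetime, timezone

    print(datetime.fromtimestamp(1525453818563 / 1000, tz=timezone.utc))
    # -> 2018-05-04 17:10:18.563000+00:00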

The index version value appears to update every time the index changes,
probably when a new searcher is opened.

For SolrCloud collections, this information is actually already
available, although getting to it may not be obvious.  ZooKeeper itself
keeps track of when all znodes are created, so the /collections/x
znode creation time is effectively what you're after.  This can be seen
in Cloud->Tree in the admin UI, which means that there is a way to
obtain the information with an HTTP API.

When cores are created or manipulated by API calls, the core.properties
file will have a comment with a timestamp of the last time Solr
wrote/changed the file.  CoreAdmin operations like CREATE, SWAP, RENAME,
and others will update or create the timestamp in that comment, but if
the properties file doesn't ever get changed by Solr, then the comment
would reflect the creation time.  That makes it not entirely reliable. 
Also, I do not know of a way to access that information with any Solr
API -- access to the filesystem would probably be required.

The core.properties file could be a place to store a true creation time,
using a new property that Solr doesn't need for any other purpose.  Solr
could look for a creation time in that file when the core is started and
update it to include the current time as the creation time if it is not
present, and certain CoreAdmin operations could also write that
property.  Retrieving the value would needed to be added to the
CoreAdmin API.

Thanks,
Shawn



Re: Must clause with filter queries

2018-05-08 Thread Shawn Heisey
On 5/7/2018 9:51 AM, manuj singh wrote:
> I am kind of confused how must clause(+) behaves with the filter queries.
> e.g i have below query:
> q=*:*=+{!frange cost=200 l=NOW-179DAYS u=NOW/DAY+1DAY incl=true
> incu=false}date
>
> So i am filtering documents which are less then 179 old days.
> So e.g if now is May 7th, 10.23 cst,2018, i should only see documents which
> have date > Nov 9th, 10.23 cst, 2017.
>
> However with the above query i am also seeing documents which are done on
> Nov 5th,2017 (which seems like it is returning some docs from filter cache.
> which is wired because in my date range for the start date  i am using
> NOW-179DAYS and
> Now is changing every time, so it shouldn't go to filtercache as every new
> request will have  a different time stamp. )
>
> However if i remove the + from the filter query it seems to work fine.

I'm not sure that trying to use the + with the frange query makes any
sense.  For one thing, putting anything before the localparams (which is
the {!stuff otherstuff} syntax) probably causes Solr to not correctly
interpret the localparams syntax.  Typically localparams must be at the
very beginning of the query.  Adding a plus to a single-clause query
like that is not necessary.  Queries with one clause will effectively be
interpreted as having the +/MUST on that clause.

> I am mostly thinking it seems to be a filtercache issue but not sure how i
> prove that.
>
> Our auto soft commit is 500 ms , so every 0.5 second we should have a new
> searcher open and cache should be flushed.

A commit interval that low could result in some big problems.  I hope
the autowarmCount setting on all your caches is zero.  If it's not,
you're going to want to have a much longer interval than 500 milliseconds.

> Something is not right and i am not able to figure out what. Has some one
> seen this kind of issue before ?
>
> If i move the query from fq to q then also it works fine.
>
> One more thing when i put debug query i see the following in the parse query
>
> *"QParser": "LuceneQParser", "filter_queries": [ "+{!frange cost=200
> l=NOW-179DAYS u=NOW/DAY+1DAY incl=true incu=false}date", "-_parent_:F" ],
> "parsed_filter_queries": [
> "+FunctionRangeQuery(ConstantScore(frange(date(date)):[NOW-179DAYS TO
> NOW/DAY+1DAY}))", "-_parent_:false" ]*
>
> So in the above i do not see the date getting resolved to an actual time
> stamp.
>
> However if i change the syntax of the query to not use frange and local
> params i see the transaction date resolving into correct timestamp.
>
> So for the following query
> q=*:*=+date:[NOW-179DAYS TO NOW/DAY+1DAY]
>
> i see the following in the debug query, and see the actualy timestamp:
> "QParser": "LuceneQParser", "filter_queries": [ "date:[NOW-179DAYS TO
> NOW/DAY+1DAY]", "-_parent_:F" ], "parsed_filter_queries": [
> "date:[1510242067383
> TO 152573760]", "-_parent_:false" ],

If the filter you're trying to use is this kind of simple date range, I
would stick with lucene and not use localparams to switch to another
parser.  I would also set the low value of the range to NOW/DAY-179DAYS
so there's at least a chance that caching will be effective.  Also, as
mentioned, because this example only has one query clause, adding + is
unnecessary.  It might become necessary if you have multiple query
clauses ... but in that case, you're not likely to be using something
like frange.
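
For example, a sketch of the plain lucene-parser version of that filter, with
the rounded lower bound:

    fq=date:[NOW/DAY-179DAYS TO NOW/DAY+1DAY]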

Thanks,
Shawn



Re: Howto disable PrintGCTimeStamps in Solr

2018-05-08 Thread Shawn Heisey
On 5/7/2018 8:22 AM, Bernd Fehling wrote:
> thanks for asking, I figured it out this morning.
> If setting -Xloggc= the option -XX:+PrintGCTimeStamps will be set
> as default and can't be disabled. It's inside JAVA.
>
> Currently using Solr 6.4.2 with
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE 
> (1.8.0_121-b13)

What is the end goal that has you trying to disable PrintGCTimeStamps? 
Is it to reduce the size of the GC log by only including one timestamp,
or something else?

Running java 1.8.0_144, I cannot seem to actually do it.  I tried
removing the parameter from the start script, and I also tried
*changing* the parameter to explicitly disable it:

 -XX:-PrintGCTimeStamps

Both times, I verified that the commandline had changed.  GC logging
still includes both the full date stamp, which PrintGCDateStamps
enables, and seconds since JVM start, which PrintGCTimeStamps enables.

For the attempt where I changed the parameter instead of removing it,
this is the full commandline on the running java process that the start
script executed:

"C:\Program Files\Java\jdk1.8.0_144\bin\java"  -server -Xms512m -Xmx512m
-Duser.timezone=UTC -XX:NewRatio=3    -XX:SurvivorRatio=4   
-XX:TargetSurvivorRatio=90    -XX:MaxTenuringThreshold=8   
-XX:+UseConcMarkSweepGC    -XX:ConcGCThreads=4
-XX:ParallelGCThreads=4    -XX:+CMSScavengeBeforeRemark   
-XX:PretenureSizeThreshold=64m    -XX:+UseCMSInitiatingOccupancyOnly   
-XX:CMSInitiatingOccupancyFraction=50   
-XX:CMSMaxAbortablePrecleanTime=6000    -XX:+CMSParallelRemarkEnabled   
-XX:+ParallelRefProcEnabled    -XX:-OmitStackTraceInFastThrow
-verbose:gc  -XX:+PrintHeapAtGC  -XX:+PrintGCDetails 
-XX:-PrintGCTimeStamps  -XX:+PrintGCDateStamps 
-XX:+PrintTenuringDistribution  -XX:+PrintGCApplicationStoppedTime
"-Xloggc:C:\Users\sheisey\Downloads\solr-7.3.0\server\logs\solr_gc.log"
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Xss256k 
-Dsolr.log.dir="C:\Users\sheisey\Downloads\solr-7.3.0\server\logs"
-Dlog4j.configuration="file:C:\Users\sheisey\Downloads\solr-7.3.0\server\resources\log4j.properties"
-DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Dsolr.log.muteconsole
-Dsolr.solr.home="C:\Users\sheisey\Downloads\solr-7.3.0\server\solr"
-Dsolr.install.dir="C:\Users\sheisey\Downloads\solr-7.3.0"
-Dsolr.default.confdir="C:\Users\sheisey\Downloads\solr-7.3.0\server\solr\configsets\_default\conf"

-Djetty.host=0.0.0.0 -Djetty.port=8983
-Djetty.home="C:\Users\sheisey\Downloads\solr-7.3.0\server"
-Djava.io.tmpdir="C:\Users\sheisey\Downloads\solr-7.3.0\server\tmp" -jar
start.jar "--module=http" ""

That change should have done it.  I think we're dealing with a Java
bug/misfeature.

Solr 5.5.5 with Java 1.7.0_80, 1.7.0_45, and 1.7.0_04 behave the same as
7.3.0 with Java 8.  I have also verified that Solr 4.7.2 with Java
1.7.0_72 has the same issue.  I do not have any information for Java 6
versions.  All java versions examined are from Sun/Oracle.

I filed a bug with Oracle.  They have accepted it and it is now visible
publicly.

https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8202752

Thanks,
Shawn



Re: Async exceptions during distributed update

2018-05-08 Thread Emir Arnautović
Hi Jay,
This is a low ingestion rate. What is the size of your index? What is the heap
size? I am guessing that this is not a huge index, so I am leaning toward what
Shawn mentioned - some combination of DBQ/merge/commit/optimise that is
blocking indexing. Though, it is strange that it is happening only on one node
if you are sending updates randomly to both nodes. Do you monitor your
hosts/Solr? Do you see anything different at the time when the timeouts happen?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 8 May 2018, at 03:23, Jay Potharaju  wrote:
> 
> I have about 3-5 updates per second.
> 
> 
>> On May 7, 2018, at 5:02 PM, Shawn Heisey  wrote:
>> 
>>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>>> There are some deletes by query. I have not had any issues with DBQ,
>>> currently have 5.3 running in production.
>> 
>> Here's the big problem with DBQ.  Imagine this sequence of events with
>> these timestamps:
>> 
>> 13:00:00: A commit for change visibility happens.
>> 13:00:00: A segment merge is triggered by the commit.
>> (It's a big merge that takes exactly 3 minutes.)
>> 13:00:05: A deleteByQuery is sent.
>> 13:00:15: An update to the index is sent.
>> 13:00:25: An update to the index is sent.
>> 13:00:35: An update to the index is sent.
>> 13:00:45: An update to the index is sent.
>> 13:00:55: An update to the index is sent.
>> 13:01:05: An update to the index is sent.
>> 13:01:15: An update to the index is sent.
>> 13:01:25: An update to the index is sent.
>> {time passes, more updates might be sent}
>> 13:03:00: The merge finishes.
>> 
>> Here's what would happen in this scenario:  The DBQ and all of the
>> update requests sent *after* the DBQ will block until the merge
>> finishes.  That means that it's going to take up to three minutes for
>> Solr to respond to those requests.  If the client that is sending the
>> request is configured with a 60 second socket timeout, which inter-node
>> requests made by Solr are by default, then it is going to experience a
>> timeout error.  The request will probably complete successfully once the
>> merge finishes, but the connection is gone, and the client has already
>> received an error.
>> 
>> Now imagine what happens if an optimize (forced merge of the entire
>> index) is requested on an index that's 50GB.  That optimize may take 2-3
>> hours, possibly longer.  A deleteByQuery started on that index after the
>> optimize begins (and any updates requested after the DBQ) will pause
>> until the optimize is done.  A pause of 2 hours or more is a BIG problem.
>> 
>> This is why deleteByQuery is not recommended.
>> 
>> If the deleteByQuery were changed into a two-step process involving a
>> query to retrieve ID values and then one or more deleteById requests,
>> then none of that blocking would occur.  The deleteById operation can
>> run at the same time as a segment merge, so neither it nor subsequent
>> update requests will have the significant pause.  From what I
>> understand, you can even do commits in this scenario and have changes be
>> visible before the merge completes.  I haven't verified that this is the
>> case.
>> 
>> Experienced devs: Can we fix this problem with DBQ?  On indexes with a
>> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
>> 
>> Thanks,
>> Shawn
>> 



LTR performance issues

2018-05-08 Thread ilayaraja
LTR with grouping results in very high latency (3x) even while re-ranking 24
top groups.

How is re-ranking implemented in Solr? Is it expected that it would result
in 3x more query time?

Need clarifications on:
1. How many top groups are actually re-ranked, is it exactly what we pass in
reRankDocs?
2. How many documents within each group is re-ranked? Can we control it with
group.limit or some other parameter?

What causes LTR to take more time when grouping is performed? Is it scoring the
documents again, or merging the re-ranked docs with the rest of the docs?

Is there any way to optimize this?







-
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html