Re: Dealing with null values in streaming rollup

2018-10-22 Thread RAUNAK AGRAWAL
Thanks a lot Jan. Will try with 7.5

I am currently using 7.2.1 version. Is there a way to fix it?

On Fri, Oct 19, 2018 at 12:31 AM Jan Høydahl  wrote:

> Have you tried with Solr 7.5? I think it may have been fixed in that
> version? At least for the timeseries() expression...
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 18 Oct 2018, at 05:35, RAUNAK AGRAWAL wrote:
> >
> > Hi,
> >
> > I am trying to use streaming rollup expression to aggregate the sales
> > values over week. Here is the query:
> >
> > curl http://localhost:8983/solr/metrics_data/stream -d 'expr=rollup(
> >   search(metrics_data, q=id:123, fl="week_no,sales,qty", qt="/export",
> > sort="week_no desc"),
> >  over="week",
> >   sum(sales),
> >   sum(qty)
> > )'
> >
> > But I am getting exception like:
> >
> > {
> > "result-set": {
> > "docs": [{
> > "EXCEPTION": null,
> > "EOF": true,
> > "RESPONSE_TIME": 169
> > }]
> > }
> > }
> >
> > The cause is that some of the documents have null for sales. One option
> > is to wrap the search in a select expression
> > with replace(field,null,withValue=0). Is there any other way to make
> > rollup ignore documents in which some fields are null?
> >
> > Thanks in advance
>
>


Dealing with null values in streaming rollup

2018-10-17 Thread RAUNAK AGRAWAL
Hi,

I am trying to use a streaming rollup expression to aggregate the sales
values by week. Here is the query:

curl http://localhost:8983/solr/metrics_data/stream -d 'expr=rollup(
   search(metrics_data, q=id:123, fl="week_no,sales,qty", qt="/export",
sort="week_no desc"),
  over="week",
   sum(sales),
   sum(qty)
)'

But I am getting an exception like this:

{
"result-set": {
"docs": [{
"EXCEPTION": null,
"EOF": true,
"RESPONSE_TIME": 169
}]
}
}

The cause is that some of the documents have null for sales. One option
is to wrap the search in a select expression
with replace(field,null,withValue=0). Is there any other way to make rollup
ignore documents in which some fields are null?

Thanks in advance
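
The select/replace workaround mentioned above could be assembled roughly as follows. This is only a sketch: the helper name rollup_with_null_defaults is hypothetical, it assumes a single bucket field (week_no) is used consistently in fl, sort and over, and the select/replace syntax is taken from the post itself, so it should be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def rollup_with_null_defaults(collection, query, bucket, metrics):
    """Build a rollup expression whose metric fields fall back to 0 when null."""
    fields = ",".join([bucket] + metrics)
    search = (
        f'search({collection}, q="{query}", fl="{fields}", '
        f'qt="/export", sort="{bucket} desc")'
    )
    metric_list = ", ".join(metrics)
    # replace(field, null, withValue=0) is the select-level workaround from the post.
    replaces = ", ".join(f"replace({m}, null, withValue=0)" for m in metrics)
    selected = f"select({search}, {bucket}, {metric_list}, {replaces})"
    sums = ", ".join(f"sum({m})" for m in metrics)
    return f'rollup({selected}, over="{bucket}", {sums})'


expr = rollup_with_null_defaults("metrics_data", "id:123", "week_no", ["sales", "qty"])
# Form-encoded body, suitable for a POST to /solr/metrics_data/stream.
body = urlencode({"expr": expr})
```

Building the expression in one place keeps the field list, the sort and the over parameter from drifting apart.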


Re: Streaming rollUp vs Streaming facet

2018-10-17 Thread RAUNAK AGRAWAL
Thanks a lot, Joel. This makes sense, but in my use case, where I am
aggregating 10 fields, rollup performs 2x better than the streaming facet.

On Wed, Oct 17, 2018 at 6:56 PM Joel Bernstein  wrote:

> They are very different.
>
> The "facet" expression sends a request to the JSON facet API which pushes
> the aggregation into the search engine. In most scenarios this is the
> preferred method because it only streams aggregated results. I would always
> try the "facet" expression first before going to rollup.
>
> The "rollup" expression rolls up aggregations over a sorted stream of
> tuples. It almost always involves exporting and sorting entire result sets
> with the /export handler. There are only two reasons to use this approach:
>
> 1) Very high cardinality faceting. By very high I mean millions of facet
> values are being returned in the same query.
> 2) Rollups following any kind of relational algebra. For example a rollup
> on top of a hashJoin.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Oct 16, 2018 at 8:54 AM RAUNAK AGRAWAL 
> wrote:
>
> > Hi Guys,
> >
> > I am trying to do an aggregation (sum) using streaming API. I have around
> > 10 billion documents in my collection and every document has around 10
> > docValues.
> >
> > So streaming facet is taking close to 6 secs to respond with aggregation
> on
> > 10 fields while streaming rollup is returning the response in 2 secs.
> >
> > So my questions are:
> >
> > 1. What is the fundamental difference between streaming facet and rollUp?
> > 2. When to use facet and when to use rollUp?
> >
> > Thanks
> >
>
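
To illustrate what "rolls up aggregations over a sorted stream of tuples" means, here is a small pure-Python sketch. It is not Solr code: the rollup function and the sample rows are made up for illustration, and treating a missing value as 0 is a choice of this sketch, not documented Solr behaviour.

```python
from itertools import groupby
from operator import itemgetter


def rollup(tuples, over, metrics):
    """Sum each metric per bucket; `tuples` must already be sorted by `over`."""
    out = []
    # groupby only merges adjacent rows, which is why the stream must be sorted.
    for key, group in groupby(tuples, key=itemgetter(over)):
        totals = {m: 0 for m in metrics}
        for row in group:
            for m in metrics:
                totals[m] += row.get(m) or 0  # missing/None counted as 0 here
        out.append({over: key, **{f"sum({m})": v for m, v in totals.items()}})
    return out


rows = [
    {"week": 2, "sales": 10, "qty": 1},
    {"week": 2, "sales": 5, "qty": 2},
    {"week": 1, "sales": 7, "qty": None},
]
result = rollup(rows, "week", ["sales", "qty"])
```

Every matching row has to pass through the function, which mirrors why rollup usually means exporting the whole result set, while facet only returns the aggregated buckets.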


Streaming rollUp vs Streaming facet

2018-10-16 Thread RAUNAK AGRAWAL
Hi Guys,

I am trying to do an aggregation (sum) using streaming API. I have around
10 billion documents in my collection and every document has around 10
docValues.

So streaming facet is taking close to 6 secs to respond with aggregation on
10 fields while streaming rollup is returning the response in 2 secs.

So my questions are:

1. What is the fundamental difference between streaming facet and rollUp?
2. When to use facet and when to use rollUp?

Thanks


Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thank you Joel. Looking forward to the latest version of solr.

Thanks

On Fri, Sep 28, 2018 at 12:22 PM Joel Bernstein  wrote:

> The facet expression is currently not as expressive as the JSON facet API.
> So for very demanding use cases you can create a more highly tuned JSON
> facet API call.
>
> The good news is we are working on this. And also working on other expressions
> that can be wrapped around the facet expression to implement parallelism
> and scaling. We hope to have this ready for Solr 8, which is just around
> the corner.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Sep 28, 2018 at 2:52 PM RAUNAK AGRAWAL 
> wrote:
>
> > Thanks a lot Toke. I will get back to you soon regarding the patch update
> > after discussing it with the team.
> >
> > Thanks & Regards
> >
> >
> > On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen  wrote:
> >
> > > RAUNAK AGRAWAL  wrote:
> > >
> > > > curl http://localhost:8983/solr/collection_name/stream -d
> > > > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > > > desc",buckets="week",bucketSizeLimit=200,sum(sales),
> > > > sum(amount),sum(days))'
> > >
> > > Stats on numeric fields then.
> > >
> > > > Also in my collection, I have almost 10 Billion documents
> > > > with many deletions (close to 40%).
> > >
> > > Quite a lot of documents, and in this case deletions count, as the
> > > internal structures for the deleted documents still need to be
> > > iterated. In scale this looks somewhat like our 18 billion document
> > > setup, with the addendum that we use quite large segments (900GB).
> > >
> > > The performance regressions we encountered with Solr 7 led to
> > > https://issues.apache.org/jira/browse/LUCENE-8374 which helped a lot
> > > (performance testing has not finished). If you have or can easily
> > > create a test server where your shard(s) is the same size as your
> > > production shards, I'd be happy to port the patch to Solr 7.2.1 to see
> > > if it helps. I am looking for independent verification, so it is no
> > > bother.
> > >
> > > > I was planning to run optimise to merge the segments but
> > > > spoke to admin team and lucidworks guys and they were
> > > > against it saying that it will make very large segment file.
> > >
> > > If your bottleneck is the same as ours, the large segment would mean
> > > worse performance (with Solr 7).
> > >
> > > > Is it true that optimise in solr should not be used, as it comes with
> > > > other issues?
> > >
> > > No simple answer there. If you have an index that you update very
> > > rarely, it can save memory and processing power. If you have a live
> > > index where you add and delete documents, it will probably be a bad
> > > idea. One strategy used with time series data is to have old and
> > > immutable data in dedicated collections, which can then be optimized.
> > >
> > > - Toke Eskildsen
> > >
> >
>


Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thanks a lot Toke. I will get back to you soon regarding the patch update
after discussing it with the team.

Thanks & Regards


On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen  wrote:

> RAUNAK AGRAWAL  wrote:
>
> > curl http://localhost:8983/solr/collection_name/stream -d
> > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > desc",buckets="week",bucketSizeLimit=200,sum(sales),
> > sum(amount),sum(days))'
>
> Stats on numeric fields then.
>
> > Also in my collection, I have almost 10 Billion documents
> > with many deletions (close to 40%).
>
> Quite a lot of documents, and in this case deletions count, as the
> internal structures for the deleted documents still need to be iterated.
> In scale this looks somewhat like our 18 billion document setup, with the
> addendum that we use quite large segments (900GB).
>
> The performance regressions we encountered with Solr 7 led to
> https://issues.apache.org/jira/browse/LUCENE-8374 which helped a lot
> (performance testing has not finished). If you have or can easily create a
> test server where your shard(s) is the same size as your production shards,
> I'd be happy to port the patch to Solr 7.2.1 to see if it helps. I am
> looking for independent verification, so it is no bother.
>
> > I was planning to run optimise to merge the segments but
> > spoke to admin team and lucidworks guys and they were
> > against it saying that it will make very large segment file.
>
> If your bottleneck is the same as ours, the large segment would mean worse
> performance (with Solr 7).
>
> > Is it true that optimise in solr should not be used, as it comes with
> > other issues?
>
> No simple answer there. If you have an index that you update very rarely,
> it can save memory and processing power. If you have a live index where you
> add and delete documents, it will probably be a bad idea. One strategy used
> with time series data is to have old and immutable data in dedicated
> collections, which can then be optimized.
>
> - Toke Eskildsen
>
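
Since deleted documents still count toward the work done per segment, it can help to track how large that share is. The sketch below assumes the two counters are read from the index statistics (numDocs and maxDoc, for example as reported by the Luke request handler); the function name and the sample figures are hypothetical.

```python
def deleted_fraction(num_docs, max_doc):
    """Share of index entries that are deleted but not yet merged away."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


# Hypothetical figures: 600 live documents out of 1000 index entries.
fraction = deleted_fraction(600, 1000)
```

A value near 0.4 would correspond to the "close to 40%" deletions described in this thread.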


Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thanks a lot Erick for the documentation. I will go through it and get back
to you in case of any queries.

Regards,
Raunak

On Fri, Sep 28, 2018 at 11:09 AM Erick Erickson 
wrote:

> It Depends (tm). The behavior changed with Solr 7.5. Here are all the
> gory details:
>
>
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>
> and for 7.5+
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
>
> Best,
> Erick
> On Fri, Sep 28, 2018 at 10:09 AM RAUNAK AGRAWAL
>  wrote:
> >
> > Hey Guys,
> >
> > This is the sample query I am making:
> >
> >
> > curl http://localhost:8983/solr/collection_name/stream -d
> > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > desc",buckets="week",bucketSizeLimit=200,sum(sales),sum(amount),sum(days))'
> >
> >
> > Also in my collection, I have almost 10 Billion documents with many
> > deletions (close to 40%). I was planning to run optimise to merge the
> > segments but spoke to admin team and lucidworks guys and they were
> > against it saying that it will make very large segment file. Is it true
> > that optimise in solr should not be used, as it comes with other issues?
> >
> > Thanks
> >
> > On Fri, Sep 28, 2018 at 7:40 AM Toke Eskildsen  wrote:
> >
> > > On Thu, 2018-09-27 at 15:52 -0700, RAUNAK AGRAWAL wrote:
> > > > But for the last few days, we have observed that the streaming facet
> > > > response is slower than json facets. Also we have increased the
> > > > number of documents in collection (30%).
> > >
> > > Export performance goes down when segment size goes way up, so I would
> > > expect streaming to do the same. I would not expect a 30% increase to
> > > cause something serious on that account though. How many documents in
> > > your index?
> > >
> > > - Toke Eskildsen, Royal Danish Library
> > >
> > >
>


Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Hey Guys,

This is the sample query I am making:


curl http://localhost:8983/solr/collection_name/stream -d
'expr=facet(collection_name,q="id:953",bucketSorts="week
desc",buckets="week",bucketSizeLimit=200,sum(sales),sum(amount),sum(days))'


Also in my collection, I have almost 10 Billion documents with many
deletions (close to 40%). I was planning to run optimise to merge the
segments but spoke to admin team and lucidworks guys and they were against
it saying that it will make very large segment file. Is it true that
optimise in solr should not be used, as it comes with other issues?

Thanks

On Fri, Sep 28, 2018 at 7:40 AM Toke Eskildsen  wrote:

> On Thu, 2018-09-27 at 15:52 -0700, RAUNAK AGRAWAL wrote:
> > But for the last few days, we have observed that the streaming facet
> > response is slower than json facets. Also we have increased the
> > number of documents in collection (30%).
>
> Export performance goes down when segment size goes way up, so I would
> expect streaming to do the same. I would not expect a 30% increase to
> cause something serious on that account though. How many documents in
> your index?
>
> - Toke Eskildsen, Royal Danish Library
>
>
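
For a side-by-side comparison with JSON facets, the streaming facet expression above could be expressed as a JSON Facet request along these lines. This is a sketch under assumptions: the helper name json_facet_for and the label by_bucket are made up, and "index desc" is used as the counterpart of bucketSorts="week desc".

```python
import json


def json_facet_for(bucket, metrics, limit=200):
    """Terms facet on `bucket` with one sum() stat per metric."""
    stats = {f"sum_{m}": f"sum({m})" for m in metrics}
    return {
        "by_bucket": {
            "type": "terms",
            "field": bucket,
            "limit": limit,
            "sort": "index desc",  # order buckets by their value, descending
            "facet": stats,
        }
    }


facet = json_facet_for("week", ["sales", "amount", "days"])
# Request parameters for the collection's /query handler.
params = {"q": "id:953", "rows": 0, "json.facet": json.dumps(facet)}
```

Timing both requests against the same collection gives a like-for-like comparison of the two code paths.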


Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-27 Thread RAUNAK AGRAWAL
Hi Guys,

Just to give you context, we were using JSON Facets for doing analytical
queries in solr but they were slower. Hence we migrated our application to
use solr streaming facet queries.

But for the last few days, we have observed that the streaming facet response
is slower than JSON facets. Also, we have increased the number of documents
in the collection (by 30%).

So I have a couple of questions:

1. When should I use JSON Facets and when should I use Solr streaming facets?
2. Solr streaming also comes with rollup. How is it different from
streaming facets?
3. Is there a way to debug queries in streaming mode? I tried
debug=true but it does not work for streaming queries.
4. When I don't specify a number of workers for streaming queries, do
all the shards of a collection become the workers?

Looking forward to your reply.


Thanks and regards
Raunak


Unable to make IN queries on a particular field in solr

2018-05-23 Thread RAUNAK AGRAWAL
Hi,

I am facing an issue with my employee collection.

Suppose I want to search for employees by id, so my query is *id:(1 2 3)* and
it works fine in Solr. Now let's say I want to search by name, so
my query is name:(Alice Bob).

The problem is that when I query by *name:(Alice Bob)*, I do not
get any results, but if I query by *name:(Alice OR Bob)*, I am able to
fetch the results.

Can someone please explain:


   - Why does the IN query for name not work with a space but work with *OR*?
   - *Why does the IN query for id work with a space when it does not work
   for name, though both are fields in the same collection?*


Thanks
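
The thread does not record a root cause. Until one is found, spelling out the OR explicitly, as in the form reported to work above, avoids relying on how whitespace-separated terms are interpreted. A minimal sketch, with a hypothetical helper name:

```python
def any_of(field, values):
    """Build field:("v1" OR "v2" ...) with each value quoted."""
    quoted = []
    for value in values:
        # Escape backslashes and double quotes so each value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"


name_query = any_of("name", ["Alice", "Bob"])
id_query = any_of("id", ["1", "2", "3"])
```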


Re: How to escape OR or any other keyword in solr

2018-03-27 Thread RAUNAK AGRAWAL
Hi Peter,

Yes, I am using a stopword file which has *or* in it. Thanks for pointing
that out. I will remove it from the stopword file and test again. Thank you
very much!!

On Tue, Mar 27, 2018 at 1:17 PM, Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> Hi Raunak,
>
> Are you using a stop word file? That might be why you're getting 0 results
> searching for "OR".
>
> Cheers,
> Peter.
>
> -Original Message-
> From: RAUNAK AGRAWAL [mailto:agrawal.rau...@gmail.com]
> Sent: 27 March 2018 07:45
> To: solr-user@lucene.apache.org
> Subject: How to escape OR or any other keyword in solr
>
> I have to search for state "OR" [short form for Oregon]. When I am making
> query state:OR, I am getting SolrException since it is recognising it as
> keyword.
>
> Now I tried with quotes ("") or //OR as well and when doing so..Solr
> doesn't give exception but it also doesn't return any matching document.
>
> Kindly let me know what is the workaround for this issue?
>
> Thanks


How to escape OR or any other keyword in solr

2018-03-27 Thread RAUNAK AGRAWAL
I have to search for state "OR" [short form for Oregon]. When I query
state:OR, I get a SolrException because OR is recognised as a
keyword.

I also tried with quotes ("") and with //OR. In those cases Solr
doesn't throw an exception, but it also doesn't return any matching document.

Kindly let me know what the workaround for this issue is.

Thanks
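
Quoting the value is the usual way to stop the parser from reading it as an operator, and a client can apply it automatically. The helper below is a hypothetical sketch. Note that quoting only avoids the parse error: as the reply in this thread points out, whether the term matches still depends on analysis, for example a stop word file that contains "or".

```python
RESERVED = {"AND", "OR", "NOT"}


def term_query(field, value):
    """Quote values that are boolean operators or contain whitespace."""
    if value in RESERVED or any(ch.isspace() for ch in value):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


oregon = term_query("state", "OR")
california = term_query("state", "CA")
```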


Re: Json Facet Query Stripping Field Name with Hyphen

2018-01-04 Thread RAUNAK AGRAWAL
Hi Erick/Yonik,

Thank you guys. I am going to rename the fields.

On Thu, Jan 4, 2018 at 10:04 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> The JSON Facet API uses the function query parser for something like
> sum(week_-91) so you'll probably have problems with any function that
> uses these fields as well.
> As Erick says, you're better off renaming the fields.  There is a
> workaround for wonky field names via the "field" function:
> sum(field(week_-91))
>
> -Yonik
>
>
> On Thu, Jan 4, 2018 at 10:02 AM, RAUNAK AGRAWAL
> <agrawal.rau...@gmail.com> wrote:
> > Hi Guys,
> >
> > I am facing an issue while trying to use the JSON Facet API. I have
> > data in my collection with field names like "week_0" and "week_-1", which
> > mean the current week and the previous week respectively.
> >
> > When I query for the week_0 sum using the following query, I am able
> > to get the result.
> >
> > http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_0_sum:'sum(week_0)'}&rows=0
> >
> >
> > But when I try to do the same for any field "week_-*", it breaks.
> >
> > For example, when I try:
> > http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_-91_sum:%27sum(week_-91)%27}&rows=0
> >
> >
> > I get the exception *"msg": "undefined field: \"week_\""*
> >
> >
> > That means Solr is stripping the field name after the hyphen (-). Is there
> > a workaround for this? I tried adding an escape character (\) but it did
> > not help.
> >
> > With escape:
> > http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_-91_sum:%27sum(week_\-91)%27}&rows=0
> >
> >
> > Please help me regarding this.
> >
> > Thanks
>
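
If renaming the fields is not possible straight away, the field() workaround Yonik describes could be applied mechanically when building the facet. In this sketch the helper names, the "simple name" pattern and the sanitised facet labels are all assumptions made for illustration.

```python
import json
import re

# Names the function parser can be expected to read directly (assumed pattern).
SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sum_stat(field_name):
    """Wrap awkward field names in field(...), as suggested in the reply."""
    if SIMPLE_NAME.match(field_name):
        return f"sum({field_name})"
    return f"sum(field({field_name}))"


def facet_for(field_names):
    """Map a sanitised label to the sum() stat for each field."""
    return {
        re.sub(r"[^A-Za-z0-9_]", "_", name) + "_sum": sum_stat(name)
        for name in field_names
    }


facet = facet_for(["week_0", "week_-91"])
json_facet_param = json.dumps(facet)
```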


Json Facet Query Stripping Field Name with Hyphen

2018-01-04 Thread RAUNAK AGRAWAL
Hi Guys,

I am facing an issue while trying to use the JSON Facet API. I have
data in my collection with field names like "week_0" and "week_-1", which
mean the current week and the previous week respectively.

When I query for the week_0 sum using the following query, I am able
to get the result.

http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_0_sum:'sum(week_0)'}&rows=0


But when I try to do the same for any field "week_-*", it breaks.

For example, when I try:
http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_-91_sum:%27sum(week_-91)%27}&rows=0


I get the exception *"msg": "undefined field: \"week_\""*


That means Solr is stripping the field name after the hyphen (-). Is there a
workaround for this? I tried adding an escape character (\) but it did not
help.

With escape:
http://localhost:8983/solr/collection1/query?q=*:*&json.facet={week_-91_sum:%27sum(week_\-91)%27}&rows=0


Please help me regarding this.

Thanks


Re: SolrJ with Async Http Client

2018-01-03 Thread RAUNAK AGRAWAL
Yes, I am talking about an event-driven way of calling Solr, so that I can
write a pure async web service. Does SolrJ provide support for non-blocking
calls?

On Wed, Jan 3, 2018 at 6:22 PM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:

> There is asynchronous and non-blocking. If I use 100 threads to perform
> calls to Solr using the standard Java HTTP client or SolrJ I block 100
> threads even if I don't block my program logic threads by using async
> calls. However if I perform those HTTP calls using a non-blocking HTTP
> client, like netty, I basically only need a single eventing thread in
> addition to my normal threads. The advantage is less memory usage and an
> often better scaling. I would however expect that the main advantage would
> be on the server side.
>
>
> On 02.01.2018 22:02, Gus Heck wrote:
>
>> It's not very clear (to me) what your use case is, but generally speaking,
>> asynchronous requests can be achieved by using threads/executors/futures
>> (java) or ajax (javascript). The link seems to be a scala project, I'm
>> sure scala has analogous facilities.
>>
>> On Tue, Jan 2, 2018 at 10:31 AM, RAUNAK AGRAWAL <agrawal.rau...@gmail.com
>> >
>> wrote:
>>
>> Hi Guys,
>>>
>>> I am trying to write a fully async service where the Solr calls are also
>>> async. Has anyone tried calling Solr in non-blocking mode, or is
>>> there a way to do it? I have come across one such project
>>> <https://github.com/inoio/solrs> but I am wondering whether there is
>>> anything provided by SolrJ?
>>>
>>> Thanks
>>>
>>>
>>
>>
>
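
The distinction Hendrik draws, many in-flight requests served by a single eventing thread, can be sketched with Python's asyncio. The calls here are simulated with asyncio.sleep and no real Solr or HTTP client is involved; the function names are hypothetical.

```python
import asyncio
import threading


async def fake_solr_call(i, seen_threads):
    # Stand-in for a non-blocking HTTP request; it yields to the event loop.
    await asyncio.sleep(0.01)
    seen_threads.add(threading.get_ident())
    return i


async def run_many(n):
    seen_threads = set()
    # All n "requests" are in flight at once, multiplexed by one event loop.
    results = await asyncio.gather(
        *(fake_solr_call(i, seen_threads) for i in range(n))
    )
    return results, seen_threads


results, seen_threads = asyncio.run(run_many(100))
```

With a thread-per-request design the same 100 calls would occupy 100 blocked threads; here only the event loop thread is ever used.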


SolrJ with Async Http Client

2018-01-02 Thread RAUNAK AGRAWAL
Hi Guys,

I am trying to write a fully async service where the Solr calls are also
async. Has anyone tried calling Solr in non-blocking mode, or is
there a way to do it? I have come across one such project
<https://github.com/inoio/solrs> but I am wondering whether there is anything
provided by SolrJ?

Thanks


Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-12 Thread RAUNAK AGRAWAL
Thanks Yonik and Joel. I will try with JSON Facet API and update the
results here.

On Tue, Dec 12, 2017 at 10:56 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Tue, Dec 12, 2017 at 9:17 AM, RAUNAK AGRAWAL
> <agrawal.rau...@gmail.com> wrote:
> > Hi Yonik,
> >
> > So if the query is fine then I guess even using JSON Facet API will not
> > help me here.
>
> As Joel mentioned, it's completely different code than the old stats API.
> This is a very simple use-case, so if we're slower than ES for some
> reason, it should be very easy to fix.
>
> -Yonik
>
>
> > On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> >
> >> OK great, so it's definitely not the main query (which is just a
> >> single term query in this example!)
> >>
> >> > Also I have looked into the JSON Facet API. If I have to use facets, I
> >> will
> >> > have to then define 3600 facets in a single query and I guess that
> would
> >> be
> >> > also slow.
> >>
> >> You can ask for any number of stats for a given facet (even the root
> >> facet bucket w/o faceting on any fields):
> >>
> >> curl 'http://localhost:8983/solr/collection1/query?q=variable1:290&rows=0&json.facet={
> >>   s1:"sum(metric_1)",
> >>   s2:"sum(metric_2)",
> >>   s3:"sum(metric_3)"
> >> }'
> >>
> >> -Yonik
> >>
> >>
> >> On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
> >> <agrawal.rau...@gmail.com> wrote:
> >> > Hi Yonik,
> >> >
> >> > As you asked here is the code snippet and the actual solr query.
> Please
> >> > have a look. I have included only 104 metrics but like this we can go
> >> upto
> >> > 3600 rollups.
> >> >
> >> > Also I have looked into the JSON Facet API. If I have to use facets, I
> >> will
> >> > have to then define 3600 facets in a single query and I guess that
> would
> >> be
> >> > also slow. Also is there any max limit on the number of facets we can
> >> > define in a single query?
> >> >
> >> > Code snippet:
> >> >
> >> > private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
> >> >     SolrQuery query = new SolrQuery();
> >> >     query.set("q", "variable1:" + variable1);
> >> >     query.setRows(0);
> >> >     metrics.forEach(
> >> >         metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
> >> >     );
> >> >     return query;
> >> > }
> >> >
> >> >
> >> > The generated query:
> >> >
> >> > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> >> > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> >> > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> >> > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> >> > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> >> > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> >> > }metric_11' stats.field='{!sum=true }metric_12'
> >> > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> >> > }metric_14' stats.field='{!sum=true }metric_15'
> >> > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> >> > }metric_17' stats.field='{!sum=true }metric_18'
> >> > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> >> > }metric_20' stats.field='{!sum=true }metric_21'
> >> > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> >> > }metric_23' stats.field='{!sum=true }metric_24'
> >> > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> >> > }metric_26' stats.field='{!sum=true }metric_27'
> >> > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> >> > }metric_29' stats.field='{!sum=true }metric_30'
> >> > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> >> > }metric_32' stats.field='{!sum=true }metric_33'
> >> > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> >> > }metric_35' stats.field='{!sum=true }metric_36'
> >> > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> >> > }metric_38' stats.field='{!sum=true }metric_39'
> >> > sta

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-12 Thread RAUNAK AGRAWAL
Hi Yonik,

So if the query is fine, then I guess even using the JSON Facet API will not
help me here. Can you suggest some other idea or further tuning that
would help me reduce the latency?

On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> OK great, so it's definitely not the main query (which is just a
> single term query in this example!)
>
> > Also I have looked into the JSON Facet API. If I have to use facets, I
> will
> > have to then define 3600 facets in a single query and I guess that would
> be
> > also slow.
>
> You can ask for any number of stats for a given facet (even the root
> facet bucket w/o faceting on any fields):
>
> curl 'http://localhost:8983/solr/collection1/query?q=variable1:290&rows=0&json.facet={
>   s1:"sum(metric_1)",
>   s2:"sum(metric_2)",
>   s3:"sum(metric_3)"
> }'
>
> -Yonik
>
>
> On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
> <agrawal.rau...@gmail.com> wrote:
> > Hi Yonik,
> >
> > As you asked here is the code snippet and the actual solr query. Please
> > have a look. I have included only 104 metrics but like this we can go
> upto
> > 3600 rollups.
> >
> > Also I have looked into the JSON Facet API. If I have to use facets, I
> will
> > have to then define 3600 facets in a single query and I guess that would
> be
> > also slow. Also is there any max limit on the number of facets we can
> > define in a single query?
> >
> > Code snippet:
> >
> > private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
> >     SolrQuery query = new SolrQuery();
> >     query.set("q", "variable1:" + variable1);
> >     query.setRows(0);
> >     metrics.forEach(
> >         metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
> >     );
> >     return query;
> > }
> >
> >
> > The generated query:
> >
> > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> > }metric_11' stats.field='{!sum=true }metric_12'
> > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> > }metric_14' stats.field='{!sum=true }metric_15'
> > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> > }metric_17' stats.field='{!sum=true }metric_18'
> > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> > }metric_20' stats.field='{!sum=true }metric_21'
> > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> > }metric_23' stats.field='{!sum=true }metric_24'
> > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> > }metric_26' stats.field='{!sum=true }metric_27'
> > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> > }metric_29' stats.field='{!sum=true }metric_30'
> > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> > }metric_32' stats.field='{!sum=true }metric_33'
> > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> > }metric_35' stats.field='{!sum=true }metric_36'
> > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> > }metric_38' stats.field='{!sum=true }metric_39'
> > stats.field='{!sum=true }metric_40' stats.field='{!sum=true
> > }metric_41' stats.field='{!sum=true }metric_42'
> > stats.field='{!sum=true }metric_43' stats.field='{!sum=true
> > }metric_44' stats.field='{!sum=true }metric_45'
> > stats.field='{!sum=true }metric_46' stats.field='{!sum=true
> > }metric_47' stats.field='{!sum=true }metric_48'
> > stats.field='{!sum=true }metric_49' stats.field='{!sum=true
> > }metric_50' stats.field='{!sum=true }metric_51'
> > stats.field='{!sum=true }metric_52' stats.field='{!sum=true
> > }metric_53' stats.field='{!sum=true }metric_54'
> > stats.field='{!sum=true }metric_55' stats.field='{!sum=true
> > }metric_56' stats.field='{!sum=true }metric_57'
> > stats.field='{!sum=true }metric_58' stats.field='{!sum=true
> > }metric_59' stats.field='{!sum=true }metric_60'
> > stats.field='{!sum=true }metric_61' stats.field='{!sum=true
> > }metric_62' stats.field='{!sum=true }metric_63'
> > stats.field='{!sum=true }metric_64' stats.field='{!sum=true
> > }metric_65' stats.field='{!sum=true }metric_66'
> > stats.field='{!sum=true }metric_67' sta

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-12 Thread RAUNAK AGRAWAL
Hi Yonik,

As you asked, here is the code snippet and the actual Solr query. Please
have a look. I have included only 104 metrics, but in the same way we can go
up to 3600 rollups.

Also, I have looked into the JSON Facet API. If I have to use facets, I will
then have to define 3600 facets in a single query, and I guess that would
also be slow. Also, is there any max limit on the number of facets we can
define in a single query?

Code snippet:

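private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
    SolrQuery query = new SolrQuery();
    // Main query: a single term query on variable1.
    query.set("q", "variable1:" + variable1);
    // Only the aggregates are needed, so return no documents.
    query.setRows(0);
    // Request one stats.field per metric, computing only the sum.
    metrics.forEach(
        metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
    );
    return query;
}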
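
For comparison, the same set of sums could be requested through the JSON Facet API as stats on the root bucket, which is the form suggested in the replies to this thread. This is only a sketch: the function name stats_request and the labels s1, s2, ... are illustrative, and no claim is made here about how it performs.

```python
import json


def stats_request(variable1, metrics):
    """Request parameters asking for one sum() stat per metric, no documents."""
    facet = {f"s{i}": f"sum({m})" for i, m in enumerate(metrics, start=1)}
    return {
        "q": f"variable1:{variable1}",
        "rows": 0,
        "json.facet": json.dumps(facet),
    }


# 104 metrics, matching the size of the example query in this message.
params = stats_request(290, [f"metric_{i}" for i in range(1, 105)])
```

All stats sit in a single facet object, so the number of metrics changes the size of the request rather than the number of facets.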


The generated query:

{! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
}metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
}metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
}metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
}metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
}metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
}metric_11' stats.field='{!sum=true }metric_12'
stats.field='{!sum=true }metric_13' stats.field='{!sum=true
}metric_14' stats.field='{!sum=true }metric_15'
stats.field='{!sum=true }metric_16' stats.field='{!sum=true
}metric_17' stats.field='{!sum=true }metric_18'
stats.field='{!sum=true }metric_19' stats.field='{!sum=true
}metric_20' stats.field='{!sum=true }metric_21'
stats.field='{!sum=true }metric_22' stats.field='{!sum=true
}metric_23' stats.field='{!sum=true }metric_24'
stats.field='{!sum=true }metric_25' stats.field='{!sum=true
}metric_26' stats.field='{!sum=true }metric_27'
stats.field='{!sum=true }metric_28' stats.field='{!sum=true
}metric_29' stats.field='{!sum=true }metric_30'
stats.field='{!sum=true }metric_31' stats.field='{!sum=true
}metric_32' stats.field='{!sum=true }metric_33'
stats.field='{!sum=true }metric_34' stats.field='{!sum=true
}metric_35' stats.field='{!sum=true }metric_36'
stats.field='{!sum=true }metric_37' stats.field='{!sum=true
}metric_38' stats.field='{!sum=true }metric_39'
stats.field='{!sum=true }metric_40' stats.field='{!sum=true
}metric_41' stats.field='{!sum=true }metric_42'
stats.field='{!sum=true }metric_43' stats.field='{!sum=true
}metric_44' stats.field='{!sum=true }metric_45'
stats.field='{!sum=true }metric_46' stats.field='{!sum=true
}metric_47' stats.field='{!sum=true }metric_48'
stats.field='{!sum=true }metric_49' stats.field='{!sum=true
}metric_50' stats.field='{!sum=true }metric_51'
stats.field='{!sum=true }metric_52' stats.field='{!sum=true
}metric_53' stats.field='{!sum=true }metric_54'
stats.field='{!sum=true }metric_55' stats.field='{!sum=true
}metric_56' stats.field='{!sum=true }metric_57'
stats.field='{!sum=true }metric_58' stats.field='{!sum=true
}metric_59' stats.field='{!sum=true }metric_60'
stats.field='{!sum=true }metric_61' stats.field='{!sum=true
}metric_62' stats.field='{!sum=true }metric_63'
stats.field='{!sum=true }metric_64' stats.field='{!sum=true
}metric_65' stats.field='{!sum=true }metric_66'
stats.field='{!sum=true }metric_67' stats.field='{!sum=true
}metric_68' stats.field='{!sum=true }metric_69'
stats.field='{!sum=true }metric_70' stats.field='{!sum=true
}metric_71' stats.field='{!sum=true }metric_72'
stats.field='{!sum=true }metric_73' stats.field='{!sum=true
}metric_74' stats.field='{!sum=true }metric_75'
stats.field='{!sum=true }metric_76' stats.field='{!sum=true
}metric_77' stats.field='{!sum=true }metric_78'
stats.field='{!sum=true }metric_79' stats.field='{!sum=true
}metric_80' stats.field='{!sum=true }metric_81'
stats.field='{!sum=true }metric_82' stats.field='{!sum=true
}metric_83' stats.field='{!sum=true }metric_84'
stats.field='{!sum=true }metric_85' stats.field='{!sum=true
}metric_86' stats.field='{!sum=true }metric_87'
stats.field='{!sum=true }metric_88' stats.field='{!sum=true
}metric_89' stats.field='{!sum=true }metric_90'
stats.field='{!sum=true }metric_91' stats.field='{!sum=true
}metric_92' stats.field='{!sum=true }metric_93'
stats.field='{!sum=true }metric_94' stats.field='{!sum=true
}metric_95' stats.field='{!sum=true }metric_96'
stats.field='{!sum=true }metric_97' stats.field='{!sum=true
}metric_98' stats.field='{!sum=true }metric_99'
stats.field='{!sum=true }metric_100' stats.field='{!sum=true
}metric_101' stats.field='{!sum=true }metric_102'
stats.field='{!sum=true }metric_103' stats.field='{!sum=true
}metric_104'}




On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <agrawal.rau...@gmail.com>
wrote:

> Hi Yonik,
>
> I will try the JSON Facet API and update here, but my hunch is that the
> querying mechanism is not the problem. Rather, the problem lies with the
> Solr aggregations.
>
> Thanks
>
> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>
>> I think the SolrJ be

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-11 Thread RAUNAK AGRAWAL
Hi Yonik,

I will try the JSON Facet API and update here, but my hunch is that the
querying mechanism is not the problem. Rather, the problem lies with the
Solr aggregations.

Thanks

On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ysee...@gmail.com> wrote:

> I think the SolrJ below uses the old stats component.
> Hopefully the JSON Facet API would be faster for this, but it's not
> completely clear what the main query here looks like, and if it's the
> source of any bottleneck rather than the aggregations.
> What does the generated query string actually look like? (It may be
> easiest just to pull this from the logs.)
>
> -Yonik
>
>
> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
> <agrawal.rau...@gmail.com> wrote:
> > Hi,
> >
> > We have a use case where there are 4-5 dimensions and around 3500 metrics
> > in a single document. We have indexed only 2 dimensions and enabled
> > docValues on all the metrics so that we can run the aggregation queries.
> >
> > We have 6 million such documents and we are using SolrCloud (6.6) on a
> > 6-node cluster with 8 vCores and 24 GB RAM each.
> >
> > On the same set of hardware, Elasticsearch returned the response in about
> > 10 ms, whereas Solr takes around 300-400 ms.
> >
> > This is how I am querying the data.
> >
> > private SolrQuery buildQuery(Integer variable1, List groups,
> >                              List metrics) {
> >     SolrQuery query = new SolrQuery();
> >     // Match any of the requested groups: (group:a OR group:b ...)
> >     String groupQuery = " (" + groups.stream().map(g -> "group:" + g)
> >         .collect(Collectors.joining(" OR ")) + ")";
> >     String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
> >     query.set("q", finalQuery);
> >     query.setRows(0);  // only the aggregates are needed
> >     // Request one stats.field per metric with only the sum enabled
> >     metrics.forEach(
> >         metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
> >     );
> >     return query;
> > }
> >
> > Any help will be appreciated regarding this.
> >
> >
> > Thanks,
> >
> > Raunak
>


Solr Aggregation queries are way slower than Elastic Search

2017-12-11 Thread RAUNAK AGRAWAL
Hi,

We have a use case where there are 4-5 dimensions and around 3500 metrics
in a single document. We have indexed only 2 dimensions and enabled
docValues on all the metrics so that we can run the aggregation queries.

We have 6 million such documents and we are using SolrCloud (6.6) on a
6-node cluster with 8 vCores and 24 GB RAM each.

On the same set of hardware, Elasticsearch returned the response in about
10 ms, whereas Solr takes around 300-400 ms.

This is how I am querying the data.

private SolrQuery buildQuery(Integer variable1, List groups,
                             List metrics) {
    SolrQuery query = new SolrQuery();
    // Match any of the requested groups: (group:a OR group:b ...)
    String groupQuery = " (" + groups.stream().map(g -> "group:" + g)
        .collect(Collectors.joining(" OR ")) + ")";
    String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
    query.set("q", finalQuery);
    query.setRows(0);  // only the aggregates are needed, not the documents
    // Request one stats.field per metric with only the sum enabled
    metrics.forEach(
        metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
    );
    return query;
}
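
To make the main query visible without a Solr instance, the string-building
part of the method above can be run on its own. This is only a sketch: the
class and method names are hypothetical, List<String> is an assumed element
type, and the values 290, g1 and g2 are sample inputs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-alone copy of the q-string logic from buildQuery.
class GroupQuerySketch {

    static String buildQ(Integer variable1, List<String> groups) {
        // Same concatenation as in buildQuery, including the leading space
        String groupQuery = " (" + groups.stream().map(g -> "group:" + g)
                .collect(Collectors.joining(" OR ")) + ")";
        return "variable1:" + variable1 + " AND " + groupQuery;
    }

    public static void main(String[] args) {
        // prints: variable1:290 AND  (group:g1 OR group:g2)
        System.out.println(buildQ(290, Arrays.asList("g1", "g2")));
    }
}
```

The q parameter is therefore a conjunction of the variable1 term and an OR
over all requested groups, and the stats.field parameters are added on top
of it.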

Any help will be appreciated regarding this.


Thanks,

Raunak