Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
Thanks, David, Shawn, Jagdish

Help and suggestions are really appreciated.

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:50 AM Shawn Heisey  wrote:
>
> On 6/26/2019 12:56 PM, Lucky Sharma wrote:
> > @Shawn: Sorry, I forgot to mention the corpus size: it is around
> > 3 million docs, of which we need to query for 1500 docs and run
> > aggregations, sorting, and search on them.
>
> Assuming the documents aren't HUGE, that sounds like something Solr
> should be able to handle pretty easily on a typical modern 64-bit
> system.  I handled multiple indexes much larger than that with an 8GB
> heap on Linux servers with 64GB total memory.  Most likely you won't
> need anything that large.
>
> Depending on exactly what you're going to do with it, that's probably
> also something easily handled by a relational database or a more modern
> NoSQL solution ... especially if "traditional search" is not part of
> your goals.  Solr can do things beyond search, but search is where
> everything is optimized, so if search is not part of your goal, you
> might want to look elsewhere.
>
> > @David: But will that not be a performance hit (resource intensive)?
> > Since it will have that many terms to search upon, the query parse
> > tree will be big, won't it?
>
> The terms query parser is far more efficient than a simple boolean "OR"
> search with the same number of terms.  It is highly recommended for use
> cases like you have described.
>
> The default maxBooleanClauses limit that Lucene enforces on boolean
> queries is 1024 ... but this is an arbitrary value.  The limit was
> designed as a way to prevent massive queries from running when it wasn't
> truly intended for such queries to have been created in the first place.
>   It is common for users to increase the default limit.
>
> You're probably going to want to send your queries as POST requests,
> because those have a 2MB default body-size restriction, which can be
> increased.  GET requests are limited by the HTTP header size
> restriction, which defaults to 8192 bytes on all web server implementations
> I have checked, including the one that's included with Solr.  Increasing
> that is possible, but not recommended ... especially to the sizes you
> would need for the queries you have described.
>
> Thanks,
> Shawn



-- 
Warm Regards,

Lucky Sharma
Contact No :+91 9821559918


Re: Large Filter Query

2019-06-26 Thread Shawn Heisey

On 6/26/2019 12:56 PM, Lucky Sharma wrote:

@Shawn: Sorry, I forgot to mention the corpus size: it is around
3 million docs, of which we need to query for 1500 docs and run
aggregations, sorting, and search on them.


Assuming the documents aren't HUGE, that sounds like something Solr 
should be able to handle pretty easily on a typical modern 64-bit 
system.  I handled multiple indexes much larger than that with an 8GB 
heap on Linux servers with 64GB total memory.  Most likely you won't 
need anything that large.


Depending on exactly what you're going to do with it, that's probably 
also something easily handled by a relational database or a more modern 
NoSQL solution ... especially if "traditional search" is not part of 
your goals.  Solr can do things beyond search, but search is where 
everything is optimized, so if search is not part of your goal, you 
might want to look elsewhere.



@David: But will that not be a performance hit (resource intensive)?
Since it will have that many terms to search upon, the query parse
tree will be big, won't it?


The terms query parser is far more efficient than a simple boolean "OR" 
search with the same number of terms.  It is highly recommended for use 
cases like you have described.
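For illustration only (not from the thread), here is a sketch of the difference in query shape; the field name `id` and the doc ids are hypothetical:

```python
# Illustrative sketch: the same 1500-id lookup expressed as a boolean OR
# query versus the terms query parser.  Field name "id" and the ids
# themselves are hypothetical placeholders.
doc_ids = [f"doc{i}" for i in range(1500)]

# Boolean OR form: each id is a separate clause in the parsed query tree,
# so Lucene's maxBooleanClauses limit (default 1024) applies.
bool_query = " OR ".join(f"id:{i}" for i in doc_ids)

# Terms query parser form: a single parser invocation over a flat id list,
# with no per-id clause objects in the query tree.
terms_query = "{!terms f=id}" + ",".join(doc_ids)

print(len(bool_query), len(terms_query))
```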


The default maxBooleanClauses limit that Lucene enforces on boolean 
queries is 1024 ... but this is an arbitrary value.  The limit was 
designed as a way to prevent massive queries from running when it wasn't 
truly intended for such queries to have been created in the first place. 
 It is common for users to increase the default limit.
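If the boolean route were taken anyway, the limit can be raised in solrconfig.xml; a sketch, where the value 2048 is only an example, not a recommendation:

```xml
<!-- solrconfig.xml sketch: raising Lucene's boolean-clause limit
     from its default of 1024. -->
<query>
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```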


You're probably going to want to send your queries as POST requests, 
because those have a 2MB default body-size restriction, which can be 
increased.  GET requests are limited by the HTTP header size 
restriction, which defaults to 8192 bytes on all web server implementations 
I have checked, including the one that's included with Solr.  Increasing 
that is possible, but not recommended ... especially to the sizes you 
would need for the queries you have described.
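As a rough sketch of the size arithmetic (the field name `id`, the facet field `category`, and the id format are assumptions, not from the thread):

```python
# Sketch: why a ~1500-id query belongs in a POST body, not the URL.
from urllib.parse import urlencode

ids = ",".join(f"doc{i}" for i in range(1500))
params = {
    "q": "{!terms f=id}" + ids,
    "facet": "true",
    "facet.field": "category",
}
body = urlencode(params)

# The encoded parameters alone already exceed the usual 8192-byte header
# budget of a GET request, while staying far below Solr's default 2MB
# POST body limit.
print(len(body))
```

With the requests library this would be sent as `requests.post(solr_url, data=params)`; the equivalent GET typically fails with an HTTP 414 or a header-size error.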


Thanks,
Shawn


Re: Large Filter Query

2019-06-26 Thread jai dutt
The terms query parser is the best way to do that.
You can check the link below for performance details.

http://yonik.com/solr-terms-query/

On Thu, 27 Jun, 2019, 12:31 AM Lucky Sharma,  wrote:

> Thanks, Jagdish
> But what if we need to perform search and filtering on those 1.5k doc-id
> results? Also, for the URI error, we can go with the POST approach. And
> what if the data is not sharded?
>
> Regards,
> Lucky Sharma
>
> On Thu, Jun 27, 2019 at 12:28 AM jai dutt 
> wrote:
> >
> > 1. No, Solr is not designed for plain id lookups; an RDBMS is a better option.
> > 2. Yes, correct, it is going to impact query performance, and you may get a
> > "URI too large" error.
> > 3. Yes, you can pass the ids internally by writing a custom parser, or divide
> > the data into different shards.
> >
> >
> >
> > On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
> >
> > > Hi all,
> > >
> > > What we are doing is: we will have a set of unique ids of Solr
> > > documents, at most 1500, and we need to run faceting and sorting among them.
> > > There is no direct search involved.
> > > It's a head-on search since we already know the document unique keys
> > > beforehand.
> > >
> > > 1. Is Solr a good fit for this kind of problem?
> > > 2. Since we will be passing 1500 unique document ids, as per my
> > > understanding it will impact the query tree as it will grow bigger. Will
> > > there be any other impacts?
> > > 3. Is it wise to use Solr to solve the situation in this way?
> > >
> > >
> > > --
> > > Warm Regards,
> > >
> > > Lucky Sharma
> > >
>
>
>
> --
> Warm Regards,
>
> Lucky Sharma
> Contact No :+91 9821559918
>


Re: Large Filter Query

2019-06-26 Thread David Hastings
Yeah, there is a performance hit, but that is expected. In my scenario I
sometimes pass a few thousand using this method, but I pre-process my
results since it's a set. You will not have any issues with the URI length
if you are using POST.

On Wed, Jun 26, 2019 at 3:02 PM Lucky Sharma  wrote:

> Thanks, Jagdish
> But what if we need to perform search and filtering on those 1.5k doc-id
> results? Also, for the URI error, we can go with the POST approach. And
> what if the data is not sharded?
>
> Regards,
> Lucky Sharma
>
> On Thu, Jun 27, 2019 at 12:28 AM jai dutt 
> wrote:
> >
> > 1. No, Solr is not designed for plain id lookups; an RDBMS is a better option.
> > 2. Yes, correct, it is going to impact query performance, and you may get a
> > "URI too large" error.
> > 3. Yes, you can pass the ids internally by writing a custom parser, or divide
> > the data into different shards.
> >
> >
> >
> > On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
> >
> > > Hi all,
> > >
> > > What we are doing is: we will have a set of unique ids of Solr
> > > documents, at most 1500, and we need to run faceting and sorting among them.
> > > There is no direct search involved.
> > > It's a head-on search since we already know the document unique keys
> > > beforehand.
> > >
> > > 1. Is Solr a good fit for this kind of problem?
> > > 2. Since we will be passing 1500 unique document ids, as per my
> > > understanding it will impact the query tree as it will grow bigger. Will
> > > there be any other impacts?
> > > 3. Is it wise to use Solr to solve the situation in this way?
> > >
> > >
> > > --
> > > Warm Regards,
> > >
> > > Lucky Sharma
> > >
>
>
>
> --
> Warm Regards,
>
> Lucky Sharma
> Contact No :+91 9821559918
>


Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
Thanks, Jagdish
But what if we need to perform search and filtering on those 1.5k doc-id
results? Also, for the URI error, we can go with the POST approach. And
what if the data is not sharded?

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:28 AM jai dutt  wrote:
>
> 1. No, Solr is not designed for plain id lookups; an RDBMS is a better option.
> 2. Yes, correct, it is going to impact query performance, and you may get a
> "URI too large" error.
> 3. Yes, you can pass the ids internally by writing a custom parser, or divide
> the data into different shards.
>
>
>
> On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
>
> > Hi all,
> >
> > What we are doing is: we will have a set of unique ids of Solr
> > documents, at most 1500, and we need to run faceting and sorting among them.
> > There is no direct search involved.
> > It's a head-on search since we already know the document unique keys
> > beforehand.
> >
> > 1. Is Solr a good fit for this kind of problem?
> > 2. Since we will be passing 1500 unique document ids, as per my
> > understanding it will impact the query tree as it will grow bigger. Will
> > there be any other impacts?
> > 3. Is it wise to use Solr to solve the situation in this way?
> >
> >
> > --
> > Warm Regards,
> >
> > Lucky Sharma
> >



-- 
Warm Regards,

Lucky Sharma
Contact No :+91 9821559918


Re: Large Filter Query

2019-06-26 Thread jai dutt
1. No, Solr is not designed for plain id lookups; an RDBMS is a better option.
2. Yes, correct, it is going to impact query performance, and you may get a
"URI too large" error.
3. Yes, you can pass the ids internally by writing a custom parser, or divide
the data into different shards.



On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:

> Hi all,
>
> What we are doing is: we will have a set of unique ids of Solr
> documents, at most 1500, and we need to run faceting and sorting among them.
> There is no direct search involved.
> It's a head-on search since we already know the document unique keys
> beforehand.
>
> 1. Is Solr a good fit for this kind of problem?
> 2. Since we will be passing 1500 unique document ids, as per my
> understanding it will impact the query tree as it will grow bigger. Will
> there be any other impacts?
> 3. Is it wise to use Solr to solve the situation in this way?
>
>
> --
> Warm Regards,
>
> Lucky Sharma
>


Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
@Shawn: Sorry, I forgot to mention the corpus size: it is around
3 million docs, of which we need to query for 1500 docs and run
aggregations, sorting, and search on them.

@David: But will that not be a performance hit (resource intensive)?
Since it will have that many terms to search upon, the query parse
tree will be big, won't it?


Re: Large Filter Query

2019-06-26 Thread David Hastings
You can use the !terms query parser and send the ids separated by commas:

{!terms f=id}id1,id2,..id1499,id1500

and run facets normally
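Assembled as request parameters, the suggestion above might look like this sketch; the collection URL and the `price` and `category` fields are placeholders, not from the thread:

```python
# Sketch of the terms-parser approach with normal faceting and sorting.
# The collection URL, "price", and "category" fields are hypothetical.
doc_ids = ["id1", "id2", "id1499", "id1500"]  # the known unique keys
params = {
    "q": "{!terms f=id}" + ",".join(doc_ids),
    "rows": len(doc_ids),
    "sort": "price asc",
    "facet": "true",
    "facet.field": "category",
}
# e.g. requests.post("http://localhost:8983/solr/mycollection/select", data=params)
print(params["q"])
```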


On Wed, Jun 26, 2019 at 2:31 PM Lucky Sharma  wrote:

> Hi all,
>
> What we are doing is: we will have a set of unique ids of Solr
> documents, at most 1500, and we need to run faceting and sorting among them.
> There is no direct search involved.
> It's a head-on search since we already know the document unique keys
> beforehand.
>
> 1. Is Solr a good fit for this kind of problem?
> 2. Since we will be passing 1500 unique document ids, as per my
> understanding it will impact the query tree as it will grow bigger. Will
> there be any other impacts?
> 3. Is it wise to use Solr to solve the situation in this way?
>
>
> --
> Warm Regards,
>
> Lucky Sharma
>


Re: Large Filter Query

2019-06-26 Thread Shawn Heisey

On 6/26/2019 12:31 PM, Lucky Sharma wrote:

What we are doing is: we will have a set of unique ids of Solr
documents, at most 1500, and we need to run faceting and sorting among them.
There is no direct search involved.
It's a head-on search since we already know the document unique keys beforehand.

1. Is Solr a good fit for this kind of problem?
2. Since we will be passing 1500 unique document ids, as per my
understanding it will impact the query tree as it will grow bigger. Will
there be any other impacts?
3. Is it wise to use Solr to solve the situation in this way?


Where exactly does the number "1500" fit in?  It's not clear from what 
you've said.  If there will be 1500 documents total, that's an extremely 
small index.  If that's the number of values in a single query, there 
are solutions for any of the problems that might arise as a result of that.


When you ask whether Solr is better, better than what?

More detail will be needed in order to provide any useful information.

Thanks,
Shawn


Large Filter Query

2019-06-26 Thread Lucky Sharma
Hi all,

What we are doing is: we will have a set of unique ids of Solr
documents, at most 1500, and we need to run faceting and sorting among them.
There is no direct search involved.
It's a head-on search since we already know the document unique keys beforehand.

1. Is Solr a good fit for this kind of problem?
2. Since we will be passing 1500 unique document ids, as per my
understanding it will impact the query tree as it will grow bigger. Will
there be any other impacts?
3. Is it wise to use Solr to solve the situation in this way?


-- 
Warm Regards,

Lucky Sharma