Re: Large Filter Query
Thanks, David, Shawn, Jagdish. Help and suggestions are really appreciated.

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:50 AM Shawn Heisey wrote:
> [...]

--
Warm Regards,
Lucky Sharma
Contact No: +91 9821559918
Re: Large Filter Query
On 6/26/2019 12:56 PM, Lucky Sharma wrote:
> @Shawn: Sorry I forgot to mention the corpus size: the corpus size is
> around 3 million docs, where we need to query for 1500 docs and run
> aggregations, sorting, search on them.

Assuming the documents aren't HUGE, that sounds like something Solr
should be able to handle pretty easily on a typical modern 64-bit
system.  I handled multiple indexes much larger than that with an 8GB
heap on Linux servers with 64GB total memory.  Most likely you won't
need anything that large.

Depending on exactly what you're going to do with it, that's probably
also something easily handled by a relational database or a more modern
NoSQL solution ... especially if "traditional search" is not part of
your goals.  Solr can do things beyond search, but search is where
everything is optimized, so if search is not part of your goal, you
might want to look elsewhere.

> @David: But will that not be a performance hit (resource intensive)?
> Since it will have that many terms to search upon, the query parse
> tree will be big, isn't it?

The terms query parser is far more efficient than a simple boolean "OR"
search with the same number of terms.  It is highly recommended for use
cases like the one you have described.

The default maxBooleanClauses limit that Lucene enforces on boolean
queries is 1024 ... but this is an arbitrary value.  The limit was
designed as a way to prevent massive queries from running when such
queries were never truly intended to be created in the first place.
It is common for users to increase the default limit.

You're probably going to want to send your queries as POST requests,
because those have a 2MB default body-size restriction, which can be
increased.  GET requests are limited by the HTTP header size
restriction, which defaults to 8192 bytes on all web server
implementations I have checked, including the one that's included with
Solr.  Increasing that is possible, but not recommended ... especially
to the sizes you would need for the queries you have described.

Thanks,
Shawn
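Shawn's two size limits are easy to sanity-check offline. A minimal sketch in plain Python (the 20-character ID format is a made-up assumption; substitute your real key length) shows that a 1500-ID terms query comfortably overflows the default 8192-byte GET header limit while staying far below the 2MB default POST body limit:

```python
# Rough size estimate for a 1500-ID terms query.  The "doc%017d" ID
# format (20 characters per ID) is a placeholder assumption.
doc_ids = ["doc%017d" % n for n in range(1500)]
query = "{!terms f=id}" + ",".join(doc_ids)

print(len(query))                     # roughly 31.5 KB of query text
print(len(query) > 8192)              # too large for a default GET header
print(len(query) < 2 * 1024 * 1024)   # well under the default POST body cap
```

With IDs this size the query text alone is around four times the default header limit, which is why POST is the practical transport here.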
Re: Large Filter Query
The terms query parser is the best way to do that. You can check the link
below for performance details:

http://yonik.com/solr-terms-query/

On Thu, 27 Jun, 2019, 12:31 AM Lucky Sharma, wrote:
> [...]
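Combining this recommendation with the POST advice elsewhere in the thread, the request parameters can be assembled and form-encoded with nothing but the standard library. Everything here is illustrative: the facet field `category`, the sort field `price`, and a uniqueKey field named `id` are assumptions, and the actual HTTP call to Solr is omitted.

```python
from urllib.parse import urlencode

# Assumed schema names: uniqueKey field "id", facet field "category",
# sort field "price" -- replace with your collection's real fields.
doc_ids = ["id%d" % n for n in range(1, 1501)]
params = {
    "q": "*:*",
    "fq": "{!terms f=id}" + ",".join(doc_ids),  # terms query parser filter
    "facet": "true",
    "facet.field": "category",
    "sort": "price asc",
    "rows": "1500",
}
body = urlencode(params)  # send this as the POST body, not in the URL
```

Sending `body` as a POST body to `/select` keeps the URL short, which sidesteps the header-size limit entirely.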
Re: Large Filter Query
Yeah, there is a performance hit, but that is expected. In my scenario I
sometimes pass a few thousand IDs using this method, but I pre-process my
results since it's a set. You will not have any issues with the URI length
if you are using POST.

On Wed, Jun 26, 2019 at 3:02 PM Lucky Sharma wrote:
> [...]
Re: Large Filter Query
Thanks, Jagdish.
But what if we need to perform search and filtering on those 1.5k doc-ID
results? Also, for the URI error, we can go with the POST approach. And
what if the data is not sharded?

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:28 AM jai dutt wrote:
> [...]

--
Warm Regards,
Lucky Sharma
Contact No: +91 9821559918
Re: Large Filter Query
1. No, Solr is not for ID search; an RDBMS is a better option.
2. Yes, correct, it is going to impact query performance, and you may get
a large-URI error.
3. Yes, you can pass IDs internally by writing a custom parser, or divide
the data into different shards.

On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma, wrote:
> [...]
Re: Large Filter Query
@Shawn: Sorry, I forgot to mention the corpus size: the corpus size is
around 3 million docs, where we need to query for 1500 docs and run
aggregations, sorting, and search on them.

@David: But will that not be a performance hit (resource intensive)?
Since it will have that many terms to search upon, the query parse tree
will be big, isn't it?
Re: Large Filter Query
You can use the {!terms} query parser and send the IDs separated by
commas:

{!terms f=id}id1,id2,..id1499,id1500

and run facets normally.

On Wed, Jun 26, 2019 at 2:31 PM Lucky Sharma wrote:
> [...]
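This one-liner can be contrasted with the naive boolean alternative that the question worries about. A small sketch (the field name `id` and the ID strings are placeholders):

```python
doc_ids = ["id%d" % n for n in range(1, 1501)]

# Terms query parser: one flat, comma-separated list of values,
# parsed as a single query node rather than 1500 clauses.
terms_q = "{!terms f=id}" + ",".join(doc_ids)

# Naive boolean OR: 1500 separate clauses, which exceeds Lucene's
# default maxBooleanClauses limit of 1024 and builds a much larger
# parse tree.
bool_q = " OR ".join("id:%s" % d for d in doc_ids)

print(terms_q.startswith("{!terms f=id}id1,id2,"))
print(len(doc_ids) > 1024)   # why the OR form needs the limit raised
```

The terms form avoids both the clause limit and the per-clause scoring overhead of the boolean form.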
Re: Large Filter Query
On 6/26/2019 12:31 PM, Lucky Sharma wrote:
> [...]

Where exactly does the number "1500" fit in?  It's not clear from what
you've said.

If there will be 1500 documents total, that's an extremely small index.
If that's the number of values in a single query, there are solutions
for any of the problems that might arise as a result of that.

When you ask whether Solr is better, better than what?

More detail will be needed in order to provide any useful information.

Thanks,
Shawn
Large Filter Query
Hi all,

What we are doing is: we will have a set of unique IDs of Solr documents,
at max 1500, and we need to run faceting and sorting among them; there is
no direct search involved. It's a head-on search, since we already know
the document unique keys beforehand.

1. Is Solr a better use case for such kind of problem?
2. Since we will be passing 1500 unique document IDs, as per my
understanding it will impact the query tree as it will grow bigger. Will
there be any other impacts?
3. Is it wise to use or solve the situation in this way?

--
Warm Regards,
Lucky Sharma