Re: Performance warning: Overlapping onDeskSearchers=2 solr
+1 to change to new message. A strawman new message could be: "Performance warning: Overlapping onDeskSearchers=2; consider reducing commit frequency if performance problems encountered" On Wed, May 17, 2017 at 1:15 PM, Mike Drob wrote: > You're committing too frequently, so you have new searchers getting queued > up before the previous ones have been processed. > > You have several options on how to deal with this. Can increase commit > interval, add hardware, or reduce query warming. > > I don't know if uncommenting that section will help because I don't know > what your current settings are. Or if you are using manual commits. > > Mike > > On Wed, May 17, 2017, 4:58 AM Srinivas Kashyap > wrote: > > > Hi All, > > > > We are using Solr 5.2.1 version and are currently experiencing below > > Warning in Solr Logging Console: > > > > Performance warning: Overlapping onDeskSearchers=2 > > > > Also we encounter, > > > > org.apache.solr.common.SolrException: Error opening new searcher. > exceeded > > limit of maxWarmingSearchers=2, try again later. > > > > > > The reason being, we are doing mass update on our application and solr > > experiencing the higher loads at times. Data is being indexed using > DIH(sql > > queries). > > > > In solrconfig.xml below is the code. > > > > > > > > Should we be uncommenting the above lines and try to avoid this error? > > Please help me. > > > > Thanks and Regards, > > Srinivas Kashyap > > > > > > > > DISCLAIMER: E-mails and attachments from Bamboo Rose, LLC are > > confidential. If you are not the intended recipient, please notify the > > sender immediately by replying to the e-mail, and then delete it without > > making copies or using it in any way. No representation is made that this > > email or any attachments are free of viruses. Virus scanning is > recommended > > and is the responsibility of the recipient. > >
Is IndexSchema addFields and addCopyFields concurrent?
Hi all: I am new to Solr, and I am using Solr 6.4.2. I try to add fields and copyFields to the schema programmatically as below. However, on rare occasions when I try to add a lot of fields and copyFields (about 80 fields and 40 copyFields, where one field is the source and the other the destination), I see a few fields are not added but the copyFields are added. This causes core initialization failure because the associated fields for the copyFields do not exist. Can someone help me? Thank you! --Michael Hu

synchronized (oldSchema.getSchemaUpdateLock()) {
    try {
        IndexSchema newSchema = oldSchema.addFields(newFields).addCopyFields(newCopyFields, true);
        if (null != newSchema) {
            core.setLatestSchema(newSchema);
            cmd.getReq().updateSchemaToLatest();
            latestSchemaMap.put(core.getName(), newSchema);
            log.debug("Successfully added field(s) to the schema.");
            break; // success - exit from the retry loop
        } else {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Failed to add fields.");
        }
    } catch (ManagedIndexSchema.FieldExistsException e) {
        log.error("At least one field to be added already exists in the schema - retrying.");
        oldSchema = core.getLatestSchema();
        cmd.getReq().updateSchemaToLatest();
    } catch (ManagedIndexSchema.SchemaChangedInZkException e) {
        log.debug("Schema changed while processing request - retrying.");
        oldSchema = core.getLatestSchema();
        cmd.getReq().updateSchemaToLatest();
    }
}
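The lost-fields symptom above is what a read-modify-write race looks like when one of the updates is not retried against the latest snapshot. As a plain-Java illustration of the compare-and-retry pattern the snippet aims for (hypothetical stand-in types, not the actual IndexSchema API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SchemaRace {
    // Hypothetical stand-in for an immutable schema: every update copies the map.
    static final AtomicReference<Map<String, String>> schema =
            new AtomicReference<>(new HashMap<>());

    // Compare-and-retry: recompute from the latest snapshot until our swap wins,
    // mirroring the retry loop around getSchemaUpdateLock() in the question.
    static void addField(String name, String type) {
        while (true) {
            Map<String, String> old = schema.get();
            Map<String, String> next = new HashMap<>(old);
            next.put(name, type);
            if (schema.compareAndSet(old, next)) return; // lost the race -> retry
        }
    }

    public static void main(String[] args) throws Exception {
        Thread a = new Thread(() -> { for (int i = 0; i < 40; i++) addField("f" + i, "string"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 40; i++) addField("g" + i, "string"); });
        a.start(); b.start(); a.join(); b.join();
        // Without the retry, concurrent writers could publish a copy that is
        // missing the other thread's fields; with it, all 80 survive.
        System.out.println(schema.get().size());
    }
}
```

If fields go missing only when addFields and addCopyFields run concurrently from several threads, it suggests one of the publish steps is overwriting the other's snapshot rather than retrying from it.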
Re: Possible regression in Parallel SQL in 6.5.1?
cool, thanks ... easy enough to fix the SQL statement for now ;-) On Tue, May 16, 2017 at 6:27 PM, Kevin Risden wrote: > Well didn't take as long as I thought: > https://issues.apache.org/jira/browse/CALCITE-1306 > > Once Calcite 1.13 is released we should upgrade and get support for this > again. > > Kevin Risden > > On Tue, May 16, 2017 at 7:23 PM, Kevin Risden > wrote: > >> Yea this came up on the calcite mailing list. Not sure if aliases in the >> having clause were going to be added. I'll have to see if I can find that >> discussion or JIRA. >> >> Kevin Risden >> >> On May 16, 2017 18:54, "Joel Bernstein" wrote: >> >>> Yeah, Calcite doesn't support field aliases in the having clause. The >>> query >>> should work if you use count(*). We could consider this a regression, but >>> I >>> think this will be a won't fix. >>> >>> Joel Bernstein >>> http://joelsolr.blogspot.com/ >>> >>> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter >>> wrote: >>> >>> > This SQL used to work pre-calcite: >>> > >>> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM >>> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC >>> > LIMIT 10 >>> > >>> > Now I get: >>> > Caused by: java.io.IOException: --> >>> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to >>> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings, >>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING >>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC >>> > connection 'jdbc:calcitesolr:'. 
>>> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings, >>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING >>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column >>> > 103 to line 1, column 113: Column 'num_ratings' not found in any table >>> > at org.apache.solr.client.solrj.io.stream.SolrStream.read( >>> > SolrStream.java:235) >>> > at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTupl >>> e( >>> > TupleStreamIterator.java:82) >>> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext( >>> > TupleStreamIterator.java:47) >>> > ... 31 more >>> > >>> >>
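Following Joel's suggestion, a version of the query that should satisfy Calcite simply repeats the aggregate in the HAVING clause instead of referencing the alias (a sketch against the same ratings collection from the thread):

```sql
SELECT movie_id, COUNT(*) AS num_ratings, AVG(rating) AS aggAvg
FROM ratings
GROUP BY movie_id
HAVING COUNT(*) > 100
ORDER BY aggAvg ASC
LIMIT 10
```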
Re: Performance warning: Overlapping onDeskSearchers=2 solr
You're committing too frequently, so you have new searchers getting queued up before the previous ones have been processed. You have several options on how to deal with this. Can increase commit interval, add hardware, or reduce query warming. I don't know if uncommenting that section will help because I don't know what your current settings are. Or if you are using manual commits. Mike On Wed, May 17, 2017, 4:58 AM Srinivas Kashyap wrote: > Hi All, > > We are using Solr 5.2.1 version and are currently experiencing below > Warning in Solr Logging Console: > > Performance warning: Overlapping onDeskSearchers=2 > > Also we encounter, > > org.apache.solr.common.SolrException: Error opening new searcher. exceeded > limit of maxWarmingSearchers=2, try again later. > > > The reason being, we are doing mass update on our application and solr > experiencing the higher loads at times. Data is being indexed using DIH(sql > queries). > > In solrconfig.xml below is the code. > > > > Should we be uncommenting the above lines and try to avoid this error? > Please help me. > > Thanks and Regards, > Srinivas Kashyap > > > > DISCLAIMER: E-mails and attachments from Bamboo Rose, LLC are > confidential. If you are not the intended recipient, please notify the > sender immediately by replying to the e-mail, and then delete it without > making copies or using it in any way. No representation is made that this > email or any attachments are free of viruses. Virus scanning is recommended > and is the responsibility of the recipient. >
Re: Solr Admin Documents tab
Chris, Shawn, I am using 5.2.1. Neither the array (Shawn) nor the document list (Chris) works for me in the Admin panel. However, CSV works fine. Clearly we are long overdue for an upgrade. Cheers -- Rick On May 17, 2017 10:22:28 AM EDT, Shawn Heisey wrote: >On 5/16/2017 12:41 PM, Rick Leir wrote: >> In the Solr Admin Documents tab, with the document type set to JSON, >I cannot get it to accept more than one document. The legend says >"Document(s)". What syntax is expected? It rejects an array of >documents. Thanks -- Rick > >See the box labeled "Adding Multiple JSON Documents" on this page for >an >example of multiple JSON documents being added in one request: > >https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-Solr-StyleJSON > >The next section, labeled "Sending JSON Update Commands" shows how to >use the command-based syntax with multiple documents and commands. > >Thanks, >Shawn -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Performance warning: Overlapping onDeskSearchers=2 solr
This has been changed already in 6.4.0. From the CHANGES.txt entry: SOLR-9712: maxWarmingSearchers now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing maxWarmingSearchers entries from their solrconfig.xml files. On Wed, May 17, 2017 at 8:45 PM, Jason Gerlowski wrote: > Hey Shawn, others. > > This is a pitfall that Solr users seem to run into with some > frequency. (Anecdotally, I've bookmarked the Lucidworks article you > referenced because I end up referring people to it often enough.) > > The immediate first advice when someone encounters these > onDeckSearcher error messages is to examine their commit settings. Is > there any other possible cause for those messages? If not, can we > consider changing the log/exception error message to be more explicit > about the cause? > > A strawman new message could be: "Performance warning: Overlapping > onDeskSearchers=2; consider reducing commit frequency if performance > problems encountered" > > Happy to create a JIRA/patch for this; just wanted to get some > feedback first in case there's an obvious reason the messages don't > get explicit about the cause. > > Jason > > On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey wrote: >> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote: >>> We are using Solr 5.2.1 version and are currently experiencing below >>> Warning in Solr Logging Console: >>> >>> Performance warning: Overlapping onDeskSearchers=2 >>> >>> Also we encounter, >>> >>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded >>> limit of maxWarmingSearchers=2, try again later. >>> >>> >>> The reason being, we are doing mass update on our application and solr >>> experiencing the higher loads at times. 
Data is being indexed using DIH(sql >>> queries). >>> >>> In solrconfig.xml below is the code. >>> >>> >>> >>> Should we be uncommenting the above lines and try to avoid this error? >>> Please help me. >> >> This warning means that you are committing so frequently that there are >> already two searchers warming when you start another commit. >> >> DIH does a commit exactly once -- at the end of the import. One import will >> not cause the warning message you're seeing, so if there is one import >> happening at a time, either you are sending explicit commit requests during >> the import, or you have autoSoftCommit enabled with values that are far too >> small. >> >> You should definitely have autoCommit configured, but I would remove >> maxDocs and set maxTime to something like 60000 -- one minute. The >> autoCommit should also set openSearcher to false. This kind of commit >> will not make new changes visible, but it will start a new transaction >> log frequently. >> >> <autoCommit> >> <maxTime>60000</maxTime> >> <openSearcher>false</openSearcher> >> </autoCommit> >> >> An automatic commit (soft or hard) with a one second interval is going to >> cause that warning you're seeing. >> >> https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ >> >> Thanks, >> Shawn >> -- Regards, Shalin Shekhar Mangar.
Re: Performance warning: Overlapping onDeskSearchers=2 solr
Also, what is your autoSoftCommit setting? That also opens up a new searcher. On Wed, May 17, 2017 at 8:15 AM, Jason Gerlowski wrote: > Hey Shawn, others. > > This is a pitfall that Solr users seem to run into with some > frequency. (Anecdotally, I've bookmarked the Lucidworks article you > referenced because I end up referring people to it often enough.) > > The immediate first advice when someone encounters these > onDeckSearcher error messages is to examine their commit settings. Is > there any other possible cause for those messages? If not, can we > consider changing the log/exception error message to be more explicit > about the cause? > > A strawman new message could be: "Performance warning: Overlapping > onDeskSearchers=2; consider reducing commit frequency if performance > problems encountered" > > Happy to create a JIRA/patch for this; just wanted to get some > feedback first in case there's an obvious reason the messages don't > get explicit about the cause. > > Jason > > On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey wrote: >> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote: >>> We are using Solr 5.2.1 version and are currently experiencing below >>> Warning in Solr Logging Console: >>> >>> Performance warning: Overlapping onDeskSearchers=2 >>> >>> Also we encounter, >>> >>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded >>> limit of maxWarmingSearchers=2, try again later. >>> >>> >>> The reason being, we are doing mass update on our application and solr >>> experiencing the higher loads at times. Data is being indexed using DIH(sql >>> queries). >>> >>> In solrconfig.xml below is the code. >>> >>> >>> >>> Should we be uncommenting the above lines and try to avoid this error? >>> Please help me. >> >> This warning means that you are committing so frequently that there are >> already two searchers warming when you start another commit. >> >> DIH does a commit exactly once -- at the end of the import. 
One import will >> not cause the warning message you're seeing, so if there is one import >> happening at a time, either you are sending explicit commit requests during >> the import, or you have autoSoftCommit enabled with values that are far too >> small. >> >> You should definitely have autoCommit configured, but I would remove >> maxDocs and set maxTime to something like 60000 -- one minute. The >> autoCommit should also set openSearcher to false. This kind of commit >> will not make new changes visible, but it will start a new transaction >> log frequently. >> >> <autoCommit> >> <maxTime>60000</maxTime> >> <openSearcher>false</openSearcher> >> </autoCommit> >> >> An automatic commit (soft or hard) with a one second interval is going to >> cause that warning you're seeing. >> >> https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ >> >> Thanks, >> Shawn >>
Re: Performance warning: Overlapping onDeskSearchers=2 solr
Hey Shawn, others. This is a pitfall that Solr users seem to run into with some frequency. (Anecdotally, I've bookmarked the Lucidworks article you referenced because I end up referring people to it often enough.) The immediate first advice when someone encounters these onDeckSearcher error messages is to examine their commit settings. Is there any other possible cause for those messages? If not, can we consider changing the log/exception error message to be more explicit about the cause? A strawman new message could be: "Performance warning: Overlapping onDeskSearchers=2; consider reducing commit frequency if performance problems encountered" Happy to create a JIRA/patch for this; just wanted to get some feedback first in case there's an obvious reason the messages don't get explicit about the cause. Jason On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey wrote: > On 5/17/2017 5:57 AM, Srinivas Kashyap wrote: >> We are using Solr 5.2.1 version and are currently experiencing below Warning >> in Solr Logging Console: >> >> Performance warning: Overlapping onDeskSearchers=2 >> >> Also we encounter, >> >> org.apache.solr.common.SolrException: Error opening new searcher. exceeded >> limit of maxWarmingSearchers=2, try again later. >> >> >> The reason being, we are doing mass update on our application and solr >> experiencing the higher loads at times. Data is being indexed using DIH(sql >> queries). >> >> In solrconfig.xml below is the code. >> >> >> >> Should we be uncommenting the above lines and try to avoid this error? >> Please help me. > > This warning means that you are committing so frequently that there are > already two searchers warming when you start another commit. > > DIH does a commit exactly once -- at the end of the import. 
One import will > not cause the warning message you're seeing, so if there is one import > happening at a time, either you are sending explicit commit requests during > the import, or you have autoSoftCommit enabled with values that are far too > small. > > You should definitely have autoCommit configured, but I would remove > maxDocs and set maxTime to something like 60000 -- one minute. The > autoCommit should also set openSearcher to false. This kind of commit > will not make new changes visible, but it will start a new transaction > log frequently. > > <autoCommit> > <maxTime>60000</maxTime> > <openSearcher>false</openSearcher> > </autoCommit> > > An automatic commit (soft or hard) with a one second interval is going to > cause that warning you're seeing. > > https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > Thanks, > Shawn >
Re: solr /export handler - behavior during close()
Thanks Joel, will try that. Binary response would be more performant. I observed the server sends responses in 32 kb chunks and the client reads it with 8 kb buffer on inputstream. I don't know if changing that can impact anything on performance. Even if buffer size is increased on httpclient, it can't override the hardcoded 8kb buffer on sun.nio.cs.StreamDecoder. Thanks, Susmit On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein wrote: > Susmit, > > You could wrap a LimitStream around the outside of all the relational > algebra. For example: > > parallel(limit((intersect(intersect(search, search), union(search, > search) > > In this scenario the limit would happen on the workers. > > As far as the worker/replica ratio. This will depend on how heavy the > export is. If it's a light export, small number of fields, mostly numeric, > simple sort params, then I've seen a ratio of 5 (workers) to 1 (replica) > work well. This will basically saturate the CPU on the replica. But heavier > exports will saturate the replicas with fewer workers. > > Also I tend to use Direct DocValues to get the best performance. I'm not > sure how much difference this makes, but it should eliminate the > compression overhead fetching the data from the DocValues. > > Varun's suggestion of using the binary transport will provide a nice > performance increase as well. But you'll need to upgrade. You may need to > do that anyway as the fix on the early stream close will be on a later > version that was refactored to support the binary transport. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein > wrote: > > > Yep, saw it. I'll comment on the ticket for what I believe needs to be > > done. 
> > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker > wrote: > > > >> Hi Joel,Susmit > >> > >> I created https://issues.apache.org/jira/browse/SOLR-10698 to track the > >> issue > >> > >> @Susmit looking at the stack trace I see the expression is using > >> JSONTupleStream > >> . I wonder if you tried using JavabinTupleStreamParser could it help > >> improve performance ? > >> > >> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla > > >> wrote: > >> > >> > Hi Joel, > >> > > >> > queries can be arbitrarily nested with AND/OR/NOT joins e.g. > >> > > >> > (intersect(intersect(search, search), union(search, search))). If I > cut > >> off > >> > the innermost stream with a limit, the complete intersection would not > >> > happen at upper levels. Also would the limit stream have same effect > as > >> > using /select handler with rows parameter? > >> > I am trying to force input stream close through reflection, just to > see > >> if > >> > it gives performance gains. > >> > > >> > 2) would experiment with null streams. Is workers = number of replicas > >> in > >> > data collection a good thumb rule? is parallelstream performance upper > >> > bounded by number of replicas? > >> > > >> > Thanks, > >> > Susmit > >> > > >> > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein > >> > wrote: > >> > > >> > > Your approach looks OK. The single sharded worker collection is only > >> > needed > >> > > if you were using CloudSolrStream to send the initial Streaming > >> > Expression > >> > > to the /stream handler. You are not doing this, so you're approach > is > >> > fine. > >> > > > >> > > Here are some thoughts on what you described: > >> > > > >> > > 1) If you are closing the parallel stream after the top 1000 > results, > >> > then > >> > > try wrapping the intersect in a LimitStream. This stream doesn't > exist > >> > yet > >> > > so it will be a custom stream. 
The LimitStream can return the EOF > >> tuple > >> > > after it reads N tuples. This will cause the worker nodes to close > the > >> > > underlying stream and cause the Broken Pipe exception to occur at > the > >> > > /export handler, which will stop the /export. > >> > > > >> > > Here is the basic approach: > >> > > > >> > > parallel(limit(intersect(search, search))) > >> > > > >> > > > >> > > 2) It can be tricky to understand where the bottleneck lies when > using > >> > the > >> > > ParallelStream for parallel relational algebra. You can use the > >> > NullStream > >> > > to get an understanding of why performance is not increasing when > you > >> > > increase the workers. Here is the basic approach: > >> > > > >> > > parallel(null(intersect(search, search))) > >> > > > >> > > The NullStream will eat all the tuples on the workers and return a > >> single > >> > > tuple with the tuple count and the time taken to run the expression. > >> So > >> > > you'll get one tuple from each worker. This will eliminate any > >> bottleneck > >> > > on tuples returning through the ParallelStream and you can focus on > >> the > >> > > performance of the intersect and the /export handler. > >> > > > >> > >
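The LimitStream Joel describes does not exist yet, but the decorator idea behind it can be sketched in plain Java (an Iterator stand-in, not the actual SolrJ TupleStream API): yield at most N items from the wrapped source, then report exhaustion, which is what lets the workers close the underlying /export stream early.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LimitIterator<T> implements Iterator<T> {
    // Decorator that yields at most `limit` items from `inner`, then stops,
    // analogous to a LimitStream returning the EOF tuple after N tuples.
    private final Iterator<T> inner;
    private int remaining;

    public LimitIterator(Iterator<T> inner, int limit) {
        this.inner = inner;
        this.remaining = limit;
    }

    @Override public boolean hasNext() {
        return remaining > 0 && inner.hasNext();
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        remaining--;
        return inner.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new LimitIterator<>(List.of(1, 2, 3, 4, 5).iterator(), 3);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count); // only 3 of 5 items were consumed
    }
}
```

In the streaming-expression version, reaching the limit on a worker would cause the worker to stop reading, which produces the Broken Pipe on the /export handler and shuts the export down, as described above.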
Re: cursorMark value causes Request-URI Too Long excpetion
On 5/17/2017 2:40 AM, Giedrius wrote: > I've been using cursorMark for quite a while, but I noticed that > sometimes the value is huge (more than 8K). It results in Request-URI > Too Long response. Is there a way to send cursorMark in POST request's > Body? If it is, could you please provide an example? If post is not > possible, is there any other way to fix the issue? Yes, you can send any/all parameters as a POST request body. Exactly how to do this will depend on what method/language you're using to make the requests now. Without knowing that, I have no idea how you would adjust to use POST. You can also increase the header size that the servlet container running Solr will accept, but using POST is a better option. Thanks, Shawn
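One way to do what Shawn describes from plain Java is to form-encode the query parameters, including the long cursorMark, into the POST body instead of the URI (a sketch; the collection name and endpoint are made up, and the HTTP call itself is shown only as a comment since it needs a running Solr):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CursorPost {
    // Build an application/x-www-form-urlencoded body from Solr params,
    // so the (potentially huge) cursorMark travels in the body, not the URI.
    static String formEncode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("sort", "id asc");
        params.put("rows", "100");
        params.put("cursorMark", "*"); // later pages use the returned nextCursorMark
        String body = formEncode(params);
        System.out.println(body);
        // POST `body` to http://localhost:8983/solr/<collection>/select with
        // Content-Type: application/x-www-form-urlencoded (e.g. via java.net.http).
    }
}
```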
Re: Solr Admin Documents tab
On 5/16/2017 12:41 PM, Rick Leir wrote: > In the Solr Admin Documents tab, with the document type set to JSON, I cannot > get it to accept more than one document. The legend says "Document(s)". What > syntax is expected? It rejects an array of documents. Thanks -- Rick See the box labeled "Adding Multiple JSON Documents" on this page for an example of multiple JSON documents being added in one request: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-Solr-StyleJSON The next section, labeled "Sending JSON Update Commands" shows how to use the command-based syntax with multiple documents and commands. Thanks, Shawn
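For reference, the multiple-document format described in the box Shawn points to is a plain JSON array of documents sent as one request body to /update (field names here are invented for illustration):

```json
[
  {"id": "1", "title": "First document"},
  {"id": "2", "title": "Second document"}
]
```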
cursorMark value causes Request-URI Too Long excpetion
Hi, I've been using cursorMark for quite a while, but I noticed that sometimes the value is huge (more than 8K). It results in Request-URI Too Long response. Is there a way to send cursorMark in POST request's Body? If it is, could you please provide an example? If post is not possible, is there any other way to fix the issue? Thanks!
Re: setup solrcloud from scratch vie web-ui
On 5/17/2017 6:18 AM, Thomas Porschberg wrote: > Thank you. I am now a step further. > I could import data into the new collection with the DIH. However I observed > the following exception > in solr.log: > > request: > http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F=javabin=2 > Remote error message: This IndexSchema is not mutable. This probably means that the configuration has an update processor that adds unknown fields, but is using the classic schema instead of the managed schema. If you want unknown fields to automatically be guessed and added, then you need the managed schema. If not, then remove the custom update processor chain. If this doesn't sound like what's wrong, then we will need the entire error message including the full Java stacktrace. That may be in the other instance's solr.log file. > I imagine to split my data per day of the year. My idea was to create 365 > shards of type compositeKey. You cannot control shard routing explicitly with the compositeId router. That router uses a hash of the uniqueKey field to decide which shard gets the document. As its name implies, the hash can be composite -- parts of the hash can be decided by multiple parts of the value in the field, but it's still hashed. You must use the implicit router (which means all routing is manual) if you want to explicitly name the shard that receives the data. Thanks, Shawn
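Shawn's distinction can be sketched: with the implicit router the indexing client decides the shard itself, so day-of-year bucketing is just a naming convention applied on the client side (hypothetical shard-naming scheme, not a Solr API):

```java
import java.time.LocalDate;

public class DayShardRouter {
    // Hypothetical helper: with the implicit router, the client picks the
    // target shard explicitly, e.g. one shard per day of the year
    // (named day_1 .. day_366), and documents for the same day overwrite
    // last year's shard of the same name.
    static String shardFor(LocalDate date) {
        return "day_" + date.getDayOfYear();
    }

    public static void main(String[] args) {
        System.out.println(shardFor(LocalDate.of(2017, 5, 17)));
    }
}
```

With the compositeId router this is not possible: the shard is chosen by hashing the uniqueKey value, so no client-side function can pin a document to a named shard.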
Re: Performance warning: Overlapping onDeskSearchers=2 solr
On 5/17/2017 5:57 AM, Srinivas Kashyap wrote: > We are using Solr 5.2.1 version and are currently experiencing below Warning > in Solr Logging Console: > > Performance warning: Overlapping onDeskSearchers=2 > > Also we encounter, > > org.apache.solr.common.SolrException: Error opening new searcher. exceeded > limit of maxWarmingSearchers=2, try again later. > > > The reason being, we are doing mass update on our application and solr > experiencing the higher loads at times. Data is being indexed using DIH(sql > queries). > > In solrconfig.xml below is the code. > > > > Should we be uncommenting the above lines and try to avoid this error? Please > help me.

This warning means that you are committing so frequently that there are already two searchers warming when you start another commit.

DIH does a commit exactly once -- at the end of the import. One import will not cause the warning message you're seeing, so if there is one import happening at a time, either you are sending explicit commit requests during the import, or you have autoSoftCommit enabled with values that are far too small.

You should definitely have autoCommit configured, but I would remove maxDocs and set maxTime to something like 60000 -- one minute. The autoCommit should also set openSearcher to false. This kind of commit will not make new changes visible, but it will start a new transaction log frequently.

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

An automatic commit (soft or hard) with a one second interval is going to cause that warning you're seeing.

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks, Shawn
Re: solr /export handler - behavior during close()
Susmit, You could wrap a LimitStream around the outside of all the relational algebra. For example: parallel(limit((intersect(intersect(search, search), union(search, search) In this scenario the limit would happen on the workers. As far as the worker/replica ratio. This will depend on how heavy the export is. If it's a light export, small number of fields, mostly numeric, simple sort params, then I've seen a ratio of 5 (workers) to 1 (replica) work well. This will basically saturate the CPU on the replica. But heavier exports will saturate the replicas with fewer workers. Also I tend to use Direct DocValues to get the best performance. I'm not sure how much difference this makes, but it should eliminate the compression overhead fetching the data from the DocValues. Varun's suggestion of using the binary transport will provide a nice performance increase as well. But you'll need to upgrade. You may need to do that anyway as the fix on the early stream close will be on a later version that was refactored to support the binary transport. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein wrote: > Yep, saw it. I'll comment on the ticket for what I believe needs to be > done. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker wrote: > >> Hi Joel, Susmit >> >> I created https://issues.apache.org/jira/browse/SOLR-10698 to track the >> issue >> >> @Susmit looking at the stack trace I see the expression is using >> JSONTupleStream >> . I wonder if you tried using JavabinTupleStreamParser could it help >> improve performance ? >> >> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla >> wrote: >> >> > Hi Joel, >> > >> > queries can be arbitrarily nested with AND/OR/NOT joins e.g. >> > >> > (intersect(intersect(search, search), union(search, search))). If I cut >> off >> > the innermost stream with a limit, the complete intersection would not >> > happen at upper levels. 
Also would the limit stream have same effect as >> > using /select handler with rows parameter? >> > I am trying to force input stream close through reflection, just to see >> if >> > it gives performance gains. >> > >> > 2) would experiment with null streams. Is workers = number of replicas >> in >> > data collection a good thumb rule? is parallelstream performance upper >> > bounded by number of replicas? >> > >> > Thanks, >> > Susmit >> > >> > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein >> > wrote: >> > >> > > Your approach looks OK. The single sharded worker collection is only >> > needed >> > > if you were using CloudSolrStream to send the initial Streaming >> > Expression >> > > to the /stream handler. You are not doing this, so you're approach is >> > fine. >> > > >> > > Here are some thoughts on what you described: >> > > >> > > 1) If you are closing the parallel stream after the top 1000 results, >> > then >> > > try wrapping the intersect in a LimitStream. This stream doesn't exist >> > yet >> > > so it will be a custom stream. The LimitStream can return the EOF >> tuple >> > > after it reads N tuples. This will cause the worker nodes to close the >> > > underlying stream and cause the Broken Pipe exception to occur at the >> > > /export handler, which will stop the /export. >> > > >> > > Here is the basic approach: >> > > >> > > parallel(limit(intersect(search, search))) >> > > >> > > >> > > 2) It can be tricky to understand where the bottleneck lies when using >> > the >> > > ParallelStream for parallel relational algebra. You can use the >> > NullStream >> > > to get an understanding of why performance is not increasing when you >> > > increase the workers. Here is the basic approach: >> > > >> > > parallel(null(intersect(search, search))) >> > > >> > > The NullStream will eat all the tuples on the workers and return a >> single >> > > tuple with the tuple count and the time taken to run the expression. 
>> So >> > > you'll get one tuple from each worker. This will eliminate any >> bottleneck >> > > on tuples returning through the ParallelStream and you can focus on >> the >> > > performance of the intersect and the /export handler. >> > > >> > > Then experiment with: >> > > >> > > 1) Increasing the number of parallel workers. >> > > 2) Increasing the number of replicas in the data collections. >> > > >> > > And watch the timing information coming back from the NullStream >> tuples. >> > If >> > > increasing the workers is not improving performance then the >> bottleneck >> > may >> > > be in the /export handler. So try increasing replicas and see if that >> > > improves performance. Different partitions of the streams will be >> served >> > by >> > > different replicas. >> > > >> > > If performance doesn't improve with the NullStream after increasing >> both >> > > workers and replicas then we know the bottleneck is the network. >> > > >> > > Joel
Re: setup solrcloud from scratch via web-ui
> Tom Evans wrote on 17 May 2017 at 11:48:
>
> On Wed, May 17, 2017 at 6:28 AM, Thomas Porschberg wrote:
> > Hi,
> >
> > I did not manipulate the data dir. What I did was:
> >
> > 1. Downloaded solr-6.5.1.zip
> > 2. Ensured no Solr process is running
> > 3. Unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
> > 4. Started an external ZooKeeper
> > 5. Copied a conf directory from a working non-cloud Solr (6.5.1) to
> >    ~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
> >    (see http://randspringer.de/solrcloud_test/my.zip for content)
>
> ..in which you've manipulated the dataDir! :)
>
> The problem (I think) is that you have set a fixed data dir, and when
> Solr attempts to create a second core (for whatever reason; in your
> case it looks like you are adding a shard), Solr puts it exactly where
> you have told it to, in the same directory as the previous one. It
> finds the lock and blows up, because each core needs to be in a
> separate directory, but you've instructed Solr to put them in the same
> one.
>
> Start with the solrconfig from the basic_configs configset that ships
> with Solr and add the special things that your installation needs. I am
> not massively surprised that your non-cloud config does not work in
> cloud mode. When we moved to SolrCloud, we rewrote solrconfig.xml and
> schema.xml from scratch, starting from basic_configs, adding anything
> particular that we needed from our old config, checking every
> difference from the stock config and noting why, and ensuring that our
> field types use the same names for the same types as basic_configs
> wherever possible.
>
> I only say all that because the fix for this issue is a single thing,
> but you should spend the time comparing configs, because this will not
> be the only issue.
>
> Anyway, to fix this problem, in your solrconfig.xml you have:
>
> <dataDir>data</dataDir>
>
> It should be:
>
> <dataDir>${solr.data.dir:}</dataDir>
>
> which is still in your config; you've just got it commented out :)

Thank you. I am now a step further. I could import data into the new collection with the DIH. However, I observed the following exception in solr.log:

request: http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F=javabin=2
Remote error message: This IndexSchema is not mutable.
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

I also noticed that only one shard is filled. The wiki describes how to populate data with the REST API; however, I use the data importer. I plan to split my data per day of the year. My idea was to create 365 shards with the compositeId router. In my SQL I have a date field, and it is no problem to overwrite data after one year. However, I'm looking for a good example of how to achieve this. Maybe I need 365 dataimport.xml files, one under each shard, with some modulo expression for the specific day. Currently dataimport.xml is in the conf directory. So I'm looking for a good example of how to use the DIH with SolrCloud.

Should it work to create an implicit router instead of the compositeId router (with 365 shards) and simply specify my date field as router.field?

Thomas
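If the implicit-router idea is pursued, each document needs a router.field value naming its target shard. A minimal sketch of deriving that value from the record's date is below; the "day1".."day366" shard names are an assumption and would have to match the shard names given when creating the collection via the Collections API.

```java
import java.time.LocalDate;

// Hypothetical helper: map a record's date to a shard name for an
// implicit-router collection with one shard per day of the year.
// The "dayN" naming convention is illustrative, not a Solr default.
public class DayShardRouter {
    static String shardFor(LocalDate date) {
        return "day" + date.getDayOfYear();
    }

    public static void main(String[] args) {
        System.out.println(shardFor(LocalDate.of(2017, 1, 1)));   // day1
        System.out.println(shardFor(LocalDate.of(2017, 12, 31))); // day365
    }
}
```

With this scheme, documents for the same calendar day always land on the same shard, so last year's data for that day is overwritten in place by a full reimport of that shard.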
Performance warning: Overlapping onDeskSearchers=2 solr
Hi All,

We are using Solr 5.2.1 and are currently seeing the warning below in the Solr logging console:

Performance warning: Overlapping onDeskSearchers=2

We also encounter:

org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.

The reason is that we are doing a mass update in our application, and Solr experiences higher load at times. Data is being indexed using DIH (SQL queries).

In solrconfig.xml, below is the code.

Should we be uncommenting the above lines to try to avoid this error? Please help me.

Thanks and Regards,
Srinivas Kashyap
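The solrconfig.xml snippet the message refers to did not survive in the archive. For context, the commit-related section such threads usually concern looks like the fragment below; the values are purely illustrative, not the poster's actual settings. Per the advice in this thread, lengthening the commit intervals is the real fix, and raising maxWarmingSearchers mostly masks the symptom.

```xml
<!-- Hard commit: flushes to disk; openSearcher=false avoids opening a new searcher -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls visibility of new documents; keep generous under bulk DIH loads -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>

<!-- Raising this only hides overlapping-searcher warnings -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```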
Re: knowing which fields were successfully hit
hey erik, totally unaware of those two. we're able to retrieve metadata about the query itself that way?

--
*John Blythe*
Product Manager & Lead Developer
251.605.3071 | j...@curvolabs.com
www.curvolabs.com
58 Adams Ave
Evansville, IN 47713

On Tue, May 16, 2017 at 1:54 PM, Erik Hatcher wrote:

> Is this the equivalent of facet.query's? or maybe rather, group.query?
>
> Erik
>
> > On May 16, 2017, at 1:16 PM, Dorian Hoxha wrote:
> >
> > Something like Elasticsearch named queries, right?
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
> >
> > On Tue, May 16, 2017 at 7:10 PM, John Blythe wrote:
> >
> >> sorry for the confusion. as in i received results due to matches on
> >> field x vs. field y.
> >>
> >> i've gone w a highlighting solution for now. the fact that it
> >> requires field storage isn't yet prohibitive for me, so it can serve
> >> well for now. open to any alternative approaches all the same.
> >>
> >> thanks
> >>
> >> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
> >> hastings.recurs...@gmail.com> wrote:
> >>
> >>> what do you mean "hit?" As in the user clicked it?
> >>>
> >>> On Tue, May 16, 2017 at 11:35 AM, John Blythe wrote:
> >>>
> >>>> hey all. i'm sending data out that could represent a purchased
> >>>> item or a competitive alternative. when the results are returned
> >>>> i'm needing to know which of the two were hit so i can serve up
> >>>> the *other*.
> >>>>
> >>>> i can make a blunt instrument in the application layer to simply
> >>>> look for a match between the queried terms and the resulting
> >>>> fields, but the problem of fuzzy matching and some of the special
> >>>> analysis being done to get the hits will be for naught.
> >>>>
> >>>> cursory googling landed me at a similar discussion that suggested
> >>>> using hit highlighting or retrieving the debugger's explain data
> >>>> to sort through.
> >>>>
> >>>> is there another, more efficient means, or are these the two tools
> >>>> in the toolbox?
> >>>>
> >>>> thanks!
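John's highlighting approach comes down to checking which fields return highlight snippets for a document. The sketch below models just that decision step on plain Maps; the field names are hypothetical, and in real SolrJ the per-document snippet map would come from QueryResponse.getHighlighting().

```java
import java.util.*;

// Given per-field highlight snippets for one document, report which of
// two candidate fields produced the hit. Field names are illustrative.
public class MatchedFieldCheck {
    static String hitField(Map<String, List<String>> highlights,
                           String fieldA, String fieldB) {
        if (!highlights.getOrDefault(fieldA, Collections.<String>emptyList()).isEmpty()) {
            return fieldA;
        }
        if (!highlights.getOrDefault(fieldB, Collections.<String>emptyList()).isEmpty()) {
            return fieldB;
        }
        return null; // neither field was highlighted
    }

    public static void main(String[] args) {
        Map<String, List<String>> hl = new HashMap<>();
        hl.put("purchased_item", Collections.singletonList("<em>widget</em>"));
        // The matched field tells us which *other* record to serve up.
        System.out.println(hitField(hl, "purchased_item", "competitive_alt"));
    }
}
```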
Re: setup solrcloud from scratch via web-ui
On Wed, May 17, 2017 at 6:28 AM, Thomas Porschberg wrote:
> Hi,
>
> I did not manipulate the data dir. What I did was:
>
> 1. Downloaded solr-6.5.1.zip
> 2. Ensured no Solr process is running
> 3. Unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
> 4. Started an external ZooKeeper
> 5. Copied a conf directory from a working non-cloud Solr (6.5.1) to
>    ~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
>    (see http://randspringer.de/solrcloud_test/my.zip for content)

..in which you've manipulated the dataDir! :)

The problem (I think) is that you have set a fixed data dir, and when Solr attempts to create a second core (for whatever reason; in your case it looks like you are adding a shard), Solr puts it exactly where you have told it to, in the same directory as the previous one. It finds the lock and blows up, because each core needs to be in a separate directory, but you've instructed Solr to put them in the same one.

Start with the solrconfig from the basic_configs configset that ships with Solr and add the special things that your installation needs. I am not massively surprised that your non-cloud config does not work in cloud mode. When we moved to SolrCloud, we rewrote solrconfig.xml and schema.xml from scratch, starting from basic_configs, adding anything particular that we needed from our old config, checking every difference from the stock config and noting why, and ensuring that our field types use the same names for the same types as basic_configs wherever possible.

I only say all that because the fix for this issue is a single thing, but you should spend the time comparing configs, because this will not be the only issue.

Anyway, to fix this problem, in your solrconfig.xml you have:

<dataDir>data</dataDir>

It should be:

<dataDir>${solr.data.dir:}</dataDir>

which is still in your config; you've just got it commented out :)

Cheers

Tom
CollapsingQParserPlugin with more than one field
Hi,

I want to group a huge set of documents by 3 or 4 fields, but CollapsingQParserPlugin works only with one field. If I add more than one collapse fq for different fields, it doesn't group on the fields together; it uses the groups from the first field as input to the second one, and hence gives me wrong results. I want to group by more than one field in the same collector. Is there a way to do this?

Thanks
Mikhail
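Since the collapse query parser takes a single field, a common workaround (not a built-in multi-field collapse) is to index one composite key that concatenates the grouping fields, then collapse on it with a single fq such as fq={!collapse field=groupKey}. A sketch of building that key at index time follows; the field values, separator, and groupKey field name are all illustrative.

```java
// Build a single collapse key from several field values, so that one
// fq={!collapse field=groupKey} groups on the combination of fields.
// Separator choice matters: pick one that cannot occur inside a value.
public class CompositeCollapseKey {
    static String groupKey(String... fieldValues) {
        return String.join("|", fieldValues);
    }

    public static void main(String[] args) {
        // e.g. values of the 3 fields the poster wants to group by
        System.out.println(groupKey("US", "2017", "electronics")); // US|2017|electronics
    }
}
```

The composite key must be indexed as a single-valued, docValues-friendly string field so the collapsing collector can group on it efficiently.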