Re: Handling All Replicas Down in Solr 8.3 Cloud Collection
Here's roughly what was going on:

1. Set up a three-node cluster with a collection. The collection has one shard and three replicas for that shard.
2. Shut down two of the nodes and verify the remaining node is the leader. Verified the other two nodes are registered as dead in the Solr UI.
3. Bulk import several million documents into Solr from a CSV file.
4. Shut down the remaining node.
5. Start up all three nodes.

Even after three minutes no leader was active. I executed the FORCELEADER API call, which completed successfully, and waited three minutes -- still no replica was elected leader.

I then compared my Solr 8 cluster to a different Solr cluster. I noticed that the znode /collections/example/leaders/shard1 existed on both clusters, but in the Solr 8 cluster the znode was empty. I manually uploaded a JSON document with the proper settings to that znode, then called the FORCELEADER API again and waited 3 minutes. A leader still wasn't elected.

Then I removed the replica for the node that I imported all the documents into and added the replica back in. At that point, a leader was elected. I am not sure I have exact steps to reproduce, but I did get it working.

Thanks,
Joe

On Tue, Feb 4, 2020 at 7:54 AM Erick Erickson wrote:

> First, be sure to wait at least 3 minutes before concluding the replicas
> are permanently down; that’s the default wait period for certain leader
> election fallbacks. It’s easy to conclude it’s never going to recover --
> 180 seconds is an eternity ;).
>
> You can try the collections API FORCELEADER command. Assuming a leader is
> elected and becomes active, you _may_ have to restart the other two Solr
> nodes.
>
> How did you stop the servers? You mention disaster recovery, so I’m
> thinking you did a “kill -9” or similar? Were you actively indexing at the
> time?
> Solr _should_ manage the recovery even in that case; I’m mostly
> wondering what the sequence of events that led up to this was…
>
> Best,
> Erick
>
> > On Feb 4, 2020, at 8:38 AM, Joseph Lorenzini wrote:
> >
> > Hi all,
> >
> > I have a 3-node SolrCloud instance with a single collection. The Solr
> > nodes are pointed to a 3-node ZooKeeper ensemble. I was doing some basic
> > disaster recovery testing and have encountered a problem that hasn't
> > been obvious to me how to fix.
> >
> > After I started back up the three Solr Java processes, I can see that
> > they are registered back in the Solr UI. However, each replica is in a
> > down state permanently. There are no logs in either Solr or ZooKeeper
> > that indicate what the problem would be -- neither exceptions nor
> > warnings.
> >
> > So is there any way to collect more diagnostics to figure out what's
> > going on? Short of deleting and recreating the replicas, is there any
> > way to fix this?
> >
> > Thanks,
> > Joe
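The recovery sequence described above can be sketched as calls to the Solr Collections API. This is a minimal illustration only: the collection name (`example`), shard, replica name (`core_node3`), and host are hypothetical placeholders, and in practice each URL would be fetched and its response checked before moving to the next step.

```python
# Sketch of the recovery sequence from the thread, expressed as
# Collections API URLs. All names here are hypothetical placeholders.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"

def collections_api_url(action, **params):
    """Build a Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"

# 1. Try to force a leader election for the stuck shard.
force = collections_api_url("FORCELEADER", collection="example", shard="shard1")

# 2. If no leader appears, drop and re-add the suspect replica
#    (the step that finally worked in the report above).
delete = collections_api_url("DELETEREPLICA", collection="example",
                             shard="shard1", replica="core_node3")
add = collections_api_url("ADDREPLICA", collection="example", shard="shard1")
```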
Handling All Replicas Down in Solr 8.3 Cloud Collection
Hi all,

I have a 3-node SolrCloud instance with a single collection. The Solr nodes are pointed to a 3-node ZooKeeper ensemble. I was doing some basic disaster recovery testing and have encountered a problem that hasn't been obvious to me how to fix.

After I started back up the three Solr Java processes, I can see that they are registered back in the Solr UI. However, each replica is in a down state permanently. There are no logs in either Solr or ZooKeeper that indicate what the problem would be -- neither exceptions nor warnings.

So is there any way to collect more diagnostics to figure out what's going on? Short of deleting and recreating the replicas, is there any way to fix this?

Thanks,
Joe
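One source of diagnostics beyond the admin UI is the Collections API CLUSTERSTATUS action, which reports each replica's state as ZooKeeper sees it. A minimal sketch of extracting that state from a parsed response; the host, collection name, and the idea of using it here are assumptions, not something from the thread:

```python
# Sketch: extract per-replica state ("active", "down", "recovering")
# from a parsed CLUSTERSTATUS response. The fetch itself would be
# something like urlopen(f"{base}/admin/collections?action=CLUSTERSTATUS"
# "&collection=example&wt=json"); host/collection are assumptions.
def replica_states(cluster_status, collection):
    """Return {"shard/replica": state} for every replica in the collection."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return {
        f"{shard}/{name}": replica["state"]
        for shard, info in shards.items()
        for name, replica in info["replicas"].items()
    }
```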
Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request
Hi Shawn/Erick,

This information has been very helpful. Thank you.

So I did some more investigation into our ETL process, and I verified that, with the exception of the text I sent above, they are all obviously invalid dates. For example, one field value had 00 for a day, so I would guess that the remaining field had a non-printable character in it.

So at least in the case of a record where a field has an invalid date, the entire import process is aborted. I'll adjust the ETL process to stop passing invalid dates, but this does lead me to a question about failure modes for importing large data sets into a collection. Is there any way to specify a "continue on failure" mode, such that Solr logs that it was unable to parse a record and why, and then continues on to the next record?

Thanks,
Joe

On Sun, Feb 2, 2020 at 4:46 PM Shawn Heisey wrote:

> On 2/2/2020 8:47 AM, Joseph Lorenzini wrote:
> >
> > <autoSoftCommit>
> >   <maxTime>1000</maxTime>
> >   <maxDocs>1</maxDocs>
> > </autoSoftCommit>
>
> That autoSoftCommit setting is far too aggressive, especially for bulk
> indexing. I don't know whether it's causing the specific problem you're
> asking about here, but it's still a setting that will cause problems,
> because Solr will constantly be doing commit operations while bulk
> indexing is underway.
>
> Erick mentioned this as well. Greatly increasing the maxTime, and
> removing maxDocs, is recommended. I would recommend starting at one
> minute. The maxDocs setting should be removed from autoCommit as well.
>
> > So I turned off two solr nodes, leaving a single solr node up. When I
> > ran curl again, I noticed the import aborted with this exception.
> >
> > Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date
> > in Date Math String:'1983-12-21T00:00:00Z
> > caused by: java.time.format.DateTimeParseException: Text
> > '1983-12-21T00:00:00Z' could not be parsed at index 0'
>
> That date string looks OK. Which MIGHT mean there are characters in it
> that are not visible.
> Erick said that the single quote is balanced in his message, which COULD
> mean that the character causing the problem is one that deletes things
> when it is printed.
>
> Thanks,
> Shawn
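Shawn's hypothesis above (a character that looks fine when printed) is easy to check client-side before the data ever reaches Solr. A minimal sketch, assuming the ETL runs in Python; the field name comes from the error message in the thread, the rest is illustrative:

```python
# Sketch: pre-validate a candidate date value before indexing, checking
# both the ISO-8601 shape Solr's date fields expect and the presence of
# non-printable characters that are invisible when the value is printed.
from datetime import datetime

def check_solr_date(value):
    """Return a list of problems found in a candidate date string."""
    problems = []
    hidden = [c for c in value if not c.isprintable()]
    if hidden:
        problems.append(
            f"non-printable characters: {[hex(ord(c)) for c in hidden]}")
    try:
        # e.g. primary_dob values like 1983-12-21T00:00:00Z
        datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError as exc:
        problems.append(f"not a valid Solr date: {exc}")
    return problems
```

A clean value returns an empty list; a value with a backspace or a 00 day returns a description of what is wrong, which is the diagnostic the thread was missing.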
Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request
Hi Erick,

Thanks for the help. For commit settings, you are referring to
https://lucene.apache.org/solr/guide/8_3/updatehandlers-in-solrconfig.html.
If so, yes, I have soft commits on. According to the docs, openSearcher is turned on by default. Here are the settings:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>18</maxDocs>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
  <maxDocs>1</maxDocs>
</autoSoftCommit>

Please note, I am actually streaming a file from disk -- I am not sending the data via curl. curl is merely telling Solr what local file to read from.

So I turned off two Solr nodes, leaving a single Solr node up. When I ran curl again, I noticed the import aborted with this exception:

Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date in Date Math String:'1983-12-21T00:00:00Z
caused by: java.time.format.DateTimeParseException: Text '1983-12-21T00:00:00Z' could not be parsed at index 0'

This field is a DatePointField. I've verified that if I remove records with a DatePointField that has parsing problems, then the upload proceeds further until it hits another record with a similar problem. I was surprised that a single record with an invalid DatePointField would abort the whole process, but that does seem to be what's happening. So that's easy enough to fix if I knew why the text was failing to parse. The date certainly seems valid to me based on this documentation:
http://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/schema/DatePointField.html

Any ideas on why that won't parse?

Thanks,
Joe

On Sun, Feb 2, 2020 at 8:51 AM Erick Erickson wrote:

> What are your commit settings? Solr keeps certain in-memory structures
> between commits, so it’s important to commit periodically. Say every 60
> seconds as a straw-man proposal (and openSearcher should be set to
> true or soft commits should be enabled).
>
> When firing a zillion docs at Solr, it’s also best that your commits (both
> hard and soft) aren’t happening too frequently, thus my 60 second proposal.
>
> The commit on the command you send will be executed after the last doc
> is sent, so it’s irrelevant to the above.
>
> Apart from that, every time you commit while indexing, background
> merges are kicked off, and there’s a limited number of threads that are
> allowed to run concurrently. When that max is reached, the next update is
> queued until one of the threads is free. So you _may_ be hitting a simple
> timeout that’s showing up as a 400 error, which is something of a
> catch-all return code. If this is the case, just lengthening the timeouts
> might fix the issue.
>
> Are you sending the documents to the leader? That’ll make the process
> simpler, since docs received by followers are simply forwarded to the
> leader. That shouldn’t really matter, just a side note.
>
> Not all that helpful, I know. Does the failure happen in the same place?
> I.e., is it possible that a particular doc is making this happen?
> Unlikely, but worth asking. One bad doc shouldn’t stop the whole process,
> but it’d be a clue if there was.
>
> If you’re particularly interested in performance, you should consider
> indexing to a leader-only collection, either by deleting the followers or
> shutting down the Solr instances. There’s a performance penalty due to
> forwarding the docs (talking NRT replicas here) that can be quite
> substantial. When you turn the Solr instances back on (or ADDREPLICA),
> they’ll sync back up.
>
> Finally, I mistrust just sending a large amount of data via HTTP, just
> because there’s not much you can do except hope it all works. If this is
> a recurring process, I’d seriously consider writing a SolrJ program that
> parsed the CSV file and sent it to Solr.
>
> Best,
> Erick
>
> > On Feb 2, 2020, at 9:32 AM, Joseph Lorenzini wrote:
> >
> > Hi all,
> >
> > I have a three-node SolrCloud cluster. The collection has a single
> > shard. I am importing a 140 GB CSV file into Solr using curl, with a
> > URL that looks roughly like this.
> > I am streaming the file from disk for performance reasons.
> >
> > http://localhost:8983/solr/example/update?separator=%09&stream.file=/tmp/input.tsv&stream.contentType=text/csv;charset=utf-8&commit=true&header=true&encapsulator=%7C
> >
> > There are 139 million records in that file. I am able to import about
> > 800,000 records into Solr, at which point Solr hangs and then several
> > minutes later returns a 400 Bad Request back to curl. I looked in the
> > logs and I did find a handful of exceptions (e.g. invalid date,
> > docvalues field is too large, etc.) for particular records, but nothing
> > that would explain why the processing stalled and failed.
> >
> > My expectation is that if Solr encounters a record it cannot ingest, it
> > will throw an exception for that particular record and continue
> > processing the next record. Is that how the importing works, or do all
> > records need to be valid? If invalid records should not abort the
> > process, then does anyone have any idea what might be going on here?
> >
> > Thanks,
> > Joe
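For reference, Erick's 60-second straw-man proposal above would look roughly like this in solrconfig.xml. The values are illustrative starting points following the advice in this thread (raise maxTime, drop maxDocs), not settings the thread itself supplied:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk every 60s; no maxDocs trigger.
       openSearcher can stay false because soft commits are enabled. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make documents searchable at most once per 60s. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```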
Importing Large CSV File into Solr Cloud Fails with 400 Bad Request
Hi all,

I have a three-node SolrCloud cluster. The collection has a single shard. I am importing a 140 GB CSV file into Solr using curl, with a URL that looks roughly like this. I am streaming the file from disk for performance reasons.

http://localhost:8983/solr/example/update?separator=%09&stream.file=/tmp/input.tsv&stream.contentType=text/csv;charset=utf-8&commit=true&header=true&encapsulator=%7C

There are 139 million records in that file. I am able to import about 800,000 records into Solr, at which point Solr hangs and then several minutes later returns a 400 Bad Request back to curl. I looked in the logs and I did find a handful of exceptions (e.g. invalid date, docvalues field is too large, etc.) for particular records, but nothing that would explain why the processing stalled and failed.

My expectation is that if Solr encounters a record it cannot ingest, it will throw an exception for that particular record and continue processing the next record. Is that how the importing works, or do all records need to be valid? If invalid records should not abort the process, then does anyone have any idea what might be going on here?

Thanks,
Joe
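As the replies in this thread note, the whole-stream abort means a "continue on failure" mode has to be built client-side: read the file yourself, send fixed-size batches, and on a failed batch retry its documents individually so only the bad records are skipped and logged. A hedged sketch of that loop; `send` is a stand-in for whatever HTTP call posts documents to /update, not a Solr API:

```python
# Sketch of client-side "continue on failure" indexing. `send(batch)`
# is a hypothetical callable that posts a list of documents to Solr
# and raises on failure; everything here is illustrative.
def index_with_skip(rows, send, batch_size=1000):
    """Index rows in batches; isolate and skip bad documents.

    Returns (indexed_count, failed_rows)."""
    failed = []
    indexed = 0

    def flush(batch):
        nonlocal indexed
        try:
            send(batch)
            indexed += len(batch)
        except Exception:
            # Batch failed: retry docs one at a time to isolate bad ones.
            for doc in batch:
                try:
                    send([doc])
                    indexed += 1
                except Exception:
                    failed.append(doc)  # log and continue

    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)
    return indexed, failed
```

Here `rows` could come from `csv.reader(open(path), delimiter="\t")`; the one-at-a-time retry on a failed batch keeps throughput high in the common case while still pinpointing individual bad records.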
Cost of Stored=True Setting for All Fields
Hi all,

I am in the process of migrating a Solr collection from 4 to 8. I discovered that there was no ETL process for loading all the data into a new collection in Solr 8, so I had to build one. For technical reasons that aren't important here, I'd prefer this tool to be a one-off. In the future, I'd like to use the Solr DIH to do the reindexing. However, that can only work if the DIH can get all the Solr fields, and I discovered that, in Solr 4 at least, if a field is set to stored=false, then the DIH won't get that field.

So I am wondering if I can fix this by simply setting stored=true for all the fields. Since I am going to have to do a full re-index for the Solr 8 migration anyway, now would be the time to update the schema for this. I expect that disk size would grow, but I'd like to find out if there are any other costs or potential problems that could come up if I go that route.

Thanks,
Joe
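For what it's worth, the change under consideration is a per-field flag in the schema. A hypothetical example (the field name and type are placeholders, not taken from the collection in question):

```xml
<!-- Hypothetical field definition: stored="true" keeps the original
     value in the index so it can be retrieved (e.g. by DIH or /export),
     at the cost of larger index files. -->
<field name="primary_dob" type="pdate" indexed="true" stored="true"/>
```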
Performance of Bulk Importing TSV File in Solr 8
Hi all,

I have a TSV file that contains 1.2 million rows. I want to bulk import this file into Solr, where each row becomes a Solr document. The TSV has 24 columns. I am using the streaming API like so:

curl -v 'http://localhost:8983/solr/example/update?stream.file=/opt/solr/results.tsv&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&commit=true'

The ingestion rate is 167,000 rows a minute, and the import takes about 7.5 minutes to complete. I have a few questions.

- Is there a way to increase the performance of the ingestion rate? I am open to doing something other than a bulk import of a TSV, up to and including writing a small program. I am just not sure what that would look like at a high level.
- If the file is a TSV, I noticed that Solr never closes the HTTP connection with a 200 OK after all the documents are uploaded. The connection seems to be held open indefinitely. If, however, I upload the same file as a CSV, then Solr does close the HTTP connection. Is this a bug?
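One "small program" approach to the first question, sketched under stated assumptions: split the TSV into chunks and post them concurrently instead of as one serial stream, which often raises throughput. The posting function, chunk size, and worker count below are all assumptions to tune, not recommendations from this thread:

```python
# Sketch: break a TSV into header-carrying chunks so each one is a
# complete document that can be POSTed to /update in parallel.
# `post` is a stand-in for the real HTTP call.
from concurrent.futures import ThreadPoolExecutor

def chunk_lines(lines, chunk_size):
    """Yield chunks of at most chunk_size data rows, header re-attached."""
    header, *rest = lines
    for i in range(0, len(rest), chunk_size):
        yield [header] + rest[i:i + chunk_size]

def post_chunks(lines, post, chunk_size=50_000, workers=4):
    """Post all chunks concurrently; returns the number of chunks sent."""
    chunks = list(chunk_lines(lines, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(post, chunks))  # drain to surface any exceptions
    return len(chunks)
```

With a single shard, all updates still funnel through one leader, so the gains here come mainly from overlapping parsing and network time rather than from Solr-side parallelism.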
Need for Re-Indexing When Using Managed Schema
Hi all,

I have a question about the managed schema functionality. According to the docs, "All changes to a collection’s schema require reindexing". This would imply that if you use a managed schema and you use the Schema API to update the schema, then doing a full re-index is necessary each time. Is this accurate, or can a full re-index be avoided?

Thanks,
Joe
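For context, a Schema API change is just a JSON command POSTed to the collection's /schema endpoint; the schema itself updates immediately, but documents indexed before the change still carry the old field definitions, which is what the reindexing advice is about. A minimal sketch of building such a command; the field name, type, and collection URL are hypothetical:

```python
# Sketch: build the JSON body for a Schema API "add-field" command.
# The field name and type are hypothetical placeholders. The body would
# be POSTed to e.g. http://localhost:8983/solr/example/schema; the
# managed schema changes at once, but existing documents are untouched.
import json

def add_field_command(name, field_type, stored=True):
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
        }
    })

body = add_field_command("primary_dob", "pdate")
```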