Timothy Potter created SOLR-9961:
------------------------------------

             Summary: RestoreCore needs the option to download files in parallel.
                 Key: SOLR-9961
                 URL: https://issues.apache.org/jira/browse/SOLR-9961
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Backup/Restore
    Affects Versions: 6.2.1
            Reporter: Timothy Potter


My backup to cloud storage (Google Cloud Storage in this case, but I think this 
is a general problem) takes 8 minutes, yet the restore of the same core takes 
hours. The restore loop in RestoreCore is serial and doesn't allow me to 
parallelize the expensive part of this operation (the IO from the remote cloud 
storage service). We need the option to parallelize the download (like distcp). 

Also, I tried downloading the same directory using gsutil and it was very fast, 
like 2 minutes. So I know it's not the pipe that's limiting perf here.

Here's a very rough patch that does the parallelization. We may also want to 
consider a two-step approach: 1) download in parallel to a temp dir, 2) perform 
all of the checksum validation against the local temp dir. That will save 
round trips to the remote cloud storage.
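
To make the two-step idea concrete, here is a minimal sketch (it is not the 
attached patch). It assumes plain java.nio file copies as a stand-in for reads 
from the remote backup repository, and a plain CRC32 as a stand-in for whatever 
checksum validation RestoreCore actually performs. The class and method names 
(ParallelRestoreSketch, downloadInParallel, verify) and the thread-count 
parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

public class ParallelRestoreSketch {

    /** Step 1: copy every regular file in sourceDir into tempDir using a fixed-size pool. */
    public static List<Path> downloadInParallel(Path sourceDir, Path tempDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(sourceDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one copy task per file; sourceDir stands in for the remote store.
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : files) {
                pending.add(pool.submit(() -> {
                    Path dst = tempDir.resolve(src.getFileName().toString());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    return dst;
                }));
            }
            // Wait for all copies; get() rethrows the first failure it encounters.
            List<Path> copied = new ArrayList<>();
            for (Future<Path> f : pending) {
                copied.add(f.get());
            }
            return copied;
        } finally {
            pool.shutdown();
        }
    }

    /** CRC32 of a file's bytes (illustrative only, not Lucene's own validation). */
    public static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Step 2: validate the local copies only, so no further remote reads are needed. */
    public static boolean verify(Path tempDir, Map<String, Long> expected) throws IOException {
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            Path local = tempDir.resolve(e.getKey());
            if (!Files.isRegularFile(local) || crc32(local) != e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the expected checksums would have to be recorded at backup time; 
only after verify succeeds would the files be moved from the temp dir into the 
core's index directory.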



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
