[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-23 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890751#comment-16890751
 ] 

Tim Owen commented on SOLR-9961:


Thanks. Yes indeed, we had to cherry-pick the patch for that into our build. 
Finally everything is working!

 

> RestoreCore needs the option to download files in parallel.
> ---
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
>  Issue Type: Improvement
>  Components: Backup/Restore
>Affects Versions: 6.2.1
>Reporter: Timothy Potter
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, 
> SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.
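
A minimal sketch of the two-step approach described above, under loud assumptions: a local directory stands in for the remote store, CRC32 stands in for whatever checksum the backup records, and {{ParallelRestoreSketch}} with its methods is hypothetical, not code from the attached patch. Step 1 copies the files on a fixed thread pool; step 2 validates checksums against the local copies only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

// Hypothetical illustration; not the RestoreCore implementation.
final class ParallelRestoreSketch {

  /** CRC32 of a file's bytes (assumed stand-in for the real checksum). */
  static long checksum(Path file) throws IOException {
    CRC32 crc = new CRC32();
    crc.update(Files.readAllBytes(file));
    return crc.getValue();
  }

  /**
   * Step 1: copy every named file from remoteDir to tempDir in parallel.
   * Step 2: validate checksums against the local copies only.
   */
  static void restore(Path remoteDir, Path tempDir, Map<String, Long> expected, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String name : expected.keySet()) {
        pending.add(pool.submit(() -> {
          try {
            Files.copy(remoteDir.resolve(name), tempDir.resolve(name),
                StandardCopyOption.REPLACE_EXISTING);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }));
      }
      for (Future<?> f : pending) {
        try {
          f.get(); // wait for each copy and surface the first failure
        } catch (ExecutionException e) {
          throw new IOException("copy failed", e.getCause());
        }
      }
    } finally {
      pool.shutdownNow();
    }
    // All reads below hit the local temp dir, not the remote store.
    for (Map.Entry<String, Long> e : expected.entrySet()) {
      if (checksum(tempDir.resolve(e.getKey())) != e.getValue()) {
        throw new IOException("checksum mismatch: " + e.getKey());
      }
    }
  }
}
```

Waiting on every future before validating keeps the two phases separate, so a failed download is reported as a copy error rather than as a checksum mismatch.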



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-23 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890741#comment-16890741
 ] 

Mikhail Khludnev commented on SOLR-9961:


bq.  backups to HDFS and to S3 (via S3A)
[~TimOwen], beware of SOLR-11556. 




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-22 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890015#comment-16890015
 ] 

Tim Owen commented on SOLR-9961:


Thanks Mikhail, we're interested in your findings too, as we do backups to HDFS 
and to S3 (via S3A) and are currently profiling performance of backups and 
restores in particular.

 




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-22 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889991#comment-16889991
 ] 

Mikhail Khludnev commented on SOLR-9961:


[~TimOwen], I'm not able to measure it now, so your observations are really 
appreciated. Note that it's intended for cloud storage, where it might be more 
significant than in HDFS.  




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-22 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889969#comment-16889969
 ] 

Tim Owen commented on SOLR-9961:


Just curious if you tried increasing the copy buffer size as per SOLR-13029 to 
speed up restores? It would be good to compare the performance of making that 
change, vs the extra complexity of parallelisation.




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-20 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889599#comment-16889599
 ] 

Mikhail Khludnev commented on SOLR-9961:


Linking a bunch of JIRAs suggesting that {{fs.hdfs.impl.disable.cache=true}} is 
our everything, which is hard for me to believe.  




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-19 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888695#comment-16888695
 ] 

Lucene/Solr QA commented on SOLR-9961:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  3m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  3m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  3m  0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 59m 
49s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-9961 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12974867/SOLR-9961.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 24b94b8 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/499/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/499/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-07-01 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876525#comment-16876525
 ] 

Mikhail Khludnev commented on SOLR-9961:


The design would be: 
* {{BackupRepositoryFactory}} holds a shared thread pool
* the thread pool is optionally injected into each created {{BackupRepository}}
* Restore (and Backup) operations use a dedicated operation {{listAll(path, 
lambda)}} or {{forEach(list/file, lambda)}}
* Repositories that accepted the thread pool invoke the lambda in its threads
* The lambda accepts a repository delegate and is expected to operate through it. 
This delegate reuses HDFS and closes/releases it after it's done. 
WDYT?   
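
The design above could be sketched roughly as follows. Every name here ({{SharedPoolDesignSketch}}, {{Factory}}, {{Repository}}, {{Delegate}}, and the {{forEach}} signature) is a hypothetical stand-in, not the actual {{BackupRepository}} API or the attached patch; the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical illustration of the proposed design; not Solr code.
final class SharedPoolDesignSketch {

  /** Per-task handle; a real one might wrap a filesystem client and release it in close(). */
  static final class Delegate implements AutoCloseable {
    void copy(String file) { /* real I/O would go here */ }
    @Override public void close() { /* release the underlying client */ }
  }

  /** Repository that runs callbacks on the injected pool, or inline when no pool was given. */
  static final class Repository {
    private final ExecutorService pool; // null means "run serially"

    Repository(ExecutorService pool) { this.pool = pool; }

    void forEach(List<String> files, BiConsumer<Delegate, String> action)
        throws InterruptedException, ExecutionException {
      if (pool == null) {
        for (String f : files) run(action, f);
        return;
      }
      List<Future<?>> pending = new ArrayList<>();
      for (String f : files) pending.add(pool.submit(() -> run(action, f)));
      for (Future<?> p : pending) p.get(); // wait and propagate failures
    }

    // Each invocation gets its own delegate, closed as soon as the lambda returns.
    private static void run(BiConsumer<Delegate, String> action, String file) {
      try (Delegate d = new Delegate()) {
        action.accept(d, file);
      }
    }
  }

  /** Factory owns the single shared pool and optionally injects it into repositories. */
  static final class Factory implements AutoCloseable {
    private final ExecutorService shared = Executors.newFixedThreadPool(4);

    Repository create(boolean parallel) { return new Repository(parallel ? shared : null); }

    @Override public void close() { shared.shutdown(); }
  }
}
```

Keeping the pool in the factory means its lifetime does not depend on repositories being closed.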




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-06-30 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875750#comment-16875750
 ] 

Lucene/Solr QA commented on SOLR-9961:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 33m  
5s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-9961 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973241/SOLR-9961.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon 
Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 2fdb4dd |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/473/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/473/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-06-30 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875740#comment-16875740
 ] 

Mikhail Khludnev commented on SOLR-9961:


Here's the question: what should hold the thread pool? The repository factory 
(a singleton), or the repository instance, which is created for every operation 
(a few times) and isn't closed yet (see SOLR-13587)?  




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-06-29 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875448#comment-16875448
 ] 

Lucene/Solr QA commented on SOLR-9961:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} SOLR-9961 does not apply to master. Rebase required? Wrong 
Branch? See 
https://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file for help. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-9961 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973125/SOLR-9961.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/466/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-06-27 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874521#comment-16874521
 ] 

Mikhail Khludnev commented on SOLR-9961:


Attached a dirty draft. Really dirty. It turns out backup repos are currently 
never closed anywhere in the code. I'm really surprised. 




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2019-06-13 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863500#comment-16863500
 ] 

Mikhail Khludnev commented on SOLR-9961:


{quote}This can even benefit backup operation. What do you think ?
{quote}
+1 to the question. [~thelabdude], what's your take on that? 
{quote}API in BackupRepository interface which accepts a list of files to be 
copied
{quote}
+1

 




[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2018-11-30 Thread Tim Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705257#comment-16705257
 ] 

Tim Owen commented on SOLR-9961:


We considered using this patch locally, but found that our slow HDFS restores 
were actually caused by an undersized copy buffer. See SOLR-13029 for our 
change to alleviate that. Since we had lots of collections to restore, we ran 
those in parallel instead of parallelising the file restore. But the 
buffer patch made each file restore about 10x faster, with a 256 kB buffer 
instead of 4 kB.
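
To illustrate why the buffer size matters, here is a minimal sketch of a stream copy with a configurable buffer. {{BufferedCopySketch}} is hypothetical and is not the SOLR-13029 change itself; it only shows that a larger buffer means far fewer read calls for the same data, and each read against a remote store can carry a fixed cost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration; not the code from SOLR-13029.
final class BufferedCopySketch {

  /**
   * Copies in to out using a buffer of bufferSize bytes.
   * Returns {bytes copied, number of successful read calls}.
   */
  static long[] copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
    byte[] buf = new byte[bufferSize];
    long total = 0;
    long reads = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
      total += n;
      reads++;
    }
    return new long[] {total, reads};
  }
}
```

For an in-memory 1 MiB source, a 4 kB buffer needs 256 reads while a 256 kB buffer needs 4. Real remote streams may return short reads, so actual counts will vary.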







[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2017-01-13 Thread Hrishikesh Gadre (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822297#comment-15822297
 ] 

Hrishikesh Gadre commented on SOLR-9961:


[~thelabdude] I think this is a great improvement! A couple of comments:

bq. But as stated in the description, this now causes the various FileSystem 
already closed issue, so would need to be used with hdfs cache disabled.

I think the root cause of this problem is that HdfsDirectory uses the 
FileSystem.get(...) API, which returns a cached instance shared between 
callers. If we change that to FileSystem.newInstance(...), each HdfsDirectory 
gets its own instance and the problem will most likely go away. I think this 
would be a better solution than disabling HDFS caching. 
[~markrmil...@gmail.com] any thoughts?
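
The difference between a shared cached handle and a per-caller handle can be shown with a toy model in plain Java. It uses no Hadoop classes: FakeFs, get and newInstance below are hypothetical stand-ins that only mimic the caching behaviour under discussion.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedHandleSketch {

  /** Hypothetical stand-in for a file system client; not a Hadoop class. */
  public static final class FakeFs implements Closeable {
    private volatile boolean open = true;

    public void checkOpen() throws IOException {
      if (!open) {
        throw new IOException("FakeFs has been closed");
      }
    }

    @Override
    public void close() {
      open = false;
    }
  }

  private static final Map<String, FakeFs> CACHE = new ConcurrentHashMap<>();

  /** Returns the cached instance for the URI, so all callers share one object. */
  public static FakeFs get(String uri) {
    return CACHE.computeIfAbsent(uri, k -> new FakeFs());
  }

  /** Always returns a fresh instance that the caller owns and may close safely. */
  public static FakeFs newInstance(String uri) {
    return new FakeFs();
  }

  public static void main(String[] args) throws IOException {
    FakeFs a = get("gs://bucket");
    FakeFs b = get("gs://bucket");
    a.close(); // one copy task finishes and closes "its" handle
    try {
      b.checkOpen(); // another task still using the shared handle now fails
    } catch (IOException e) {
      System.out.println("shared handle: " + e.getMessage());
    }

    FakeFs c = newInstance("gs://bucket");
    FakeFs d = newInstance("gs://bucket");
    c.close();
    d.checkOpen(); // unaffected, because d is a separate object
    System.out.println("separate handles: still open");
  }
}
```

In the toy model, closing one handle from get breaks every other user of the same URI, while handles from newInstance can be closed independently.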

bq. adds an option for BackupRepository implementations to download in parallel 
using a thread pool.

It seems a bit odd to add this configuration to the BackupRepository 
interface. If we can ensure that all BackupRepository implementations support 
concurrent copy operations, then we can make the thread-pool and timeout 
configurations global. For this to be feasible, each BackupRepository 
implementation just needs to make sure that the client state is kept separate 
for each copy operation (which I think is doable).

The other approach could be to add another API to the BackupRepository 
interface which accepts a list of files to be copied. The implementation of 
this API can choose to use multi-threaded (or sequential) execution. This 
could even benefit the backup operation. What do you think?

Also, as a minor comment, did you think about using CompletionService to fetch 
the results of completed tasks? Seems a bit cleaner...
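
A minimal sketch of the CompletionService approach, using only java.util.concurrent. The download method is a hypothetical placeholder for a per-file copy and is not taken from the patch; the class name and file names are likewise made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServiceSketch {

  /** Hypothetical placeholder for copying one file; returns the file name on success. */
  static String download(String fileName) {
    return fileName;
  }

  /** Submits one task per file and collects the results in completion order. */
  public static List<String> downloadAll(List<String> fileNames, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      CompletionService<String> completion = new ExecutorCompletionService<>(pool);
      for (String name : fileNames) {
        completion.submit(() -> download(name));
      }
      List<String> done = new ArrayList<>();
      for (int i = 0; i < fileNames.size(); i++) {
        // take() blocks until the next task finishes; get() rethrows a task failure
        done.add(completion.take().get());
      }
      return done;
    } finally {
      pool.shutdownNow(); // also cancels outstanding tasks if one of them failed
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> files = Arrays.asList("_0.cfs", "_0.si", "segments_1");
    System.out.println("downloaded " + downloadAll(files, 2).size() + " files");
  }
}
```

Because results arrive in completion order, the first failed copy surfaces as soon as it happens rather than after all earlier futures have been waited on.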
   








[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2017-01-13 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822190#comment-15822190
 ] 

Timothy Potter commented on SOLR-9961:
--

The other thing I found here is that HdfsDirectory closes a shared FileSystem 
object, because HdfsBackupRepository creates it in a try-with-resources block:

{code}
  @Override
  public void copyFileTo(URI sourceRepo, String fileName, Directory dest) throws IOException {
    try (HdfsDirectory dir = new HdfsDirectory(new Path(sourceRepo), NoLockFactory.INSTANCE,
        hdfsConfig, HdfsDirectory.DEFAULT_BUFFER_SIZE * 10)) {
      dest.copyFrom(dir, fileName, fileName, DirectoryFactory.IOCONTEXT_NO_CACHE);
    }
  }
{code}

This closes the FileSystem object that was retrieved with FileSystem.get. 
Because of this (I think), I'm seeing lots of errors like the following while 
doing the restore:
{code}
WARN  - 2017-01-13 14:09:44.249; [   ] org.apache.solr.handler.RestoreCore; Exception while restoring the backup index 
java.lang.RuntimeException: Problem creating directory: gs://hd-fusion/aggr_solr/myAggr3/snapshot.shard1
    at org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:91)
    at org.apache.solr.core.backup.repository.HdfsBackupRepository.copyFileTo(HdfsBackupRepository.java:175)
    at org.apache.solr.handler.RestoreCore.downloadFile(RestoreCore.java:196)
    at org.apache.solr.handler.RestoreCore.access$000(RestoreCore.java:47)
    at org.apache.solr.handler.RestoreCore$1.call(RestoreCore.java:101)
    at org.apache.solr.handler.RestoreCore$1.call(RestoreCore.java:99)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.checkOpen(GoogleHadoopFileSystemBase.java:1802)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1284)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:83)
    ... 9 more
{code}

There's a handy prop that allows you to disable the cache (add to 
core-site.xml), which makes this error go away:
{code}
  <property>
    <name>fs.gs.impl.disable.cache</name>
    <value>true</value>
  </property>
{code}







[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

2017-01-13 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821860#comment-15821860
 ] 

Timothy Potter commented on SOLR-9961:
--

The patch is not ready for commit. We need to think about how to provide 
config options such as the maximum time to wait and the number of threads. 
Right now I read the number of threads from a system property, but I think it 
should probably come through a marker interface that specific backup repos can 
implement. I will post a better version later today.
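
A rough sketch of the two settings mentioned above, in plain Java. The property names restore.threads and restore.maxWaitSecs and their defaults are hypothetical, chosen only for illustration; they are not what the patch uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RestorePoolSketch {

  /** Reads a positive int from a system property, falling back to the default. */
  static int positiveIntProp(String name, int defaultValue) {
    Integer value = Integer.getInteger(name, defaultValue);
    return value > 0 ? value : defaultValue;
  }

  /** Runs the tasks on a bounded pool; returns true if all finished within the wait limit. */
  public static boolean runAll(Iterable<Runnable> tasks) throws InterruptedException {
    int threads = positiveIntProp("restore.threads", 4);           // hypothetical property
    int maxWaitSecs = positiveIntProp("restore.maxWaitSecs", 3600); // hypothetical property
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (Runnable task : tasks) {
      pool.execute(task);
    }
    pool.shutdown(); // accept no new tasks; already submitted ones keep running
    boolean finished = pool.awaitTermination(maxWaitSecs, TimeUnit.SECONDS);
    if (!finished) {
      pool.shutdownNow(); // give up and interrupt whatever is still running
    }
    return finished;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger copied = new AtomicInteger();
    List<Runnable> tasks = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      tasks.add(copied::incrementAndGet); // stand-in for copying one file
    }
    System.out.println("finished=" + runAll(tasks) + ", copied=" + copied.get());
  }
}
```

The thread count can then be overridden at startup, for example with -Drestore.threads=8. Moving the same two values behind an interface implemented by a backup repository would replace only the two positiveIntProp calls.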



