[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-26 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-558654601
 
 
   The error is: 
   
   ```
   [ERROR] Failed to execute goal on project spark-catalyst_2.12: Could not resolve dependencies for project org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT: The following artifacts could not be resolved: org.codehaus.janino:commons-compiler:jar:3.0.15, com.univocity:univocity-parsers:jar:2.8.3, org.apache.arrow:arrow-vector:jar:0.15.1: Could not transfer artifact org.codehaus.janino:commons-compiler:jar:3.0.15 from/to central (https://repo.maven.apache.org/maven2): Connection timed out (Read failed) -> [Help 1]
   ```
   
   Meanwhile this link is working: [janino 
3.0.15](https://repo.maven.apache.org/maven2/org/codehaus/janino/janino/3.0.15/).
   
   Could it be that the local Maven repository on the Jenkins instance that runs 
the PR builder needs to be purged?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-26 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-558624810
 
 
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-25 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-558214745
 
 
   @tgravescs The block content is read on the client side of the 
`ShuffleBlockFetcherIterator`, as the blocks are wrapped into 
`FileSegmentManagedBuffer`.
   In addition, on any given executor the host-local blocks whose executor 
directories are cached are handled synchronously in a single thread (just like 
local blocks), while host-local blocks whose directories are not cached are 
handled on a separate, also single, thread (the thread that fetches all the 
missing directories via one RPC call).
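
The split described above can be sketched as follows. This is a minimal illustration in Python, not code from the PR: the function name `partition_host_local`, the `(executor ID, block ID)` pair shape, and the `cached_dirs` dictionary are all hypothetical. Blocks whose executor directories are already cached would be read on the calling thread, while the rest are grouped per executor so that the missing directories can be requested together.

```python
from typing import Dict, Iterable, List, Tuple


def partition_host_local(blocks: Iterable[Tuple[str, str]],
                         cached_dirs: Dict[str, List[str]]):
    """Split (executor ID, block ID) pairs by whether the executor's dirs are cached.

    Hypothetical helper: returns the pairs that can be read right away and a
    per-executor grouping of the blocks whose directories still have to be fetched.
    """
    cached: List[Tuple[str, str]] = []
    missing: Dict[str, List[str]] = {}
    for exec_id, block_id in blocks:
        if exec_id in cached_dirs:
            # Directories are known: this block can be handled synchronously.
            cached.append((exec_id, block_id))
        else:
            # Directories are unknown: collect per executor for one bulk request.
            missing.setdefault(exec_id, []).append(block_id)
    return cached, missing
```

The keys of the second return value are exactly the executors whose directories would have to be requested in the single RPC call mentioned above.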
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-22 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-557687439
 
 
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-554339093
 
 
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-11-13 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-553580712
 
 
   With my previous commit, accessing host-local directories became 
asynchronous. It uses the external shuffle service, where directories are 
cached for all host-local executors (as 
`org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#executorRemoved` 
does not remove any entry from `executors`), so no fallback to remote 
fetch is needed.
   
   
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-10-07 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-539100344
 
 
   I am working on a solution where the local directories of the executor 
(which are not cached on the executor side) are loaded from the external shuffle 
service asynchronously.





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-09-06 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-528886783
 
 
   The current state covers only the original idea of communicating with 
the driver: no files 
(https://github.com/apache/spark/pull/25299#issuecomment-527029909) and no 
async parts (https://github.com/apache/spark/pull/25299#discussion_r318239195).
   
   If you agree, I would do the mentioned improvements in a separate PR / Jira. 
What do you think?





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-17 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-522217408
 
 
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-16 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-522207781
 
 
   The last three builds failed for different reasons, and all of them seem 
unrelated to my change. I am triggering the build again without any change 
(previously rebased on top of master).
   
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-16 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521998645
 
 
   retest this please.
   
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521748111
 
 
   retest this please.
   
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521709371
 
 
   ok to test





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521670791
 
 
   retest this please.
   
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521650177
 
 
   jenkins retest this please
   
   





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521633221
 
 
   jenkins retest this please





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-12 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-520345813
 
 
   @maropu I think the performance gains would be significant if the block size 
is over "spark.maxRemoteBlockSizeFetchToMem" (whose default is 2GB). In that 
case, without this change, the shuffle block would go through these steps 
during fetching:
   1) read from disk
   2) sent via the network
   3) streamed to disk
   4) and finally re-read from disk when it is used.
   
   With this change it would be just read from disk directly.
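
The comparison above can be written out as a small model. This is only an illustrative Python sketch, not Spark code: `fetch_steps`, its parameters, and the constant below are hypothetical, and the threshold value simply mirrors the 2GB figure quoted in the comment. The two-step path for blocks under the threshold is an assumption that such blocks are fetched into memory.

```python
from typing import List

# Hypothetical stand-in for spark.maxRemoteBlockSizeFetchToMem; the value is
# illustrative and is not read from any Spark configuration.
MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM = 2 * 1024 ** 3


def fetch_steps(block_size: int, same_host: bool, host_local_read: bool) -> List[str]:
    """Return the I/O steps a shuffle block goes through before it is consumed."""
    if same_host and host_local_read:
        # With the change: the reader opens the other executor's file directly.
        return ["read from disk"]
    steps = ["read from disk", "send via network"]
    if block_size > MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM:
        # Oversized blocks are streamed to a temporary file and read again later.
        steps += ["stream to disk", "re-read from disk"]
    return steps
```

For a block above the threshold on the same host, the model yields four steps without the host-local read and a single step with it, which is the saving the comment argues for.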





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-05 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-518320725
 
 
   I can revert the metric changes and can count the host-local related bytes 
and number of blocks into the local ones. So should I go ahead and do the 
revert?





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-02 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-517737974
 
 
   @tgravescs unfortunately I haven't run any, as I have no good performance 
tests for this case. 
   But I am positive it makes a difference, especially for small clusters.





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-07-31 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-516867434
 
 
   No. To fetch a block from the external shuffle service the network must be 
used. 
   Here the executor can read the block from the disk directly by itself, so 
the fetch via the network is saved. What I meant is that the mechanism 
(reading the disk of the other executors) is the very same.





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-07-31 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-516843077
 
 
   Yes.
   
   Basically the external shuffle service does the same: it reads the local 
disks of the host-local executors directly (it has its own mapping from 
(app ID, executor ID) tuples to local disks). With this feature a regular 
executor will be able to do the same (here the mapping from executor to local 
disks is at the block manager master).
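
The mapping idea can be illustrated with a short Python sketch. Everything here is hypothetical (the `resolve_block_file` name, the dictionary shape, the example file name), and the directory selection is a simplified stand-in rather than Spark's actual on-disk layout: a registry keyed by (app ID, executor ID) holds each executor's local directories, and a stable hash of the file name picks one of them.

```python
import os
import zlib
from typing import Dict, List, Tuple

# Hypothetical registry key: (app ID, executor ID).
ExecutorKey = Tuple[str, str]


def resolve_block_file(dirs_by_executor: Dict[ExecutorKey, List[str]],
                       app_id: str, exec_id: str, file_name: str) -> str:
    """Return the path under which file_name would live for the given executor."""
    # Raises KeyError when the executor's directories are not known.
    local_dirs = dirs_by_executor[(app_id, exec_id)]
    # Simplified stand-in for the real placement: a stable hash of the file
    # name selects one of the executor's directories.
    index = zlib.crc32(file_name.encode("utf-8")) % len(local_dirs)
    return os.path.join(local_dirs[index], file_name)
```

Any process on the host that knows this mapping can open the resulting path itself, which is why neither the external shuffle service nor, with this feature, a regular executor needs the network for host-local blocks.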

