[ 
https://issues.apache.org/jira/browse/SPARK-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13700.
-------------------------------
    Resolution: Not A Problem

It might be best to float this on the mailing list first, since I don't think 
it would be implemented. 

Some things like this already exist in {{AsyncRDDActions}}. I have the 
impression they're not quite deprecated, but they're also not something that 
is going to be extended. The semantics are tough to get right.

However, you seem to be talking about something else, where you make N 
expensive synchronous calls in a map operation. You can already do this 
yourself with mapPartitions and a thread pool.
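
For illustration, the mapPartitions-plus-thread-pool approach might look 
roughly like the sketch below. It is only a sketch under assumptions: it uses 
Python rather than Scala, the names map_partition_concurrently, fn and 
max_workers are hypothetical, and it reads the whole partition into memory 
before yielding results, which may not suit very large partitions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partition_concurrently(
    records: Iterable[T],
    fn: Callable[[T], U],
    max_workers: int = 8,  # hypothetical default; tune for the external service
) -> Iterator[U]:
    """Apply a blocking function to each record of one partition concurrently.

    Results are yielded in the same order as the input records.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map consumes the input eagerly, submits one task per
        # record, and returns the results in input order.
        yield from pool.map(fn, records)


# Hypothetical use on an RDD (slow_lookup stands in for the database or
# web-service call and is not defined here):
#   results = rdd.mapPartitions(
#       lambda part: map_partition_concurrently(part, slow_lookup))
```

Each task then waits on up to max_workers calls at a time instead of one, 
without any change to the RDD API.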

> Rdd.mapAsync(): Easily mix Spark and asynchronous transformation
> ----------------------------------------------------------------
>
>                 Key: SPARK-13700
>                 URL: https://issues.apache.org/jira/browse/SPARK-13700
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Paulo Costa
>            Priority: Minor
>              Labels: async, features, rdd, transform
>
> Spark is great for synchronous operations.
> But sometimes I need to call a database/web server/etc from my transform, and 
> the Spark pipeline stalls waiting for it.
> Avoiding that would be great!
> I suggest we add a new method RDD.mapAsync(), which can execute these 
> operations concurrently, avoiding the bottleneck.
> I've written a quick'n'dirty implementation of what I have in mind: 
> https://gist.github.com/paulo-raca/d121cf27905cfb1fafc3
> What do you think?
> If you agree with this feature, I can work on a pull request.