No, it means # simultaneous HTTP calls = # executor slots. But even then, you're welcome to, say, use thread pools within each task to run even more calls concurrently, since they're mostly I/O bound. Your code can do what you want.
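For illustration, a minimal sketch of that idea (not from the original thread): employeeIds is assumed to be an RDD[String] of ids, and callApi is a placeholder for whatever HTTP client call you use.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val responses = employeeIds.mapPartitions { ids =>
  // A small per-task thread pool keeps several I/O-bound requests in flight
  // from a single executor slot.
  val pool = Executors.newFixedThreadPool(8)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  val futures = ids.map(id => Future(callApi(id))).toList   // start the calls
  val results = futures.map(f => Await.result(f, 30.seconds)) // gather responses
  pool.shutdown()
  results.iterator
}

The pool size and timeout are arbitrary here; tune them to whatever the downstream API can tolerate.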
On Thu, May 14, 2020 at 6:14 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> Thanks, that means the number of executors = the number of HTTP calls I can make. I can't run more HTTP calls in a single executor; I mean, I can't go beyond the threshold set by the number of executors.
>
> On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> The default is not 200, but the number of executor slots. Yes, you can only simultaneously execute as many tasks as there are slots, regardless of partitions.
>>
>> On Thu, May 14, 2020, 5:19 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>
>>> Thanks Sean, Jerry.
>>>
>>> The default number of Spark DataFrame partitions is 200, right? Does it have a relationship with the number of cores? With 8 cores and 4 workers, isn't it the case that I can only make 8 * 4 = 32 HTTP calls? Because in Spark, "number of partitions = number of cores" is untrue.
>>>
>>> Thanks
>>>
>>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>> Yes, any code that you write in functions you apply with Spark runs in the executors. You would be running as many HTTP clients as you have partitions.
>>>>
>>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
>>>> >
>>>> > I believe that if you do this within the context of an operation that is already parallelized, such as a map, the work will be distributed to executors and they will do it in parallel. I could be wrong about this, though, as I never investigated this specific use case.
>>>> >
>>>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>> >>
>>>> >> Thanks for the quick response.
>>>> >>
>>>> >> I am curious to know whether the data would be pulled in parallel for 100+ HTTP requests, or whether it would only run on the driver node. The POST body would be part of the DataFrame. Think of it like this: I have a DataFrame of employee_id, employee_name; an HTTP GET call has to be made for each employee_id, and the DataFrame is dynamic for each Spark job run.
>>>> >>
>>>> >> Does it make sense?
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
>>>> >>>
>>>> >>> Hi Chetan,
>>>> >>>
>>>> >>> You can pretty much use any client to do this. When I was using Spark at a previous job, we used OkHttp, but I'm sure there are plenty of others. In our case, we had a startup phase in which we gathered metadata via a REST API and then broadcast it to the workers. I think if you need all the workers to have access to whatever you're getting from the API, that's the way to do it.
>>>> >>>
>>>> >>> Jerry
>>>> >>>
>>>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hi Spark Users,
>>>> >>>>
>>>> >>>> How can I invoke a REST API call from Spark code so that it runs not only on the Spark driver but distributed / in parallel?
>>>> >>>>
>>>> >>>> Spark with Scala is my tech stack.
>>>> >>>>
>>>> >>>> Thanks
>>>> >>>
>>>> >>> --
>>>> >>> http://www.google.com/profiles/grapesmoker
>>>> >
>>>> > --
>>>> > http://www.google.com/profiles/grapesmoker
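For reference, a minimal sketch of the pattern the thread converges on (one GET per employee_id, executed on the workers), assuming OkHttp and a hypothetical endpoint URL; employeeIds stands in for an RDD[String] of ids taken from the DataFrame.

import okhttp3.{OkHttpClient, Request}

val bodies = employeeIds.mapPartitions { ids =>
  // Build one client per partition on the executor; OkHttpClient is not
  // serializable, so it cannot be created on the driver and shipped out.
  val client = new OkHttpClient()
  ids.map { id =>
    val request = new Request.Builder()
      .url(s"https://api.example.com/employees/$id")   // hypothetical endpoint
      .build()
    val response = client.newCall(request).execute()
    try response.body().string()
    finally response.close()
  }
}

Each partition then issues its requests sequentially; combine this with the per-task thread-pool sketch above if you need more concurrency than one request per executor slot.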