No, it means # of concurrent HTTP calls = # of executor slots. But even
then, you're welcome to, say, use thread pools within each task to
execute even more calls concurrently, since most are I/O bound. Your
code can do what you want.
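
For instance, here is a rough sketch of that thread-pool approach. It
assumes a SparkSession `spark` and a DataFrame `df` with a string
employee_id column are in scope, and the endpoint is made up:

import java.net.{HttpURLConnection, URL}
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.io.Source
import spark.implicits._

val responses = df.select("employee_id").as[String].mapPartitions { ids =>
  // A small pool per task: the calls are I/O bound, so running more of
  // them concurrently than you have cores is usually fine.
  val pool = Executors.newFixedThreadPool(8)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
  val futures = ids.map { id =>
    Future {
      val conn = new URL("https://api.example.com/employees/" + id)
        .openConnection().asInstanceOf[HttpURLConnection]
      try Source.fromInputStream(conn.getInputStream).mkString
      finally conn.disconnect()
    }
  }.toList // force submission of all requests before awaiting any
  val results = futures.map(f => Await.result(f, 30.seconds))
  pool.shutdown()
  results.iterator
}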

On Thu, May 14, 2020 at 6:14 PM Chetan Khatri
<chetan.opensou...@gmail.com> wrote:
>
> Thanks, that means number of executors = number of HTTP calls I can make. I
> can't run more HTTP calls within a single executor; I mean, I can't go
> beyond the threshold of the number of executors.
>
> On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> The default isn't 200, but the number of executor slots. Yes, you can only
>> simultaneously execute as many tasks as you have slots, regardless of partitions.
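>>
>> For the numbers discussed below, slots = executors x cores per executor.
>> A hedged illustration (these config keys are honored on YARN/Kubernetes;
>> standalone mode sizes executors differently):
>>
>> import org.apache.spark.sql.SparkSession
>>
>> // 4 executors x 8 cores each = 32 task slots, so at most 32 tasks
>> // (and 32 simultaneous HTTP calls, absent extra threads) run at once,
>> // however many partitions the DataFrame has.
>> val spark = SparkSession.builder()
>>   .appName("rest-calls")
>>   .config("spark.executor.instances", "4")
>>   .config("spark.executor.cores", "8")
>>   .getOrCreate()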
>>
>> On Thu, May 14, 2020, 5:19 PM Chetan Khatri <chetan.opensou...@gmail.com> 
>> wrote:
>>>
>>> Thanks Sean, Jerry.
>>>
>>> The default number of Spark DataFrame partitions is 200, right? Does it have a
>>> relationship with the number of cores? 8 cores, 4 workers. Isn't it the case that
>>> I can only do 8 * 4 = 32 HTTP calls? Because in Spark, number of partitions =
>>> number of cores is untrue.
>>>
>>> Thanks
>>>
>>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>> Yes, any code in the functions you apply with Spark runs on
>>>> the executors. You would be running as many HTTP clients as you have
>>>> partitions.
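>>>>
>>>> A minimal sketch of that pattern, using OkHttp (which Jerry mentioned
>>>> earlier in the thread); the endpoint and column name are placeholders:
>>>>
>>>> import okhttp3.{OkHttpClient, Request}
>>>> import spark.implicits._
>>>>
>>>> val out = df.select("employee_id").as[String].mapPartitions { ids =>
>>>>   // One client per partition, reused for every row in it.
>>>>   val client = new OkHttpClient()
>>>>   ids.map { id =>
>>>>     val req = new Request.Builder()
>>>>       .url("https://api.example.com/employees/" + id)
>>>>       .build()
>>>>     val resp = client.newCall(req).execute()
>>>>     try resp.body().string() finally resp.close()
>>>>   }
>>>> }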
>>>>
>>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <grapesmo...@gmail.com> 
>>>> wrote:
>>>> >
>>>> > I believe that if you do this within the context of an operation that is 
>>>> > already parallelized such as a map, the work will be distributed to 
>>>> > executors and they will do it in parallel. I could be wrong about this 
>>>> > as I never investigated this specific use case, though.
>>>> >
>>>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri 
>>>> > <chetan.opensou...@gmail.com> wrote:
>>>> >>
>>>> >> Thanks for the quick response.
>>>> >>
>>>> >> I am curious to know whether pulling data for 100+ HTTP requests would
>>>> >> happen in parallel, or whether it would only run on the driver node. The
>>>> >> request body would be part of the DataFrame. Think of it as: I have a
>>>> >> DataFrame of employee_id, employee_name; now an HTTP GET call has to be
>>>> >> made for each employee_id, and the DataFrame is dynamic for each Spark job run.
>>>> >>
>>>> >> Does it make sense?
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >>
>>>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com> 
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Chetan,
>>>> >>>
>>>> >>> You can pretty much use any client to do this. When I was using Spark 
>>>> >>> at a previous job, we used OkHttp, but I'm sure there are plenty of 
>>>> >>> others. In our case, we had a startup phase in which we gathered 
>>>> >>> metadata via a REST API and then broadcast it to the workers. I think 
>>>> >>> if you need all the workers to have access to whatever you're getting 
>>>> >>> from the API, that's the way to do it.
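>>>> >>>
>>>> >>> Something like this rough sketch of the broadcast pattern (the
>>>> >>> metadata URL is made up):
>>>> >>>
>>>> >>> import scala.io.Source
>>>> >>>
>>>> >>> // Fetch the shared metadata once, on the driver only.
>>>> >>> val metadata = Source.fromURL("https://api.example.com/metadata").mkString
>>>> >>> // Ship it to every executor; tasks then read metaB.value locally
>>>> >>> // instead of each hitting the API themselves.
>>>> >>> val metaB = spark.sparkContext.broadcast(metadata)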
>>>> >>>
>>>> >>> Jerry
>>>> >>>
>>>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri 
>>>> >>> <chetan.opensou...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hi Spark Users,
>>>> >>>>
>>>> >>>> How can I invoke a REST API call from Spark code so that it does not only
>>>> >>>> run on the Spark driver but is distributed / parallel?
>>>> >>>>
>>>> >>>> Spark with Scala is my tech stack.
>>>> >>>>
>>>> >>>> Thanks
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> http://www.google.com/profiles/grapesmoker
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > http://www.google.com/profiles/grapesmoker
