Hi Sean,

Thanks for the great answer. What I am trying to do is use something like Scala Future (or cats-effect IO) to make those calls concurrently. I was trying to understand whether there is any limiting threshold on how many of those calls I can make.
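Concretely, here is a rough sketch of what I have in mind, combining your thread-pool suggestion with plain Scala Futures inside mapPartitions. Everything specific in it is made up for illustration: the https://example.com/api/employees endpoint, the employee IDs generated with spark.range, and the pool size of 16 would all need to be replaced and tuned for the real service.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.concurrent.Executors

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

import org.apache.spark.sql.SparkSession

object ConcurrentHttpCalls {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("concurrent-http").getOrCreate()
    import spark.implicits._

    // Hypothetical input: one employee_id per row.
    val ids = spark.range(0, 1000).as[Long]

    val results = ids.mapPartitions { rows =>
      // One HTTP client and one small thread pool per task/partition,
      // created on the executor so nothing non-serializable is captured.
      val client = HttpClient.newHttpClient()
      val pool   = Executors.newFixedThreadPool(16) // tune for the remote service
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

      // Fire all requests for this partition; at most 16 run at once,
      // the rest queue on the pool.
      val futures = rows.map { id =>
        Future {
          val req = HttpRequest
            .newBuilder(URI.create(s"https://example.com/api/employees/$id"))
            .GET()
            .build()
          (id, client.send(req, HttpResponse.BodyHandlers.ofString()).body())
        }
      }.toList // .toList submits every request before we start blocking

      // Collect the responses, then release the pool.
      val out = futures.map(f => Await.result(f, 30.seconds))
      pool.shutdown()
      out.iterator
    }

    results.show()
    spark.stop()
  }
}

Each task would then keep up to 16 requests in flight at once, so the effective concurrency becomes executor slots * pool size rather than just the number of slots.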
On Thu, May 14, 2020 at 7:28 PM Sean Owen <sro...@gmail.com> wrote:
> No, it means # HTTP calls = # executor slots. But even then, you're
> welcome to, say, use thread pools to execute even more concurrently, as
> most are I/O bound. Your code can do what you want.
>
> On Thu, May 14, 2020 at 6:14 PM Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
> >
> > Thanks, that means number of executors = number of HTTP calls I can
> > make. I can't run more HTTP calls in a single executor, I mean I
> > can't go beyond the threshold of the number of executors.
> >
> > On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:
> >>
> >> The default is not 200, but the number of executor slots. Yes, you can
> >> only simultaneously execute as many tasks as slots, regardless of
> >> partitions.
> >>
> >> On Thu, May 14, 2020, 5:19 PM Chetan Khatri
> >> <chetan.opensou...@gmail.com> wrote:
> >>>
> >>> Thanks Sean, Jerry.
> >>>
> >>> The default number of Spark DataFrame partitions is 200, right? Does
> >>> it have a relationship with the number of cores? With 8 cores and 4
> >>> workers, isn't it that I can only make 8 * 4 = 32 HTTP calls? Because
> >>> in Spark, number of partitions = number of cores is untrue.
> >>>
> >>> Thanks
> >>>
> >>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
> >>>>
> >>>> Yes, any code that you write and apply with Spark runs in the
> >>>> executors. You would be running as many HTTP clients as you have
> >>>> partitions.
> >>>>
> >>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov
> >>>> <grapesmo...@gmail.com> wrote:
> >>>> >
> >>>> > I believe that if you do this within the context of an operation
> >>>> > that is already parallelized, such as a map, the work will be
> >>>> > distributed to executors and they will do it in parallel. I could
> >>>> > be wrong about this, as I never investigated this specific use
> >>>> > case, though.
> >>>> >
> >>>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri
> >>>> > <chetan.opensou...@gmail.com> wrote:
> >>>> >>
> >>>> >> Thanks for the quick response.
> >>>> >>
> >>>> >> I am curious to know whether it would pull data in parallel for
> >>>> >> 100+ HTTP requests, or whether it will only run on the driver
> >>>> >> node. The POST body would be part of the DataFrame. Think of it
> >>>> >> as: I have a DataFrame of employee_id, employee_name; now an
> >>>> >> HTTP GET call has to be made for each employee_id, and the
> >>>> >> DataFrame is dynamic for each Spark job run.
> >>>> >>
> >>>> >> Does it make sense?
> >>>> >>
> >>>> >> Thanks
> >>>> >>
> >>>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov
> >>>> >> <grapesmo...@gmail.com> wrote:
> >>>> >>>
> >>>> >>> Hi Chetan,
> >>>> >>>
> >>>> >>> You can pretty much use any client to do this. When I was using
> >>>> >>> Spark at a previous job, we used OkHttp, but I'm sure there are
> >>>> >>> plenty of others. In our case, we had a startup phase in which
> >>>> >>> we gathered metadata via a REST API and then broadcast it to
> >>>> >>> the workers. I think if you need all the workers to have access
> >>>> >>> to whatever you're getting from the API, that's the way to do it.
> >>>> >>>
> >>>> >>> Jerry
> >>>> >>>
> >>>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri
> >>>> >>> <chetan.opensou...@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>> Hi Spark Users,
> >>>> >>>>
> >>>> >>>> How can I invoke a REST API call from Spark code that is not
> >>>> >>>> only running on the Spark driver but distributed / parallel?
> >>>> >>>>
> >>>> >>>> Spark with Scala is my tech stack.
> >>>> >>>>
> >>>> >>>> Thanks
> >>>> >>>
> >>>> >>> --
> >>>> >>> http://www.google.com/profiles/grapesmoker
> >>>> >
> >>>> > --
> >>>> > http://www.google.com/profiles/grapesmoker
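PS: Jerry's broadcast approach would look roughly like the sketch below: one REST call on the driver during a startup phase, then a broadcast so every executor reads the result locally. The metadata endpoint here is hypothetical, and the sketch assumes the payload fits comfortably in a single string.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

import org.apache.spark.sql.SparkSession

object BroadcastMetadata {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("broadcast-metadata").getOrCreate()
    import spark.implicits._

    // One REST call, made on the driver during the startup phase.
    val client = HttpClient.newHttpClient()
    val req = HttpRequest
      .newBuilder(URI.create("https://example.com/api/metadata")) // hypothetical endpoint
      .GET()
      .build()
    val metadataJson = client.send(req, HttpResponse.BodyHandlers.ofString()).body()

    // Broadcast once; every executor then reads the value locally,
    // with no further HTTP traffic.
    val metadata = spark.sparkContext.broadcast(metadataJson)

    val enriched = spark.range(0, 100).as[Long].map { id =>
      // metadata.value is available inside any task
      s"$id: ${metadata.value.take(20)}"
    }

    enriched.show()
    spark.stop()
  }
}

That avoids repeating the same HTTP call from every task, which seems like the right trade-off whenever all workers need the same API response rather than a per-row one.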