Thanks Sean, Jerry.

The default number of Spark DataFrame partitions is 200
(spark.sql.shuffle.partitions), right? Does it have any relationship with
the number of cores? Say I have 8 cores on each of 4 workers: isn't it the
case that I can only make 8 * 4 = 32 HTTP calls at a time? Because in
Spark, "number of partitions = number of cores" doesn't hold.
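
To make my question concrete, this is the kind of thing I mean (the input
path here is made up):

  val df = spark.read.parquet("/data/employees")  // made-up input
  println(df.rdd.getNumPartitions)  // partition count of this DataFrame
                                    // (200 only after a shuffle, by default)
  // With 4 workers x 8 cores = 32 total cores, at most 32 tasks run at
  // once, so at most 32 HTTP calls would be in flight at a time; the
  // remaining partitions queue until a core frees up.
  val df32 = df.repartition(32)     // one partition per core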

Thanks

On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:

> Yes, any code that you apply with Spark runs in the executors. You
> would be running as many HTTP clients as you have partitions.
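>
> For instance (just a sketch; the endpoint and column name are made up,
> and with a real client library you would open one client per partition
> and reuse it across rows):
>
>   import scala.io.Source
>   val responses = df.rdd.mapPartitions { rows =>
>     // this closure runs on the executors, one task per partition
>     rows.map { row =>
>       val id = row.getAs[String]("employee_id")
>       val src = Source.fromURL("https://api.example.com/employees/" + id)
>       try src.mkString finally src.close()
>     }
>   }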
>
> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
> >
> > I believe that if you do this within the context of an operation that
> > is already parallelized, such as a map, the work will be distributed to
> > the executors and they will do it in parallel. I could be wrong about
> > this, as I never investigated this specific use case, though.
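> >
> > Something like this is what I have in mind (sketch only; the URL is
> > made up):
> >
> >   val ids = spark.sparkContext.parallelize(Seq("e1", "e2", "e3"))
> >   val bodies = ids.map { id =>
> >     // this function executes on the executors, not on the driver
> >     scala.io.Source.fromURL("https://api.example.com/items/" + id).mkString
> >   }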
> >
> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> >>
> >> Thanks for the quick response.
> >>
> >> I am curious to know whether the data would be pulled in parallel for
> >> 100+ HTTP requests, or whether it would only happen on the driver
> >> node. The post body would be part of the DataFrame. Think of it like
> >> this: I have a DataFrame of employee_id and employee_name, an HTTP GET
> >> call has to be made for each employee_id, and the DataFrame is
> >> different for each Spark job run.
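> >>
> >> Roughly like this, if it helps (the endpoint and the DataFrame name
> >> are made up):
> >>
> >>   import org.apache.spark.sql.functions.{col, udf}
> >>   val fetch = udf { (id: String) =>
> >>     val src = scala.io.Source.fromURL("https://api.example.com/employees/" + id)
> >>     try src.mkString finally src.close()
> >>   }
> >>   // one GET per row; employeesDf is the dynamic per-run DataFrame
> >>   val withResponse = employeesDf.withColumn("response", fetch(col("employee_id")))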
> >>
> >> Does it make sense?
> >>
> >> Thanks
> >>
> >>
> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
> >>>
> >>> Hi Chetan,
> >>>
> >>> You can pretty much use any client to do this. When I was using Spark
> >>> at a previous job, we used OkHttp, but I'm sure there are plenty of
> >>> others. In our case, we had a startup phase in which we gathered
> >>> metadata via a REST API and then broadcast it to the workers. I think
> >>> if you need all the workers to have access to whatever you're getting
> >>> from the API, that's the way to do it.
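> >>>
> >>> In rough outline it looked like this (a sketch; the endpoint is made
> >>> up, and we parsed the response rather than broadcasting raw JSON):
> >>>
> >>>   // driver: fetch the metadata once via the REST API
> >>>   val json = scala.io.Source.fromURL("https://api.example.com/metadata").mkString
> >>>   val metadata = spark.sparkContext.broadcast(json)
> >>>   // executors: read metadata.value inside any map/mapPartitions closure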
> >>>
> >>> Jerry
> >>>
> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> >>>>
> >>>> Hi Spark Users,
> >>>>
> >>>> How can I invoke a REST API call from Spark code so that it runs
> >>>> distributed / in parallel on the executors, rather than only on the
> >>>> Spark driver?
> >>>>
> >>>> Spark with Scala is my tech stack.
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> http://www.google.com/profiles/grapesmoker
> >
> >
> >
> > --
> > http://www.google.com/profiles/grapesmoker
>
