Thanks Sean, Jerry. The default number of Spark DataFrame partitions is 200, right? Does that have any relationship with the number of cores? With 8 cores per worker and 4 workers, isn't it the case that I can only make 8 * 4 = 32 HTTP calls at a time? Because in Spark, "number of partitions = number of cores" is not true in general.
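To illustrate the arithmetic in the question above: Spark runs one task per partition, but only as many tasks concurrently as there are executor cores. A minimal sketch (the 200 is the default `spark.sql.shuffle.partitions`; the 4 workers x 8 cores figure is from the question):

```scala
// Spark schedules one task per partition, but at most
// (workers * cores per worker) tasks run at the same time.
// So with 200 partitions on 4 workers x 8 cores, 32 HTTP calls
// run concurrently and the remaining partitions queue behind them.
def effectiveParallelism(numPartitions: Int, numWorkers: Int, coresPerWorker: Int): Int =
  math.min(numPartitions, numWorkers * coresPerWorker)
```

So the number of partitions caps the total number of HTTP clients created, while the core count caps how many are in flight at once.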
Thanks

On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
> Yes, any code that you write and apply with Spark runs in the
> executors. You would be running as many HTTP clients as you have
> partitions.
>
> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
> >
> > I believe that if you do this within the context of an operation that is
> > already parallelized, such as a map, the work will be distributed to
> > executors and they will do it in parallel. I could be wrong about this,
> > though, as I never investigated this specific use case.
> >
> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> >>
> >> Thanks for the quick response.
> >>
> >> I am curious to know whether it would pull data in parallel for 100+
> >> HTTP requests, or whether it will only run on the driver node? The post
> >> body would be part of the DataFrame. Think of it like this: I have a
> >> DataFrame of employee_id, employee_name; now an HTTP GET call has to be
> >> made for each employee_id, and the DataFrame is dynamic for each Spark
> >> job run.
> >>
> >> Does that make sense?
> >>
> >> Thanks
> >>
> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
> >>>
> >>> Hi Chetan,
> >>>
> >>> You can pretty much use any client to do this. When I was using Spark
> >>> at a previous job, we used OkHttp, but I'm sure there are plenty of
> >>> others. In our case, we had a startup phase in which we gathered
> >>> metadata via a REST API and then broadcast it to the workers. I think
> >>> if you need all the workers to have access to whatever you're getting
> >>> from the API, that's the way to do it.
> >>>
> >>> Jerry
> >>>
> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> >>>>
> >>>> Hi Spark Users,
> >>>>
> >>>> How can I invoke a REST API call from Spark code so that it does not
> >>>> run only on the Spark driver, but distributed / in parallel?
> >>>>
> >>>> Spark with Scala is my tech stack.
> >>>>
> >>>> Thanks
> >>>
> >>> --
> >>> http://www.google.com/profiles/grapesmoker
> >
> > --
> > http://www.google.com/profiles/grapesmoker
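Putting the thread's advice together: since code applied inside a map runs on the executors, the per-row GET calls can be issued from within the partitions. Below is a minimal sketch, not the thread's actual code; the `/employees/<id>` URL scheme, the base URL, and the `employee_id` column name are assumptions for illustration, and the Spark wiring is shown in a comment because it needs a running SparkSession:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Hypothetical URL scheme for the GET-per-employee_id call.
def employeeUrl(baseUrl: String, employeeId: Long): String =
  s"$baseUrl/employees/$employeeId"

// Plain JDK HTTP GET; any client (OkHttp, as mentioned in the thread,
// or others) would work equally well, as long as it is created on the
// executor rather than serialized from the driver.
def fetchEmployee(baseUrl: String, employeeId: Long): String = {
  val conn = new URL(employeeUrl(baseUrl, employeeId))
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("GET")
  try Source.fromInputStream(conn.getInputStream).mkString
  finally conn.disconnect()
}

// In a Spark job, each partition becomes one task on an executor, so the
// calls run in parallel across the cluster. Sketch (commented out because
// it requires a SparkSession and a real endpoint):
//
//   import spark.implicits._
//   val responses = employeesDF
//     .repartition(32)                     // cap concurrent HTTP calls
//     .mapPartitions { rows =>
//       rows.map(r => fetchEmployee(baseUrl, r.getAs[Long]("employee_id")))
//     }
```

Note that `repartition(n)` bounds the number of tasks, and hence the number of simultaneous HTTP clients, which is one way to avoid overwhelming the downstream API.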