Hi Sean,

Thanks for the great answer. What I am trying to do is use something like Scala Future (or cats-effect IO) to make those calls concurrently. I was trying to understand whether there is any limiting threshold on how many of those calls I can make.
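Concretely, here is a rough sketch of what I have in mind, combining your thread-pool suggestion with plain Scala Futures inside mapPartitions. Everything specific in it is made up for illustration: the https://example.com/api/employees endpoint, the employee IDs generated with spark.range, and the pool size of 16 would all need to be replaced and tuned for the real service.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.concurrent.Executors

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

import org.apache.spark.sql.SparkSession

object ConcurrentHttpCalls {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("concurrent-http").getOrCreate()
    import spark.implicits._

    // Hypothetical input: one employee_id per row.
    val ids = spark.range(0, 1000).as[Long]

    val results = ids.mapPartitions { rows =>
      // One HTTP client and one small thread pool per task/partition,
      // created on the executor so nothing non-serializable is captured.
      val client = HttpClient.newHttpClient()
      val pool   = Executors.newFixedThreadPool(16) // tune for the remote service
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

      // Fire all requests for this partition; at most 16 run at once,
      // the rest queue on the pool.
      val futures = rows.map { id =>
        Future {
          val req = HttpRequest
            .newBuilder(URI.create(s"https://example.com/api/employees/$id"))
            .GET()
            .build()
          (id, client.send(req, HttpResponse.BodyHandlers.ofString()).body())
        }
      }.toList // .toList submits every request before we start blocking

      // Collect the responses, then release the pool.
      val out = futures.map(f => Await.result(f, 30.seconds))
      pool.shutdown()
      out.iterator
    }

    results.show()
    spark.stop()
  }
}

Each task would then keep up to 16 requests in flight at once, so the effective concurrency becomes executor slots * pool size rather than just the number of slots.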
On Thu, May 14, 2020 at 7:28 PM Sean Owen <sro...@gmail.com> wrote:
> No, it means # HTTP calls = # executor slots. But even then, you're
> welcome to, say, use thread pools to execute even more concurrently, as
> most are I/O bound. Your code can do what you want.
>
> On Thu, May 14, 2020 at 6:14 PM Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
> >
> > Thanks, that means number of executors = number of HTTP calls I can
> > make. I can't run more HTTP calls in a single executor, I mean I
> > can't go beyond the threshold of the number of executors.
> >
> > On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:
> >>
> >> The default is not 200, but the number of executor slots. Yes, you can
> >> only simultaneously execute as many tasks as slots, regardless of
> >> partitions.
> >>
> >> On Thu, May 14, 2020, 5:19 PM Chetan Khatri
> >> <chetan.opensou...@gmail.com> wrote:
> >>>
> >>> Thanks Sean, Jerry.
> >>>
> >>> The default number of Spark DataFrame partitions is 200, right? Does
> >>> it have a relationship with the number of cores? With 8 cores and 4
> >>> workers, isn't it that I can only make 8 * 4 = 32 HTTP calls? Because
> >>> in Spark, number of partitions = number of cores is untrue.
> >>>
> >>> Thanks
> >>>
> >>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
> >>>>
> >>>> Yes, any code that you write and apply with Spark runs in the
> >>>> executors. You would be running as many HTTP clients as you have
> >>>> partitions.
> >>>>
> >>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov
> >>>> <grapesmo...@gmail.com> wrote:
> >>>> >
> >>>> > I believe that if you do this within the context of an operation
> >>>> > that is already parallelized, such as a map, the work will be
> >>>> > distributed to executors and they will do it in parallel. I could
> >>>> > be wrong about this, as I never investigated this specific use
> >>>> > case, though.
> >>>> >
> >>>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri
> >>>> > <chetan.opensou...@gmail.com> wrote:
> >>>> >>
> >>>> >> Thanks for the quick response.
> >>>> >>
> >>>> >> I am curious to know whether it would pull data in parallel for
> >>>> >> 100+ HTTP requests, or whether it will only run on the driver
> >>>> >> node. The POST body would be part of the DataFrame. Think of it
> >>>> >> as: I have a DataFrame of employee_id, employee_name; now an
> >>>> >> HTTP GET call has to be made for each employee_id, and the
> >>>> >> DataFrame is dynamic for each Spark job run.
> >>>> >>
> >>>> >> Does it make sense?
> >>>> >>
> >>>> >> Thanks
> >>>> >>
> >>>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov
> >>>> >> <grapesmo...@gmail.com> wrote:
> >>>> >>>
> >>>> >>> Hi Chetan,
> >>>> >>>
> >>>> >>> You can pretty much use any client to do this. When I was using
> >>>> >>> Spark at a previous job, we used OkHttp, but I'm sure there are
> >>>> >>> plenty of others. In our case, we had a startup phase in which
> >>>> >>> we gathered metadata via a REST API and then broadcast it to
> >>>> >>> the workers. I think if you need all the workers to have access
> >>>> >>> to whatever you're getting from the API, that's the way to do it.
> >>>> >>>
> >>>> >>> Jerry
> >>>> >>>
> >>>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri
> >>>> >>> <chetan.opensou...@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>> Hi Spark Users,
> >>>> >>>>
> >>>> >>>> How can I invoke a REST API call from Spark code that is not
> >>>> >>>> only running on the Spark driver but distributed / parallel?
> >>>> >>>>
> >>>> >>>> Spark with Scala is my tech stack.
> >>>> >>>>
> >>>> >>>> Thanks
> >>>> >>>
> >>>> >>> --
> >>>> >>> http://www.google.com/profiles/grapesmoker
> >>>> >
> >>>> > --
> >>>> > http://www.google.com/profiles/grapesmoker
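PS: Jerry's broadcast approach would look roughly like the sketch below: one REST call on the driver during a startup phase, then a broadcast so every executor reads the result locally. The metadata endpoint here is hypothetical, and the sketch assumes the payload fits comfortably in a single string.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

import org.apache.spark.sql.SparkSession

object BroadcastMetadata {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("broadcast-metadata").getOrCreate()
    import spark.implicits._

    // One REST call, made on the driver during the startup phase.
    val client = HttpClient.newHttpClient()
    val req = HttpRequest
      .newBuilder(URI.create("https://example.com/api/metadata")) // hypothetical endpoint
      .GET()
      .build()
    val metadataJson = client.send(req, HttpResponse.BodyHandlers.ofString()).body()

    // Broadcast once; every executor then reads the value locally,
    // with no further HTTP traffic.
    val metadata = spark.sparkContext.broadcast(metadataJson)

    val enriched = spark.range(0, 100).as[Long].map { id =>
      // metadata.value is available inside any task
      s"$id: ${metadata.value.take(20)}"
    }

    enriched.show()
    spark.stop()
  }
}

That avoids repeating the same HTTP call from every task, which seems like the right trade-off whenever all workers need the same API response rather than a per-row one.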