Looking at your use case, the whole processing logic seems to be very custom. I would recommend using ParDos to express it. If processing an individual dictionary is expensive, you can use a Reshuffle operation to distribute the dictionary updates across multiple workers.
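A rough sketch of that shape, in case it helps. Everything specific here is an assumption on my part: the two URLs, the `internal_id` and `processed` keys, and the `Idempotency-Key` header (it only helps if the receiving API actually deduplicates on it). `beam.Map` and `beam.FlatMap` are thin wrappers around ParDo, so this is the same idea with less boilerplate.

```python
import hashlib
import json
import urllib.error
import urllib.request

SOURCE_URL = "https://example.com/api/records"  # hypothetical GET endpoint
SINK_URL = "https://example.com/api/ingest"     # hypothetical POST endpoint


def fetch_records(url):
    """Call the source API once and yield each dictionary in the response list."""
    with urllib.request.urlopen(url) as response:
        for record in json.load(response):
            yield record


def transform_record(record, drop_keys=("internal_id",), extra=None):
    """Return a new dict with some keys removed and others added.

    The key names are placeholders; the input dict is not modified.
    """
    result = {k: v for k, v in record.items() if k not in drop_keys}
    result.update(extra or {"processed": True})
    return result


def record_key(record):
    """Stable hash of a record, so a retried POST carries the same key."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def post_record(record, url=SINK_URL):
    """POST one record as JSON and return (key, HTTP status or None on failure)."""
    key = record_key(record)
    request = urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the sink deduplicates on this header.
            "Idempotency-Key": key,
        },
    )
    try:
        with urllib.request.urlopen(request) as response:
            return key, response.status
    except urllib.error.URLError:
        # Swallow the error so one bad record does not fail the whole bundle.
        return key, None


def run():
    # Imported here so the helpers above can be tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Seed" >> beam.Create([SOURCE_URL])
            | "Fetch" >> beam.FlatMap(fetch_records)
            | "Redistribute" >> beam.Reshuffle()
            | "Transform" >> beam.Map(transform_record)
            | "Post" >> beam.Map(post_record)
        )
```

Calling `run()` executes the pipeline on the default runner. The Reshuffle sits right after the fetch because a single API response would otherwise stay fused with the downstream steps on one worker. Catching the error in `post_record` means a failed record is reported in the output rather than retried by Beam, which may or may not be what you want.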
Note: Because you are making the write API calls yourself, your transform can be executed multiple times if a worker fails.

On Mon, Jun 3, 2019 at 11:41 AM Anjana Pydi <[email protected]> wrote:

> Hi Ankur,
>
> Thanks for the reply. Please find my responses inline in the mail below.
>
> Thanks,
> Anjana
> ------------------------------
> *From:* Ankur Goenka [[email protected]]
> *Sent:* Monday, June 03, 2019 11:01 AM
> *To:* [email protected]
> *Subject:* Re: How to build a beam python pipeline which does GET/POST
> request to API's
>
> Thanks for providing more information.
>
> Some follow up questions/comments
> 1. Call an API which would provide a dictionary as response.
> Question: Do you need to make multiple of these API calls? If yes, what
> distinguishes API call1 from call2? If it's the input to the API, then can
> you provide the inputs in a file etc? What I am trying to identify is an
> input source to the pipeline so that beam can distribute the work.
> Answer : When an API call is made, it can provide a list of dictionaries
> as response, we have to go through every dictionary, do the same
> transformations for each and send it.
> 2. Transform dictionary to add / remove few keys.
> 3. Send transformed dictionary as JSON to an API which prints this JSON as
> output.
> Question: Are these write operations idempotent? As you are doing your own
> api calls, it's possible that after a failure, the calls are done again for
> the same input. If write calls are not idempotent then there can be
> duplicate data.
> Answer : Suppose, if I receive a list of 1000 dictionaries as response
> when I called API in point1, I should do only 1000 write operations
> respectively to each input. If there is a failure for any input, only that
> should not be posted and remaining should be posted successfully.
>
> On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <[email protected]>
> wrote:
>
>> Hi Ankur,
>>
>> Thanks for the reply! Below are more details of the use case:
>>
>> 1. Call an API which would provide a dictionary as response.
>> 2. Transform dictionary to add / remove few keys.
>> 3. Send transformed dictionary as JSON to an API which prints this JSON
>> as output.
>>
>> Please let me know in case of any clarifications.
>>
>> Thanks,
>> Anjana
>> ------------------------------
>> *From:* Ankur Goenka [[email protected]]
>> *Sent:* Saturday, June 01, 2019 6:47 PM
>> *To:* [email protected]
>> *Subject:* Re: How to build a beam python pipeline which does GET/POST
>> request to API's
>>
>> Hi Anjana,
>>
>> You can write your API logic in a ParDo and subsequently pass the
>> elements to other ParDos to transform and eventually make an API call to
>> another endpoint.
>>
>> However, this might not be a good fit for Beam as the input is not well
>> defined and hence scaling and "once processing" of elements will not be
>> possible as there is no well defined input.
>>
>> It will be better to elaborate a bit more on the use case for better
>> suggestions.
>>
>> Thanks,
>> Ankur
>>
>> On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a requirement to create an apache beam python pipeline to read a
>>> JSON from an API endpoint, transform it (add/remove few fields) and send the
>>> transformed JSON to another API endpoint.
>>>
>>> Can anyone please provide some suggestions on how to do it.
>>>
>>> Thanks,
>>> Anjana
>>> -----------------------------------------------------------------------------------------------------------------------
>>> The information contained in this communication is intended solely for the
>>> use of the individual or entity to whom it is addressed and others
>>> authorized to receive it. It may contain confidential or legally privileged
>>> information. If you are not the intended recipient you are hereby notified
>>> that any disclosure, copying, distribution or taking any action in reliance
>>> on the contents of this information is strictly prohibited and may be
>>> unlawful. If you are not the intended recipient, please notify us
>>> immediately by responding to this email and then delete it from your
>>> system. Bahwan Cybertek is neither liable for the proper and complete
>>> transmission of the information contained in this communication nor for any
>>> delay in its receipt.
