Re: Development of new HTTP source
Sorry Romain, I did not quite follow. Are you saying that, for now, I should develop the HTTP source from scratch? Sorry, I'm new to the group: what does PR mean?

2018-04-11 14:38 GMT-03:00 Romain Manni-Bucau:
> I guess for now you can fetch his fork or just re-fork it and PR on it
> directly.
Re: Development of new HTTP source
I guess for now you can fetch his fork, or just re-fork it and open a PR on it directly.

On 11 Apr 2018 at 19:21, "Daniel Salerno" wrote:
> Hello Romain,
>
> Thanks for the feedback.
> It sounds like a good idea!
> How could I get the RestIO from JB?
>
> Thank you.
Re: Development of new HTTP source
Hello Romain,

Thanks for the feedback.
It sounds like a good idea!
How could I get the RestIO from JB?

Thank you.

2018-04-11 10:50 GMT-03:00 Romain Manni-Bucau:
> Hi Daniel,
>
> I know JB has started a RestIO. It is not yet complete and needs
> some love, but there is probably an opportunity for convergence and
> collaboration here.
>
> Romain Manni-Bucau
Re: Development of new HTTP source
There is also Romain's filesystem PR that wraps VFS as a Beam filesystem. That would, at least in theory, allow reading from HTTP and FTP, no? Of course this is different from RestIO because it won't have the full HTTP verb semantics, but if the goal is just to read (GET), maybe we should try to move this PR forward, no?

https://github.com/apache/beam/pull/4803

On Wed, Apr 11, 2018 at 3:50 PM, Romain Manni-Bucau wrote:
> Hi Daniel,
>
> I know JB has started a RestIO. It is not yet complete and needs
> some love, but there is probably an opportunity for convergence and
> collaboration here.
>
> Romain Manni-Bucau
Re: Development of new HTTP source
Hi Daniel,

I know JB has started a RestIO. It is not yet complete and needs some love, but there is probably an opportunity for convergence and collaboration here.

Romain Manni-Bucau
@rmannibucau

2018-04-11 15:42 GMT+02:00 Daniel Salerno:
> Good morning,
>
> In my Big Data Google project we need to read batch data from the VTEX
> platform every 15 minutes and record the JSON it returns in our Cloud
> Storage data lake.
> VTEX makes the data available through HTTP GET requests to its API:
> (https://documenter.getpostman.com/view/487146/vtex-oms-api/6tjSKqi)
>
> I have not found a good way to do this reading using standard Apache Beam
> components. My idea is to develop an HttpIO component using Source/Sink to
> read the request and write a JSON file to Cloud Storage. After that, another
> pipeline will process the file and write to BigQuery.
>
> Can you please help me with this matter? Is that the best way to go? Do you
> have any idea how to develop this new component?
>
> Thank you.
>
> Hugs,
>
> Daniel Salerno de Arruda
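
For readers following the thread, the read step Daniel describes (one HTTP GET per endpoint, with the JSON response kept for a later pipeline) can be sketched without any new IO component. This is only a minimal illustration under stated assumptions, not the RestIO or HttpIO discussed above: it is written in Python with the standard library only, the function names `http_get` and `fetch_as_json_lines` are hypothetical, and authentication, retries, and paging are left out. Nothing here is specific to the VTEX API.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator


def http_get(url: str, timeout: float = 30.0) -> str:
    """Issue one HTTP GET and return the response body as text.

    Hypothetical helper: a real API would also need auth headers.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def fetch_as_json_lines(
    urls: Iterable[str],
    fetch: Callable[[str], str] = http_get,
) -> Iterator[str]:
    """Yield one compact JSON line per URL, tagged with the URL it came from.

    `fetch` is injectable so the logic can be exercised without a network.
    """
    for url in urls:
        payload = json.loads(fetch(url))  # raises ValueError on a non-JSON body
        yield json.dumps({"url": url, "payload": payload}, sort_keys=True)
```

In a Beam Python pipeline, a function of this shape could be applied with `beam.FlatMap` over a `beam.Create` of URLs, and the resulting lines handed to an existing sink such as `beam.io.WriteToText`, so that only the fetch logic is custom. Whether that is preferable to a dedicated source depends on how much splitting and retry behaviour the read needs.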