Re: Development of new HTTP source

2018-04-11 Thread Daniel Salerno
Sorry Romain, I did not quite understand.
Are you saying that, for now, I should develop the HTTP source from scratch?

Sorry, I'm new to the group. What does PR mean?



Re: Development of new HTTP source

2018-04-11 Thread Romain Manni-Bucau
I guess for now you can fetch his fork, or just fork it again and open a
PR (pull request) against it directly.



Re: Development of new HTTP source

2018-04-11 Thread Daniel Salerno
Hello Romain,

Thanks for the feedback.
It sounds like a good idea!
How can I get JB's RestIO?

Thank you.



Re: Development of new HTTP source

2018-04-11 Thread Ismaël Mejía
There is also Romain's filesystem PR that wraps VFS as a Beam
filesystem. That would, at least in theory, allow reading from HTTP and
FTP, no?

Of course this is different from RestIO because it won't have the full
semantics of the HTTP verbs, but if the goal is just to read (GET), maybe
we should try to move this PR forward, no?

https://github.com/apache/beam/pull/4803





Re: Development of new HTTP source

2018-04-11 Thread Romain Manni-Bucau
Hi Daniel,

I know JB has started a RestIO. It is not yet complete and needs
some love, but there is probably an opportunity for convergence and
collaboration here.

Romain Manni-Bucau
@rmannibucau


2018-04-11 15:42 GMT+02:00 Daniel Salerno :
> Good morning,
>
> In my Google Big Data project we need to read batch data from the VTEX
> platform every 15 minutes and write the returned JSON to our Cloud Storage
> data lake.
> The platform makes the data available through HTTP GET requests to its API:
> (https://documenter.getpostman.com/view/487146/vtex-oms-api/6tjSKqi)
>
> I have not found a good way to do this with the standard Apache Beam
> components. My idea is to develop an HttpIO component, using Source and Sink,
> to issue the request and write a JSON file to Cloud Storage. After that,
> another pipeline will process the file and write to BigQuery.
>
> Can you please help me with this? Is that the best way to go? Do you
> have any idea how to develop this new component?
>
> Thank you.
>
> Hugs,
>
> Daniel Salerno de Arruda
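
For readers following the thread, the read step described in the original
message (one HTTP GET per 15-minute batch, with the JSON response written to a
file) can be sketched roughly as below. This is only an illustration in plain
Python using the standard library. It is not the RestIO or the filesystem PR
discussed above, and it does not use any Beam API. The function names, the
"orders-" file prefix, and the local output directory (standing in for Cloud
Storage) are all assumptions made for the example. In a real pipeline the same
logic would presumably live inside a custom transform.

```python
import json
import urllib.request
from datetime import datetime
from pathlib import Path


def window_start(now: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its N-minute window."""
    return now.replace(minute=now.minute - now.minute % minutes,
                       second=0, microsecond=0)


def fetch_json(url: str, timeout: float = 30.0):
    """Issue a single HTTP GET and parse the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def write_batch(payload, out_dir: Path, now: datetime) -> Path:
    """Write one payload to a file named after its 15-minute window.

    The "orders-" prefix and the local directory are hypothetical; they
    stand in for whatever object naming the data lake actually uses.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"orders-{window_start(now):%Y%m%dT%H%M}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path
```

A scheduler would call fetch_json with the API URL every 15 minutes and pass
the result to write_batch. Naming the file after the window start means a
retried run overwrites the same file instead of creating a duplicate.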