Re: Looking for I/O transform to untar a tar.gz

2018-03-15 Thread Pablo Estrada
Hi! Quick questions: - Which SDK are you using? - Is this batch or streaming? As JB mentioned, TextIO is able to work with compressed files that contain text. Nothing currently handles the double decompression that I believe you're looking for. TextIO for Java is also able to "watch" a directory

Re: Contributing

2018-03-15 Thread Ahmet Altay
Hi Austin, It was great meeting you. We mentioned a list of starter bugs; here is that list [1]. It might give you some ideas on where to start. [1]

Re: Looking for I/O transform to untar a tar.gz

2018-03-15 Thread Jean-Baptiste Onofré
Hi, TextIO supports compressed files. Do you want to read the files as text? Can you detail the use case a bit? Thanks, Regards, JB On 15 March 2018 at 18:28, Shirish Jamthe wrote: >Hi, > >My input is a tar.gz or .zip file which contains thousands of tar.gz >files >and

Using the Go Beam SDK

2018-03-15 Thread Philip Gianfortoni
Hi dev team, I am an engineer at Token Transit, a company working on a mobile ticketing solution for transit companies. Through Holden (who's working on Py3 w/Beam and doing some dev advocacy), we heard about the experimental newly merged WIP Go Beam SDK. We are extremely excited about exploring

Looking for I/O transform to untar a tar.gz

2018-03-15 Thread Shirish Jamthe
Hi, My input is a tar.gz or .zip file which contains thousands of tar.gz files and other files. I would like to extract the tar.gz files from the tar. Is there a transform that can do that? I couldn't find one. If not, is it in the works? Any pointers to start work on it? Thanks
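
For illustration only, the nested extraction asked about above can be sketched with Python's standard `tarfile` module. This is not a Beam transform and not something the thread proposes; the function name `iter_inner_tar_members` is hypothetical, and the sketch assumes the outer archive is a tar.gz that fits in memory and that inner archives are identified by a `.tar.gz` suffix.

```python
import io
import tarfile


def iter_inner_tar_members(outer_bytes):
    """Yield (inner_archive_name, member_name, data) for every regular file
    inside each .tar.gz nested in the outer tar.gz (hypothetical helper)."""
    with tarfile.open(fileobj=io.BytesIO(outer_bytes), mode="r:gz") as outer:
        for entry in outer:
            # Skip directories and the "other files" that are not inner archives.
            if not (entry.isfile() and entry.name.endswith(".tar.gz")):
                continue
            # Buffer the inner archive in memory (assumption: it is small enough).
            inner_bytes = outer.extractfile(entry).read()
            with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
                for member in inner:
                    if member.isfile():
                        yield entry.name, member.name, inner.extractfile(member).read()
```

A function like this could in principle be called from a user-defined transform over whole-file contents; that wiring depends on the SDK in use and is not shown here.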

Contributing

2018-03-15 Thread Austin Bennett
Hi All, Enjoyed meeting many of you yesterday, and I look forward to helping the project! I hope to submit some code in the next week or two, and then [slowly] get up to speed on conventions, guidelines, appropriate style, etc. Ideally, with some guidance. Also, should we be able to

Re: Dealing with AWS Regions

2018-03-15 Thread Lukasz Cwik
For now I think we should stick with registering different configurations of filesystems under different schemes so we should use s3a://, s3b://, and s3c://. If you go down the route of enhancing S3Options (similar to HadoopFileSystemOptions) to be able to register multiple S3 filesystems under

Re: [PROPOSITION] schedule some sanity tests on a daily basis

2018-03-15 Thread Jean-Baptiste Onofré
Hi, I would suggest preparing a Maven profile to perform nexmark runs. Then, I can set up a job (seed/manual) in Jenkins to run this. Regards JB On 15/03/2018 22:13, Etienne Chauchot wrote: So what next? Shall we schedule nexmark runs and add a BigQuery sink to nexmark output? On Monday

Re: [PROPOSITION] schedule some sanity tests on a daily basis

2018-03-15 Thread Etienne Chauchot
So what next? Shall we schedule nexmark runs and add a BigQuery sink to nexmark output? On Monday 12 March 2018 at 10:30 +0100, Etienne Chauchot wrote: > Thanks everyone for your comments and support. > > On Friday 09 March 2018 at 21:28 +, Alan Myrvold wrote: > > Great ideas. I want to

Re: Board report - March '18

2018-03-15 Thread Davor Bonaci
The report is now submitted. Thanks to everyone who provided comments and feedback. On Thu, Mar 15, 2018 at 11:41 AM, Lukasz Cwik wrote: > +1 I also took a pass over it. > > > On Thu, Mar 15, 2018 at 9:29 AM Jean-Baptiste Onofré > wrote: > >> +1 >> >> It

Re: Board report - March '18

2018-03-15 Thread Lukasz Cwik
+1 I also took a pass over it. On Thu, Mar 15, 2018 at 9:29 AM Jean-Baptiste Onofré wrote: > +1 > > It looks good to me. > > Thanks ! > > Regards > JB > On 15 March 2018 at 08:40, Davor Bonaci wrote: >> >> Thanks JB for starting the report. >> >> If

Re: [VOTE] Release 2.4.0, release candidate #2

2018-03-15 Thread Robert Bradshaw
Just to give an update on this, I plan on creating an RC3 as soon as I get the dataflow containers rebuilt. (Been busy with the beam summit among other things.) Except for direct runner fixes and the dataflow tags, I expect it to be the same as RC2, so testing on flink/apex/spark done now would

Re: Board report - March '18

2018-03-15 Thread Jean-Baptiste Onofré
+1 It looks good to me. Thanks! Regards JB On 15 March 2018 at 08:40, Davor Bonaci wrote: >Thanks JB for starting the report. > >If interested, please take a look at the complete draft [1], and >comment or >contribute content, as appropriate. I'll submit the report

Re: Board report - March '18

2018-03-15 Thread Davor Bonaci
Thanks JB for starting the report. If interested, please take a look at the complete draft [1], and comment or contribute content, as appropriate. I'll submit the report sometime in the next 24 hours. Thanks! Davor [1]

Re: (java) stream & beam?

2018-03-15 Thread Romain Manni-Bucau
On 15 March 2018 at 07:50, "Robert Bradshaw" wrote: On Wed, Mar 14, 2018 at 11:04 PM Romain Manni-Bucau wrote: > On 15 March 2018 at 06:52, "Robert Bradshaw" wrote: >> The stream API was looked at way back when we were designing

Re: Beam 2.4.0

2018-03-15 Thread Romain Manni-Bucau
Done, thanks. Romain Manni-Bucau