Towards a spec for robust streaming SQL, Part 2

2017-07-24 Thread Tyler Akidau
Hello Flink, Calcite, and Beam dev lists! Linked below is the second document I promised way back in April regarding a collaborative spec for streaming SQL in Beam/Calcite/Flink (& apologies for the delay; I thought I was nearly done a while back and then temporal joins expanded to something much

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Jean-Baptiste Onofré
In Camel, we have different modes: with local file caching or using streaming when possible (it depends on the body in the Exchange). So, I think we can do the same in Beam. Regards JB On 07/24/2017 09:38 PM, Eugene Kirpichov wrote: I think Camille may have referred to Python standard library

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Eugene Kirpichov
I think Camille may have referred to the Python standard library class StringIO, which means collecting the output into a string - and then I suppose uploading the string to FTP. That could work (similar stuff exists in the Java library) but would limit us to files whose content fits in memory. On Mon,
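
A minimal sketch of the in-memory approach described in this message, assuming Python 3 and the standard library only. The function name upload_from_memory, the parameter names, and the host in the usage comment are hypothetical and do not come from the thread. Because ftplib's storbinary expects a binary file object, the sketch uses io.BytesIO rather than StringIO. The whole payload sits in memory before the upload starts, which is exactly the limitation noted above.

```python
import io


def upload_from_memory(ftp, remote_name, lines):
    """Buffer every line in memory, then upload the buffer with one STOR.

    `ftp` is assumed to be an ftplib.FTP instance (or any object with a
    compatible storbinary(cmd, fp) method). Returns the number of bytes
    buffered.
    """
    buffer = io.BytesIO()
    for line in lines:
        buffer.write(line.encode("utf-8") + b"\n")
    size = buffer.tell()   # total bytes held in memory
    buffer.seek(0)         # rewind so the upload reads from the start
    ftp.storbinary("STOR " + remote_name, buffer)
    return size


# Hypothetical usage against a real server (host and credentials are placeholders):
#
#   from ftplib import FTP
#   with FTP("ftp.example.org") as ftp:
#       ftp.login("user", "password")
#       upload_from_memory(ftp, "output.txt", ["first line", "second line"])
```

No local file is needed, but memory use grows with the size of the output, so this only suits files that fit comfortably in RAM.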

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Jean-Baptiste Onofré
I guess TextIO ? ;) Regards JB On Jul 24, 2017, at 21:27, Eugene Kirpichov wrote: >What is StringIO? > >On Mon, Jul 24, 2017 at 1:47 AM Tolsa, Camille > >wrote: > >> Not necessary with StringIO >> >> On 24 July 2017 at 09:47,

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Eugene Kirpichov
What is StringIO? On Mon, Jul 24, 2017 at 1:47 AM Tolsa, Camille wrote: > Not necessary with StringIO > > On 24 July 2017 at 09:47, Reuven Lax wrote: > > > This would require writing data to local files in order to upload it to the

Custom window merging

2017-07-24 Thread Etienne Chauchot
Hi all, There are now two new ValidatesRunner tests: WindowTest.testMergingCustomWindows and WindowTest.testMergingCustomWindowsKeyedCollection. The aim of these tests is to verify that the runners can handle custom windowFn (extensions of windowFn that, for example, could rely on elements in

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Sourabh Bajaj
I created PR/3627 for cherry picking a fix for BEAM-2636. On Mon, Jul 24, 2017 at 8:20 AM Ismaël Mejía wrote: > Not a blocker but maybe it is worth considering the fix for > https://issues.apache.org/jira/browse/BEAM-2587 too. > > I also was bitten by this issue and I could

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Ismaël Mejía
Not a blocker but maybe it is worth considering the fix for https://issues.apache.org/jira/browse/BEAM-2587 too. I also was bitten by this issue and I could only get it to work by doing a 'pip install --user grpcio-tools' (not sure if this is a proper solution but it works for me), however when I

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Aljoscha Krettek
I opened a PR against the release-2.1.0 branch: https://github.com/apache/beam/pull/3625 This should not fail any tests since it was recently reviewed and merged into master. Best, Aljoscha > On 24. Jul 2017, at 14:09, Jean-Baptiste Onofré

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Jean-Baptiste Onofré
+1 Definitely good to have it for RC3. Regards JB On 07/24/2017 02:05 PM, Aljoscha Krettek wrote: When we're cutting a new RC anyways we could also include the fixes for https://issues.apache.org/jira/browse/BEAM-2571 . It's an actual bug in

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Aljoscha Krettek
When we're cutting a new RC anyways we could also include the fixes for https://issues.apache.org/jira/browse/BEAM-2571 . It's an actual bug in the Flink Runner and the fix for that is a set of three fixes that should be easy to cherry-pick on

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Aviem Zur
We also have two tests failing in the Spark runner, as detailed by the following two tickets: https://issues.apache.org/jira/browse/BEAM-2670 https://issues.apache.org/jira/browse/BEAM-2671 On Mon, Jul 24, 2017 at 11:44 AM Jean-Baptiste Onofré wrote: > Hi all, > > due to

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Tolsa, Camille
Not necessary with StringIO On 24 July 2017 at 09:47, Reuven Lax wrote: > This would require writing data to local files in order to upload it to the > remote FTP, right? > > On Mon, Jul 24, 2017 at 12:31 AM, Jean-Baptiste Onofré > wrote: > > > Hi

[CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Jean-Baptiste Onofré
Hi all, due to https://issues.apache.org/jira/browse/BEAM-2662, I cancel this vote. We also have a build issue with the Spark runner that I would like to fix for RC3: https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Spark/2446/ So, we are going to work on the

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Reuven Lax
This would require writing data to local files in order to upload it to the remote FTP, right? On Mon, Jul 24, 2017 at 12:31 AM, Jean-Baptiste Onofré wrote: > Hi Lucas, > > IMHO, it's not an IO, it's a filesystem that TextIO and others can support > (like GFS or HDFS). > >

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Tolsa, Camille
Hello, I would definitely appreciate this feature. If I can help somehow, tell me. Camille. On 24 July 2017 at 09:31, Jean-Baptiste Onofré wrote: > Hi Lucas, > > IMHO, it's not an IO, it's a filesystem that TextIO and others can support > (like GFS or HDFS). > > It's what we

Re: [S]FTP support as Pipeline I/O

2017-07-24 Thread Jean-Baptiste Onofré
Hi Lucas, IMHO, it's not an IO, it's a filesystem that TextIO and others can support (like GFS or HDFS). It's what we did in Camel: the ftp component is just an extension of the file component. It means that we would be able to do: pipeline.apply(TextIO.from("ftp://...")). Thoughts? If agree, I

Re: [VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Jean-Baptiste Onofré
Great initiative Kenn ! I will take a look. Regards JB On 07/24/2017 07:57 AM, Kenneth Knowles wrote: Nice catch. Per our discussion on RC2 and now this, I started a spreadsheet for release criteria. Template: https://s.apache.org/beam-release-validation Copy for this release:

[S]FTP support as Pipeline I/O

2017-07-24 Thread Lucas Arruda
Hi Beam folks, I would like to suggest the creation of a Pipeline I/O to support FTP/SFTP as both source and sink locations for data processing. I've done some research and it looks like there isn't any kind of development ongoing for this (at least not on Jira). I'd like to know your thoughts