Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
On 6 March 2018 at 01:05, "Lukasz Cwik" wrote: As is, how does VFS improve upon the current FileSystem solution? This is about the ecosystem. I have never seen the Beam fs implemented outside Beam, but I have seen tons of vfs users and implementations. How much work is it before VFS supports the

Re: Should tests fail due to transient errors on Dataflow Runner?

2018-03-05 Thread Lukasz Cwik
That makes sense but you'll want to make sure that no test + runner is relying on this behavior by making your change and running all the validates runner tests. Historically what you say was not always the case because Dataflow streaming jobs were never "DONE", they only were in the "RUNNING"

Re: Any reason to not use [vfs]?

2018-03-05 Thread Lukasz Cwik
As is, how does VFS improve upon the current FileSystem solution? How much work is it before VFS supports the Apache Beam use cases (bulk operations, glob support)? Is it the right direction for the VFS project to support the above changes? (things that are important to a data parallel

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
On 5 March 2018 at 22:26, "Robert Bradshaw" wrote: First, let's try to make the terminology abundantly clear, as I for one have (I think) misinterpreted what has been proposed. VfsFileSystem: A subclass of https://github.com/apache/beam/blob/

Re: Any reason to not use [vfs]?

2018-03-05 Thread Eugene Kirpichov
If VFS was mature enough for our needs, then I'd give a +1 to using it in Beam Java SDK - currently it's not, so we can't use it directly. It's indeed a reasonable option to use the VFS API inside Beam, and port our implementations of FileSystem(s) to that API, and then potentially donate that to

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
First, let's try to make the terminology abundantly clear, as I for one have (I think) misinterpreted what has been proposed. VfsFileSystem: A subclass of

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
terminology is confusing here, since the existing FileIO is a PTransform. VfsFilesystem would be a better name. On Mon, Mar 5, 2018 at 11:46 AM Robert Bradshaw wrote: > On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax wrote: > >> What about a beam Filesystem

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax wrote: > What about a beam Filesystem impl on top of Vfs as an alternative > short-term solution? This would allow Vfs to be used with any IO. > Yes, I think this is the VfsIO that was proposed. > On Mon, Mar 5, 2018 at 11:37 AM

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Java only is not a blocker - we don't expect all language SDKs to look the same. They should all support the same functionality, but should do so in a way that is idiomatically correct for that language. On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau wrote: > >

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau wrote: > > 2018-03-05 20:04 GMT+01:00 Chamikara Jayalath : > >> I assume you mean https://commons.apache.org/proper/commons-vfs/. >> >> I'm not sure if we considered this when we originally

Re: Any reason to not use [vfs]?

2018-03-05 Thread Chamikara Jayalath
I assume you mean https://commons.apache.org/proper/commons-vfs/. I'm not sure if we considered this when we originally implemented our own file-system abstraction but based on a quick look seems like this is Java only. I think having a similar file-system abstraction for various languages is a

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
2018-03-05 19:54 GMT+01:00 Reuven Lax : > Are the filesystem classes marked experimental? If so, precise > compatibility is less of a concern. However vfs does need to have better fs > support first. > Does anyone have some cycles to list the details here? (even without being a spec

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Are the filesystem classes marked experimental? If so, precise compatibility is less of a concern. However vfs does need to have better fs support first. Also what about other languages? On Mon, Mar 5, 2018, 3:35 PM Romain Manni-Bucau wrote: > I'd say to beam 2.x and to

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Romain Manni-Bucau
Interesting view Thomas - and it makes a lot of sense. Would you rather see 2 modules? embedded-runner+portable-runner+direct-runner (with inheritance in between)? Would work for me.

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Thomas Groh
The portable java 'DirectRunner' is already in progress, and has been for several months - it's tracked by https://issues.apache.org/jira/browse/BEAM-2899. My expectation is that the actual portability augmentations are unlikely to require significant changes to the DirectRunner implementations.

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
I'd say to beam 2.x and to beam 3 to move all IO/extension from the core to actual IO/extension modules. Sounds compatible this way - in the sense we can have it eagerly without breaking anything. wdyt?

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Actually FileIO is only somewhat related. It's an interesting proposal. However a quick look shows that vfs only has read-only support for hdfs and I'm not sure it has any support for gcs. Both are often used with Beam. Once vfs supports these filesystems it's worth looking at. Maybe add to the

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
Yes (FileIO being the visible part of the FileSystems iceberg ;)).

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Romain Manni-Bucau
Hi Lukasz, concretely it is pretty simple - if not let me know, I'll try to gist some code but I don't think we need: (I'll use module names, let's not discuss them, it is just to share the idea) I see it as follows: 1. beam-java-runner - bare API impl (extracted from direct runner, this is not

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
I'm confused, as FileIO doesn't seem the same as vfs. Are you maybe referring to the filesystem abstraction instead? On Mon, Mar 5, 2018, 3:19 PM Romain Manni-Bucau wrote: > Hi guys, > > What's the rationale behind the fileIO impl? > > Why not use commons-vfs + a

Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
Hi guys, What's the rationale behind the fileIO impl? Why not use commons-vfs + a pluggable format? Sounds way more open and reusable for end users than a few hardcoded supported formats, no? What's the blocker? If there is a blocker, can't we contribute to [vfs] to make it disappear? Romain

Re: Schema-Aware PCollections revisited

2018-03-05 Thread Reuven Lax
Of course! I think some BeamSQL folks should be involved as well, as this directly affects SQL work. Anton especially has expressed interest in Row and schemas. Reuven On Mon, Mar 5, 2018 at 4:30 AM Jean-Baptiste Onofré wrote: > Cool, > > can I work with you on this

Re: Schema-Aware PCollections revisited

2018-03-05 Thread Jean-Baptiste Onofré
Cool, can I work with you on this (sharing a branch, for instance)? Thanks! Regards, JB On 03/05/2018 01:01 PM, Reuven Lax wrote: > Yes, I do have a PoC in progress. The Beam Row class was being refactored, so > I paused to wait for that to finish. > > > On Sun, Mar 4, 2018 at 8:24 PM

Re: Schema-Aware PCollections revisited

2018-03-05 Thread Reuven Lax
Yes, I do have a PoC in progress. The Beam Row class was being refactored, so I paused to wait for that to finish. On Sun, Mar 4, 2018 at 8:24 PM Jean-Baptiste Onofré wrote: > Hi Reuven, > > I revive this discussion as I think it would be a great addition. > > We had