Re: Any reason to not use [vfs]?

2018-03-06 Thread Romain Manni-Bucau
Just to share a bit more than a ticket here is a bootstrap impl https://github.com/apache/beam/pull/4803

Re: Any reason to not use [vfs]?

2018-03-06 Thread Reuven Lax
Cool. Then for now we should create a separate Vfs-backed Filesystem impl. Once Vfs supports all we need, I think we can consider keeping only that. Keep in mind that the bulk operations Luke mentioned translate to native bulk operations for Gcs at least (BatchRequest is part of the Gcs API). I'm

Re: Any reason to not use [vfs]?

2018-03-06 Thread Jean-Baptiste Onofré
+1 for the discussion and tracking.
Regards
JB
On 03/06/2018 12:07 PM, Romain Manni-Bucau wrote:
> created https://issues.apache.org/jira/browse/BEAM-3786 to track the discussion
> (without putting too much details in the ticket for now)

Re: Any reason to not use [vfs]?

2018-03-06 Thread Romain Manni-Bucau
created https://issues.apache.org/jira/browse/BEAM-3786 to track the discussion (without putting too much details in the ticket for now)

Re: Any reason to not use [vfs]?

2018-03-06 Thread Romain Manni-Bucau
@Reuven: this was what I had in mind yes.

Re: Any reason to not use [vfs]?

2018-03-06 Thread Reuven Lax
Part of the point of the current Filesystem class _is_ to handle these things (e.g. bulk delete/rename). If Vfs doesn't, then maybe the right answer is to keep Filesystem but put Vfs under it (and maybe that will eventually allow us to remove some of the current code). On Mon, Mar 5, 2018 at
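A minimal sketch of the "keep Filesystem but put Vfs under it" idea, assuming the underlying API only offers per-file deletes. `SingleFileStore`, `BulkFileSystem`, `StoreBackedFileSystem` and `InMemoryStore` are hypothetical names made up for illustration; they are not Beam or Commons VFS classes, and a real adapter would implement Beam's own FileSystem contract instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a per-file API (the role VFS would play). */
interface SingleFileStore {
    /** Returns true if the file existed and was removed. */
    boolean delete(String path);
}

/** Hypothetical stand-in for a filesystem abstraction with bulk operations. */
interface BulkFileSystem {
    /** Deletes every path and returns the ones that could not be deleted. */
    List<String> deleteAll(Collection<String> paths);
}

/** Adapter: exposes the bulk API by looping over the per-file backend. */
final class StoreBackedFileSystem implements BulkFileSystem {
    private final SingleFileStore store;

    StoreBackedFileSystem(SingleFileStore store) {
        this.store = store;
    }

    @Override
    public List<String> deleteAll(Collection<String> paths) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            // No native batching here; a backend with a batch call could
            // override this method and send a single request instead.
            if (!store.delete(path)) {
                failed.add(path);
            }
        }
        return failed;
    }
}

/** Tiny in-memory backend, only here to make the sketch runnable. */
final class InMemoryStore implements SingleFileStore {
    private final Set<String> files = new HashSet<>();

    InMemoryStore(Collection<String> initial) {
        files.addAll(initial);
    }

    @Override
    public boolean delete(String path) {
        return files.remove(path);
    }

    int size() {
        return files.size();
    }
}
```

The point of the layering is that callers only see the bulk interface, so a backend with native batch support can replace the loop without any caller changing.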

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
On 6 March 2018 at 01:05, "Lukasz Cwik" wrote:
> As is, how does VFS improve upon the current FileSystem solution?
This is about the ecosystem. I have never seen a Beam fs implemented outside Beam, but I have seen tons of vfs users and impls.
> How much work is it before VFS supports the

Re: Any reason to not use [vfs]?

2018-03-05 Thread Lukasz Cwik
As is, how does VFS improve upon the current FileSystem solution? How much work is it before VFS supports the Apache Beam usecases (bulk operations, glob support)? Is it the right direction for the VFS project to support the above changes? (things that are important to a data parallel
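To make "glob support" concrete, here is a small sketch of glob filtering over a list of candidate paths. It uses the JDK's own `java.nio.file.PathMatcher` glob syntax purely as an illustration; that is an assumption of this example and says nothing about how Beam or VFS actually match patterns. `GlobFilter` is a hypothetical helper name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only the candidates that match a glob. */
final class GlobFilter {
    private GlobFilter() {}

    static List<String> match(String glob, List<String> candidates) {
        // JDK glob syntax: '*' stays within one path segment,
        // '**' may cross directory boundaries.
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String candidate : candidates) {
            if (matcher.matches(Paths.get(candidate))) {
                matched.add(candidate);
            }
        }
        return matched;
    }
}
```

A filesystem abstraction that lacks this has to list everything and filter on the client, which is part of why native glob support matters for large inputs.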

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
On 5 March 2018 at 22:26, "Robert Bradshaw" wrote:
> First, let's try to make the terminology abundantly clear, as I for one have (I think) misinterpreted what has been proposed. VfsFileSystem: A subclass of https://github.com/apache/beam/blob/

Re: Any reason to not use [vfs]?

2018-03-05 Thread Eugene Kirpichov
If VFS was mature enough for our needs, then I'd give a +1 to using it in Beam Java SDK - currently it's not, so we can't use it directly. It's indeed a reasonable option to use the VFS API inside Beam, and port our implementations of FileSystem(s) to that API, and then potentially donate that to

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
First, let's try to make the terminology abundantly clear, as I for one have (I think) misinterpreted what has been proposed. VfsFileSystem: A subclass of

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
terminology is confusing here, since the existing FileIO is a PTransform. VfsFilesystem would be a better name.
On Mon, Mar 5, 2018 at 11:46 AM Robert Bradshaw wrote:
> On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax wrote:
>> What about a beam Filesystem

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax wrote:
> What about a beam Filesystem impl on top of Vfs as an alternative short-term solution? This would allow Vfs to be used with any IO.
Yes, I think this is the VfsIO that was proposed.
> On Mon, Mar 5, 2018 at 11:37 AM

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Java only is not a blocker - we don't expect all language SDKs to look the same. They should all support the same functionality, but should do so in a way that is idiomatically correct for that language. On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau wrote: > >

Re: Any reason to not use [vfs]?

2018-03-05 Thread Robert Bradshaw
On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau wrote:
> 2018-03-05 20:04 GMT+01:00 Chamikara Jayalath:
>> I assume you mean https://commons.apache.org/proper/commons-vfs/.
>>
>> I'm not sure if we considered this when we originally

Re: Any reason to not use [vfs]?

2018-03-05 Thread Chamikara Jayalath
I assume you mean https://commons.apache.org/proper/commons-vfs/. I'm not sure if we considered this when we originally implemented our own file-system abstraction but based on a quick look seems like this is Java only. I think having a similar file-system abstraction for various languages is a

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
2018-03-05 19:54 GMT+01:00 Reuven Lax:
> Are the filesystem classes marked experimental? If so, precise compatibility is less of a concern. However vfs does need to have better fs support first.
Does anyone have some cycles to list the details here? (even without being a spec

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Are the filesystem classes marked experimental? If so, precise compatibility is less of a concern. However vfs does need to have better fs support first. Also what about other languages? On Mon, Mar 5, 2018, 3:35 PM Romain Manni-Bucau wrote: > I'd say to beam 2.x and to

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
I'd say to beam 2.x and to beam 3 to move all IO/extension from the core to actual IO/extension modules. Sounds compatible this way - in the sense we can have it eagerly without breaking anything. wdyt?

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
Actually FileIO is only somewhat related. It's an interesting proposal. However a quick look shows that vfs only has read-only support for hdfs and I'm not sure it has any support for gcs. Both are often used with Beam. Once vfs supports these filesystems it's worth looking at. Maybe add to the

Re: Any reason to not use [vfs]?

2018-03-05 Thread Romain Manni-Bucau
Yes (FileIO being the visible part of the FileSystems iceberg ;)).

Re: Any reason to not use [vfs]?

2018-03-05 Thread Reuven Lax
I'm confused, as FileIO doesn't seem the same as vfs. Are you maybe referring to the filesystem abstraction instead?
On Mon, Mar 5, 2018, 3:19 PM Romain Manni-Bucau wrote:
> Hi guys,
>
> What's the rationale behind the fileIO impl?
>
> Why not using commons-vfs + a