Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-05-03 Thread Ismaël Mejía
For info both AvroIO ReadAll/ParseAll and TextIO ReadAll deprecations were merged into master today and will be part of 2.13.0. For those working in other SDKs (Python, Go) please pay attention to not implement such transforms (or deprecate them too if already done) to keep the API ideas

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-06 Thread Jean-Baptiste Onofré
+1 Thanks for that Ismaël. Regards JB On 06/02/2019 11:24, Ismaël Mejía wrote: > Since it seems we have consensus on deprecating both transforms I created > > BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform > BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform >

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-06 Thread Ismaël Mejía
Since it seems we have consensus on deprecating both transforms I created BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform Thanks everyone. On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath wrote: > > Python SDK

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Chamikara Jayalath
Python SDK doesn't have FileIO yet so let's keep ReadAllFromFoo transforms currently available for various file types around till we have that. Thanks, Cham On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré wrote: > Hi, > > readFiles() should be used IMHO. We should remove readAll() to avoid

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Jean-Baptiste Onofré
Hi, readFiles() should be used IMHO. We should remove readAll() to avoid confusion. Regards JB On 30/01/2019 17:25, Ismaël Mejía wrote: > Hello, > > A ‘recent’ pattern of use in Beam is to have in file based IOs a > `readAll()` implementation that basically matches a `PCollection` of > file

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Łukasz Gajowy
+1 to deprecating and not implementing readAll() in transforms where it is equivalent to matchAll() + readMatches() + readFiles(). It encourages and advertises the use of a nice, composable api reducing the amount of code to be maintained. can the Python/Go SDK align with this? + 1 to that too.

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Ismaël Mejía
I want to chime in that I am also +1 to deprecating readAll, is there anyone strongly pro readAll instead of the explicit composition? And more important, can the Python/Go SDK align with this (deprecating ReadAll and implementing ReadFiles)? On Thu, Jan 31, 2019 at 12:34 AM Chamikara Jayalath

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Chamikara Jayalath
Thanks for the clarification Ismaël and Eugene. +1 for deprecating existing FooIO.readAll() transforms in favor of FooIO.readFiles(). On Wed, Jan 30, 2019 at 3:25 PM Eugene Kirpichov wrote: > TextIO.read() and AvroIO.read() indeed perform better than match() + > readMatches() + readFiles(), due

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Eugene Kirpichov
TextIO.read() and AvroIO.read() indeed perform better than match() + readMatches() + readFiles(), due to DWR - so for these two in particular I would not recommend such a refactoring. However, new file-based IOs that do not support DWR should only provide readFiles(). Those that do, should provide

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Ismaël Mejía
I meant TextIO.readFiles(). As you mention Cham they are in reality the 'same'. TextIO.readAll = FileIO.match() . FileIO.readMatches . TextIO.readFiles And this pattern can be repeated to all other file-based IOs. The whole point of this discussion is to decide if such 'wrapper' tranform

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Chamikara Jayalath
On Wed, Jan 30, 2019 at 2:37 PM Chamikara Jayalath wrote: > > > On Wed, Jan 30, 2019 at 2:33 PM Ismaël Mejía wrote: > >> Ups slight typo, in the first line of the previous email I meant read >> instead of readAll >> >> On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote: >> > >> > Reuven is

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Chamikara Jayalath
On Wed, Jan 30, 2019 at 2:33 PM Ismaël Mejía wrote: > Ups slight typo, in the first line of the previous email I meant read > instead of readAll > > On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote: > > > > Reuven is right for the example, readAll at this moment may be faster > > and also

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Ismaël Mejía
Ups slight typo, in the first line of the previous email I meant read instead of readAll On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote: > > Reuven is right for the example, readAll at this moment may be faster > and also supports Dynamic Work Rebalancing (DWR), but the performance > of the

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Ismaël Mejía
Reuven is right for the example, readAll at this moment may be faster and also supports Dynamic Work Rebalancing (DWR), but the performance of the other approach may (and must) be improved to be equal, once the internal implementation of TextIO.read moves to a SDF version instead of the

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Robert Bradshaw
Yes, this is precisely the goal of SDF. On Wed, Jan 30, 2019 at 8:41 PM Kenneth Knowles wrote: > > So is the latter is intended for splittable DoFn but not yet using it? The > promise of SDF is precisely this composability, isn't it? > > Kenn > > On Wed, Jan 30, 2019 at 10:16 AM Jeff Klukas

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Kenneth Knowles
So is the latter is intended for splittable DoFn but not yet using it? The promise of SDF is precisely this composability, isn't it? Kenn On Wed, Jan 30, 2019 at 10:16 AM Jeff Klukas wrote: > Reuven - Is TextIO.read().from() a more complex case than the topic Ismaël > is bringing up in this

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Jeff Klukas
Reuven - Is TextIO.read().from() a more complex case than the topic Ismaël is bringing up in this thread? I'm surprised to hear that the two examples have different performance characteristics. Reading through the implementation, I guess the fundamental difference is whether a given configuration

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Reuven Lax
Jeff, what you did here is not simply a refactoring. These two are quite different, and will likely have different performance characteristics. The first evaluates the wildcard, and allows the runner to pick appropriate bundling. Bundles might contain multiple files (if they are small), and the

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Jeff Klukas
I would prefer we move towards option [2]. I just tried the following refactor in my own code from: return input .apply(TextIO.read().from(fileSpec)); to: return input .apply(FileIO.match().filepattern(fileSpec)) .apply(FileIO.readMatches())

[DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Ismaël Mejía
Hello, A ‘recent’ pattern of use in Beam is to have in file based IOs a `readAll()` implementation that basically matches a `PCollection` of file patterns and reads them, e.g. `TextIO`, `AvroIO`. `ReadAll` is implemented by a expand function that matches files with FileIO and then reads them