Re: Meet up at Strata+Hadoop World in Singapore

2016-11-29 Thread Aljoscha Krettek
Hi, I'll also be there to give a talk (and also at the Beam tutorial). Cheers, Aljoscha On Wed, Nov 30, 2016, 00:51 Dan Halperin wrote: > Hey folks, > > Who will be attending Strata+Hadoop World next week in Singapore? Tyler and > I will be there, giving a Beam tutorial

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jean-Baptiste Onofré
Hi Jesse, yes, I started something there (using JAXB and Jackson). Let me polish and push. Regards JB On 11/29/2016 10:00 PM, Jesse Anderson wrote: I went through the string conversions. Do you have an example of writing out XML/JSON/etc too? On Tue, Nov 29, 2016 at 3:46 PM Jean-Baptiste

Meet up at Strata+Hadoop World in Singapore

2016-11-29 Thread Dan Halperin
Hey folks, Who will be attending Strata+Hadoop World next week in Singapore? Tyler and I will be there, giving a Beam tutorial [0] and some talks [2,3]. I'd love to sync in person with anyone who wants to talk Beam. Please reach out to me directly if you'd like to meet. Thanks! Dan [0]

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
I went through the string conversions. Do you have an example of writing out XML/JSON/etc too? On Tue, Nov 29, 2016 at 3:46 PM Jean-Baptiste Onofré wrote: > Hi Jesse, > > > https://github.com/jbonofre/incubator-beam/tree/DATAFORMAT/sdks/java/extensions/dataformat > > it's

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jean-Baptiste Onofré
Hi Jesse, https://github.com/jbonofre/incubator-beam/tree/DATAFORMAT/sdks/java/extensions/dataformat it's very simple and stupid and of course not complete at all (I have other commits but not merged as they need some polishing), but as I said, it's a base of discussion. Regards JB On

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
@jb Sounds good. Just let us know once you've pushed. On Tue, Nov 29, 2016 at 2:54 PM Jean-Baptiste Onofré wrote: > Good point Eugene. > > Right now, it's a DoFn collection to experiment a bit (a pure > extension). It's pretty stupid ;) > > But, you are right, depending the

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jean-Baptiste Onofré
Good point Eugene. Right now, it's a DoFn collection to experiment a bit (a pure extension). It's pretty stupid ;) But, you are right, depending the direction of such extension, it could cover more use cases (even if it's not my first intention ;)). Let me push the branch (pretty small) as

Re: PCollection to PCollection Conversion

2016-11-29 Thread Eugene Kirpichov
Hi JB, Depending on the scope of what you want to ultimately accomplish with this extension, I think it may make sense to write a proposal document and discuss it. If it's just a collection of utility DoFn's for various well-defined source/target format pairs, then that's probably not needed, but

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jean-Baptiste Onofré
By the way Jesse, I gonna push my DATAFORMAT branch on my github and I will post on the dev mailing list when done. Regards JB On 11/29/2016 07:01 PM, Jesse Anderson wrote: I want to bring this thread back up since we've had time to think about it more and make a plan. I think a

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-11-29 Thread Jean-Baptiste Onofré
Hi Pei, rethinking about that, I understand that the purpose of the Beam filesystem is to avoid to bring a bunch of dependencies into the core. That makes perfect sense. So, I agree that a Beam filesystem abstract is fine. My point is that we should provide a HadoopFilesystem

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jean-Baptiste Onofré
Hi Jesse, actually, I started a extension as a PoC: sdks/java/extensions/dataformat The purpose is to have a base for discussion and show "actual" use cases. The dataformat extension contains ready to use converter fn. Right now, I implemented (by directional): 1. JmsRecord / Avro

Re: How to create a Pipeline with Cycles

2016-11-29 Thread Shen LI
Hi Maria, Bobby, Thanks for the explanation. Regards, Shen On Tue, Nov 29, 2016 at 12:37 PM, Bobby Evans wrote: > In my experience almost all of the time cycles are bad and cause a lot of > debugging problems. Most of the time you can implement what you want by >

Re: Question regarding UnboundedReader#getTotalBacklogBytes

2016-11-29 Thread Dan Halperin
Hi Aviem, A great question! The two backlog methods (getSplitBacklogBytes () and getTotalBacklogBytes

Re: How to create a Pipeline with Cycles

2016-11-29 Thread María García Herrero
Hi Shen, No. Beam pipelines are DAGs: http://beam.incubator.apache.org/documentation/sdks/javadoc/0.3.0-incubating/org/apache/beam/sdk/Pipeline.html Best, María On Tue, Nov 29, 2016 at 7:44 AM, Shen LI wrote: > Hi, > > Can I use Beam to create a pipeline with cycles? For

Question regarding UnboundedReader#getTotalBacklogBytes

2016-11-29 Thread Aviem Zur
UnboundedReader exposes method getTotalBacklogBytes() which promises: * Returns the size of the backlog of unread data in the underlying data source represented by split of this source. But there are no implementations of this method in any source, I'm assuming that this is because splits are