Re: Flink runner. Optimization for sideOutput with tags

2016-12-06 Thread Aljoscha Krettek
I'm having a look at your PRs now. I think the change is good, and it's actually quite simple too. Thanks for looking into this! On Mon, 5 Dec 2016 at 05:48 Alexey Demin wrote: > Aljoscha > > I mistaken with flink runtime =) > > What do you think about some modification

Test failure on beam-sdks-java-maven-archetypes-examples-java8

2016-12-06 Thread Manu Zhang
Guys, Has anyone seen the following failure on the latest master ? [INFO] java.lang.IllegalStateException: Failed to validate gs://apache-beam-samples/shakespeare/* [INFO] at it.pkg.MinimalWordCountJava8Test.testMinimalWordCountJava8(MinimalWordCountJava8Test.java:63) [INFO] Caused by:

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-06 Thread Amit Sela
I think it is common in batch (and micro-batch for streaming) because at any given time you're computing a "chunk" (pick your naming.. we have lot's of them ;-) ) and slicing-up this chunk to distribute across more cpus if available is clearly better, but I was wondering about "event-at-a-time"

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-06 Thread Raghu Angadi
On Sun, Dec 4, 2016 at 11:48 PM, Amit Sela wrote: > For any downstream computation, is it common for stream processors to > "fan-out/parallelise" the stream by shuffling the data into more > streams/partitions/bundles ? > I think so. It is pretty common in batch processing

Re: HiveIO

2016-12-06 Thread Jean-Baptiste Onofré
Hi, Ismaël and I started HiveIO. I have several IOs ready to propose as PR, but, in order to limit the number of open PRs, I would like to merge the pending ones. I will let you know when the branches/PRs will be available. Regards JB On 12/05/2016 11:40 PM, Vinoth Chandar wrote: Hi guys,

Re: [DISCUSS] ExecIO

2016-12-06 Thread Jean-Baptiste Onofré
Hi Eugene, thanks for the extended questions. I think we have two levels of expectations here: - end-user responsibility - worker/runner responsibility 1/ From a end-user perspective, the end-user has to know that using a system command (via ExecIO) and more generally speaking anything which

Re: HiveIO

2016-12-06 Thread Vinoth Chandar
Great. Thanks! Thanks, Vinoth > On Dec 6, 2016, at 2:06 AM, Jean-Baptiste Onofré wrote: > > Hi, > > Ismaël and I started HiveIO. > > I have several IOs ready to propose as PR, but, in order to limit the number > of open PRs, I would like to merge the pending ones. > > I

Re: HiveIO

2016-12-06 Thread Ismaël Mejía
Hello, If you really need to read/write via Hive, remember that you can use the Hive Jdbc driver, and achieve this with Beam using the JdbcIO (this is probably less efficient for the streaming case but still a valid solution). Ismaël On Tue, Dec 6, 2016 at 12:04 PM, Vinoth Chandar

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Pei He
Thanks Kenn for the feedback and questions. I responded inline. On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles wrote: > I really like this document. It is easy to read and informative. Three > things not addressed by the document: > > 1. Major Beam use cases. I'm sure

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Kenneth Knowles
Thanks for the thorough answers. It all sounds good to me. On Tue, Dec 6, 2016 at 12:57 PM, Pei He wrote: > Thanks Kenn for the feedback and questions. > > I responded inline. > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles > wrote: > > >

Re: [DISCUSS] ExecIO

2016-12-06 Thread Eugene Kirpichov
Ben - the issues of "things aren't hung, there is a shell command running", aren't they general to all DoFn's? i.e. I don't see why the runner would need to know that a shell command is running, but not that, say, a heavy monolithic computation is running. What's the benefit to the runner in