[jira] [Commented] (BEAM-1148) Port PAssert away from Aggregators

2016-12-14 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748240#comment-15748240
 ] 

Aljoscha Krettek commented on BEAM-1148:


Just out of curiosity, what are they going to be replaced with?

> Port PAssert away from Aggregators
> --
>
> Key: BEAM-1148
> URL: https://issues.apache.org/jira/browse/BEAM-1148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>
> One step in the removal of Aggregators (in favor of Metrics) is to remove our 
> reliance on them for PAssert checking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1096) flink streaming side output optimization using SplitStream

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1096:
---
Assignee: Alexey Diomin

> flink streaming side output optimization using SplitStream
> --
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> Current implementation:
> 1) send all events to all output streams
> 2) filter the streams for the necessary tags
> Cons: increased CPU usage from serializing all events
> Proposed implementation:
> 1) route each event to the correct stream based on its tag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1095:
---
Assignee: Alexey Diomin

> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> Object-reuse is a dangerous setting and is disabled by default,
> but sometimes we need this option to avoid the performance overhead of
> serializing and deserializing objects on every transformation.
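
For illustration, a minimal sketch of what enabling object reuse looks like at the Flink level, assuming the Beam-side option simply forwards such a flag to Flink's ExecutionConfig (the Beam option name itself is not shown here, since it is specific to this change):

{code}
import org.apache.flink.api.java.ExecutionEnvironment;

public class ObjectReuseSketch {
  public static void main(String[] args) {
    // Object reuse is off by default; enabling it skips defensive copies
    // between chained operators at the cost of stricter user-code rules.
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.getConfig().enableObjectReuse();
  }
}
{code}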



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-1095.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> Object-reuse is a dangerous setting and is disabled by default,
> but sometimes we need this option to avoid the performance overhead of
> serializing and deserializing objects on every transformation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730375#comment-15730375
 ] 

Aljoscha Krettek commented on BEAM-1107:


Yep, you're right, but even in the black text the operation names (MapPartition, 
GroupCombine and so on) are hardcoded in Flink right now, so we cannot change 
that from the Beam-on-Flink side. Changing that would require changes to Flink 
(which I'm not opposed to).

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
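
As a rough illustration only (not the actual translator code): the Flink DataSet API lets a translator attach a name to an operator, and that name is what the Web UI displays. The literal below stands in for what node.getFullName() would supply, e.g. "ReadLines/Read":

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class OperatorNameSketch {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // In the runner this would be env.createInput(...).name(node.getFullName()).
    DataSet<String> lines = env.fromElements("a", "", "b").name("ReadLines/Read");
    lines.print();
  }
}
{code}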



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1102) Flink Batch Runner does not populate aggregator values

2016-12-06 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-1102.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Flink Batch Runner does not populate aggregator values
> --
>
> Key: BEAM-1102
> URL: https://issues.apache.org/jira/browse/BEAM-1102
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.3.0-incubating
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> Running the quickstart gives 0 for emptyLines.
> Running with {{--streaming=true}} gives the correct value (for my input file, 
> the default examples archetype {{pom.xml}}, the true value is 27 at the time 
> of writing).
> Streaming output:
> {code}
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToLateness : 0
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 27
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToClosedWindow : 0
> {code}
> Non-streaming output:
> {code}
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 0
> {code}
> (Note also that the lateness etc. aggregators are missing entirely; this may be 
> expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1103) Add Tests For Aggregators in Flink Runner

2016-12-06 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-1103:
--

 Summary: Add Tests For Aggregators in Flink Runner
 Key: BEAM-1103
 URL: https://issues.apache.org/jira/browse/BEAM-1103
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek


We currently don't have tests that verify that aggregator values are correctly 
forwarded to Flink.

They didn't work correctly in the Batch Flink runner, as seen in BEAM-1102.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1102) Flink Batch Runner does not populate aggregator values

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727742#comment-15727742
 ] 

Aljoscha Krettek commented on BEAM-1102:


The problem is this part in {{FlinkProcessContextBase}}:

{code}
  @Override
  protected <AggInputT, AggOutputT> Aggregator<AggInputT, AggOutputT>
      createAggregatorInternal(String name, Combine.CombineFn<AggInputT, ?, AggOutputT> combiner) {
    SerializableFnAggregatorWrapper<AggInputT, AggOutputT> wrapper =
        new SerializableFnAggregatorWrapper<>(combiner);
    Accumulator<AggInputT, Serializable> existingAccum =
        (Accumulator<AggInputT, Serializable>)
            runtimeContext.getAccumulator(name);
    if (existingAccum != null) {
      return wrapper;
    } else {
      runtimeContext.addAccumulator(name, wrapper);
    }
    return wrapper;
  }
{code}

Notice how the newly created wrapper is returned even when an accumulator with 
that name already exists, instead of the already-registered one, so values added 
through it never reach the registered accumulator.
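
For comparison, here is a minimal self-contained sketch of the intended idempotent lookup-or-register behaviour, with a plain map standing in for Flink's accumulator registry (the names are illustrative, not the actual fix):

{code}
import java.util.HashMap;
import java.util.Map;

public class AccumulatorRegistrySketch {
  private final Map<String, Object> accumulators = new HashMap<>();

  // Return the already-registered accumulator if one exists under this name;
  // only register (and return) the fresh one otherwise.
  @SuppressWarnings("unchecked")
  <T> T getOrRegister(String name, T fresh) {
    Object existing = accumulators.get(name);
    if (existing != null) {
      return (T) existing;
    }
    accumulators.put(name, fresh);
    return fresh;
  }

  public static void main(String[] args) {
    AccumulatorRegistrySketch registry = new AccumulatorRegistrySketch();
    StringBuilder first = registry.getOrRegister("emptyLines", new StringBuilder());
    StringBuilder second = registry.getOrRegister("emptyLines", new StringBuilder());
    System.out.println(first == second); // true: both callers share one accumulator
  }
}
{code}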

> Flink Batch Runner does not populate aggregator values
> --
>
> Key: BEAM-1102
> URL: https://issues.apache.org/jira/browse/BEAM-1102
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.3.0-incubating
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>Priority: Minor
>
> Running the quickstart gives 0 for emptyLines.
> Running with {{--streaming=true}} gives the correct value (for my input file, 
> the default examples archetype {{pom.xml}}, the true value is 27 at the time 
> of writing).
> Streaming output:
> {code}
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToLateness : 0
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 27
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToClosedWindow : 0
> {code}
> Non-streaming output:
> {code}
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 0
> {code}
> (Note also that the lateness etc. aggregators are missing entirely; this may be 
> expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1092:
---
Comment: was deleted

(was: Is 1) necessary if we do 2).

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.)

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies, like Guava, to prevent user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners that add classes to the classpath (e.g. Hadoop) can run into conflicts 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetype POM files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727530#comment-15727530
 ] 

Aljoscha Krettek commented on BEAM-1092:


Is 1) necessary if we do 2)?

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies, like Guava, to prevent user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners that add classes to the classpath (e.g. Hadoop) can run into conflicts 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetype POM files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727529#comment-15727529
 ] 

Aljoscha Krettek commented on BEAM-1092:


Is 1) necessary if we do 2).

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies, like Guava, to prevent user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners that add classes to the classpath (e.g. Hadoop) can run into conflicts 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetype POM files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-506) Fill in the documentation/runners/flink portion of the website

2016-12-02 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-506.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Fill in the documentation/runners/flink portion of the website
> --
>
> Key: BEAM-506
> URL: https://issues.apache.org/jira/browse/BEAM-506
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Aljoscha Krettek
> Fix For: Not applicable
>
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Flink-specific details



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-506) Fill in the documentation/runners/flink portion of the website

2016-11-29 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-506:
-

Assignee: Aljoscha Krettek  (was: James Malone)

> Fill in the documentation/runners/flink portion of the website
> --
>
> Key: BEAM-506
> URL: https://issues.apache.org/jira/browse/BEAM-506
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Aljoscha Krettek
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Flink-specific details



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1007) Runner Toggles in Quickstart.md Don't Work in Safari

2016-11-21 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-1007.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Runner Toggles in Quickstart.md Don't Work in Safari 
> -
>
> Key: BEAM-1007
> URL: https://issues.apache.org/jira/browse/BEAM-1007
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Aljoscha Krettek
>Assignee: Abdullah Bashir
>Priority: Minor
> Fix For: Not applicable
>
> Attachments: Screen Shot 2016-11-19 at 00.00.45.png
>
>
> I'm observing this with macOS Sierra 10.12.1 and Safari 10.0.1 
> (12602.2.14.0.7).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1007) Runner Toggles in Quickstart.md Don't Work in Safari

2016-11-18 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1007:
---
Attachment: Screen Shot 2016-11-19 at 00.00.45.png

> Runner Toggles in Quickstart.md Don't Work in Safari 
> -
>
> Key: BEAM-1007
> URL: https://issues.apache.org/jira/browse/BEAM-1007
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Aljoscha Krettek
>Assignee: Abdullah Bashir
>Priority: Minor
> Attachments: Screen Shot 2016-11-19 at 00.00.45.png
>
>
> I'm observing this with macOS Sierra 10.12.1 and Safari 10.0.1 
> (12602.2.14.0.7).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1007) Runner Toggles in Quickstart.md Don't Work in Safari

2016-11-18 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678046#comment-15678046
 ] 

Aljoscha Krettek commented on BEAM-1007:


[~mabdullah353], I assigned this to you because you initially implemented it and I 
have no idea what might be going on there. Hope that's alright.

> Runner Toggles in Quickstart.md Don't Work in Safari 
> -
>
> Key: BEAM-1007
> URL: https://issues.apache.org/jira/browse/BEAM-1007
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Aljoscha Krettek
>Assignee: Abdullah Bashir
>Priority: Minor
> Attachments: Screen Shot 2016-11-19 at 00.00.45.png
>
>
> I'm observing this with macOS Sierra 10.12.1 and Safari 10.0.1 
> (12602.2.14.0.7).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1007) Runner Toggles in Quickstart.md Don't Work in Safari

2016-11-18 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-1007:
--

 Summary: Runner Toggles in Quickstart.md Don't Work in Safari 
 Key: BEAM-1007
 URL: https://issues.apache.org/jira/browse/BEAM-1007
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Aljoscha Krettek
Assignee: Abdullah Bashir
Priority: Minor


I'm observing this with macOS Sierra 10.12.1 and Safari 10.0.1 (12602.2.14.0.7).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-899) Flink quickstart instructions

2016-11-16 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-899.
-
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Flink quickstart instructions
> -
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Frances Perry
>Assignee: Aljoscha Krettek
> Fix For: 0.4.0-incubating
>
>
> After initial quickstart structure is pushed, add commandlines for Flink 
> execution to quickstart.md and detailed Flink setup instructions to 
> learn/runners/flink.md



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-965) Source Transformations Don't Set Correct Output Type in Flink Streaming Runner

2016-11-14 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-965.
-
Resolution: Fixed

> Source Transformations Don't Set Correct Output Type in Flink Streaming Runner
> --
>
> Key: BEAM-965
> URL: https://issues.apache.org/jira/browse/BEAM-965
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.4.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-899) Flink quickstart instructions

2016-11-12 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659390#comment-15659390
 ] 

Aljoscha Krettek commented on BEAM-899:
---

Sorry for the delay, I'm working on this myself now.

> Flink quickstart instructions
> -
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Frances Perry
>Assignee: Aljoscha Krettek
>
> After initial quickstart structure is pushed, add commandlines for Flink 
> execution to quickstart.md and detailed Flink setup instructions to 
> learn/runners/flink.md



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-965) Source Operations Don't Set Correct Output Type in Flink Streaming Runner

2016-11-11 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-965:
-

 Summary: Source Operations Don't Set Correct Output Type in Flink 
Streaming Runner
 Key: BEAM-965
 URL: https://issues.apache.org/jira/browse/BEAM-965
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 0.4.0-incubating






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-956) Execute ReduceFnRunner Directly in Flink Runner

2016-11-10 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-956:
-

 Summary: Execute ReduceFnRunner Directly in Flink Runner
 Key: BEAM-956
 URL: https://issues.apache.org/jira/browse/BEAM-956
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek


Right now, a {{ReduceFnRunner}} is executed via 
{{GroupAlsoByWindowViaWindowSetDoFn}} which in turn is executed via a 
{{DoFnRunner}}. We should change that to get rid of the dependence on 
{{GroupAlsoByWindowViaWindowSetDoFn}} which is an {{OldDoFn}} and also to get 
rid of some unneeded layering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-931) Findbugs doesn't pass in Flink runner

2016-11-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-931.
-
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Findbugs doesn't pass in Flink runner
> -
>
> Key: BEAM-931
> URL: https://issues.apache.org/jira/browse/BEAM-931
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Fix For: 0.4.0-incubating
>
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ 
> beam-runners-flink_2.10 ---
> [INFO] BugInstance size is 9
> [INFO] Error size is 0
> [INFO] Total bugs: 9
> [INFO] Class org.apache.beam.runners.flink.FlinkRunner$StreamingViewAsMap 
> defines non-transient non-serializable instance field runner 
> [org.apache.beam.runners.flink.FlinkRunner$StreamingViewAsMap] In 
> FlinkRunner.java
> [INFO] Class 
> org.apache.beam.runners.flink.FlinkRunner$StreamingViewAsMultimap defines 
> non-transient non-serializable instance field runner 
> [org.apache.beam.runners.flink.FlinkRunner$StreamingViewAsMultimap] In 
> FlinkRunner.java
> [INFO] Return value of org.apache.beam.sdk.io.Read$Bounded.getSource() 
> ignored, but method has no side effect 
> [org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$BoundedReadSourceTranslator]
>  At FlinkStreamingTransformTranslators.java:[line 282]
> [INFO] Return value of org.apache.beam.sdk.io.Read$Unbounded.getSource() 
> ignored, but method has no side effect 
> [org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator]
>  At FlinkStreamingTransformTranslators.java:[line 252]
> [INFO] 
> org.apache.beam.runners.flink.translation.wrappers.SerializableFnAggregatorWrapper.clone()
>  does not call super.clone() 
> [org.apache.beam.runners.flink.translation.wrappers.SerializableFnAggregatorWrapper]
>  At SerializableFnAggregatorWrapper.java:[lines 84-89]
> [INFO] Class 
> org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator
>  defines non-transient non-serializable instance field stateInternals 
> [org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator]
>  In WindowDoFnOperator.java
> [INFO] Class 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSocketSource$UnboundedSocketReader
>  defines non-transient non-serializable instance field reader 
> [org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSocketSource$UnboundedSocketReader]
>  In UnboundedSocketSource.java
> [INFO] Class 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSocketSource$UnboundedSocketReader
>  defines non-transient non-serializable instance field socket 
> [org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSocketSource$UnboundedSocketReader]
>  In UnboundedSocketSource.java
> [INFO] Unconditional wait in 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(SourceFunction$SourceContext)
>  
> [org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper]
>  At UnboundedSourceWrapper.java:[line 243]
> [INFO]   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-806) Maven Release Plugin Does Not Set Archetype Versions

2016-10-31 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-806:
--
Fix Version/s: (was: 0.3.0-incubating)
   0.4.0-incubating

> Maven Release Plugin Does Not Set Archetype Versions
> 
>
> Key: BEAM-806
> URL: https://issues.apache.org/jira/browse/BEAM-806
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating, 0.3.0-incubating
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>
> When running {{mvn release:prepare}} as described in the new release guide 
> this does not update the version of the poms in the archetypes. To be clear, 
> the version of the archetype pom is updated, the pom in 
> {{archetype-resources}} (the pom of the project generated by the archetype) 
> is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-864) Update to latest Apache Maven-Parent

2016-10-30 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-864:
-

 Summary: Update to latest Apache Maven-Parent
 Key: BEAM-864
 URL: https://issues.apache.org/jira/browse/BEAM-864
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Aljoscha Krettek
Assignee: Jean-Baptiste Onofré


The release plugin creates a DEPENDENCIES file that is not properly excluded 
from the rat check with the current version of the Apache Maven-Parent that we 
are using.

This is the relevant RAT Jira issue: 
https://issues.apache.org/jira/browse/RAT-184



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-862) Flink PostCommit Fails On Jenkins

2016-10-28 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616689#comment-15616689
 ] 

Aljoscha Krettek commented on BEAM-862:
---

No problemo!

> Flink PostCommit Fails On Jenkins
> -
>
> Key: BEAM-862
> URL: https://issues.apache.org/jira/browse/BEAM-862
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.4.0-incubating
>
>
> CC: [~kenn]
> Due to a combination of aggregator creation in the Flink `DoFnOperator` not 
> being idempotent and this recent change 
> https://github.com/apache/incubator-beam/commit/c2e751f49d72968f2478931cdb884fd4af173610
>  the tests are failing.
> I have a fix already and only now did I realise that you already have fixed 
> this for the Flink Batch runner here: 
> https://github.com/apache/incubator-beam/commit/2089c5cd2662a2eeea39ac7ebd1bfd8bcdc1aa16.
> I need to make the Jenkins emails more prominent in my inbox ... ;-)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-862) Flink PostCommit Fails On Jenkins

2016-10-28 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-862.
-
Resolution: Fixed

> Flink PostCommit Fails On Jenkins
> -
>
> Key: BEAM-862
> URL: https://issues.apache.org/jira/browse/BEAM-862
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.4.0-incubating
>
>
> CC: [~kenn]
> Due to a combination of aggregator creation in the Flink `DoFnOperator` not 
> being idempotent and this recent change 
> https://github.com/apache/incubator-beam/commit/c2e751f49d72968f2478931cdb884fd4af173610
>  the tests are failing.
> I have a fix already and only now did I realise that you already have fixed 
> this for the Flink Batch runner here: 
> https://github.com/apache/incubator-beam/commit/2089c5cd2662a2eeea39ac7ebd1bfd8bcdc1aa16.
> I need to make the Jenkins emails more prominent in my inbox ... ;-)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-862) Flink PostCommit Fails On Jenkins

2016-10-28 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-862:
-

 Summary: Flink PostCommit Fails On Jenkins
 Key: BEAM-862
 URL: https://issues.apache.org/jira/browse/BEAM-862
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Affects Versions: 0.4.0-incubating
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 0.4.0-incubating


CC: [~kenn]

Due to a combination of aggregator creation in the Flink `DoFnOperator` not 
being idempotent and this recent change 
https://github.com/apache/incubator-beam/commit/c2e751f49d72968f2478931cdb884fd4af173610
 the tests are failing.

I have a fix already and only now did I realise that you already have fixed 
this for the Flink Batch runner here: 
https://github.com/apache/incubator-beam/commit/2089c5cd2662a2eeea39ac7ebd1bfd8bcdc1aa16.

I need to make the Jenkins emails more prominent in my inbox ... ;-)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-843) Use New DoFn Directly in Flink Runner

2016-10-27 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-843:
-

 Summary: Use New DoFn Directly in Flink Runner
 Key: BEAM-843
 URL: https://issues.apache.org/jira/browse/BEAM-843
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek
 Fix For: 0.4.0-incubating






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-696) Side-Inputs non-deterministic with merging main-input windows

2016-10-24 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602937#comment-15602937
 ] 

Aljoscha Krettek commented on BEAM-696:
---

Just a clarification: the Flink runner does not pre-combine when we have 
merging windows AND side inputs. If you just have merging windows, then these 
are incrementally combined (you said pre-combined). If you have side inputs and 
a non-merging window function, then we also incrementally combine (pre-combine).

> Side-Inputs non-deterministic with merging main-input windows
> -
>
> Key: BEAM-696
> URL: https://issues.apache.org/jira/browse/BEAM-696
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ben Chambers
>Assignee: Pei He
>
> Side-Inputs are non-deterministic for several reasons:
> 1. Because they depend on triggering of the side-input (this is acceptable 
> because triggers are by their nature non-deterministic).
> 2. They depend on the current state of the main-input window in order to 
> lookup the side-input. This means that with merging
> 3. Any runner optimizations that affect when the side-input is looked up may 
> cause problems with either or both of these.
> This issue focuses on #2 -- the non-determinism of side-inputs that execute 
> within a Merging WindowFn.
> Possible solution would be to defer running anything that looks up the 
> side-input until we need to extract an output, and using the main-window at 
> that point. Specifically, if the main-window is a MergingWindowFn, don't 
> execute any kind of pre-combine, instead buffer all the inputs and combine 
> later.
> This could still run into some non-determinism if there are triggers 
> controlling when we extract output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-806) Maven Release Plugin Does Not Set Archetype Versions

2016-10-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-806:
--
Priority: Blocker  (was: Major)

> Maven Release Plugin Does Not Set Archetype Versions
> 
>
> Key: BEAM-806
> URL: https://issues.apache.org/jira/browse/BEAM-806
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating, 0.3.0-incubating
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 0.3.0-incubating
>
>
> When running {{mvn release:prepare}} as described in the new release guide 
> this does not update the version of the poms in the archetypes. To be clear, 
> the version of the archetype pom is updated, the pom in 
> {{archetype-resources}} (the pom of the project generated by the archetype) 
> is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-806) Maven Release Plugin Does Not Set Archetype Versions

2016-10-24 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-806:
-

 Summary: Maven Release Plugin Does Not Set Archetype Versions
 Key: BEAM-806
 URL: https://issues.apache.org/jira/browse/BEAM-806
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Affects Versions: 0.2.0-incubating, 0.1.0-incubating, 0.3.0-incubating
Reporter: Aljoscha Krettek
 Fix For: 0.3.0-incubating


When running {{mvn release:prepare}} as described in the new release guide this 
does not update the version of the poms in the archetypes. To be clear, the 
version of the archetype pom is updated, the pom in {{archetype-resources}} 
(the pom of the project generated by the archetype) is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-633) Be able to import Beam codebase in Eclipse and support m2e

2016-10-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-633.
-
   Resolution: Duplicate
Fix Version/s: (was: 0.3.0-incubating)
   Not applicable

> Be able to import Beam codebase in Eclipse and support m2e
> --
>
> Key: BEAM-633
> URL: https://issues.apache.org/jira/browse/BEAM-633
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (BEAM-633) Be able to import Beam codebase in Eclipse and support m2e

2016-10-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reopened BEAM-633:
---

Reopening to close as non-resolved so that it doesn't show up in the release 
notes.

> Be able to import Beam codebase in Eclipse and support m2e
> --
>
> Key: BEAM-633
> URL: https://issues.apache.org/jira/browse/BEAM-633
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.3.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-102) Support Side Inputs in Flink Streaming Runner

2016-10-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-102:
--
Summary: Support Side Inputs in Flink Streaming Runner  (was: Side Inputs 
for Streaming)

> Support Side Inputs in Flink Streaming Runner
> -
>
> Key: BEAM-102
> URL: https://issues.apache.org/jira/browse/BEAM-102
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
>
> The Flink Runner supports side inputs for batch mode but its missing support 
> for streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-696) Side-Inputs non-deterministic with merging main-input windows

2016-10-12 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569001#comment-15569001
 ] 

Aljoscha Krettek commented on BEAM-696:
---

Flink doesn't use bundles; it just keeps all the elements in fault-tolerant 
state until a trigger fires. So mostly yes to your question. :-)

> Side-Inputs non-deterministic with merging main-input windows
> -
>
> Key: BEAM-696
> URL: https://issues.apache.org/jira/browse/BEAM-696
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ben Chambers
>Assignee: Pei He
>
> Side-Inputs are non-deterministic for several reasons:
> 1. Because they depend on triggering of the side-input (this is acceptable 
> because triggers are by their nature non-deterministic).
> 2. They depend on the current state of the main-input window in order to 
> lookup the side-input. This means that with merging
> 3. Any runner optimizations that affect when the side-input is looked up may 
> cause problems with either or both of these.
> This issue focuses on #2 -- the non-determinism of side-inputs that execute 
> within a Merging WindowFn.
> Possible solution would be to defer running anything that looks up the 
> side-input until we need to extract an output, and using the main-window at 
> that point. Specifically, if the main-window is a MergingWindowFn, don't 
> execute any kind of pre-combine, instead buffer all the inputs and combine 
> later.
> This could still run into some non-determinism if there are triggers 
> controlling when we extract output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-696) Side-Inputs non-deterministic with merging main-input windows

2016-10-11 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565001#comment-15565001
 ] 

Aljoscha Krettek commented on BEAM-696:
---

Just to make it clear what the Flink runner does for this: when a GroupByKey 
has a merging {{WindowFn}} and side inputs, we don't do any eager 
processing/merging of the inputs but defer it until "as late as possible". This is 
essentially what [~pei...@gmail.com] mentioned for the Dataflow runner above.

> Side-Inputs non-deterministic with merging main-input windows
> -
>
> Key: BEAM-696
> URL: https://issues.apache.org/jira/browse/BEAM-696
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Ben Chambers
>Assignee: Pei He
>
> Side-Inputs are non-deterministic for several reasons:
> 1. Because they depend on triggering of the side-input (this is acceptable 
> because triggers are by their nature non-deterministic).
> 2. They depend on the current state of the main-input window in order to 
> lookup the side-input. This means that with merging
> 3. Any runner optimizations that affect when the side-input is looked up may 
> cause problems with either or both of these.
> This issue focuses on #2 -- the non-determinism of side-inputs that execute 
> within a Merging WindowFn.
> Possible solution would be to defer running anything that looks up the 
> side-input until we need to extract an output, and using the main-window at 
> that point. Specifically, if the main-window is a MergingWindowFn, don't 
> execute any kind of pre-combine, instead buffer all the inputs and combine 
> later.
> This could still run into some non-determinism if there are triggers 
> controlling when we extract output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-720) Running WindowedWordCount Integration Test in Flink

2016-10-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-720:
--
Component/s: runner-flink

> Running WindowedWordCount Integration Test in Flink
> ---
>
> Key: BEAM-720
> URL: https://issues.apache.org/jira/browse/BEAM-720
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Mark Liu
>Assignee: Aljoscha Krettek
>
> In order to have coverage of streaming pipeline tests in pre-commit, it's 
> important for TestFlinkRunner to be able to run WindowedWordCountIT 
> successfully.
> Relevant works in TestDataflowRunner:
> https://github.com/apache/incubator-beam/pull/1045



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-615) Add Support for Processing-Time Timers in FlinkRunner Window Operator

2016-09-28 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-615:
-

Assignee: Aljoscha Krettek  (was: Kenneth Knowles)

> Add Support for Processing-Time Timers in FlinkRunner Window Operator
> -
>
> Key: BEAM-615
> URL: https://issues.apache.org/jira/browse/BEAM-615
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating, 0.3.0-incubating
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The Flink runner never had support for processing-time timers; they are 
> silently ignored when a trigger tries to set one. This should be easy to add 
> in {{WindowFnOperator}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-644) Primitive to shift the watermark while assigning timestamps

2016-09-26 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522966#comment-15522966
 ] 

Aljoscha Krettek commented on BEAM-644:
---

[~kenn] I was referring to "two clusters of elements from two separate input 
elements", but that's somewhat beside the point, because I was thinking about 
how a Kafka source would be implemented as a combination of {{DoFn}} plus 
{{SplittableDoFn}}. There you need to manage the watermark at the 
{{SplittableDoFn}}, which would be responsible for reading from topics.

I think we might be talking about different things here. As I said, the 
proposed changes are very good in how they simplify the API of {{DoFn}} and 
also clean up stuff around allowed time skew.

What I was thinking about is, in general, a problem with watermarks. I thought 
that the proposal here was meant to fix that, but I don't think we can. What I 
was trying to get at essentially boils down to this: if we want our watermark 
to be 100% correct then we can never advance it, because we never know what 
timestamps future elements will have. (For the general case, where data 
with any timestamp can arrive at any point in (processing) time.) I was just 
pondering that and I'm afraid it derailed the discussion a bit.

> Primitive to shift the watermark while assigning timestamps
> ---
>
> Key: BEAM-644
> URL: https://issues.apache.org/jira/browse/BEAM-644
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> There is a general need, especially important in the presence of 
> SplittableDoFn, to be able to assign new timestamps to elements without 
> making them late or droppable.
>  - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one 
> to produce late data, but does not allow one to shift the watermark so the 
> new data is on-time.
>  - For a SplittableDoFn, one may receive an element such as the name of a log 
> file that contains elements for the day preceding the log file. The timestamp 
> on the filename must currently be the beginning of the log. If such elements 
> are constantly flowing, it may be OK, but since we don't know that element is 
> coming, in that absence of data, the watermark may advance. We need a way to 
> keep it far enough back even in the absence of data holding it back.
> One idea is a new primitive ShiftWatermark / AdjustTimestamps with the 
> following pieces:
>  - A constant duration (positive or negative) D by which to shift the 
> watermark.
>  - A function from TimestampedElement to new timestamp that is >= t + D
> So, for example, AdjustTimestamps(<-60 minutes>, f) would allow f to make 
> timestamps up to 60 minutes earlier.
> With this primitive added, outputWithTimestamp and withAllowedTimestampSkew 
> could be removed, simplifying DoFn.
> Alternatively, all of this functionality could be bolted on to DoFn.
> This ticket is not a proposal, but a record of the issue and ideas that were 
> mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-644) Primitive to shift the watermark while assigning timestamps

2016-09-23 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515875#comment-15515875
 ] 

Aljoscha Krettek commented on BEAM-644:
---

Yes, as a replacement for {{outputWithTimestamp}} and 
{{withAllowedTimestampSkew}} this new proposal is perfect.

[~kenn], I was just thinking about {{SplittableDoFn}} and what happens in the 
absence of data. Say you have some data that you emit from the DoFn that is 
clustered around timestamp {{t}}, then you have no data for a while, and then 
you get data that is clustered around {{t + 100}}. In order for that data to 
not be late the watermark has to be held at {{t + 100}}, but you cannot know 
that until you actually see the newer data. Holding back by some constant {{D}} 
would not help in that case. Or I might be missing something, of course.

> Primitive to shift the watermark while assigning timestamps
> ---
>
> Key: BEAM-644
> URL: https://issues.apache.org/jira/browse/BEAM-644
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> There is a general need, especially important in the presence of 
> SplittableDoFn, to be able to assign new timestamps to elements without 
> making them late or droppable.
>  - DoFn.withAllowedTimestampSkew is inadequate, because it simply allows one 
> to produce late data, but does not allow one to shift the watermark so the 
> new data is on-time.
>  - For a SplittableDoFn, one may receive an element such as the name of a log 
> file that contains elements for the day preceding the log file. The timestamp 
> on the filename must currently be the beginning of the log. If such elements 
> are constantly flowing, it may be OK, but since we don't know that element is 
> coming, in that absence of data, the watermark may advance. We need a way to 
> keep it far enough back even in the absence of data holding it back.
> One idea is a new primitive ShiftWatermark / AdjustTimestamps with the 
> following pieces:
>  - A constant duration (positive or negative) D by which to shift the 
> watermark.
>  - A function from TimestampedElement to new timestamp that is >= t + D
> So, for example, AdjustTimestamps(<-60 minutes>, f) would allow f to make 
> timestamps up to 60 minutes earlier.
> With this primitive added, outputWithTimestamp and withAllowedTimestampSkew 
> could be removed, simplifying DoFn.
> Alternatively, all of this functionality could be bolted on to DoFn.
> This ticket is not a proposal, but a record of the issue and ideas that were 
> mentioned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-638) Add a Window function to create a bounded PCollection from an unbounded one

2016-09-20 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507706#comment-15507706
 ] 

Aljoscha Krettek commented on BEAM-638:
---

I don't think it's possible to provide such a function.

What would the semantics of such a function be? That is, when would it consider 
that window "done"? What would happen to the computation upstream of that 
function/transformation, would it be canceled? Also, which window would be the 
one window that we take, for example from {{FixedWindows}}?

> Add a Window function to create a bounded PCollection from an unbounded one
> ---
>
> Key: BEAM-638
> URL: https://issues.apache.org/jira/browse/BEAM-638
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Jean-Baptiste Onofré
>Assignee: Davor Bonaci
>
> Today, if the pipeline source is unbounded and the sink expects a bounded 
> collection, there's no way to use a single pipeline. Even if a window creates a 
> chunk of the unbounded PCollection, the "sub" PCollection is still 
> unbounded.
> It would be helpful for users to have a Window function that creates a bounded 
> PCollection (on the window) from an unbounded PCollection coming from the 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-260) Know the getSideInputWindow upper bound so can gc side input state

2016-09-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15498458#comment-15498458
 ] 

Aljoscha Krettek commented on BEAM-260:
---

The doc looks good. I'm not sure how many people will see this comment on a 
JIRA issue, so it might make sense to [PROPOSE] it on the mailing list.

> Know the getSideInputWindow upper bound so can gc side input state
> --
>
> Key: BEAM-260
> URL: https://issues.apache.org/jira/browse/BEAM-260
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Mark Shields
>Assignee: Kenneth Knowles
>
> We currently have no static knowledge about the getSideInputWindow function, 
> and runners are thus forced to hold on to all side input state / elements in 
> case a future element reaches back into an earlier side input element.
> Maybe we need an upper bound on the lag from current to the result of 
> getSideInputWindow so we can have a progressing GC horizon as we do for GBK 
> window state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-616) Update Flink Runner to 1.1.2

2016-09-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-616.
---
   Resolution: Fixed
Fix Version/s: 0.3.0-incubating

> Update Flink Runner to 1.1.2
> 
>
> Key: BEAM-616
> URL: https://issues.apache.org/jira/browse/BEAM-616
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-616) Update Flink Runner to 1.1.2

2016-09-06 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-616:
-

 Summary: Update Flink Runner to 1.1.2
 Key: BEAM-616
 URL: https://issues.apache.org/jira/browse/BEAM-616
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-615) Add Support for Processing-Time Timers in FlinkRunner Window Operator

2016-09-02 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-615:
-

 Summary: Add Support for Processing-Time Timers in FlinkRunner 
Window Operator
 Key: BEAM-615
 URL: https://issues.apache.org/jira/browse/BEAM-615
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Affects Versions: 0.2.0-incubating, 0.1.0-incubating, 0.3.0-incubating
Reporter: Aljoscha Krettek


The Flink runner never had support for processing-time timers; they are 
silently ignored when a trigger tries to set one. This should be easy to add in 
{{WindowFnOperator}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-485) Can't set Flink runner in code

2016-08-25 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438508#comment-15438508
 ] 

Aljoscha Krettek commented on BEAM-485:
---

Hi [~ecesena], could you check whether this bug is still valid in the current 
master version?

> Can't set Flink runner in code
> --
>
> Key: BEAM-485
> URL: https://issues.apache.org/jira/browse/BEAM-485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Emanuele Cesena
>
> Calling:
> options.setRunner(FlinkRunner.class);
> doesn't seem to properly set the runner.
> Running --runner=FlinkRunner from the command line works.
> Both approaches were working on 0.1.0, but options.setRunner doesn't seem to 
> work on master anymore.
> I noticed there are tests that only cover the command line case:
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/FlinkRunnerRegistrarTest.java
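
For illustration, a minimal sketch of the two ways of selecting the runner that the 
report contrasts; the setup code is an assumption (based on the renamed FlinkRunner 
class), not taken from the issue:

{code:java}
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSelection {
  public static void main(String[] args) {
    // Programmatic selection -- the variant reported as broken in this issue:
    PipelineOptions inCode = PipelineOptionsFactory.create();
    inCode.setRunner(FlinkRunner.class);

    // Command-line selection -- the variant reported as working:
    //   --runner=FlinkRunner
    PipelineOptions fromArgs = PipelineOptionsFactory.fromArgs(args).create();

    // A pipeline built from either options object should then pick the Flink runner.
    Pipeline p = Pipeline.create(inCode);
  }
}
{code}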



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-253) Unify Flink Operator Wrappers

2016-08-25 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-253.
---
   Resolution: Fixed
Fix Version/s: 0.3.0-incubating

Implemented here: 
https://github.com/apache/incubator-beam/commit/1de76b7a5169a46ef9f14406e5a6e1284832f7f9

> Unify Flink Operator Wrappers
> -
>
> Key: BEAM-253
> URL: https://issues.apache.org/jira/browse/BEAM-253
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
>
> Right now, we have {{FlinkAbstractParDoWrapper}} with subclasses 
> {{FlinkParDoBoundWrapper}} and {{FlinkParDoBoundMultiWrapper}} as well as 
> {{FlinkGroupAlsoByWindowWrapper}}. They do essentially the same thing, but 
> slightly differently. The first three are implemented as a 
> {{FlatMapFunction}} while the latter is implemented as a {{StreamOperator}} 
> (which is more low-level and gives access to state and timers and such).
> We should unify this into one wrapper. (With possibly a more concise name...)
> In the process of this we should also make sure that we always use a 
> {{DoFnRunner}} via {{DoFnRunners.createDefault}}. This will help reduce bugs 
> such as [BEAM-241].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-315) Flink Runner compares keys unencoded which may produce incorrect results

2016-08-25 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-315.
---
Resolution: Fixed

> Flink Runner compares keys unencoded which may produce incorrect results
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>Reporter: Pawel Szczur
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
> Attachments: CoGroupPipelineStringKey.java, execution.log, 
> execution_split.log, execution_split_sorted.log
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-102) Side Inputs for Streaming

2016-08-25 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-102.
---
   Resolution: Fixed
Fix Version/s: 0.3.0-incubating

Implemented here: 
https://github.com/apache/incubator-beam/commit/dfbdc6c2bbef5e749bfc1800f97d21377f0c713d

> Side Inputs for Streaming
> -
>
> Key: BEAM-102
> URL: https://issues.apache.org/jira/browse/BEAM-102
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
>
> The Flink Runner supports side inputs for batch mode but it's missing support 
> for streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-286) Reorganize flink runner directories

2016-08-25 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436708#comment-15436708
 ] 

Aljoscha Krettek commented on BEAM-286:
---

Now that the PR is in I think the reorganization/cleanup can be performed.

> Reorganize flink runner directories
> ---
>
> Key: BEAM-286
> URL: https://issues.apache.org/jira/browse/BEAM-286
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: Not applicable
>
>
> The flink runner Maven module uses two sub-modules: runner and examples. It's 
> the only one which uses such a layout (compared to the spark, dataflow or 
> inprocess/direct runners).
> I will propose a PR to align the flink runner module with the others, keeping 
> the examples in a sub-directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-581) Support Verifiers in TestFlinkRunner

2016-08-24 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-581:
-

 Summary: Support Verifiers in TestFlinkRunner
 Key: BEAM-581
 URL: https://issues.apache.org/jira/browse/BEAM-581
 Project: Beam
  Issue Type: Improvement
Reporter: Aljoscha Krettek


[~jasonkuster] suggested that we should support verifiers to better support E2E 
tests.

See 
https://github.com/apache/incubator-beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java
 for an example of how they're used and 
https://github.com/apache/incubator-beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java
 for how they are implemented in the TestDataflowRunner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-321) Hash encoded keys in Flink batch mode

2016-08-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402815#comment-15402815
 ] 

Aljoscha Krettek commented on BEAM-321:
---

I think we agreed a while back to only put the "fix version" tag once something 
was implemented, so that we don't have to go through all the issues and remove 
the tag from those that were not actually implemented.

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-322) Compare encoded keys in streaming mode

2016-08-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402355#comment-15402355
 ] 

Aljoscha Krettek commented on BEAM-322:
---

The fix for this is included in this commit of my side-input PR: 
https://github.com/apache/incubator-beam/pull/737/commits/ef9e7c480b55764efd03225109819d5a959c825b#diff-e3189955bcae0a6462c697ba467a303b


> Compare encoded keys in streaming mode
> --
>
> Key: BEAM-322
> URL: https://issues.apache.org/jira/browse/BEAM-322
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.3.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-321) Hash encoded keys in Flink batch mode

2016-08-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402329#comment-15402329
 ] 

Aljoscha Krettek commented on BEAM-321:
---

This was fixed a while ago in master and is part of the 0.2.0-incubating 
release.

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-321) Hash encoded keys in Flink batch mode

2016-08-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-321:
--
Fix Version/s: (was: 0.3.0-incubating)
   0.2.0-incubating

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-321) Hash encoded keys in Flink batch mode

2016-07-31 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-321:
--
Fix Version/s: (was: 0.2.0-incubating)

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-322) Compare encoded keys in streaming mode

2016-07-31 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-322:
--
Fix Version/s: (was: 0.2.0-incubating)

> Compare encoded keys in streaming mode
> --
>
> Key: BEAM-322
> URL: https://issues.apache.org/jira/browse/BEAM-322
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-322) Compare encoded keys in streaming mode

2016-07-18 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-322:
-

Assignee: Aljoscha Krettek

> Compare encoded keys in streaming mode
> --
>
> Key: BEAM-322
> URL: https://issues.apache.org/jira/browse/BEAM-322
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-242) Enable Checkstyle check and Javadoc build for the Flink Runner

2016-07-18 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382327#comment-15382327
 ] 

Aljoscha Krettek commented on BEAM-242:
---

The weekend before last I worked on some stuff that should fix 
BEAM-102, BEAM-253 and BEAM-322. It's all intermingled, which is why it's solving 
all those issues at once.

If you could wait until that is done that would be swell. :-)

> Enable Checkstyle check and Javadoc build for the Flink Runner 
> ---
>
> Key: BEAM-242
> URL: https://issues.apache.org/jira/browse/BEAM-242
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> We don't have a Checkstyle check in place for the Flink Runner. I would like 
> to use the SDK's checkstyle rules.
> We could also think about a unified Checkstyle for all Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-253) Unify Flink Operator Wrappers

2016-07-18 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-253:
-

Assignee: Aljoscha Krettek

> Unify Flink Operator Wrappers
> -
>
> Key: BEAM-253
> URL: https://issues.apache.org/jira/browse/BEAM-253
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Right now, we have {{FlinkAbstractParDoWrapper}} with subclasses 
> {{FlinkParDoBoundWrapper}} and {{FlinkParDoBoundMultiWrapper}} as well as 
> {{FlinkGroupAlsoByWindowWrapper}}. They do essentially the same thing, but 
> slightly differently. The first three are implemented as a 
> {{FlatMapFunction}} while the latter is implemented as a {{StreamOperator}} 
> (which is more low-level and gives access to state and timers and such).
> We should unify this into one wrapper. (With possibly a more concise name...)
> In the process of this we should also make sure that we always use a 
> {{DoFnRunner}} via {{DoFnRunners.createDefault}}. This will help reduce bugs 
> such as [BEAM-241].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-438) Rename one of PTransform.apply and PInput.apply

2016-07-12 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372575#comment-15372575
 ] 

Aljoscha Krettek commented on BEAM-438:
---

+1 I've seen people do the former several times.

> Rename one of PTransform.apply and PInput.apply
> ---
>
> Key: BEAM-438
> URL: https://issues.apache.org/jira/browse/BEAM-438
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>
> Before releasing Beam 1.0, we should do this.
> Right now, it's legal to call:
> {{ptransform.apply(input)}}
> and 
> {{input.apply(ptransform)}}
> when only the latter is correct. The former skips various validation steps 
> and loses the notion of composite transforms.
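
For illustration, a hedged sketch of the two call styles; the sample transform and 
data are assumptions, not taken from the issue, and the pipeline is only constructed, 
not run:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;

public class ApplyConfusion {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<String> words = p.apply(Create.of("a", "b", "a"));
    PTransform<PCollection<String>, PCollection<Long>> count = Count.<String>globally();

    // Correct: dispatching through the input runs validation and records the
    // composite-transform structure.
    PCollection<Long> good = words.apply(count);

    // Also compiles, but skips validation and loses the composite structure --
    // the call style the issue wants to rename away.
    PCollection<Long> notRecommended = count.apply(words);
  }
}
{code}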



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-300) Upgrade to Flink 1.0.3

2016-07-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-300.
-
   Resolution: Duplicate
Fix Version/s: 0.2.0-incubating

Flink was updated to 1.0.3 a while back.

> Upgrade to Flink 1.0.3
> --
>
> Key: BEAM-300
> URL: https://issues.apache.org/jira/browse/BEAM-300
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Jean-Baptiste Onofré
> Fix For: 0.2.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-392) Update flink runner to use flink version 1.0.3

2016-07-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-392.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> Update flink runner to use flink version 1.0.3
> --
>
> Key: BEAM-392
> URL: https://issues.apache.org/jira/browse/BEAM-392
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Priority: Trivial
> Fix For: 0.2.0-incubating
>
>
> Flink has released a new minor stable version not included in Beam yet. Some 
> of the issues solved in Flink could benefit the users of the Beam runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-321) Hash encoded keys in Flink batch mode

2016-06-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-321.
---
   Resolution: Fixed
Fix Version/s: (was: 0.1.0-incubating)
   0.2.0-incubating

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.2.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-297) version typo at README.md of flink runner

2016-06-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-297.
---
   Resolution: Fixed
Fix Version/s: (was: 0.1.0-incubating)
   0.2.0-incubating

> version typo at README.md of flink runner
> -
>
> Key: BEAM-297
> URL: https://issues.apache.org/jira/browse/BEAM-297
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Jianfeng Qian
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.2.0-incubating
>
>
> version typo at README.md of flink runner
> at line 145, the version should be "0.1.0-incubating-SNAPSHOT" instead of 
> "0.4-SNAPSHOT"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-315) Flink Runner compares keys unencoded which may produce incorrect results

2016-06-02 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312560#comment-15312560
 ] 

Aljoscha Krettek commented on BEAM-315:
---

Thanks for the updates! I'll keep investigating.

> Flink Runner compares keys unencoded which may produce incorrect results
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
>Assignee: Aljoscha Krettek
> Attachments: CoGroupPipelineStringKey.java, execution.log, 
> execution_split.log, execution_split_sorted.log
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-322) Compare encoded keys in streaming mode

2016-06-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-322:
--
Description: Right now, hashing of keys happens on the value itself not on 
the encoded representation. This is at odds with the Beam specification and can 
lead to incorrect results.

> Compare encoded keys in streaming mode
> --
>
> Key: BEAM-322
> URL: https://issues.apache.org/jira/browse/BEAM-322
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
> Fix For: 0.1.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-321) Hash encoded keys in Flink batch mode

2016-06-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-321:
--
Description: Right now, hashing of keys happens on the value itself not on 
the encoded representation. This is at odds with the Beam specification and can 
lead to incorrect results.

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.1.0-incubating
>
>
> Right now, hashing of keys happens on the value itself not on the encoded 
> representation. This is at odds with the Beam specification and can lead to 
> incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-321) Hash encoded keys in Flink batch mode

2016-06-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-321:
--
Summary: Hash encoded keys in Flink batch mode  (was: Compare encoded keys 
in batch mode)

> Hash encoded keys in Flink batch mode
> -
>
> Key: BEAM-321
> URL: https://issues.apache.org/jira/browse/BEAM-321
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.1.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-315) Flink Runner compares keys unencoded which may produce incorrect results

2016-06-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310277#comment-15310277
 ] 

Aljoscha Krettek commented on BEAM-315:
---

I think we should split this issue into two issues. Leave this one as an issue 
for batch; I already have a fix for this. Then create a new issue for the 
streaming side, which is a bit more complicated. This way, we can correctly 
track progress.

What do you say [~mxm]?

> Flink Runner compares keys unencoded which may produce incorrect results
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
>Assignee: Aljoscha Krettek
> Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-295) Flink Create Functions call Collector.close()

2016-06-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-295.
---
   Resolution: Fixed
Fix Version/s: 0.1.0-incubating

> Flink Create Functions call Collector.close()
> -
>
> Key: BEAM-295
> URL: https://issues.apache.org/jira/browse/BEAM-295
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.1.0-incubating
>
>
> {{Collector.close()}} should only be called internally, by Flink. Calling 
> close() in the user function, as we do in {{FlinkCreateFunction}} and 
> {{FlinkStreamingCreateFunction}}, will lead to downstream operations being 
> closed twice, which can lead to faulty behavior.
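
For illustration, a minimal hedged sketch of the fixed pattern in a plain Flink 
function (the class name is illustrative, not the runner's actual wrapper):

{code:java}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Emit elements through the Collector but never close it: closing is Flink's job,
// and closing it from user code closes downstream operators a second time.
public class EmitWithoutClose implements FlatMapFunction<String, String> {
  @Override
  public void flatMap(String value, Collector<String> out) {
    out.collect(value);
    // No out.close() here.
  }
}
{code}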



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly in batch mode of FlinkPipelineRunner

2016-06-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309976#comment-15309976
 ] 

Aljoscha Krettek commented on BEAM-315:
---

I played around this morning and finally found the problem thanks to 
[~frances]'s remark.

The problem is that the Flink {{KvCoderComparator}} (this is used to do 
comparisons/hashing for shuffle and sorting) uses the hash of the key instead of 
the hash of the encoded key while using the encoded key for doing comparisons. 
I already fixed it, but I want to include a test for this as well. 
[~pawelszc...@gmail.com], can I include a modified version of your program as a 
test?
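
To make the contrast concrete, a small hedged sketch of hashing the encoded key 
bytes; the coder choice and helper class are assumptions, and {{KvCoderComparator}} 
itself is runner-internal:

{code:java}
import java.util.Arrays;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.util.CoderUtils;

public class EncodedKeyHash {
  public static void main(String[] args) throws Exception {
    String key = "user-42";

    // Problematic: the object's own hash, which need not agree with the encoded form.
    int objectHash = key.hashCode();

    // Described fix: hash the bytes produced by the key coder, i.e. the same
    // representation that is used for comparisons during shuffle and sort.
    byte[] encodedKey = CoderUtils.encodeToByteArray(StringUtf8Coder.of(), key);
    int encodedHash = Arrays.hashCode(encodedKey);

    System.out.println(objectHash + " vs " + encodedHash);
  }
}
{code}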

> GroupByKey/CoGroupByKey doesn't group correctly in batch mode of 
> FlinkPipelineRunner
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
> Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly in batch mode of FlinkPipelineRunner

2016-06-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-315:
-

Assignee: Aljoscha Krettek

> GroupByKey/CoGroupByKey doesn't group correctly in batch mode of 
> FlinkPipelineRunner
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
>Assignee: Aljoscha Krettek
> Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner

2016-05-31 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307827#comment-15307827
 ] 

Aljoscha Krettek commented on BEAM-315:
---

I attached a version that uses a {{String}} as key. With this, the results are 
also wrong but "less wrong" than with the {{Key}} class. I think the problem 
with having {{Key}} as a key is that {{AvroCoder.consistentWithEquals()}} is 
{{false}} and the Flink runner uses the serialized bytes to do comparisons. Not 
sure how the Dataflow runner deals with this, though. Also, once data is 
sufficiently large for the bug to appear, the pipeline cannot be executed on 
the {{DirectPipelineRunner}} or the {{InProcessPipelineRunner}} because both 
fail with an OOM exception.
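
For context, a small hedged sketch of the property in question; {{MyKey}} is a 
placeholder standing in for the reporter's Avro-coded {{Key}} class:

{code:java}
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;

public class ConsistentWithEqualsCheck {
  // Placeholder POJO standing in for the Avro-coded key class from the report.
  public static class MyKey {
    public String id;
  }

  public static void main(String[] args) {
    // StringUtf8Coder promises that equality of encoded bytes matches Object#equals.
    System.out.println(StringUtf8Coder.of().consistentWithEquals());

    // AvroCoder makes no such promise, which is why comparing serialized bytes
    // can disagree with equals() for such keys.
    System.out.println(AvroCoder.of(MyKey.class).consistentWithEquals());
  }
}
{code}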

> GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
> Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner

2016-05-31 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-315:
--
Attachment: CoGroupPipelineStringKey.java

This is a version of the same program that uses a String key instead of the 
{{Key}} class.

> GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
> 
>
> Key: BEAM-315
> URL: https://issues.apache.org/jira/browse/BEAM-315
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Pawel Szczur
> Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-307) Upgrade/Test to Kafka 0.10

2016-05-25 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300147#comment-15300147
 ] 

Aljoscha Krettek commented on BEAM-307:
---

It might be that we have to create separate packages for Kafka 0.9 and Kafka 
0.10 because they have different features. For example, Kafka 0.10 has support 
for timestamps in elements.

> Upgrade/Test to Kafka 0.10
> --
>
> Key: BEAM-307
> URL: https://issues.apache.org/jira/browse/BEAM-307
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> I'm going to test at least that the KafkaIO works fine with Kafka 0.10 (I'm 
> preparing new complete samples around that).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-286) Reorganize flink runner directories

2016-05-22 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295691#comment-15295691
 ] 

Aljoscha Krettek commented on BEAM-286:
---

For the Flink runner there is already such a separation. The core runner and 
tests reside in {{flink-runner_2.10}} while the examples are in 
{{flink-runner-examples_2.10}}. The fact that they share a parent POM is not 
visible to anyone using Maven, AFAIK.

We should get rid of the examples in {{flink-runner-examples}} that are only 
copies of Beam examples (when they were still Dataflow examples). The only file 
left would be {{KafkaIOExamples}}, which I'm hoping can go away once we merge 
the Kafka sink support in KafkaIO.

> Reorganize flink runner directories
> ---
>
> Key: BEAM-286
> URL: https://issues.apache.org/jira/browse/BEAM-286
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.1.0-incubating
>
>
> The flink runner Maven module uses two sub-modules: runner and examples. It's 
> the only one which uses such a layout (compared to the spark, dataflow or 
> inprocess/direct runners).
> I will propose a PR to align the flink runner module with the others, keeping 
> the examples in a sub-directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-258) Execute selected RunnableOnService tests with Flink runner

2016-05-20 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-258.
---
   Resolution: Fixed
Fix Version/s: 0.1.0-incubating

> Execute selected RunnableOnService tests with Flink runner
> --
>
> Key: BEAM-258
> URL: https://issues.apache.org/jira/browse/BEAM-258
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Kenneth Knowles
> Fix For: 0.1.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-270) Support Timestamps/Windows in Flink Batch

2016-05-20 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-270.
---
   Resolution: Fixed
Fix Version/s: 0.1.0-incubating

> Support Timestamps/Windows in Flink Batch
> -
>
> Key: BEAM-270
> URL: https://issues.apache.org/jira/browse/BEAM-270
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.1.0-incubating
>
>
> Right now, Flink Batch execution does not use {{WindowedValue}} internally, 
> which means that all programs that interact with timestamps/windows will not 
> work. We should just internally wrap everything in {{WindowedValue}} as we do 
> in Flink Streaming. This also makes it very straightforward to add support 
> for windows.
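
For illustration, a minimal hedged sketch of the wrapping itself; the values and 
timestamp below are made up:

{code:java}
import org.apache.beam.sdk.util.WindowedValue;
import org.joda.time.Instant;

public class WrapInWindowedValue {
  public static void main(String[] args) {
    // An element without explicit timestamp/window information, placed in the
    // global window -- the shape a batch runner can use as its internal default.
    WindowedValue<String> plain = WindowedValue.valueInGlobalWindow("hello");

    // An element that carries an explicit timestamp, still in the global window.
    WindowedValue<String> timestamped =
        WindowedValue.timestampedValueInGlobalWindow("hello", new Instant(42L));

    System.out.println(plain.getTimestamp() + " / " + timestamped.getTimestamp());
  }
}
{code}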



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-297) version typo at README.md of flink runner

2016-05-19 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290997#comment-15290997
 ] 

Aljoscha Krettek commented on BEAM-297:
---

Now that the issue is created could you please update your commit/pull request 
to include the issue number from here?

> version typo at README.md of flink runner
> -
>
> Key: BEAM-297
> URL: https://issues.apache.org/jira/browse/BEAM-297
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Jianfeng Qian
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.1.0-incubating
>
>
> version typo at README.md of flink runner
> at line 145, the version should be "0.1.0-incubating-SNAPSHOT" instead of 
> "0.4-SNAPSHOT"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-295) Flink Create Functions call Collector.close()

2016-05-18 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-295:
-

 Summary: Flink Create Functions call Collector.close()
 Key: BEAM-295
 URL: https://issues.apache.org/jira/browse/BEAM-295
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


{{Collector.close()}} should only be called internally, by Flink. Calling 
close() in the user function, as we do in {{FlinkCreateFunction}} and 
{{FlinkStreamingCreateFunction}}, will lead to downstream operations being 
closed twice, which can lead to faulty behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-131) Write to jdbc/database

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286673#comment-15286673
 ] 

Aljoscha Krettek commented on BEAM-131:
---

Can this one be closed now that it is superseded by the more general BEAM-244?

> Write to jdbc/database 
> ---
>
> Key: BEAM-131
> URL: https://issues.apache.org/jira/browse/BEAM-131
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-273) Update Flink Runner version

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286676#comment-15286676
 ] 

Aljoscha Krettek commented on BEAM-273:
---

Are there further steps towards updating the Flink version or can this be 
closed?

> Update Flink Runner version
> ---
>
> Key: BEAM-273
> URL: https://issues.apache.org/jira/browse/BEAM-273
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> This is a parent issue to keep track of steps to bump the version of the 
> Flink Runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-48) BigQueryIO.Read reimplemented as BoundedSource

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286670#comment-15286670
 ] 

Aljoscha Krettek commented on BEAM-48:
--

[~dhalp...@google.com] I saw that you merged the PR. Can we now close this and 
BEAM-128 or is there still some part missing?

> BigQueryIO.Read reimplemented as BoundedSource
> --
>
> Key: BEAM-48
> URL: https://issues.apache.org/jira/browse/BEAM-48
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
>
> BigQueryIO.Read is currently implemented in a hacky way: the 
> DirectPipelineRunner streams all rows in the table or query result directly 
> using the JSON API, in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service. (A BigQuery export job to 
> GCS, followed by a parallel read from GCS).
> We need to reimplement BigQueryIO as a BoundedSource in order to support 
> other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO source in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map<String, Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * We could also directly add support for a RowParser to a user's POJO.
> * We should expose TableSchema as a side output from the BigQueryIO.Read.
> * Our builders for BigQueryIO.Read are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying intermediate tables, query export, etc. This would 
> let users directly control result flattening, location of intermediate 
> tables, table decorators, etc., and also optimistically let users take 
> advantage of some new BigQuery features without code changes.
> * We could switch between whether we use a BigQuery export + parallel 
> scan vs API read based on factors such as the size of the table at pipeline 
> construction time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-291) PDone type translation fails

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286608#comment-15286608
 ] 

Aljoscha Krettek commented on BEAM-291:
---

Also, I think {{PDone}} is never to be used as the return type of a {{DoFn}} or 
any operation that produces data. It is just the {{OutputT}} of a 
{{PTransform}} when we want to signify that it is a terminal operation.
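
A minimal sketch of that usage, assuming the pre-rename {{PTransform#apply}} override; 
the transform name is made up:

{code:java}
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

// PDone is the OutputT of a terminal transform; no DoFn ever emits it as an element.
public class LogSink extends PTransform<PCollection<String>, PDone> {
  @Override
  public PDone apply(PCollection<String> input) {
    // A real sink would attach its writing ParDo to `input` here.
    return PDone.in(input.getPipeline());
  }
}
{code}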

> PDone type translation fails
> 
>
> Key: BEAM-291
> URL: https://issues.apache.org/jira/browse/BEAM-291
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> The {{PDone}} output type is currently not supported by the Flink Runner 
> because it doesn't have a Coder associated. This could also get in the way 
> when translating native Beam sinks which would likely return PDone.
> The simplest solution is to create a dummy PDone coder. Alternatively, we 
> could check for the PDone return type during translation and not retrieve the 
> coder at all.
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Unable to return 
> a default Coder for AnonymousParDo.out [PCollection]. Correct one of the 
> following root causes:
>   No Coder has been manually specified;  you may do so using .setCoder().
>   Inferring a Coder from the CoderRegistry failed: Unable to provide a 
> default Coder for org.apache.beam.sdk.values.PDone. Correct one of the 
> following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   Using the default output Coder from the producing PTransform failed: Unable 
> to provide a default Coder for org.apache.beam.sdk.values.PDone. Correct one 
> of the following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   at 
> org.apache.beam.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:196)
>   at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:49)
>   at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:138)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:315)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:305)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:108)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:89)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:225)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:220)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:104)
>   

[jira] [Commented] (BEAM-291) PDone type translation fails

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286597#comment-15286597
 ] 

Aljoscha Krettek commented on BEAM-291:
---

Could you maybe post the code for the example? With my latest PR, the Flink 
Batch runner uses the native Beam sinks without problems.

> PDone type translation fails
> 
>
> Key: BEAM-291
> URL: https://issues.apache.org/jira/browse/BEAM-291
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> The {{PDone}} output type is currently not supported by the Flink Runner 
> because it doesn't have a Coder associated. This could also get in the way 
> when translating native Beam sinks which would likely return PDone.
> The simplest solution is to create a dummy PDone coder. Alternatively, we 
> could check for the PDone return type during translation and not retrieve the 
> coder at all.
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Unable to return 
> a default Coder for AnonymousParDo.out [PCollection]. Correct one of the 
> following root causes:
>   No Coder has been manually specified;  you may do so using .setCoder().
>   Inferring a Coder from the CoderRegistry failed: Unable to provide a 
> default Coder for org.apache.beam.sdk.values.PDone. Correct one of the 
> following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   Using the default output Coder from the producing PTransform failed: Unable 
> to provide a default Coder for org.apache.beam.sdk.values.PDone. Correct one 
> of the following root causes:
>   Building a Coder using a registered CoderFactory failed: Cannot provide 
> coder based on value with class org.apache.beam.sdk.values.PDone: No 
> CoderFactory has been registered for the class.
>   Building a Coder from the @DefaultCoder annotation failed: Class 
> org.apache.beam.sdk.values.PDone does not have a @DefaultCoder annotation.
>   Building a Coder from the fallback CoderProvider failed: Cannot provide 
> coder for type org.apache.beam.sdk.values.PDone: 
> org.apache.beam.sdk.coders.protobuf.ProtoCoder$1@72ef8d15 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide ProtoCoder 
> because org.apache.beam.sdk.values.PDone is not a subclass of 
> com.google.protobuf.Message; 
> org.apache.beam.sdk.coders.SerializableCoder$1@6aa8e115 could not provide a 
> Coder for type org.apache.beam.sdk.values.PDone: Cannot provide 
> SerializableCoder because org.apache.beam.sdk.values.PDone does not implement 
> Serializable.
>   at 
> org.apache.beam.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:196)
>   at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:49)
>   at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:138)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:315)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingTransformTranslators$ParDoBoundStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:305)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:108)
>   at 
> org.apache.beam.runners.flink.translation.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:89)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:225)
>   at 
> org.apache.beam.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:220)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:104)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:292)
> 

[jira] [Commented] (BEAM-286) Reorganize flink runner directories

2016-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286468#comment-15286468
 ] 

Aljoscha Krettek commented on BEAM-286:
---

+1, I think the examples can be removed. AFAIK they are only copies of Dataflow 
(not a mistake) examples that we added when the code was still in a separate 
repository.

> Reorganize flink runner directories
> ---
>
> Key: BEAM-286
> URL: https://issues.apache.org/jira/browse/BEAM-286
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> The flink runner Maven module uses two sub-modules: runner and examples. It's 
> the only one which uses such a layout (compared to the spark, dataflow or 
> inprocess/direct runners).
> I will propose a PR to align the flink runner module with the others, keeping 
> the examples in a sub-directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-283) CheckpointMark.finalize() is not called in Flink Source Wrapper

2016-05-13 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-283:
-

 Summary: CheckpointMark.finalize() is not called in Flink Source 
Wrapper
 Key: BEAM-283
 URL: https://issues.apache.org/jira/browse/BEAM-283
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Aljoscha Krettek






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-236) Implement Windowing in batch execution

2016-05-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-236.
-
Resolution: Duplicate

Superseded by BEAM-270

> Implement Windowing in batch execution
> --
>
> Key: BEAM-236
> URL: https://issues.apache.org/jira/browse/BEAM-236
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>
> Windows need to be handled correctly in the batched execution of the Flink 
> Runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-270) Use WindowedValue in Flink Batch

2016-05-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned BEAM-270:
-

Assignee: Aljoscha Krettek

> Use WindowedValue in Flink Batch
> 
>
> Key: BEAM-270
> URL: https://issues.apache.org/jira/browse/BEAM-270
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Right now, Flink Batch execution does not use {{WindowedValue}} internally, 
> which means that all programs that interact with timestamps/windows will not 
> work. We should just internally wrap everything in {{WindowedValue}} as we do 
> in Flink Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-270) Support Timestamps/Windows in Flink Batch

2016-05-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-270:
--
Summary: Support Timestamps/Windows in Flink Batch  (was: Use WindowedValue 
in Flink Batch)

> Support Timestamps/Windows in Flink Batch
> -
>
> Key: BEAM-270
> URL: https://issues.apache.org/jira/browse/BEAM-270
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Right now, Flink Batch execution does not use {{WindowedValue}} internally, 
> which means that all programs that interact with timestamps/windows will not 
> work. We should just internally wrap everything in {{WindowedValue}} as we do 
> in Flink Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-103.
---
Resolution: Fixed

> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> As of now {{UnboundedSource}}s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.
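
For illustration, a hedged sketch of one way to deal with splits in a parallel source: 
each subtask claims the splits whose index matches its own, modulo the parallelism. 
The names are illustrative, not the wrapper's actual API:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class SplitAssignment {
  // Round-robin assignment of source splits to parallel subtasks.
  public static <T> List<T> splitsForSubtask(List<T> allSplits, int subtaskIndex, int parallelism) {
    List<T> assigned = new ArrayList<>();
    for (int i = 0; i < allSplits.size(); i++) {
      if (i % parallelism == subtaskIndex) {
        assigned.add(allSplits.get(i));
      }
    }
    return assigned;
  }
}
{code}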



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-130) Checkpointing of custom sources and sinks

2016-05-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-130.
---
Resolution: Fixed

Resolved in this (closed) PR: https://github.com/apache/incubator-beam/pull/274

> Checkpointing of custom sources and sinks
> -
>
> Key: BEAM-130
> URL: https://issues.apache.org/jira/browse/BEAM-130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-270) Use WindowedValue in Flink Batch

2016-05-09 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-270:
--
Issue Type: Sub-task  (was: Improvement)
Parent: BEAM-258

> Use WindowedValue in Flink Batch
> 
>
> Key: BEAM-270
> URL: https://issues.apache.org/jira/browse/BEAM-270
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>
> Right now, Flink Batch execution does not use {{WindowedValue}} internally, 
> which means that all programs that interact with timestamps/windows will not 
> work. We should just internally wrap everything in {{WindowedValue}} as we do 
> in Flink Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-260) Know the getSideInputWindow upper bound so can gc side input state

2016-05-07 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275128#comment-15275128
 ] 

Aljoscha Krettek commented on BEAM-260:
---

I thought about this as well while working on the Flink Streaming side-input 
support. Could it be enough to have something like 
{{WindowFn.getSideInputCleanupTime(BoundedWindow)}}, which tells you when you can 
GC a side-input window based on the main-input watermark? This would be called 
on the {{WindowFn}} of the side input, since it knows how the main-input windows 
are mapped to side-input windows.
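
Roughly, the proposed method could look like the sketch below. This API does not 
exist in Beam; it only illustrates the idea of deriving a side-input GC time from 
the main-input watermark.

{code:java}
// Hypothetical API sketch, not part of Beam.
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

public interface SideInputCleanupAware {
  /**
   * Returns the main-input watermark time after which the given side-input
   * window can no longer be requested by any future main-input element and
   * may therefore be garbage collected.
   */
  Instant getSideInputCleanupTime(BoundedWindow sideInputWindow);
}
{code}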

> Know the getSideInputWindow upper bound so can gc side input state
> --
>
> Key: BEAM-260
> URL: https://issues.apache.org/jira/browse/BEAM-260
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Mark Shields
>Assignee: Frances Perry
>
> We currently have no static knowledge about the getSideInputWindow function, 
> and runners are thus forced to hold on to all side-input state/elements in 
> case a future element reaches back into an earlier side-input element.
> Maybe we need an upper bound on the lag from the current main-input window to 
> the result of getSideInputWindow, so that we can have a progressing GC horizon 
> as we do for GBK window state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-242) Enable Checkstyle check for the Flink Runner

2016-05-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274971#comment-15274971
 ] 

Aljoscha Krettek commented on BEAM-242:
---

The PR is merged, but Checkstyle is not yet globally enabled, right? We should 
get this fixed rather quickly since we're moving fast in the Flink code right 
now.

> Enable Checkstyle check for the Flink Runner 
> -
>
> Key: BEAM-242
> URL: https://issues.apache.org/jira/browse/BEAM-242
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> We don't have a Checkstyle check in place for the Flink Runner. I would like 
> to use the SDK's checkstyle rules.
> We could also think about a unified Checkstyle for all Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-216) Create Storm Runner

2016-05-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274961#comment-15274961
 ] 

Aljoscha Krettek commented on BEAM-216:
---

Btw, this is a duplicate of BEAM-9.

> Create Storm Runner 
> 
>
> Key: BEAM-216
> URL: https://issues.apache.org/jira/browse/BEAM-216
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Sriharsha Chintalapani
>Assignee: Sriharsha Chintalapani
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-140) Improve examples provided with the Flink runner

2016-05-03 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268726#comment-15268726
 ] 

Aljoscha Krettek commented on BEAM-140:
---

+1 for unification 

> Improve examples provided with the Flink runner
> ---
>
> Key: BEAM-140
> URL: https://issues.apache.org/jira/browse/BEAM-140
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> The examples should be backed by test cases. Further, they should print 
> meaningful error messages, e.g. on missing input/output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-129) Support pubsub IO

2016-05-03 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-129.
---
Resolution: Invalid

Subsumed by BEAM-53

> Support pubsub IO
> -
>
> Key: BEAM-129
> URL: https://issues.apache.org/jira/browse/BEAM-129
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
>
> Support pubsub IO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

