[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352612#comment-16352612
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


It's not a problem in Beam codebase itself, so not a blocker for the release. 
More a website issue to add a note on the runners (spark, flink, others as they 
all work the same way).

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
> Attachments: screen1.png, screen2.png
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352590#comment-16352590
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


I created a PR on the user example to fix the issue: 
https://github.com/pelletier/beam-flink-example/pull/1

I also took screenshots that it runs fine on Flink cluster.

I'm adding a note in Beam documentation as some other users might have the same 
issue.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: screen1.png, screen2.png
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352587#comment-16352587
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


As we have:

{code}

org.apache.maven.plugins
maven-shade-plugin
${maven-shade-plugin.version}


false


*:*

META-INF/*.SF
META-INF/*.DSA
META-INF/*.RSA






package

shade



true
shaded







{code}

for a Maven build, the same has to be applied to Gradle.

I'm adding a note in the documentation.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352564#comment-16352564
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


I cloned the test repo: https://github.com/pelletier/beam-flink-example

Clearly the problem is not in the artifacts, but in the shadowJar. I'm testing 
the fix and will provide a PR to user. If it's what I think, I will close this 
Jira as it's not a Beam issue.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352390#comment-16352390
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


Using 2.3.0-SNAPSHOT artifacts provided by Maven with the following pipeline 
works without problem with the Flink runner:

{code}
public static final void main(String args[]) throws Exception {
PipelineOptions options = 
PipelineOptionsFactory.fromArgs(args).create();
Pipeline pipeline = Pipeline.create(options);

pipeline
.apply(TextIO.read().from("/home/jbonofre/artists.csv"))
.apply(ParDo.of(new DoFn() {
@ProcessElement
public void processElement(ProcessContext processContext) {
String element = processContext.element();
String[] split = element.split(",");
processContext.output(split[1]);
}
}))
.apply(Count.perElement())
.apply(MapElements.via(new SimpleFunction, 
String>() {
public String apply(KV element) {
return "{\"" + element.getKey() + "\": \"" + 
element.getValue() + "\"}";
}
}))
.apply(TextIO.write().to("/home/jbonofre/label.json"));

pipeline.run();
}
{code}

I'm now testing with artifacts built by Gradle.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352050#comment-16352050
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


Hi Ben, thanks for the update. Let me take a look today.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-04 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351945#comment-16351945
 ] 

Ben Sidhom commented on BEAM-3587:
--

I think I've figured it out. I had been using SBT to build my own uberjars and 
had been using custom resource merging.

When I tried using a gradle build modeled after the Java build under examples, 
I ran into the same failure as listed above. PtransformTranslation [uses a 
service 
loader|https://github.com/apache/beam/blob/24804e98d22180c1dc6603c8a437073ec2adde2d/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L98]
 to figure out urn mappings. It turns out that a simple shadowJar task does not 
handle this properly and instead only includes the special Flink override urns. 
Using the service-merging transform fixes this (with gradle):

{{shadowJar \{ mergeServiceFiles()  }}}

 

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-02 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351157#comment-16351157
 ] 

Ben Sidhom commented on BEAM-3587:
--

Just repeated it with literally just a read step. It still seems to run without 
error, but obviously there's no output to inspect:

{{p.apply(TextIO.read().from(options.getInputPath())).run().waitUntilFinish()}}

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-02 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351150#comment-16351150
 ] 

Ben Sidhom commented on BEAM-3587:
--

I built at head and ran a simple program that reads in a text file and writes 
it out to a different file. It seems to work for me.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-02 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350806#comment-16350806
 ] 

Ben Sidhom commented on BEAM-3587:
--

As Ken mentioned, this is strange since we have ValidatesRunner tests that 
should in theory trigger this. Can we get a minimal example pipeline that hits 
this issue?

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3587) User reports TextIO failure in FlinkRunner on master

2018-02-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350324#comment-16350324
 ] 

Jean-Baptiste Onofré commented on BEAM-3587:


Can we get an update on this Jira ? I would like to help. So please don't 
hesitate to ping me.

> User reports TextIO failure in FlinkRunner on master
> 
>
> Key: BEAM-3587
> URL: https://issues.apache.org/jira/browse/BEAM-3587
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Ben Sidhom
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Reported here: 
> [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E]
> "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink 
> cluster, using the latest Beam git revision (ff37337). The job fails to start 
> with the Exception:
>   {{java.lang.UnsupportedOperationException: The transform  is currently not 
> supported.}}
> It does work with Beam 2.2.0 though. All code, logs, and reproduction steps  
> [https://github.com/pelletier/beam-flink-example];
> My initial thoughts: I have a guess that this has to do with switching to 
> running from a portable pipeline representation, and it looks like there's a 
> non-composite transform with an empty URN and it threw a bad error message. 
> We can try to root cause but may also mitigate short-term by removing the 
> round-trip through pipeline proto for now.
> What is curious is that the ValidatesRunner and WordCountIT are working - 
> they only run on a local Flink, yet this seems to be a translation issue that 
> would occur for local or distributed runs.
> We need to certainly run this repro on the RC if we don't totally get to the 
> bottom of it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)