[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392725#comment-14392725
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/189#issuecomment-88910737
  
Fix for the build failure is being tested: 
https://travis-ci.org/rmetzger/flink/builds/56885481


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393060#comment-14393060
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/189


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-03-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354597#comment-14354597
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/189#discussion_r26108160
  
--- Diff: docs/flink_on_tez_guide.md ---
@@ -0,0 +1,293 @@
+---
+title: Running Flink on YARN leveraging Tez
+---
+!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+License); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+--
+
+* This will be replaced by the TOC
+{:toc}
+
+
+a href=#top/a
+
+## Introduction
+
+You can run Flink using Tez as an execution environment. Flink on Tez 
+is currently included in *flink-staging* in alpha. All classes are
+localted in the *org.apache.flink.tez* package.
+
+## Why Flink on Tez
+
+[Apache Tez](tez.apache.org) is a scalable data processing
+platform. Tez provides an API for specifying a directed acyclic
+graph (DAG), and functionality for placing the DAG vertices in YARN
+containers, as well as data shuffling.  In Flink's architecture,
+Tez is at about the same level as Flink's network stack. While Flink's
+network stack focuses heavily on low latency in order to support 
+pipelining, data streaming, and iterative algorithms, Tez
+focuses on scalability and elastic resource usage.
+
+Thus, by replacing Flink's network stack with Tez, users can get 
scalability
+and elastic resource usage in shared clusters while retaining Flink's 
+APIs, optimizer, and runtime algorithms (local sorts, hash tables, etc).
+
+Flink programs can run almost unmodified using Tez as an execution
+environment. Tez supports local execution (e.g., for debugging), and 
+remote execution on YARN.
+
+
+## Local execution
+
+The `LocalTezEnvironment` can be used run programs using the local
+mode provided by Tez. This is for example WordCount using Tez local mode.
+It is identical to a normal Flink WordCount, except that the 
`LocalTezEnvironment` is used.
+To run in local Tez mode, you can simply run a Flink on Tez program
+from your IDE (e.g., right click and run).
+  
+{% highlight java %}
+public class WordCountExample {
+public static void main(String[] args) throws Exception {
+final LocalTezEnvironment env = LocalTezEnvironment.create();
+
+   DataSetString text = env.fromElements(
+Who's there?,
+I think I hear them. Stand, ho! Who's there?);
+
+DataSetTuple2String, Integer wordCounts = text
+.flatMap(new LineSplitter())
+.groupBy(0)
+.sum(1);
+
+wordCounts.print();
+
+env.execute(Word Count Example);
+}
+
+public static class LineSplitter implements FlatMapFunctionString, 
Tuple2String, Integer {
+@Override
+public void flatMap(String line, CollectorTuple2String, 
Integer out) {
+for (String word : line.split( )) {
+out.collect(new Tuple2String, Integer(word, 1));
+}
+}
+}
+}
+{% endhighlight %}
+
+## YARN execution
+
+### Setup
+
+- Install Tez on your Hadoop 2 cluster following the instructions from the
+  [Apache Tez website](http://tez.apache.org/install.html). If you are 
able to run 
+  the examples that ship with Tez, then Tez has been successfully 
installed.
+  
+- Currently, you need to build Flink yourself to obtain Flink on Tez
+  (the reason is a Hadoop version compatibility: Tez releases artifacts
+  on Maven central with a Hadoop 2.6.0 dependency). Build Flink
+  using `mvn -DskipTests clean package -Dhadoop.version=X.X.X 
-Pinclude-tez`.
+  Make sure that the Hadoop version matches the version that Tez uses.
+  Obtain the jar file contained in the Flink distribution under
+  `flink-staging/flink-tez/target/flink-tez-x.y.z-flink-fat-jar.jar` 

[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-03-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354624#comment-14354624
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/189#issuecomment-78025522
  
The documentation is very well written.

I've tried the pull request on a Hortonworks Sandbox with HDP 2.2.

They have Tez 0.5.2.2.2.0.0 installed there. The mvn command to build Tez 
for this environment is:
```
mvn install -DskipTests -Pinclude-tez -Dhadoop.version=2.6.0 
-Dtez.version=0.5.2.2.2.0.0-2041 -Pvendor-repos
```

To change the tez version, I made the following changes to the pom:
```diff
diff --git a/flink-staging/flink-tez/pom.xml 
b/flink-staging/flink-tez/pom.xml
index e84f731..147dd32 100644
--- a/flink-staging/flink-tez/pom.xml
+++ b/flink-staging/flink-tez/pom.xml
@@ -34,6 +34,9 @@ under the License.
 nameflink-tez/name
 
 packagingjar/packaging
+properties
+   tez.version0.6.0/tez.version
+/properties
 
 dependencies
 dependency
@@ -81,25 +84,25 @@ under the License.
 dependency
 groupIdorg.apache.tez/groupId
 artifactIdtez-api/artifactId
-version0.6.0/version
+version${tez.version}/version
 /dependency
 
 dependency
 groupIdorg.apache.tez/groupId
 artifactIdtez-common/artifactId
-version0.6.0/version
+version${tez.version}/version
 /dependency
 
 dependency
 groupIdorg.apache.tez/groupId
 artifactIdtez-dag/artifactId
-version0.6.0/version
+version${tez.version}/version
 /dependency
 
 dependency
 groupIdorg.apache.tez/groupId
 artifactIdtez-runtime-library/artifactId
-version0.6.0/version
+version${tez.version}/version
 /dependency
 
 dependency
```

Once these issues have been resolved, I'd say the change is +1 to merge.


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340225#comment-14340225
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user ktzoumas commented on the pull request:

https://github.com/apache/flink/pull/189#issuecomment-76407035
  
I tested this on a 25-node cluster with WordCount and the modified TPC-H 
query on generated TPC-H data of scale 1000. I also added docs. 


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340266#comment-14340266
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/189#discussion_r25512682
  
--- Diff: 
flink-staging/flink-tez/src/main/java/org/apache/flink/tez/client/RemoteTezEnvironment.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.tez.client;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.Plan;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.compiler.DataStatistics;
+import org.apache.flink.compiler.PactCompiler;
+import org.apache.flink.compiler.costs.DefaultCostEstimator;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.ClassUtil;
+import org.apache.hadoop.util.ToolRunner;
+
+
+public class RemoteTezEnvironment extends ExecutionEnvironment {
+
+   private static final Log LOG = LogFactory.getLog(TezExecutor.class);
+   
+   private PactCompiler compiler;
+   private TezExecutor executor;
+   private Path jarPath = null;
+   
+
+   public void registerMainClass (Class mainClass) {
+   jarPath = new Path(ClassUtil.findContainingJar(mainClass));
+   //this.jarURL = 
mainClass.getProtectionDomain().getCodeSource().getLocation();
--- End diff --

this looks like debugging code


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340267#comment-14340267
 ] 

ASF GitHub Bot commented on FLINK-1219:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/189#discussion_r25512719
  
--- Diff: 
flink-staging/flink-tez/src/main/java/org/apache/flink/tez/client/RemoteTezEnvironment.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.tez.client;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.Plan;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.compiler.DataStatistics;
+import org.apache.flink.compiler.PactCompiler;
+import org.apache.flink.compiler.costs.DefaultCostEstimator;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.ClassUtil;
+import org.apache.hadoop.util.ToolRunner;
+
+
+public class RemoteTezEnvironment extends ExecutionEnvironment {
+
+   private static final Log LOG = LogFactory.getLog(TezExecutor.class);
--- End diff --

The log messages will appear from the wrong class.


 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1219) Add support for Apache Tez as execution engine

2015-02-07 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311019#comment-14311019
 ] 

Fabian Hueske commented on FLINK-1219:
--

Is this a duplicate of FLINK-972?

 Add support for Apache Tez as execution engine
 --

 Key: FLINK-1219
 URL: https://issues.apache.org/jira/browse/FLINK-1219
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Kostas Tzoumas

 This is an umbrella issue to track Apache Tez support.
 The goal is to be able to run unmodified Flink programs as Apache Tez jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)