[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-27 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-96828941
  
This PR was split into PR #632 and PR #633.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/571




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-27 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r29129684
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

I do not have an HBase setup here. 
Could you try excluding all dependencies of hbase-server and adding them back 
one by one until it works? I hope TableInputFormat and TableOutputFormat do not 
have too many external dependencies.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-27 Thread fpompermaier
Github user fpompermaier commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r29130488
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Ok, I hope to be able to do it before this evening!




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-23 Thread fpompermaier
Github user fpompermaier commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28966088
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Could you do that?




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-23 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28961417
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Fair enough. Then the dependency should not be in test scope but in the 
default scope, so that users also get it in their fat jar when using the 
HBase output format. It may be worth defining a few exclusions, though, to 
avoid pulling in the complete tail of transitive HBase dependencies (I think 
that even includes JRuby and so on).




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-22 Thread fpompermaier
Github user fpompermaier commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28869610
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Unfortunately, TableInputFormat and TableOutputFormat are in the server 
jar.
For reading, we reimplemented the input format to make it more robust, so that 
jar is not needed; for writing, it is indeed required.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28805740
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Is the HBase server dependency really required for any client that wants to 
write into HBase? This seems like a pretty bad design on the HBase side.

Can you tell us what fails when you omit this dependency?




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fpompermaier
Github user fpompermaier commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28765996
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

It is needed if you want to use the HBase TableOutputFormat.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28765691
  
--- Diff: flink-staging/flink-hbase/src/test/resources/hbase-site.xml ---
@@ -22,14 +22,13 @@
 -->
 <configuration>
 
+<!-- 
--- End diff --

Are these mandatory parameters to use HBase?
Otherwise, we should remove them.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28765745
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

Why did you add this dependency? 
There are no additional tests that would require it, right?




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28766220
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

But why is it in test scope then?




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-94735401
  
Looks good, except for two minor things. 
Once these are resolved, I would merge it and also backport it to the 0.8 
branch.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28766303
  
--- Diff: flink-staging/flink-hbase/src/test/resources/hbase-site.xml ---
@@ -22,14 +22,13 @@
 -->
 <configuration>
 
+<!-- 
--- End diff --

OK, let's remove them.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fpompermaier
Github user fpompermaier commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28766073
  
--- Diff: flink-staging/flink-hbase/src/test/resources/hbase-site.xml ---
@@ -22,14 +22,13 @@
 -->
 <configuration>
 
+<!-- 
--- End diff --

I think you can remove the hbase-site.xml file. It is required only if your 
HBase settings differ from the defaults. The log4j.properties file could also 
be removed.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r28768490
  
--- Diff: flink-staging/flink-hbase/pom.xml ---
@@ -112,6 +112,12 @@ under the License.
</exclusion>
</exclusions>
</dependency>
+   <dependency>
--- End diff --

But putting it into test scope is not a proper solution to the issue. 
IMO, it should either be put in the regular scope so that it can be used 
at runtime, or be commented out with a line explaining why we did 
that.





[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-07 Thread fpompermaier
Github user fpompermaier commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-90481118
  
Ok, I created this issue (https://issues.apache.org/jira/browse/FLINK-1834) 
about `mapred.output.dir`.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-07 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-90475934
  
Is the Hadoop configuration specified in the flink-conf.yaml loaded? If we 
set `mapred.output.dir`, then we should check for an existing config entry 
beforehand. Otherwise, we overwrite Hadoop configuration values. As @fhueske 
suggested, please open a JIRA for investigation. 
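
A minimal sketch of the check-before-set behaviour described above. It assumes a plain `Map<String, String>` as a stand-in for the Hadoop configuration so that the example runs without Hadoop on the classpath. The class name `OutputDirDefaults` and the fallback path are hypothetical; with a real Hadoop `Configuration`, the same logic would go through its `get` and `set` methods.

```java
import java.util.Map;

// Hypothetical helper, not part of Flink or Hadoop.
final class OutputDirDefaults {

    static final String OUTPUT_DIR_KEY = "mapred.output.dir";

    private OutputDirDefaults() {}

    // Writes fallbackDir under OUTPUT_DIR_KEY only when no non-empty value
    // is present, and returns the value that is in effect afterwards.
    static String setIfAbsent(Map<String, String> conf, String fallbackDir) {
        String existing = conf.get(OUTPUT_DIR_KEY);
        if (existing == null || existing.isEmpty()) {
            conf.put(OUTPUT_DIR_KEY, fallbackDir);
            return fallbackDir;
        }
        // A user-supplied value wins; it is never overwritten.
        return existing;
    }
}
```

Calling `setIfAbsent` on a configuration that already contains `mapred.output.dir` leaves the user's value untouched, which is the property asked for in the comment.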




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-04 Thread fpompermaier
GitHub user fpompermaier opened a pull request:

https://github.com/apache/flink/pull/571

Fixed Configurable HadoopOutputFormat (FLINK-1828)

See https://issues.apache.org/jira/browse/FLINK-1828

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fpompermaier/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #571


commit 83655bf2773a871c0fb88481be51c6d61ee98881
Author: fpompermaier f.pomperma...@gmail.com
Date:   2015-04-04T10:57:36Z

Fixed Configurable Hadoop output format initialization, added a simple
HBase sink test and upgraded HBase dependencies (from 0.98.6 to 0.98.11)

commit 85dbacf46c6f97f6033a4247cdd60ded87b93641
Author: fpompermaier f.pomperma...@gmail.com
Date:   2015-04-04T10:57:36Z

Fixed Configurable Hadoop output format initialization, added a simple
HBase sink test and upgraded HBase dependencies (from 0.98.6 to 0.98.11)

commit da39bd2da2ab6ae03ff90b4434e167b8278d2df2
Author: fpompermaier f.pomperma...@gmail.com
Date:   2015-04-04T11:11:55Z

Merge branch 'master' of https://github.com/fpompermaier/flink.git

Conflicts:

flink-java/src/main/java/org/apache/flink/api/java/hadoop/mapreduce/HadoopOutputFormatBase.java






[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-04 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r27768581
  
--- Diff: flink-staging/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteExample.java ---
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.addons.hbase.example;
+
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+
+@SuppressWarnings("serial")
+public class HBaseWriteExample {
+   
+   // 
*
+   // PROGRAM
+   // 
*
+   
+   public static void main(String[] args) throws Exception {
+
+   if(!parseParameters(args)) {
+   return;
+   }
+   
+   // set up the execution environment
+   final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+   
+   // get input data
+   DataSet<String> text = getTextDataSet(env);
+   
+   DataSet<Tuple2<String, Integer>> counts = 
+   // split up the lines in pairs (2-tuples) containing: (word,1)
+   text.flatMap(new Tokenizer())
+   // group by the tuple field 0 and sum up tuple field 1
+   .groupBy(0)
+   .sum(1);
+
+   // emit result
+// if(fileOutput) {
--- End diff --

The `if` statement should be completely removed.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-04 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/571#discussion_r27768585
  
--- Diff: flink-staging/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteExample.java ---
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.addons.hbase.example;
+
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+
+@SuppressWarnings("serial")
+public class HBaseWriteExample {
+   
+   // 
*
+   // PROGRAM
+   // 
*
+   
+   public static void main(String[] args) throws Exception {
+
+   if(!parseParameters(args)) {
+   return;
+   }
+   
+   // set up the execution environment
+   final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+   
+   // get input data
+   DataSet<String> text = getTextDataSet(env);
+   
+   DataSet<Tuple2<String, Integer>> counts = 
+   // split up the lines in pairs (2-tuples) containing: (word,1)
+   text.flatMap(new Tokenizer())
+   // group by the tuple field 0 and sum up tuple field 1
+   .groupBy(0)
+   .sum(1);
+
+   // emit result
+// if(fileOutput) {
+   Job job = Job.getInstance();
+   job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, outputTableName);
+   // TODO is mapred.output.dir really useful?
+   job.getConfiguration().set("mapred.output.dir","/tmp/test");
+   counts.map(new RichMapFunction <Tuple2<String,Integer>, Tuple2<Text,Mutation>>() {
+   private final byte[] CF_SOME = Bytes.toBytes("test-column");
+   private final byte[] Q_SOME = Bytes.toBytes("value");
+   private transient Tuple2<Text, Mutation> reuse;
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   super.open(parameters);
+   reuse = new Tuple2<Text, Mutation>();
+   }
+
+   @Override
+   public Tuple2<Text, Mutation> map(Tuple2<String, Integer> t) throws Exception {
+   reuse.f0 = new Text(t.f0);
+   Put put = new Put(t.f0.getBytes());
+   put.add(CF_SOME, Q_SOME, Bytes.toBytes(t.f1));
+   reuse.f1 = put;
+   return reuse;
+   }
+   }).output(new HadoopOutputFormat<Text, Mutation>(new TableOutputFormat<Text>(), job));
+// } else {
--- End diff --

The `else` branch is not necessary.



[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-04 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-89559472
  
Looks good to me, except for some comments that should be removed.

Regarding the `mapred.output.dir` parameter, I am not sure whether it is 
generally expected for all Hadoop OutputFormats or only required for file-based 
OutputFormats. 
I would keep it for now and open a JIRA to investigate the issue and fix it 
if necessary.




[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...

2015-04-04 Thread fpompermaier
Github user fpompermaier commented on the pull request:

https://github.com/apache/flink/pull/571#issuecomment-89570344
  
Removed comments and commented code as suggested by Fabian. Do I also have 
to create a JIRA ticket about the `mapred.output.dir` parameter? I think it 
could default to the Flink temp directory or flinkTempDir/hadoop/job-id.
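
The default proposed above could be derived roughly as follows. This is only a sketch: the class `DefaultOutputDir`, its method, and the example values are hypothetical, and this thread does not say how Flink would expose its temp directory or the job ID.

```java
// Hypothetical helper, not part of Flink or Hadoop.
final class DefaultOutputDir {

    private DefaultOutputDir() {}

    // Builds "<flinkTempDir>/hadoop/<jobId>" with forward slashes,
    // dropping a single trailing slash from the temp directory if present.
    static String forJob(String flinkTempDir, String jobId) {
        String base = flinkTempDir.endsWith("/")
                ? flinkTempDir.substring(0, flinkTempDir.length() - 1)
                : flinkTempDir;
        return base + "/hadoop/" + jobId;
    }
}
```

Keeping the job ID in the path would give each job its own directory, so concurrent jobs sharing one temp directory would not write to the same location.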

