[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526978
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -329,6 +330,15 @@ public void unregisterMemoryManager(MemoryManager 
memoryManager) {
}
}
 
+   protected void notifyExecutionStateChange(ExecutionState executionState,
+   
Throwable optionalError) {
--- End diff --

This also seems weird 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/331#issuecomment-71467375
  
I think it would be better not to print the help if the user specified 
something incorrectly. Maybe just the error message and a note that -h prints 
the help?

I've tried out the change, but now the message is at the very bottom of the 
output. It's now probably even harder to find.

**Bad** (see below for *Good*)

```
robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436] % 
./bin/flink ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar 

Action "run" compiles and runs a program.

  Syntax: run [OPTIONS] <jar-file> <arguments>
  "run" action options:
     -c,--class <classname>           Class with the program entry point ("main"
                                      method or "getPlan()" method. Only needed
                                      if the JAR file does not specify the class
                                      in its manifest.
     -m,--jobmanager <host:port>      Address of the JobManager (master) to
                                      which to connect. Specify 'yarn-cluster'
                                      as the JobManager to deploy a YARN cluster
                                      for the job. Use this flag to connect to a
                                      different JobManager than the one
                                      specified in the configuration.
     -p,--parallelism <parallelism>   The parallelism with which to run the
                                      program. Optional flag to override the
                                      default value specified in the
                                      configuration.
  Additional arguments if -m yarn-cluster is set:
     -yD <arg>                          Dynamic properties
     -yj,--yarnjar <arg>                Path to Flink jar file
     -yjm,--yarnjobManagerMemory <arg>  Memory for JobManager Container [in MB]
     -yn,--yarncontainer <arg>          Number of YARN container to allocate
                                        (=Number of Task Managers)
     -yq,--yarnquery                    Display available YARN resources
                                        (memory, cores)
     -yqu,--yarnqueue <arg>             Specify YARN queue.
     -ys,--yarnslots <arg>              Number of slots per TaskManager
     -yt,--yarnship <arg>               Ship files in the specified directory
                                        (t for transfer)
     -ytm,--yarntaskManagerMemory <arg> Memory per TaskManager Container [in MB]

Invalid action: ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
1 robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436]
```

The info command is over-engineered in my opinion. It contains only one 
possible option, -e for the execution plan. I would vote to remove the info 
action and rename it to plan or so. 
Or keep the info name and print the plan by default (this is not @mxm's 
fault, but it would be nice to fix it with this PR):
```
 ./bin/flink info ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar

Action "info" displays information about a program.

  Syntax: info [OPTIONS] <jar-file> <arguments>
  "info" action options:
     -c,--class <classname>           Class with the program entry point ("main"
                                      method or "getPlan()" method. Only needed
                                      if the JAR file does not specify the class
                                      in its manifest.
     -e,--executionplan               Show optimized execution plan of the
                                      program (JSON)
     -m,--jobmanager <host:port>      Address of the JobManager (master) to
                                      which to connect. Specify 'yarn-cluster'
                                      as the JobManager to deploy a YARN cluster
                                      for the job. Use this flag to connect to a
                                      different JobManager than the one
                                      specified in the configuration.
     -p,--parallelism <parallelism>   The parallelism with which to run the
                                      program. Optional flag to override the
                                      default value specified in the
                                      configuration.

Error: Specify one of the above options to display information.
```
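The suggested behavior could be sketched roughly as follows. This is a hypothetical illustration, not the real CliFrontend internals: the `dispatch` method and the action set are made-up names. The idea is to print only a short error plus a pointer at `-h`, instead of dumping the full usage text on invalid input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CliErrorSketch {

    // Hypothetical set of known actions; the real client derives these elsewhere.
    private static final Set<String> ACTIONS =
            new HashSet<>(Arrays.asList("run", "info", "list", "cancel"));

    // Returns the message a user would see for a given first argument.
    static String dispatch(String action) {
        if (!ACTIONS.contains(action)) {
            // Short error first, plus a hint at -h -- no full usage dump.
            return "Invalid action: " + action
                    + "\nUse the -h option to print the help.";
        }
        return "executing action: " + action;
    }

    public static void main(String[] args) {
        // Passing a jar where an action is expected yields the concise error.
        System.out.println(dispatch("./examples/WordCount.jar"));
    }
}
```

This keeps the error at the top of the output, where it is easy to find.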

**Good**

What I liked was the error reporting when passing an invalid file.

[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526836
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java
 ---
@@ -42,86 +44,82 @@
 public class SlotSharingGroupAssignment implements Serializable {
 
static final long serialVersionUID = 42L;
-   
+
private static final Logger LOG = Scheduler.LOG;
-   
+
private transient final Object lock = new Object();
-   
+
/** All slots currently allocated to this sharing group */
	private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>();
-   
+
	/** The slots available per vertex type (jid), keyed by instance, to make them locatable */
	private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid =
			new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>();
-   
-   
+
// 

-   
-   
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex) {
-		JobVertexID id = vertex.getJobvertexId();
-		return addNewSlotWithTask(slot, id, id);
-	}
-
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex, CoLocationConstraint constraint) {
-		AbstractID groupId = constraint.getGroupId();
-		return addNewSlotWithTask(slot, groupId, null);
-	}
-
-	private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID groupId, JobVertexID vertexId) {
-
-		final SharedSlot sharedSlot = new SharedSlot(slot, this);
-		final Instance location = slot.getInstance();
-
+
+	public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot sharedSlot, Locality locality,
+													AbstractID groupId, CoLocationConstraint constraint) {
--- End diff --

indentation?




[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526787
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java
 ---
@@ -21,7 +21,7 @@
 import org.apache.flink.runtime.instance.Instance;
 
 /**
- * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot2}s become available
--- End diff --

I guess `AllocatedSlot2` is an automatic rename leftover




[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71464164
  
-1
I think we need to add the `build-target` directory to the list of 
directories ignored by Apache Rat; otherwise Rat will fail subsequent builds:
```
1 Unknown Licenses

***

Unapproved licenses:

  build-target/conf/slaves

***
```
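The exclusion could look roughly like this in the root pom.xml. This is a sketch: the apache-rat-plugin does support an `<excludes>` configuration element, but the exact plugin configuration and placement in Flink's pom may differ.

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- build-target is a symlink created by the build; its contents
           are already license-checked at their source location -->
      <exclude>build-target/**</exclude>
    </excludes>
  </configuration>
</plugin>
```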




[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...

2015-01-26 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/323




[GitHub] flink pull request: [FLINK-1437][Java API] Fixes copy() methods in...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/342#issuecomment-71475896
  
This will probably conflict with https://github.com/apache/flink/pull/316.




[GitHub] flink pull request: [FLINK-1392] Add Kryo serializer for Protobuf

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/322#issuecomment-71484526
  
I will make this change part of a new pull request for 
https://issues.apache.org/jira/browse/FLINK-1417.




[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71469736
  
Oh, actually, that should work because the configuration explicitly binds 
the plugin to the clean phase.




[GitHub] flink pull request: Qa bot

2015-02-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/362#issuecomment-72886592
  
Sorry guys, I was confused with the different repositories. I'll close it 
again.




[GitHub] flink pull request: [FLINK-592] Add support for Kerberos secured Y...

2015-02-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/358#issuecomment-72937500
  
Thank you.
That's good to hear.


 On 04.02.2015, at 21:56, Daniel Warneke notificati...@github.com wrote:
 
 Tested the code and everything works as expected now. Great job!
 
 —
 Reply to this email directly or view it on GitHub.
 





[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-02-02 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-72533930
  
WordCount with built-in data works :+1: nice.
```
robert@robert-tower ...k-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT (git)-[papipr] 
% ./bin/pyflink3.sh pyflink.py - /home/robert/flink-workdir/yarnLog 
/tmp/yarnPyWC
02/02/2015 21:20:34 Job execution switched to status RUNNING.
02/02/2015 21:20:34 DataSource (TextSource)(1/1) switched to SCHEDULED 
02/02/2015 21:20:34 DataSource (TextSource)(1/1) switched to DEPLOYING 
02/02/2015 21:20:34 DataSource (TextSource)(1/1) switched to RUNNING 
02/02/2015 21:20:34 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to SCHEDULED 
02/02/2015 21:20:34 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to DEPLOYING 
02/02/2015 21:20:34 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to RUNNING 
02/02/2015 21:20:34 DataSource (TextSource)(1/1) switched to FINISHED 
```

I wanted to run wordcount locally on some serious data, but sadly it seems 
that the job somehow deadlocked.

```
"MapPartition (PythonFlatMap -> PythonCombine) (1/1)" #85 daemon prio=5 os_prio=0 tid=0x01b59000 nid=0x855 runnable [0x7fd73b3f4000]
   java.lang.Thread.State: RUNNABLE
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
- locked 0xfad36828 (a java.net.PlainDatagramSocketImpl)
at 
java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
- locked 0xfad36828 (a java.net.PlainDatagramSocketImpl)
at java.net.DatagramSocket.receive(DatagramSocket.java:781)
- locked 0xfad367a0 (a java.net.DatagramPacket)
- locked 0xfad367c8 (a java.net.DatagramSocket)
at 
org.apache.flink.languagebinding.api.java.common.streaming.Streamer.streamBufferWithoutGroups(Streamer.java:172)
at 
org.apache.flink.languagebinding.api.java.python.functions.PythonMapPartition.mapPartition(PythonMapPartition.java:55)
at 
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:98)
at 
org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:204)
at java.lang.Thread.run(Thread.java:745)
```

```
"Thread-23" #86 daemon prio=5 os_prio=0 tid=0x7fd634001800 nid=0x857 runnable [0x7fd73b4f4000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:234)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked 0xfad3a440 (a 
java.lang.UNIXProcess$ProcessPipeInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
- locked 0xfad3e968 (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
- locked 0xfad3e968 (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at 
org.apache.flink.languagebinding.api.java.common.streaming.StreamPrinter.run(StreamPrinter.java:34)
```

I have to dig further to understand what's going on.
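The first dump shows the Python streamer blocked in `DatagramSocket.receive()` with no read timeout, which is one way such a job can hang indefinitely when the peer stalls. As a generic illustration (this is not Flink's code, just the standard JDK mechanism), setting `SO_TIMEOUT` bounds the wait so a stalled peer surfaces as a `SocketTimeoutException` instead of a silent deadlock:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

public class UdpTimeoutSketch {

    // Returns true if receive() timed out instead of blocking forever.
    static boolean receiveWithTimeout(int timeoutMillis) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {   // ephemeral port
            socket.setSoTimeout(timeoutMillis);                // bound the wait
            DatagramPacket packet = new DatagramPacket(new byte[64], 64);
            try {
                socket.receive(packet);                        // nobody sends to us
                return false;
            } catch (SocketTimeoutException expected) {
                return true;                                   // hang becomes a visible error
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveWithTimeout(100));           // prints true after ~100 ms
    }
}
```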


My understanding of the pull request right now is the following: I see that 
this change was a LOT of work and that there have been several iterations of 
improvement.

What the code certainly needs is a few more developers. That will probably 
automatically lead to cleaner code, better code comments, better error handling 
and so on.

I'm still not convinced to merge the code in the state it's currently in, 
because I'm simply facing too many issues right now. The broken example in the 
documentation is certainly not the dealbreaker here; it's issues like 
hard-to-find error messages, or the problems I had with the wordcount (I don't 
know whether it's the runtime or an issue in the Python code).

Please don't get my feedback here wrong. I appreciate and see that this has 
been a lot of work, and we are probably close to 100% (likely in the 90s 
already). What's needed are people to test this in different environments, 
with different expectations, etc. Then we'll probably quickly achieve

[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-02-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/202#discussion_r23952693
  
--- Diff: 
flink-addons/flink-language-binding/src/main/python/org/apache/flink/languagebinding/api/python/dill/__diff.py
 ---
@@ -0,0 +1,247 @@

+
--- End diff --

We need to find another solution here for the licensing. I think we cannot 
just redistribute this file under our license.




[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-02-02 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-72526181
  
I've tested the changes again, because I would really like to merge them.

The bin/pyflink3.sh script only works when called from the flink root dir:
```
robert@robert-tower ...9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin (git)-[papipr] 
% ./pyflink3.sh
Error: Jar file: 'lib/flink-language-binding-0.9-SNAPSHOT.jar' does not exist.
```

This issue will be fixed soon because the `bin/flink` client will print all 
errors immediately (instead of asking the user to pass `-v`). For now, you can 
maybe add `-v` by default.
```
./bin/pyflink3.sh pyflink.py   
Traceback (most recent call last):
  File "/tmp/flink_plan/plan.py", line 1, in <module>
bullshit
NameError: name 'bullshit' is not defined
20:16:20,658 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
Error: The main method caused an error.
For a more detailed error message use the vebose output option '-v'.
```

The Python PlanBuilder seems to insist on using HDFS, even though I'm 
testing the code locally:
```
robert@robert-tower ...k-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT (git)-[papipr] 
% ./bin/pyflink3.sh pyflink.py
20:25:57,440 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
Error: The main method caused an error.
org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:449)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:350)
at org.apache.flink.client.program.Client.run(Client.java:242)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:389)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:358)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1068)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1092)
Caused by: java.io.IOException: The given HDFS file URI (hdfs:/tmp/flink) 
did not describe the HDFS NameNode. The attempt to use a default HDFS 
configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' 
config parameter failed due to the following problem: Either no default file 
system was registered, or the provided configuration contains no valid 
authority component (fs.default.name or fs.defaultFS) describing the (hdfs 
namenode) host and port.
at 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.initialize(HadoopFileSystem.java:287)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:261)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.clearPath(PythonPlanBinder.java:135)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.distributeFiles(PythonPlanBinder.java:153)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.runPlan(PythonPlanBinder.java:101)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.main(PythonPlanBinder.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:434)
... 6 more
```
Apparently, using `env.execute(local=True)` resolves the problem, but it 
leads to a new one:
```
robert@robert-tower ...k-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT (git)-[papipr] 
% ./bin/pyflink3.sh pyflink.py
02/02/2015 20:55:00 Job execution switched to status RUNNING.
02/02/2015 20:55:00 DataSource (ValueSource)(1/1) switched to SCHEDULED 
02/02/2015 20:55:00 DataSource (ValueSource)(1/1) switched to DEPLOYING 
02/02/2015 20:55:01 DataSource (ValueSource)(1/1) switched to RUNNING 
02/02/2015 20:55:01 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to SCHEDULED 
02/02/2015 20:55:01 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to DEPLOYING 
02/02/2015 20:55:01 DataSource (ValueSource)(1/1) switched to FINISHED 
02/02/2015 20:55:01 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to RUNNING 
02/02/2015 20:55:05 MapPartition (PythonFlatMap -> PythonCombine)(1/1) switched to FAILED 
java.lang.RuntimeException: External process for task MapPartition (PythonFlatMap
```

[GitHub] flink pull request: [FLINK-1105] Add support for locally sorted ou...

2015-02-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/347#discussion_r23914605
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/DataSink.java ---
@@ -83,6 +93,107 @@ public DataSink(DataSet<T> data, OutputFormat<T> format, TypeInformation<T> type
}
 
	/**
+	 * Sorts each local partition of a {@link org.apache.flink.api.java.tuple.Tuple} data set
+	 * on the specified field in the specified {@link Order} before it is emitted by the output format.<br/>
+	 * <b>Note: Only tuple data sets can be sorted using integer field indices.</b><br/>
+	 * The tuple data set can be sorted on multiple fields in different orders
+	 * by chaining {@link #sortLocalOutput(int, Order)} calls.
+	 *
+	 * @param field The Tuple field on which the data set is locally sorted.
+	 * @param order The Order in which the specified Tuple field is locally sorted.
+	 * @return This data sink operator with specified output order.
+	 *
+	 * @see org.apache.flink.api.java.tuple.Tuple
+	 * @see Order
+	 */
+	public DataSink<T> sortLocalOutput(int field, Order order) {
+
+		if (!this.type.isTupleType()) {
+			throw new InvalidProgramException("Specifying order keys via field positions is only valid for tuple data types");
+		}
+		if (field >= this.type.getArity()) {
+			throw new InvalidProgramException("Order key out of tuple bounds.");
+		}
+
+		if(this.sortKeyPositions == null) {
+			// set sorting info
+			this.sortKeyPositions = new int[] {field};
+			this.sortOrders = new Order[] {order};
+		} else {
+			// append sorting info to existing info
+			int newLength = this.sortKeyPositions.length + 1;
+			this.sortKeyPositions = Arrays.copyOf(this.sortKeyPositions, newLength);
+			this.sortOrders = Arrays.copyOf(this.sortOrders, newLength);
+			this.sortKeyPositions[newLength-1] = field;
+			this.sortOrders[newLength-1] = order;
+		}
+		return this;
+	}
+
+	/**
+	 * Sorts each local partition of a data set on the field(s) specified by the field expression
+	 * in the specified {@link Order} before it is emitted by the output format.<br/>
+	 * <b>Note: Non-composite types can only be sorted on the full element which is specified by
+	 * a wildcard expression ("*" or "_").</b><br/>
+	 * Data sets of composite types (Tuple or Pojo) can be sorted on multiple fields in different orders
+	 * by chaining {@link #sortLocalOutput(String, Order)} calls.
+	 *
+	 * @param fieldExpression The field expression for the field(s) on which the data set is locally sorted.
+	 * @param order The Order in which the specified field(s) are locally sorted.
+	 * @return This data sink operator with specified output order.
+	 *
+	 * @see Order
+	 */
+	public DataSink<T> sortLocalOutput(String fieldExpression, Order order) {
+
+		int numFields;
+		int[] fields;
+		Order[] orders;
+
+		if(this.type instanceof CompositeType) {
+			// compute flat field positions for (nested) sorting fields
+			Keys.ExpressionKeys<T> ek;
+			try {
+				ek = new Keys.ExpressionKeys<T>(new String[]{fieldExpression}, this.type);
+			} catch(IllegalArgumentException iae) {
+				throw new InvalidProgramException(iae.getMessage());
--- End diff --

Why are you creating a new exception with the error message instead of 
forwarding the illegal argument exception?
I personally like it very much when I can find the exact location where the 
exception was thrown.
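For reference, the chaining suggested here keeps the original throw site visible in the stack trace. A minimal sketch; the `InvalidProgramException` below is a local stand-in that assumes the usual `(message, cause)` constructor, not necessarily Flink's actual class:

```java
public class ExceptionChainingSketch {

    // Stand-in for the real exception type, assuming a (message, cause) constructor.
    static class InvalidProgramException extends RuntimeException {
        InvalidProgramException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void validate(String fieldExpression) {
        try {
            // Simulates the key-parsing step that rejects the expression.
            throw new IllegalArgumentException("unknown field: " + fieldExpression);
        } catch (IllegalArgumentException iae) {
            // Forward the original exception as the cause instead of
            // copying only its message -- the original stack trace survives.
            throw new InvalidProgramException(iae.getMessage(), iae);
        }
    }

    public static void main(String[] args) {
        try {
            validate("f42");
        } catch (InvalidProgramException e) {
            // prints IllegalArgumentException
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```

With the cause attached, the "Caused by:" section of the stack trace points at the exact location where the original exception was thrown.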




[GitHub] flink pull request: [FLINK-1105] Add support for locally sorted ou...

2015-02-02 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/347#issuecomment-72430165
  
The change is good to merge.




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-02-07 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-73366441
  
I'm really sorry that I've messed up this pull request by renaming 
flink-addons to flink-staging :(
I was doing it in a rush. Really sorry.




[GitHub] flink pull request: [FLINK-592] Add support for Kerberos secured Y...

2015-02-03 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/358

[FLINK-592] Add support for Kerberos secured YARN setups to Flink.

This pull request is basically a port of @warneke's branch 
(https://github.com/warneke/flink/tree/security) to the latest `master` of 
Flink.

The port has been done mostly by @mxm. 
We tested the change on google compute engine (non-secure setup, to ensure 
that everything is working as before) and a local secure YARN setup with 
Kerberos.

Open issues:
- Test token renewal 

Once the open issues have been resolved, I would like to merge this asap 
because a user was asking for this on the mailing list.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink592

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #358


commit 3fc8d47f3f7322285539454c7a80a8cec4ba043f
Author: Max m...@posteo.de
Date:   2015-02-02T15:09:18Z

[FLINK-592] Add support for Kerberos secured YARN setups to Flink.






[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...

2015-02-04 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/338




[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...

2015-02-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/338#issuecomment-72861601
  
Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/7ac6447f.




[GitHub] flink pull request: [FLINK-592] Add support for Kerberos secured Y...

2015-02-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/358#issuecomment-72856331
  
Thank you for the good feedback!
@mxm and I updated the pull request and addressed your concerns.

I'm now running the tests on Travis. If they pass I'm going to merge the 
changes. 

@warneke: It would be nice if you could test the code again to see if we 
really fixed the issues.




[GitHub] flink pull request: [FLINK-1471][java-api] Fixes wrong input valid...

2015-02-04 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/359#discussion_r24085204
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java 
---
@@ -680,10 +680,20 @@ private static void validateInputType(Type t, TypeInformation<?> inType) {
	}
	}
 
-	private static void validateInputType(Class<?> baseClass, Class<?> clazz, int inputParamPos, TypeInformation<?> inType) {
+	private static void validateInputType(Class<?> baseClass, Class<?> clazz, int inputParamPos, TypeInformation<?> inTypeInfo) {
	ArrayList<Type> typeHierarchy = new ArrayList<Type>();
+
+		// try to get generic parameter
+		Type inType;
+		try {
+			inType = getParameterType(baseClass, typeHierarchy, clazz, inputParamPos);
+		}
+		catch (Exception e) {
+			return; // skip input validation e.g. for raw types
--- End diff --

I know it's annoying to change pull requests for these minor changes.
If you want, you can change it and push it to `master`.




[GitHub] flink pull request: yarn client tests

2015-02-05 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/134#issuecomment-73047820
  
Yes, the changes here have been subsumed by FLINK-883.

@skunert can you close this pull request?




[GitHub] flink pull request: Port FLINK-1391 and FLINK-1392 to release-0.8...

2015-02-05 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/364

Port FLINK-1391 and FLINK-1392 to release-0.8 branch.

These commits port the fixes for the two issues (Avro and Protobuf support) 
to the release-0.8 branch.
They also contain a hotfix regarding the closure cleaner by @aljoscha.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink kryo081

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #364


commit 38ebc09ff5782005c5aa1f60b458cae250b8c26e
Author: Robert Metzger metzg...@web.de
Date:   2015-01-12T20:11:09Z

[FLINK-1391] Add support for using Avro-POJOs and Avro types with Kryo

Conflicts:

flink-java/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java

flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java

Conflicts:

flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java

commit cbe633537567cc39a9877125e79cd7da49ee7f3b
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-13T09:21:29Z

[FLINK-1392] Add Kryo serializer for Protobuf

Conflicts:
flink-java/pom.xml

flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java

Conflicts:
flink-shaded/pom.xml
pom.xml

commit 63472baff1fca18b83666831effe2204606cf355
Author: Aljoscha Krettek aljoscha.kret...@gmail.com
Date:   2015-01-15T10:46:53Z

[hotfix] Also use java closure cleaner on grouped operations

commit 9043582a4a1f4fd25e960217a99f3f32d4ba18a9
Author: Robert Metzger rmetz...@apache.org
Date:   2015-02-05T13:07:48Z

[backports] Cleanup and port changes to 0.8 branch.






[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool

2015-02-05 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/366#issuecomment-73089829
  
Thank you for trying it out.
Ideally, with the bot in place the number of warnings will go down over 
time.

I'll address the comments in the source.

I'm not sure if the number of compiler warnings is correct here.

A third option would be to
a) check out the current master
b) get the reference counts on the master
c) apply the pull request as a patch to master, checking whether patching is 
possible (basically testing whether a rebase would succeed)
d) if rebase was possible, get the new counts.




[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool

2015-02-05 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/366#issuecomment-73084319
  
It's not completely building Flink; it's only generating javadocs or 
compiling to collect the compiler warnings.

The main purpose of the script is to be executed automatically.




[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool

2015-02-06 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/366#issuecomment-73254306
  
I found out why the qa-check reports more compile errors than the 
master: I've instructed the compiler to report them all ;)

I've addressed all comments in the pull request. I'm going to merge it soon 
if there are no other comments.




[GitHub] flink pull request: Allow KeySelectors to implement ResultTypeQuer...

2015-02-03 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/354




[GitHub] flink pull request: Allow KeySelectors to implement ResultTypeQuer...

2015-02-03 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/354#issuecomment-72620298
  
I'll close it.
I've filed a jira for the issue and assigned it to @twalthr: 
https://issues.apache.org/jira/browse/FLINK-1471




[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-02-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/202#discussion_r23950076
  
--- Diff: docs/python_programming_guide.md ---
@@ -0,0 +1,600 @@
+---
+title: Python Programming Guide
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+
+<a href="#top"></a>
+
+Introduction
+------------
+
+Analysis programs in Flink are regular programs that implement 
transformations on data sets
+(e.g., filtering, mapping, joining, grouping). The data sets are initially 
created from certain
+sources (e.g., by reading files, or from collections). Results are 
returned via sinks, which may for
+example write the data to (distributed) files, or to standard output (for 
example the command line
+terminal). Flink programs run in a variety of contexts, standalone, or 
embedded in other programs.
+The execution can happen in a local JVM, or on clusters of many machines.
+
+In order to create your own Flink program, we encourage you to start with 
the
+[program skeleton](#program-skeleton) and gradually add your own
+[transformations](#transformations). The remaining sections act as 
references for additional
+operations and advanced features.
+
+
+Example Program
+---
+
+The following program is a complete, working example of WordCount. You can 
copy & paste the code
+to run it locally.
+
+{% highlight python %}
+from flink.plan.Environment import get_environment
+from flink.plan.Constants import INT, STRING
+from flink.functions.GroupReduceFunction import GroupReduceFunction
+
+class Adder(GroupReduceFunction):
+  def reduce(self, iterator, collector):
+count, word = iterator.next()
+count += sum([x[0] for x in iterator])
+collector.collect((count, word))
+
+if __name__ == "__main__":
+  env = get_environment()
+  data = env.from_elements("Who's there?",
+   "I think I hear them. Stand, ho! Who's there?")
+  
+  data \
+.flat_map(lambda x: x.lower().split(), (INT, STRING)) \
+.group_by(1) \
+.reduce_group(Adder(), (INT, STRING), combinable=True) \
+.output()
+  
+  env.execute()
+}
--- End diff --

I've copy-pasted the program as described in the documentation, but it doesn't 
run.
Most likely because of the `}` sign here.




[GitHub] flink pull request: [FLINK-1478] Add support for strictly local in...

2015-02-09 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/375#issuecomment-73477321
  
Looks good.




[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-09 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73478511
  
I've tried it out locally. Looks very nice. Thank you.

+1 to merge.




[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968592
  
--- Diff: flink-java/pom.xml ---
@@ -64,6 +64,18 @@ under the License.
<version>0.5.1</version>
</dependency>
 
+   <dependency>
--- End diff --

This is pulling in some unneeded dependencies: 
https://github.com/magro/kryo-serializers/blob/master/pom.xml
for example cglib and org.apache.wicket.
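
If the extra transitive dependencies turn out to be a problem, one option (a sketch, not from the pull request; the version number and the excluded groupId/artifactId pairs are illustrative assumptions based on the kryo-serializers pom linked above) would be a Maven `<exclusions>` block:

```xml
<!-- Hypothetical pom.xml fragment: version and exclusion coordinates
     are assumptions, shown only to illustrate the mechanism. -->
<dependency>
	<groupId>de.javakaffee</groupId>
	<artifactId>kryo-serializers</artifactId>
	<version>0.27</version>
	<exclusions>
		<exclusion>
			<groupId>cglib</groupId>
			<artifactId>cglib</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.wicket</groupId>
			<artifactId>wicket-core</artifactId>
		</exclusion>
	</exclusions>
</dependency>
```

Exclusions keep the classpath minimal without waiting for an upstream release that marks these dependencies optional.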




[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...

2015-01-14 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-69976328
  
+1 for some utilities.
I'm not sure, however, where to put it.
Should we add another maven module? Make it part of the current 
flink-java ? Or start it as a github repo outside of the main project?




[GitHub] flink pull request: Add support for Subclasses, Interfaces, Abstra...

2015-01-15 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/236#issuecomment-70095528
  
I would like to have this merged soon.
It contains some good changes to the TypeExtractor (support for interfaces 
& abstract classes).




[GitHub] flink pull request: [FLINK-1147][Java API] TypeInference on POJOs

2015-01-20 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/315#issuecomment-70723442
  
I didn't find any issues in the code. Good to merge.




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-20 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/326#discussion_r23252081
  
--- Diff: 
flink-addons/flink-gelly/src/main/java/org/apache/flink/gelly/Edge.java ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.gelly;
+
+import java.io.Serializable;
+
+import org.apache.flink.api.java.tuple.Tuple3;
+
+public class Edge<K extends Comparable<K> & Serializable, V extends 
Serializable> 
--- End diff --

This Edge class and the 3 other Edge* classes below are missing 
javadocs. The rest of the graph API is very well documented, but these classes 
are not.




[GitHub] flink pull request: [FLINK-1392] Add Kryo serializer for Protobuf

2015-01-19 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/322

[FLINK-1392] Add Kryo serializer for Protobuf

I've checked the added dependencies: they don't override any versions, and 
no transitive dependencies are added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink1392

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #322


commit ed629e3e23001a0761d116d8c1151a65d88501eb
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-13T09:21:29Z

[FLINK-1392] Add Kryo serializer for Protobuf






[GitHub] flink pull request: [FLINK-1382][java] Adds the new basic types Vo...

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/299#issuecomment-70226685
  
The change looks good. I would like to see some test cases there.




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-70226947
  
How about names along the lines of "Unmodified Fields"?




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/311#discussion_r23069617
  
--- Diff: 
flink-compiler/src/main/java/org/apache/flink/compiler/dag/BinaryUnionNode.java 
---
@@ -266,4 +268,44 @@ public void computeOutputEstimates(DataStatistics 
statistics) {
this.estimatedOutputSize = in1.estimatedOutputSize > 0 && 
in2.estimatedOutputSize > 0 ?
in1.estimatedOutputSize + in2.estimatedOutputSize : -1;
}
+
+   public static class UnionSemanticProperties implements 
SemanticProperties {
+
+   @Override
+   public FieldSet getTargetFields(int input, int sourceField) {
+   if (input != 0 && input != 1) {
+   throw new IndexOutOfBoundsException();
--- End diff --

How about throwing an exception that explains that unions only support 
inputs 0 and 1?
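
For illustration, a minimal sketch of such a descriptive exception (the class and method names are mine, not from the pull request):

```java
// Hypothetical sketch: a descriptive exception for an invalid union input index.
public class UnionInputCheck {

    /** Validates the input index of a binary union (only inputs 0 and 1 exist). */
    public static void checkUnionInput(int input) {
        if (input != 0 && input != 1) {
            throw new IndexOutOfBoundsException(
                "Invalid input index '" + input
                + "'. A binary union has only the inputs 0 and 1.");
        }
    }

    public static void main(String[] args) {
        checkUnionInput(0); // valid, no exception
        try {
            checkUnionInput(2); // invalid index
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Compared to a bare `IndexOutOfBoundsException`, the message tells the user which index was passed and which indices actually exist.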




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/311#discussion_r23069937
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java
 ---
@@ -40,38 +45,108 @@
 import org.apache.flink.api.java.functions.FunctionAnnotation.ReadFields;
 import 
org.apache.flink.api.java.functions.FunctionAnnotation.ReadFieldsFirst;
 import 
org.apache.flink.api.java.functions.FunctionAnnotation.ReadFieldsSecond;
+import org.apache.flink.api.java.operators.Keys;
+import org.apache.flink.api.java.typeutils.TupleTypeInfo;
 
 public class SemanticPropUtil {
 
-   private final static String REGEX_LIST = 
"(\\s*(\\d+\\s*,\\s*)*(\\d+\\s*))";
-   private final static String REGEX_FORWARD = "(\\s*(\\d+)\\s*->(" + 
REGEX_LIST + "|(\\*)))";
-   private final static String REGEX_LIST_OR_FORWARD = "(" + REGEX_LIST + 
"|" + REGEX_FORWARD + ")";
-   private final static String REGEX_ANNOTATION = "(\\s*(" + 
REGEX_LIST_OR_FORWARD + "\\s*;\\s*)*(" + REGEX_LIST_OR_FORWARD + "\\s*))";
+   private final static String REGEX_WILDCARD = "[\\" + 
Keys.ExpressionKeys.SELECT_ALL_CHAR + "\\" + 
Keys.ExpressionKeys.SELECT_ALL_CHAR_SCALA + "]";
+   private final static String REGEX_SINGLE_FIELD = "[a-zA-Z0-9_\\$]+";
+   private final static String REGEX_NESTED_FIELDS = "((" + 
REGEX_SINGLE_FIELD + "\\.)*" + REGEX_SINGLE_FIELD + ")(\\." + REGEX_WILDCARD 
+ ")?";
 
+   private final static String REGEX_LIST = "((" + REGEX_NESTED_FIELDS + 
";)*(" + REGEX_NESTED_FIELDS + ");?)";
+   private final static String REGEX_FORWARD = "((" + REGEX_NESTED_FIELDS 
+ "|" + REGEX_WILDCARD + ")->(" + REGEX_NESTED_FIELDS + "|" + REGEX_WILDCARD + "))";
+   private final static String REGEX_FIELD_OR_FORWARD = "(" + 
REGEX_NESTED_FIELDS + "|" + REGEX_FORWARD + ")";
+   private final static String REGEX_ANNOTATION = "((" + 
REGEX_FIELD_OR_FORWARD + ";)*(" + REGEX_FIELD_OR_FORWARD + ");?)";
+
+   private static final Pattern PATTERN_WILDCARD = 
Pattern.compile(REGEX_WILDCARD);
private static final Pattern PATTERN_FORWARD = 
Pattern.compile(REGEX_FORWARD);
private static final Pattern PATTERN_ANNOTATION = 
Pattern.compile(REGEX_ANNOTATION);
private static final Pattern PATTERN_LIST = Pattern.compile(REGEX_LIST);
+   private static final Pattern PATTERN_FIELD = 
Pattern.compile(REGEX_NESTED_FIELDS);
 
-   private static final Pattern PATTERN_DIGIT = Pattern.compile("\\d+");
-
-   public static SingleInputSemanticProperties 
createProjectionPropertiesSingle(int[] fields) {
+   public static SingleInputSemanticProperties 
createProjectionPropertiesSingle(int[] fields, CompositeType<?> inType)
+   {
SingleInputSemanticProperties ssp = new 
SingleInputSemanticProperties();
-   for (int i = 0; i < fields.length; i++) {
-   ssp.addForwardedField(fields[i], i);
+
+   int[] sourceOffsets = new int[inType.getArity()];
+   sourceOffsets[0] = 0;
+   for(int i=1; i<inType.getArity(); i++) {
+   sourceOffsets[i] = 
inType.getTypeAt(i-1).getTotalFields() + sourceOffsets[i-1];
}
+
+   int targetOffset = 0;
+   for(int i=0; i<fields.length; i++) {
+   int sourceOffset = sourceOffsets[fields[i]];
+   int numFieldsToCopy = 
inType.getTypeAt(fields[i]).getTotalFields();
+
+   for(int j=0; j<numFieldsToCopy; j++) {
+   ssp.addForwardedField(sourceOffset+j, 
targetOffset+j);
+   }
+   targetOffset += numFieldsToCopy;
+   }
+
return ssp;
}
 
-   public static DualInputSemanticProperties 
createProjectionPropertiesDual(int[] fields, boolean[] isFromFirst) {
+   public static DualInputSemanticProperties 
createProjectionPropertiesDual(int[] fields, boolean[] isFromFirst,
+   

TypeInformation<?> inType1, TypeInformation<?> inType2)
+   {
DualInputSemanticProperties dsp = new 
DualInputSemanticProperties();
 
-   for (int i = 0; i < fields.length; i++) {
-   if (isFromFirst[i]) {
-   dsp.addForwardedField1(fields[i], i);
+   int[] sourceOffsets1;
+   if(inType1 instanceof TupleTypeInfo<?>) {
+   sourceOffsets1 = new int[inType1.getArity()];
+   sourceOffsets1[0] = 0;
+   for(int i=1; i<inType1.getArity(); i++) {
+   sourceOffsets1[i] = 
((TupleTypeInfo<?>)inType1).getTypeAt(i-1).getTotalFields() + 
sourceOffsets1[i-1];
+   }
+   } else

[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/311#discussion_r23070438
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java 
---
@@ -45,6 +46,16 @@
  */
public class PojoTypeInfo<T> extends CompositeType<T> {
 
+   private final static String REGEX_FIELD = 
"[a-zA-Z_\\$][a-zA-Z0-9_\\$]*";
--- End diff --

Java allows any Unicode character to be used in field names. 

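This can be checked against the JDK's own identifier predicates. A small sketch (the class name is mine): `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` accept far more characters than `[a-zA-Z0-9_$]`.

```java
// Sketch: an ASCII-only regex rejects field names that javac accepts.
public class IdentifierCheck {

    /** Returns true if s is a syntactically valid Java identifier (keywords not checked). */
    public static boolean isValidJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Both are valid Java field names, but only the first matches the ASCII regex.
        System.out.println(isValidJavaIdentifier("myField")); // true
        System.out.println(isValidJavaIdentifier("größe"));   // true
        System.out.println("größe".matches("[a-zA-Z_\\$][a-zA-Z0-9_\\$]*")); // false
    }
}
```

So a regex built only from ASCII letters, digits, `_`, and `$` is stricter than what the language actually permits.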



[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/311#discussion_r23070841
  
--- Diff: 
flink-compiler/src/main/java/org/apache/flink/compiler/dag/BinaryUnionNode.java 
---
@@ -266,4 +268,44 @@ public void computeOutputEstimates(DataStatistics 
statistics) {
this.estimatedOutputSize = in1.estimatedOutputSize > 0 && 
in2.estimatedOutputSize > 0 ?
in1.estimatedOutputSize + in2.estimatedOutputSize : -1;
}
+
+   public static class UnionSemanticProperties implements 
SemanticProperties {
+
+   @Override
+   public FieldSet getTargetFields(int input, int sourceField) {
+   if (input != 0 && input != 1) {
+   throw new IndexOutOfBoundsException();
--- End diff --

Ah, okay. Then it's fine.
I saw many helpful exceptions in this change, so I guess the user-facing 
exceptions are more descriptive. 




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-70230559
  
Except for the comments and the missing documentation, the change looks 
good.
However, I cannot really validate the changes in the optimizer.




[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-16 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r23071236
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java
 ---
@@ -185,6 +187,8 @@ private void checkKryoInitialized() {
this.kryo.setRegistrationRequired(false);
this.kryo.register(type);

this.kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
+
+   kryo.register(DateTime.class, new 
JodaDateTimeSerializer());
--- End diff --

I would suggest adding some more serializers from `de.javakaffee`. Since 
we already have it as a dependency, it doesn't hurt to add them. 
I'm suggesting:
- `jodatime/JodaIntervalSerializer`
- `guava/ImmutableListSerializer`
- `UnmodifiableCollectionsSerializer`
- `GregorianCalendarSerializer`
- `EnumSetSerializer`
- `EnumMapSerializer`
- `BitSetSerializer` - serializer for `java.util.BitSet`
- `RegexSerializer` - serializer for `java.util.regex.Pattern`
- `URISerializer` - serializer for `java.net.URI`
- `UUIDSerializer` - serializer for `java.util.UUID`





[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/304#issuecomment-70231876
  
Change looks good except for comments.




[GitHub] flink pull request: [FLINK-1406] update Flink compatibility notice

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/314#issuecomment-70234048
  
Good to merge.




[GitHub] flink pull request: [FLINK-1382][java] Adds the new basic types Vo...

2015-01-16 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/299#issuecomment-70269761
  
Oh, yes .. sorry. I need to be more careful when reviewing pull requests.

+1 to merge this.




[GitHub] flink pull request: [FLINK-1183] Generate gentle notification mess...

2015-01-15 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/296#issuecomment-70090914
  
I think it's fine to just merge it to the release-0.8 branch.
It will then automatically go into 0.8.1. I'll create a version in JIRA.

On Thu, Jan 15, 2015 at 2:52 PM, Ufuk Celebi notificati...@github.com
wrote:

 Yes +1 for the new message.

 I will merge this in the next batch. It's not super important, but I think
 this should go into 0.8.0 if there is a new RC.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/flink/pull/296#issuecomment-70087926.




-- 
Robert Metzger, Contact: metzg...@web.de, Mobile: 0171/7424461




[GitHub] flink pull request: [FLINK-1295][FLINK-883] Allow to deploy 'job o...

2015-01-21 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/292#discussion_r23292258
  
--- Diff: 
flink-yarn/src/main/scala/org/apache/flink/yarn/ApplicationMaster.scala ---
@@ -41,9 +41,9 @@ object ApplicationMaster{
  val MODIFIED_CONF_FILE = "flink-conf-modified.yaml"
 
   def main(args: Array[String]): Unit ={
-val yarnClientUsername = System.getenv(Client.ENV_CLIENT_USERNAME)
-LOG.info(s"YARN daemon runs as 
${UserGroupInformation.getCurrentUser.getShortUserName} " +
-  s"' setting user to execute Flink ApplicationMaster/JobManager to 
'${yarnClientUsername}'")
+val yarnClientUsername = 
System.getenv(FlinkYarnClient.ENV_CLIENT_USERNAME)
+LOG.info(s"YARN daemon runs as 
${UserGroupInformation.getCurrentUser.getShortUserName}" +
--- End diff --

I'll leave the Scala string interpolation here. The message is only logged 
once, the info level is on by default, and it helps to clearly distinguish 
between the two usernames.




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-20 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/326#discussion_r23248831
  
--- Diff: 
flink-addons/flink-gelly/src/test/java/org/apache/flink/gelly/test/TestWeaklyConnected.java
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.gelly.test;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.Collection;
+import java.util.LinkedList;
+
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.gelly.Graph;
+import org.apache.flink.test.util.JavaProgramTestBase;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.junit.runners.Parameterized.Parameters;
+
+@RunWith(Parameterized.class)
+public class TestWeaklyConnected extends JavaProgramTestBase {
--- End diff --

I think it's recommended now to use the `MultipleProgramsTestBase` instead 
of the `JavaProgramTestBase` because the MultipleProgramsTestBase is 




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-20 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/326#issuecomment-70717064
  
Great, I'm super excited to see the graph API being offered to the main 
project.

I'll start reviewing the code right away, to merge it as soon as possible.
One question upfront: how did you come up with the name "gelly"? 
Why don't we call the baby by what it is: a "graph api"?

Should we consider moving the classes while preserving their history? That's 
what we did with the streaming system when we merged it. Right now, basically 
all the code from the graph API has one commit in its history (6c31f8e).




[GitHub] flink pull request: [FLINK-1372] [runtime] Fixes Akka logging

2015-01-21 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/329#issuecomment-70856935
  
Good to merge




[GitHub] flink pull request: Release 0.8 Preparations

2015-01-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/294#issuecomment-70418596
  
The changes have been merged to 0.8 and master (except for FLINK-1385, but 
this fix needs to be implemented differently for master. I'll do that as part 
of my yarn pull request).

I'm closing this pull request.




[GitHub] flink pull request: Release 0.8 Preparations

2015-01-18 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/294




[GitHub] flink pull request: [FLINK-1295][FLINK-883] Allow to deploy 'job o...

2015-01-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/292#issuecomment-70416726
  
Thank you for the feedback. I'll address the inline comments.

Regarding the questions:
1. The separation between `flink-yarn` and `flink-yarn-tests` exists 
because the `flink-yarn-tests` expect the `flink-dist` package to be built 
before the `flink-yarn-tests` package.
The tests are really starting a YARN cluster and deploying a flink fat-jar 
to it. Therefore, we need to run `flink-dist` first. I asked on the maven 
mailing list if there is a way of simplifying this and it seems to be possible 
to store the archetype descriptors (used to build the fat-jar) in a separate 
maven module which is then accessed by `flink-dist` and a `prepare-tests` phase 
of `flink-yarn`. But for that approach, we would need to create an additional 
maven module (something like `flink-assemblies` for making the assembly 
descriptor independent of the `flink-dist` package). 

  *tl;dr* maven is not flexible enough for a better solution.
2. Yep. That doesn't make much sense (I used a local YARN cluster for 
testing it; that's why I didn't really stumble across it). I'll fix it.
3. This is true for the old YARN client (before this pull request). As you 
can see here: 
https://github.com/apache/flink/pull/292/files#diff-37b2363833862d636afea47fab39a694L269
 I removed the code that was computing the port. This new YARN client is 
allocating ALL ports dynamically (web frontend, RPC). I'm using YARN to 
transfer the RPC port of the AM to the client.
4. Probably not hard. It's a matter of taste, I guess. I think my approach is 
more flexible (imagine we want to have a `-m mesos-cluster` or a `-m 
flink-local` at some point). Also, we would need to throw an exception if a 
user is setting something like `-j -m myCluster:6123`. It would be good to get 
some more opinions here.
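Regarding point 3 above: the idea of allocating all ports dynamically can be sketched with a minimal, self-contained example (this is not Flink's actual code; the class and method names here are made up for illustration). Binding a server socket to port 0 asks the OS for any free ephemeral port, which the server can then report back to the client — in the YARN setup, via the ApplicationMaster registration.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch of dynamic port allocation: port 0 means
// "let the OS pick a free port", avoiding any hard-coded port
// computation on the client side.
public class DynamicPort {
    public static int bindToFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            // getLocalPort() reveals the OS-assigned ephemeral port,
            // which would then be communicated to the client.
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("allocated port: " + bindToFreePort());
    }
}
```

The actual port number differs on every run, which is exactly why the client has to learn it from the AM rather than compute it.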




[GitHub] flink pull request: [FLINK-1399] Add support for registering Seria...

2015-01-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/305#issuecomment-70424921
  
Maybe we should offer users both options: register and default serializer. 
In some cases, you don't know the exact types and you want to go for a 
default serializer.
I agree that in each case we should register the class.
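The difference between the two options can be sketched with a simplified model (this is not Flink's or Kryo's actual API; the class and method names below are made up for illustration): registering a serializer binds it to one exact class, while a default serializer applies to a whole type hierarchy and acts as a fallback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "register for an exact type" vs.
// "default serializer for a supertype and all its subclasses".
public class SerializerRegistry {
    private final Map<Class<?>, String> exact = new HashMap<>();
    private final Map<Class<?>, String> defaults = new HashMap<>();

    // Applies only to this exact class.
    public void register(Class<?> type, String serializer) {
        exact.put(type, serializer);
    }

    // Applies to the class and all its subclasses.
    public void addDefaultSerializer(Class<?> supertype, String serializer) {
        defaults.put(supertype, serializer);
    }

    public String lookup(Class<?> type) {
        String s = exact.get(type);
        if (s != null) {
            return s; // an exact registration always wins
        }
        for (Map.Entry<Class<?>, String> e : defaults.entrySet()) {
            if (e.getKey().isAssignableFrom(type)) {
                return e.getValue(); // default serializer for the hierarchy
            }
        }
        return "generic"; // fall back to reflection-based serialization
    }

    public static void main(String[] args) {
        SerializerRegistry r = new SerializerRegistry();
        r.register(Integer.class, "intSerializer");
        r.addDefaultSerializer(Number.class, "numberSerializer");
        System.out.println(r.lookup(Integer.class)); // exact match wins
        System.out.println(r.lookup(Double.class));  // covered by the Number default
        System.out.println(r.lookup(String.class));  // no registration at all
    }
}
```

This is why offering both makes sense: exact registrations give predictable behavior for known types, while defaults cover types the user cannot enumerate upfront.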




[GitHub] flink pull request: [FLINK-1504] support for secure HDFS access us...

2015-02-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/383#issuecomment-73912267
  
Looks good to merge




[GitHub] flink pull request: [FLINK-1504] support for secure HDFS access us...

2015-02-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/383#issuecomment-73916367
  
Rebased to master: https://github.com/rmetzger/flink/tree/kerberos_hdfs




[GitHub] flink pull request: [FLINK-1510] Make AvroInputFormat splittable

2015-02-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/382#issuecomment-73917179
  
I'm closing the PR until I've added tests.




[GitHub] flink pull request: support for secure HDFS access using kerberos

2015-02-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/383#issuecomment-73910366
  
https://issues.apache.org/jira/browse/FLINK-1504




[GitHub] flink pull request: [FLINK-1510] Make AvroInputFormat splittable

2015-02-11 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/382




[GitHub] flink pull request: [FLINK-1489] Fixes blocking scheduleOrUpdateCo...

2015-02-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/378#issuecomment-73911265
  
The job that was previously failing is fixed with this change.

We should merge this change ASAP, because it's kinda impossible right now to 
seriously use Flink 0.9-SNAPSHOT without it.




[GitHub] flink pull request: [FLINK-1417] Automatically register types with...

2015-02-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/393#issuecomment-74833886
  
I would like to merge this pull request soon.
@aljoscha, do you agree that we can investigate the performance for the 
PojoComparator also when the change is merged?




[GitHub] flink pull request: [FLINK-1417] Automatically register types with...

2015-02-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/393#issuecomment-74843995
  
Yes. First we need to understand why exactly the performance is so poor. 
Maybe it's an issue we can easily fix.




[GitHub] flink pull request: [FLINK-947] Add a declarative expression API

2015-02-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/405#issuecomment-74832989
  
One more thing: the maven module is called "flink-linq". Are we certain 
that we can use the name "LINQ" without problems here?




[GitHub] flink pull request: [FLINK-1461][api-extending] Add SortPartition ...

2015-02-12 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/381#issuecomment-74120225
  
Looks good to me (I'm uncertain regarding the optimizer changes)




[GitHub] flink pull request: [FLINK-1508] Removes AkkaUtil.ask

2015-02-12 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/384#issuecomment-74119549
  
I vote to merge this quickly and fix issues as they appear. 
The change touches a lot of different parts of the code and is predestined 
to become unmergeable quickly.




[GitHub] flink pull request: [FLINK-1391] Register common Avro types at Kry...

2015-02-12 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/386

[FLINK-1391] Register common Avro types at Kryo



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink kryo081-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #386


commit 5ef83e310c90286b85a5c4f6715c193a56899012
Author: Robert Metzger rmetz...@apache.org
Date:   2015-02-12T11:32:27Z

[FLINK-1391] Register common Avro types at Kryo






[GitHub] flink pull request: [FLINK-1522][gelly] Added test for SSSP Exampl...

2015-02-17 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/414#discussion_r24829996
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/SingleSourceShortestPathsExample.java
 ---
@@ -56,4 +63,77 @@ public static void main(String[] args) throws Exception {
	public String getDescription() {
		return "Single Source Shortest Paths";
	}
+
+	// **************************************************************
+	// UTIL METHODS
+	// **************************************************************
+
+	private static boolean fileOutput = false;
+
+	private static Long srcVertexId = null;
+
+	private static String verticesInputPath = null;
+
+	private static String edgesInputPath = null;
+
+	private static String outputPath = null;
+
+	private static int maxIterations = 5;
+
+	private static boolean parseParameters(String[] args) {
+
+		if (args.length > 0) {
+			if (args.length == 5) {
+				fileOutput = true;
+				srcVertexId = Long.parseLong(args[0]);
+				verticesInputPath = args[1];
+				edgesInputPath = args[2];
+				outputPath = args[3];
+				maxIterations = Integer.parseInt(args[4]);
+			} else {
+				System.err.println("Usage: SingleSourceShortestPaths <source vertex id>" +
+					" <input vertices path> <input edges path> <output path> <num iterations>");
+				return false;
+			}
+		}
+		return true;
+	}
+
+	private static DataSet<Vertex<Long, Double>> getVerticesDataSet(ExecutionEnvironment env) {
+		if (fileOutput) {
+			return env.readCsvFile(verticesInputPath)
+				.lineDelimiter("\n")
+				.types(Long.class, Double.class)
+				.map(new MapFunction<Tuple2<Long, Double>, Vertex<Long, Double>>() {
+
+					@Override
+					public Vertex<Long, Double> map(Tuple2<Long, Double> tuple2) throws Exception {
+						return new Vertex<Long, Double>(tuple2.f0, tuple2.f1);
+					}
+				});
+		} else {
+			System.err.println("Usage: SingleSourceShortestPaths <source vertex id>" +
+				" <input vertices path> <input edges path> <output path> <num iterations>");
+			return null;
--- End diff --

I suspect the code will fail with a null pointer exception?




[GitHub] flink pull request: [FLINK-1466] Add HCatInputFormats to read from...

2015-02-17 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/411#discussion_r24833839
  
--- Diff: 
flink-staging/flink-hcatalog/src/main/java/org/apache/flink/hcatalog/HCatInputFormatBase.java
 ---
@@ -0,0 +1,413 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.hcatalog;
+
+import org.apache.flink.api.common.io.InputFormat;
+import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
+import org.apache.flink.api.common.io.statistics.BaseStatistics;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.hadoop.mapreduce.utils.HadoopUtils;
+import org.apache.flink.api.java.hadoop.mapreduce.wrapper.HadoopInputSplit;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.api.java.typeutils.TupleTypeInfo;
+import org.apache.flink.api.java.typeutils.WritableTypeInfo;
+import org.apache.flink.core.io.InputSplitAssigner;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.JobID;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hive.hcatalog.common.HCatException;
+import org.apache.hive.hcatalog.common.HCatUtil;
+import org.apache.hive.hcatalog.data.DefaultHCatRecord;
+import org.apache.hive.hcatalog.data.HCatRecord;
+import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
+import org.apache.hive.hcatalog.data.schema.HCatSchema;
+
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * A InputFormat to read from HCatalog tables.
+ * The InputFormat supports projection (selection and order of fields) and partition filters.
+ *
+ * Data can be returned as {@link org.apache.hive.hcatalog.data.HCatRecord} or Flink {@link org.apache.flink.api.java.tuple.Tuple}.
+ * Flink Tuples are only supported for primitive type fields
+ * (no STRUCT, ARRAY, or MAP data types) and have a size limitation.
+ *
+ * @param <T>
+ */
+public abstract class HCatInputFormatBase<T> implements InputFormat<T, HadoopInputSplit>, ResultTypeQueryable<T> {
+
+	private static final long serialVersionUID = 1L;
+
+	private Configuration configuration;
+
+	private org.apache.hive.hcatalog.mapreduce.HCatInputFormat hCatInputFormat;
+	private RecordReader<WritableComparable, HCatRecord> recordReader;
+	private boolean fetched = false;
+	private boolean hasNext;
+
+	protected String[] fieldNames = new String[0];
+	protected HCatSchema outputSchema;
+
+	private TypeInformation<T> resultType;
+
+	public HCatInputFormatBase() { }
+
+	/**
+	 * Creates a HCatInputFormat for the given database and table.
+	 * By default, the InputFormat returns {@link org.apache.hive.hcatalog.data.HCatRecord}.
+	 * The return type of the InputFormat can be changed to Flink {@link org.apache.flink.api.java.tuple.Tuple} by calling
+	 * {@link HCatInputFormatBase#asFlinkTuples()}.
+	 *
+	 * @param database The name of the database to read from.
+	 * @param table The name of the table to read.
+	 * @throws java.io.IOException
+	 */
+	public HCatInputFormatBase(String database, String table) throws IOException {
+		this(database, table, new Configuration());
+	}
+
+	/**
+	 * Creates a HCatInputFormat for the given database, table, and
+	 * {@link org.apache.hadoop.conf.Configuration

[GitHub] flink pull request: [FLINK-1545] Fixes AsynchronousFileIOChannelsT...

2015-02-15 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/399#issuecomment-74414666
  
+1




[GitHub] flink pull request: [FLINK-947] Add a declarative expression API

2015-02-17 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/405#discussion_r24858527
  
--- Diff: docs/linq.md ---
@@ -0,0 +1,65 @@
+---
+title: Language-Integrated Queries
+---
+!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+--
+
+* This will be replaced by the TOC
+{:toc}
+
+**Language-Integrated Queries are an experimental feature and can 
currently only be used with
--- End diff --

Good to see some documentation as well!

Which types are supported by the expression API? Only Scala case classes? 
POJOs? Even more?
It would be good if you could add that to the documentation.




[GitHub] flink pull request: [FLINK-1549] Adds proper exception handling to...

2015-02-15 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/397#issuecomment-74414852
  
Changes look good.
Thank you for taking care of this!




[GitHub] flink pull request: [FLINK-1589] Add option to pass configuration ...

2015-02-20 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/427

[FLINK-1589] Add option to pass configuration to LocalExecutor

Please review the changes.

I'll add a testcase and update the documentation later today.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink1589

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #427


commit b75b4c285f4810faa5d02d638b61dc7b8e125c8d
Author: Robert Metzger rmetz...@apache.org
Date:   2015-02-20T11:40:41Z

[FLINK-1589] Add option to pass configuration to LocalExecutor






[GitHub] flink pull request: [FLINK-1589] Add option to pass configuration ...

2015-02-20 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/427#issuecomment-75244751
  
I've added documentation and tests to the change.
Let's see if Travis gives us a green light.




[GitHub] flink pull request: [FLINK-1466] Add HCatInputFormats to read from...

2015-02-18 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/411#issuecomment-74895705
  
Cool. Then I think the change is good to merge.




[GitHub] flink pull request: [FLINK-1501] Add metrics library for monitorin...

2015-02-19 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/421

[FLINK-1501] Add metrics library for monitoring TaskManagers

Hey,
I've spent some time exploring the 
[metrics](https://dropwizard.github.io/metrics/3.1.0/) library for improving 
the performance monitoring in Flink.

This pull request is a first step into that direction. The primary 
objective is a clean integration of the JVM monitoring into our system.

I spent probably 80% of the time making the JavaScript frontend work. 
For that, I've used [rickshaw](https://github.com/shutterstock/rickshaw), a 
library also used by projects like Apache Ambari for creating nice graphs.
Still, the visualization is not perfect and I would like to see incremental 
improvements there.

The next step for me will be metrics for individual jobs.


![newmonitoring](https://cloud.githubusercontent.com/assets/89049/6268186/391731bc-b84b-11e4-8379-cbd5428651c4.png)
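The kind of JVM statistics sampled on each TaskManager can be sketched with plain JDK management beans (a simplified illustration; the actual pull request uses the Dropwizard metrics library with its own registry and reporters, and the class and method names below are made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical sketch of the JVM monitoring data source: heap usage
// read from the platform MemoryMXBean. A metrics library would wrap
// a reading like this in a gauge and sample it periodically.
public class JvmStats {
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsedBytes() + " bytes");
    }
}
```

Periodically sampled values like this are what the frontend graphs render per TaskManager.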

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink1501

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #421


commit 13d17153ccb6adb84f74e72261223b61382f4371
Author: Robert Metzger rmetz...@apache.org
Date:   2015-02-07T10:33:31Z

[FLINK-1501] Add metrics library for monitoring TaskManagers






[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-74638095
  
Cool.
Let's merge this also to master and document it there.




[GitHub] flink pull request: [FLINK-1417] Automatically register types with...

2015-02-17 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/393#issuecomment-74665100
  
Thank you for reviewing the pull request.

I've addressed your remark, added a fix for another issue (FLINK-1567) and 
now I'll run the performance tests.




[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71612618
  
@dan-blanchard What non-JVM language are you looking for?
Maybe we can do a little prototype with that language to see how well it 
works. Maybe you or somebody else from the community is interested in making 
the prototype production-ready?





[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-71613018
  
+1 for merging it

What's the plan for the documentation?




[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71610034
  
In addition to that, we should probably investigate why the clean phase 
is not removing the directory.




[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/345#issuecomment-71656804
  
Good to merge




[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71631049
  
Merging it.




[GitHub] flink pull request: FLINK-1452: Rename 'flink-addons' to 'flink-st...

2015-01-30 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/355

FLINK-1452: Rename 'flink-addons' to 'flink-staging'; add 'flink-contrib'



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink1452

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355


commit 7032cffca0ca581cf954ae085ce18a5e1e908c9a
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-30T14:16:46Z

[FLINK-1452] Rename 'flink-addons' to 'flink-staging'

commit 732a9e37846c8e17d12bba1c28d6f62b317206fa
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-30T14:29:05Z

[FLINK-1452] Add 'flink-contrib' module






[GitHub] flink pull request: [FLINK-1376] [runtime] Add proper shared slot ...

2015-01-31 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/317#issuecomment-72319309
  
Okay, I'll file a JIRA




[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...

2015-01-25 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71366256
  
The only reason I added the two variants is that I want to give users the 
freedom to choose between them. It's one more line of code that might 
(someday) make a user happy.

I hope you agree that it's easier and more efficient for me to just merge 
it as it is rather than to remove the feature again?




[GitHub] flink pull request: [FLINK-1434] [FLINK-1401] Streaming support ad...

2015-01-25 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/334#discussion_r23503538
  
--- Diff: flink-addons/flink-streaming/flink-streaming-core/pom.xml ---
@@ -48,6 +48,12 @@ under the License.
 			<version>${project.version}</version>
 			<scope>test</scope>
 		</dependency>
+
+		<dependency>
+			<groupId>org.apache.sling</groupId>
+			<artifactId>org.apache.sling.commons.json</artifactId>
+			<version>2.0.6</version>
+		</dependency>
--- End diff --

How many new transitive dependencies (and which ones) does this 
dependency add?




[GitHub] flink pull request: [FLINK-1434] [FLINK-1401] Streaming support ad...

2015-01-25 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/334#discussion_r23503540
  
--- Diff: flink-addons/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/StreamGraph.java ---
@@ -536,4 +549,79 @@ public long getIterationTimeout(String vertexName) {
 		return iterationTimeouts.get(vertexName);
 	}
 
+	public String getOperatorName(String vertexName) {
+		return operatorNames.get(vertexName);
+	}
+
+	@Override
+	public String getStreamingPlanAsJSON() {
+
+		try {
+			JSONObject json = new JSONObject();
+			JSONArray nodes = new JSONArray();
+
+			json.put("nodes", nodes);
+
+			for (String id : operatorNames.keySet()) {
+				JSONObject node = new JSONObject();
+				nodes.put(node);
+
+				node.put("id", Integer.valueOf(id));
+				node.put("type", getOperatorName(id));
+
+				if (sources.contains(id)) {
+					node.put("pact", "Data Source");
+				} else {
+					node.put("pact", "Data Stream");
+				}
+
+				node.put("contents", getOperatorName(id) + " at "
+						+ getInvokable(id).getUserFunction().getClass().getSimpleName());
+				node.put("parallelism", getParallelism(id));
+
+				int numIn = getInEdges(id).size();
+				if (numIn > 0) {
+
+					JSONArray inputs = new JSONArray();
+					node.put("predecessors", inputs);
+
+					for (int i = 0; i < numIn; i++) {
+
+						String inID = getInEdges(id).get(i);
+
+						JSONObject input = new JSONObject();
+						inputs.put(input);
+
+						input.put("id", Integer.valueOf(inID));
+						input.put("ship_strategy", getOutPartitioner(inID, id).getStrategy());
+						if (i == 0) {
+							input.put("side", "first");
+						} else if (i == 1) {
+							input.put("side", "second");
+						}
+					}
+				}
+
+			}
+			return json.toString();
+		} catch (Exception e) {
--- End diff --

maybe we should at least LOG.debug() the exception?




[GitHub] flink pull request: [FLINK-1295][FLINK-883] Allow to deploy 'job o...

2015-01-26 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/292




[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71439040
  
You are right, I should have made the requirements clear in the beginning. 
I actually had a discussion with the user about which approach is best.
I took the view that a string-based method is easier for now and 
implemented it.
Then we had the whole discussion here, and I thought that, now that 
everybody is unhappy with my approach, I'd better do exactly what the user 
wants instead of going with a strong opinion.

And now I'm confused and upset. But to be realistic: we only had one user 
so far who wanted to change the filenames at all. If there is ever going to 
be a second or third user, they will either have to make another contribution 
or override their input format.

http://bikeshed.com/ (see also: 
http://en.wikipedia.org/wiki/Parkinson%27s_law_of_triviality)




[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...

2015-01-26 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/338

[Discuss] Simplify SplittableIterator interface

While working on something, I found the SplittableIterator interface 
unnecessarily complicated.
Let me know if you agree to merge this simplification.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink fix_interface

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #338


commit 4c2e3fb272c263d149e7b839fb2d9d496232a8be
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-25T18:52:02Z

Simplify SplittableIterator interface

commit fc0aff770fc51f7b944c6ef1c7f17c4a79d0c2ca
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-25T18:55:46Z

fix






[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71429315
  
It depends on what the user does with the `HADOOP_CLASSPATH`.
In my understanding, it is meant as a variable for adding 3rd-party jar 
files to Hadoop's classpath. Hadoop's own jar files are added to the 
`CLASSPATH` variable in the `libexec/hadoop-config.sh` script. There, you see 
variables like `HADOOP_COMMON_LIB_JARS_DIR`, `HDFS_LIB_JARS_DIR`, 
`YARN_LIB_JARS_DIR`, ... being added to the `CLASSPATH`. In the very last 
step, they append the `HADOOP_CLASSPATH` variable (by default at the end of 
the classpath, but there is an additional option to put it in front).

I found that we need to add this on Google Compute Engine's Hadoop 
deployment. It has Google Storage configured by default, but that currently 
doesn't work in non-YARN setups because the Google Storage jar is not on our 
classpath. On these clusters, the `HADOOP_CLASSPATH` variable contains the 
path to the storage jar.
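
Just to illustrate the mechanism (a sketch, not the actual Flink start 
script; the paths and the function name are made up):

```shell
#!/usr/bin/env bash
# Sketch: append HADOOP_CLASSPATH to an existing classpath, the way
# hadoop-config.sh does it -- user-supplied jars go last by default,
# so the jars already on the classpath keep precedence.
build_classpath() {
    local cp="$1"
    if [ -n "$HADOOP_CLASSPATH" ]; then
        cp="$cp:$HADOOP_CLASSPATH"
    fi
    echo "$cp"
}

# On such a GCE cluster, HADOOP_CLASSPATH would point at the
# Google Storage connector jar (hypothetical path):
HADOOP_CLASSPATH="/opt/connectors/gcs-connector.jar"
build_classpath "/opt/flink/lib/*"
# -> /opt/flink/lib/*:/opt/connectors/gcs-connector.jar
```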




[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71432009
  
I've completely changed the mechanism of setting a custom file name.




[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71434721
  
The user who requested this feature actually asked for a custom method to 
override.

There is one more important use case for a custom method: if users want to 
have files named exactly like Hadoop's, they also need a method. Hadoop uses 
6-digit numbers, padded with zeroes (for example part-m-000001).
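
For illustration, zero-padded names in that style could be produced like 
this (a sketch; the 6-digit width is just what is described above, and the 
function name is made up):

```shell
#!/usr/bin/env bash
# Format a task number as a Hadoop-style part file name,
# zero-padded to 6 digits as described above.
hadoop_part_name() {
    printf 'part-m-%06d\n' "$1"
}

hadoop_part_name 1    # part-m-000001
hadoop_part_name 42   # part-m-000042
```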

I'm tired of changing the pull request until everybody is happy. It's very 
inefficient, and I have better things to do with my time. If you want, you can 
change the code once it's merged, but I have rewritten this 3 times now.




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-01-25 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71376640
  
Thank you.
Looks good. History is preserved and you addressed my comments.

+1 for merging it.




[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-25 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/337

[FLINK-1433] Add HADOOP_CLASSPATH to start scripts



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink FLINK-1433

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #337


commit b9d1140df9b3232be53c105636f370f1d11aca37
Author: Robert Metzger rmetz...@apache.org
Date:   2015-01-25T15:05:20Z

[FLINK-1433] Add HADOOP_CLASSPATH to start scripts






[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-25 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/331#issuecomment-71376859
  
@mxm: Since you have some experience with the CliFrontend now, would you 
mind looking into https://issues.apache.org/jira/browse/FLINK-1424 as well?



