[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291766#comment-14291766
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71453267
  
I've rebased and updated this PR.

Notable new stuff:
* hybrid mode removed
* documentation updated and integrated into the website
* **chaining** on the Python side (map, flatmap, filter, combine)
* groupreduce/cogroup reworked - grouping done on the Python side
* iterators passed to UDFs are now iterable
* **lambda support**
* **test coverage** (works from the IDE, Maven and on Travis)
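For readers unfamiliar with what chaining buys here: consecutive map/filter/flatmap UDFs can be fused so each record makes a single pass through the chain instead of being serialized between operators. A rough, framework-independent sketch of the idea (the names are illustrative, not the actual Flink Python API):

```python
# Sketch: fusing a pipeline of map/filter/flatmap steps into one pass per
# record. Illustrative only -- not the actual Flink Python API.

def chain(*steps):
    """Compose (kind, fn) steps into a single generator-based function."""
    def run(records):
        for record in records:
            items = [record]
            for kind, fn in steps:
                if kind == "map":
                    items = [fn(x) for x in items]
                elif kind == "filter":
                    items = [x for x in items if fn(x)]
                elif kind == "flatmap":
                    items = [y for x in items for y in fn(x)]
            for item in items:
                yield item
    return run

pipeline = chain(
    ("map", lambda x: x * 2),        # plain lambdas work as UDFs
    ("filter", lambda x: x > 2),
    ("flatmap", lambda x: [x, x + 1]),
)
print(list(pipeline([1, 2, 3])))  # -> [4, 5, 6, 7]
```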


 Create a general purpose framework for language bindings
 

 Key: FLINK-377
 URL: https://issues.apache.org/jira/browse/FLINK-377
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 A general-purpose API to run operators with arbitrary binaries. 
 This will allow running Stratosphere programs written in Python, JavaScript, 
 Ruby, Go or whatever you like. 
 We suggest using Google Protocol Buffers for data serialization. This is the 
 list of languages that currently support ProtoBuf: 
 https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
 Very early prototype with Python: 
 https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
 protobuf)
 For Ruby: https://github.com/infochimps-labs/wukong
 Two new students working at Stratosphere (@skunert and @filiphaase) are 
 working on this.
 The reference binding language will be Python, but other bindings are 
 very welcome.
 The best name for this so far is stratosphere-lang-bindings.
 I created this issue to track the progress (and give everybody a chance to 
 comment on this).
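Whatever serialization library is chosen, such a binding boils down to shipping framed messages between the JVM and an external process. A minimal sketch of length-prefixed framing, the kind of envelope protobuf messages would travel in (purely illustrative, not the proposed wire format):

```python
# Sketch: length-prefixed message framing between two processes.
# Illustrative only -- not the actual Stratosphere/Flink wire format.
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes):
    """Yield the payloads back out of a concatenated byte stream."""
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length

msgs = [b"hello", b"world"]
stream = b"".join(frame(m) for m in msgs)
assert list(unframe(stream)) == msgs
```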
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/377
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: enhancement, 
 Assignee: [filiphaase|https://github.com/filiphaase]
 Created at: Tue Jan 07 19:47:20 CET 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291803#comment-14291803
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71458700
  
Wow, great news! :-)

In general, I think we really have to do something about getting the 
changes in. The PR is growing faster than it's getting feedback. Has anybody 
looked into this and tried it out recently?




[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-26 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71458700
  
Wow, great news! :-)

In general, I think we really have to do something about getting the 
changes in. The PR is growing faster than it's getting feedback. Has anybody 
looked into this and tried it out recently?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291770#comment-14291770
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526836
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java ---
@@ -42,86 +44,82 @@
 public class SlotSharingGroupAssignment implements Serializable {
 
 	static final long serialVersionUID = 42L;
-	
+
 	private static final Logger LOG = Scheduler.LOG;
-	
+
 	private transient final Object lock = new Object();
-	
+
 	/** All slots currently allocated to this sharing group */
 	private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>();
-	
+
 	/** The slots available per vertex type (jid), keyed by instance, to make them locatable */
 	private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>();
-	
-	
+
 	// ------------------------------------------------------------------------
-	
-	
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex) {
-		JobVertexID id = vertex.getJobvertexId();
-		return addNewSlotWithTask(slot, id, id);
-	}
-	
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex, CoLocationConstraint constraint) {
-		AbstractID groupId = constraint.getGroupId();
-		return addNewSlotWithTask(slot, groupId, null);
-	}
-	
-	private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID groupId, JobVertexID vertexId) {
-		
-		final SharedSlot sharedSlot = new SharedSlot(slot, this);
-		final Instance location = slot.getInstance();
-		
+
+	public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot sharedSlot, Locality locality,
+			AbstractID groupId, CoLocationConstraint constraint) {
--- End diff --

indentation?


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to deduce all different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.
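The first point, deducing the various timeouts from one base value while still honoring explicitly configured ones, might look like this (key names, factors, and the default are illustrative assumptions, not actual Flink/Akka configuration):

```python
# Sketch: derive all specific timeouts from a single base timeout, with
# explicit settings taking precedence. Key names are hypothetical.

DERIVED = {
    "akka.ask.timeout":     lambda base: base,        # plain request/response
    "akka.lookup.timeout":  lambda base: base / 10,   # actor lookup is cheap
    "akka.startup.timeout": lambda base: base * 10,   # startup can be slow
}

def resolve_timeouts(config, base_key="akka.timeout"):
    """Fill in every timeout, preferring explicit values over derived ones."""
    base = config.get(base_key, 100.0)  # seconds; hypothetical default
    return {key: config.get(key, derive(base))
            for key, derive in DERIVED.items()}

t = resolve_timeouts({"akka.timeout": 60.0, "akka.lookup.timeout": 2.0})
assert t["akka.ask.timeout"] == 60.0    # deduced from the single base value
assert t["akka.lookup.timeout"] == 2.0  # explicit setting still wins
```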





[jira] [Commented] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291783#comment-14291783
 ] 

ASF GitHub Bot commented on FLINK-1168:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/264#issuecomment-71455623
  
Updated the PR and will merge once Travis completed the build.


 Support multi-character field delimiters in CSVInputFormats
 ---

 Key: FLINK-1168
 URL: https://issues.apache.org/jira/browse/FLINK-1168
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.7.0-incubating
Reporter: Fabian Hueske
Assignee: Manu Kaul
Priority: Minor
  Labels: starter

 The CSVInputFormat supports multi-char (String) line delimiters, but only 
 single-char (char) field delimiters.
 This issue proposes to add support for multi-char field delimiters.
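For illustration, the difference is only in the splitting logic; a hedged sketch of multi-char field splitting (not the actual CSVInputFormat code):

```python
# Sketch: splitting a record on a field delimiter of arbitrary length.
# Illustrative only -- not Flink's CSVInputFormat implementation.

def split_fields(line: str, field_delim: str):
    """Split one record on a non-empty delimiter of any length."""
    fields, start = [], 0
    while True:
        pos = line.find(field_delim, start)
        if pos == -1:
            fields.append(line[start:])
            return fields
        fields.append(line[start:pos])
        start = pos + len(field_delim)

print(split_fields("a||b||c", "||"))  # -> ['a', 'b', 'c']
```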





[jira] [Updated] (FLINK-1424) bin/flink run does not recognize -c parameter anymore

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1424:
--
Assignee: Max Michels

 bin/flink run does not recognize -c parameter anymore
 -

 Key: FLINK-1424
 URL: https://issues.apache.org/jira/browse/FLINK-1424
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: master
Reporter: Carsten Brandt
Assignee: Max Michels

 bin/flink binary does not recognize `-c` parameter anymore which specifies 
 the class to run:
 {noformat}
 $ ./flink run /path/to/target/impro3-ws14-flink-1.0-SNAPSHOT.jar -c 
 de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator /tmp/flink/testgraph.txt 
 1
 usage: emma-experiments-impro3-ss14-flink
[-?]
 emma-experiments-impro3-ss14-flink: error: unrecognized arguments: '-c'
 {noformat}
 Before this, the command worked fine and executed the job.
 I tracked it down to the following commit using `git bisect`:
 {noformat}
 93eadca782ee8c77f89609f6d924d73021dcdda9 is the first bad commit
 commit 93eadca782ee8c77f89609f6d924d73021dcdda9
 Author: Alexander Alexandrov alexander.s.alexand...@gmail.com
 Date:   Wed Dec 24 13:49:56 2014 +0200
 [FLINK-1027] [cli] Added support for '--' and '-' prefixed tokens in CLI 
 program arguments.
 
 This closes #278
 :04 04 a1358e6f7fe308b4d51a47069f190a29f87fdeda 
 d6f11bbc9444227d5c6297ec908e44b9644289a9 Mflink-clients
 {noformat}
 https://github.com/apache/flink/commit/93eadca782ee8c77f89609f6d924d73021dcdda9
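 The error above comes from the user program's own argument parser, which suggests the launcher stopped consuming its options once it saw the jar path and forwarded everything after it to the program. A hedged sketch of that parsing model (names are illustrative, not the actual CLI frontend code):

```python
# Sketch: why option order matters. A launcher that stops scanning for its
# own options at the first positional argument (the jar) forwards a later
# '-c' to the user program, whose parser then rejects it. Illustrative only.

def parse_launcher_args(argv):
    """Split argv into launcher options and program arguments."""
    launcher_opts, i = {}, 0
    while i < len(argv):
        if argv[i] == "-c":            # launcher's own option: entry class
            launcher_opts["class"] = argv[i + 1]
            i += 2
        else:
            # first positional = jar; the rest belongs to the program
            launcher_opts["jar"] = argv[i]
            return launcher_opts, argv[i + 1:]
    return launcher_opts, []

opts, program_args = parse_launcher_args(
    ["-c", "my.Main", "job.jar", "-c", "program-flag"])
# '-c my.Main' is consumed; the later '-c' is handed to the program untouched
```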





[jira] [Assigned] (FLINK-1443) Add replicated data source

2015-01-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske reassigned FLINK-1443:


Assignee: Fabian Hueske

 Add replicated data source
 --

 Key: FLINK-1443
 URL: https://issues.apache.org/jira/browse/FLINK-1443
 Project: Flink
  Issue Type: New Feature
  Components: Java API, JobManager, Optimizer
Affects Versions: 0.9
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Minor

 This issue proposes to add support for data sources that read the same data 
 in all parallel instances. This feature can be useful, if the data is 
 replicated to all machines in a cluster and can be locally read. 
 For example, a replicated input format can be used for a broadcast join 
 without sending any data over the network.
 The following changes are necessary to achieve this:
 1) Add a replicating InputSplitAssigner which assigns all splits to all 
 parallel instances. This also requires extending the InputSplitAssigner 
 interface to identify the exact parallel instance that requests an InputSplit 
 (currently only the hostname is provided).
 2) Make sure that the DOP of the replicated data source is identical to the 
 DOP of its successor.
 3) Let the optimizer know that the data is replicated and ensure that plan 
 enumeration works correctly.
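 Point 1 can be sketched as follows, with each parallel instance independently walking the full split list (the names are hypothetical, not the actual InputSplitAssigner interface):

```python
# Sketch of the proposed replicating assigner: every parallel instance
# receives every split, instead of each split going to exactly one instance.
# Illustrative names only -- not Flink's InputSplitAssigner interface.

class ReplicatingSplitAssigner:
    def __init__(self, splits):
        self.splits = list(splits)
        self.cursor = {}  # per-instance position, keyed by instance id

    def next_split(self, instance_id):
        """Return the next split for this exact instance, or None when done."""
        pos = self.cursor.get(instance_id, 0)
        if pos >= len(self.splits):
            return None
        self.cursor[instance_id] = pos + 1
        return self.splits[pos]

assigner = ReplicatingSplitAssigner(["split-0", "split-1"])
# both instances independently receive the full set of splits
for instance in ("instance-a", "instance-b"):
    seen = []
    while (s := assigner.next_split(instance)) is not None:
        seen.append(s)
    assert seen == ["split-0", "split-1"]
```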





[jira] [Assigned] (FLINK-1105) Add support for locally sorted output

2015-01-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske reassigned FLINK-1105:


Assignee: Fabian Hueske

 Add support for locally sorted output
 -

 Key: FLINK-1105
 URL: https://issues.apache.org/jira/browse/FLINK-1105
 Project: Flink
  Issue Type: Sub-task
  Components: Java API
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Minor

 This feature will make it possible to sort the output which is sent to an 
 OutputFormat to obtain a locally sorted result.
 This feature was available in the old Java API and has not been ported to the 
 new Java API yet. Hence, the optimizer and runtime should already support 
 this feature. However, the API and job generation part is missing.
 It is also a subfeature of FLINK-598, which will also provide globally sorted 
 results.
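 A minimal sketch of the semantics (illustrative Python, not the Java API in question): sorting happens per parallel partition before records reach the output format, with no global order across partitions.

```python
# Sketch: "locally sorted output" sorts each parallel partition on its own;
# globally sorted results (FLINK-598) would additionally order partitions.

def write_locally_sorted(partitions, key=lambda x: x):
    """Sort each partition independently; return what would be written."""
    return [sorted(part, key=key) for part in partitions]

out = write_locally_sorted([[3, 1, 2], [9, 7, 8]])
assert out == [[1, 2, 3], [7, 8, 9]]  # each partition sorted on its own
```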





[jira] [Resolved] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats

2015-01-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1168.
--
Resolution: Fixed

Fixed with 0548a93dfc555a5403590f147d4850c730facaf6



[GitHub] flink pull request: [FLINK-1344] [streaming] Added static StreamEx...

2015-01-26 Thread senorcarbone
GitHub user senorcarbone opened a pull request:

https://github.com/apache/flink/pull/341

[FLINK-1344] [streaming] Added static StreamExecutionEnvironment 
initialisation and Implicits for scala sources

This PR addresses ticket [1], adding further Scala construct 
interoperability. I had to add static StreamExecutionEnvironment 
initialisation to make the implicit conversion possible. 

[1] https://issues.apache.org/jira/browse/FLINK-1344

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbalassi/flink scala-seq

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #341


commit 2ea675895d605e5c0a442171388c7be9361acf79
Author: Paris Carbone seniorcarb...@gmail.com
Date:   2015-01-23T16:23:46Z

[FLINK-1344] [streaming] [scala] Added implicits from scala seq to 
datastream and static StreamExecutionEnvironment initialization






[GitHub] flink pull request: Mk amulti char delim

2015-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/247




[jira] [Closed] (FLINK-1153) Yarn container does not terminate if Flink's yarn client is terminated before the application master is completely started

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-1153.
-
Resolution: Won't Fix

 Yarn container does not terminate if Flink's yarn client is terminated before 
 the application master is completely started
 --

 Key: FLINK-1153
 URL: https://issues.apache.org/jira/browse/FLINK-1153
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Reporter: Till Rohrmann
Assignee: Robert Metzger

 The yarn application master container does not terminate if the yarn client 
 is terminated after the container request is issued but before the 
 application master is completely started.





[jira] [Closed] (FLINK-1016) Add directory for user-contributed programs

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-1016.
-
Resolution: Duplicate

 Add directory for user-contributed programs
 ---

 Key: FLINK-1016
 URL: https://issues.apache.org/jira/browse/FLINK-1016
 Project: Flink
  Issue Type: Task
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Trivial

 As a result of this discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Where-to-host-user-contributed-Flink-programs-td750.html
  
 I'm going to add a directory for user-contributed Flink programs that are 
 reviewed by committers, but not shipped with releases. In addition, we 
 require that these programs provide test cases to improve test diversity and 
 the overall coverage.
 The first algorithm I'm going to add is the contribution made in FLINK-904.





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71470142
  
Yes, I thought it would handle it as well, but I just ran it and it didn't 
work. 




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291796#comment-14291796
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/333#discussion_r23528552
  
--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

I couldn't find a document stating that $basedir is deprecated, but I think 
you are right in the sense that the project prefix is used for everything 
related to the POM of the project (I think in previous versions the (now 
deprecated) prefix was `pom` and both the `version` and `basedir` properties 
are built-ins).

We use $basedir in other places as well.


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start up and submissions scripts in bin assemble the class path
 - in case the project hasn't been built yet, inform the user
 The changes would make it easier to work with Flink from source.





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/333#discussion_r23528552
  
--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

I couldn't find a document stating that $basedir is deprecated, but I think 
you are right in the sense that the project prefix is used for everything 
related to the POM of the project (I think in previous versions the (now 
deprecated) prefix was `pom` and both the `version` and `basedir` properties 
are built-ins).

We use $basedir in other places as well.




[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...

2015-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/264




[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291851#comment-14291851
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71465448
  
Haven't had a closer look yet, but one thing that I noticed is the naming 
of the test files. 
In the current codebase all tests are named XyzTest (or XyzITCase) instead 
of TestXyz. Not sure if it's worth changing though...


 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph





[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291880#comment-14291880
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71468579
  
Very good catch.

I would address @aljoscha's comment and add the exclude. After that, I 
think it's fine to merge.

I've tested it locally and it works fine.




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291886#comment-14291886
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71469342
  
There's another minor thing. After running mvn clean, the symlink will 
point to a non-existing directory.




[jira] [Resolved] (FLINK-956) Add a parameter to the YARN command line script that allows to define the amount of MemoryManager memory

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-956.
--
   Resolution: Fixed
Fix Version/s: 0.7.0-incubating

This has been fixed as described in the comments.

 Add a parameter to the YARN command line script that allows to define the 
 amount of MemoryManager memory
 

 Key: FLINK-956
 URL: https://issues.apache.org/jira/browse/FLINK-956
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Affects Versions: 0.6-incubating
Reporter: Stephan Ewen
Assignee: Robert Metzger
 Fix For: 0.7.0-incubating


 The current parameter specifies the YARN container size.
 It would be nice to have parameters for the JVM heap size 
 {{taskmanager.heap.mb}} and the amount of memory that goes to the memory 
 manager {{taskmanager.memory.size}}. Both should be optional.
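 The proposed derivation could be sketched as below; the split ratios and defaults are illustrative assumptions, not Flink's actual values:

```python
# Sketch of the requested split: today a single container-size parameter,
# versus optional explicit values for JVM heap and managed memory.
# The 75% / 70% ratios here are hypothetical, not Flink's real defaults.

def derive_memory(container_mb, heap_mb=None, managed_mb=None):
    """Fill in taskmanager.heap.mb / taskmanager.memory.size when unset."""
    if heap_mb is None:
        heap_mb = int(container_mb * 0.75)  # leave headroom for JVM overhead
    if managed_mb is None:
        managed_mb = int(heap_mb * 0.7)     # managed memory comes off the heap
    return {"taskmanager.heap.mb": heap_mb,
            "taskmanager.memory.size": managed_mb}

# both parameters optional: explicit values override the derived ones
print(derive_memory(1024))
print(derive_memory(1024, heap_mb=800, managed_mb=400))
```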





[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291904#comment-14291904
 ] 

ASF GitHub Bot commented on FLINK-1328:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-71470851
  
Addressed most comments and renamed constantFields/Sets to forwardedFields 
as discussed on dev-ml.
Would like to merge this soon.


 Rework Constant Field Annotations
 -

 Key: FLINK-1328
 URL: https://issues.apache.org/jira/browse/FLINK-1328
 Project: Flink
  Issue Type: Improvement
  Components: Java API, Optimizer, Scala API
Affects Versions: 0.7.0-incubating
Reporter: Fabian Hueske
Assignee: Fabian Hueske

 Constant field annotations are used by the optimizer to determine whether 
 physical data properties such as sorting or partitioning are retained by user 
 defined functions.
 The current implementation is limited and can be extended in several ways:
 - Fields that are copied to other positions
 - Field definitions for non-tuple data types (Pojos)
 There is a pull request (#83) that goes in this direction and can be 
 extended.
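 To make the semantics concrete: a field is "forwarded" if the UDF copies its value, possibly to a different output position, unchanged. A toy sample-based checker of that property (illustrative only, not the actual annotation machinery):

```python
# Sketch of forwarded-field semantics: input field `src` is forwarded to
# output field `dst` if the UDF copies its value there unchanged. This is
# a toy checker over sample records, not Flink's annotation machinery.

def forwards(udf, samples, src, dst):
    """True if every sample's field `src` reappears at output field `dst`."""
    return all(udf(rec)[dst] == rec[src] for rec in samples)

# UDF that keeps field 0 in place and copies field 1 to position 2
udf = lambda t: (t[0], t[1] * 2, t[1])

samples = [(1, 2), (3, 4)]
assert forwards(udf, samples, src=0, dst=0)      # field 0 kept in place
assert forwards(udf, samples, src=1, dst=2)      # field 1 copied to position 2
assert not forwards(udf, samples, src=1, dst=1)  # field 1 itself is modified
```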





[jira] [Resolved] (FLINK-1453) Integration tests for YARN failing on OS X

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1453.
---
   Resolution: Fixed
Fix Version/s: 0.9

Issue resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/3bdeab1b.

Thanks to [~uce] for giving me ssh access to his Mac :D

 Integration tests for YARN failing on OS X
 --

 Key: FLINK-1453
 URL: https://issues.apache.org/jira/browse/FLINK-1453
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 The flink yarn tests are failing on OS X, most likely through a port conflict:
 {code}
 11:59:38,870 INFO  org.eclipse.jetty.util.log 
- jetty-0.9-SNAPSHOT
 11:59:38,885 WARN  org.eclipse.jetty.util.log 
- FAILED SelectChannelConnector@0.0.0.0:8081: java.net.BindException: 
 Address already in use
 11:59:38,885 WARN  org.eclipse.jetty.util.log 
- FAILED org.eclipse.jetty.server.Server@281c7736: java.net.BindException: 
 Address already in use
 11:59:38,892 ERROR akka.actor.OneForOneStrategy   
- Address already in use
 akka.actor.ActorInitializationException: exception during creation
 at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
 at akka.actor.ActorCell.create(ActorCell.scala:596)
 at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
 at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
 at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:444)
 at sun.nio.ch.Net.bind(Net.java:436)
 at 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
 at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:208)
 at 
 org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:288)
 at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
 at org.eclipse.jetty.server.Server.doStart(Server.java:254)
 at 
 org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
 at 
 org.apache.flink.runtime.jobmanager.web.WebInfoServer.start(WebInfoServer.java:198)
 at 
 org.apache.flink.runtime.jobmanager.WithWebServer$class.$init$(WithWebServer.scala:28)
 at 
 org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2$$anon$1.init(ApplicationMaster.scala:181)
 at 
 org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2.apply(ApplicationMaster.scala:181)
 at 
 org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2.apply(ApplicationMaster.scala:181)
 at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
 at akka.actor.Props.newActor(Props.scala:252)
 at akka.actor.ActorCell.newActor(ActorCell.scala:552)
 at akka.actor.ActorCell.create(ActorCell.scala:578)
 ... 10 more
 {code}
 The issue does not appear on Travis or on Arch Linux (however, tests are also 
 failing on some Ubuntu versions)
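The collision above comes from the test's web server binding a fixed port (8081). A minimal, self-contained sketch of the usual way to dodge this in tests — not the actual Flink fix, and `FreePortPicker` is a hypothetical name — is to bind port 0 and let the OS hand out a free ephemeral port:

```java
import java.net.ServerSocket;

// Hedged sketch: binding port 0 asks the OS for any free ephemeral port,
// which avoids the "Address already in use" BindException seen above.
public class FreePortPicker {

    /** Returns a port that was free at the moment of the call. */
    public static int pickFreePort() throws Exception {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("free port: " + pickFreePort());
    }
}
```

Note the small race: the port is only guaranteed free at the moment of the probe, so the server should ideally bind port 0 itself and report the port it got.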



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction

2015-01-26 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-1428.

Resolution: Fixed

Fixed in 06b2acf.

 Typos in Java code example for RichGroupReduceFunction
 --

 Key: FLINK-1428
 URL: https://issues.apache.org/jira/browse/FLINK-1428
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 http://flink.apache.org/docs/0.7-incubating/dataset_transformations.html
 String key = null //missing ';'
 public void combine(Iterable<Tuple3<String, Integer, Double>> in,
   Collector<Tuple3<String, Integer, Double>> out))
 -- one ')' too much
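For reference, a corrected, self-contained form of the snippet: the `;` is added and the stray `)` removed. Flink's `Tuple3` and `Collector` are replaced by minimal stand-ins here (the real docs example depends on the Flink API), and the summing body is an illustrative assumption, not the documented one:

```java
import java.util.Arrays;
import java.util.List;

public class CombineExample {

    // Minimal stand-ins for Flink's Collector and Tuple3, for illustration only
    interface Collector<T> { void collect(T record); }

    static class Tuple3<A, B, C> {
        final A f0; final B f1; final C f2;
        Tuple3(A f0, B f1, C f2) { this.f0 = f0; this.f1 = f1; this.f2 = f2; }
    }

    // was: "String key = null" (missing ';') and "...Double> out))" (extra ')')
    public static void combine(Iterable<Tuple3<String, Integer, Double>> in,
                               Collector<Tuple3<String, Integer, Double>> out) {
        String key = null;
        int intSum = 0;
        double doubleSum = 0.0;
        for (Tuple3<String, Integer, Double> t : in) {
            key = t.f0;
            intSum += t.f1;
            doubleSum += t.f2;
        }
        if (key != null) {
            out.collect(new Tuple3<>(key, intSum, doubleSum));
        }
    }

    public static void main(String[] args) {
        List<Tuple3<String, Integer, Double>> in = Arrays.asList(
                new Tuple3<>("a", 1, 0.5), new Tuple3<>("a", 2, 1.5));
        combine(in, t -> System.out.println(t.f0 + " " + t.f1 + " " + t.f2));
    }
}
```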





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/333#discussion_r23527422
  
--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
				</gitDescribe>
			</configuration>
		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

Isn't `basedir` deprecated? In the next line you use `project.basedir`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291781#comment-14291781
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/333#discussion_r23527422
  
--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
				</gitDescribe>
			</configuration>
		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

Isn't `basedir` deprecated? In the next line you use `project.basedir`.


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start up and submissions scripts in bin assemble the class path
 - in case the project hasn't been built yet, inform the user
 The changes would make it easier to work with Flink from source.





[jira] [Updated] (FLINK-1320) Add an off-heap variant of the managed memory

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1320:
--
Assignee: Max Michels

 Add an off-heap variant of the managed memory
 -

 Key: FLINK-1320
 URL: https://issues.apache.org/jira/browse/FLINK-1320
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Assignee: Max Michels
Priority: Minor

 For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
 hash tables, caching), we use a special way of representing data serialized 
 across a set of memory pages. The big work lies in the way the algorithms are 
 implemented to operate on pages, rather than on objects.
 The core class for the memory is the {{MemorySegment}}, which has all methods 
 to set and get primitives values efficiently. It is a somewhat simpler (and 
 faster) variant of a HeapByteBuffer.
 As such, it should be straightforward to create a version where the memory 
 segment is not backed by a heap byte[], but by memory allocated outside the 
 JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
 buffers do it.
 This may have multiple advantages:
   - We reduce the size of the JVM heap (garbage collected) and the number and 
 size of long-lived objects. For large JVM sizes, this may improve 
 performance quite a bit. Ultimately, we would in many cases reduce the JVM size 
 to 1/3 to 1/2 and keep the remaining memory outside the JVM.
   - We save copies when we move memory pages to disk (spilling) or through 
 the network (shuffling / broadcasting / forward piping)
 The changes required to implement this are
   - Add a {{UnmanagedMemorySegment}} that only stores the memory address as a 
 long, and the segment size. It is initialized from a DirectByteBuffer.
   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
 current ones.
   - Make sure that the startup scripts pick up the mode and configure the heap 
 size and the max direct memory properly.
 Since the MemorySegment is probably the most performance critical class in 
 Flink, we must take care that we do this right. The following are critical 
 considerations:
   - If we want both solutions (heap and off-heap) to exist side-by-side 
 (configurable), we must make the base MemorySegment abstract and implement 
 two versions (heap and off-heap).
   - To get the best performance, we need to make sure that only one class 
 gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
 and inlining.
   - We should carefully measure the performance of both variants. From 
 previous micro benchmarks, I remember that individual byte accesses in 
 DirectByteBuffers (off-heap) were slightly slower than on-heap, any larger 
 accesses were equally good or slightly better.
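 The heap/off-heap split described above can be sketched in a few lines. This is illustrative only — the class names are hypothetical and not Flink's real `MemorySegment` hierarchy — but it shows the intended shape: one abstract base with exactly two `final` concrete subclasses, so that when only one of them is ever loaded, calls stay monomorphic and the JIT can de-virtualize and inline them.

```java
import java.nio.ByteBuffer;

public class MemorySegmentSketch {

    static abstract class Segment {
        abstract void putInt(int index, int value);
        abstract int getInt(int index);
    }

    /** Backed by a garbage-collected byte[], via a heap ByteBuffer. */
    static final class HeapSegment extends Segment {
        private final ByteBuffer buf;
        HeapSegment(int size) { this.buf = ByteBuffer.allocate(size); }
        void putInt(int index, int value) { buf.putInt(index, value); }
        int getInt(int index) { return buf.getInt(index); }
    }

    /** Backed by memory outside the JVM heap, as NIO DirectByteBuffers do it. */
    static final class OffHeapSegment extends Segment {
        private final ByteBuffer buf;
        OffHeapSegment(int size) { this.buf = ByteBuffer.allocateDirect(size); }
        void putInt(int index, int value) { buf.putInt(index, value); }
        int getInt(int index) { return buf.getInt(index); }
    }

    public static void main(String[] args) {
        Segment heap = new HeapSegment(64);
        Segment offHeap = new OffHeapSegment(64);
        heap.putInt(0, 42);
        offHeap.putInt(0, 42);
        System.out.println(heap.getInt(0) + " " + offHeap.getInt(0));
    }
}
```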





[GitHub] flink pull request: [FLINK-1428] Update dataset_transformations.md

2015-01-26 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/340#issuecomment-71453921
  
+1

Will merge this later to `master` and `release-0.8`.




[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291773#comment-14291773
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526978
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -329,6 +330,15 @@ public void unregisterMemoryManager(MemoryManager 
memoryManager) {
}
}
 
+   protected void notifyExecutionStateChange(ExecutionState executionState,
+   
Throwable optionalError) {
--- End diff --

This also seems weird 


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to deduce all different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.
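 The "deduce all timeouts from a single value, but let explicit settings win" idea can be sketched as follows. This is a hedged illustration, not Flink's configuration code; `TimeoutConfig`, the key names, and the scaling factors are hypothetical:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class TimeoutConfig {
    private final Duration base;
    private final Map<String, Duration> overrides = new HashMap<>();

    TimeoutConfig(Duration base) { this.base = base; }

    /** Explicit user setting for one specific timeout. */
    TimeoutConfig set(String key, Duration value) {
        overrides.put(key, value);
        return this;
    }

    /** Derived timeout: the explicit value if present, else base scaled by factor. */
    Duration get(String key, double factor) {
        return overrides.getOrDefault(key,
                Duration.ofMillis((long) (base.toMillis() * factor)));
    }

    public static void main(String[] args) {
        TimeoutConfig c = new TimeoutConfig(Duration.ofSeconds(10))
                .set("akka.ask.timeout", Duration.ofSeconds(60));
        System.out.println(c.get("akka.ask.timeout", 1.0));    // explicit value wins
        System.out.println(c.get("akka.lookup.timeout", 0.5)); // derived from base
    }
}
```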





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71454018
  
Very nice. +1

Will merge this later.




[jira] [Commented] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291771#comment-14291771
 ] 

ASF GitHub Bot commented on FLINK-1428:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/340#issuecomment-71453921
  
+1

Will merge this later to `master` and `release-0.8`.


 Typos in Java code example for RichGroupReduceFunction
 --

 Key: FLINK-1428
 URL: https://issues.apache.org/jira/browse/FLINK-1428
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 http://flink.apache.org/docs/0.7-incubating/dataset_transformations.html
 String key = null //missing ';'
 public void combine(Iterable<Tuple3<String, Integer, Double>> in,
   Collector<Tuple3<String, Integer, Double>> out))
 -- one ')' too much





[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526978
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -329,6 +330,15 @@ public void unregisterMemoryManager(MemoryManager 
memoryManager) {
}
}
 
+   protected void notifyExecutionStateChange(ExecutionState executionState,
+   
Throwable optionalError) {
--- End diff --

This also seems weird 




[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/331#issuecomment-71467375
  
I think it would be better not to print the help if the user specified 
something incorrectly. Maybe just the error message and a note that -h prints 
the help?

I've tried out the change, but now the message is at the very bottom of 
the output. It's now probably even harder to find.

**Bad** (see below for *Good*)

```
robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436] % 
./bin/flink ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar 

Action run compiles and runs a program.

  Syntax: run [OPTIONS] jar-file arguments
  run action options:
 -c,--class classname   Class with the program entry point 
(main
  method or getPlan() method. Only 
needed
  if the JAR file does not specify the 
class
  in its manifest.
 -m,--jobmanager host:port  Address of the JobManager (master) to
  which to connect. Specify 
'yarn-cluster'
  as the JobManager to deploy a YARN 
cluster
  for the job. Use this flag to connect 
to a
  different JobManager than the one
  specified in the configuration.
 -p,--parallelism parallelism   The parallelism with which to run the
  program. Optional flag to override the
  default value specified in the
  configuration.
  Additional arguments if -m yarn-cluster is set:
 -yD argDynamic properties
 -yj,--yarnjar arg  Path to Flink jar file
 -yjm,--yarnjobManagerMemory argMemory for JobManager Container 
[in
  MB]
 -yn,--yarncontainer argNumber of YARN container to 
allocate
  (=Number of Task Managers)
 -yq,--yarnquery  Display available YARN resources
  (memory, cores)
 -yqu,--yarnqueue arg   Specify YARN queue.
 -ys,--yarnslots argNumber of slots per TaskManager
 -yt,--yarnship arg Ship files in the specified 
directory
  (t for transfer)
 -ytm,--yarntaskManagerMemory arg   Memory per TaskManager Container 
[in
  MB]

Invalid action: ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
1 robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436]
```

The info command is over-engineered in my opinion. It contains only one 
possible option, which is -e for the execution plan. I would vote to remove the 
info action and call it plan or so. 
Or keep its info name and print the plan by default (this is not @mxm's 
fault, but it would be nice to fix this with the PR)
```
 ./bin/flink info  
./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar  

Action info displays information about a program.

  Syntax: info [OPTIONS] jar-file arguments
  info action options:
 -c,--class classname   Class with the program entry point 
(main
  method or getPlan() method. Only 
needed
  if the JAR file does not specify the 
class
  in its manifest.
 -e,--executionplan   Show optimized execution plan of the
  program (JSON)
 -m,--jobmanager host:port  Address of the JobManager (master) to
  which to connect. Specify 
'yarn-cluster'
  as the JobManager to deploy a YARN 
cluster
  for the job. Use this flag to connect 
to a
  different JobManager than the one
  specified in the configuration.
 -p,--parallelism parallelism   The parallelism with which to run the
  program. Optional flag to override the
  default value specified in the
  configuration.

Error: Specify one of the above options to display information.
```

**Good**

What I liked was the error reporting when passing an invalid file as the 

[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291900#comment-14291900
 ] 

ASF GitHub Bot commented on FLINK-1318:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/265#issuecomment-71470231
  
Any comments on this PR?


 Make quoted String parsing optional and configurable for CSVInputFormats
 

 Key: FLINK-1318
 URL: https://issues.apache.org/jira/browse/FLINK-1318
 Project: Flink
  Issue Type: Improvement
  Components: Java API, Scala API
Affects Versions: 0.8
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Minor

 With the current implementation of the CSVInputFormat, quoted string parsing 
 kicks in, if the first non-whitespace character of a field is a double quote. 
 I see two issues with this implementation:
 1. Quoted String parsing cannot be disabled
 2. The quoting character is fixed to double quotes ()
 I propose to add parameters to disable quoted String parsing and set the 
 quote character.
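 The proposal can be sketched with a toy line tokenizer. This is a hedged illustration, not Flink's actual CsvInputFormat API — `CsvFieldParser` and its constructor are hypothetical, and it assumes quotes are balanced and immediately followed by a delimiter or end of line:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFieldParser {
    private final char delimiter;
    private final boolean quotedParsing; // proposal: off unless enabled
    private final char quoteChar;        // proposal: configurable, not fixed to '"'

    CsvFieldParser(char delimiter, boolean quotedParsing, char quoteChar) {
        this.delimiter = delimiter;
        this.quotedParsing = quotedParsing;
        this.quoteChar = quoteChar;
    }

    List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        while (i <= line.length()) {
            if (quotedParsing && i < line.length() && line.charAt(i) == quoteChar) {
                // quoted field: take everything up to the closing quote
                int close = line.indexOf(quoteChar, i + 1);
                fields.add(line.substring(i + 1, close));
                i = close + 2; // skip the closing quote and the delimiter
            } else {
                // plain field: quotes are ordinary characters
                int end = line.indexOf(delimiter, i);
                if (end < 0) end = line.length();
                fields.add(line.substring(i, end));
                i = end + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // quoting disabled: the double quotes stay part of the field
        System.out.println(new CsvFieldParser(',', false, '"').parseLine("\"a,b\",c"));
        // quoting enabled with a custom quote character
        System.out.println(new CsvFieldParser(',', true, '\'').parseLine("'a,b',c"));
    }
}
```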





[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526836
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java
 ---
@@ -42,86 +44,82 @@
 public class SlotSharingGroupAssignment implements Serializable {
 
static final long serialVersionUID = 42L;
-   
+
private static final Logger LOG = Scheduler.LOG;
-   
+
private transient final Object lock = new Object();
-   
+
 	/** All slots currently allocated to this sharing group */
 	private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>();
 
 	/** The slots available per vertex type (jid), keyed by instance, to make them locatable */
 	private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>();
-   
-   
+
// 

-   
-   
-   public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex 
vertex) {
-   JobVertexID id = vertex.getJobvertexId();
-   return addNewSlotWithTask(slot, id, id);
-   }
-   
-   public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex 
vertex, CoLocationConstraint constraint) {
-   AbstractID groupId = constraint.getGroupId();
-   return addNewSlotWithTask(slot, groupId, null);
-   }
-   
-   private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID 
groupId, JobVertexID vertexId) {
-   
-   final SharedSlot sharedSlot = new SharedSlot(slot, this);
-   final Instance location = slot.getInstance();
-   
+
+   public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot 
sharedSlot, Locality locality,
+   
AbstractID groupId, CoLocationConstraint constraint) {
--- End diff --

indentation?




[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291769#comment-14291769
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526787
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java
 ---
@@ -21,7 +21,7 @@
 import org.apache.flink.runtime.instance.Instance;
 
 /**
- * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot2}s become available
--- End diff --

I guess `AllocatedSlot2` is an automatic rename leftover


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to deduce all different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.





[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23526787
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java
 ---
@@ -21,7 +21,7 @@
 import org.apache.flink.runtime.instance.Instance;
 
 /**
- * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot2}s become available
--- End diff --

I guess `AllocatedSlot2` is an automatic rename leftover




[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/319#issuecomment-71458216
  
I think @StephanEwen is reviewing the critical part.




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291839#comment-14291839
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71464164
  
-1
I think we need to add the `build-target` directory into the list of 
ignored directories for apache rat. Rat will fail subsequent builds
```
1 Unknown Licenses

***

Unapproved licenses:

  build-target/conf/slaves

***
```


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start up and submissions scripts in bin assemble the class path
 - in case the project hasn't been build yet, inform the user
 The changes would make it easier to work with Flink from source.





[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291785#comment-14291785
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/319#issuecomment-71455790
  
This is a huge change. I took a look over the code, but I don't have enough 
experience with the scheduler to understand these changes.

I would suggest merging this rather soon because it's touching a lot of 
code due to minor Scala style changes (semicolons, removal of parentheses from 
no-arg methods, unneeded { }, and so on).
+1 for the added documentation to the classes!

The bug in FLINK-1453 would be more obvious if these changes were merged. 
That's another motivation for me to push this pull request forward ;)



 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to deduce all different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71464164
  
-1
I think we need to add the `build-target` directory into the list of 
ignored directories for apache rat. Rat will fail subsequent builds
```
1 Unknown Licenses

***

Unapproved licenses:

  build-target/conf/slaves

***
```




[GitHub] flink pull request: [FLINK-1446] Fix Kryo createInstance() method

2015-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/336




[jira] [Commented] (FLINK-1446) Make KryoSerializer.createInstance() return new instances instead of null

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291936#comment-14291936
 ] 

ASF GitHub Bot commented on FLINK-1446:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/336


 Make KryoSerializer.createInstance() return new instances instead of null
 -

 Key: FLINK-1446
 URL: https://issues.apache.org/jira/browse/FLINK-1446
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9, 0.8.1








[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71488487
  
I don't really understand how the static lock solves the mentioned issue. 
Is there a concurrency problem between creating files on disk and updating the 
count hash map?

I think there is a problem between the DeleteProcess and the CopyProcess. 
The CopyProcess is synchronized on the static lock object and the DeleteProcess 
is not. Thus, it might be the case that the copy method created the directories 
for a new file foobar, let's say /tmp/123/foobar, and afterwards the delete 
process deletes the directory /tmp/123 because it checked the count hash map 
before the createTmpFile method was called.

This problem should still persist with the current changes.
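The race described above can be made concrete with a small sketch — a hedged illustration, not the actual DistributedCache code, with hypothetical names. The point is that the delete path must re-check the reference count under the *same* lock the copy path uses; otherwise delete can observe a zero count and remove the directory right after copy created it:

```java
import java.util.HashMap;
import java.util.Map;

public class RefCountedFiles {
    private static final Object LOCK = new Object();
    private final Map<String, Integer> count = new HashMap<>();
    private final Map<String, Boolean> onDisk = new HashMap<>();

    /** Copy path: bump the count and materialize the file under the lock. */
    void createTmpFile(String name) {
        synchronized (LOCK) {
            count.merge(name, 1, Integer::sum);
            onDisk.put(name, true); // stands in for copying to /tmp/123/foobar
        }
    }

    /** Delete path: the count check must happen under the same lock. */
    void deleteIfUnused(String name) {
        synchronized (LOCK) {
            if (count.getOrDefault(name, 0) == 0) {
                onDisk.remove(name); // stands in for deleting the directory
            }
        }
    }

    /** Called when an operation is done with the file. */
    void release(String name) {
        synchronized (LOCK) {
            count.merge(name, -1, Integer::sum);
        }
    }

    boolean exists(String name) {
        synchronized (LOCK) { return onDisk.getOrDefault(name, false); }
    }

    public static void main(String[] args) {
        RefCountedFiles files = new RefCountedFiles();
        files.createTmpFile("foobar");
        files.deleteIfUnused("foobar"); // count is 1, so the file survives
        System.out.println(files.exists("foobar"));
        files.release("foobar");
        files.deleteIfUnused("foobar"); // now unused, so it is removed
        System.out.println(files.exists("foobar"));
    }
}
```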




[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292024#comment-14292024
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71488487
  
I don't really understand how the static lock solves the mentioned issue. 
Is there a concurrency problem between creating files on disk and updating the 
count hash map?

I think there is a problem between the DeleteProcess and the CopyProcess. 
The CopyProcess is synchronized on the static lock object and the DeleteProcess 
is not. Thus, it might be the case that the copy method created the directories 
for a new file foobar, let's say /tmp/123/foobar, and afterwards the delete 
process deletes the directory /tmp/123 because it checked the count hash map 
before the createTmpFile method was called.

This problem should still persist with the current changes.


 DistributedCache doesn't preserve files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC, it 
 frequently happens that the files are not created for the following operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory
 2. all files are deleted during the job.
  
 One thing that was brought up about Trait 1 is that it basically forbids 
 modification of the files, and concurrent access altogether. Personally, I'm 
 not sure if this is a problem. Changing it to a task-specific place solved the 
 issue though.
 I'm more concerned about Trait 2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method, the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.





[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/338#issuecomment-71490635
  
LGTM

I suspect that the Travis failure is caused by faulty colocated subslot 
disposal. It should be fixed with #317.




[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291950#comment-14291950
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/319#issuecomment-71477890
  
I think so too. The scheduler changes are only included because 
otherwise the test cases wouldn't pass. The scheduler-related PR is #317.


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to derive all the different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the JobManager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, it should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. The method should be removed so that no other object 
 can access the actor's internal state.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it dies or loses the connection to the TaskManager. This 
 behaviour should be tested.
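The single-timeout idea above can be sketched as a simple resolution rule (the config key names and the derivation factor are hypothetical, not Flink's actual configuration):

```java
import java.util.Map;

// Illustrative sketch: each specific timeout is derived from one base value
// unless the user has configured that specific key explicitly.
class TimeoutConfig {
    static long resolve(Map<String, Long> userConfig, String key,
                        long baseTimeoutMs, double factor) {
        Long explicit = userConfig.get(key);  // a user-defined specific value wins
        return explicit != null ? explicit : (long) (baseTimeoutMs * factor);
    }
}
```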





[jira] [Commented] (FLINK-1454) CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager

2015-01-26 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291951#comment-14291951
 ] 

Robert Metzger commented on FLINK-1454:
---

Yes. It would be nicer if the system told the user that the network connection 
was refused.
Also, 100s is a pretty long time, in particular because many newbies will come 
across this issue.
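A fail-fast check along these lines could be a plain TCP connect attempt before the blocking Akka ask, so a refused connection surfaces immediately instead of after the 100-second future timeout. This is only an illustrative sketch, not the CliFrontend's actual code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight reachability check for the JobManager address.
class JobManagerReachability {
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;  // connection refused, host unreachable, or timeout
        }
    }
}
```

The client could run this before uploading jar files and print a clear "connection refused" message rather than waiting for the Akka future to time out.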

 CliFrontend blocks for 100 seconds when submitting to a non-existent 
 JobManager
 ---

 Key: FLINK-1454
 URL: https://issues.apache.org/jira/browse/FLINK-1454
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Robert Metzger

 When a user tries to submit a job to a job manager which doesn't exist at 
 all, the CliFrontend blocks for 100 seconds.
 Ideally, Akka would fail because it cannot connect to the given 
 hostname:port.
  
 {code}
 ./bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar -c 
 foo.Baz
 org.apache.flink.client.program.ProgramInvocationException: The main method 
 caused an error.
   at 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:449)
   at 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:350)
   at org.apache.flink.client.program.Client.run(Client.java:242)
   at 
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:389)
   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:362)
   at 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1078)
   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1102)
 Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
 [100 seconds]
   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
   at 
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
   at 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
   at scala.concurrent.Await$.result(package.scala:107)
   at org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265)
   at 
 org.apache.flink.runtime.client.JobClient$.uploadJarFiles(JobClient.scala:169)
   at 
 org.apache.flink.runtime.client.JobClient.uploadJarFiles(JobClient.scala)
   at org.apache.flink.client.program.Client.run(Client.java:314)
   at org.apache.flink.client.program.Client.run(Client.java:296)
   at org.apache.flink.client.program.Client.run(Client.java:290)
   at 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:434)
   ... 6 more
 The exception above occurred while trying to run your command.
 {code}





[jira] [Commented] (FLINK-1147) TypeInference on POJOs

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291953#comment-14291953
 ] 

ASF GitHub Bot commented on FLINK-1147:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/315


 TypeInference on POJOs
 --

 Key: FLINK-1147
 URL: https://issues.apache.org/jira/browse/FLINK-1147
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.7.0-incubating
Reporter: Stephan Ewen
Assignee: Timo Walther

 On Tuples, we currently use type inference that figures out the types of 
 output type variables relative to the input type variable.
 We need a similar functionality for POJOs.





[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23537586
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java
 ---
@@ -21,7 +21,7 @@
 import org.apache.flink.runtime.instance.Instance;
 
 /**
- * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link 
org.apache.flink.runtime.instance.AllocatedSlot2}s become available
--- End diff --

You are right. Good catch. I'll correct it.




[GitHub] flink pull request: [FLINK-1147][Java API] TypeInference on POJOs

2015-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/315




[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...

2015-01-26 Thread rmetzger
Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/323




[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292046#comment-14292046
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71491484
  
No worries, I can rename the tests. It's better to be consistent :)
May I ask, what is the difference between a XyzTest and a XyzITCase test 
though? Thnx!


 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph





[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness

2015-01-26 Thread Mingliang Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291921#comment-14291921
 ] 

Mingliang Qi commented on FLINK-1430:
-

I saw some methods like setConfig, getConfig, and getStreamGraph in 
StreamExecutionEnvironment, which are included in the batch Scala API but not 
in the streaming Scala API. Should we just exclude these in the test? Another 
missing one is printToErr in the DataStream class.
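One way such a completeness test can work (an illustrative sketch, not the actual Flink test) is to compare public method names via reflection and subtract an explicit exclusion list, expecting the remainder to be empty:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch of a reflection-based API completeness check: which public method
// names of `reference` are missing from `candidate`, ignoring `excluded`?
class ApiCompleteness {
    static Set<String> missingMethods(Class<?> reference, Class<?> candidate,
                                      Set<String> excluded) {
        Set<String> missing = methodNames(reference);
        missing.removeAll(methodNames(candidate));
        missing.removeAll(excluded);
        return missing;
    }

    private static Set<String> methodNames(Class<?> clazz) {
        return Arrays.stream(clazz.getMethods())
                .map(Method::getName)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Methods like getConfig or getStreamGraph would then go into the exclusion set rather than being treated as missing.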

 Add test for streaming scala api completeness
 -

 Key: FLINK-1430
 URL: https://issues.apache.org/jira/browse/FLINK-1430
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Mingliang Qi

 Currently the completeness of the streaming scala api is not tested.





[jira] [Assigned] (FLINK-1442) Archived Execution Graph consumes too much memory

2015-01-26 Thread Max Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Michels reassigned FLINK-1442:
--

Assignee: Max Michels

 Archived Execution Graph consumes too much memory
 -

 Key: FLINK-1442
 URL: https://issues.apache.org/jira/browse/FLINK-1442
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Max Michels

 The JobManager archives the execution graphs for analysis of jobs. The 
 graphs may consume a lot of memory.
 Especially the execution edges in all-to-all connection patterns are extremely 
 numerous and add up in memory consumption.
 The execution edges connect all parallel tasks. So for an all-to-all pattern 
 between n and m tasks, there are n*m edges. For a parallelism of several 
 hundred tasks, this can easily reach 100k objects and more, each with a set of 
 metadata.
 I propose the following to solve that:
 1. Clear all execution edges from the graph (the majority of the memory 
 consumers) when it is given to the archiver.
 2. Keep the map/list of the archived graphs behind a soft reference, so it 
 will be removed under memory pressure before the JVM crashes. That may remove 
 graphs from the history early, but it is much preferable to the JVM crashing, 
 in which case the graph is lost as well...
 3. Long term: the graph should be archived somewhere else. Something like the 
 History Server used by Hadoop and Hive would be a good idea.
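Proposal 2 can be sketched as follows (hypothetical names, not the JobManager's actual archiving code): each archived graph sits behind a SoftReference, so the JVM may reclaim old graphs under memory pressure instead of crashing with an OutOfMemoryError.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: archive of execution-graph skeletons held via soft references.
class GraphArchive<K, G> {
    private final Map<K, SoftReference<G>> archive = new LinkedHashMap<>();

    void archive(K jobId, G strippedGraph) {
        // Per proposal 1, the caller clears the execution edges before
        // handing the graph over, so only the small skeleton is retained.
        archive.put(jobId, new SoftReference<>(strippedGraph));
    }

    G get(K jobId) {
        SoftReference<G> ref = archive.get(jobId);
        return ref == null ? null : ref.get();  // null if already collected
    }
}
```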





[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/339#discussion_r23541551
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java 
---
@@ -72,7 +72,7 @@
 * @return copy task
 */
public FutureTask<Path> createTmpFile(String name, 
DistributedCacheEntry entry, JobID jobID) {
-   synchronized (count) {
+   synchronized (lock) {
--- End diff --

How does the static lock solve the problem?




[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292011#comment-14292011
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/339#discussion_r23541551
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java 
---
@@ -72,7 +72,7 @@
 * @return copy task
 */
public FutureTask<Path> createTmpFile(String name, 
DistributedCacheEntry entry, JobID jobID) {
-   synchronized (count) {
+   synchronized (lock) {
--- End diff --

How does the static lock solve the problem?


 DistributedCache doesn't preserve files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC, it 
 frequently happens that the files are not created for the following operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory
 2. all files are deleted during the job.
  
 One thing that was brought up about Trait 1 is that it basically forbids 
 modification of the files, and concurrent access altogether. Personally, I'm 
 not sure if this is a problem. Changing it to a task-specific place solved the 
 issue though.
 I'm more concerned about Trait 2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method, the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.





[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292017#comment-14292017
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/304#issuecomment-71487780
  
I'm going to merge the changes in this pull request into a custom branch.
I'll open a new pull request (FLINK-1417) containing the commit from this 
PR.

@aljoscha: Can you close this PR?


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Sub-task
Reporter: Robert Metzger
Assignee: Robert Metzger







[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71476676
  
I updated the PR with the exponential backoff registration strategy. Along the 
way, I fixed the flaky RecoveryIT case.




[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291940#comment-14291940
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71476676
  
I updated the PR with the exponential backoff registration strategy. Along the 
way, I fixed the flaky RecoveryIT case.


 Buggy registration from TaskManager to JobManager
 -

 Key: FLINK-1352
 URL: https://issues.apache.org/jira/browse/FLINK-1352
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9


 The JobManager's InstanceManager may refuse the registration attempt from a 
 TaskManager, because it has this TaskManager already connected, or, in the 
 future, because the TaskManager has been blacklisted as unreliable.
 Upon refused registration, the instance ID is null, to signal the refused 
 registration. The TaskManager reacts incorrectly to such responses, assuming 
 successful registration.
 Possible solution: the JobManager sends back a dedicated RegistrationRefused 
 message if the instance manager returns null as the registration result. If 
 the TaskManager receives that before being registered, it knows that the 
 registration response was lost (which should not happen on TCP and would 
 indicate a corrupt connection).
 Followup question: does it make sense to have the TaskManager try 
 indefinitely to connect to the JobManager, with an increasing interval (from 
 seconds to minutes)?
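The backoff mentioned in the followup question can be sketched like this (the constants are illustrative, not Flink's actual defaults): the pause between registration attempts doubles from seconds up to a capped maximum of minutes.

```java
// Sketch of an exponential-backoff pause schedule for registration retries.
class RegistrationBackoff {
    static final long INITIAL_PAUSE_MS = 1_000;  // start at 1 second
    static final long MAX_PAUSE_MS = 60_000;     // cap at 1 minute

    static long nextPause(long currentPauseMs) {
        // double the pause, but never exceed the cap
        return Math.min(currentPauseMs * 2, MAX_PAUSE_MS);
    }
}
```

A retry loop would sleep for the current pause after each refused or unanswered registration, then advance the pause with nextPause() until the cap is reached.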





[GitHub] flink pull request: [FLINK-1415] Akka cleanups

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/319#discussion_r23537642
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java
 ---
@@ -42,86 +44,82 @@
 public class SlotSharingGroupAssignment implements Serializable {
 
static final long serialVersionUID = 42L;
-   
+
private static final Logger LOG = Scheduler.LOG;
-   
+
private transient final Object lock = new Object();
-   
+
/** All slots currently allocated to this sharing group */
private final Set<SharedSlot> allSlots = new 
LinkedHashSet<SharedSlot>();
-   
+
/** The slots available per vertex type (jid), keyed by instance, to 
make them locatable */
private final Map<AbstractID, Map<Instance, List<SharedSlot>>> 
availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, 
List<SharedSlot>>>();
-   
-   
+
// 

-   
-   
-   public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex 
vertex) {
-   JobVertexID id = vertex.getJobvertexId();
-   return addNewSlotWithTask(slot, id, id);
-   }
-   
-   public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex 
vertex, CoLocationConstraint constraint) {
-   AbstractID groupId = constraint.getGroupId();
-   return addNewSlotWithTask(slot, groupId, null);
-   }
-   
-   private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID 
groupId, JobVertexID vertexId) {
-   
-   final SharedSlot sharedSlot = new SharedSlot(slot, this);
-   final Instance location = slot.getInstance();
-   
+
+   public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot 
sharedSlot, Locality locality,
+   
AbstractID groupId, CoLocationConstraint constraint) {
--- End diff --

I must have fallen asleep on the space button.




[jira] [Commented] (FLINK-1392) Serializing Protobuf - issue 1

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291996#comment-14291996
 ] 

ASF GitHub Bot commented on FLINK-1392:
---

Github user rmetzger closed the pull request at:

https://github.com/apache/flink/pull/322


 Serializing Protobuf - issue 1
 --

 Key: FLINK-1392
 URL: https://issues.apache.org/jira/browse/FLINK-1392
 Project: Flink
  Issue Type: Sub-task
Reporter: Felix Neutatz
Assignee: Robert Metzger
Priority: Minor

 Hi, I started to experiment with Parquet using Protobuf.
 When I use the standard Protobuf class: 
 com.twitter.data.proto.tutorial.AddressBookProtos
 The code which I run, can be found here: 
 [https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java]
 I get the following exception:
 {code:xml}
 Exception in thread main java.lang.Exception: Deserializing the 
 InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: 
 Could not read the user code wrapper: Error while deserializing element from 
 collection
   at 
 org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: 
 org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could 
 not read the user code wrapper: Error while deserializing element from 
 collection
   at 
 org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285)
   at 
 org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
   ... 25 more
 Caused by: java.io.IOException: Error while deserializing element from 
 collection
   at 
 org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at 

[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292036#comment-14292036
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71490079
  
Oh, I see what you mean: maybe extend the synchronized block to include the 
actual delete logic. Yup, that's a good idea. All I know is that I tried it 
without the change and ran into issues; with the change, it ran.


 DistributedCache doesn't preserve files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC, it 
 frequently happens that the files are not created for the following operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory
 2. all files are deleted during the job.
  
 One thing that was brought up about Trait 1 is that it basically forbids 
 modification of the files, and concurrent access altogether. Personally, I'm 
 not sure if this is a problem. Changing it to a task-specific place solved the 
 issue though.
 I'm more concerned about Trait 2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method, the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.





[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292038#comment-14292038
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71490264
  
Yes, but only the access to the count hash map. The delete action itself is 
not synchronized.


 DistributedCache doesn't preserve files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC it 
 frequently happens that the files are not created for the following operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory, and
 2. they are deleted during the job.
  
 One thing that was brought up about Trait #1 is that it basically forbids 
 modification of the files, concurrent access and all. Personally I'm not sure 
 if this is a problem. Changing it to a task-specific place solved the issue, 
 though.
 I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method and the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.





[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71490264
  
Yes, but only the access to the count hash map. The delete action itself is 
not synchronized.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-1430) Add test for streaming scala api completeness

2015-01-26 Thread Mingliang Qi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Qi reassigned FLINK-1430:
---

Assignee: Mingliang Qi

 Add test for streaming scala api completeness
 -

 Key: FLINK-1430
 URL: https://issues.apache.org/jira/browse/FLINK-1430
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Mingliang Qi

 Currently, the completeness of the streaming Scala API is not tested.





[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291934#comment-14291934
 ] 

ASF GitHub Bot commented on FLINK-1437:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/342#issuecomment-71475896
  
This will probably conflict with https://github.com/apache/flink/pull/316.


 Bug in PojoSerializer's copy() method
 -

 Key: FLINK-1437
 URL: https://issues.apache.org/jira/browse/FLINK-1437
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Timo Walther
Assignee: Timo Walther

 The PojoSerializer's {{copy()}} method does not work properly with {{null}} 
 values. An exception could look like:
 {code}
 Caused by: java.io.IOException: Thread 'SortMerger spilling thread' 
 terminated due to an exception: null
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792)
 Caused by: java.io.EOFException
   at 
 org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277)
   at org.apache.flink.types.StringValue.copyString(StringValue.java:839)
   at 
 org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83)
   at 
 org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261)
   at 
 org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788)
 {code}
 I'm working on a fix for that...
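The usual fix for this class of bug is to serialize a presence flag before each nullable field, so that copy() and deserialize() can skip absent values instead of running into an EOFException. A tiny standalone illustration of the idea (not the actual PojoSerializer code):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of null-tolerant field serialization: a boolean presence flag is
// written before the value, so the reader never tries to consume bytes for
// a field that was null on the writer side.
final class NullSafeStringCodec {

    static void write(DataOutput out, String value) throws IOException {
        out.writeBoolean(value != null);  // presence flag first
        if (value != null) {
            out.writeUTF(value);
        }
    }

    static String read(DataInput in) throws IOException {
        // Only consume value bytes when the flag says the field was present.
        return in.readBoolean() ? in.readUTF() : null;
    }
}
```

Without the flag, a null field writes nothing while the reader still expects bytes, which is exactly the kind of stream misalignment the stack trace above shows.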





[jira] [Commented] (FLINK-1454) CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager

2015-01-26 Thread Mingliang Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291941#comment-14291941
 ] 

Mingliang Qi commented on FLINK-1454:
-

Isn't it because the Akka timeout is set to 100 s?

 CliFrontend blocks for 100 seconds when submitting to a non-existent 
 JobManager
 ---

 Key: FLINK-1454
 URL: https://issues.apache.org/jira/browse/FLINK-1454
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Robert Metzger

 When a user tries to submit a job to a job manager which doesn't exist at 
 all, the CliFrontend blocks for 100 seconds.
 Ideally, Akka would fail because it cannot connect to the given 
 hostname:port.
  
 {code}
 ./bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar -c 
 foo.Baz
 org.apache.flink.client.program.ProgramInvocationException: The main method 
 caused an error.
   at 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:449)
   at 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:350)
   at org.apache.flink.client.program.Client.run(Client.java:242)
   at 
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:389)
   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:362)
   at 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1078)
   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1102)
 Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
 [100 seconds]
   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
   at 
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
   at 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
   at scala.concurrent.Await$.result(package.scala:107)
   at org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265)
   at 
 org.apache.flink.runtime.client.JobClient$.uploadJarFiles(JobClient.scala:169)
   at 
 org.apache.flink.runtime.client.JobClient.uploadJarFiles(JobClient.scala)
   at org.apache.flink.client.program.Client.run(Client.java:314)
   at org.apache.flink.client.program.Client.run(Client.java:296)
   at org.apache.flink.client.program.Client.run(Client.java:290)
   at 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:434)
   ... 6 more
 The exception above occurred while trying to run your command.
 {code}
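The 100-second block in the trace above boils down to a bounded wait on a reply future that never arrives: the client only fails once the configured timeout elapses instead of failing fast on the connection attempt. A tiny standalone illustration (not Flink code; the timeout is shortened for the demo):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Models the blocking ask pattern from the trace: the caller (like the
// CliFrontend) blocks on a reply future and only gets a TimeoutException
// after the full timeout, even though the peer was never reachable.
class BlockingAsk {

    static <T> T ask(CompletableFuture<T> reply, long timeoutMillis) throws Exception {
        // Blocks until the reply arrives or the timeout fires.
        return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

A future that is never completed stands in for the unreachable JobManager; with the default mentioned above, the wait would last the full 100 seconds.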





[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292032#comment-14292032
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71489135
  
But that is exactly what is changing: both the delete and the copy process are 
synchronized on the same object.


 DistributedCache doesn't preserve files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC it 
 frequently happens that the files are not created for the following operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory, and
 2. they are deleted during the job.
  
 One thing that was brought up about Trait #1 is that it basically forbids 
 modification of the files, concurrent access and all. Personally I'm not sure 
 if this is a problem. Changing it to a task-specific place solved the issue, 
 though.
 I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method and the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.





[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-01-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71492121
  
Sure, XyzTest classes are unit tests, which are executed in Maven's test phase. 
These should execute rather fast. Everything that brings up a full Flink system 
is an integration test case (XyzITCase) and is executed during mvn verify. These 
are used for long-running tests.
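For reference, this split follows the standard Maven Surefire/Failsafe convention: Surefire runs *Test classes in the test phase, while Failsafe picks up *ITCase classes during verify. An illustrative plugin configuration (not copied from Flink's pom) might look like:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <includes>
      <!-- *ITCase classes run during mvn verify, not mvn test -->
      <include>**/*ITCase.java</include>
    </includes>
  </configuration>
</plugin>
```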




[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291931#comment-14291931
 ] 

ASF GitHub Bot commented on FLINK-1437:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/342

[FLINK-1437][Java API] Fixes copy() methods in PojoSerializer for null 
values.

See description in FLINK-1437.

PR includes tests cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink PojoCopyFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #342


commit f6917765fb599de74ab89580f22feb2096ca946c
Author: twalthr i...@twalthr.com
Date:   2015-01-26T15:09:24Z

[FLINK-1437][Java API] Fixes copy() methods in PojoSerializer for null 
values




 Bug in PojoSerializer's copy() method
 -

 Key: FLINK-1437
 URL: https://issues.apache.org/jira/browse/FLINK-1437
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Timo Walther
Assignee: Timo Walther

 The PojoSerializer's {{copy()}} method does not work properly with {{null}} 
 values. An exception could look like:
 {code}
 Caused by: java.io.IOException: Thread 'SortMerger spilling thread' 
 terminated due to an exception: null
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792)
 Caused by: java.io.EOFException
   at 
 org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277)
   at org.apache.flink.types.StringValue.copyString(StringValue.java:839)
   at 
 org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83)
   at 
 org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261)
   at 
 org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788)
 {code}
 I'm working on a fix for that...





[GitHub] flink pull request: [FLINK-1437][Java API] Fixes copy() methods in...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/342#issuecomment-71475896
  
This will probably conflict with https://github.com/apache/flink/pull/316.




[jira] [Resolved] (FLINK-1147) TypeInference on POJOs

2015-01-26 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-1147.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed in commit 6067833fb6ad6c11a121d8654d7ca147cc909f05

 TypeInference on POJOs
 --

 Key: FLINK-1147
 URL: https://issues.apache.org/jira/browse/FLINK-1147
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.7.0-incubating
Reporter: Stephan Ewen
Assignee: Timo Walther
 Fix For: 0.9


 On Tuples, we currently use type inference that figures out the types of 
 output type variables relative to the input type variable.
 We need a similar functionality for POJOs.





[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292040#comment-14292040
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/304


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Sub-task
Reporter: Robert Metzger
Assignee: Robert Metzger







[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-26 Thread aljoscha
Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/304




[GitHub] flink pull request: [FLINK-1392] Add Kryo serializer for Protobuf

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/322#issuecomment-71484526
  
I will make this change part of a new pull request for 
https://issues.apache.org/jira/browse/FLINK-1417.




[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71489135
  
But that is exactly what is changing: both the delete and the copy process are 
synchronized on the same object.




[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71490079
  
Oh, I see what you mean: extend the synchronized block to include the 
actual delete. Yes, that's a good idea. All I know is that I tried it without 
the change and ran into issues; with the change it ran.
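A minimal sketch of that idea (hypothetical names, not the actual PR code): the reference-count update and the actual delete sit inside the same synchronized block, so a concurrent copy can never observe a half-deleted entry.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Reference-counted file registry: delete happens under the same lock that
// guards the count map, closing the race discussed in the review above.
class RefCountedFiles {
    private final Object lock = new Object();
    private final Map<String, Integer> count = new HashMap<>();

    void acquire(String name) {
        synchronized (lock) {
            count.merge(name, 1, Integer::sum);
        }
    }

    void release(String name, File file) {
        synchronized (lock) {
            int remaining = count.merge(name, -1, Integer::sum);
            if (remaining <= 0) {
                count.remove(name);
                file.delete();  // delete inside the lock, not after it
            }
        }
    }
}
```

If the delete were moved outside the synchronized block, a copy arriving between the count check and the delete could still lose its file, which matches the flaky behaviour described in FLINK-1419.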




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291774#comment-14291774
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71454018
  
Very nice. +1

Will merge this later.


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start-up and submission scripts in bin assemble the class path
 - in case the project hasn't been built yet, inform the user
 The changes would make it easier to work with Flink from source.





[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...

2015-01-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/264#issuecomment-71455623
  
Updated the PR and will merge once Travis completed the build.




[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291800#comment-14291800
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/319#issuecomment-71458216
  
I think @StephanEwen is reviewing the critical part.


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to deduce all different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the different timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.





[jira] [Resolved] (FLINK-512) Add support for Tachyon File System to Flink

2015-01-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-512.
--
Resolution: Fixed

Has been resolved by FLINK-1266 (at least to the scope of this issue)

 Add support for Tachyon File System to Flink
 

 Key: FLINK-512
 URL: https://issues.apache.org/jira/browse/FLINK-512
 Project: Flink
  Issue Type: New Feature
Reporter: Chesnay Schepler
Assignee: Robert Metzger
Priority: Minor
  Labels: github-import
 Fix For: pre-apache

 Attachments: pull-request-512-9112930359400451481.patch


 Implementation of the Tachyon file system.
 The code was tested using JUnit and the WordCount example on a Tachyon file 
 system running in local mode.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/pull/512
 Created by: [zentol|https://github.com/zentol]
 Labels: 
 Created at: Wed Feb 26 13:58:24 CET 2014
 State: closed





[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71469736
  
Oh, actually, that should work because the configuration explicitly binds 
the plugin to the clean phase.




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291891#comment-14291891
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71469736
  
Oh, actually, that should work because the configuration explicitly binds 
the plugin to the clean phase.


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start-up and submission scripts in bin assemble the class path
 - in case the project hasn't been built yet, inform the user
 The changes would make it easier to work with Flink from source.





[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...

2015-01-26 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/344

[FLINK-1442] Reduce memory consumption of archived execution graph



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink flink-1442

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #344


commit baa04e386b70d3c928ceb07e78e50016f20520f0
Author: Max m...@posteo.de
Date:   2015-01-26T18:31:47Z

[FLINK-1442] Reduce memory consumption of archived execution graph






[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292251#comment-14292251
 ] 

ASF GitHub Bot commented on FLINK-1442:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/344

[FLINK-1442] Reduce memory consumption of archived execution graph



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink flink-1442

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #344


commit baa04e386b70d3c928ceb07e78e50016f20520f0
Author: Max m...@posteo.de
Date:   2015-01-26T18:31:47Z

[FLINK-1442] Reduce memory consumption of archived execution graph




 Archived Execution Graph consumes too much memory
 -

 Key: FLINK-1442
 URL: https://issues.apache.org/jira/browse/FLINK-1442
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Max Michels

 The JobManager archives the execution graphs, for analysis of jobs. The 
 graphs may consume a lot of memory.
 The execution edges in all2all connection patterns are especially numerous 
 and add up in memory consumption.
 The execution edges connect all parallel tasks. So for an all2all pattern 
 between n and m tasks, there are n*m edges. For a parallelism of several 
 hundred tasks, this can easily reach 100k objects and more, each with a set of 
 metadata.
 I propose the following to solve that:
 1. Clear all execution edges from the graph (the majority of the memory 
 consumers) when it is given to the archiver.
 2. Have the map/list of the archived graphs behind a soft reference, so it 
 will be removed under memory pressure before the JVM crashes. That may remove 
 graphs from the history early, but is much preferable to the JVM crashing, in 
 which case the graphs are lost as well...
 3. Long term: the graph should be archived somewhere else. Something like the 
 History Server used by Hadoop and Hive would be a good idea.
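Point 2 can be sketched as follows. This is a toy illustration, not JobManager code; the type parameter stands in for the (already stripped-down) archived graph.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Archive that holds graphs behind SoftReferences: the GC may reclaim them
// under memory pressure, so old history entries vanish instead of the JVM
// running out of heap.
class GraphArchive<G> {
    private final Map<Integer, SoftReference<G>> graphs = new LinkedHashMap<>();

    void archive(int jobId, G graph) {
        graphs.put(jobId, new SoftReference<>(graph));
    }

    // Returns null if the job was never archived, or if the graph was
    // collected under memory pressure in the meantime.
    G get(int jobId) {
        SoftReference<G> ref = graphs.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

Callers must tolerate a null result, which is the trade-off described above: occasionally losing an old graph early beats losing the whole JobManager.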





[jira] [Created] (FLINK-1456) Projection to fieldnames, keyselectors

2015-01-26 Thread JIRA
Márton Balassi created FLINK-1456:
-

 Summary: Projection to fieldnames, keyselectors
 Key: FLINK-1456
 URL: https://issues.apache.org/jira/browse/FLINK-1456
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Márton Balassi


The projection operator of both the batch and the streaming APIs only supports 
projections with field positions as parameters. I'd like to extend this 
functionality with projections by field name and by KeySelector, providing an 
API similar to what groupBy already has.
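As a toy model of the KeySelector-style variant (illustrative only, not the proposed Flink API): the caller supplies a function extracting the projected value, exactly as groupBy accepts a KeySelector, instead of a field position.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Selector-based projection over a plain list: the Function plays the role
// a KeySelector plays in groupBy, replacing positional field indices.
class Projection {
    static <T, R> List<R> project(List<T> data, Function<T, R> selector) {
        return data.stream().map(selector).collect(Collectors.toList());
    }
}
```

A field-name variant would be the same shape with the selector derived from the POJO field name via reflection.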





[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292076#comment-14292076
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71494713
  
Alright, thanks @fhueske! So it seems all of our tests are integration 
test cases. I will update later today, I hope :)


 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71494058
  
Yeah, that would probably solve the problem. 

Race conditions are often very tricky. Sometimes little changes 
alter the process interleaving such that the problem only seems to be fixed.




[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292167#comment-14292167
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71507264
  
Tests renamed :)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1415) Akka cleanups

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292134#comment-14292134
 ] 

ASF GitHub Bot commented on FLINK-1415:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/319#issuecomment-71501654
  
Do you think this change is also addressing this error?
https://travis-ci.org/rmetzger/flink/jobs/48365538


 Akka cleanups
 -

 Key: FLINK-1415
 URL: https://issues.apache.org/jira/browse/FLINK-1415
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, Akka has many different timeout values. From a user perspective, 
 it would be helpful to derive all the different timeouts from a single timeout 
 value. Additionally, the user should still be able to define specific values 
 for the individual timeouts.
 Akka uses the akka.jobmanager.url config parameter to override the jobmanager 
 address and the port in case of a local setup. This mechanism is not safe 
 since it is exposed to the user. Thus, the mechanism should be replaced.
 The notifyExecutionStateChange method allows objects to access the internal 
 state of the TaskManager actor. This causes NullPointerExceptions when 
 shutting down the actor. This method should be removed to avoid accessing the 
 internal state of an actor by another object.
 With the latest Akka changes, the TaskManager watches the JobManager in order 
 to detect when it died or lost the connection to the TaskManager. This 
 behaviour should be tested.
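The single-base-timeout idea from the first paragraph can be sketched as a simple fallback lookup: a specific timeout is returned if explicitly configured, otherwise the single base value applies. The class name and the config keys below are invented for illustration and are not actual Flink/Akka options.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of deriving all timeouts from one base value,
// while still allowing explicit per-timeout overrides.
public class TimeoutConfig {
    private final Map<String, Duration> explicit = new HashMap<>();
    private final Duration baseTimeout;

    public TimeoutConfig(Duration baseTimeout) {
        this.baseTimeout = baseTimeout;
    }

    public TimeoutConfig set(String key, Duration value) {
        explicit.put(key, value);
        return this;
    }

    /** Returns the explicitly configured value, or falls back to the base timeout. */
    public Duration get(String key) {
        return explicit.getOrDefault(key, baseTimeout);
    }

    public static void main(String[] args) {
        TimeoutConfig cfg = new TimeoutConfig(Duration.ofSeconds(100))
                .set("akka.ask.timeout", Duration.ofSeconds(10));
        System.out.println(cfg.get("akka.ask.timeout"));    // explicitly set
        System.out.println(cfg.get("akka.lookup.timeout")); // derived from base
    }
}
```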





[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-01-26 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-71507264
  
Tests renamed :)




[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292498#comment-14292498
 ] 

ASF GitHub Bot commented on FLINK-1442:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/344#issuecomment-71548964
  
Looks good so far. I see that you removed the LRU code. Was that on purpose?

Leaving it in may be a good idea, because the soft references are cleared 
in arbitrary order. It may make newer jobs disappear before older ones. Having 
the LRU in would mean things behave as before as long as the memory is 
sufficient, with the soft-reference clearing kicking in as a safety valve.


 Archived Execution Graph consumes too much memory
 -

 Key: FLINK-1442
 URL: https://issues.apache.org/jira/browse/FLINK-1442
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Max Michels

 The JobManager archives the execution graphs, for analysis of jobs. The 
 graphs may consume a lot of memory.
 The execution edges in all-to-all connection patterns are especially numerous 
 and add up in memory consumption.
 The execution edges connect all parallel tasks. So for an all-to-all pattern 
 between n and m tasks, there are n*m edges. For parallelisms of several hundred 
 tasks, this easily reaches 100k objects and more, each with a set of 
 metadata.
 I propose the following to solve that:
 1.  Clear all execution edges from the graph (majority of the memory 
 consumers) when it is given to the archiver.
 2. Have the map/list of the archived graphs behind a soft reference, so it 
 will be removed under memory pressure before the JVM crashes. That may remove 
 graphs from the history early, but that is much preferable to the JVM crashing, in 
 which case the graphs are lost as well...
 3. Long term: The graph should be archived somewhere else. Something like the 
 History Server used by Hadoop and Hive would be a good idea.
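Point 2, combined with the LRU behavior discussed in the pull request, could look roughly like the following sketch: an LRU-bounded map whose values are additionally held via `SoftReference`, so entries are dropped in least-recently-used order under normal operation and the GC can clear the rest under memory pressure as a safety valve. The class and method names are invented for illustration; the real archived execution graph is of course far larger than the placeholder values used here.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an archive that is bounded by an LRU policy and
// whose entries the GC may additionally reclaim via soft references.
public class GraphArchive<K, V> {
    private final int maxEntries;
    private final LinkedHashMap<K, SoftReference<V>> archive;

    public GraphArchive(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true makes iteration order least-recently-used first.
        this.archive = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                // Evict the least-recently-used entry once the cap is exceeded.
                return size() > GraphArchive.this.maxEntries;
            }
        };
    }

    public void put(K jobId, V graph) {
        archive.put(jobId, new SoftReference<>(graph));
    }

    /** Returns null if the entry was evicted (LRU) or cleared by the GC. */
    public V get(K jobId) {
        SoftReference<V> ref = archive.get(jobId);
        return ref == null ? null : ref.get();
    }

    public int size() {
        return archive.size();
    }
}
```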





[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...

2015-01-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/344#issuecomment-71548964
  
Looks good so far. I see that you removed the LRU code. Was that on purpose?

Leaving it in may be a good idea, because the soft references are cleared 
in arbitrary order. It may make newer jobs disappear before older ones. Having 
the LRU in would mean things behave as before as long as the memory is 
sufficient, with the soft-reference clearing kicking in as a safety valve.




[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71548405
  
Okay, if it is an auxiliary classpath then it should be fine.




[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...

2015-01-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/338#issuecomment-71595710
  
I think in the current state, this makes sense.

I wrote the interface the way it was because it would enable implementations 
that do not compute/provide all splits on all machines. Think of a collection 
input format where we send a different subset of the collection to each 
node (not supported in the runtime now, but might be at some point).

I guess we can realize something similar by splitting on the client and 
then sending the subset iterators directly.
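The client-side alternative mentioned above can be sketched as a simple round-robin partitioning of an iterator into disjoint subsets, one of which would then be shipped to each node. This is purely illustrative; it is not the actual `SplittableIterator` interface, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: split a source iterator into N disjoint subsets on
// the client, instead of computing all splits on every machine.
public class ClientSideSplitter {
    public static <T> List<List<T>> split(Iterator<T> source, int numSplits) {
        List<List<T>> splits = new ArrayList<>(numSplits);
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        int next = 0;
        while (source.hasNext()) {
            // Round-robin assignment keeps the subsets balanced.
            splits.get(next).add(source.next());
            next = (next + 1) % numSplits;
        }
        return splits;
    }
}
```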




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-71598228
  
+1 from my side




[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71598417
  
Okay, let's add an exclude for the linked target directory and update 
`basedir` to `project.basedir`. Will that do?




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293071#comment-14293071
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71598417
  
Okay, let's add an exclude for the linked target directory and update 
`basedir` to `project.basedir`. Will that do?


 Restructure directory layout
 

 Key: FLINK-1330
 URL: https://issues.apache.org/jira/browse/FLINK-1330
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Documentation
Reporter: Max Michels
Priority: Minor
  Labels: usability

 When building Flink, the build results can currently be found under 
 flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/.
 I think we could improve the directory layout with the following:
 - provide the bin folder in the root by default
 - let the start up and submissions scripts in bin assemble the class path
 - in case the project hasn't been built yet, inform the user
 The changes would make it easier to work with Flink from source.





[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292761#comment-14292761
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user dan-blanchard commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71570805
  
I've only recently started looking at Flink, and the lack of support for 
non-JVM languages was a bit of a showstopper for me.  That's one of the main 
reasons we use Storm.

Anyway, is the idea here that this will just be for Python? Will it be 
simple for third parties to add support for other languages?


 Create a general purpose framework for language bindings
 

 Key: FLINK-377
 URL: https://issues.apache.org/jira/browse/FLINK-377
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 A general-purpose API to run operators with arbitrary binaries. 
 This will allow running Stratosphere programs written in Python, JavaScript, 
 Ruby, Go, or whatever you like. 
 We suggest using Google Protocol Buffers for data serialization. This is the 
 list of languages that currently support ProtoBuf: 
 https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
 Very early prototype with python: 
 https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
 protobuf)
 For Ruby: https://github.com/infochimps-labs/wukong
 Two new students working at Stratosphere (@skunert and @filiphaase) are 
 working on this.
 The reference binding language will be Python, but other bindings are 
 very welcome.
 The best name for this so far is stratosphere-lang-bindings.
 I created this issue to track the progress (and give everybody a chance to 
 comment on this)
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/377
 Created by: [rmetzger|https://github.com/rmetzger]
 Labels: enhancement, 
 Assignee: [filiphaase|https://github.com/filiphaase]
 Created at: Tue Jan 07 19:47:20 CET 2014
 State: open




