[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291766#comment-14291766 ] ASF GitHub Bot commented on FLINK-377: -- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71453267 I've rebased and updated this PR. Notable new stuff: * hybrid mode removed * documentation updated and integrated into the website * **chaining** on the python side (map, flatmap, filter, combine) * groupreduce/cogroup reworked - grouping done on the python side * iterators passed to UDFs are now iterable * **lambda support** * **test coverage** (works from IDE, Maven and on Travis) Create a general purpose framework for language bindings Key: FLINK-377 URL: https://issues.apache.org/jira/browse/FLINK-377 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache A general purpose API to run operators with arbitrary binaries. This will allow running Stratosphere programs written in Python, JavaScript, Ruby, Go or whatever you like. We suggest using Google Protocol Buffers for data serialization. This is the list of languages that currently support ProtoBuf: https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns Very early prototype with Python: https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing protobuf) For Ruby: https://github.com/infochimps-labs/wukong Two new students working at Stratosphere (@skunert and @filiphaase) are working on this. The reference binding language will be Python, but other bindings are very welcome. The best name for this so far is stratosphere-lang-bindings.
I created this issue to track the progress (and give everybody a chance to comment on this) Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/377 Created by: [rmetzger|https://github.com/rmetzger] Labels: enhancement, Assignee: [filiphaase|https://github.com/filiphaase] Created at: Tue Jan 07 19:47:20 CET 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
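The chaining mentioned in the PR update above (fusing consecutive map/flatmap/filter operators on the Python side so each record crosses the operator boundary only once) can be illustrated with a small, framework-independent sketch. The names and structure here are hypothetical, not Flink's actual Python API:

```python
# Illustrative sketch of operator chaining: consecutive per-record
# transformations are fused into a single stage, so each record is
# handed over only once. Names are hypothetical, not Flink's API.

def chain(*operators):
    """Fuse a sequence of (kind, fn) operators into one generator stage."""
    def fused(records):
        for record in records:
            values = [record]
            for kind, fn in operators:
                next_values = []
                for v in values:
                    if kind == "map":
                        next_values.append(fn(v))
                    elif kind == "flatmap":
                        next_values.extend(fn(v))
                    elif kind == "filter" and fn(v):
                        next_values.append(v)
                values = next_values
            for v in values:
                yield v
    return fused

# Lambdas work naturally as UDFs in such a scheme:
stage = chain(("map", lambda x: x * 2),
              ("filter", lambda x: x > 2),
              ("flatmap", lambda x: [x, x + 1]))
print(list(stage([1, 2, 3])))
```

The fused stage consumes an iterable of records and yields results lazily, which is also in the spirit of the "iterators passed to UDFs" change mentioned in the comment.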
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291803#comment-14291803 ] ASF GitHub Bot commented on FLINK-377: -- Github user uce commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71458700 Wow, great news! :-) In general, I think we really have to do something about getting the changes in. The PR is growing faster than it's getting feedback. Has anybody looked into this and tried it out recently? Create a general purpose framework for language bindings Key: FLINK-377 URL: https://issues.apache.org/jira/browse/FLINK-377 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache A general purpose API to run operators with arbitrary binaries. This will allow running Stratosphere programs written in Python, JavaScript, Ruby, Go or whatever you like. We suggest using Google Protocol Buffers for data serialization. This is the list of languages that currently support ProtoBuf: https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns Very early prototype with Python: https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing protobuf) For Ruby: https://github.com/infochimps-labs/wukong Two new students working at Stratosphere (@skunert and @filiphaase) are working on this. The reference binding language will be Python, but other bindings are very welcome. The best name for this so far is stratosphere-lang-bindings. I created this issue to track the progress (and give everybody a chance to comment on this) Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/377 Created by: [rmetzger|https://github.com/rmetzger] Labels: enhancement, Assignee: [filiphaase|https://github.com/filiphaase] Created at: Tue Jan 07 19:47:20 CET 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71458700 Wow, great news! :-) In general, I think we really have to do something about getting the changes in. The PR is growing faster than it's getting feedback. Has anybody looked into this and tried it out recently? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291770#comment-14291770 ] ASF GitHub Bot commented on FLINK-1415: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526836

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java ---
@@ -42,86 +44,82 @@
 public class SlotSharingGroupAssignment implements Serializable {

 	static final long serialVersionUID = 42L;

 	private static final Logger LOG = Scheduler.LOG;

 	private transient final Object lock = new Object();

 	/** All slots currently allocated to this sharing group */
 	private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>();

 	/** The slots available per vertex type (jid), keyed by instance, to make them locatable */
 	private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>();

 	// ------------------------------------------------------------------------

-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex) {
-		JobVertexID id = vertex.getJobvertexId();
-		return addNewSlotWithTask(slot, id, id);
-	}
-
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex, CoLocationConstraint constraint) {
-		AbstractID groupId = constraint.getGroupId();
-		return addNewSlotWithTask(slot, groupId, null);
-	}
-
-	private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID groupId, JobVertexID vertexId) {
-
-		final SharedSlot sharedSlot = new SharedSlot(slot, this);
-		final Instance location = slot.getInstance();
-
+	public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot sharedSlot, Locality locality,
+			AbstractID groupId, CoLocationConstraint constraint) {
--- End diff --

indentation?
Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291783#comment-14291783 ] ASF GitHub Bot commented on FLINK-1168: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/264#issuecomment-71455623 Updated the PR and will merge once Travis completed the build. Support multi-character field delimiters in CSVInputFormats --- Key: FLINK-1168 URL: https://issues.apache.org/jira/browse/FLINK-1168 Project: Flink Issue Type: Improvement Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: Manu Kaul Priority: Minor Labels: starter The CSVInputFormat supports multi-char (String) line delimiters, but only single-char (char) field delimiters. This issue proposes to add support for multi-char field delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
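The heart of multi-character field-delimiter support is scanning each record for a delimiter string instead of comparing a single character. A minimal, self-contained sketch of that splitting logic (illustrative only, not the actual CSVInputFormat parsing code):

```python
def split_fields(record: bytes, delimiter: bytes) -> list:
    """Split one CSV record on a multi-byte field delimiter."""
    fields = []
    start = 0
    while True:
        # look for the next complete occurrence of the delimiter sequence
        pos = record.find(delimiter, start)
        if pos == -1:
            fields.append(record[start:])  # last field runs to end of record
            return fields
        fields.append(record[start:pos])
        start = pos + len(delimiter)

print(split_fields(b"a||b||c", b"||"))
```

The single-char case is just a delimiter of length one, so the same scan covers both the old and the new behavior.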
[jira] [Updated] (FLINK-1424) bin/flink run does not recognize -c parameter anymore
[ https://issues.apache.org/jira/browse/FLINK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1424: -- Assignee: Max Michels bin/flink run does not recognize -c parameter anymore - Key: FLINK-1424 URL: https://issues.apache.org/jira/browse/FLINK-1424 Project: Flink Issue Type: Bug Components: TaskManager Affects Versions: master Reporter: Carsten Brandt Assignee: Max Michels bin/flink binary does not recognize `-c` parameter anymore which specifies the class to run: {noformat} $ ./flink run /path/to/target/impro3-ws14-flink-1.0-SNAPSHOT.jar -c de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator /tmp/flink/testgraph.txt 1 usage: emma-experiments-impro3-ss14-flink [-?] emma-experiments-impro3-ss14-flink: error: unrecognized arguments: '-c' {noformat} before this command worked fine and executed the job. I tracked it down to the following commit using `git bisect`: {noformat} 93eadca782ee8c77f89609f6d924d73021dcdda9 is the first bad commit commit 93eadca782ee8c77f89609f6d924d73021dcdda9 Author: Alexander Alexandrov alexander.s.alexand...@gmail.com Date: Wed Dec 24 13:49:56 2014 +0200 [FLINK-1027] [cli] Added support for '--' and '-' prefixed tokens in CLI program arguments. This closes #278 :04 04 a1358e6f7fe308b4d51a47069f190a29f87fdeda d6f11bbc9444227d5c6297ec908e44b9644289a9 Mflink-clients {noformat} https://github.com/apache/flink/commit/93eadca782ee8c77f89609f6d924d73021dcdda9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
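The regression above is plausibly a tokenization issue: if the client treats everything after the jar path as program arguments (which the FLINK-1027 `--`/`-` pass-through change suggests), a `-c` placed after the jar is forwarded to the user program instead of being consumed by the client. A hedged, self-contained Python sketch of that splitting rule, illustrative only and not Flink's actual CLI code:

```python
def split_cli_args(args):
    """Split a hypothetical `run` argument list at the first jar path.

    Tokens before the jar are client options; everything after it is
    forwarded verbatim to the user program (illustration, not Flink code).
    """
    for i, token in enumerate(args):
        if token.endswith(".jar"):
            return args[:i], token, args[i + 1:]
    raise ValueError("no jar file in argument list")

# With `-c` after the jar, it becomes a program argument the client never sees:
client_opts, jar, program_args = split_cli_args(
    ["run", "job.jar", "-c", "de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator"])
print(client_opts, jar, program_args)
```

Under such a rule the fix on the user side would be to place `-c <class>` before the jar path; whether that is the intended behavior here is for the assignee to confirm.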
[jira] [Assigned] (FLINK-1443) Add replicated data source
[ https://issues.apache.org/jira/browse/FLINK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske reassigned FLINK-1443: Assignee: Fabian Hueske Add replicated data source -- Key: FLINK-1443 URL: https://issues.apache.org/jira/browse/FLINK-1443 Project: Flink Issue Type: New Feature Components: Java API, JobManager, Optimizer Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor This issue proposes to add support for data sources that read the same data in all parallel instances. This feature can be useful if the data is replicated to all machines in a cluster and can be read locally. For example, a replicated input format can be used for a broadcast join without sending any data over the network. The following changes are necessary to achieve this: 1) Add a replicating InputSplitAssigner which assigns all splits to all parallel instances. This also requires extending the InputSplitAssigner interface to identify the exact parallel instance that requests an InputSplit (currently only the hostname is provided). 2) Make sure that the DOP of the replicated data source is identical to the DOP of its successor. 3) Let the optimizer know that the data is replicated and ensure that plan enumeration works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
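Point 1) of the proposal above can be sketched as follows. This is a hypothetical, framework-independent illustration of a replicating split assigner (not the actual Flink interface), keyed by the requesting parallel instance rather than only by hostname:

```python
class ReplicatingInputSplitAssigner:
    """Hypothetical sketch: every parallel instance receives ALL splits."""

    def __init__(self, splits):
        self.splits = list(splits)
        self.cursor = {}  # next split index, per requesting parallel instance

    def next_split(self, instance_id):
        # Identify the exact requesting instance (not just its hostname),
        # so each one can iterate over the full split list independently.
        i = self.cursor.get(instance_id, 0)
        if i >= len(self.splits):
            return None  # this instance has read the entire input
        self.cursor[instance_id] = i + 1
        return self.splits[i]

assigner = ReplicatingInputSplitAssigner(["split-0", "split-1"])
# Two parallel instances: each sees the complete input, in the same order.
a = [assigner.next_split("instance-a") for _ in range(3)]
b = [assigner.next_split("instance-b") for _ in range(3)]
print(a, b)
```

Keying by instance is exactly why the issue asks to extend the assigner interface: with hostname alone, two instances on the same machine could not be told apart.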
[jira] [Assigned] (FLINK-1105) Add support for locally sorted output
[ https://issues.apache.org/jira/browse/FLINK-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske reassigned FLINK-1105: Assignee: Fabian Hueske Add support for locally sorted output - Key: FLINK-1105 URL: https://issues.apache.org/jira/browse/FLINK-1105 Project: Flink Issue Type: Sub-task Components: Java API Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor This feature will make it possible to sort the output which is sent to an OutputFormat to obtain a locally sorted result. This feature was available in the old Java API and has not been ported to the new Java API yet. Hence, the optimizer and runtime should already support this feature; however, the API and job generation part is missing. It is also a subfeature of FLINK-598, which will also provide globally sorted results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1168. -- Resolution: Fixed Fixed with 0548a93dfc555a5403590f147d4850c730facaf6 Support multi-character field delimiters in CSVInputFormats --- Key: FLINK-1168 URL: https://issues.apache.org/jira/browse/FLINK-1168 Project: Flink Issue Type: Improvement Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: Manu Kaul Priority: Minor Labels: starter The CSVInputFormat supports multi-char (String) line delimiters, but only single-char (char) field delimiters. This issue proposes to add support for multi-char field delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1344] [streaming] Added static StreamEx...
GitHub user senorcarbone opened a pull request: https://github.com/apache/flink/pull/341 [FLINK-1344] [streaming] Added static StreamExecutionEnvironment initialisation and Implicits for scala sources This PR addresses the ticket [1] for further scala constructs interoperability. I had to add static StreamExecutionEnvironment initialisation to make the implicit conversion possible. [1] https://issues.apache.org/jira/browse/FLINK-1344 You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink scala-seq Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/341.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #341 commit 2ea675895d605e5c0a442171388c7be9361acf79 Author: Paris Carbone seniorcarb...@gmail.com Date: 2015-01-23T16:23:46Z [FLINK-1344] [streaming] [scala] Added implicits from scala seq to datastream and static StreamExecutionEnvironment initialization
[GitHub] flink pull request: Mk amulti char delim
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/247
[jira] [Closed] (FLINK-1153) Yarn container does not terminate if Flink's yarn client is terminated before the application master is completely started
[ https://issues.apache.org/jira/browse/FLINK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger closed FLINK-1153. - Resolution: Won't Fix Yarn container does not terminate if Flink's yarn client is terminated before the application master is completely started -- Key: FLINK-1153 URL: https://issues.apache.org/jira/browse/FLINK-1153 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Till Rohrmann Assignee: Robert Metzger The yarn application master container does not terminate if the yarn client is terminated after the container request is issued but before the application master is completely started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1016) Add directory for user-contributed programs
[ https://issues.apache.org/jira/browse/FLINK-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger closed FLINK-1016. - Resolution: Duplicate Add directory for user-contributed programs --- Key: FLINK-1016 URL: https://issues.apache.org/jira/browse/FLINK-1016 Project: Flink Issue Type: Task Reporter: Robert Metzger Assignee: Robert Metzger Priority: Trivial As a result of this discussion: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Where-to-host-user-contributed-Flink-programs-td750.html I'm going to add a directory for user-contributed Flink programs that are reviewed by committers, but not shipped with releases. In addition, we require that these programs provide test cases to improve test diversity and the overall coverage. The first algorithm I'm going to add is the contribution made in FLINK-904. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71470142 Yes, I thought it would handle it as well, but I just ran it and it didn't work.
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291796#comment-14291796 ] ASF GitHub Bot commented on FLINK-1330: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/333#discussion_r23528552

--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

I couldn't find a document stating that $basedir is deprecated, but I think you are right in the sense that the project prefix is used for everything related to the POM of the project (I think in previous versions the (now deprecated) prefix was `pom` and both the `version` and `basedir` properties are built-ins). We use $basedir in other places as well.

Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start-up and submission scripts in bin assemble the class path - in case the project hasn't been built yet, inform the user The changes would make it easier to work with Flink from source.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/333#discussion_r23528552

--- Diff: flink-dist/pom.xml ---
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
--- End diff --

I couldn't find a document stating that $basedir is deprecated, but I think you are right in the sense that the project prefix is used for everything related to the POM of the project (I think in previous versions the (now deprecated) prefix was `pom` and both the `version` and `basedir` properties are built-ins). We use $basedir in other places as well.
[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/264
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291851#comment-14291851 ] ASF GitHub Bot commented on FLINK-1201: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71465448 Haven't had a closer look yet, but one thing that I noticed is the naming of the test files. In the current codebase all tests are named XyzTest (or XyzITCase) instead of TestXyz. Not sure if it's worth changing though... Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291880#comment-14291880 ] ASF GitHub Bot commented on FLINK-1330: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71468579 Very good catch. I would address @aljoscha's comment and add the exclude. After that, I think it's fine to merge. I've tested it locally and it works fine. Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start-up and submission scripts in bin assemble the class path - in case the project hasn't been built yet, inform the user The changes would make it easier to work with Flink from source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291886#comment-14291886 ] ASF GitHub Bot commented on FLINK-1330: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71469342 There's another minor thing. After running mvn clean, the symlink will point to a non-existing directory. Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start-up and submission scripts in bin assemble the class path - in case the project hasn't been built yet, inform the user The changes would make it easier to work with Flink from source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-956) Add a parameter to the YARN command line script that allows to define the amount of MemoryManager memory
[ https://issues.apache.org/jira/browse/FLINK-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-956. -- Resolution: Fixed Fix Version/s: 0.7.0-incubating This has been fixed as described in the comments. Add a parameter to the YARN command line script that allows to define the amount of MemoryManager memory Key: FLINK-956 URL: https://issues.apache.org/jira/browse/FLINK-956 Project: Flink Issue Type: Improvement Components: YARN Client Affects Versions: 0.6-incubating Reporter: Stephan Ewen Assignee: Robert Metzger Fix For: 0.7.0-incubating The current parameter specifies the YARN container size. It would be nice to have parameters for the JVM heap size {{taskmanager.heap.mb}} and the amount of memory that goes to the memory manager {{taskmanager.memory.size}}. Both should be optional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations
[ https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291904#comment-14291904 ] ASF GitHub Bot commented on FLINK-1328: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/311#issuecomment-71470851 Addressed most comments and renamed constantFields/Sets to forwardedFields as discussed on dev-ml. Would like to merge this soon. Rework Constant Field Annotations - Key: FLINK-1328 URL: https://issues.apache.org/jira/browse/FLINK-1328 Project: Flink Issue Type: Improvement Components: Java API, Optimizer, Scala API Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: Fabian Hueske Constant field annotations are used by the optimizer to determine whether physical data properties such as sorting or partitioning are retained by user defined functions. The current implementation is limited and can be extended in several ways: - Fields that are copied to other positions - Field definitions for non-tuple data types (Pojos) There is a pull request (#83) that goes into this direction and which can be extended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1453) Integration tests for YARN failing on OS X
[ https://issues.apache.org/jira/browse/FLINK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1453. --- Resolution: Fixed Fix Version/s: 0.9 Issue resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/3bdeab1b. Thanks to [~uce] for giving me ssh access to his Mac :D Integration tests for YARN failing on OS X -- Key: FLINK-1453 URL: https://issues.apache.org/jira/browse/FLINK-1453 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 The flink yarn tests are failing on OS X, most likely through a port conflict:
{code}
11:59:38,870 INFO org.eclipse.jetty.util.log - jetty-0.9-SNAPSHOT
11:59:38,885 WARN org.eclipse.jetty.util.log - FAILED SelectChannelConnector@0.0.0.0:8081: java.net.BindException: Address already in use
11:59:38,885 WARN org.eclipse.jetty.util.log - FAILED org.eclipse.jetty.server.Server@281c7736: java.net.BindException: Address already in use
11:59:38,892 ERROR akka.actor.OneForOneStrategy - Address already in use
akka.actor.ActorInitializationException: exception during creation
	at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
	at akka.actor.ActorCell.create(ActorCell.scala:596)
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:208)
	at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:288)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
	at org.eclipse.jetty.server.Server.doStart(Server.java:254)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
	at org.apache.flink.runtime.jobmanager.web.WebInfoServer.start(WebInfoServer.java:198)
	at org.apache.flink.runtime.jobmanager.WithWebServer$class.$init$(WithWebServer.scala:28)
	at org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2$$anon$1.init(ApplicationMaster.scala:181)
	at org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2.apply(ApplicationMaster.scala:181)
	at org.apache.flink.yarn.ApplicationMaster$$anonfun$startJobManager$2.apply(ApplicationMaster.scala:181)
	at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
	at akka.actor.Props.newActor(Props.scala:252)
	at akka.actor.ActorCell.newActor(ActorCell.scala:552)
	at akka.actor.ActorCell.create(ActorCell.scala:578)
	... 10 more
{code}
The issue does not appear on Travis or on Arch Linux (however, tests are also failing on some Ubuntu versions) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
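Port-conflict failures like the `BindException` above are commonly avoided in integration tests by asking the OS for an ephemeral port and passing that to the server under test, instead of hard-coding a port such as 8081. A small Python sketch of that general pattern (not the actual fix applied in commit 3bdeab1b):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an ephemeral port, then release it for the test server to use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = let the kernel pick a free port
        return s.getsockname()[1]

port = find_free_port()
print(port)  # use this instead of a fixed port like 8081 in the test config
```

There is an inherent race between releasing the probe socket and the server binding it, so robust test harnesses often also retry the bind a few times.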
[jira] [Resolved] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction
[ https://issues.apache.org/jira/browse/FLINK-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1428. Resolution: Fixed Fixed in 06b2acf. Typos in Java code example for RichGroupReduceFunction -- Key: FLINK-1428 URL: https://issues.apache.org/jira/browse/FLINK-1428 Project: Flink Issue Type: Bug Components: Project Website Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor http://flink.apache.org/docs/0.7-incubating/dataset_transformations.html String key = null //missing ';' public void combine(IterableTuple3String, Integer, Double in, CollectorTuple3String, Integer, Double out)) -- one ')' too much -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/333#discussion_r23527422

--- Diff: flink-dist/pom.xml ---
```
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
```
--- End diff --

Isn't basedir deprecated? In the next line you use project.basedir.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291781#comment-14291781 ] ASF GitHub Bot commented on FLINK-1330: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/333#discussion_r23527422

--- Diff: flink-dist/pom.xml ---
```
@@ -436,6 +436,37 @@ under the License.
 				</gitDescribe>
 			</configuration>
 		</plugin>
+
+		<!-- create a symbolic link to the build target in the root directory -->
+		<plugin>
+			<groupId>com.pyx4j</groupId>
+			<artifactId>maven-junction-plugin</artifactId>
+			<version>1.0.3</version>
+			<executions>
+				<execution>
+					<phase>package</phase>
+					<goals>
+						<goal>link</goal>
+					</goals>
+				</execution>
+				<execution>
+					<id>unlink</id>
+					<phase>clean</phase>
+					<goals>
+						<goal>unlink</goal>
+					</goals>
+				</execution>
+			</executions>
+			<configuration>
+				<links>
+					<link>
+						<dst>${basedir}/../build-target</dst>
```
--- End diff --

Isn't basedir deprecated? In the next line you use project.basedir.

Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start up and submission scripts in bin assemble the class path - in case the project hasn't been built yet, inform the user The changes would make it easier to work with Flink from source.
[jira] [Updated] (FLINK-1320) Add an off-heap variant of the managed memory
[ https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1320: -- Assignee: Max Michels Add an off-heap variant of the managed memory - Key: FLINK-1320 URL: https://issues.apache.org/jira/browse/FLINK-1320 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Stephan Ewen Assignee: Max Michels Priority: Minor For (nearly) all memory that Flink accumulates (in the form of sort buffers, hash tables, caching), we use a special way of representing data serialized across a set of memory pages. The big work lies in the way the algorithms are implemented to operate on pages, rather than on objects. The core class for the memory is the {{MemorySegment}}, which has all methods to set and get primitive values efficiently. It is a somewhat simpler (and faster) variant of a HeapByteBuffer. As such, it should be straightforward to create a version where the memory segment is not backed by a heap byte[], but by memory allocated outside the JVM, in a similar way as the NIO DirectByteBuffers or the Netty direct buffers do it. This may have multiple advantages: - We reduce the size of the JVM heap (garbage collected) and the number and size of long-lived objects. For large JVM sizes, this may improve performance quite a bit. Ultimately, we would in many cases reduce the JVM size to 1/3 to 1/2 and keep the remaining memory outside the JVM. - We save copies when we move memory pages to disk (spilling) or through the network (shuffling / broadcasting / forward piping) The changes required to implement this are: - Add an {{UnmanagedMemorySegment}} that only stores the memory address as a long, and the segment size. It is initialized from a DirectByteBuffer. - Allow the MemoryManager to allocate these MemorySegments, instead of the current ones. - Make sure that the startup scripts pick up the mode and configure the heap size and the max direct memory properly.
Since the MemorySegment is probably the most performance-critical class in Flink, we must take care that we do this right. The following are critical considerations: - If we want both solutions (heap and off-heap) to exist side-by-side (configurable), we must make the base MemorySegment abstract and implement two versions (heap and off-heap). - To get the best performance, we need to make sure that only one class gets loaded (or at least is ever used), to ensure optimal JIT de-virtualization and inlining. - We should carefully measure the performance of both variants. From previous micro benchmarks, I remember that individual byte accesses in DirectByteBuffers (off-heap) were slightly slower than on-heap, while any larger accesses were equally good or slightly better.
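The side-by-side heap/off-heap design from the considerations above can be sketched in plain Java. This is an illustrative sketch only, not Flink's actual MemorySegment hierarchy: the class names (`Segment`, `HeapSegment`, `OffHeapSegment`) are invented, and the off-heap variant wraps a direct ByteBuffer rather than storing a raw memory address.

```java
import java.nio.ByteBuffer;

// Illustrative sketch -- names and signatures are invented, not Flink's API.
abstract class Segment {
    abstract int size();
    abstract void putInt(int index, int value);
    abstract int getInt(int index);
}

// Heap variant: backed by a garbage-collected byte[].
final class HeapSegment extends Segment {
    private final byte[] memory;
    HeapSegment(int size) { this.memory = new byte[size]; }
    int size() { return memory.length; }
    void putInt(int index, int value) {
        memory[index]     = (byte) (value >>> 24);
        memory[index + 1] = (byte) (value >>> 16);
        memory[index + 2] = (byte) (value >>> 8);
        memory[index + 3] = (byte) value;
    }
    int getInt(int index) {
        return ((memory[index] & 0xff) << 24)
                | ((memory[index + 1] & 0xff) << 16)
                | ((memory[index + 2] & 0xff) << 8)
                | (memory[index + 3] & 0xff);
    }
}

// Off-heap variant: backed by memory outside the JVM heap,
// similar in spirit to NIO DirectByteBuffers.
final class OffHeapSegment extends Segment {
    private final ByteBuffer memory;
    OffHeapSegment(int size) { this.memory = ByteBuffer.allocateDirect(size); }
    int size() { return memory.capacity(); }
    void putInt(int index, int value) { memory.putInt(index, value); }
    int getInt(int index) { return memory.getInt(index); }
}

public class SegmentDemo {
    public static void main(String[] args) {
        Segment heap = new HeapSegment(64);
        Segment offHeap = new OffHeapSegment(64);
        heap.putInt(0, 42);
        offHeap.putInt(0, 42);
        System.out.println(heap.getInt(0) + " " + offHeap.getInt(0)); // 42 42
    }
}
```

The de-virtualization concern is visible here: calls through the abstract `Segment` type are virtual, so once both subclasses are loaded and used, the JIT may stop inlining `putInt`/`getInt` at shared call sites.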
[GitHub] flink pull request: [FLINK-1428] Update dataset_transformations.md
Github user uce commented on the pull request: https://github.com/apache/flink/pull/340#issuecomment-71453921 +1 Will merge this later to `master` and `release-0.8`.
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291773#comment-14291773 ] ASF GitHub Bot commented on FLINK-1415: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526978

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
```
@@ -329,6 +330,15 @@ public void unregisterMemoryManager(MemoryManager memoryManager) {
 		}
 	}

+	protected void notifyExecutionStateChange(ExecutionState executionState,
+			Throwable optionalError) {
```
--- End diff --

This also seems weird

Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested.
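The "deduce all timeouts from a single value, but keep specific overrides" idea in the issue description can be sketched as a simple configuration lookup. The keys and the default below are hypothetical, invented for illustration; they are not Flink's or Akka's real configuration parameters.

```java
import java.util.Properties;

// Hypothetical config keys -- invented for illustration only.
public class TimeoutConfig {
    static final String BASE_KEY = "akka.ask.timeout.ms";
    static final long DEFAULT_BASE_MS = 10_000L;

    // A specific timeout falls back to the single base timeout
    // unless the user has set it explicitly.
    static long timeoutMillis(Properties conf, String specificKey) {
        String specific = conf.getProperty(specificKey);
        if (specific != null) {
            return Long.parseLong(specific);
        }
        return Long.parseLong(conf.getProperty(BASE_KEY, Long.toString(DEFAULT_BASE_MS)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(BASE_KEY, "5000");
        conf.setProperty("akka.lookup.timeout.ms", "1000");
        System.out.println(timeoutMillis(conf, "akka.lookup.timeout.ms"));  // explicit: 1000
        System.out.println(timeoutMillis(conf, "akka.startup.timeout.ms")); // falls back: 5000
    }
}
```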
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71454018 Very nice. +1 Will merge this later.
[jira] [Commented] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction
[ https://issues.apache.org/jira/browse/FLINK-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291771#comment-14291771 ] ASF GitHub Bot commented on FLINK-1428: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/340#issuecomment-71453921 +1 Will merge this later to `master` and `release-0.8`. Typos in Java code example for RichGroupReduceFunction -- Key: FLINK-1428 URL: https://issues.apache.org/jira/browse/FLINK-1428 Project: Flink Issue Type: Bug Components: Project Website Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor http://flink.apache.org/docs/0.7-incubating/dataset_transformations.html String key = null //missing ';' public void combine(Iterable<Tuple3<String, Integer, Double>> in, Collector<Tuple3<String, Integer, Double>> out)) -- one ')' too much
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526978

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
```
@@ -329,6 +330,15 @@ public void unregisterMemoryManager(MemoryManager memoryManager) {
 		}
 	}

+	protected void notifyExecutionStateChange(ExecutionState executionState,
+			Throwable optionalError) {
```
--- End diff --

This also seems weird
[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/331#issuecomment-71467375 I think it would be better not to print the help if the user specified something incorrectly. Maybe just the error message and a note that -h prints the help? I've tried out the change, but now the message is at the very bottom of the output. It's now probably even harder to find. **Bad** (see below for *Good*)
```
robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436] % ./bin/flink ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
Action run compiles and runs a program.
  Syntax: run [OPTIONS] <jar-file> <arguments>
  run action options:
     -c,--class <classname>               Class with the program entry point (main method or getPlan() method. Only needed if the JAR file does not specify the class in its manifest.
     -m,--jobmanager <host:port>          Address of the JobManager (master) to which to connect. Specify 'yarn-cluster' as the JobManager to deploy a YARN cluster for the job. Use this flag to connect to a different JobManager than the one specified in the configuration.
     -p,--parallelism <parallelism>       The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.
  Additional arguments if -m yarn-cluster is set:
     -yD <arg>                            Dynamic properties
     -yj,--yarnjar <arg>                  Path to Flink jar file
     -yjm,--yarnjobManagerMemory <arg>    Memory for JobManager Container [in MB]
     -yn,--yarncontainer <arg>            Number of YARN container to allocate (=Number of Task Managers)
     -yq,--yarnquery                      Display available YARN resources (memory, cores)
     -yqu,--yarnqueue <arg>               Specify YARN queue.
     -ys,--yarnslots <arg>                Number of slots per TaskManager
     -yt,--yarnship <arg>                 Ship files in the specified directory (t for transfer)
     -ytm,--yarntaskManagerMemory <arg>   Memory per TaskManager Container [in MB]
Invalid action: ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
1 robert@robert-da ~/flink-workdir/flink2/build-target (git)-[flink-1436]
```
The info command is over-engineered in my opinion. It contains only one possible option, which is -e for the execution plan. I would vote to remove the info action and call it plan or so. Or keep its info name and print the plan by default (this is not @mxm's fault .. but it would be nice to fix this with the PR)
```
./bin/flink info ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
Action info displays information about a program.
  Syntax: info [OPTIONS] <jar-file> <arguments>
  info action options:
     -c,--class <classname>               Class with the program entry point (main method or getPlan() method. Only needed if the JAR file does not specify the class in its manifest.
     -e,--executionplan                   Show optimized execution plan of the program (JSON)
     -m,--jobmanager <host:port>          Address of the JobManager (master) to which to connect. Specify 'yarn-cluster' as the JobManager to deploy a YARN cluster for the job. Use this flag to connect to a different JobManager than the one specified in the configuration.
     -p,--parallelism <parallelism>       The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.
Error: Specify one of the above options to display information.
```
**Good** What I liked was the error reporting when passing an invalid file as the
[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291900#comment-14291900 ] ASF GitHub Bot commented on FLINK-1318: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/265#issuecomment-71470231 Any comments on this PR? Make quoted String parsing optional and configurable for CSVInputFormats Key: FLINK-1318 URL: https://issues.apache.org/jira/browse/FLINK-1318 Project: Flink Issue Type: Improvement Components: Java API, Scala API Affects Versions: 0.8 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor With the current implementation of the CSVInputFormat, quoted string parsing kicks in if the first non-whitespace character of a field is a double quote. I see two issues with this implementation: 1. Quoted String parsing cannot be disabled 2. The quoting character is fixed to double quotes (") I propose to add parameters to disable quoted String parsing and set the quote character.
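The proposed behavior could look roughly like the sketch below. This is not the CSVInputFormat API; the class name and parameters are invented purely to illustrate a disable switch plus a configurable quote character.

```java
// Illustrative sketch of the proposal (not Flink's CSVInputFormat API):
// quoted parsing can be disabled entirely, and the quote character is
// configurable instead of hard-wired to the double quote.
public class CsvFieldParser {
    private final boolean quotedParsingEnabled;
    private final char quoteChar;

    CsvFieldParser(boolean quotedParsingEnabled, char quoteChar) {
        this.quotedParsingEnabled = quotedParsingEnabled;
        this.quoteChar = quoteChar;
    }

    /** Parses one field, stripping the quote character only if quoting is enabled. */
    String parseField(String raw) {
        String trimmed = raw.trim();
        if (quotedParsingEnabled
                && trimmed.length() >= 2
                && trimmed.charAt(0) == quoteChar
                && trimmed.charAt(trimmed.length() - 1) == quoteChar) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return raw; // quoting disabled or field not quoted: keep as-is
    }

    public static void main(String[] args) {
        CsvFieldParser defaults = new CsvFieldParser(true, '"');
        CsvFieldParser custom = new CsvFieldParser(true, '\'');
        CsvFieldParser disabled = new CsvFieldParser(false, '"');
        System.out.println(defaults.parseField("\"hello, world\"")); // hello, world
        System.out.println(custom.parseField("'hello'"));            // hello
        System.out.println(disabled.parseField("\"kept as-is\""));   // "kept as-is"
    }
}
```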
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526836

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java ---
```
@@ -42,86 +44,82 @@ public class SlotSharingGroupAssignment implements Serializable {

 	static final long serialVersionUID = 42L;
-
+
 	private static final Logger LOG = Scheduler.LOG;
-
+
 	private transient final Object lock = new Object();
-
+
 	/** All slots currently allocated to this sharing group */
 	private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>();
-
+
 	/** The slots available per vertex type (jid), keyed by instance, to make them locatable */
 	private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>();
-
-
+
 	// --------------------------------------------------------------------
-
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex) {
-		JobVertexID id = vertex.getJobvertexId();
-		return addNewSlotWithTask(slot, id, id);
-	}
-
-	public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex, CoLocationConstraint constraint) {
-		AbstractID groupId = constraint.getGroupId();
-		return addNewSlotWithTask(slot, groupId, null);
-	}
-
-	private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID groupId, JobVertexID vertexId) {
-
-		final SharedSlot sharedSlot = new SharedSlot(slot, this);
-		final Instance location = slot.getInstance();
-
+
+	public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot sharedSlot, Locality locality,
+			AbstractID groupId, CoLocationConstraint constraint) {
```
--- End diff --

indentation?
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291769#comment-14291769 ] ASF GitHub Bot commented on FLINK-1415: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526787

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java ---
```
@@ -21,7 +21,7 @@ import org.apache.flink.runtime.instance.Instance;

 /**
- * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot2}s become available
```
--- End diff --

I guess `AllocatedSlot2` is an automatic rename leftover

Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested.
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23526787

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java ---
```
@@ -21,7 +21,7 @@ import org.apache.flink.runtime.instance.Instance;

 /**
- * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot}s become available
+ * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot2}s become available
```
--- End diff --

I guess `AllocatedSlot2` is an automatic rename leftover
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user uce commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-71458216 I think @StephanEwen is reviewing the critical part.
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291839#comment-14291839 ] ASF GitHub Bot commented on FLINK-1330: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71464164 -1 I think we need to add the `build-target` directory to the list of ignored directories for Apache Rat. Rat will fail subsequent builds:
```
1 Unknown Licenses

*** Unapproved licenses:
    build-target/conf/slaves
***
```
Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start up and submission scripts in bin assemble the class path - in case the project hasn't been built yet, inform the user The changes would make it easier to work with Flink from source.
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291785#comment-14291785 ] ASF GitHub Bot commented on FLINK-1415: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-71455790 This is a huge change. I took a look over the code, but I don't have enough experience with the scheduler to understand these changes. I would suggest merging this rather soon because it's touching a lot of code due to minor Scala style changes (semicolons, removal of parentheses from no-arg methods, unneeded { } and so on). +1 for the documentation added to the classes! The bug in FLINK-1453 would have been more obvious if these changes had been merged. That's another motivation for me to push this pull request forward ;) Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested.
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71464164 -1 I think we need to add the `build-target` directory to the list of ignored directories for Apache Rat. Rat will fail subsequent builds:
```
1 Unknown Licenses

*** Unapproved licenses:
    build-target/conf/slaves
***
```
[GitHub] flink pull request: [FLINK-1446] Fix Kryo createInstance() method
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/336
[jira] [Commented] (FLINK-1446) Make KryoSerializer.createInstance() return new instances instead of null
[ https://issues.apache.org/jira/browse/FLINK-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291936#comment-14291936 ] ASF GitHub Bot commented on FLINK-1446: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/336 Make KryoSerializer.createInstance() return new instances instead of null - Key: FLINK-1446 URL: https://issues.apache.org/jira/browse/FLINK-1446 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9, 0.8.1
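The gist of the fix — returning a fresh instance instead of null from createInstance() — can be illustrated with a generic reflective factory. This is an illustrative sketch, not Flink's KryoSerializer (Kryo has its own instantiation strategies); the `InstanceFactory` class below is invented for demonstration.

```java
import java.lang.reflect.Constructor;

// Illustrative sketch: createInstance() produces a new object via the
// no-arg constructor rather than returning null.
public class InstanceFactory<T> {
    private final Class<T> type;

    public InstanceFactory(Class<T> type) {
        this.type = type;
    }

    public T createInstance() {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance(); // fresh instance on every call, never null
        } catch (Exception e) {
            throw new RuntimeException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        InstanceFactory<StringBuilder> f = new InstanceFactory<>(StringBuilder.class);
        System.out.println(f.createInstance() != null); // true
    }
}
```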
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71488487 I don't really understand how the static lock solves the mentioned issue. Is there a concurrency problem between creating files on disk and updating the count hash map? I think there is a problem between the DeleteProcess and the CopyProcess. The CopyProcess is synchronized on the static lock object and the DeleteProcess is not. Thus, it might be the case that the copy method created the directories for a new file foobar, let's say /tmp/123/foobar, and afterwards the delete process deletes the directory /tmp/123 because it checked the count hash map before the createTmpFile method was called. This problem should still persist with the current changes.
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292024#comment-14292024 ] ASF GitHub Bot commented on FLINK-1419: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71488487 I don't really understand how the static lock solves the mentioned issue. Is there a concurrency problem between creating files on disk and updating the count hash map? I think there is a problem between the DeleteProcess and the CopyProcess. The CopyProcess is synchronized on the static lock object and the DeleteProcess is not. Thus, it might be the case that the copy method created the directories for a new file foobar, let's say /tmp/123/foobar, and afterwards the delete process deletes the directory /tmp/123 because it checked the count hash map before the createTmpFile method was called. This problem should still persist with the current changes. DistributedCache doesn't preserve files for subsequent operations -- Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler When subsequent operations want to access the same files in the DC, it frequently happens that the files are not created for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Neither happens, and increasing the time window had no effect. I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory 2. files are deleted during the job. One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all.
Personally I'm not sure if this is a problem. Changing it to a task-specific place solved the issue though. I'm more concerned about Trait #2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) If we moved the deletion to the end of the job, it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else. Also, the current implementation implies that big files may be copied multiple times. This may be undesired, depending on how big the files are.
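Till's point about the copy and delete paths needing the same lock can be sketched with a reference-counted map. All names here (`RefCountedCache`, `acquire`, `release`) are invented for illustration; the real DistributedCache also has to create and delete actual files inside the critical sections.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the race being discussed: the copy path and the
// delete path must hold the SAME lock while touching the reference-count
// map, otherwise a delete can observe a stale count between a copy's
// registration and its file creation and remove the parent directory.
public class RefCountedCache {
    private final Object lock = new Object();
    private final Map<String, Integer> refCount = new HashMap<>();

    void acquire(String name) {
        synchronized (lock) {
            refCount.merge(name, 1, Integer::sum);
            // ...copy the file to disk while still holding the lock,
            // so a concurrent delete cannot remove the parent directory
        }
    }

    /** Returns true if the file (and its directory) may now be deleted. */
    boolean release(String name) {
        synchronized (lock) { // same lock as acquire -- this is the point
            int count = refCount.merge(name, -1, Integer::sum);
            if (count <= 0) {
                refCount.remove(name);
                return true;  // ...delete the file/directory here
            }
            return false;
        }
    }

    public static void main(String[] args) {
        RefCountedCache cache = new RefCountedCache();
        cache.acquire("foobar");
        cache.acquire("foobar");
        System.out.println(cache.release("foobar")); // false: still referenced
        System.out.println(cache.release("foobar")); // true: safe to delete
    }
}
```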
[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/338#issuecomment-71490635 LGTM. I suspect that the Travis failure is caused by faulty colocated subslot disposal. Should be fixed with #317
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291950#comment-14291950 ] ASF GitHub Bot commented on FLINK-1415: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-71477890 I think so too. The scheduler changes are only included because otherwise the test cases wouldn't pass. The scheduler-relevant PR is #317. Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested.
[jira] [Commented] (FLINK-1454) CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager
[ https://issues.apache.org/jira/browse/FLINK-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291951#comment-14291951 ] Robert Metzger commented on FLINK-1454: --- Yes. It would be nicer if the system told the user that the network connection was refused. Also, 100s is a pretty long time, in particular because many newbies will come across this issue. CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager --- Key: FLINK-1454 URL: https://issues.apache.org/jira/browse/FLINK-1454 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger When a user tries to submit a job to a job manager which doesn't exist at all, the CliFrontend blocks for 100 seconds. Ideally, Akka would fail because it can not connect to the given hostname:port. {code} ./bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar -c foo.Baz org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. 
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:449) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:350) at org.apache.flink.client.program.Client.run(Client.java:242) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:389) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:362) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1078) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1102) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [100 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265) at org.apache.flink.runtime.client.JobClient$.uploadJarFiles(JobClient.scala:169) at org.apache.flink.runtime.client.JobClient.uploadJarFiles(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:314) at org.apache.flink.client.program.Client.run(Client.java:296) at org.apache.flink.client.program.Client.run(Client.java:290) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:434) ... 
6 more The exception above occurred while trying to run your command. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
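As the comments above note, the root cause is a connection refusal that only surfaces after the full 100-second Akka ask timeout expires. One way to report this faster, sketched below under assumptions (this is not Flink's actual client code; `JobManagerProbe` and the default port 6123 are illustrative), is to probe the target endpoint with a short TCP connect timeout before blocking on the long future:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: probe the JobManager endpoint with a short TCP
// connect timeout before blocking on the (long) Akka ask timeout, so a
// dead hostname:port is reported within seconds instead of 100 seconds.
public class JobManagerProbe {

    /** Returns true if something accepts connections at host:port within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers both "connection refused" (nothing listening) and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (!isReachable("127.0.0.1", 6123, 2000)) {
            System.err.println("Could not connect to JobManager at 127.0.0.1:6123 "
                    + "(connection refused or timed out). Is the JobManager running?");
        }
    }
}
```

A probe like this would let the CliFrontend print a clear "connection refused" message immediately, which addresses the newbie-friendliness concern independently of how the Akka timeout itself is configured.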
[jira] [Commented] (FLINK-1147) TypeInference on POJOs
[ https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291953#comment-14291953 ] ASF GitHub Bot commented on FLINK-1147: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/315 TypeInference on POJOs -- Key: FLINK-1147 URL: https://issues.apache.org/jira/browse/FLINK-1147 Project: Flink Issue Type: Improvement Components: Java API Affects Versions: 0.7.0-incubating Reporter: Stephan Ewen Assignee: Timo Walther On Tuples, we currently use type inference that figures out the types of output type variables relative to the input type variable. We need a similar functionality for POJOs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23537586 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotAvailabilityListener.java --- @@ -21,7 +21,7 @@ import org.apache.flink.runtime.instance.Instance; /** - * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot}s become available + * A SlotAvailabilityListener can be notified when new {@link org.apache.flink.runtime.instance.AllocatedSlot2}s become available --- End diff -- You are right. Good catch. I'll correct it.
[GitHub] flink pull request: [FLINK-1147][Java API] TypeInference on POJOs
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/315
[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...
Github user rmetzger closed the pull request at: https://github.com/apache/flink/pull/323
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292046#comment-14292046 ] ASF GitHub Bot commented on FLINK-1201: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71491484 No worries, I can rename the tests. It's better to be consistent :) May I ask, what is the difference between a XyzTest and a XyzITCase test though? Thnx! Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291921#comment-14291921 ] Mingliang Qi commented on FLINK-1430: - I saw some methods like setConfig, getConfig, getStreamGraph in StreamExecutionEnvironment, which are included in the batch scala api but not in stream scala api. Should we just exclude these in the test? Another one missing is printToErr in DataStream class. Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Mingliang Qi Currently the completeness of the streaming scala api is not tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
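The completeness test discussed above boils down to diffing the public methods of two API classes via reflection, with an exclusion list for methods (such as setConfig, getConfig, getStreamGraph) that are intentionally not mirrored. A minimal sketch of that idea, under assumptions (this is not Flink's actual completeness test; `ApiCompleteness` and the toy classes are illustrative):

```java
import java.util.Collections;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: list the public method names of a "reference" API
// class that are missing from a "target" class, minus an explicit exclusion
// list for methods that are deliberately not part of the target API.
public class ApiCompleteness {

    public static Set<String> missingMethods(Class<?> reference, Class<?> target,
                                             Set<String> excluded) {
        Set<String> referenceNames = new TreeSet<>();
        for (Method m : reference.getMethods()) {
            referenceNames.add(m.getName());
        }
        // Anything the target also has (including Object's methods) drops out.
        for (Method m : target.getMethods()) {
            referenceNames.remove(m.getName());
        }
        referenceNames.removeAll(excluded);
        return referenceNames;
    }

    // Two toy "APIs" standing in for the batch and streaming Scala APIs.
    public static class BatchApi { public void map() {} public void printToErr() {} }
    public static class StreamApi { public void map() {} }

    public static void main(String[] args) {
        Set<String> missing = missingMethods(BatchApi.class, StreamApi.class,
                Collections.<String>emptySet());
        System.out.println("Missing from StreamApi: " + missing); // prints [printToErr]
    }
}
```

Methods such as getStreamGraph would then simply be added to the exclusion set rather than flagged as missing, which matches the "should we just exclude these" suggestion in the comment.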
[jira] [Assigned] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Michels reassigned FLINK-1442: -- Assignee: Max Michels Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels The JobManager archives the execution graphs, for analysis of jobs. The graphs may consume a lot of memory. Especially the execution edges in all2all connection patterns are extremely many and add up in memory consumption. The execution edges connect all parallel tasks. So for an all2all pattern between n and m tasks, there are n*m edges. For parallelism of multiple 100 tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (majority of the memory consumers) when it is given to the archiver. 2. Have the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History server used by Hadoop and Hive would be a good idea. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
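Point 2 of the proposal relies on the JVM's `SoftReference` semantics: softly reachable objects are guaranteed to be cleared before an OutOfMemoryError is thrown. A minimal sketch under assumptions (not Flink's implementation; `GraphArchive` is a hypothetical name, and a plain String stands in for the execution graph):

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of point 2 of the proposal: keep archived execution graphs behind
// SoftReferences, so the GC may reclaim them under memory pressure instead
// of the JVM crashing with an OutOfMemoryError.
public class GraphArchive {

    private final Map<String, SoftReference<String>> graphs = new LinkedHashMap<>();

    public void archive(String jobId, String graph) {
        // Point 1 of the proposal would clear the execution edges here,
        // before the graph is handed to the archiver.
        graphs.put(jobId, new SoftReference<>(graph));
    }

    /** Returns the archived graph, or null if the GC already reclaimed it. */
    public String get(String jobId) {
        SoftReference<String> ref = graphs.get(jobId);
        return (ref == null) ? null : ref.get();
    }

    public static void main(String[] args) {
        GraphArchive archive = new GraphArchive();
        archive.archive("job-1", "graph-of-job-1");
        // Under no memory pressure the soft reference is still intact.
        System.out.println(archive.get("job-1")); // prints graph-of-job-1
    }
}
```

Callers of the history view must then tolerate a null result, which is exactly the "graphs may disappear from the history early" trade-off the issue describes.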
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/339#discussion_r23541551 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java --- @@ -72,7 +72,7 @@ * @return copy task */ public FutureTask<Path> createTmpFile(String name, DistributedCacheEntry entry, JobID jobID) { - synchronized (count) { + synchronized (lock) { --- End diff -- How does the static lock solve the problem?
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292011#comment-14292011 ] ASF GitHub Bot commented on FLINK-1419: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/339#discussion_r23541551 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java --- @@ -72,7 +72,7 @@ * @return copy task */ public FutureTask<Path> createTmpFile(String name, DistributedCacheEntry entry, JobID jobID) { - synchronized (count) { + synchronized (lock) { --- End diff -- How does the static lock solve the problem? DistributedCache doesn't preserve files for subsequent operations -- Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler When subsequent operations want to access the same files in the DC it frequently happens that the files are not created for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Both things don't happen. Increasing the time window had no effect. I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory 2. are deleted during the job. One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally I'm not sure if this is a problem. Changing it to a task-specific place solved the issue though. I'm more concerned about Trait #2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) 
If we moved the deletion to the end of the job it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else. Also, the current implementation implies that big files may be copied multiple times. This may be undesired, depending on how big the files are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
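The synchronization question in the review above is whether copy and delete coordinate on the same monitor, and whether the delete itself happens inside the critical section. A simplified sketch of that pattern, under assumptions (this is not the actual FileCache code; the class, method bodies, and reference-count map are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the pattern under discussion: copy and delete both
// take the same shared lock, and the delete happens *inside* the
// synchronized block, so a deletion can never interleave with a
// reference-count check made by a concurrent copy.
public class FileCacheSketch {

    private static final Object lock = new Object();
    private final Map<String, Integer> count = new HashMap<>();

    public void createTmpFile(String name) {
        synchronized (lock) {
            count.merge(name, 1, Integer::sum);
            // ... trigger the actual copy of the cached file here ...
        }
    }

    public void deleteTmpFile(String name) {
        synchronized (lock) {
            Integer c = count.get(name);
            if (c == null) {
                return;
            }
            if (c <= 1) {
                count.remove(name);
                // Performing the file deletion here, inside the lock, closes
                // the race in which a new createTmpFile() slips in between
                // the count check and the deletion.
            } else {
                count.put(name, c - 1);
            }
        }
    }

    public boolean isCached(String name) {
        synchronized (lock) {
            return count.containsKey(name);
        }
    }
}
```

If only the count map access is guarded but the delete runs outside the block, as the follow-up comment points out, the race remains; extending the critical section over the delete itself is what makes the shared lock effective.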
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292017#comment-14292017 ] ASF GitHub Bot commented on FLINK-1395: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/304#issuecomment-71487780 I'm going to merge the changes in this pull request into a custom branch. I'll open a new pull request (FLINK-1417) containing the commit from this PR. @aljoscha: Can you close this PR? Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Sub-task Reporter: Robert Metzger Assignee: Robert Metzger -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71476676 I updated the PR with the exponential backoff registration strategy. On the way, I fixed the flaky RecoveryITCase.
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291940#comment-14291940 ] ASF GitHub Bot commented on FLINK-1352: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71476676 I updated the PR with the exponential backoff registration strategy. On the way, I fixed the flaky RecoveryITCase. Buggy registration from TaskManager to JobManager - Key: FLINK-1352 URL: https://issues.apache.org/jira/browse/FLINK-1352 Project: Flink Issue Type: Bug Components: JobManager, TaskManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Till Rohrmann Fix For: 0.9 The JobManager's InstanceManager may refuse the registration attempt from a TaskManager, because it has this TaskManager already connected, or, in the future, because the TaskManager has been blacklisted as unreliable. Upon refused registration, the instance ID is null to signal the refusal. The TaskManager reacts incorrectly to such messages, assuming successful registration. Possible solution: the JobManager sends back a dedicated RegistrationRefused message if the instance manager returns null as the registration result. If the TaskManager receives that before being registered, it knows that the registration response was lost (which should not happen on TCP and would indicate a corrupt connection). Follow-up question: does it make sense to have the TaskManager retry indefinitely to connect to the JobManager, with increasing intervals (from seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
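The "increasing interval (from seconds to minutes)" idea is the exponential backoff strategy mentioned in the PR comment. A minimal sketch of the delay schedule, under assumptions (this is not Flink's implementation; the class name and the 500 ms / 30 s bounds are illustrative):

```java
// Illustrative sketch of an exponential backoff registration schedule:
// the delay before each retry doubles per attempt, capped at a maximum,
// until the JobManager accepts the registration (or sends a dedicated
// RegistrationRefused message, per the proposed solution above).
public class RegistrationBackoff {

    static final long INITIAL_DELAY_MS = 500;
    static final long MAX_DELAY_MS = 30_000;

    /** Delay before retry number `attempt` (0-based), doubling up to the cap. */
    public static long delayMillis(int attempt) {
        long delay = INITIAL_DELAY_MS;
        for (int i = 0; i < attempt && delay < MAX_DELAY_MS; i++) {
            delay *= 2;
        }
        return Math.min(delay, MAX_DELAY_MS);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + delayMillis(attempt) + " ms");
        }
    }
}
```

Capping the delay means the TaskManager can indeed retry indefinitely without the interval growing without bound, which answers the follow-up question in the issue.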
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/319#discussion_r23537642 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroupAssignment.java --- @@ -42,86 +44,82 @@ public class SlotSharingGroupAssignment implements Serializable { static final long serialVersionUID = 42L; private static final Logger LOG = Scheduler.LOG; private transient final Object lock = new Object(); /** All slots currently allocated to this sharing group */ private final Set<SharedSlot> allSlots = new LinkedHashSet<SharedSlot>(); /** The slots available per vertex type (jid), keyed by instance, to make them locatable */ private final Map<AbstractID, Map<Instance, List<SharedSlot>>> availableSlotsPerJid = new LinkedHashMap<AbstractID, Map<Instance, List<SharedSlot>>>(); - public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex) { - JobVertexID id = vertex.getJobvertexId(); - return addNewSlotWithTask(slot, id, id); - } - public SubSlot addNewSlotWithTask(AllocatedSlot slot, ExecutionVertex vertex, CoLocationConstraint constraint) { - AbstractID groupId = constraint.getGroupId(); - return addNewSlotWithTask(slot, groupId, null); - } - private SubSlot addNewSlotWithTask(AllocatedSlot slot, AbstractID groupId, JobVertexID vertexId) { - final SharedSlot sharedSlot = new SharedSlot(slot, this); - final Instance location = slot.getInstance(); + public SimpleSlot addSharedSlotAndAllocateSubSlot(SharedSlot sharedSlot, Locality locality, + AbstractID groupId, CoLocationConstraint constraint) { --- End diff -- I must have fallen asleep on the space button.
[jira] [Commented] (FLINK-1392) Serializing Protobuf - issue 1
[ https://issues.apache.org/jira/browse/FLINK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291996#comment-14291996 ] ASF GitHub Bot commented on FLINK-1392: --- Github user rmetzger closed the pull request at: https://github.com/apache/flink/pull/322 Serializing Protobuf - issue 1 -- Key: FLINK-1392 URL: https://issues.apache.org/jira/browse/FLINK-1392 Project: Flink Issue Type: Sub-task Reporter: Felix Neutatz Assignee: Robert Metzger Priority: Minor Hi, I started to experiment with Parquet using Protobuf. When I use the standard Protobuf class: com.twitter.data.proto.tutorial.AddressBookProtos The code which I run, can be found here: [https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java] I get the following exception: {code:xml} Exception in thread main java.lang.Exception: Deserializing the InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: Could not read the user code wrapper: Error while deserializing element from collection at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172) at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could not read the user code wrapper: Error while deserializing element from collection at org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285) at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57) ... 
25 more Caused by: java.io.IOException: Error while deserializing element from collection at org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292036#comment-14292036 ] ASF GitHub Bot commented on FLINK-1419: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71490079 Oh, I see what you mean: maybe extend the synchronized block to include the actual delete. Yup, that's a good idea. All I know is that I tried it without the change and ran into issues; with the change it ran. DistributedCache doesn't preserve files for subsequent operations -- Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler When subsequent operations want to access the same files in the DC it frequently happens that the files are not created for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Both things don't happen. Increasing the time window had no effect. I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory 2. are deleted during the job. One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally I'm not sure if this is a problem. Changing it to a task-specific place solved the issue though. I'm more concerned about Trait #2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) 
If we moved the deletion to the end of the job it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else. Also, the current implementation implies that big files may be copied multiple times. This may be undesired, depending on how big the files are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292038#comment-14292038 ] ASF GitHub Bot commented on FLINK-1419: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71490264 Yes, but only the access to the count hash map. The delete action itself is not synchronized. DistributedCache doesn't preserve files for subsequent operations -- Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler When subsequent operations want to access the same files in the DC it frequently happens that the files are not created for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Both things don't happen. Increasing the time window had no effect. I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory 2. are deleted during the job. One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally I'm not sure if this is a problem. Changing it to a task-specific place solved the issue though. I'm more concerned about Trait #2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) If we moved the deletion to the end of the job it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else. 
Also, the current implementation implies that big files may be copied multiple times. This may be undesired, depending on how big the files are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71490264 Yes, but only the access to the count hash map. The delete action itself is not synchronized.
[jira] [Assigned] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Qi reassigned FLINK-1430: --- Assignee: Mingliang Qi Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Mingliang Qi Currently the completeness of the streaming scala api is not tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method
[ https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291934#comment-14291934 ] ASF GitHub Bot commented on FLINK-1437: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/342#issuecomment-71475896 This will probably conflict with https://github.com/apache/flink/pull/316. Bug in PojoSerializer's copy() method - Key: FLINK-1437 URL: https://issues.apache.org/jira/browse/FLINK-1437 Project: Flink Issue Type: Bug Components: Java API Reporter: Timo Walther Assignee: Timo Walther The PojoSerializer's {{copy()}} method does not work properly with {{null}} values. An exception could look like: {code} Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: null at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792) Caused by: java.io.EOFException at org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277) at org.apache.flink.types.StringValue.copyString(StringValue.java:839) at org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261) at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449) at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303) at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788) {code} 
I'm working on a fix for that... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
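The class of bug described here is a field-wise copy() that assumes every field was written: when a field is null, nothing was serialized for it, and a blind read hits EOF. A minimal sketch of the null-safe pattern, under assumptions (this is not the actual PojoSerializer fix; `NullSafeCopy` and `FieldCopier` are illustrative stand-ins for the serializer interfaces):

```java
// Sketch of the null-handling pattern for a field-wise copy(): check each
// field for null before delegating to the field serializer, otherwise the
// copy tries to read data that was never written (EOFException above).
public class NullSafeCopy {

    public interface FieldCopier<T> { T copy(T value); }

    /** Null-safe wrapper around a per-field copier. */
    public static <T> T copyField(FieldCopier<T> copier, T value) {
        return (value == null) ? null : copier.copy(value);
    }

    public static void main(String[] args) {
        FieldCopier<String> stringCopier = s -> new String(s);
        System.out.println(copyField(stringCopier, "hello")); // prints hello
        System.out.println(copyField(stringCopier, null));    // prints null
    }
}
```

The same guard is needed on both the write path (record a null marker) and the read path (skip the field when the marker is set), so that copy() and serialize()/deserialize() stay symmetric.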
[jira] [Commented] (FLINK-1454) CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager
[ https://issues.apache.org/jira/browse/FLINK-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291941#comment-14291941 ] Mingliang Qi commented on FLINK-1454: - isn't it because akka timeout is set to 100s? CliFrontend blocks for 100 seconds when submitting to a non-existent JobManager --- Key: FLINK-1454 URL: https://issues.apache.org/jira/browse/FLINK-1454 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger When a user tries to submit a job to a job manager which doesn't exist at all, the CliFrontend blocks for 100 seconds. Ideally, Akka would fail because it can not connect to the given hostname:port. {code} ./bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar -c foo.Baz org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:449) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:350) at org.apache.flink.client.program.Client.run(Client.java:242) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:389) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:362) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1078) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1102) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [100 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265) at 
org.apache.flink.runtime.client.JobClient$.uploadJarFiles(JobClient.scala:169) at org.apache.flink.runtime.client.JobClient.uploadJarFiles(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:314) at org.apache.flink.client.program.Client.run(Client.java:296) at org.apache.flink.client.program.Client.run(Client.java:290) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:434) ... 6 more The exception above occurred while trying to run your command. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
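Rather than letting the client block on the 100-second Akka future, one way to fail fast is to probe the JobManager address with a short TCP connect timeout before submitting. A minimal sketch of the idea; the class name and the probe step are illustrative assumptions, not part of Flink's actual CliFrontend:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class JobManagerProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused or timed out: fail within the short timeout
        }
    }

    public static void main(String[] args) throws IOException {
        // Probe a locally opened port (reachable) and a closed one (not reachable).
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println(isReachable("localhost", server.getLocalPort(), 500));
        }
        System.out.println(isReachable("localhost", 1, 500));
    }
}
```

A probe like this turns the non-existent-JobManager case into an immediate, user-readable error instead of a 100-second `TimeoutException`.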
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292032#comment-14292032 ] ASF GitHub Bot commented on FLINK-1419: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71489135 But that is exactly what is changing: both the delete and the copy process are synchronized on the same object. DistributedCache doesn't preserve files for subsequent operations -- Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler When subsequent operations want to access the same files in the DC, it frequently happens that the files are not created for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Neither happens, and increasing the time window had no effect. I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory 2. files are deleted during the job. One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally I'm not sure if this is a problem; changing it to a task-specific place solved the issue though. I'm more concerned about Trait 2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) If we moved the deletion to the end of the job, it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else.
Also, the current implementation implies that big files may be copied multiple times. This may be undesirable, depending on how big the files are.
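The fix discussed in the comments, synchronizing the copy and delete paths on one shared lock, can be sketched roughly like this. The class and method names are illustrative, not the actual DistributedCache code, and the copy step is reduced to creating the target file:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative sketch (not the actual DistributedCache code): the copy and the
// delete path take the same lock, so a delete can never race with a copy.
public class FileCacheSketch {

    private final Object lock = new Object();

    /** Stand-in for the copy step: materializes the cached file under the lock. */
    public File createFile(String name, File cacheDir) throws IOException {
        synchronized (lock) {
            File target = new File(cacheDir, name);
            target.createNewFile();
            return target;
        }
    }

    /** Delete step, guarded by the same lock as the copy step. */
    public void deleteFile(String name, File cacheDir) {
        synchronized (lock) {
            File target = new File(cacheDir, name);
            if (target.exists()) {
                target.delete();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        FileCacheSketch cache = new FileCacheSketch();
        File dir = Files.createTempDirectory("dc-sketch").toFile();
        File f = cache.createFile("part-0", dir);
        System.out.println(f.exists());
        cache.deleteFile("part-0", dir);
        System.out.println(f.exists());
    }
}
```

With both paths behind the same monitor, the interleaving where the cleanup thread deletes a file while a following operation is materializing it cannot occur.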
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71492121 Sure, XyzTest are unit tests which are executed in Maven's test phase. These should execute rather fast. Everything that brings up a full Flink system is an integration test case (XyzITCase) and executed during mvn verify. These are used for long running tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
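The naming convention fhueske describes is typically wired up through Maven's Surefire plugin (test phase) and Failsafe plugin (verify phase). A hedged pom.xml sketch of the include patterns; this is the common Maven idiom, not necessarily Flink's exact build configuration:

```xml
<!-- Unit tests (XyzTest) run fast, in the test phase, via Surefire. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <include>**/*Test.java</include>
    </includes>
    <excludes>
      <exclude>**/*ITCase.java</exclude>
    </excludes>
  </configuration>
</plugin>
<!-- Integration tests (XyzITCase) bring up a full system and run during mvn verify via Failsafe. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <configuration>
    <includes>
      <include>**/*ITCase.java</include>
    </includes>
  </configuration>
</plugin>
```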
[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method
[ https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291931#comment-14291931 ] ASF GitHub Bot commented on FLINK-1437: --- GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/342 [FLINK-1437][Java API] Fixes copy() methods in PojoSerializer for null values. See description in FLINK-1437. PR includes test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink PojoCopyFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/342.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #342 commit f6917765fb599de74ab89580f22feb2096ca946c Author: twalthr i...@twalthr.com Date: 2015-01-26T15:09:24Z [FLINK-1437][Java API] Fixes copy() methods in PojoSerializer for null values Bug in PojoSerializer's copy() method - Key: FLINK-1437 URL: https://issues.apache.org/jira/browse/FLINK-1437 Project: Flink Issue Type: Bug Components: Java API Reporter: Timo Walther Assignee: Timo Walther The PojoSerializer's {{copy()}} method does not work properly with {{null}} values.
An exception could look like: {code} Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: null at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792) Caused by: java.io.EOFException at org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270) at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277) at org.apache.flink.types.StringValue.copyString(StringValue.java:839) at org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261) at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449) at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303) at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788) {code} I'm working on a fix for that... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
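A common pattern for making a stream-based copy() null-safe is to prefix each possibly-null field with a marker flag. A hedged sketch of that pattern, not the actual PojoSerializer code:

```java
import java.io.*;

// Sketch: copy a possibly-null String through a stream by writing a null
// marker first, so the reading side never hits an unexpected EOF.
public class NullSafeCopy {

    public static void copy(String value, DataOutputStream out) throws IOException {
        if (value == null) {
            out.writeBoolean(true);   // null marker, nothing else written
        } else {
            out.writeBoolean(false);
            out.writeUTF(value);
        }
    }

    public static String read(DataInputStream in) throws IOException {
        if (in.readBoolean()) {
            return null;
        }
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        copy(null, out);
        copy("flink", out);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(read(in)); // null
        System.out.println(read(in)); // flink
    }
}
```

Without the flag, a null field writes nothing while the reader still expects bytes, which is exactly the kind of mismatch that surfaces as the `EOFException` in the spilling thread above.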
[GitHub] flink pull request: [FLINK-1437][Java API] Fixes copy() methods in...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/342#issuecomment-71475896 This will probably conflict with https://github.com/apache/flink/pull/316.
[jira] [Resolved] (FLINK-1147) TypeInference on POJOs
[ https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther resolved FLINK-1147. - Resolution: Fixed Fix Version/s: 0.9 Fixed in commit 6067833fb6ad6c11a121d8654d7ca147cc909f05 TypeInference on POJOs -- Key: FLINK-1147 URL: https://issues.apache.org/jira/browse/FLINK-1147 Project: Flink Issue Type: Improvement Components: Java API Affects Versions: 0.7.0-incubating Reporter: Stephan Ewen Assignee: Timo Walther Fix For: 0.9 On Tuples, we currently use type inference that figures out the types of output type variables relative to the input type variable. We need a similar functionality for POJOs.
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292040#comment-14292040 ] ASF GitHub Bot commented on FLINK-1395: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/304 Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Sub-task Reporter: Robert Metzger Assignee: Robert Metzger
[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/304
[GitHub] flink pull request: [FLINK-1392] Add Kryo serializer for Protobuf
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/322#issuecomment-71484526 I will make this change part of a new pull request for https://issues.apache.org/jira/browse/FLINK-1417.
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71489135 But that is exactly what is changing: both the delete and the copy process are synchronized on the same object.
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71490079 Oh, I see what you mean. Maybe extend the synchronized block to include the actual delete logic? Yes, that's a good idea. All I know is that I tried it without the change and ran into issues; with the change it ran.
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291774#comment-14291774 ] ASF GitHub Bot commented on FLINK-1330: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71454018 Very nice. +1 Will merge this later. Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start up and submissions scripts in bin assemble the class path - in case the project hasn't been build yet, inform the user The changes would make it easier to work with Flink from source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/264#issuecomment-71455623 Updated the PR and will merge once Travis has completed the build.
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291800#comment-14291800 ] ASF GitHub Bot commented on FLINK-1415: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-71458216 I think @StephanEwen is reviewing the critical part. Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-512) Add support for Tachyon File System to Flink
[ https://issues.apache.org/jira/browse/FLINK-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-512. -- Resolution: Fixed Has been resolved by FLINK-1266 (at least to the scope of this issue) Add support for Tachyon File System to Flink Key: FLINK-512 URL: https://issues.apache.org/jira/browse/FLINK-512 Project: Flink Issue Type: New Feature Reporter: Chesnay Schepler Assignee: Robert Metzger Priority: Minor Labels: github-import Fix For: pre-apache Attachments: pull-request-512-9112930359400451481.patch Implementation of the Tachyon file system. The code was tested using junit and the wordcount example on a tachyon file system running in local mode. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/pull/512 Created by: [zentol|https://github.com/zentol] Labels: Created at: Wed Feb 26 13:58:24 CET 2014 State: closed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71469736 Oh, actually, that should work because the configuration explicitly binds the plugin to the clean phase.
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291891#comment-14291891 ] ASF GitHub Bot commented on FLINK-1330: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71469736 Oh, actually, that should work because the configuration explicitly binds the plugin to the clean phase. Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout with the following: - provide the bin folder in the root by default - let the start up and submissions scripts in bin assemble the class path - in case the project hasn't been build yet, inform the user The changes would make it easier to work with Flink from source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/344 [FLINK-1442] Reduce memory consumption of archived execution graph You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink flink-1442 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/344.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #344 commit baa04e386b70d3c928ceb07e78e50016f20520f0 Author: Max m...@posteo.de Date: 2015-01-26T18:31:47Z [FLINK-1442] Reduce memory consumption of archived execution graph
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292251#comment-14292251 ] ASF GitHub Bot commented on FLINK-1442: --- GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/344 [FLINK-1442] Reduce memory consumption of archived execution graph You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink flink-1442 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/344.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #344 commit baa04e386b70d3c928ceb07e78e50016f20520f0 Author: Max m...@posteo.de Date: 2015-01-26T18:31:47Z [FLINK-1442] Reduce memory consumption of archived execution graph Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially the execution edges in all2all connection patterns are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks. So for an all2all pattern between n and m tasks, there are n*m edges. For parallelism of several hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes.
That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
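Point 2 of the proposal, keeping archived graphs behind soft references so the GC can reclaim them under memory pressure, can be sketched like this. SoftArchive is an illustrative name, not the actual JobManager archiver:

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep archived values behind SoftReferences so the GC may drop them
// under memory pressure instead of letting the JVM run out of heap.
public class SoftArchive<K, V> {

    private final Map<K, SoftReference<V>> archive = new LinkedHashMap<>();

    public void put(K key, V value) {
        archive.put(key, new SoftReference<>(value));
    }

    /** Returns the archived value, or null if it was never stored or already collected. */
    public V get(K key) {
        SoftReference<V> ref = archive.get(key);
        return ref == null ? null : ref.get();
    }

    public static void main(String[] args) {
        SoftArchive<Integer, String> graphs = new SoftArchive<>();
        graphs.put(1, "graph-1");
        System.out.println(graphs.get(1));
        System.out.println(graphs.get(2));
    }
}
```

Note that the JVM clears soft references in an arbitrary order, so a caller must always be prepared for get() to return null for a graph that was archived earlier.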
[jira] [Created] (FLINK-1456) Projection to fieldnames, keyselectors
Márton Balassi created FLINK-1456: - Summary: Projection to fieldnames, keyselectors Key: FLINK-1456 URL: https://issues.apache.org/jira/browse/FLINK-1456 Project: Flink Issue Type: New Feature Components: Java API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi The projection operator of both the batch and the streaming APIs only supports projections with field positions as parameters. I'd like to extend this functionality with projection to field names and key selectors, providing a similar API to what groupBy already has.
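To illustrate the direction, a key-selector-style projection over plain Java collections might look like this; ProjectionSketch and its project method are hypothetical, not the proposed Flink API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: a projection that accepts a key-selector-style function
// instead of a field position.
public class ProjectionSketch {

    public static <T, K> List<K> project(List<T> data, Function<T, K> selector) {
        return data.stream().map(selector).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("flink", "storm");
        System.out.println(project(words, w -> w.length())); // prints [5, 5]
    }
}
```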
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292076#comment-14292076 ] ASF GitHub Bot commented on FLINK-1201: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71494713 Alright, thanks @fhueske! So, it seems all of our tests are integration test cases. I will update later today I hope :) Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71494058 Yeah, that would probably solve the problem. Race conditions are often very tricky: sometimes little changes alter the process interleaving such that the problem seems to be fixed.
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292167#comment-14292167 ] ASF GitHub Bot commented on FLINK-1201: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71507264 Tests renamed :) Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1415) Akka cleanups
[ https://issues.apache.org/jira/browse/FLINK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292134#comment-14292134 ] ASF GitHub Bot commented on FLINK-1415: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-71501654 Do you think this change is also addressing this error? https://travis-ci.org/rmetzger/flink/jobs/48365538 Akka cleanups - Key: FLINK-1415 URL: https://issues.apache.org/jira/browse/FLINK-1415 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Currently, Akka has many different timeout values. From a user perspective, it would be helpful to deduce all different timeouts from a single timeout value. Additionally, the user should still be able to define specific values for the different timeouts. Akka uses the akka.jobmanager.url config parameter to override the jobmanager address and the port in case of a local setup. This mechanism is not safe since it is exposed to the user. Thus, the mechanism should be replaced. The notifyExecutionStateChange method allows objects to access the internal state of the TaskManager actor. This causes NullPointerExceptions when shutting down the actor. This method should be removed to avoid accessing the internal state of an actor by another object. With the latest Akka changes, the TaskManager watches the JobManager in order to detect when it died or lost the connection to the TaskManager. This behaviour should be tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-71507264 Tests renamed :)
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292498#comment-14292498 ] ASF GitHub Bot commented on FLINK-1442: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-71548964 Looks good so far. I see that you removed the LRU code. Was that on purpose? Leaving it in may be a good idea, because the soft references are cleared in arbitrary order. It may make newer jobs disappear before older ones. Having the LRU in would mean things behave as previously as long as the memory is sufficient, and the soft reference clearing kicks in as a safety valve. Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially the execution edges in all2all connection patterns are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks. So for an all2all pattern between n and m tasks, there are n*m edges. For parallelism of several hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-71548964 Looks good so far. I see that you removed the LRU code. Was that on purpose? Leaving it in may be a good idea, because the soft references are cleared in arbitrary order. It may make newer jobs disappear before older ones. Having the LRU in would mean things behave as previously as long as the memory is sufficient, and the soft reference clearing kicks in as a safety valve.
[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/337#issuecomment-71548405 Okay, if it is an auxiliary classpath then it should be fine.
[GitHub] flink pull request: [Discuss] Simplify SplittableIterator interfac...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/338#issuecomment-71595710 I think in the current state, this makes sense. I wrote the interface the way it was because it would enable implementations that do not compute/provide all splits on all machines. Think of a collection input format where we send a different subset of the collection to each node (not supported in the runtime now, but might be at some point). I guess we can realize something similar by splitting on the client and then sending the subset iterators directly.
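The client-side alternative mentioned at the end (splitting once on the client and shipping each node only its own subset) could be sketched like this; the helper is illustrative and not the actual SplittableIterator interface:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split the data once on the client into n round-robin subsets,
// so each node can be sent only its own part instead of the full collection.
public class ClientSideSplit {

    public static <T> List<List<T>> split(List<T> data, int numSplits) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < data.size(); i++) {
            parts.get(i % numSplits).add(data.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 3, 5], [2, 4]]
    }
}
```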
[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/311#issuecomment-71598228 +1 from my side
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71598417 Okay, let's add an exclude for the linked target directory and update `basedir` to `project.basedir`. Will that do?
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293071#comment-14293071 ] ASF GitHub Bot commented on FLINK-1330: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71598417 Okay, let's add an exclude for the linked target directory and update `basedir` to `project.basedir`. Will that do? Restructure directory layout Key: FLINK-1330 URL: https://issues.apache.org/jira/browse/FLINK-1330 Project: Flink Issue Type: Improvement Components: Build System, Documentation Reporter: Max Michels Priority: Minor Labels: usability When building Flink, the build results can currently be found under flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/. I think we could improve the directory layout as follows:
- provide the bin folder in the root by default
- let the startup and submission scripts in bin assemble the class path
- inform the user in case the project hasn't been built yet
The changes would make it easier to work with Flink from source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
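The exclude and property change discussed in the comment might look roughly like the pom.xml fragment below. This is a hypothetical sketch: the choice of the Apache RAT plugin and the `build-target` link name are assumptions for illustration; only the switch from `${basedir}` to `${project.basedir}` is what the comment actually names.

```xml
<!-- Hypothetical sketch: exclude the symlinked target directory from the
     license check and use the project-scoped basedir property. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- assumed name of the link created in the project root -->
      <exclude>${project.basedir}/build-target/**</exclude>
    </excludes>
  </configuration>
</plugin>
```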
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292761#comment-14292761 ] ASF GitHub Bot commented on FLINK-377: -- Github user dan-blanchard commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71570805 I've only recently started looking at Flink, and the lack of support for non-JVM languages was a bit of a showstopper for me. That's one of the main reasons we use Storm. Anyway, is the idea here that this will just be for Python? Will it be simple for third parties to add support for other languages? Create a general purpose framework for language bindings Key: FLINK-377 URL: https://issues.apache.org/jira/browse/FLINK-377 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache A general purpose API to run operators with arbitrary binaries. This will allow running Stratosphere programs written in Python, JavaScript, Ruby, Go, or whatever you like. We suggest using Google Protocol Buffers for data serialization. This is the list of languages that currently support ProtoBuf: https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns Very early prototype with python: https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing protobuf) For Ruby: https://github.com/infochimps-labs/wukong Two new students working at Stratosphere (@skunert and @filiphaase) are working on this. The reference binding language will be Python, but other bindings are very welcome. The best name for this so far is stratosphere-lang-bindings. 
I created this issue to track the progress (and give everybody a chance to comment on this) Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/377 Created by: [rmetzger|https://github.com/rmetzger] Labels: enhancement, Assignee: [filiphaase|https://github.com/filiphaase] Created at: Tue Jan 07 19:47:20 CET 2014 State: open
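At its core, the "run operators with arbitrary binaries" idea amounts to piping records through an external process and reading its output back. The sketch below is a minimal, hedged illustration in plain Java (no Flink or Protocol Buffers involved); `BinaryOperatorBridge` is a made-up name, and `cat` serves as a stand-in identity operator for a UDF written in another language.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BinaryOperatorBridge {

    // Pipe newline-delimited records through an external binary and
    // collect its output (hypothetical sketch; a real framework would
    // use a binary wire format such as protobuf instead of text lines).
    public static String runThrough(String binary, String input) {
        try {
            Process p = new ProcessBuilder(binary).start();
            try (OutputStream out = p.getOutputStream()) {
                out.write(input.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            p.waitFor();
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // `cat` echoes the records back unchanged, standing in for a
        // foreign-language identity map operator.
        System.out.print(runThrough("cat", "a\nb\n"));
    }
}
```

A language binding would replace `cat` with, say, a Python interpreter running the user's UDF, and the text lines with a serialized record stream.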