[jira] [Commented] (FLINK-1682) Port Record-API based optimizer tests to new Java API
[ https://issues.apache.org/jira/browse/FLINK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513714#comment-14513714 ] ASF GitHub Bot commented on FLINK-1682: --- GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/627 [FLINK-1682] Ported optimizer unit tests from Record API to Java API This is a step towards removing the deprecated Record API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink recordOptimizerTests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/627.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #627 commit 303e86cc7af2604c3d9c781bd08a656daf6b99ae Author: Fabian Hueske fhue...@apache.org Date: 2015-04-24T23:30:11Z [FLINK-1682] Ported optimizer unit tests from Record API to Java API Port Record-API based optimizer tests to new Java API - Key: FLINK-1682 URL: https://issues.apache.org/jira/browse/FLINK-1682 Project: Flink Issue Type: Sub-task Components: Optimizer Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513870#comment-14513870 ] ASF GitHub Bot commented on FLINK-1789: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96599424 Yes, user-facing remains strings which are file paths. Allow adding of URLs to the usercode class loader - Key: FLINK-1789 URL: https://issues.apache.org/jira/browse/FLINK-1789 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Timo Walther Assignee: Timo Walther Priority: Minor Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepted valid classpath URLs and forwarded them to BlobLibraryCacheManager.
[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96599424 Yes, user-facing remains strings which are file paths. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-1947) Make Avro and Tachyon test logging less verbose
Till Rohrmann created FLINK-1947: Summary: Make Avro and Tachyon test logging less verbose Key: FLINK-1947 URL: https://issues.apache.org/jira/browse/FLINK-1947 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Priority: Minor Currently, the {{AvroExternalJarProgramITCase}} and the Tachyon test cases write the cluster status messages to stdout. I think these messages are not needed and only clutter the test output. Therefore, we should maybe suppress these messages.
[GitHub] flink pull request: [FLINK-1938] Add Grunt for building the front-...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/623#issuecomment-96562609 You are definitely welcome to participate in the new web frontend development. Have a look at the branch I sent you and see if you find your way around the new code (it is not too much so far, so it is okay, hopefully). I am not familiar with Grunt. It says it is a JavaScript automation framework. What is it for, in the context of the web frontend?
[GitHub] flink pull request: [FLINK-1923] Replaces asynchronous logging wit...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/628 [FLINK-1923] Replaces asynchronous logging with synchronous logging in actors Replaces asynchronous logging with synchronous logging in actors. Additionally, all Scala implementations are now using the grizzled-slf4j SLF4J wrapper, which allows proper usage of SLF4J within Scala code. One problem the grizzled-slf4j wrapper fixes is the ambiguity between varargs and a string with two placeholders. Additionally, it resolves the ambiguity between the (String, String, Object) overloads where the first String is used as a Marker versus as the logging message. Grizzled-slf4j automatically adds logging guards, which makes explicit log-level checks redundant. Grizzled-slf4j's license is BSD. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixLogging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/628.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #628 commit 34c9fd54448b340505c97679299b8876f0257828 Author: Till Rohrmann trohrm...@apache.org Date: 2015-04-24T14:33:34Z [FLINK-1923] [runtime] Replaces asynchronous logging with synchronous logging using grizzled-slf4j wrapper for Scala.
[jira] [Commented] (FLINK-1923) Replace asynchronous logging in JobManager with regular slf4j logging
[ https://issues.apache.org/jira/browse/FLINK-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513746#comment-14513746 ] ASF GitHub Bot commented on FLINK-1923: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/628 [FLINK-1923] Replaces asynchronous logging with synchronous logging in actors Replaces asynchronous logging with synchronous logging in actors. Additionally, all Scala implementations are now using the grizzled-slf4j SLF4J wrapper, which allows proper usage of SLF4J within Scala code. One problem the grizzled-slf4j wrapper fixes is the ambiguity between varargs and a string with two placeholders. Additionally, it resolves the ambiguity between the (String, String, Object) overloads where the first String is used as a Marker versus as the logging message. Grizzled-slf4j automatically adds logging guards, which makes explicit log-level checks redundant. Grizzled-slf4j's license is BSD. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixLogging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/628.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #628 commit 34c9fd54448b340505c97679299b8876f0257828 Author: Till Rohrmann trohrm...@apache.org Date: 2015-04-24T14:33:34Z [FLINK-1923] [runtime] Replaces asynchronous logging with synchronous logging using grizzled-slf4j wrapper for Scala.
Replace asynchronous logging in JobManager with regular slf4j logging - Key: FLINK-1923 URL: https://issues.apache.org/jira/browse/FLINK-1923 Project: Flink Issue Type: Task Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Till Rohrmann It's hard to understand exactly what's going on in the JobManager because the log messages are sent asynchronously by a logging actor.
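The logging-guard point in the pull request above can be illustrated in plain Java. This is a hedged sketch of the general pattern, not Flink or grizzled-slf4j code; `GuardedLogger` and its methods are hypothetical names. A `Supplier`-based `debug` defers building the message until the level check passes, which is what grizzled-slf4j achieves automatically with Scala by-name parameters.

```java
import java.util.function.Supplier;

// Hypothetical logger illustrating the logging-guard pattern discussed above.
class GuardedLogger {
    private final boolean debugEnabled;
    int messagesBuilt = 0; // counts how many message strings were constructed

    GuardedLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Explicit guard style: the caller must check the level before building
    // the message, or pay the construction cost on every call.
    void debugIfEnabled(String message) {
        if (debugEnabled) {
            write(message);
        }
    }

    // Automatic guard style: the message is only built if the level check
    // passes, mirroring what grizzled-slf4j generates via by-name parameters.
    void debug(Supplier<String> message) {
        if (debugEnabled) {
            write(message.get());
        }
    }

    String expensiveDump() {
        messagesBuilt++; // simulates a costly state dump / toString
        return "state dump";
    }

    private void write(String message) {
        // a real logger would emit the message here
    }
}
```

With debug disabled, `logger.debug(logger::expensiveDump)` never constructs the string, so no explicit `isDebugEnabled()`-style check is needed at the call site.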
[GitHub] flink pull request: [FLINK-1924] Minor Refactoring
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96573887 +1, can you merge it @mxm
[jira] [Commented] (FLINK-1682) Port Record-API based optimizer tests to new Java API
[ https://issues.apache.org/jira/browse/FLINK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513773#comment-14513773 ] ASF GitHub Bot commented on FLINK-1682: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/627#issuecomment-96580250 Wow, very nice :-) Looks correct from a first glance (I did not thoroughly check everything). One remark: Since `print()` may become an eagerly evaluated command in the future, the tests become more robust when using `.output(new DiscardingOutputFormat())` as the sink. Port Record-API based optimizer tests to new Java API - Key: FLINK-1682 URL: https://issues.apache.org/jira/browse/FLINK-1682 Project: Flink Issue Type: Sub-task Components: Optimizer Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9
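Stephan's remark about preferring `.output(new DiscardingOutputFormat())` over `print()` can be sketched with a plain-Java stand-in (hypothetical names, not Flink's actual classes): a discarding sink terminates the plan by consuming every record without producing any output, so the test cannot be affected if `print()` ever becomes eagerly evaluated.

```java
// Minimal stand-in for the idea behind Flink's DiscardingOutputFormat:
// accept every record and drop it. All names here are illustrative only.
interface SimpleOutputFormat<T> {
    void writeRecord(T record);
}

class DiscardingSink<T> implements SimpleOutputFormat<T> {
    long recordsSeen = 0; // tracked only so the behavior is observable

    @Override
    public void writeRecord(T record) {
        recordsSeen++; // the record itself is discarded; nothing is printed
    }
}
```

A sink like this gives an optimizer test a well-defined plan terminator with no side effects on stdout.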
[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...
Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96591270 It is not a problem to expose this to the client as well, I was just unsure if it is necessary. I will add it to the client as well to be in sync. Yes, it makes sense to use URLs instead of URLs and Files. But the user-facing APIs remain Strings, right?
[jira] [Commented] (FLINK-1947) Make Avro and Tachyon test logging less verbose
[ https://issues.apache.org/jira/browse/FLINK-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513768#comment-14513768 ] Stephan Ewen commented on FLINK-1947: - You can avoid the sysout logging by using {{ExecutionEnvironment.getConfig().disableSysoutLogging()}}. Make Avro and Tachyon test logging less verbose --- Key: FLINK-1947 URL: https://issues.apache.org/jira/browse/FLINK-1947 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Priority: Minor Currently, the {{AvroExternalJarProgramITCase}} and the Tachyon test cases write the cluster status messages to stdout. I think these messages are not needed and only clutter the test output. Therefore, we should maybe suppress these messages.
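A minimal stand-in for the flag Stephan's suggestion toggles (hypothetical class; only the method name mirrors the real `disableSysoutLogging()` on Flink's execution config):

```java
// Hedged sketch, not Flink's actual ExecutionConfig: the flag controls
// whether cluster status messages are echoed to stdout during execution.
class ExecutionConfigSketch {
    private boolean sysoutLogging = true; // status goes to stdout by default

    // Fluent, so calls can be chained off a getConfig()-style accessor.
    ExecutionConfigSketch disableSysoutLogging() {
        this.sysoutLogging = false;
        return this;
    }

    boolean isSysoutLoggingEnabled() {
        return sysoutLogging;
    }
}
```

Disabling the flag in a test's environment setup keeps cluster status messages out of the test output.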
[GitHub] flink pull request: [runtime] Bump Netty version to 4.27.Final and...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/617
[GitHub] flink pull request: [FLINK-1682] Ported optimizer unit tests from ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/627#issuecomment-96580250 Wow, very nice :-) Looks correct from a first glance (I did not thoroughly check everything). One remark: Since `print()` may become an eagerly evaluated command in the future, the tests become more robust when using `.output(new DiscardingOutputFormat())` as the sink.
[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513849#comment-14513849 ] ASF GitHub Bot commented on FLINK-1789: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96591270 It is not a problem to expose this to the client as well, I was just unsure if it is necessary. I will add it to the client as well to be in sync. Yes, it makes sense to use URLs instead of URLs and Files. But the user-facing APIs remain Strings, right? Allow adding of URLs to the usercode class loader - Key: FLINK-1789 URL: https://issues.apache.org/jira/browse/FLINK-1789 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Timo Walther Assignee: Timo Walther Priority: Minor Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepted valid classpath URLs and forwarded them to BlobLibraryCacheManager.
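The conclusion of the thread above — user-facing APIs keep classpath strings, which are converted to URLs for the usercode class loader — can be sketched with plain JDK classes. The helper below and the jar path in the usage note are hypothetical; `java.net.URLClassLoader` is the real JDK class that Flink's FlinkUserCodeClassLoader builds on.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical helper showing the String -> URL -> class loader chain
// the pull request discusses.
class UserCodeClassLoaders {
    static URLClassLoader fromClasspathStrings(ClassLoader parent, String... paths) {
        URL[] urls = new URL[paths.length];
        for (int i = 0; i < paths.length; i++) {
            try {
                // Convert each file-path string to a URL; malformed paths
                // fail here, before anything is shipped to the cluster.
                urls[i] = Paths.get(paths[i]).toUri().toURL();
            } catch (java.net.MalformedURLException e) {
                throw new IllegalArgumentException("Invalid classpath entry: " + paths[i], e);
            }
        }
        // Parent delegation still resolves framework and JDK classes.
        return new URLClassLoader(urls, parent);
    }
}
```

For example, `fromClasspathStrings(parent, "/opt/libs/udf.jar")` (a made-up path) yields a loader that can resolve classes from a jar already present on every node, without shipping it.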
[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA
[ https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514777#comment-14514777 ] Vasia Kalavri commented on FLINK-1941: -- :-) you're way too fast for my current reviewing capacity! I should be able to work on this one by the end of this week or the beginning of next. Add documentation for Gelly-GSA --- Key: FLINK-1941 URL: https://issues.apache.org/jira/browse/FLINK-1941 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Labels: docs, gelly Add a section in the Gelly guide to describe the newly introduced Gather-Sum-Apply iteration method. Show how GSA uses delta iterations internally and explain how this model differs from the vertex-centric one.
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/621#issuecomment-96809741 @aljoscha I have run the code formatter from IntelliJ, so the problem should be solved now. I have tried to read the Travis build log, but I could not find the cause of the problem. Would you please tell me how to find it, to save time in the future?
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514915#comment-14514915 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/621#issuecomment-96809741 @aljoscha I have run the code formatter from IntelliJ, so the problem should be solved now. I have tried to read the Travis build log, but I could not find the cause of the problem. Would you please tell me how to find it, to save time in the future? Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good.
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515048#comment-14515048 ] ASF GitHub Bot commented on FLINK-1828: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/571 Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515050#comment-14515050 ] ASF GitHub Bot commented on FLINK-1828: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/571#issuecomment-96828941 This PR was split into PR #632 and PR #633 Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/571#issuecomment-96828941 This PR was split into PR #632 and PR #633
[jira] [Updated] (FLINK-1951) NullPointerException in DeltaIteration when no ForwardedFields
[ https://issues.apache.org/jira/browse/FLINK-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-1951: - Priority: Critical (was: Minor) NullPointerException in DeltaIteration when no ForwardedFields -- Key: FLINK-1951 URL: https://issues.apache.org/jira/browse/FLINK-1951 Project: Flink Issue Type: Bug Components: Iterations Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Fabian Hueske Priority: Critical The following exception is thrown by the Connected Components example, if the @ForwardedFieldsFirst(*) annotation from the ComponentIdFilter join is removed:
Caused by: java.lang.NullPointerException
at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
at java.lang.Thread.run(Thread.java:745)
[Code | https://github.com/vasia/flink/tree/cc-test] and [dataset | http://snap.stanford.edu/data/com-DBLP.html] to reproduce.
[GitHub] flink pull request: Fixed missing call to configure() for Configur...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/632
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/571
[jira] [Resolved] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1828. -- Resolution: Fixed Fix Version/s: 0.8.2 Fixed in - 0.9 with de573cf5cef3bed6c489af85dba2cc61912db4c0 - 0.8.2 with ffc86f6686520b8ed4270ccd76ac304f64368c6e Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9, 0.8.2 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513718#comment-14513718 ] ASF GitHub Bot commented on FLINK-1615: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/621#issuecomment-96565454 The tests are failing because you use spaces in your code for indentation. Could you please change all indentation to tabs to satisfy the style checker? Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good.
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513703#comment-14513703 ] ASF GitHub Bot commented on FLINK-1930: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/624#issuecomment-96555080 Thanks. I'm merging this. (The failed Travis jobs are Travis-related, I think.) NullPointerException in vertex-centric iteration Key: FLINK-1930 URL: https://issues.apache.org/jira/browse/FLINK-1930 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Ufuk Celebi Hello to my Squirrels, I came across this exception when having a vertex-centric iteration output followed by a group by. I'm not sure what is causing it, since I saw this error in a rather large pipeline, but I managed to reproduce it with [this code example | https://github.com/vasia/flink/commit/1b7bbca1a6130fbcfe98b4b9b43967eb4c61f309] and a sufficiently large dataset, e.g. [this one | http://snap.stanford.edu/data/com-DBLP.html] (I'm running this locally). It seems like a null Buffer in RecordWriter. The exception message is the following:
Exception in thread main org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:319)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setNextBuffer(SpanningRecordSerializer.java:93)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamSolutionSetToFinalOutput(IterationHeadPactTask.java:405)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:365)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (FLINK-1946) Make yarn tests logging less verbose
Till Rohrmann created FLINK-1946: Summary: Make yarn tests logging less verbose Key: FLINK-1946 URL: https://issues.apache.org/jira/browse/FLINK-1946 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Priority: Minor Currently, the yarn tests log on the INFO level, making the test outputs confusing. Furthermore, some status messages are written to stdout. I don't think these messages need to be shown to the user.
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/621#issuecomment-96565454 The tests are failing because you use spaces in your code for indentation. Could you please change all indentation to tabs to satisfy the style checker?
[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96532907 Yes, this would make things a lot cleaner. @twalthr what do you think?
[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513657#comment-14513657 ] ASF GitHub Bot commented on FLINK-1789: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-96532907 Yes, this would make things a lot cleaner. @twalthr what do you think? Allow adding of URLs to the usercode class loader - Key: FLINK-1789 URL: https://issues.apache.org/jira/browse/FLINK-1789 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Timo Walther Assignee: Timo Walther Priority: Minor Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepted valid classpath URLs and forwarded them to BlobLibraryCacheManager.
[jira] [Commented] (FLINK-1938) Add Grunt for building the front-end
[ https://issues.apache.org/jira/browse/FLINK-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513713#comment-14513713 ] ASF GitHub Bot commented on FLINK-1938: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/623#issuecomment-96562609 You are definitely welcome to participate in the new web frontend development. Have a look at the branch I sent you and see if you find your way around the new code (it is not too much so far, so it is okay, hopefully). I am not familiar with Grunt. It says it is a JavaScript automation framework. What is it for, in the context of the web frontend? Add Grunt for building the front-end Key: FLINK-1938 URL: https://issues.apache.org/jira/browse/FLINK-1938 Project: Flink Issue Type: Improvement Components: Build System, Webfrontend Reporter: Vikhyat Korrapati Priority: Minor This is the first step towards implementing the web interface refactoring I proposed last year: https://groups.google.com/forum/#!topic/stratosphere-dev/GeXmDXF9DOY Once this is merged, I can get started with the rest of the refactoring. For now, the actual interface is kept the same, the only change is to how the build is done.
[GitHub] flink pull request: [FLINK-1682] Ported optimizer unit tests from ...
GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/627 [FLINK-1682] Ported optimizer unit tests from Record API to Java API This is a step towards removing the deprecated Record API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink recordOptimizerTests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/627.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #627 commit 303e86cc7af2604c3d9c781bd08a656daf6b99ae Author: Fabian Hueske fhue...@apache.org Date: 2015-04-24T23:30:11Z [FLINK-1682] Ported optimizer unit tests from Record API to Java API --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1924) [Py] Refactor a few minor things
[ https://issues.apache.org/jira/browse/FLINK-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513747#comment-14513747 ] ASF GitHub Bot commented on FLINK-1924: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96573887 +1, can you merge it @mxm [Py] Refactor a few minor things Key: FLINK-1924 URL: https://issues.apache.org/jira/browse/FLINK-1924 Project: Flink Issue Type: Improvement Components: Python API Affects Versions: 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.9
[jira] [Commented] (FLINK-1924) [Py] Refactor a few minor things
[ https://issues.apache.org/jira/browse/FLINK-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513732#comment-14513732 ] ASF GitHub Bot commented on FLINK-1924: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96570629 LGTM [Py] Refactor a few minor things Key: FLINK-1924 URL: https://issues.apache.org/jira/browse/FLINK-1924 Project: Flink Issue Type: Improvement Components: Python API Affects Versions: 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.9
[jira] [Commented] (FLINK-1523) Vertex-centric iteration extensions
[ https://issues.apache.org/jira/browse/FLINK-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513707#comment-14513707 ] ASF GitHub Bot commented on FLINK-1523: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/537#issuecomment-96558324 Looks good in general. A few thoughts on this: Your prior discussion involved efficiency, and Vasia suggested to not carry the degrees in all cases (as they are not needed in most cases). It seems that was not yet realized, because the Vertex class always carries the degrees. I think we can improve the abstraction between the with-degree and without-degree case a bit. Cases where one has to throw a "not supported" exception can usually be improved with a good inheritance hierarchy. Does it work to make the VertexWithDegrees class a subclass of the Vertex class? Also, as a bit of background: Vertex is a subclass of Tuple2. Tuples are currently the fastest data types in Flink. By adding additional fields to the Tuple, you are making it a POJO (as far as I know), which is a type that is a bit slower to serialize. Also: primitives are usually better than boxed types. Prefer `long` over `Long` where possible. Vertex-centric iteration extensions --- Key: FLINK-1523 URL: https://issues.apache.org/jira/browse/FLINK-1523 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Vasia Kalavri Assignee: Andra Lungu We would like to make the following extensions to the vertex-centric iterations of Gelly: - allow vertices to access their in/out degrees and the total number of vertices of the graph, inside the iteration. - allow choosing the neighborhood type (in/out/all) over which to run the vertex-centric iteration. Currently, the model uses the updates of the in-neighbors to calculate state and send messages to out-neighbors. 
We could add a parameter with value in/out/all to the {{VertexUpdateFunction}} and {{MessagingFunction}}, which would indicate the type of neighborhood. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
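The inheritance the review above suggests could be sketched as follows; the class and field names are illustrative only (Gelly's actual `Vertex` extends `Tuple2`, which is not modeled here), and the degrees use primitives per the serialization advice in the comment:

```java
// Hedged sketch of the hierarchy Stephan suggests: the base vertex carries
// no degree fields; a subclass adds them only for the iterations that need
// them, so no "not supported" exceptions are required in the base class.
// Names are illustrative, not Gelly's actual API.
class VertexSketch<K, V> {
    K id;
    V value;

    VertexSketch(K id, V value) {
        this.id = id;
        this.value = value;
    }
}

class VertexWithDegreesSketch<K, V> extends VertexSketch<K, V> {
    // primitives rather than boxed Longs, per the advice above
    long inDegree;
    long outDegree;

    VertexWithDegreesSketch(K id, V value, long inDegree, long outDegree) {
        super(id, value);
        this.inDegree = inDegree;
        this.outDegree = outDegree;
    }
}
```

Code written against the plain vertex then works unchanged on the degree-carrying subclass, while only degree-aware iterations pay for the extra fields.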
[jira] [Closed] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-1930. -- Resolution: Fixed Fixed in 88638de. NullPointerException in vertex-centric iteration Key: FLINK-1930 URL: https://issues.apache.org/jira/browse/FLINK-1930 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Ufuk Celebi Hello to my Squirrels, I came across this exception when having a vertex-centric iteration output followed by a group by. I'm not sure what is causing it, since I saw this error in a rather large pipeline, but I managed to reproduce it with [this code example | https://github.com/vasia/flink/commit/1b7bbca1a6130fbcfe98b4b9b43967eb4c61f309] and a sufficiently large dataset, e.g. [this one | http://snap.stanford.edu/data/com-DBLP.html] (I'm running this locally). It seems like a null Buffer in RecordWriter. The exception message is the following: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:319)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setNextBuffer(SpanningRecordSerializer.java:93)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamSolutionSetToFinalOutput(IterationHeadPactTask.java:405)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:365)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
at java.lang.Thread.run(Thread.java:745)
[GitHub] flink pull request: [FLINK-1924] Minor Refactoring
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96570629 LGTM
[jira] [Assigned] (FLINK-1923) Replace asynchronous logging in JobManager with regular slf4j logging
[ https://issues.apache.org/jira/browse/FLINK-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-1923: Assignee: Till Rohrmann Replace asynchronous logging in JobManager with regular slf4j logging - Key: FLINK-1923 URL: https://issues.apache.org/jira/browse/FLINK-1923 Project: Flink Issue Type: Task Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Till Rohrmann It's hard to understand exactly what's going on in the JobManager because the log messages are sent asynchronously by a logging actor.
[jira] [Created] (FLINK-1945) Make python tests less verbose
Till Rohrmann created FLINK-1945: Summary: Make python tests less verbose Key: FLINK-1945 URL: https://issues.apache.org/jira/browse/FLINK-1945 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Priority: Minor Currently, the python tests print a lot of log messages to stdout. Furthermore, there seem to be some println statements that clutter the console output. I think that these log messages are not required for the tests and thus should be suppressed.
[GitHub] flink pull request: yarn client tests
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/134
[GitHub] flink pull request: Flink 964
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/112
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513706#comment-14513706 ] ASF GitHub Bot commented on FLINK-1930: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/624 NullPointerException in vertex-centric iteration Key: FLINK-1930 URL: https://issues.apache.org/jira/browse/FLINK-1930 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Ufuk Celebi
[GitHub] flink pull request: [FLINK-1930] Separate output buffer pool and r...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/624
[GitHub] flink pull request: [FLINK-1523][gelly] Vertex centric iteration e...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/537#issuecomment-96558324 Looks good in general. A few thoughts on this: Your prior discussion involved efficiency, and Vasia suggested to not carry the degrees in all cases (as they are not needed in most cases). It seems that was not yet realized, because the Vertex class always carries the degrees. I think we can improve the abstraction between the with-degree and without-degree case a bit. Cases where one has to throw a "not supported" exception can usually be improved with a good inheritance hierarchy. Does it work to make the VertexWithDegrees class a subclass of the Vertex class? Also, as a bit of background: Vertex is a subclass of Tuple2. Tuples are currently the fastest data types in Flink. By adding additional fields to the Tuple, you are making it a POJO (as far as I know), which is a type that is a bit slower to serialize. Also: primitives are usually better than boxed types. Prefer `long` over `Long` where possible.
[jira] [Updated] (FLINK-1946) Make yarn tests logging less verbose
[ https://issues.apache.org/jira/browse/FLINK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1946: -- Component/s: YARN Client Make yarn tests logging less verbose Key: FLINK-1946 URL: https://issues.apache.org/jira/browse/FLINK-1946 Project: Flink Issue Type: Improvement Components: YARN Client Reporter: Till Rohrmann Priority: Minor Currently, the yarn tests log on the INFO level, making the test output confusing. Furthermore, some status messages are written to stdout. I think these messages do not need to be shown to the user.
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r29129684 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- I do not have a HBase setup here. Could you try to exclude all dependencies of hbase-server and add them until it works? I hope the TableInputFormat and TableOutputFormat do not have too many external dependencies.
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513737#comment-14513737 ] ASF GitHub Bot commented on FLINK-1828: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r29129684 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- I do not have a HBase setup here. Could you try to exclude all dependencies of hbase-server and add them until it works? I hope the TableInputFormat and TableOutputFormat do not have too many external dependencies. Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user fpompermaier commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r29130488 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Ok, I hope to be able to do it before this evening!
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513753#comment-14513753 ] ASF GitHub Bot commented on FLINK-1828: --- Github user fpompermaier commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r29130488 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Ok, I hope to be able to do it before this evening! Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
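The fix FLINK-1828 describes — forwarding the job configuration to wrapped formats that implement Configurable inside configure() — can be sketched as follows. Hadoop's Configurable interface and the output-format wrapper are stubbed with minimal stand-ins here; this is not the actual HadoopOutputFormatBase code:

```java
// Sketch of the fix described in FLINK-1828: the wrapper's configure()
// must forward the job configuration to wrapped Hadoop output formats
// that implement Configurable. Hadoop's interface is stubbed with a
// minimal stand-in so the sketch is self-contained.
interface Configurable {
    void setConf(Object conf);
}

// Stand-in for a format like HBase's TableOutputFormat, which only works
// once it has received its configuration.
class RecordingTableOutputFormat implements Configurable {
    Object receivedConf;

    public void setConf(Object conf) {
        this.receivedConf = conf;
    }
}

class HadoopOutputFormatWrapperSketch {
    private final Object wrappedFormat;
    private final Object jobConf;

    HadoopOutputFormatWrapperSketch(Object wrappedFormat, Object jobConf) {
        this.wrappedFormat = wrappedFormat;
        this.jobConf = jobConf;
    }

    void configure() {
        // The step the issue reports as missing: without this call,
        // Configurable formats never see their configuration.
        if (wrappedFormat instanceof Configurable) {
            ((Configurable) wrappedFormat).setConf(jobConf);
        }
    }
}
```

The instanceof guard keeps the wrapper working for plain Hadoop formats that are not Configurable.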
[GitHub] flink pull request: [FLINK-1925] Fixes blocking method submitTask ...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/622#discussion_r29140659 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -795,108 +795,111 @@ extends Actor with ActorLogMessages with ActorLogging {
    var startRegisteringTask = 0L
    var task: Task = null
-   // all operations are in a try / catch block to make sure we send a result upon any failure
-   try {
-     // check that we are already registered
-     if (!isConnected) {
-       throw new IllegalStateException("TaskManager is not associated with a JobManager")
-     }
-     if (slot < 0 || slot >= numberOfSlots) {
-       throw new Exception(s"Target slot ${slot} does not exist on TaskManager.")
-     }
+   if (!isConnected) {
+     sender ! Failure(
+       new IllegalStateException("TaskManager is not associated with a JobManager.")
+     )
+   } else if (slot < 0 || slot >= numberOfSlots) {
+     sender ! Failure(new Exception(s"Target slot $slot does not exist on TaskManager."))
+   } else {
+     sender ! Acknowledge
-     val userCodeClassLoader = libraryCacheManager match {
-       case Some(manager) =>
-         if (LOG.isDebugEnabled) {
-           startRegisteringTask = System.currentTimeMillis()
-         }
+     Future {
+       try {
--- End diff -- Can we pull the code in the future into its own method? Makes it easier to understand by separating the different parts along the methods.
[jira] [Commented] (FLINK-1925) Split SubmitTask method up into two phases: Receive TDD and instantiation of TDD
[ https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513981#comment-14513981 ] ASF GitHub Bot commented on FLINK-1925: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/622#discussion_r29140659 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -795,108 +795,111 @@ extends Actor with ActorLogMessages with ActorLogging {
    var startRegisteringTask = 0L
    var task: Task = null
-   // all operations are in a try / catch block to make sure we send a result upon any failure
-   try {
-     // check that we are already registered
-     if (!isConnected) {
-       throw new IllegalStateException("TaskManager is not associated with a JobManager")
-     }
-     if (slot < 0 || slot >= numberOfSlots) {
-       throw new Exception(s"Target slot ${slot} does not exist on TaskManager.")
-     }
+   if (!isConnected) {
+     sender ! Failure(
+       new IllegalStateException("TaskManager is not associated with a JobManager.")
+     )
+   } else if (slot < 0 || slot >= numberOfSlots) {
+     sender ! Failure(new Exception(s"Target slot $slot does not exist on TaskManager."))
+   } else {
+     sender ! Acknowledge
-     val userCodeClassLoader = libraryCacheManager match {
-       case Some(manager) =>
-         if (LOG.isDebugEnabled) {
-           startRegisteringTask = System.currentTimeMillis()
-         }
+     Future {
+       try {
--- End diff -- Can we pull the code in the future into its own method? Makes it easier to understand by separating the different parts along the methods. Split SubmitTask method up into two phases: Receive TDD and instantiation of TDD Key: FLINK-1925 URL: https://issues.apache.org/jira/browse/FLINK-1925 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann A user reported that a job times out while submitting tasks to the TaskManager. The reason is that the JobManager expects a TaskOperationResult response upon submitting a task to the TM. The TM then downloads the required jars from the JM, which blocks the actor thread and can take a very long time if many TMs download from the JM. Due to this, the SubmitTask future throws a TimeOutException. A possible solution could be that the TM eagerly acknowledges the reception of the SubmitTask message and executes the task initialization within a future. The future will upon completion send an UpdateTaskExecutionState message to the JM which switches the state of the task from deploying to running. This means that the handler of the SubmitTask future in {{Execution}} won't change the state of the task.
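The two-phase submission proposed in the issue can be sketched with a plain CompletableFuture standing in for the actor-based future; the message strings are illustrative, not Flink's actual runtime messages:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the two-phase task submission proposed in FLINK-1925:
// acknowledge receipt immediately, run the slow initialization
// asynchronously, and report the state change afterwards.
class TaskSubmissionSketch {
    final ConcurrentLinkedQueue<String> messagesToJobManager = new ConcurrentLinkedQueue<>();

    CompletableFuture<Void> submitTask(String taskId) {
        // Phase 1: eager acknowledge, so the JobManager's submit future
        // cannot time out while jars are still being downloaded.
        messagesToJobManager.add("Acknowledge");
        // Phase 2: slow initialization (jar download, task instantiation)
        // runs off the actor thread; on completion, the switch from
        // DEPLOYING to RUNNING is reported as a separate message.
        return CompletableFuture.runAsync(() ->
            messagesToJobManager.add("UpdateTaskExecutionState:" + taskId + ":RUNNING"));
    }
}
```

The key property is that the acknowledgement is sent before any blocking work starts, so submission latency no longer depends on how long initialization takes.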
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514099#comment-14514099 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29144391 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
@@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I
    final int delimLimit = limit - delimiter.length + 1;
+   if (bytes.length == 0) {
--- End diff -- This check is not strictly necessary, IMO. `bytes` is a larger byte array which is reused by the calling `GenericCsvInputFormat`. To reduce the processing overhead of each field, I would omit the check (here and in the Long and Short parsers). Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2<Long, Long>, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it as described if there are no better ideas :)
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514102#comment-14514102 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/566#issuecomment-96647676 Looks good. I added a few minor comments inline. Did you check if the changes should also go into the `ByteParser`? Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2<Long, Long>, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it as described if there are no better ideas :)
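The behavior the fix aims for — an empty field between delimiters parsing to 0 instead of raising a numeric-format error — can be sketched as follows. This is a simplified stand-in, not Flink's actual DoubleParser, which operates on byte ranges with delimiters rather than strings:

```java
// Simplified sketch of the empty-field handling discussed in FLINK-1820:
// map an empty field (e.g. the value between the two pipes in "||") to 0
// instead of raising a numeric-format error. Not the actual Flink parser.
class LenientDoubleParserSketch {
    static double parseField(String field) {
        if (field.isEmpty()) {
            return 0.0; // empty value between delimiters
        }
        return Double.parseDouble(field);
    }
}
```

With this rule, a line like "||" parses to the tuple (0.0, 0.0), matching the behavior the long parsers already show.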
[jira] [Commented] (FLINK-1923) Replace asynchronous logging in JobManager with regular slf4j logging
[ https://issues.apache.org/jira/browse/FLINK-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513973#comment-14513973 ] ASF GitHub Bot commented on FLINK-1923: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/628#issuecomment-96622572 I am curious, why did you rewrite the TaskManager? I thought that one was logging synchronously already. Replace asynchronous logging in JobManager with regular slf4j logging - Key: FLINK-1923 URL: https://issues.apache.org/jira/browse/FLINK-1923 Project: Flink Issue Type: Task Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Till Rohrmann It's hard to understand exactly what's going on in the JobManager because the log messages are sent asynchronously by a logging actor.
[GitHub] flink pull request: [FLINK-1792] TM Monitoring: CPU utilization, h...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/553#issuecomment-96631425 Thank you. I'm trying to find time to review your changes soon.
[jira] [Commented] (FLINK-1792) Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only
[ https://issues.apache.org/jira/browse/FLINK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514030#comment-14514030 ] ASF GitHub Bot commented on FLINK-1792: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/553#issuecomment-96631425 Thank you. I'm trying to find time to review your changes soon. Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only Key: FLINK-1792 URL: https://issues.apache.org/jira/browse/FLINK-1792 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Sachin Bhat As per https://github.com/apache/flink/pull/421 from FLINK-1501, some enhancements to the current monitoring are required - Get the CPU utilization in % from each TaskManager process - Remove the metrics graph from the overview and only show the current stats as numbers (cpu load, heap utilization) and add a button to enable the detailed graph.
[jira] [Assigned] (FLINK-1922) Failed task deployment causes NPE on input split assignment
[ https://issues.apache.org/jira/browse/FLINK-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-1922: Assignee: Till Rohrmann Failed task deployment causes NPE on input split assignment --- Key: FLINK-1922 URL: https://issues.apache.org/jira/browse/FLINK-1922 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Till Rohrmann The input split assignment code is returning {null} if the Task has failed, which is causing an NPE. We should improve our error handling / reporting in that situation. {code} 13:12:31,002 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job c0b47ce41e9a85a628a628a3977705ef (Flink Java Job at Tue Apr 21 13:10:36 UTC 2015) changed to FAILING Cannot deploy task - TaskManager not responding.. 13:12:47,591 ERROR org.apache.flink.runtime.operators.RegularPactTask - Error in task code: CHAIN DataSource (at userMethod (org.apache.flink.api.java.io.AvroInputFormat)) - FlatMap (FlatMap at main(UserClass.java:111)) (20/50) java.lang.RuntimeException: Requesting the next InputSplit failed. at org.apache.flink.runtime.taskmanager.TaskInputSplitProvider.getNextInputSplit(TaskInputSplitProvider.java:88) at org.apache.flink.runtime.operators.DataSourceTask$1.hasNext(DataSourceTask.java:337) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:136) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at java.io.ByteArrayInputStream.<init>(ByteArrayInputStream.java:106) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:301) at org.apache.flink.runtime.taskmanager.TaskInputSplitProvider.getNextInputSplit(TaskInputSplitProvider.java:83) ... 
4 more 13:12:47,595 INFO org.apache.flink.runtime.taskmanager.Task - CHAIN DataSource (at SomeMethod (org.apache.flink.api.java.io.AvroInputFormat)) - FlatMap (FlatMap at main(SomeClass.java:111)) (20/50) switched to FAILED : java.lang.RuntimeException: Requesting the next InputSplit failed. at org.apache.flink.runtime.taskmanager.TaskInputSplitProvider.getNextInputSplit(TaskInputSplitProvider.java:88) at org.apache.flink.runtime.operators.DataSourceTask$1.hasNext(DataSourceTask.java:337) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:136) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at java.io.ByteArrayInputStream.<init>(ByteArrayInputStream.java:106) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:301) at org.apache.flink.runtime.taskmanager.TaskInputSplitProvider.getNextInputSplit(TaskInputSplitProvider.java:83) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1923) Replace asynchronous logging in JobManager with regular slf4j logging
[ https://issues.apache.org/jira/browse/FLINK-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513971#comment-14513971 ] ASF GitHub Bot commented on FLINK-1923: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/628#discussion_r29140183 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java --- @@ -357,16 +356,6 @@ protected void unregisterTask() { taskManager.tell(new UnregisterTask(executionId), ActorRef.noSender()); } - protected void notifyExecutionStateChange(ExecutionState executionState, --- End diff -- Was this method unused? Replace asynchronous logging in JobManager with regular slf4j logging - Key: FLINK-1923 URL: https://issues.apache.org/jira/browse/FLINK-1923 Project: Flink Issue Type: Task Components: JobManager Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Till Rohrmann It's hard to understand exactly what's going on in the JobManager because the log messages are sent asynchronously by a logging actor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1682) Port Record-API based optimizer tests to new Java API
[ https://issues.apache.org/jira/browse/FLINK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514045#comment-14514045 ] ASF GitHub Bot commented on FLINK-1682: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/627#issuecomment-96631688 Yes, very good point. I replaced the `.print()` statements with `.output(new DiscardingOutputFormat())` as suggested (not only in the ported tests but also some more along the way). Port Record-API based optimizer tests to new Java API - Key: FLINK-1682 URL: https://issues.apache.org/jira/browse/FLINK-1682 Project: Flink Issue Type: Sub-task Components: Optimizer Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
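For context, the point of the swap is that the optimizer tests only need a sink node to terminate the plan, not materialized output. The shape of such a discarding sink can be sketched in plain Java (an illustration only, not Flink's actual `DiscardingOutputFormat`):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative discarding sink: accepts every record and writes nothing,
// which is all an optimizer test needs to close off a plan.
public class DiscardingSink<T> {
    private long received = 0;

    public void writeRecord(T record) {
        received++; // the record itself is dropped; no I/O happens
    }

    public long count() {
        return received;
    }

    // Convenience helper: push a list of records through a fresh sink.
    public static <T> long drain(List<T> records) {
        DiscardingSink<T> sink = new DiscardingSink<>();
        for (T r : records) {
            sink.writeRecord(r);
        }
        return sink.count();
    }

    public static void main(String[] args) {
        System.out.println(drain(Arrays.asList("a", "b", "c"))); // prints 3
    }
}
```

Unlike `.print()`, nothing here depends on collecting results back to the client, which is why the swap is safe in tests that only inspect the optimized plan.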
[GitHub] flink pull request: [FLINK-1922] Fixes NPE when TM receives a null...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/631 [FLINK-1922] Fixes NPE when TM receives a null input split The ```TaskInputSplitProvider``` did not handle null input splits which are wrapped into a ```NextInputSplit``` message properly. Fixed this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixInputSplit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #631 commit dad90438f7508309122611f5d79ee9295d242a6f Author: Till Rohrmann trohrm...@apache.org Date: 2015-04-27T12:30:16Z [FLINK-1922] [runtime] Fixes NPE when TM receives a null input split
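The failure mode is easy to reproduce in isolation: `ByteArrayInputStream`'s constructor throws an NPE when handed a null array, so the deserialization path needs an explicit null check. A minimal sketch in plain Java (the class and method names are illustrative stand-ins, not Flink's actual `InstantiationUtil` API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Sketch of the fix: guard against a null serialized payload before handing
// it to ByteArrayInputStream, whose constructor would otherwise NPE.
public class InputSplitDeserializer {

    // Hypothetical stand-in for the deserialization step in the split provider.
    public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        if (bytes == null) {
            // A null payload signals "no further input splits", not an error.
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }
}
```

With the guard in place, a null `NextInputSplit` payload becomes a normal "end of splits" result instead of the NullPointerException shown in the stack trace above.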
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/566#issuecomment-96647676 Looks good. I added a few minor comments inline. Did you check if the changes should also go into the `ByteParser`?
[jira] [Assigned] (FLINK-1945) Make python tests less verbose
[ https://issues.apache.org/jira/browse/FLINK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler reassigned FLINK-1945: --- Assignee: Chesnay Schepler Make python tests less verbose -- Key: FLINK-1945 URL: https://issues.apache.org/jira/browse/FLINK-1945 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Chesnay Schepler Priority: Minor Currently, the python tests print a lot of log messages to stdout. Furthermore, there seem to be some println statements which clutter the console output. I think that these log messages are not required for the tests and thus should be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514122#comment-14514122 ] ASF GitHub Bot commented on FLINK-1820: --- Github user FelixNeutatz commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29145755 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java --- @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I final int delimLimit = limit-delimiter.length+1; + if (bytes.length == 0) { --- End diff -- If I skip this check - the LongParserTest, ShortParserTest ... will fail because of an out-of-bound-exception ... Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug, when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2<Long, Long>, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it like described if there are no better ideas :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
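The empty-field behavior under discussion can be shown with a stripped-down parser in plain Java. This mimics the idea of the patch only; Flink's real parsers work on byte arrays with delimiters and an internal error state, and the names below are illustrative:

```java
// Stripped-down field parser: an empty field sets an error state instead of
// letting Float.parseFloat throw a NumberFormatException deep in the parser.
public class SimpleFloatParser {

    public enum ParseError { NONE, EMPTY_STRING, NUMERIC_VALUE_FORMAT_ERROR }

    private ParseError errorState = ParseError.NONE;

    public ParseError getErrorState() {
        return errorState;
    }

    // Returns the parsed value; on failure returns 0f and records the cause.
    public float parseField(String field) {
        errorState = ParseError.NONE;
        if (field.isEmpty()) {
            errorState = ParseError.EMPTY_STRING;
            return 0f;
        }
        if (field.length() > field.trim().length()) {
            // Mirrors the patch's `len > str.trim().length()` check:
            // whitespace-padded fields are rejected, not silently trimmed.
            errorState = ParseError.NUMERIC_VALUE_FORMAT_ERROR;
            return 0f;
        }
        try {
            return Float.parseFloat(field);
        } catch (NumberFormatException e) {
            errorState = ParseError.NUMERIC_VALUE_FORMAT_ERROR;
            return 0f;
        }
    }
}
```

The whitespace check matters because `Float.parseFloat` itself tolerates surrounding whitespace, so without it a padded field would silently parse rather than surface an error.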
[jira] [Commented] (FLINK-1933) Add distance measure interface and basic implementation to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513953#comment-14513953 ] ASF GitHub Bot commented on FLINK-1933: --- GitHub user chiwanpark opened a pull request: https://github.com/apache/flink/pull/629 [FLINK-1933] Add distance measure interface and basic implementation to machine learning library This PR contains following changes: * Add `dot` method and `magnitude` method. * Add `DistanceMeasure` trait. * Add 7 basic implementation of `DistanceMeasure`. * Add tests for above changes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chiwanpark/flink FLINK-1933 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/629.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #629 commit 3fac0ff339de93ab1c3b3582924af75a1e6057ea Author: Chiwan Park chiwanp...@icloud.com Date: 2015-04-24T03:50:28Z [FLINK-1933] [ml] Add dot product and magnitude into Vector commit c8f940c2439f754ef0e640b5440507bce4b859d2 Author: Chiwan Park chiwanp...@icloud.com Date: 2015-04-27T09:48:56Z [FLINK-1933] [ml] Add distance measure interface and basic implementation to machine learning library Add distance measure interface and basic implementation to machine learning library --- Key: FLINK-1933 URL: https://issues.apache.org/jira/browse/FLINK-1933 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Chiwan Park Assignee: Chiwan Park Labels: ML Add distance measure interface to calculate distance between two vectors and some implementations of the interface. 
In FLINK-1745, [~till.rohrmann] suggests an interface like the following: {code} trait DistanceMeasure { def distance(a: Vector, b: Vector): Double } {code} I think the following list of implementations is sufficient to provide to ML library users at first. * Manhattan distance [1] * Cosine distance [2] * Euclidean distance (and Squared) [3] * Tanimoto distance [4] * Minkowski distance [5] * Chebyshev distance [6] [1]: http://en.wikipedia.org/wiki/Taxicab_geometry [2]: http://en.wikipedia.org/wiki/Cosine_similarity [3]: http://en.wikipedia.org/wiki/Euclidean_distance [4]: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29 [5]: http://en.wikipedia.org/wiki/Minkowski_distance [6]: http://en.wikipedia.org/wiki/Chebyshev_distance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
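As a plain-Java illustration of two of the proposed measures (the actual implementations in the PR are Scala and operate on Flink ML's `Vector` type; dense `double[]` arrays of equal length are assumed here):

```java
// Illustrative Euclidean and Manhattan distances over dense double arrays.
public class Distances {

    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // the squared Euclidean distance accumulates here
        }
        return Math.sqrt(sum);
    }

    public static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```

The other proposed measures (Cosine, Tanimoto, Minkowski, Chebyshev) follow the same per-component pattern, which is why a common `DistanceMeasure` trait with a single `distance(a, b)` method covers them all.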
[GitHub] flink pull request: [FLINK-1911] [streaming] Streaming projection ...
GitHub user szape opened a pull request: https://github.com/apache/flink/pull/630 [FLINK-1911] [streaming] Streaming projection without types Since the DataSet projection has been reworked to no longer require the `.types(...)` call, the streaming and batch methods were out of sync. The streaming API projection was modified accordingly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink FLINK-1911 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/630.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #630 commit a36e7bfe8cf5d5a949714de215aaabf03494f62d Author: szape nemderogator...@gmail.com Date: 2015-04-20T14:53:46Z [FLINK-1911] [streaming] Working streaming projection prototype without types
[GitHub] flink pull request: [FLINK-1925] Fixes blocking method submitTask ...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/622#discussion_r29140453 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java --- @@ -345,26 +346,9 @@ public void onComplete(Throwable failure, Object success) throws Throwable { } } else { - if (success == null) { - markFailed(new Exception("Failed to deploy the task to slot " + slot + ": TaskOperationResult was null")); - } - - if (success instanceof TaskOperationResult) { - TaskOperationResult result = (TaskOperationResult) success; - - if (!result.executionID().equals(attemptId)) { - markFailed(new Exception("Answer execution id does not match the request execution id.")); - } else if (result.success()) { - switchToRunning(); - } else { - // deployment failed :( - markFailed(new Exception("Failed to deploy the task " + - getVertexWithAttempt() + " to slot " + slot + ": " + result - .description())); - } - } else { + if (!(success instanceof Messages.Acknowledge$)) { --- End diff -- I think this line is not parsable in Eclipse (the `$` messes with the Java parser). A workaround is to expose the case object class and object via a utility method and check against that.
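The parser problem comes from how scalac compiles a Scala `case object`: it emits a class whose binary name ends in `$`, holding a static `MODULE$` singleton field, and referencing that name directly from Java trips up some tooling. A plain-Java mock of that layout and of the suggested utility-method workaround (names here are illustrative, not Flink's actual `Messages`):

```java
// Mock of the bytecode layout scalac produces for `case object Acknowledge`:
// a final class named with a trailing '$' holding a static MODULE$ singleton.
public class AcknowledgeCheck {

    static final class Acknowledge$ {
        static final Acknowledge$ MODULE$ = new Acknowledge$();
        private Acknowledge$() {}
    }

    // The workaround: a utility method exposes the singleton so callers never
    // have to spell out the '$'-suffixed class name themselves.
    public static Object acknowledgeInstance() {
        return Acknowledge$.MODULE$;
    }

    public static boolean isAcknowledge(Object msg) {
        return msg == acknowledgeInstance(); // case objects are singletons
    }
}
```

Reference equality suffices here because a case object has exactly one instance per classloader, which is also why the identity check is cheaper than `instanceof`.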
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29143814 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java --- @@ -102,6 +111,10 @@ public static final float parseField(byte[] bytes, int startPos, int length, cha } String str = new String(bytes, startPos, i); + int len = str.length(); + if (len > str.trim().length()) { --- End diff -- See other comment on `String.trim()`
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514092#comment-14514092 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29143809 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java --- @@ -41,6 +41,15 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, F } String str = new String(bytes, startPos, i-startPos); + int len = str.length(); + if (len == 0) { + setErrorState(ParseErrorState.EMPTY_STRING); + return -1; + } + if (len > str.trim().length()) { --- End diff -- See other comment on `String.trim()` Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug, when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2<Long, Long>, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it like described if there are no better ideas :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29143800 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java --- @@ -42,6 +42,15 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, D } String str = new String(bytes, startPos, i-startPos); + int len = str.length(); + if (len == 0) { + setErrorState(ParseErrorState.EMPTY_STRING); + return -1; + } + if (len > str.trim().length()) { --- End diff -- See other comment on `String.trim()`
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514087#comment-14514087 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29143622 --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java --- @@ -353,6 +354,99 @@ public void testIntegerFieldsl() throws IOException { assertEquals(Integer.valueOf(888), result.f2); assertEquals(Integer.valueOf(999), result.f3); assertEquals(Integer.valueOf(000), result.f4); + + result = format.nextRecord(result); + assertNull(result); + assertTrue(format.reachedEnd()); + } + catch (Exception ex) { + fail("Test failed due to a " + ex.getClass().getName() + ": " + ex.getMessage()); + } + } + + @Test + public void testEmptyFields() throws IOException { + try { + final String fileContent = "|0|0|0|0\n" + + "1||1|1|1|\n" + + "2|2| |2|2|\n" + + "3 |3|3| |3|\n" + + "4|4|4|4| |\n"; + final FileInputSplit split = createTempFile(fileContent); + + final TupleTypeInfo<Tuple5<Short, Integer, Long, Float, Double>> typeInfo = + TupleTypeInfo.getBasicTupleTypeInfo(Short.class, Integer.class, Long.class, Float.class, Double.class); + final CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>> format = new CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>>(PATH, typeInfo); + + format.setFieldDelimiter("|"); + + format.configure(new Configuration()); + format.open(split); + + Tuple5<Short, Integer, Long, Float, Double> result = new Tuple5<Short, Integer, Long, Float, Double>(); + + try { + result = format.nextRecord(result); + fail("Empty String Parse Exception was not thrown! (ShortParser)"); + } catch (ParseException e) {} + try { + result = format.nextRecord(result); + fail("Empty String Parse Exception was not thrown! (IntegerParser)"); + } catch (ParseException e) {} + try { + result = format.nextRecord(result); + fail("Empty String Parse Exception was not thrown! (LongParser)"); + } catch (ParseException e) {} + try { + result = format.nextRecord(result); --- End diff -- Doesn't this call fail because of the trailing whitespace in the `short` field? Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug, when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2<Long, Long>, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it like described if there are no better ideas :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-1937) Cannot create SparseVector with only one non-zero element.
[ https://issues.apache.org/jira/browse/FLINK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-1937: Assignee: Till Rohrmann Cannot create SparseVector with only one non-zero element. -- Key: FLINK-1937 URL: https://issues.apache.org/jira/browse/FLINK-1937 Project: Flink Issue Type: Bug Components: Machine Learning Library Reporter: Chiwan Park Assignee: Till Rohrmann Labels: ML I tried creating a SparseVector with only one non-zero element, but I couldn't create it. The following code causes the problem. {code} val vec2 = SparseVector.fromCOO(3, (1, 1)) {code} I got the following compile error: {code:none} Error:(60, 29) overloaded method value fromCOO with alternatives: (size: Int,entries: Iterable[(Int, Double)])org.apache.flink.ml.math.SparseVector and (size: Int,entries: (Int, Double)*)org.apache.flink.ml.math.SparseVector cannot be applied to (Int, (Int, Int)) val vec2 = SparseVector.fromCOO(3, (1, 1)) ^ {code}
[jira] [Issue Comment Deleted] (FLINK-1937) Cannot create SparseVector with only one non-zero element.
[ https://issues.apache.org/jira/browse/FLINK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1937: - Comment: was deleted (was: Hi [~chiwanpark], the problem is that you're giving a tuple of (Int, Int) to the function {{fromCOO}} which expects a tuple of (Int, Double). Creating the {{SparseVector}} with {code} val vec2 = SparseVector.fromCOO(3, (1, 1.0)) {code} should fix your problem. The underlying problem is that the Scala compiler cannot cast a tuple of (Int, Int) to (Int, Double) even though Int values are a subset of Double. We could add methods which do this for us, though.)
[jira] [Commented] (FLINK-1911) DataStream and DataSet projection is out of sync
[ https://issues.apache.org/jira/browse/FLINK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513964#comment-14513964 ] ASF GitHub Bot commented on FLINK-1911: --- GitHub user szape opened a pull request: https://github.com/apache/flink/pull/630 [FLINK-1911] [streaming] Streaming projection without types Since the DataSet projection has been reworked to not require the .types(...) call, the Streaming and Batch methods were out of sync. So, the streaming API projection was modified accordingly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink FLINK-1911 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/630.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #630 commit a36e7bfe8cf5d5a949714de215aaabf03494f62d Author: szape nemderogator...@gmail.com Date: 2015-04-20T14:53:46Z [FLINK-1911] [streaming] Working streaming projection prototype without types DataStream and DataSet projection is out of sync Key: FLINK-1911 URL: https://issues.apache.org/jira/browse/FLINK-1911 Project: Flink Issue Type: Bug Components: Java API, Streaming Reporter: Gyula Fora Assignee: Péter Szabó Since the DataSet projection has been reworked to not require the .types(...) call, the Streaming and Batch methods are out of sync. The streaming API projection needs to be modified accordingly.
[GitHub] flink pull request: [FLINK-1923] Replaces asynchronous logging wit...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/628#issuecomment-96622572 I am curious, why did you rewrite the TaskManager? I thought that one was logging synchronously already. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29143763 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java --- @@ -103,6 +112,10 @@ public static final double parseField(byte[] bytes, int startPos, int length, ch } String str = new String(bytes, startPos, i); + int len = str.length(); + if (len > str.trim().length()) { --- End diff -- `String.trim()` creates a new String object. Checking if the first or last character of the String is a whitespace is probably more efficient.
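The review suggestion above can be sketched as follows: the trim-based condition `len > str.trim().length()` is true exactly when the string has leading or trailing whitespace, so checking only the first and last characters gives the same answer without allocating a new String. The helper name is hypothetical, not Flink code.

```java
// Sketch of the reviewer's suggestion: detect leading/trailing whitespace by
// inspecting only the boundary characters, instead of allocating a trimmed
// copy with String.trim(). Illustrative helper, not Flink's parser code.
public class WhitespaceCheck {

    static boolean hasLeadingOrTrailingWhitespace(String s) {
        if (s.isEmpty()) {
            return false;
        }
        return Character.isWhitespace(s.charAt(0))
                || Character.isWhitespace(s.charAt(s.length() - 1));
    }

    public static void main(String[] args) {
        // same truth value as (s.length() > s.trim().length()), no allocation
        System.out.println(hasLeadingOrTrailingWhitespace("3 "));  // prints true
        System.out.println(hasLeadingOrTrailingWhitespace("42")); // prints false
    }
}
```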
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29144391 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java --- @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I final int delimLimit = limit-delimiter.length+1; + if (bytes.length == 0) { --- End diff -- This check is not strictly necessary, IMO. `bytes` is a larger byte array which is reused by the calling `GenericCsvInputFormat`. To reduce the processing overhead of each field, I would omit the check (here and in the Long and Short parsers)
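The point about the reused buffer can be illustrated with a sketch: the byte array holds a whole record and is reused across fields, so `bytes.length` says nothing about the current field; only `startPos` and the delimiter position matter. The class below is a hypothetical stand-in, not Flink's IntParser.

```java
// The byte[] holds a full record that is reused for every field, so checking
// bytes.length tells us nothing about the field being parsed. A parser should
// only look at the range from startPos up to the next delimiter. Illustrative.
public class ReusedBufferIntParser {

    static int parseField(byte[] bytes, int startPos, int limit, byte delimiter) {
        int i = startPos;
        while (i < limit && bytes[i] != delimiter) {
            i++;
        }
        return Integer.parseInt(new String(bytes, startPos, i - startPos));
    }

    public static void main(String[] args) {
        byte[] record = "12|345|7".getBytes(); // one reused record buffer
        System.out.println(parseField(record, 0, record.length, (byte) '|')); // prints 12
        System.out.println(parseField(record, 3, record.length, (byte) '|')); // prints 345
    }
}
```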
[jira] [Created] (FLINK-1949) YARNSessionFIFOITCase sometimes fails to detect when the detached session finishes
Robert Metzger created FLINK-1949: - Summary: YARNSessionFIFOITCase sometimes fails to detect when the detached session finishes Key: FLINK-1949 URL: https://issues.apache.org/jira/browse/FLINK-1949 Project: Flink Issue Type: Bug Components: Tests, YARN Client Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger {code} 10:32:24,393 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - CLI Frontend has returned, so the job is running 10:32:24,398 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - waiting for the job with appId application_1430130687160_0003 to finish 10:32:24,629 INFO org.apache.flink.yarn.YARNSessionFIFOITCase - The job has finished. TaskManager output file found /home/travis/build/tillrohrmann/flink/flink-yarn-tests/../flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1430130687160_0003/container_1430130687160_0003_01_02/taskmanager-stdout.log 10:32:24,630 WARN org.apache.flink.yarn.YARNSessionFIFOITCase - Error while detached yarn session was running java.lang.AssertionError: Expected string '(all,2)' not found in string '' at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterInternal(YARNSessionFIFOITCase.java:504) at org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterWithStreamingJob(YARNSessionFIFOITCase.java:563) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} https://flink.a.o.uce.east.s3.amazonaws.com/travis-artifacts/tillrohrmann/flink/442/442.5.tar.gz
[jira] [Commented] (FLINK-1945) Make python tests less verbose
[ https://issues.apache.org/jira/browse/FLINK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514117#comment-14514117 ] Chesnay Schepler commented on FLINK-1945: - There are no print statements, all output is either a) job progression: 04/26/2015 15:07:09 MapPartition (PythonMapPartition)(1/1) switched to SCHEDULED b) job output: String successful! (yes, this is actually the output of the job) I'll still try to get rid of the logging, though; there was recently a similar issue, so it shouldn't be too hard to fix. Make python tests less verbose -- Key: FLINK-1945 URL: https://issues.apache.org/jira/browse/FLINK-1945 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Chesnay Schepler Priority: Minor Currently, the python tests print a lot of log messages to stdout. Furthermore, there seem to be some println statements which clutter the console output. I think that these log messages are not required for the tests and thus should be suppressed.
[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...
GitHub user chiwanpark opened a pull request: https://github.com/apache/flink/pull/629 [FLINK-1933] Add distance measure interface and basic implementation to machine learning library This PR contains the following changes: * Add `dot` method and `magnitude` method. * Add `DistanceMeasure` trait. * Add 7 basic implementations of `DistanceMeasure`. * Add tests for above changes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chiwanpark/flink FLINK-1933 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/629.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #629 commit 3fac0ff339de93ab1c3b3582924af75a1e6057ea Author: Chiwan Park chiwanp...@icloud.com Date: 2015-04-24T03:50:28Z [FLINK-1933] [ml] Add dot product and magnitude into Vector commit c8f940c2439f754ef0e640b5440507bce4b859d2 Author: Chiwan Park chiwanp...@icloud.com Date: 2015-04-27T09:48:56Z [FLINK-1933] [ml] Add distance measure interface and basic implementation to machine learning library
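The PR's building blocks (a dot product, a magnitude, and a pluggable distance-measure abstraction) can be sketched as follows. This is an illustrative Java analogue of the idea, not the Scala `DistanceMeasure` trait the PR actually adds to flink-ml.

```java
// Illustrative analogue of the PR's building blocks: dot product, magnitude,
// and a pluggable distance measure. Not the actual flink-ml Scala API.
public class Distances {

    interface DistanceMeasure {
        double distance(double[] a, double[] b);
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double magnitude(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    // Euclidean distance expressed via the dot-product primitive:
    // ||a - b|| = sqrt(dot(a,a) - 2*dot(a,b) + dot(b,b))
    static final DistanceMeasure EUCLIDEAN = (a, b) ->
            Math.sqrt(dot(a, a) - 2 * dot(a, b) + dot(b, b));

    public static void main(String[] args) {
        double[] a = {0.0, 3.0};
        double[] b = {4.0, 0.0};
        System.out.println(magnitude(a));               // prints 3.0
        System.out.println(EUCLIDEAN.distance(a, b));   // prints 5.0
    }
}
```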
[GitHub] flink pull request: [FLINK-1924] Minor Refactoring
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96617964 @aljoscha Alright, will merge in a bit.
[jira] [Commented] (FLINK-1924) [Py] Refactor a few minor things
[ https://issues.apache.org/jira/browse/FLINK-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513949#comment-14513949 ] ASF GitHub Bot commented on FLINK-1924: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/616#issuecomment-96617964 @aljoscha Alright, will merge in a bit. [Py] Refactor a few minor things Key: FLINK-1924 URL: https://issues.apache.org/jira/browse/FLINK-1924 Project: Flink Issue Type: Improvement Components: Python API Affects Versions: 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.9
[jira] [Commented] (FLINK-1925) Split SubmitTask method up into two phases: Receive TDD and instantiation of TDD
[ https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513978#comment-14513978 ] ASF GitHub Bot commented on FLINK-1925: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/622#discussion_r29140570 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -137,7 +137,7 @@ extends Actor with ActorLogMessages with ActorLogging { protected val resources = HardwareDescription.extractFromSystem(memoryManager.getMemorySize) /** Registry of all tasks currently executed by this TaskManager */ - protected val runningTasks = scala.collection.mutable.HashMap[ExecutionAttemptID, Task]() + protected val runningTasks = scala.collection.concurrent.TrieMap[ExecutionAttemptID, Task]() --- End diff -- I have recently read (I think in the Databricks Scala guide) that they discourage the Scala concurrent package because of bugs. How about using the Java concurrent HashMap? That one has pretty good performance and seems to work reliably. Split SubmitTask method up into two phases: Receive TDD and instantiation of TDD Key: FLINK-1925 URL: https://issues.apache.org/jira/browse/FLINK-1925 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann A user reported that a job times out while submitting tasks to the TaskManager. The reason is that the JobManager expects a TaskOperationResult response upon submitting a task to the TM. The TM then downloads the required jars from the JM, which blocks the actor thread and can take a very long time if many TMs download from the JM. Due to this, the SubmitTask future throws a TimeoutException. A possible solution could be that the TM eagerly acknowledges the reception of the SubmitTask message and executes the task initialization within a future. 
Upon completion, the future will send an UpdateTaskExecutionState message to the JM, which switches the state of the task from deploying to running. This means that the handler of the SubmitTask future in {{Execution}} won't change the state of the task.
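The reviewer's suggestion, a `java.util.concurrent` map for the task registry, can be sketched as below. The class and the String key/value types are simplified stand-ins for Flink's `ExecutionAttemptID` and `Task`, not TaskManager code.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the suggestion to back the task registry with
// java.util.concurrent.ConcurrentHashMap instead of Scala's TrieMap.
// Key/value types are simplified stand-ins for ExecutionAttemptID and Task.
public class TaskRegistry {

    private final ConcurrentHashMap<String, String> runningTasks = new ConcurrentHashMap<>();

    void register(String attemptId, String task) {
        runningTasks.put(attemptId, task);
    }

    String unregister(String attemptId) {
        // remove is atomic, so concurrent actor/future threads cannot
        // unregister the same task twice
        return runningTasks.remove(attemptId);
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        registry.register("attempt-1", "MapTask");
        System.out.println(registry.unregister("attempt-1")); // prints MapTask
        System.out.println(registry.unregister("attempt-1")); // prints null
    }
}
```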
[jira] [Commented] (FLINK-1925) Split SubmitTask method up into two phases: Receive TDD and instantiation of TDD
[ https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513993#comment-14513993 ] ASF GitHub Bot commented on FLINK-1925: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/622#issuecomment-96627457 Looks good, modulo some comments. The most critical being the concurrent Map one.
[jira] [Created] (FLINK-1948) Add manual task slot handling for streaming jobs
Gyula Fora created FLINK-1948: - Summary: Add manual task slot handling for streaming jobs Key: FLINK-1948 URL: https://issues.apache.org/jira/browse/FLINK-1948 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Currently, all stream operators are automatically assigned to the same task sharing group, and the user has no control over this setting. We should add a way to manually affect the way operators are allocated to task manager slots.
[jira] [Commented] (FLINK-1937) Cannot create SparseVector with only one non-zero element.
[ https://issues.apache.org/jira/browse/FLINK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514095#comment-14514095 ] Till Rohrmann commented on FLINK-1937: -- Hi [~chiwanpark], the problem is that you're giving a tuple of (Int, Int) to the function {{fromCOO}} which expects a tuple of (Int, Double). Creating the {{SparseVector}} with {code} val vec2 = SparseVector.fromCOO(3, (1, 1.0)) {code} should fix your problem. The underlying problem is that the Scala compiler cannot cast a tuple of (Int, Int) to (Int, Double) even though Int values are a subset of Double. We could add methods which do this for us, though.
[jira] [Created] (FLINK-1950) Increase default heap cutoff ratio from 20% to 30% and move default value to ConfigConstants
Robert Metzger created FLINK-1950: - Summary: Increase default heap cutoff ratio from 20% to 30% and move default value to ConfigConstants Key: FLINK-1950 URL: https://issues.apache.org/jira/browse/FLINK-1950 Project: Flink Issue Type: Task Components: YARN Client Affects Versions: 0.9 Reporter: Robert Metzger Priority: Blocker It seems that too many production users are facing issues with YARN killing containers due to resource overusage. We can mitigate the issue by using only 70% of the specified memory for the heap.
[jira] [Commented] (FLINK-1950) Increase default heap cutoff ratio from 20% to 30% and move default value to ConfigConstants
[ https://issues.apache.org/jira/browse/FLINK-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514232#comment-14514232 ] Robert Metzger commented on FLINK-1950: --- Spark is removing 10%, but at least 384 MB. They got to the value experimentally (that's my approach as well ;) https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L92
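The two heuristics under discussion, a flat cutoff ratio (reserving 30% so that 70% of the container goes to the heap) versus Spark's max(384 MB, 10%) overhead, can be compared with a small sketch. The numbers come from the thread; the helper class itself is hypothetical, not Flink's YARN client code.

```java
// Sketch comparing the two cutoff heuristics from the discussion: a flat
// ratio (e.g. reserve 30%, leaving 70% for the heap) versus Spark's
// max(384 MB, 10% of container) overhead. Illustrative only.
public class HeapCutoff {

    // heap = container * (1 - cutoffRatio)
    static long flatRatioHeap(long containerMb, double cutoffRatio) {
        return (long) (containerMb * (1.0 - cutoffRatio));
    }

    // heap = container - max(384 MB, 10% of container)
    static long sparkStyleHeap(long containerMb) {
        long overhead = Math.max(384, (long) (containerMb * 0.10));
        return containerMb - overhead;
    }

    public static void main(String[] args) {
        // For a 1024 MB container: flat 30% cutoff leaves 716 MB,
        // while the Spark-style rule leaves only 640 MB (384 MB > 10%).
        System.out.println(flatRatioHeap(1024, 0.30)); // prints 716
        System.out.println(sparkStyleHeap(1024));      // prints 640
    }
}
```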
[GitHub] flink pull request: [FLINK-1820] CSVReader: In case of an empty st...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29147205 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java --- @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I final int delimLimit = limit-delimiter.length+1; + if (bytes.length == 0) { --- End diff -- I see. This is probably because an empty test string causes the test to call the parser with a 0-length array. We could add a dedicated `testEmptyField` test method to the `ParserTestBase` and remove the empty Strings from the set of invalid inputs.
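A dedicated empty-field test along the lines suggested above could look roughly like this. Both the test shape and the stand-in parser are hypothetical sketches, not Flink's `ParserTestBase` or `IntParser`: the point is only that a zero-length field should be exercised explicitly and expected to signal a parse error.

```java
// Sketch of a dedicated empty-field test: feed the parser a field of zero
// length and assert that it signals a parse error. The parser here is a
// hypothetical stand-in, not Flink's IntParser/ParserTestBase.
public class EmptyFieldTest {

    static int parseIntField(byte[] bytes, int startPos, int length) {
        if (length == 0) {
            throw new NumberFormatException("empty field");
        }
        return Integer.parseInt(new String(bytes, startPos, length));
    }

    static boolean emptyFieldRejected() {
        byte[] record = "||".getBytes(); // zero-length field between delimiters
        try {
            parseIntField(record, 1, 0);
            return false; // no exception: the empty field was silently accepted
        } catch (NumberFormatException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(emptyFieldRejected()); // prints true
    }
}
```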
[jira] [Commented] (FLINK-1950) Increase default heap cutoff ratio from 20% to 30% and move default value to ConfigConstants
[ https://issues.apache.org/jira/browse/FLINK-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514215#comment-14514215 ] Stephan Ewen commented on FLINK-1950: - 20% and 30% are both rather random magic numbers. Is there a better way to do this?
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514171#comment-14514171 ] ASF GitHub Bot commented on FLINK-1820: --- Github user FelixNeutatz commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r29148302 --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java --- @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I final int delimLimit = limit-delimiter.length+1; + if (bytes.length == 0) { --- End diff -- Sounds good (y)